]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
4 months agoMerge tag 'amd-drm-fixes-6.13-2024-11-22' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Wed, 27 Nov 2024 01:58:55 +0000 (11:58 +1000)]
Merge tag 'amd-drm-fixes-6.13-2024-11-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.13-2024-11-22:

amdgpu:
- SMU 13.0.6 fixes
- XGMI fixes
- SMU 13.0.7 fixes
- Misc code cleanups
- Plane refcount fixes
- DCN 4.0.1 fixes
- DC power fixes
- DTO fixes
- NBIO 7.11 fixes
- SMU 14.0.x fixes
- Reset fixes
- Enable DC on LoongArch
- Sysfs hotplug warning fix
- Misc small fixes
- VCN 4.0.3 fix
- Slab usage fix
- Jpeg delayed work fix

amdkfd:
- wptr handling fixes

radeon:
- Use ttm_bo_move_null()
- Constify struct pci_device_id
- Fix spurious hotplug
- HPD fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241122154441.636075-1-alexander.deucher@amd.com
4 months agoMerge tag 'drm-xe-next-fixes-2024-11-21' of https://gitlab.freedesktop.org/drm/xe...
Dave Airlie [Wed, 27 Nov 2024 01:04:47 +0000 (11:04 +1000)]
Merge tag 'drm-xe-next-fixes-2024-11-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:
- Wake up waiters after wait condition set to true (Nirmoy Das)
- Mark the preempt fence workqueue as reclaim. (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zz-MiVLFjOZQLrlc@fedora
4 months agoMerge tag 'drm-intel-next-fixes-2024-11-21' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Wed, 27 Nov 2024 01:04:06 +0000 (11:04 +1000)]
Merge tag 'drm-intel-next-fixes-2024-11-21' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

- Fix when the first read and write are retried [hdcp] (Suraj Kandpal)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zz7xWbodMn9zZD_C@linux
5 months agoRevert "drm/radeon: Delay Connector detecting when HPD singals is unstable"
Alex Deucher [Thu, 14 Nov 2024 21:23:45 +0000 (16:23 -0500)]
Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable"

This reverts commit 949658cb9b69ab9d22a42a662b2fdc7085689ed8.

This causes a blank screen on boot.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3696
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shixiong Ou <oushixiong@kylinos.cn>
Cc: stable@vger.kernel.org
5 months agodrm/amdgpu/jpeg: cancel the jpeg worker
Alex Deucher [Wed, 13 Nov 2024 19:28:54 +0000 (14:28 -0500)]
drm/amdgpu/jpeg: cancel the jpeg worker

Looks like these got missed when jpeg was split from vcn.
Cancel the jpeg workers rather than vcn workers.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: fix usage slab after free
Vitaly Prosyak [Mon, 11 Nov 2024 22:24:08 +0000 (17:24 -0500)]
drm/amdgpu: fix usage slab after free

[  +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147

[  +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[  +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[  +0.000016] Call Trace:
[  +0.000008]  <TASK>
[  +0.000009]  dump_stack_lvl+0x76/0xa0
[  +0.000017]  print_report+0xce/0x5f0
[  +0.000017]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000019]  ? srso_return_thunk+0x5/0x5f
[  +0.000015]  ? kasan_complete_mode_report_info+0x72/0x200
[  +0.000016]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000019]  kasan_report+0xbe/0x110
[  +0.000015]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000023]  __asan_report_load8_noabort+0x14/0x30
[  +0.000014]  drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000020]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? __kasan_check_write+0x14/0x30
[  +0.000016]  ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched]
[  +0.000020]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? __kasan_check_write+0x14/0x30
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? enable_work+0x124/0x220
[  +0.000015]  ? __pfx_enable_work+0x10/0x10
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? free_large_kmalloc+0x85/0xf0
[  +0.000016]  drm_sched_entity_destroy+0x18/0x30 [gpu_sched]
[  +0.000020]  amdgpu_vce_sw_fini+0x55/0x170 [amdgpu]
[  +0.000735]  ? __kasan_check_read+0x11/0x20
[  +0.000016]  vce_v4_0_sw_fini+0x80/0x110 [amdgpu]
[  +0.000726]  amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu]
[  +0.000679]  ? mutex_unlock+0x80/0xe0
[  +0.000017]  ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu]
[  +0.000662]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? __kasan_check_write+0x14/0x30
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? mutex_unlock+0x80/0xe0
[  +0.000016]  amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[  +0.000663]  drm_minor_release+0xc9/0x140 [drm]
[  +0.000081]  drm_release+0x1fd/0x390 [drm]
[  +0.000082]  __fput+0x36c/0xad0
[  +0.000018]  __fput_sync+0x3c/0x50
[  +0.000014]  __x64_sys_close+0x7d/0xe0
[  +0.000014]  x64_sys_call+0x1bc6/0x2680
[  +0.000014]  do_syscall_64+0x70/0x130
[  +0.000014]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? irqentry_exit_to_user_mode+0x60/0x190
[  +0.000015]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? irqentry_exit+0x43/0x50
[  +0.000012]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? exc_page_fault+0x7c/0x110
[  +0.000015]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000014] RIP: 0033:0x7ffff7b14f67
[  +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
[  +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67
[  +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003
[  +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000
[  +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8
[  +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040
[  +0.000020]  </TASK>

[  +0.000016] Allocated by task 383 on cpu 7 at 26.880319s:
[  +0.000014]  kasan_save_stack+0x28/0x60
[  +0.000008]  kasan_save_track+0x18/0x70
[  +0.000007]  kasan_save_alloc_info+0x38/0x60
[  +0.000007]  __kasan_kmalloc+0xc1/0xd0
[  +0.000007]  kmalloc_trace_noprof+0x180/0x380
[  +0.000007]  drm_sched_init+0x411/0xec0 [gpu_sched]
[  +0.000012]  amdgpu_device_init+0x695f/0xa610 [amdgpu]
[  +0.000658]  amdgpu_driver_load_kms+0x1a/0x120 [amdgpu]
[  +0.000662]  amdgpu_pci_probe+0x361/0xf30 [amdgpu]
[  +0.000651]  local_pci_probe+0xe7/0x1b0
[  +0.000009]  pci_device_probe+0x248/0x890
[  +0.000008]  really_probe+0x1fd/0x950
[  +0.000008]  __driver_probe_device+0x307/0x410
[  +0.000007]  driver_probe_device+0x4e/0x150
[  +0.000007]  __driver_attach+0x223/0x510
[  +0.000006]  bus_for_each_dev+0x102/0x1a0
[  +0.000007]  driver_attach+0x3d/0x60
[  +0.000006]  bus_add_driver+0x2ac/0x5f0
[  +0.000006]  driver_register+0x13d/0x490
[  +0.000008]  __pci_register_driver+0x1ee/0x2b0
[  +0.000007]  llc_sap_close+0xb0/0x160 [llc]
[  +0.000009]  do_one_initcall+0x9c/0x3e0
[  +0.000008]  do_init_module+0x241/0x760
[  +0.000008]  load_module+0x51ac/0x6c30
[  +0.000006]  __do_sys_init_module+0x234/0x270
[  +0.000007]  __x64_sys_init_module+0x73/0xc0
[  +0.000006]  x64_sys_call+0xe3/0x2680
[  +0.000006]  do_syscall_64+0x70/0x130
[  +0.000007]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  +0.000015] Freed by task 2147 on cpu 6 at 160.507651s:
[  +0.000013]  kasan_save_stack+0x28/0x60
[  +0.000007]  kasan_save_track+0x18/0x70
[  +0.000007]  kasan_save_free_info+0x3b/0x60
[  +0.000007]  poison_slab_object+0x115/0x1c0
[  +0.000007]  __kasan_slab_free+0x34/0x60
[  +0.000007]  kfree+0xfa/0x2f0
[  +0.000007]  drm_sched_fini+0x19d/0x410 [gpu_sched]
[  +0.000012]  amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu]
[  +0.000662]  amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu]
[  +0.000653]  amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[  +0.000655]  drm_minor_release+0xc9/0x140 [drm]
[  +0.000071]  drm_release+0x1fd/0x390 [drm]
[  +0.000071]  __fput+0x36c/0xad0
[  +0.000008]  __fput_sync+0x3c/0x50
[  +0.000007]  __x64_sys_close+0x7d/0xe0
[  +0.000007]  x64_sys_call+0x1bc6/0x2680
[  +0.000007]  do_syscall_64+0x70/0x130
[  +0.000007]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  +0.000014] The buggy address belongs to the object at ffff8881b8605f80
               which belongs to the cache kmalloc-64 of size 64
[  +0.000020] The buggy address is located 8 bytes inside of
               freed 64-byte region [ffff8881b8605f80ffff8881b8605fc0)

[  +0.000028] The buggy address belongs to the physical page:
[  +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605
[  +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  +0.000007] page_type: 0xffffefff(slab)
[  +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001
[  +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000
[  +0.000006] page dumped because: kasan: bad access detected

[  +0.000012] Memory state around the buggy address:
[  +0.000011]  ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  +0.000015]  ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  +0.000013]                       ^
[  +0.000011]  ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[  +0.000014]  ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb
[  +0.000013] ==================================================================

The issue reproduced on VG20 during the IGT pci_unplug test.
The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill.
In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by
each entity within the run queue, leading to invalid memory access.
To resolve this, the order of cleanup calls is updated:

    Before:
        amdgpu_fence_driver_sw_fini
        amdgpu_device_ip_fini

    After:
        amdgpu_device_ip_fini
        amdgpu_fence_driver_sw_fini

This updated order ensures that all entities in the IPs are cleaned up first, followed by proper
cleanup of the schedulers.

Additional Investigation:

During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo
buffer must be freed only as the final step in the cleanup process to prevent any premature
access during earlier cleanup stages.

v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 months agodrm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3
Xiang Liu [Fri, 15 Nov 2024 08:59:30 +0000 (16:59 +0800)]
drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3

It is not necessarily corrupted. When there is RAS fatal error, device
memory access is blocked. Hence vcpu bo cannot be saved to system memory
as in a regular suspend sequence before going for reset. In other full
device reset cases, that gets saved and restored during resume.

v2: Remove redundant code like vcn_v4_0 did
v2: Refine commit message
v3: Drop the volatile
v3: Refine commit message

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Fix sysfs warning when hotplugging
Jesse.zhang@amd.com [Mon, 18 Nov 2024 04:06:30 +0000 (12:06 +0800)]
drm/amdgpu: Fix sysfs warning when hotplugging

Fix the similar warning when hotplugging:

[  155.585721] kernfs: can not remove 'enforce_isolation', no directory
[  155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0
[  155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp llc overlay nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd amdgpu kvm_amd kvm ipmi_ssif amdxcp rapl drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm pcspkr drm_display_helper acpi_cpufreq drm_kms_helper video wmi k10temp i2c_piix4 acpi_ipmi ipmi_si drm zram ip_tables loop squashfs dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sp5100_tco ixgbe rfkill ccp dca sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse
[  155.685224] systemd-journald[1354]: Compressed data object 957 -> 524 using ZSTD
[  155.685687] CPU: 3 PID: 6960 Comm: amd_pci_unplug Not tainted 6.10.0-1148853.1.zuul.164395107d6642bdb451071313e9378d #1
[  155.704149] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
[  155.712383] RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
[  155.717805] Code: a0 00 48 89 ef e8 37 96 c7 ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 f7 96 a0 00 0f 0b eb ab 48 c7 c7 48 ba 7e 8f e8 f7 66 bf ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[  155.736766] RSP: 0018:ffffb1685d7a3e20 EFLAGS: 00010296
[  155.742108] RAX: 0000000000000038 RBX: ffff929e94c80000 RCX: 0000000000000000
[  155.749363] RDX: ffff928e1efaf200 RSI: ffff928e1efa18c0 RDI: ffff928e1efa18c0
[  155.756612] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000003
[  155.763855] R10: ffffb1685d7a3cd8 R11: ffffffff8fb3e1c8 R12: ffffffffc1ef5341
[  155.771104] R13: ffff929e94cc5530 R14: 0000000000000000 R15: 0000000000000000
[  155.778357] FS:  00007fd9dd8d9c40(0000) GS:ffff928e1ef80000(0000) knlGS:0000000000000000
[  155.786594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.792450] CR2: 0000561245ceee38 CR3: 0000000113018000 CR4: 00000000003506f0
[  155.799702] Call Trace:
[  155.802254]  <TASK>
[  155.804460]  ? __warn+0x80/0x120
[  155.807798]  ? kernfs_remove_by_name_ns+0xb9/0xc0
[  155.812617]  ? report_bug+0x164/0x190
[  155.816393]  ? handle_bug+0x3c/0x80
[  155.819994]  ? exc_invalid_op+0x17/0x70
[  155.823939]  ? asm_exc_invalid_op+0x1a/0x20
[  155.828235]  ? kernfs_remove_by_name_ns+0xb9/0xc0
[  155.833058]  amdgpu_gfx_sysfs_fini+0x59/0xd0 [amdgpu]
[  155.838637]  gfx_v9_0_sw_fini+0x123/0x1c0 [amdgpu]
[  155.843887]  amdgpu_device_fini_sw+0xbc/0x3e0 [amdgpu]
[  155.849432]  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[  155.855235]  drm_dev_put.part.0+0x3c/0x60 [drm]
[  155.859914]  drm_release+0x8b/0xc0 [drm]
[  155.863978]  __fput+0xf1/0x2c0
[  155.867141]  __x64_sys_close+0x3c/0x80
[  155.870998]  do_syscall_64+0x64/0x170

V2: Add details in comments (Tim)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reported-by: Andy Dong <andy.dong@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add sysfs interface for vcn reset mask
Jesse.zhang@amd.com [Thu, 7 Nov 2024 14:53:47 +0000 (22:53 +0800)]
drm/amdgpu: Add sysfs interface for vcn reset mask

Add the sysfs interface for vcn:
vcn_reset_mask

The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery  before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)
v4: s/sdma/vcn/ in the reset mask setup

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu/gmc7: fix wait_for_idle callers
Alex Deucher [Tue, 19 Nov 2024 19:19:07 +0000 (14:19 -0500)]
drm/amdgpu/gmc7: fix wait_for_idle callers

The wait_for_idle signature was changed, but the callers
were not.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reported-by: Michel Dänzer <michel@daenzer.net>
Fixes: 82ae6619a450 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
5 months agodrm/amd/pm: Remove arcturus min power limit
Lijo Lazar [Wed, 20 Nov 2024 03:04:39 +0000 (08:34 +0530)]
drm/amd/pm: Remove arcturus min power limit

As per power team, there is no need to impose a lower bound on arcturus
power limit. Any unreasonable limit set will result in frequent
throttling.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 months agodrm/amd/pm: skip setting the power source on smu v14.0.2/3
Kenneth Feng [Tue, 19 Nov 2024 07:03:22 +0000 (15:03 +0800)]
drm/amd/pm: skip setting the power source on smu v14.0.2/3

skip setting power source on smu v14.0.2/3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
5 months agodrm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3
Kenneth Feng [Tue, 19 Nov 2024 06:26:58 +0000 (14:26 +0800)]
drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3

disable pcie speed switching on Intel platform for smu v14.0.2/3
based on Intel's requirement.
v2: align the setting with smu v13.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
5 months agodrm/amdkfd: Use the correct wptr size
Lijo Lazar [Mon, 11 Nov 2024 14:41:38 +0000 (20:11 +0530)]
drm/amdkfd: Use the correct wptr size

Write pointer could be 32-bit or 64-bit. Use the correct size during
initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 months agodrm/xe: Mark preempt fence workqueue as reclaim
Matthew Brost [Wed, 13 Nov 2024 17:17:51 +0000 (09:17 -0800)]
drm/xe: Mark preempt fence workqueue as reclaim

Preempt fences are in the path of reclaim, and we signal these fences in
the preempt workqueue. With that, we need to mark the preempt fence
workqueue with reclaim so that this workqueue can make forward progress
during reclaim.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241113171751.1677784-1-matthew.brost@intel.com
(cherry picked from commit 15cf53ece41748a102f4b5ee26947c2ec059bf95)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 months agodrm/xe/ufence: Wake up waiters after setting ufence->signalled
Nirmoy Das [Thu, 14 Nov 2024 15:05:37 +0000 (16:05 +0100)]
drm/xe/ufence: Wake up waiters after setting ufence->signalled

If a previous ufence is not signalled, vm_bind will return -EBUSY.
Delaying the modification of ufence->signalled can cause issues if the
UMD reuses the same ufence so update ufence->signalled before waking up
waiters.

Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3233
Fixes: 977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114150537.4161573-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 553a5d14fcd927194c409b10faced6a6dbc678d1)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 months agodrm/amd/display: Allow building DC with clang on LoongArch
Huacai Chen [Fri, 15 Nov 2024 15:02:25 +0000 (23:02 +0800)]
drm/amd/display: Allow building DC with clang on LoongArch

Clang on LoongArch (18+) appears to be unaffected by the bug causing
excessive stack usage in calculate_bandwidth(). But when building DC_FP
support the stack frame size can be as large as 2816 bytes, which causes
the FRAME_WARN build warnings. So on LoongArch we allow building DC with
clang, but disable DC_FP by default.

The help message is also updated.

Tested-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dpp
Zicheng Qu [Tue, 5 Nov 2024 14:01:37 +0000 (14:01 +0000)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dpp

This commit addresses a null pointer dereference issue in
hwss_setup_dpp(). The issue could occur when pipe_ctx->plane_state is
null. The fix adds a check to ensure `pipe_ctx->plane_state` is not null
before accessing. This prevents a null pointer dereference.

Fixes: 0baae6246307 ("drm/amd/display: Refactor fast update to use new HWSS build sequence")
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe
Zicheng Qu [Tue, 5 Nov 2024 14:01:36 +0000 (14:01 +0000)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe

This commit addresses a null pointer dereference issue in
dcn20_program_pipe(). Previously, commit 8e4ed3cf1642 ("drm/amd/display:
Add null check for pipe_ctx->plane_state in dcn20_program_pipe")
partially fixed the null pointer dereference issue. However, in
dcn20_update_dchubp_dpp(), the variable pipe_ctx is passed in, and
plane_state is accessed again through pipe_ctx. Multiple if statements
directly call attributes of plane_state, leading to potential null
pointer dereference issues. This patch adds necessary null checks to
ensure stability.

Fixes: 8e4ed3cf1642 ("drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe")
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/radeon: Fix spurious unplug event on radeon HDMI
Steven 'Steve' Kendall [Fri, 15 Nov 2024 21:17:58 +0000 (21:17 +0000)]
drm/radeon: Fix spurious unplug event on radeon HDMI

On several HP models (tested on HP 3125 and HP Probook 455 G2),
spurious unplug events are emitted upon login on Chrome OS.
This is likely due to the way Chrome OS restarts graphics
upon login, so it's possible it's an issue on other
distributions but not as common, though I haven't
reproduced the issue elsewhere.
Use logic from an earlier version of the merged change
(see link below) which iterates over connectors and finds
matching encoders, rather than the other way around.
Also fixes an issue with screen mirroring on Chrome OS.
I've deployed this patch on Fedora and did not observe
any regression on these devices.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1569#note_1603002
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3771
Fixes: 20ea34710f7b ("drm/radeon: Add HD-audio component notifier support (v6)")
Signed-off-by: Steven 'Steve' Kendall <skend@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/radeon: Constify struct pci_device_id
Christophe JAILLET [Fri, 15 Nov 2024 17:26:06 +0000 (18:26 +0100)]
drm/radeon: Constify struct pci_device_id

'struct pci_device_id' is not modified in this driver.

Constifying this structure moves some data to a read-only section, so
increase overall security.

On a x86_64, with allmodconfig:
Before:
======
   text    data     bss     dec     hex filename
  11984   28672      44   40700    9efc drivers/gpu/drm/radeon/radeon_drv.o

After:
=====
   text    data     bss     dec     hex filename
  40000     664      44   40708    9f04 drivers/gpu/drm/radeon/radeon_drv.o

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Use reset recovery state checks
Lijo Lazar [Fri, 15 Nov 2024 06:05:50 +0000 (11:35 +0530)]
drm/amdgpu: Use reset recovery state checks

Some in_reset checks are infact checking whether the state is
reinitialization after reset. Replace with reset_in_recovery calls to
identify that it's really checking for recovery stage after reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add init level for post reset reinit
Lijo Lazar [Fri, 15 Nov 2024 05:38:02 +0000 (11:08 +0530)]
drm/amdgpu: Add init level for post reset reinit

When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.

Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu/pm: add gen5 display to the user on smu v14.0.2/3
Kenneth Feng [Tue, 19 Nov 2024 03:10:47 +0000 (11:10 +0800)]
drm/amdgpu/pm: add gen5 display to the user on smu v14.0.2/3

add gen5 display to the user on smu v14.0.2/3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
5 months agodrm/amdkfd: make sure ring buffer is flushed before update wptr
Victor Zhao [Thu, 14 Nov 2024 09:45:34 +0000 (17:45 +0800)]
drm/amdkfd: make sure ring buffer is flushed before update wptr

In a consecutive packet submission, for example unmap and query status,
when CP is reading wptr caused by unmap packet doorbell ring, if in some
case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated
to next packet (query status), but the query status packet content has
not been flushed to memory yet, it will cause CP fetched stalled data.

Adding mb to ensure ring buffer has been updated before updating wptr.
Also adding a mb to ensure wptr updated before doorbell ring.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd: Fix initialization mistake for NBIO 7.11 devices
Mario Limonciello [Mon, 18 Nov 2024 17:46:11 +0000 (11:46 -0600)]
drm/amd: Fix initialization mistake for NBIO 7.11 devices

There is a strapping issue on NBIO 7.11.x that can lead to spurious PME
events while in the D0 state.

Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241118174611.10700-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd: Add some missing straps from NBIO 7.11.0
Mario Limonciello [Mon, 18 Nov 2024 17:46:10 +0000 (11:46 -0600)]
drm/amd: Add some missing straps from NBIO 7.11.0

Earlier ASICs have strap information exported, and this is missing
for NBIO 7.11.0.

Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ca8c68142ad8 ("drm/amdgpu: add nbio 7.11 registers")
Link: https://lore.kernel.org/r/20241118174611.10700-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: 3.2.310
Aric Cyr [Mon, 11 Nov 2024 01:09:27 +0000 (20:09 -0500)]
drm/amd/display: 3.2.310

This version brings along the following:

- DC core fixes
- DCN35 fix
- DCN4+ fixes
- DML2 fix
- New SPL features

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dto
Ovidiu Bunea [Wed, 6 Nov 2024 21:25:18 +0000 (16:25 -0500)]
drm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dto

There are cases where an OTG is remapped from driving a regular HDMI
display to a DP/eDP display. There are also cases where DTBCLK needs to
be enabled for HPO, but DTBCLK DTO programming may be done while OTG is
still enabled which is dangerous as the PIPE_DTO_SRC_SEL programming may
change the pixel clock generator source for a mapped and running OTG and
cause it to hang.

Remove the PIPE_DTO_SRC_SEL programming from this sequence since it is
already done in program_pixel_clk(). Additionally, make sure that
program_pixel_clk sets DTBCLK DTO as source for special HDMI cases.

Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: allow chroma 1:1 scaling when sharpness is off
Samson Tam [Fri, 8 Nov 2024 00:05:03 +0000 (19:05 -0500)]
drm/amd/display: allow chroma 1:1 scaling when sharpness is off

[Why]
SPL code forces taps to 1 when ratio is 1:1 and sharpness is off
But for chroma 1:1, need taps > 1 to handle cositing

[How]
Do not force chroma taps to 1 when ratio is 1:1 for YUV420
Remove 420_CHROMA_BYPASS mode for scaler

Reviewed-by: Navid Assadian <navid.assadian@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Populate Power Profile In Case of Early Return
Austin Zheng [Tue, 5 Nov 2024 15:22:02 +0000 (10:22 -0500)]
drm/amd/display: Populate Power Profile In Case of Early Return

Early return possible if context has no clk_mgr.
This will lead to an invalid power profile being returned
which looks identical to a profile with the lowest power level.
Add back logic that populated the power profile and overwrite
the value if needed.

Cc: stable@vger.kernel.org
Fixes: d016d0dd5a57 ("drm/amd/display: Update Interface to Check UCLK DPM")
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: add public taps API in SPL
Samson Tam [Tue, 5 Nov 2024 15:15:19 +0000 (10:15 -0500)]
drm/amd/display: add public taps API in SPL

[Why]
Add public API to obtain number of taps in SPL.

[How]
Isolate function to calculate recout, ratios and viewport before
calculating taps. Call function in both public taps API call and private
scaling call.

Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Enable Request rate limiter during C-State on dcn401
Dillon Varone [Fri, 1 Nov 2024 14:51:02 +0000 (10:51 -0400)]
drm/amd/display: Enable Request rate limiter during C-State on dcn401

[WHY]
When C-State entry is requested, the rate limiter will be disabled
which can result in high contention in the DCHUB return path.

[HOW]
Enable the rate limiter during C-state requests to prevent contention.

Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Fix handling of plane refcount
Joshua Aberback [Mon, 28 Oct 2024 21:12:22 +0000 (17:12 -0400)]
drm/amd/display: Fix handling of plane refcount

[Why]
The mechanism to backup and restore plane states doesn't maintain
refcount, which can cause issues if the refcount of the plane changes
in between backup and restore operations, such as memory leaks if the
refcount was supposed to go down, or double frees / invalid memory
accesses if the refcount was supposed to go up.

[How]
Cache and re-apply current refcount when restoring plane states.

Cc: stable@vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Ignore scalar validation failure if pipe is phantom
Chris Park [Mon, 4 Nov 2024 18:18:39 +0000 (13:18 -0500)]
drm/amd/display: Ignore scalar validation failure if pipe is phantom

[Why]
There are some pipe scaler validation failure when the pipe is phantom
and causes crash in DML validation. Since, scalar parameters are not
as important in phantom pipe and we require this plane to do successful
MCLK switches, the failure condition can be ignored.

[How]
Ignore scalar validation failure if the pipe validation is marked as
phantom pipe.

Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Chris Park <chris.park@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: update pipe selection policy to check head pipe
Yihan Zhu [Wed, 30 Oct 2024 20:20:21 +0000 (16:20 -0400)]
drm/amd/display: update pipe selection policy to check head pipe

[Why]
No check on head pipe during the dml to dc hw mapping will allow illegal
pipe usage. This will result in a wrong pipe topology to cause mpcc tree
totally mess up then cause a display hang.

[How]
Avoid to use the pipe is head in all check and avoid ODM slice during
preferred pipe check.

Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: remove redundant is_dsc_possible check
Bhavin Sharma [Thu, 14 Nov 2024 15:11:11 +0000 (20:41 +0530)]
drm/amd/display: remove redundant is_dsc_possible check

Since is_dsc_possible is already checked just above, there's no need to
check it again before filling out the DSC settings.

Signed-off-by: Bhavin Sharma <bhavin.sharma@siliconsignals.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: remove redundant tools_size check
Bhavin Sharma [Thu, 14 Nov 2024 15:11:12 +0000 (20:41 +0530)]
drm/amd/pm: remove redundant tools_size check

The check for tools_size being non-zero is redundant as tools_size is
explicitly set to a non-zero value (0x19000). Removing the if condition
simplifies the code without altering functionality.

Signed-off-by: Bhavin Sharma <bhavin.sharma@siliconsignals.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: update current_socclk and current_uclk in gpu_metrics on smu v13.0.7
Umio Yasuno [Thu, 14 Nov 2024 07:15:27 +0000 (16:15 +0900)]
drm/amd/pm: update current_socclk and current_uclk in gpu_metrics on smu v13.0.7

These were missed before.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3751
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 months agodrm/radeon: Use ttm_bo_move_null() in radeon_bo_move()
Huacai Chen [Wed, 13 Nov 2024 12:51:58 +0000 (20:51 +0800)]
drm/radeon: Use ttm_bo_move_null() in radeon_bo_move()

Since ttm_bo_move_null() is exactly the same as ttm_resource_free() +
ttm_bo_assign_mem(), we use ttm_bo_move_null() for the GTT --> SYSTEM
move case too. Then the code is more consistent as the SYSTEM --> GTT
move case.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Asad Kamal [Mon, 11 Nov 2024 13:16:01 +0000 (21:16 +0800)]
drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0

Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for
SMU_v_13_0_6

v2: Get link status register value for each soc from separate
function (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: Add gpu_metrics_v1_7
Asad Kamal [Mon, 11 Nov 2024 12:11:48 +0000 (20:11 +0800)]
drm/amd/pm: Add gpu_metrics_v1_7

Add new gpu_metrics_v1_7 to acquire xgmi link status,
application counter and max vram bandwidth

v2: Use gpu_metrics_v1_7 for SMU_v_13_0_6 (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: Update data types used for uapi i/f
Asad Kamal [Wed, 13 Nov 2024 09:17:23 +0000 (17:17 +0800)]
drm/amd/pm: Update data types used for uapi i/f

Update member's data type in amdgpu_xcp_metrics from linux specific
to the ones compatible to uapi interface

Fixes: 4c07ff7d07f7 ("drm/amd/pm: Add gpu_metrics_v1_6")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/i915/hdcp: Fix when the first read and write are retried
Suraj Kandpal [Mon, 4 Nov 2024 03:59:51 +0000 (09:29 +0530)]
drm/i915/hdcp: Fix when the first read and write are retried

Make sure that the first read/write in hdcp2_authentication_key_exchange
are only retried when we have either DP/DPMST encoder connected,
since we do this to give docks and dp encoders some extra time to
get their HDCP DPCD registers ready only for DP/DPMST encoders as
this issue is only observed here no need to burden other encoders
with extra retries as this causes the HDMI connector to have some
other timing issue and fails HDCP authentication.

--v2
-Add intent of patch [Chaitanya]
-Add reasoning for loop [Jani]
-Make sure we forfiet the 50ms wait for non DP/DPMST encoders.

--v3
-Remove the is_dp_encoder check [Jani/Chaitanya]
-Make the commit message more clearer [Jani]

Fixes: 9d5a05f86d2f ("drm/i915/hdcp: Retry first read and writes to downstream")
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104035951.517837-1-suraj.kandpal@intel.com
(cherry picked from commit 44499559496c1dac43583f4387d38de1b612a69b)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
5 months agoMerge tag 'drm-xe-next-fixes-2024-11-15' of https://gitlab.freedesktop.org/drm/xe...
Dave Airlie [Mon, 18 Nov 2024 03:38:42 +0000 (13:38 +1000)]
Merge tag 'drm-xe-next-fixes-2024-11-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:
- Fix a NULL pointer deref (Everest K.C.)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZzcsMT_FEqBE0cAW@fedora
5 months agoMerge tag 'amd-drm-next-6.13-2024-11-15' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Mon, 18 Nov 2024 02:36:34 +0000 (12:36 +1000)]
Merge tag 'amd-drm-next-6.13-2024-11-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.13-2024-11-15:

amdgpu:
- Parition fixes
- GFX 12 fixes
- SR-IOV fixes
- MES fixes
- RAS fixes
- GC queue handling fixes
- VCN fixes
- Add sysfs reset masks
- Better error messages for P2P failurs
- SMU fixes
- Documentation updates
- GFX11 enforce isolation updates
- Display HPD fixes
- PSR fixes
- Panel replay fixes
- DP MST fixes
- USB4 fixes
- Misc display fixes and cleanups
- VRAM handling fix for APUs
- NBIO fix

amdkfd:
- INIT_WORK fix
- Refcount fix
- KFD MES scheduling fixes

drm/fourcc:
- Add missing tiling mode

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115165012.573465-1-alexander.deucher@amd.com
5 months agodrm/xe/guc: Fix dereference before NULL check
Everest K.C. [Wed, 23 Oct 2024 23:33:55 +0000 (17:33 -0600)]
drm/xe/guc: Fix dereference before NULL check

The pointer list->list is dereferenced before the NULL check.
Fix this by moving the NULL check outside the for loop, so that
the check is performed before the dereferencing.
The list->list pointer cannot be NULL so this has no effect on runtime.
It's just a correctness issue.

This issue was reported by Coverity Scan.
https://scan7.scan.coverity.com/#/project-view/51525/11354?selectedIssue=1600335

Fixes: 0f1fdf559225 ("drm/xe/guc: Save manual engine capture into capture list")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023233356.5479-1-everestkc@everestkc.com.np
(cherry picked from commit 2aff81e039de5b0b7ef6bdcb2c320f121f69e2b4)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 months agodrm/amd: Fix initialization mistake for NBIO 7.7.0
Vijendar Mukunda [Tue, 12 Nov 2024 16:11:42 +0000 (10:11 -0600)]
drm/amd: Fix initialization mistake for NBIO 7.7.0

There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.

Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agoRevert "drm/amd/display: parse umc_info or vram_info based on ASIC"
Alex Deucher [Fri, 8 Nov 2024 14:34:46 +0000 (09:34 -0500)]
Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"

This reverts commit 2551b4a321a68134360b860113dd460133e856e5.

This was not the root cause.  Revert.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3678
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: aurabindo.pillai@amd.com
Cc: hamishclaxton@gmail.com
5 months agodrm/amd/display: Fix failure to read vram info due to static BP_RESULT
Hamish Claxton [Tue, 5 Nov 2024 00:42:31 +0000 (10:42 +1000)]
drm/amd/display: Fix failure to read vram info due to static BP_RESULT

The static declaration causes the check to fail.  Remove it.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678
Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamish Claxton <hamishclaxton@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: aurabindo.pillai@amd.com
Cc: hamishclaxton@gmail.com
5 months agodrm/amdgpu: enable GTT fallback handling for dGPUs only
Christian König [Tue, 4 Jun 2024 16:05:00 +0000 (18:05 +0200)]
drm/amdgpu: enable GTT fallback handling for dGPUs only

That is just a waste of time on APUs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704
Fixes: 216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/amdgpu: limit single process inside MES
Shaoyun Liu [Wed, 23 Oct 2024 15:12:00 +0000 (11:12 -0400)]
drm/amd/amdgpu: limit single process inside MES

This is for MES to limit only one process for the user queues

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
Qiang Yu [Fri, 25 Oct 2024 03:23:29 +0000 (11:23 +0800)]
drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X

This is used when radeonsi export small texture's modifier
to user with eglExportDMABUFImageQueryMESA().

mesa changes is available here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31658

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu/mes12: correct kiq unmap latency
Jack Xiao [Mon, 4 Nov 2024 10:06:01 +0000 (18:06 +0800)]
drm/amdgpu/mes12: correct kiq unmap latency

Correct kiq unmap queue timeout value.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Support vcn and jpeg error info parsing
Stanley.Yang [Mon, 4 Nov 2024 06:14:09 +0000 (14:14 +0800)]
drm/amdgpu: Support vcn and jpeg error info parsing

Add vcn and jpeg error count parsing.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd : Update MES API header file for v11 & v12
Shaoyun Liu [Fri, 18 Oct 2024 19:56:25 +0000 (15:56 -0400)]
drm/amd : Update MES API header file for v11 & v12

New features require the new fields defines

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
Shaoyun Liu [Fri, 4 Oct 2024 16:00:36 +0000 (12:00 -0400)]
drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling

Add back kfd queues in start scheduling that originally been
removed on stop scheduling.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdkfd: change kfd process kref count at creation
Xiaogang Chen [Mon, 28 Oct 2024 20:06:19 +0000 (15:06 -0500)]
drm/amdkfd: change kfd process kref count at creation

kfd process kref count(process->ref) is initialized to 1 by kref_init. After
it is created not need to increase its kref. Instad add kfd process kref at kfd
process mmu notifier allocation since we already decrease the kref at
free_notifier of mmu_notifier_ops, so pair them.

When user process opens kfd node multiple times the kfd process kref is
increased each time to balance with kfd node close operation.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Cleanup shift coding style
Advait Dhamorikar [Tue, 8 Oct 2024 19:16:23 +0000 (00:46 +0530)]
drm/amdgpu: Cleanup shift coding style

Improves the coding style by updating bit-shift
operations in the amdgpu_jpeg.c driver file.
It ensures consistency and avoids potential issues
by explicitly using 1U and 1ULL for unsigned
and unsigned long long shifts in all relevant instances.

Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
shaoyunl [Thu, 17 Oct 2024 14:35:02 +0000 (10:35 -0400)]
drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data

MES internal scratch data is useful for mes debug, it can only located
in VRAM, change the allocation type and increase size for mes 11

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Implement virt req_ras_err_count
Victor Skvortsov [Wed, 30 Oct 2024 14:18:00 +0000 (10:18 -0400)]
drm/amdgpu: Implement virt req_ras_err_count

Enable RAS late init  if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report & update statistics order for VF

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: VF Query RAS Caps from Host if supported
Victor Skvortsov [Wed, 30 Oct 2024 13:58:56 +0000 (09:58 -0400)]
drm/amdgpu: VF Query RAS Caps from Host if supported

If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
Victor Skvortsov [Wed, 30 Oct 2024 13:45:27 +0000 (09:45 -0400)]
drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry

Add message handlers for RAS telemetry.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
Victor Skvortsov [Wed, 30 Oct 2024 13:33:51 +0000 (09:33 -0400)]
drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support

The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: 3.2.309
Aric Cyr [Sun, 3 Nov 2024 23:09:50 +0000 (18:09 -0500)]
drm/amd/display: 3.2.309

This version brings along the following:

- DML2 fixes
- DP fixes
- DPMS fix
- HPD fixes
- Misc cleanup
- ODM fix
- Replay fix
- SPL fix

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Adjust VSDB parser for replay feature
Rodrigo Siqueira [Tue, 5 Nov 2024 15:40:23 +0000 (08:40 -0700)]
drm/amd/display: Adjust VSDB parser for replay feature

At some point, the IEEE ID identification for the replay check in the
AMD EDID was added. However, this check causes the following
out-of-bounds issues when using KASAN:

[   27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu]
[   27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383

...

[   27.821207] Memory state around the buggy address:
[   27.821215]  ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   27.821224]  ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   27.821243]                    ^
[   27.821250]  ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   27.821259]  ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   27.821268] ==================================================================

This is caused because the ID extraction happens outside of the range of
the edid lenght. This commit addresses this issue by considering the
amd_vsdb_block size.

Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Remove unused code
Rodrigo Siqueira [Tue, 5 Nov 2024 15:50:02 +0000 (08:50 -0700)]
drm/amd/display: Remove unused code

This commit removes a legacy debug_defaults_diags struct.

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Require minimum VBlank size for stutter optimization
Dillon Varone [Fri, 1 Nov 2024 16:00:14 +0000 (12:00 -0400)]
drm/amd/display: Require minimum VBlank size for stutter optimization

If the nominal VBlank is too small, optimizing for stutter can cause
the prefetch bandwidth to increase drasticaly, resulting in higher
clock and power requirements. Only optimize if it is >3x the stutter
latency.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agoMerge tag 'drm-misc-next-2024-11-08' of https://gitlab.freedesktop.org/drm/misc/kerne...
Dave Airlie [Mon, 11 Nov 2024 02:10:48 +0000 (12:10 +1000)]
Merge tag 'drm-misc-next-2024-11-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.13:

UAPI Changes:
- Add 1X7X5 media-bus formats.

Cross-subsystem Changes:
- Maintainer updates for VKMS and IT6263.
- Add media-bus-fmt for MEDIA_BUS_FMT_RGB101010_1X7X5_*.
- Add IT6263 DT bindings and driver.

Core Changes:
- Add ABGR210101010 support to panic handler.
- Use ATOMIC64_INIT in drm_file.c
- Improve scheduler teardown documentation.

Driver Changes:
- Make mediatek compile on ARM again.
- Add missing drm/drm_bridge.h header include, already in drm-next.
- Small fixes and cleanups to vkms, bridge/it6505, panfrost, panthor.
- Add panic support to nouveau for nv50+.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/344afe41-d27b-408a-8542-bfecfd3555f6@linux.intel.com
5 months agodrm/amd/display: Handle dml allocation failure to avoid crash
Ryan Seto [Fri, 1 Nov 2024 14:19:56 +0000 (10:19 -0400)]
drm/amd/display: Handle dml allocation failure to avoid crash

[Why]
In the case where a dml allocation fails for any reason, the
current state's dml contexts would no longer be valid. Then
subsequent calls dc_state_copy_internal would shallow copy
invalid memory and if the new state was released, a double
free would occur.

[How]
Reset dml pointers in new_state to NULL and avoid invalid
pointer

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Use region6 size in fw_meta_info
JinZe Xu [Fri, 18 Oct 2024 14:15:57 +0000 (22:15 +0800)]
drm/amd/display: Use region6 size in fw_meta_info

[Why]
If driver allocated region6 size is not same as the size in firmware,
dmcub won't enable region6.

[How]
Use region6 size in dmcub_fw_meta instead of a constant value.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: JinZe Xu <jinze.xu@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Update SPL Taps Required For Integer Scaling
Austin Zheng [Thu, 31 Oct 2024 18:10:39 +0000 (14:10 -0400)]
drm/amd/display: Update SPL Taps Required For Integer Scaling

Number of taps is incorrectly being set when integer scaling is enabled.
Taps required when src_rect != dst_rect previously not considered.
Perform the calculations when integer scaling is enabled.
Set taps to 1 if the scaling ratio is 1:1.

Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: disabling p-state checks for DCN31 and DCN314
Emily Nie [Wed, 16 Oct 2024 19:52:28 +0000 (15:52 -0400)]
drm/amd/display: disabling p-state checks for DCN31 and DCN314

[Why]
IGT displays Dmesg warnings which are likely false

[How]
Disabling p-state checks leading to this warning for DCN31 and DCN314

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Emily Nie <Emily.Nie@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: always blank stream before disable crtc
Fudongwang [Wed, 30 Oct 2024 08:45:56 +0000 (16:45 +0800)]
drm/amd/display: always blank stream before disable crtc

Garbage will show due to dig is on. So blank stream needed.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Fudongwang <Fudong.Wang@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Read DP tunneling support only for DPIA endpoints
Aurabindo Pillai [Wed, 30 Oct 2024 18:06:18 +0000 (18:06 +0000)]
drm/amd/display: Read DP tunneling support only for DPIA endpoints

Unconditionally reading DP tunneling support results in extraneous
errors messages on certain devices. Fix this by guarding the DPCD read
for DP tunneling support for USB4 DPIA endpoints.

Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Adding flag for forced MST blocked discovery
Meenakshikumar Somasundaram [Fri, 25 Oct 2024 03:06:37 +0000 (23:06 -0400)]
drm/amd/display: Adding flag for forced MST blocked discovery

[Why]
Need a flag to force MST blocked discovery for certain branch devices.

[How]
Added a flag to force MST blocked discovery in struct dc_panel_patch.

Reviewed-by: PeiChen Huang <peichen.huang@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Fix Panel Replay not update screen correctly
Tom Chung [Tue, 29 Oct 2024 09:28:23 +0000 (17:28 +0800)]
drm/amd/display: Fix Panel Replay not update screen correctly

[Why]
In certain use case such as KDE login screen, there will be no atomic
commit while do the frame update.
If the Panel Replay enabled, it will cause the screen not updated and
looks like system hang.

[How]
Delay few atomic commits before enabled the Panel Replay just like PSR.

Fixes: be64336307a6c ("drm/amd/display: Re-enable panel replay feature")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3686
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3682
Tested-By: Corey Hickey <bugfood-c@fatooh.org>
Tested-By: James Courtier-Dutton <james.dutton@gmail.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Change some variable name of psr
Tom Chung [Tue, 29 Oct 2024 07:38:16 +0000 (15:38 +0800)]
drm/amd/display: Change some variable name of psr

Panel Replay feature may also use the same variable with PSR.
Change the variable name and make it not specify for PSR.

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Change parameters to fix certain compiler errors
Revalla Hari Krishna [Fri, 25 Oct 2024 12:55:58 +0000 (18:25 +0530)]
drm/amd/display: Change parameters to fix certain compiler errors

[Why]
String literals must be assigned to const char pointers.

[How]
By adding const keyword to fix compilation errors.

Reviewed-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Revalla Hari Krishna <Harikrishna.Revalla@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/display: Refactor HPD IRQ error checking flow
Leon Huang [Mon, 21 Oct 2024 09:22:35 +0000 (17:22 +0800)]
drm/amd/display: Refactor HPD IRQ error checking flow

[Why]
HPD error status does not cover Replay desync error status
while executing autotests and CTS tests.

[How]
Refactor the checking flow, reporting the HPD error based on
different eDP feature.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs
Srinivasan Shanmugam [Wed, 6 Nov 2024 07:14:45 +0000 (12:44 +0530)]
drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs

Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data
isolation between GPU workloads. The cleaner shader is responsible for
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps
prevent data leakage and ensures accurate computation results.

This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add documentation for enforce isolation feature
Srinivasan Shanmugam [Wed, 6 Nov 2024 13:55:50 +0000 (19:25 +0530)]
drm/amdgpu: Add documentation for enforce isolation feature

This feature enables process isolation on the graphics engine by
serializing access to it and adding a cleaner shader which clears LDS
(Local Data Store) and GPRs (General Purpose Registers) between jobs.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdkfd: Fix wrong usage of INIT_WORK()
Yuan Can [Wed, 6 Nov 2024 01:35:41 +0000 (09:35 +0800)]
drm/amdkfd: Fix wrong usage of INIT_WORK()

In kfd_procfs_show(), the sdma_activity_work_handler is a local variable
and the sdma_activity_work_handler.sdma_activity_work should initialize
with INIT_WORK_ONSTACK() instead of INIT_WORK().

Fixes: 32cb59f31362 ("drm/amdkfd: Track SDMA utilization per process")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
Christian König [Thu, 31 Oct 2024 09:04:17 +0000 (10:04 +0100)]
drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()

The coherency flags can only be determined when the BO is locked and that
in turn is only guaranteed when the mapping is validated.

Fix the check, move the resource check into the function and add an assert
that the BO is locked.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0
Tim Huang [Mon, 28 Oct 2024 05:51:50 +0000 (13:51 +0800)]
drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0

Currently, the pp_dpm_mclk values are reported in descending order
on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency
with other clock interfaces.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Inform if PCIe based P2P links are not available
Ramesh Errabolu [Wed, 6 Nov 2024 00:50:02 +0000 (18:50 -0600)]
drm/amdgpu: Inform if PCIe based P2P links are not available

Raise an info message in kernel log if PCIe root complex
determines that a AMD GPU device D<i> cannot have P2P
communication with another AMD GPU device D<j>

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Fix video caps for H264 and HEVC encode maximum size
David Rosca [Mon, 21 Oct 2024 07:36:11 +0000 (09:36 +0200)]
drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size

H264 supports 4096x4096 starting from Polaris.
HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352
is supported.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add sysfs interface for jpeg reset mask
Jesse.zhang@amd.com [Tue, 29 Oct 2024 06:36:35 +0000 (14:36 +0800)]
drm/amdgpu: Add sysfs interface for jpeg reset mask

Add the sysfs interface for jpeg:
jpeg_reset_mask

The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery  before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add sysfs interface for vpe reset mask
Jesse.zhang@amd.com [Tue, 29 Oct 2024 06:33:05 +0000 (14:33 +0800)]
drm/amdgpu: Add sysfs interface for vpe reset mask

Add the sysfs interface for vpe:
    vpe_reset_mask

The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery  before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add sysfs interface for sdma reset mask
Jesse.zhang@amd.com [Mon, 4 Nov 2024 05:15:49 +0000 (13:15 +0800)]
drm/amdgpu: Add sysfs interface for sdma reset mask

Add the sysfs interface for sdma:
sdma_reset_mask

The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
   and print the strings in the order they are applied (Christian)

   check amdgpu_gpu_recovery  before creating sysfs file itself,
   and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Normalize reg offsets on VCN v4.0.3
Sathishkumar S [Tue, 5 Nov 2024 05:23:01 +0000 (10:53 +0530)]
drm/amdgpu: Normalize reg offsets on VCN v4.0.3

Remote access to external AIDs isn't possible with VCN RRMT disabled
and it is disabled on SoCs with GC 9.4.4, so use only local offsets.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Avoid kcq disable during reset
Lijo Lazar [Tue, 5 Nov 2024 05:34:09 +0000 (11:04 +0530)]
drm/amdgpu: Avoid kcq disable during reset

Reset sequence indicates that hardware already ran into a bad state.
Avoid sending unmap queue request to reset KCQ. This will also cover RAS
error scenarios which need a reset to recover, hence remove the check.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Fix map/unmap queue logic
Lijo Lazar [Tue, 5 Nov 2024 05:00:20 +0000 (10:30 +0530)]
drm/amdgpu: Fix map/unmap queue logic

In current logic, it calls ring_alloc followed by a ring_test. ring_test
in turn will call another ring_alloc. This is illegal usage as a
ring_alloc is expected to be closed properly with a ring_commit. Change
to commit the map/unmap queue packet first followed by a ring_test. Add a
comment about the usage of ring_test.

Also, reorder the current pre-condition checks of job hang or kiq ring
scheduler not ready. Without them being met, it is not useful to attempt
ring or memory allocations.

Fixes tag refers to the original patch which introduced this issue which
then got carried over into newer code.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c")
5 months agodrm/amdgpu: fix ACA bank count boundary check error
Yang Wang [Wed, 6 Nov 2024 06:49:56 +0000 (14:49 +0800)]
drm/amdgpu: fix ACA bank count boundary check error

fix ACA bank count boundary check error.

Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add sysfs interface for gc reset mask
Jesse.zhang@amd.com [Tue, 5 Nov 2024 07:22:56 +0000 (15:22 +0800)]
drm/amdgpu: Add sysfs interface for gc reset mask

Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask

These interfaces are read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery  before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: fix return random value when multiple threads read registers via mes.
chongli2 [Wed, 6 Nov 2024 03:43:09 +0000 (11:43 +0800)]
drm/amdgpu: fix return random value when multiple threads read registers via mes.

The currect code use the address "adev->mes.read_val_ptr" to
store the value read from register via mes.
So when multiple threads read register,
multiple threads have to share the one address,
and overwrite the value each other.

Assign an address by "amdgpu_device_wb_get" to store register value.
each thread will has an address to store register value.

Signed-off-by: chongli2 <chongli2@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdkfd: remove gfx 12 trap handler page size cap
Jonathan Kim [Tue, 5 Nov 2024 17:38:30 +0000 (12:38 -0500)]
drm/amdkfd: remove gfx 12 trap handler page size cap

GFX 12 does not require a page size cap for the trap handler because
it does not require a CWSR work around like GFX 11 did.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agodrm/amdgpu: Add supported NPS modes node
Asad Kamal [Wed, 30 Oct 2024 11:02:01 +0000 (19:02 +0800)]
drm/amdgpu: Add supported NPS modes node

Add sysfs node to show supported NPS mode for the
partition configuration selected using xcp_config

v2: Hide node if dynamic nps switch not supported

v3: Fix removal of files in case of error

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 months agoMerge tag 'drm-etnaviv-next-2024-11-07' of https://git.pengutronix.de/git/lst/linux...
Dave Airlie [Fri, 8 Nov 2024 02:05:45 +0000 (12:05 +1000)]
Merge tag 'drm-etnaviv-next-2024-11-07' of https://git.pengutronix.de/git/lst/linux into drm-next

- improve handling of DMA address limited systems
- improve GPU hangcheck
- fix address space collision on >= 4K CPU pages
- flush all known writeback caches before memory release
- various code cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/c84075a0257e7bee222d008fa3118117422d664e.camel@pengutronix.de
5 months agoMerge tag 'amd-drm-next-6.13-2024-11-06' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 8 Nov 2024 02:04:24 +0000 (12:04 +1000)]
Merge tag 'amd-drm-next-6.13-2024-11-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.13-2024-11-06:

amdgpu:
- Misc cleanups
- OLED fixes
- DCN 4.x fixes
- DCN 3.5 fixes
- 8K fixes
- IPS fixes
- DSC fixes
- S3 fix
- KASAN fix
- SMU13 fixes
- fdinfo fixes
- USB-C fixes
- ACPI fix
- Fix dummy page overlapping mappings
- Fix workload profile handling
- Add user control for zero RPM on SMU13
- Cleaner shader updates
- Stop syncing PRT map operations
- Debugfs permissions fixes
- Debugfs bounds check fix
- RAS cleanups
- Enforce isolation updates

amdkfd:
- Add topology cap flag for per queue reset
- Add an interface to query whether KFD queues are present
- Use dynamic allocation for get_cu_occupancy

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241106163904.189108-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>