]> www.infradead.org Git - users/hch/misc.git/log
users/hch/misc.git
6 weeks agodrm/amdkfd: Fix bug in config_dequeue_wait_counts
Harish Kasiviswanathan [Fri, 14 Mar 2025 16:03:53 +0000 (12:03 -0400)]
drm/amdkfd: Fix bug in config_dequeue_wait_counts

For certain ASICs where dequeue_wait_count don't need to be initialized,
pm_config_dequeue_wait_counts_v9 return without filling in the packet
information. However, the calling function interprets this as a success
and sends the uninitialized packet to firmware causing hang.

Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
ASICs that don't need the value to be initialized.

v2: Removed redudant code.
    Tidy up code based on review comments
v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs

Fixes: ed962f8d0603 ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/radeon/uvd: Replace nested max() with single max3()
FengWei [Mon, 17 Mar 2025 02:57:45 +0000 (10:57 +0800)]
drm/radeon/uvd: Replace nested max() with single max3()

Use max3() macro instead of nesting max() to simplify the return
statement.

Signed-off-by: FengWei <feng.wei8@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: grab an additional reference on the gang fence v2
Christian König [Tue, 14 Jan 2025 12:51:39 +0000 (13:51 +0100)]
drm/amdgpu: grab an additional reference on the gang fence v2

We keep the gang submission fence around in adev, make sure that it
stays alive.

v2: fix memory leak on retry

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Use wafl version for xgmi
Lijo Lazar [Tue, 25 Feb 2025 05:59:47 +0000 (11:29 +0530)]
drm/amdgpu: Use wafl version for xgmi

XGMI and WAFL share the same versions. Use WAFL version if XGMI version
is not present in discovery.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Fix SDMA engine reset logic
Jesse.zhang@amd.com [Mon, 17 Mar 2025 01:14:36 +0000 (09:14 +0800)]
drm/amdgpu: Fix SDMA engine reset logic

The scheduler should restart only if the reset operation
succeeds  This ensures that new tasks are only submitted
to the queues after a successful reset.

Fixes: 4c02f7301657 ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
Tomasz Pakuła [Tue, 11 Mar 2025 21:38:33 +0000 (22:38 +0100)]
drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2

Currently, it seems like the code was carried over from RDNA3 because
it assumes two possible values to set. RDNA4, instead of having:
0: min SCLK
1: max SCLK
only has
0: SCLK offset

This change makes it so it only reports current offset value instead of
showing possible min/max values and their indices. Moreover, it now only
accepts the offset as a value, without the indice index.

Additionally, the lower bound was printed as %u by mistake.

Old:
OD_SCLK_OFFSET:
0: -500Mhz
1: 1000Mhz
OD_MCLK:
0: 97Mhz
1: 1259MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK_OFFSET:    -500Mhz       1000Mhz
MCLK:      97Mhz       1500Mhz
VDDGFX_OFFSET:    -200mv          0mv

New:
OD_SCLK_OFFSET:
0Mhz
OD_MCLK:
0: 97Mhz
1: 1259MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK_OFFSET:    -500Mhz       1000Mhz
MCLK:      97Mhz       1500Mhz
VDDGFX_OFFSET:    -200mv          0mv

Setting this offset:
Old: "s 1 <offset>"
New: "s <offset>"

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: release xcp_mgr on exit
Flora Cui [Fri, 14 Mar 2025 02:27:55 +0000 (10:27 +0800)]
drm/amdgpu: release xcp_mgr on exit

Free on driver cleanup.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: 3.2.325
Taimur Hassan [Sun, 9 Mar 2025 08:57:46 +0000 (04:57 -0400)]
drm/amd/display: 3.2.325

This version brings along following fixes:
- Use DPM table clk setting for dml2 soc dscclk
- Update static soc table
- Fix incorrect fw_state address in dmub_srv
- Use HW lock mgr for PSR1 when only one eDP
- Revert "Support for reg inbox0 for host->DMUB CMDs"
- Change notification of link BW allocation
- Fix message for support_edp0_on_dp1
- Guard against setting dispclk low for dcn31x
- Prevent VStartup Overflow
- Check pipe->stream before passing it to a function

Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Use DPM table clk setting for dml2 soc dscclk
Charlene Liu [Fri, 28 Feb 2025 13:02:20 +0000 (08:02 -0500)]
drm/amd/display: Use DPM table clk setting for dml2 soc dscclk

[WHY]
Not like dppclk/dispclk, dml2 will calculate the minimum required clocks.
For dscclk, it is used for pure comparision.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Update static soc table
Charlene Liu [Fri, 7 Mar 2025 23:24:56 +0000 (18:24 -0500)]
drm/amd/display: Update static soc table

[WHY]
Update the static soc table dcn3_5_soc.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Fix incorrect fw_state address in dmub_srv
Lo-an Chen [Mon, 10 Mar 2025 06:52:22 +0000 (14:52 +0800)]
drm/amd/display: Fix incorrect fw_state address in dmub_srv

[WHY]
The fw_state in dmub_srv was assigned with wrong address.
The address was pointed to the firmware region.

[HOW]
Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET
in dmub_cmd.h.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Lo-an Chen <lo-an.chen@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Use HW lock mgr for PSR1 when only one eDP
Mario Limonciello [Fri, 7 Mar 2025 21:55:20 +0000 (15:55 -0600)]
drm/amd/display: Use HW lock mgr for PSR1 when only one eDP

[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR.  Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.

[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.

Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Revert "Support for reg inbox0 for host->DMUB CMDs"
Dillon Varone [Mon, 3 Mar 2025 21:48:50 +0000 (16:48 -0500)]
drm/amd/display: Revert "Support for reg inbox0 for host->DMUB CMDs"

This reverts commit 15d1c2e6bf60511ba068d7d735d051911c6c5b92.

Reason: Cursor movement causes system to hang.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Change notification of link BW allocation
Cruise Hung [Tue, 25 Feb 2025 15:03:29 +0000 (23:03 +0800)]
drm/amd/display: Change notification of link BW allocation

[WHY & HOW]
The response of DP BW allocation is handled in Outbox ISR.
When it failed to request the DP BW allocation, it sent another
DPCD request in Outbox ISR immediately. The DP AUX reply also
uses the Outbox ISR. So, no AUX reply happened in this case.
Change to use HPD IRQ for the notification.

Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Fix message for support_edp0_on_dp1
Yilin Chen [Wed, 5 Mar 2025 17:19:49 +0000 (12:19 -0500)]
drm/amd/display: Fix message for support_edp0_on_dp1

[WHY]
The info message was wrong when support_edp0_on_dp1 is enabled

[HOW]
Use correct info message for support_edp0_on_dp1

Fixes: f6d17270d18a ("drm/amd/display: add a quirk to enable eDP0 on DP1")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Yilin Chen <Yilin.Chen@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Guard against setting dispclk low for dcn31x
Jing Zhou [Tue, 4 Mar 2025 15:15:56 +0000 (23:15 +0800)]
drm/amd/display: Guard against setting dispclk low for dcn31x

[WHY]
We should never apply a minimum dispclk value while in
prepare_bandwidth or while displays are active. This is
always an optimizaiton for when all displays are disabled.

[HOW]
Defer dispclk optimization until safe_to_lower = true
and display_count reaches 0.

Since 0 has a special value in this logic (ie. no dispclk
required) we also need adjust the logic that clamps it for
the actual request to PMFW.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Chris Park <chris.park@amd.com>
Reviewed-by: Eric Yang <eric.yang@amd.com>
Signed-off-by: Jing Zhou <Jing.Zhou@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Prevent VStartup Overflow
Ryan Seto [Fri, 28 Feb 2025 20:52:42 +0000 (15:52 -0500)]
drm/amd/display: Prevent VStartup Overflow

[WHY & HOW]
Fixed Overflow issue by clamping VStartup to max value of register.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Check pipe->stream before passing it to a function
Alex Hung [Mon, 3 Mar 2025 18:16:19 +0000 (11:16 -0700)]
drm/amd/display: Check pipe->stream before passing it to a function

[WHAT & HOW]
dp_is_128b_132b_signal dereferences pipe->stream so it is necessary to
check it in advance.

Also fix erroneous spaces and move a variable declaration to top.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Add debug masks for HDCP LC FW testing
Dominik Kaszewski [Thu, 13 Mar 2025 08:52:31 +0000 (09:52 +0100)]
drm/amdgpu: Add debug masks for HDCP LC FW testing

HDCP Locality Check is being moved to FW, add debug flags to control
its behavior in existing hardware for validation purposes.

Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Fix computation for remain size of CPER ring
Xiang Liu [Thu, 13 Mar 2025 03:24:55 +0000 (11:24 +0800)]
drm/amdgpu: Fix computation for remain size of CPER ring

The mistake of computation for remain size of CPER ring will cause
unbreakable while cycle when CPER ring overflow.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/amdgpu: shorten the gfx idle worker timeout
Kenneth Feng [Tue, 11 Mar 2025 08:46:39 +0000 (16:46 +0800)]
drm/amd/amdgpu: shorten the gfx idle worker timeout

Shorten the gfx idle worker timeout. This is to sync with
DAL when there is no activity on the screen. Original 1
second can not sync with DAL, so DAL can not apply MALL
when the workload type is not bootup default.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: format old RAS eeprom data into V3 version
Tao Zhou [Tue, 4 Mar 2025 07:05:53 +0000 (15:05 +0800)]
drm/amdgpu: format old RAS eeprom data into V3 version

Clear old data and save it in V3 format.

v2: only format eeprom data for new ASICs.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: don't free conflicting apertures for non-display devices
Alex Deucher [Wed, 25 Sep 2024 14:29:31 +0000 (10:29 -0400)]
drm/amdgpu: don't free conflicting apertures for non-display devices

PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be
the sysfb, so there is no need to free conflicting
apertures.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: adjust drm_firmware_drivers_only() handling
Alex Deucher [Fri, 14 Mar 2025 00:52:38 +0000 (20:52 -0400)]
drm/amdgpu: adjust drm_firmware_drivers_only() handling

Move to probe so we can check the PCI device type and
only apply the drm_firmware_drivers_only() check for
PCI DISPLAY classes.  Also add a module parameter to
override the nomodeset kernel parameter as a workaround
for platforms that have this hardcoded on their kernel
command lines.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: drop drm_firmware_drivers_only()
Alex Deucher [Thu, 13 Mar 2025 20:57:50 +0000 (16:57 -0400)]
drm/amdgpu: drop drm_firmware_drivers_only()

There are a number of systems and cloud providers out there
that have nomodeset hardcoded in their kernel parameters
to block nouveau for the nvidia driver.  This prevents the
amdgpu driver from loading. Unfortunately the end user cannot
easily change this.  The preferred way to block modules from
loading is to use modprobe.blacklist=<driver>.  That is what
providers should be using to block specific drivers.

Drop the check to allow the driver to load even when nomodeset
is specified on the kernel command line.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Restore uncached behaviour on GFX12
David Belanger [Tue, 2 Jul 2024 21:56:41 +0000 (17:56 -0400)]
drm/amdgpu: Restore uncached behaviour on GFX12

Always use MTYPE_UC if UNCACHED flag is specified.

This makes kernarg region uncached and it restores
usermode cache disable debug flag functionality.

Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by
shader code.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
Wentao Liang [Wed, 12 Mar 2025 06:31:06 +0000 (14:31 +0800)]
drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()

In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().

Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: avoid NPD when ASIC does not support DMUB
Thadeu Lima de Souza Cascardo [Wed, 5 Feb 2025 13:06:38 +0000 (10:06 -0300)]
drm/amd/display: avoid NPD when ASIC does not support DMUB

ctx->dmub_srv will de NULL if the ASIC does not support DMUB, which is
tested in dm_dmub_sw_init.

However, it will be dereferenced in dmub_hw_lock_mgr_cmd if
should_use_dmub_lock returns true.

This has been the case since dmub support has been added for PSR1.

Fix this by checking for dmub_srv in should_use_dmub_lock.

[   37.440832] BUG: kernel NULL pointer dereference, address: 0000000000000058
[   37.447808] #PF: supervisor read access in kernel mode
[   37.452959] #PF: error_code(0x0000) - not-present page
[   37.458112] PGD 0 P4D 0
[   37.460662] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[   37.465553] CPU: 2 UID: 1000 PID: 1745 Comm: DrmThread Not tainted 6.14.0-rc1-00003-gd62e938120f0 #23 99720e1cb1e0fc4773b8513150932a07de3c6e88
[   37.478324] Hardware name: Google Morphius/Morphius, BIOS Google_Morphius.13434.858.0 10/26/2023
[   37.487103] RIP: 0010:dmub_hw_lock_mgr_cmd+0x77/0xb0
[   37.492074] Code: 44 24 0e 00 00 00 00 48 c7 04 24 45 00 00 0c 40 88 74 24 0d 0f b6 02 88 44 24 0c 8b 01 89 44 24 08 85 f6 75 05 c6 44 24 0e 01 <48> 8b 7f 58 48 89 e6 ba 01 00 00 00 e8 08 3c 2a 00 65 48 8b 04 5
[   37.510822] RSP: 0018:ffff969442853300 EFLAGS: 00010202
[   37.516052] RAX: 0000000000000000 RBX: ffff92db03000000 RCX: ffff969442853358
[   37.523185] RDX: ffff969442853368 RSI: 0000000000000001 RDI: 0000000000000000
[   37.530322] RBP: 0000000000000001 R08: 00000000000004a7 R09: 00000000000004a5
[   37.537453] R10: 0000000000000476 R11: 0000000000000062 R12: ffff92db0ade8000
[   37.544589] R13: ffff92da01180ae0 R14: ffff92da011802a8 R15: ffff92db03000000
[   37.551725] FS:  0000784a9cdfc6c0(0000) GS:ffff92db2af00000(0000) knlGS:0000000000000000
[   37.559814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.565562] CR2: 0000000000000058 CR3: 0000000112b1c000 CR4: 00000000003506f0
[   37.572697] Call Trace:
[   37.575152]  <TASK>
[   37.577258]  ? __die_body+0x66/0xb0
[   37.580756]  ? page_fault_oops+0x3e7/0x4a0
[   37.584861]  ? exc_page_fault+0x3e/0xe0
[   37.588706]  ? exc_page_fault+0x5c/0xe0
[   37.592550]  ? asm_exc_page_fault+0x22/0x30
[   37.596742]  ? dmub_hw_lock_mgr_cmd+0x77/0xb0
[   37.601107]  dcn10_cursor_lock+0x1e1/0x240
[   37.605211]  program_cursor_attributes+0x81/0x190
[   37.609923]  commit_planes_for_stream+0x998/0x1ef0
[   37.614722]  update_planes_and_stream_v2+0x41e/0x5c0
[   37.619703]  dc_update_planes_and_stream+0x78/0x140
[   37.624588]  amdgpu_dm_atomic_commit_tail+0x4362/0x49f0
[   37.629832]  ? srso_return_thunk+0x5/0x5f
[   37.633847]  ? mark_held_locks+0x6d/0xd0
[   37.637774]  ? _raw_spin_unlock_irq+0x24/0x50
[   37.642135]  ? srso_return_thunk+0x5/0x5f
[   37.646148]  ? lockdep_hardirqs_on+0x95/0x150
[   37.650510]  ? srso_return_thunk+0x5/0x5f
[   37.654522]  ? _raw_spin_unlock_irq+0x2f/0x50
[   37.658883]  ? srso_return_thunk+0x5/0x5f
[   37.662897]  ? wait_for_common+0x186/0x1c0
[   37.666998]  ? srso_return_thunk+0x5/0x5f
[   37.671009]  ? drm_crtc_next_vblank_start+0xc3/0x170
[   37.675983]  commit_tail+0xf5/0x1c0
[   37.679478]  drm_atomic_helper_commit+0x2a2/0x2b0
[   37.684186]  drm_atomic_commit+0xd6/0x100
[   37.688199]  ? __cfi___drm_printfn_info+0x10/0x10
[   37.692911]  drm_atomic_helper_update_plane+0xe5/0x130
[   37.698054]  drm_mode_cursor_common+0x501/0x670
[   37.702600]  ? __cfi_drm_mode_cursor_ioctl+0x10/0x10
[   37.707572]  drm_mode_cursor_ioctl+0x48/0x70
[   37.711851]  drm_ioctl_kernel+0xf2/0x150
[   37.715781]  drm_ioctl+0x363/0x590
[   37.719189]  ? __cfi_drm_mode_cursor_ioctl+0x10/0x10
[   37.724165]  amdgpu_drm_ioctl+0x41/0x80
[   37.728013]  __se_sys_ioctl+0x7f/0xd0
[   37.731685]  do_syscall_64+0x87/0x100
[   37.735355]  ? vma_end_read+0x12/0xe0
[   37.739024]  ? srso_return_thunk+0x5/0x5f
[   37.743041]  ? find_held_lock+0x47/0xf0
[   37.746884]  ? vma_end_read+0x12/0xe0
[   37.750552]  ? srso_return_thunk+0x5/0x5f
[   37.754565]  ? lock_release+0x1c4/0x2e0
[   37.758406]  ? vma_end_read+0x12/0xe0
[   37.762079]  ? exc_page_fault+0x84/0xe0
[   37.765921]  ? srso_return_thunk+0x5/0x5f
[   37.769938]  ? lockdep_hardirqs_on+0x95/0x150
[   37.774303]  ? srso_return_thunk+0x5/0x5f
[   37.778317]  ? exc_page_fault+0x84/0xe0
[   37.782163]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
[   37.787218] RIP: 0033:0x784aa5ec3059
[   37.790803] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1d 48 8b 45 c8 64 48 2b 04 25 28 00 0
[   37.809553] RSP: 002b:0000784a9cdf90e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   37.817121] RAX: ffffffffffffffda RBX: 0000784a9cdf917c RCX: 0000784aa5ec3059
[   37.824256] RDX: 0000784a9cdf917c RSI: 00000000c01c64a3 RDI: 0000000000000020
[   37.831391] RBP: 0000784a9cdf9130 R08: 0000000000000100 R09: 0000000000ff0000
[   37.838525] R10: 0000000000000000 R11: 0000000000000246 R12: 0000025c01606ed0
[   37.845657] R13: 0000025c00030200 R14: 00000000c01c64a3 R15: 0000000000000020
[   37.852799]  </TASK>
[   37.854992] Modules linked in:
[   37.864546] gsmi: Log Shutdown Reason 0x03
[   37.868656] CR2: 0000000000000058
[   37.871979] ---[ end trace 0000000000000000 ]---
[   37.880976] RIP: 0010:dmub_hw_lock_mgr_cmd+0x77/0xb0
[   37.885954] Code: 44 24 0e 00 00 00 00 48 c7 04 24 45 00 00 0c 40 88 74 24 0d 0f b6 02 88 44 24 0c 8b 01 89 44 24 08 85 f6 75 05 c6 44 24 0e 01 <48> 8b 7f 58 48 89 e6 ba 01 00 00 00 e8 08 3c 2a 00 65 48 8b 04 5
[   37.904703] RSP: 0018:ffff969442853300 EFLAGS: 00010202
[   37.909933] RAX: 0000000000000000 RBX: ffff92db03000000 RCX: ffff969442853358
[   37.917068] RDX: ffff969442853368 RSI: 0000000000000001 RDI: 0000000000000000
[   37.924201] RBP: 0000000000000001 R08: 00000000000004a7 R09: 00000000000004a5
[   37.931336] R10: 0000000000000476 R11: 0000000000000062 R12: ffff92db0ade8000
[   37.938469] R13: ffff92da01180ae0 R14: ffff92da011802a8 R15: ffff92db03000000
[   37.945602] FS:  0000784a9cdfc6c0(0000) GS:ffff92db2af00000(0000) knlGS:0000000000000000
[   37.953689] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.959435] CR2: 0000000000000058 CR3: 0000000112b1c000 CR4: 00000000003506f0
[   37.966570] Kernel panic - not syncing: Fatal exception
[   37.971901] Kernel Offset: 0x30200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   37.982840] gsmi: Log Shutdown Reason 0x02

Fixes: b5c764d6ed55 ("drm/amd/display: Use HW lock mgr for PSR1")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Cc: Sun peng Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/amdgpu: Fix MES init sequence
Shaoyun Liu [Mon, 10 Mar 2025 16:38:12 +0000 (12:38 -0400)]
drm/amd/amdgpu: Fix MES init sequence

When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Enable ACA by default for psp v13_0_6/v13_0_14
Xiang Liu [Fri, 28 Feb 2025 06:56:45 +0000 (14:56 +0800)]
drm/amdgpu: Enable ACA by default for psp v13_0_6/v13_0_14

Enable ACA by default for psp v13_0_6/v13_0_14.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdkfd: Correct F8_MODE for gfx950
Amber Lin [Thu, 13 Mar 2025 01:14:43 +0000 (21:14 -0400)]
drm/amdkfd: Correct F8_MODE for gfx950

Correct F8_MODE setting for gfx950 that was removed

Fixes: 61972cd93af7 ("drm/amdkfd: Set per-process flags only once for gfx9/10/11/12")
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviwanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Save PA of bad pages for old asics
ganglxie [Tue, 11 Mar 2025 10:35:44 +0000 (18:35 +0800)]
drm/amdgpu: Save PA of bad pages for old asics

for old asics that do not support mca translating, we
just save PA for them

Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.
Emily Deng [Fri, 7 Feb 2025 06:00:00 +0000 (14:00 +0800)]
drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Add amdgpu_sriov_multi_vf_mode function
Emily Deng [Thu, 6 Feb 2025 05:09:00 +0000 (13:09 +0800)]
drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function

Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev).

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: enable vcn busy sysfs for GC 9.3.0
Alex Deucher [Tue, 11 Mar 2025 20:34:45 +0000 (16:34 -0400)]
drm/amdgpu/pm: enable vcn busy sysfs for GC 9.3.0

Make it visible for the all GC 9.3.0 chips that support it.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: enable vcn busy sysfs for GC 12.x
Alex Deucher [Tue, 11 Mar 2025 19:53:55 +0000 (15:53 -0400)]
drm/amdgpu/pm: enable vcn busy sysfs for GC 12.x

Make it visible for the all GC 12.x chips that support it.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdkfd: Fix instruction hazard in gfx12 trap handler
Jay Cornwall [Fri, 7 Feb 2025 21:40:34 +0000 (16:40 -0500)]
drm/amdkfd: Fix instruction hazard in gfx12 trap handler

VALU instructions with SGPR source need wait states to avoid hazard
with SALU using different SGPR.

v2: Eliminate some hazards to reduce code explosion

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: enable vcn busy sysfs for additional GC 11.x
Alex Deucher [Tue, 11 Mar 2025 19:37:30 +0000 (15:37 -0400)]
drm/amdgpu/pm: enable vcn busy sysfs for additional GC 11.x

Make it visible for the all GC 11.x chips that support it.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: add VCN activity for SMU 14.0.2
Alex Deucher [Tue, 11 Mar 2025 18:48:22 +0000 (14:48 -0400)]
drm/amdgpu/pm: add VCN activity for SMU 14.0.2

Wire up the query.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: add VCN activity for SMU 13.0.0/7
Alex Deucher [Tue, 11 Mar 2025 19:00:12 +0000 (15:00 -0400)]
drm/amdgpu/pm: add VCN activity for SMU 13.0.0/7

Wire up the query.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Remove incorrect macro guard
Alex Hung [Tue, 11 Mar 2025 17:10:17 +0000 (11:10 -0600)]
drm/amd/display: Remove incorrect macro guard

This macro guard "__cplusplus" is unnecessary and should not be there.

Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Calculate IP specific xgmi bandwidth
Lijo Lazar [Thu, 6 Feb 2025 12:10:42 +0000 (17:40 +0530)]
drm/amdgpu: Calculate IP specific xgmi bandwidth

Use IP version specific xgmi speed/width for bandwidth calculation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: add VCN activity for renoir
Alex Deucher [Tue, 11 Mar 2025 20:30:58 +0000 (16:30 -0400)]
drm/amdgpu/pm: add VCN activity for renoir

Wire up the query.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2
Alex Deucher [Tue, 11 Mar 2025 14:34:36 +0000 (10:34 -0400)]
drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2

Add callbacks for fan speed fetching.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4034
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Reduce dequeue retry timeout for gfx9 family
Harish Kasiviswanathan [Tue, 25 Feb 2025 20:50:30 +0000 (15:50 -0500)]
drm/amdgpu: Reduce dequeue retry timeout for gfx9 family

Dequeue retry timeout controls the interval between checks for unmet
conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The
cost of additional bandwidth consumed by CP when polling memory
shouldn't be substantial.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/pm: Update feature list for smu_v13_0_12
Asad Kamal [Tue, 11 Mar 2025 10:17:33 +0000 (18:17 +0800)]
drm/amd/pm: Update feature list for smu_v13_0_12

Update feature list for smu_v13_0_12 to show vcn & smu deep
sleep feature enable status.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/gfx12: don't read registers in mqd init
Alex Deucher [Wed, 26 Feb 2025 21:08:03 +0000 (16:08 -0500)]
drm/amdgpu/gfx12: don't read registers in mqd init

Just use the default values.  There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu/gfx11: don't read registers in mqd init
Alex Deucher [Wed, 26 Feb 2025 20:55:33 +0000 (15:55 -0500)]
drm/amdgpu/gfx11: don't read registers in mqd init

Just use the default values.  There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Fix the race condition for draining retry fault
Emily Deng [Thu, 6 Mar 2025 00:40:01 +0000 (08:40 +0800)]
drm/amdgpu: Fix the race condition for draining retry fault

Issue:
In the scenario where svm_range_restore_pages is called, but
svm->checkpoint_ts has not been set and the retry fault has not been
drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free.
Meanwhile, svm_range_restore_pages continues execution and reaches
svm_range_from_addr. This results in a "failed to find prange..." error,
 causing the page recovery to fail.

How to fix:
Move the timestamp check code under the protection of svm->lock.

v2:
Make sure all right locks are released before go out.

v3:
Directly goto out_unlock_svms, and return -EAGAIN.

v4:
Refine code.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Remove unsupported xgmi versions
Lijo Lazar [Thu, 6 Feb 2025 12:03:46 +0000 (17:33 +0530)]
drm/amdgpu: Remove unsupported xgmi versions

XGMI v4.8.0 is not used in any SOCs. Remove the associated functions.
Also, ensure get_xgmi_info callback pointer is not NULL before calling
the function.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/pm: add unique_id for gfx12
Harish Kasiviswanathan [Tue, 11 Mar 2025 18:15:18 +0000 (14:15 -0400)]
drm/amd/pm: add unique_id for gfx12

Expose unique_id for gfx12

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Update SRIOV video codec caps
David Rosca [Fri, 28 Feb 2025 12:44:32 +0000 (13:44 +0100)]
drm/amdgpu: Update SRIOV video codec caps

There have been multiple fixes to the video caps that are missing for
SRIOV. Update the SRIOV caps with correct values.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Remove JPEG from vega and carrizo video caps
David Rosca [Fri, 28 Feb 2025 13:12:10 +0000 (14:12 +0100)]
drm/amdgpu: Remove JPEG from vega and carrizo video caps

JPEG is only supported for VCN1+.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Fix JPEG video caps max size for navi1x and raven
David Rosca [Fri, 28 Feb 2025 12:34:49 +0000 (13:34 +0100)]
drm/amdgpu: Fix JPEG video caps max size for navi1x and raven

8192x8192 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
David Rosca [Fri, 28 Feb 2025 12:32:46 +0000 (13:32 +0100)]
drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size

1920x1088 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()
Nikita Zhandarovich [Tue, 11 Mar 2025 11:14:59 +0000 (14:14 +0300)]
drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()

On the off chance that command stream passed from userspace via
ioctl() call to radeon_vce_cs_parse() is weirdly crafted and
first command to execute is to encode (case 0x03000001), the function
in question will attempt to call radeon_vce_cs_reloc() with size
argument that has not been properly initialized. Specifically, 'size'
will point to 'tmp' variable before the latter had a chance to be
assigned any value.

Play it safe and init 'tmp' with 0, thus ensuring that
radeon_vce_cs_reloc() will catch an early error in cases like these.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags
Natalie Vock [Mon, 10 Mar 2025 17:08:05 +0000 (18:08 +0100)]
drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags

PRT BOs may not have any backing store, so bo->tbo.resource will be
NULL. Check for that before dereferencing.

Fixes: 0cce5f285d9a ("drm/amdkfd: Check correct memory types for is_system variable")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: finish wiring up sid.h in DCE6
Alexandre Demers [Sun, 9 Mar 2025 16:48:52 +0000 (12:48 -0400)]
drm/amdgpu: finish wiring up sid.h in DCE6

For coherence with DCE8 et DCE10, add or move some values under sid.h
and remove duplicated from si_enums.h.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/amdkfd: Evict all queues even HWS remove queue failed
Yifan Zha [Wed, 5 Mar 2025 05:14:55 +0000 (13:14 +0800)]
drm/amd/amdkfd: Evict all queues even HWS remove queue failed

[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.

After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain->sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6
Alexandre Demers [Sun, 9 Mar 2025 16:48:51 +0000 (12:48 -0400)]
drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6

By wiring up sid.h in GFX6, we end up with a few duplicated defines such as
the golden registers. Let's clean this up.

[TAHITI,VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h
and under si_enums.h, with different values. Keep the values used under radeon
and move them under gfx_v6_0.c where they are used (as it is done under cik)

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10
Alexandre Demers [Sun, 9 Mar 2025 16:48:50 +0000 (12:48 -0400)]
drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10

Let's begin the cleanup in sid.h to prevent warnings and errors when wiring
sid.h into dce_v6_0.c.

This is a bigger cleanup.
Many defines found under sid.h have already been properly moved
into the different "_d.h" and "_sh_mask.h", so they should have been
already removed from sid.h and properly linked in where needed.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: delete stray tab in kfd_dbg_set_mes_debug_mode()
Dan Carpenter [Mon, 10 Mar 2025 10:47:25 +0000 (13:47 +0300)]
drm/amdkfd: delete stray tab in kfd_dbg_set_mes_debug_mode()

These lines are indented one tab more than they should be.  Delete
the stray tabs.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx: delete stray tabs
Dan Carpenter [Mon, 10 Mar 2025 10:47:02 +0000 (13:47 +0300)]
drm/amdgpu/gfx: delete stray tabs

These lines are indented one tab too far.  Delete the extra tabs.

Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Trigger a wedged event for ring reset
André Almeida [Tue, 25 Feb 2025 01:02:21 +0000 (22:02 -0300)]
drm/amdgpu: Trigger a wedged event for ring reset

Instead of only triggering a wedged event for complete GPU resets,
trigger for ring resets. Regardless of the reset, it's useful for
userspace to know that it happened because the kernel will reject
further submissions from that app.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/vce2: fix ip block reference
Alex Deucher [Sun, 9 Mar 2025 16:26:50 +0000 (12:26 -0400)]
drm/amdgpu/vce2: fix ip block reference

Need to use the correct IP block type.  VCE vs VCN.
Fixes mclk issues on Hawaii.

Suggested by selendym.

Fixes: 82ae6619a450 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: change kzalloc to kcalloc in dml1_validate()
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:24 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dml1_validate()

We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: change kzalloc to kcalloc in dcn314_validate_bandwidth()
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:23 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn314_validate_bandwidth()

We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: change kzalloc to kcalloc in dcn31_validate_bandwidth()
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:22 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn31_validate_bandwidth()

We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: change kzalloc to kcalloc in dcn30_validate_bandwidth()
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:21 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn30_validate_bandwidth()

We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Promote DAL to 3.2.324
Taimur Hassan [Mon, 3 Mar 2025 16:10:19 +0000 (11:10 -0500)]
drm/amd/display: Promote DAL to 3.2.324

This version brings along following fixes:
- Fix some Replay/PSR issue
- Fix backlight brightness
- Fix suspend issue
- Fix visual confirm color
- Add scoped mutexes for amdgpu_dm_dhcp

Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: remove minimum Dispclk and apply oem panel timing.
Charlene Liu [Mon, 3 Mar 2025 18:53:16 +0000 (13:53 -0500)]
drm/amd/display: remove minimum Dispclk and apply oem panel timing.

[why & how]
1. apply oem panel timing (not only on OLED)
2. remove MIN_DPP_DISP_CLK request in driver.

This fix will apply for dcn31x but not
sync with DML's output.

Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Drop unnecessary ret variable for enable_assr()
Mario Limonciello [Fri, 28 Feb 2025 19:32:57 +0000 (13:32 -0600)]
drm/amd/display: Drop unnecessary ret variable for enable_assr()

[Why]
enable_assr() has a res variable that only is changed in one block with
no cleanup necessary.

[How]
Remove variable and return early from failure cases.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Add scoped mutexes for amdgpu_dm_dhcp
Mario Limonciello [Fri, 28 Feb 2025 19:30:01 +0000 (13:30 -0600)]
drm/amd/display: Add scoped mutexes for amdgpu_dm_dhcp

[Why]
Guards automatically release mutex when it goes out of scope making
code easier to follow.

[How]
Replace all use of mutex_lock()/mutex_unlock() with guard(mutex).

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Fix slab-use-after-free on hdcp_work
Mario Limonciello [Fri, 28 Feb 2025 19:18:14 +0000 (13:18 -0600)]
drm/amd/display: Fix slab-use-after-free on hdcp_work

[Why]
A slab-use-after-free is reported when HDCP is destroyed but the
property_validate_dwork queue is still running.

[How]
Cancel the delayed work when destroying workqueue.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006
Fixes: da3fd7ac0bcf ("drm/amd/display: Update CP property based on HW query")
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Prevent VStartup Overflow
Ryan Seto [Fri, 28 Feb 2025 19:24:57 +0000 (14:24 -0500)]
drm/amd/display: Prevent VStartup Overflow

[Why]
For some VR headsets with large blanks, it's possible
to overflow the OTG_VSTARTUP_PARAM:VSTARTUP_START
register. This can lead to incorrect DML calculations
and underflow downstream.

[How]
Min the calcualted max_vstartup_lines with the max
value of the register.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Correct timing_adjust_pending flag setting.
Zhongwei Zhang [Fri, 28 Feb 2025 02:35:23 +0000 (10:35 +0800)]
drm/amd/display: Correct timing_adjust_pending flag setting.

[Why&How]
stream->adjust will be overwritten by update->crtc_timing_adjust.
We should set update->crtc_timing_adjust->timing_adjust_pending
and then overwrite stream->adjust.
Reset update->crtc_timing_adjust->timing_adjust_pending after
the assignment.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Zhongwei Zhang <Zhongwei.Zhang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: calculate the remain segments for all pipes
Zhikai Zhai [Thu, 27 Feb 2025 12:09:14 +0000 (20:09 +0800)]
drm/amd/display: calculate the remain segments for all pipes

[WHY]
In some cases the remain de-tile buffer segments will be greater
than zero if we don't add the non-top pipe to calculate, at
this time the override de-tile buffer size will be valid and used.
But it makes the de-tile buffer segments used finally for all of pipes
exceed the maximum.

[HOW]
Add the non-top pipe to calculate the remain de-tile buffer segments.
Don't set override size to use the average according to pipe count
if the value exceed the maximum.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Fix visual confirm color not updating
Leo Zeng [Thu, 27 Feb 2025 20:09:04 +0000 (15:09 -0500)]
drm/amd/display: Fix visual confirm color not updating

[WHY]
Sometimes visual confirm color is updated, but the
background color is not changed. This causes visual
confrim to show incorrect colors.

[HOW]
Update background color when visual confirm color changes.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Leo Zeng <Leo.Zeng@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Assign normalized_pix_clk when color depth = 14
Alex Hung [Thu, 27 Feb 2025 23:36:25 +0000 (16:36 -0700)]
drm/amd/display: Assign normalized_pix_clk when color depth = 14

[WHY & HOW]
A warning message "WARNING: CPU: 4 PID: 459 at ... /dc_resource.c:3397
calculate_phy_pix_clks+0xef/0x100 [amdgpu]" occurs because the
display_color_depth == COLOR_DEPTH_141414 is not handled. This is
observed in Radeon RX 6600 XT.

It is fixed by assigning pix_clk * (14 * 3) / 24 - same as the rests.

Also fixes the indentation in get_norm_pix_clk.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Add Support for reg inbox0 for host->DMUB CMDs
Dillon Varone [Fri, 20 Dec 2024 22:01:29 +0000 (17:01 -0500)]
drm/amd/display: Add Support for reg inbox0 for host->DMUB CMDs

[WHY]
DCN4+ supports a new register based mailbox for sending messages
from host to DMCUB. This mailbox supports 64 byte commands, which makes
it compatible with the same structure as the frame buffer based mailbox.

[HOW]
The intention for reg_inbox0 is to be slot in replacement for the frame
buffer based mailbox (Inbox1). It supports all of the required features:
- Supports all messages handled by FB Inbox1
- Supports multi command batching

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: assume VBIOS supports DSC as default
Charlene Liu [Thu, 27 Feb 2025 04:52:57 +0000 (23:52 -0500)]
drm/amd/display: assume VBIOS supports DSC as default

[Why & How]
The clear_dsc_setting at boot logic was based on dcn version
check.
As such new ASIC lost this DSC clear up logic, change the
assumption to BIOS support eDP DSC for new ASIC.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Implement PCON regulated autonomous mode handling
George Shen [Thu, 13 Feb 2025 19:46:49 +0000 (14:46 -0500)]
drm/amd/display: Implement PCON regulated autonomous mode handling

[Why/How]
DP spec has been updated recently to make regulated autonomous mode more
well-defined. In case any PCON vendors choose to implement regulated
autonomous mode in the future, pre-emptively add handling for the
regulated autonomous mode based on current spec.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: not abort link train when bw is low
Peichen Huang [Tue, 25 Feb 2025 06:52:30 +0000 (14:52 +0800)]
drm/amd/display: not abort link train when bw is low

[WHY]
DP tunneling should not abort link train even bandwidth become
too low after downgrade. Otherwise, it would fail compliance test.

[HOW}
Do link train with downgrade settings even bandwidth is not enough

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Do not enable replay when vtotal update is pending.
Danny Wang [Thu, 13 Feb 2025 08:18:34 +0000 (16:18 +0800)]
drm/amd/display: Do not enable replay when vtotal update is pending.

[Why&How]
Vtotal is not applied to HW when handling vsync interrupt.
Make sure vtotal is aligned before enable replay.

Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Danny Wang <danny.wang@amd.com>
Signed-off-by: Zhongwei Zhang <Zhongwei.Zhang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Add and use new dm_prepare_suspend() callback
Mario Limonciello [Thu, 13 Feb 2025 22:26:22 +0000 (16:26 -0600)]
drm/amd/display: Add and use new dm_prepare_suspend() callback

[Why]
The displays currently don't get turned off until after other IP blocks
have been suspended.  However turning off the displays first gives a
very visible response that the system is on it's way down.

[How]
Turn off displays in a prepare_suspend() callback instead when possible.
This will help for suspend and hibernate sequences.
The shutdown sequence however will not call prepare() so check whether
the state has been already saved to decide what to do.

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Restore correct backlight brightness after a GPU reset
Mario Limonciello [Sun, 23 Feb 2025 06:04:35 +0000 (00:04 -0600)]
drm/amd/display: Restore correct backlight brightness after a GPU reset

[Why]
GPU reset will attempt to restore cached state, but brightness doesn't
get restored. It will come back at 100% brightness, but userspace thinks
it's the previous value.

[How]
When running resume sequence if GPU is in reset restore brightness
to previous value.

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: fix default brightness
Mario Limonciello [Sun, 23 Feb 2025 05:37:32 +0000 (23:37 -0600)]
drm/amd/display: fix default brightness

[Why]
To avoid flickering during boot default brightness level set by BIOS
should be maintained for as much of the boot as feasible.
commit 2fe87f54abdc ("drm/amd/display: Set default brightness according
to ACPI") attempted to set the right levels for AC vs DC, but brightness
still got reset to maximum level in initialization code for
setup_backlight_device().

[How]
Remove the hardcoded initialization in setup_backlight_device() and
instead program brightness value to match BIOS (AC or DC).  This avoids a
brightness flicker from kernel changing the value.  Userspace may however
still change it during boot.

Fixes: 2fe87f54abdc ("drm/amd/display: Set default brightness according to ACPI")
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Add more debug data to dmub_srv
Joshua Aberback [Tue, 25 Feb 2025 21:17:25 +0000 (16:17 -0500)]
drm/amd/display: Add more debug data to dmub_srv

[Why]
When analyzing some crash dumps, not all of the expected DMUB info was
available, so we want to add in-object storage for this data.

[How]
 - dmub_srv_debug (renamed to dmub_timeout_info) is already a member of
dmub_diagnostic_data, therefore keep a dmub_diagnostic_data directly in
dmub_srv
 - use dmub_srv->debug when collecting diagnostic info instead of stack
object to allow for easy inspection in crash dumps

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Disable unneeded hpd interrupts during dm_init
Leo Li [Thu, 20 Feb 2025 21:20:26 +0000 (16:20 -0500)]
drm/amd/display: Disable unneeded hpd interrupts during dm_init

[Why]

It seems HPD interrupts are enabled by default for all connectors, even
if the hpd source isn't valid. An eDP for example, does not have a valid
hpd source (but does have a valid hpdrx source; see construct_phy()).
Thus, eDPs should have their hpd interrupt disabled.

In the past, this wasn't really an issue. Although the driver gets
interrupted, then acks by writing to hw registers, there weren't any
subscribed handlers that did anything meaningful (see
register_hpd_handlers()).

But things changed with the introduction of IPS. s2idle requires that
the driver allows IPS for DMUB fw to put hw to sleep. Since register
access requires hw to be awake, the driver will block IPS entry to do
so. And no IPS means no hw sleep during s2idle.

This was the observation on DCN35 systems with an eDP. During suspend,
the eDP toggled its hpd pin as part of the panel power down sequence.
The driver was then interrupted, and acked by writing to registers,
blocking IPS entry.

[How]

Since DC marks eDP connections as having invalid hpd sources (see
construct_phy()), DM should disable them at the hw level. Do so in
amdgpu_dm_hpd_init() by disabling all hpd ints first, then selectively
enabling ones for connectors that have valid hpd sources.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch
Leon Huang [Tue, 11 Feb 2025 07:45:43 +0000 (15:45 +0800)]
drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch

[Why]
When switching between PSR/Replay,
the DPCD config of previous mode is not cleared,
resulting in unexpected behavior in TCON.

[How]
Initialize the DPCD in setup function

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: Add pm_config_dequeue_wait_counts API
Harish Kasiviswanathan [Tue, 11 Feb 2025 17:56:01 +0000 (12:56 -0500)]
drm/amdkfd: Add pm_config_dequeue_wait_counts API

Update pm_update_grace_period() to more cleaner
pm_config_dequeue_wait_counts(). Previously, grace_period variable was
overloaded as a variable and a macro, making it inflexible to configure
additional dequeue wait times.

pm_config_dequeue_wait_counts() now takes in a cmd / variable. This
allows flexibility to update different dequeue wait times.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/vcn: fix idle work handler for VCN 2.5
Alex Deucher [Tue, 4 Mar 2025 22:57:43 +0000 (17:57 -0500)]
drm/amdgpu/vcn: fix idle work handler for VCN 2.5

VCN 2.5 uses the PG callback to enable VCN DPM which is
a global state.  As such, we need to make sure all instances
are in the same state.

v2: switch to a ref count (Lijo)
v3: switch to its own idle work handler
v4: fix logic in DPG handling

Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance callbacks for idle work handler")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: allow 256B DCC max compressed block sizes on gfx12
Marek Olšák [Fri, 7 Mar 2025 14:57:45 +0000 (09:57 -0500)]
drm/amd/display: allow 256B DCC max compressed block sizes on gfx12

The hw supports it.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agoMerge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Sun, 9 Mar 2025 22:19:25 +0000 (08:19 +1000)]
Merge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amdgpu:
- Fix spelling typos
- RAS updates
- VCN 5.0.1 updates
- SubVP fixes
- DCN 4.0.1 fixes
- MSO DPCD fixes
- DIO encoder refactor
- PCON fixes
- Misc cleanups
- DMCUB fixes
- USB4 DP fixes
- DM cleanups
- Backlight cleanups and fixes
- Support platform backlight curves
- Misc code cleanups
- SMU 14 fixes
- JPEG 4.0.3 reset updates
- SR-IOV fixes
- SVM fixes
- GC 12 DCC fixes
- DC DCE 6.x fix
- Hiberation fix

amdkfd:
- Fix possible NULL pointer in queue validation
- Remove unnecessary CP domain validation
- SDMA queue reset support
- Add per process flags

radeon:
- Fix spelling typos
- RS400 hyperZ fix

UAPI:
- Add KFD per process flags for setting precision
  Proposed user space: https://github.com/ROCm/ROCR-Runtime/commit/2a64fa5e06e80e0af36df4ce0c76ae52eeec0a9d

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com
7 weeks agodrm/amdkfd: Add support for more per-process flag
Harish Kasiviswanathan [Tue, 14 Jan 2025 21:02:21 +0000 (16:02 -0500)]
drm/amdkfd: Add support for more per-process flag

Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5

v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
    Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: Set per-process flags only once for gfx9/10/11/12
Harish Kasiviswanathan [Tue, 14 Jan 2025 19:13:35 +0000 (14:13 -0500)]
drm/amdkfd: Set per-process flags only once for gfx9/10/11/12

Define set_cache_memory_policy() for these asics and move all static
changes from update_qpd() which is called each time a queue is created
to set_cache_memory_policy() which is called once during process
initialization

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: Set per-process flags only once cik/vi
Harish Kasiviswanathan [Tue, 14 Jan 2025 19:07:24 +0000 (14:07 -0500)]
drm/amdkfd: Set per-process flags only once cik/vi

Set per-process static sh_mem config only once during process
initialization. Move all static changes from update_qpd() which is
called each time a queue is created to set_cache_memory_policy() which
is called once during process initialization.

set_cache_memory_policy() is currently defined only for cik and vi
family. So this commit only focuses on these two. A separate commit will
address other asics.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Keep display off while going into S4
Mario Limonciello [Thu, 6 Mar 2025 18:51:24 +0000 (12:51 -0600)]
drm/amd: Keep display off while going into S4

When userspace invokes S4 the flow is:

1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system

Then on resume amdgpu_pmops_restore() is called.

This flow has a problem that because amdgpu_pmops_thaw() is called
it will call amdgpu_device_resume() which will resume all of the GPU.

This includes turning the display hardware back on and discovering
connectors again.

This is an unexpected experience for the display to turn back on.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.

Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/amdgpu: Add missing GC 11.5.0 register
Tom St Denis [Thu, 6 Mar 2025 17:31:56 +0000 (12:31 -0500)]
drm/amd/amdgpu: Add missing GC 11.5.0 register

Adds register needed for debugging purposes.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: clear F8_MODE for gfx950
Alex Sierra [Thu, 16 May 2024 22:06:48 +0000 (17:06 -0500)]
drm/amdkfd: clear F8_MODE for gfx950

Default F8_MODE should be OCP format on gfx950.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>