Alex Deucher [Fri, 14 Mar 2025 23:23:46 +0000 (19:23 -0400)]
drm/amdgpu/sdma: fix engine reset handling
Move the kfd suspend/resume code into the caller. That
is where the KFD is likely to detect a reset so on the KFD
side there is no need to call them. Also add a mutex to
lock the actual reset sequence.
v2: make the locking per instance
Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 18 Mar 2025 14:43:41 +0000 (15:43 +0100)]
drm/amdgpu: remove invalid usage of sched.ready
I can't count how often I had to remove this nonsense.
Probably doesn't need an explanation any more.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 6 Feb 2025 13:26:30 +0000 (14:26 +0100)]
drm/amdgpu: add cleaner shader trace point
Note when the cleaner shader is executed.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 6 Feb 2025 13:16:06 +0000 (14:16 +0100)]
drm/amdgpu: add isolation trace point
Note when we switch from one isolation owner to another.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 27 Jan 2025 15:27:51 +0000 (16:27 +0100)]
drm/amdgpu: stop reserving VMIDs to enforce isolation
That was quite troublesome for gang submit. Completely drop this
approach and enforce the isolation separately.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 15 Jan 2025 12:44:26 +0000 (13:44 +0100)]
drm/amdgpu: rework how isolation is enforced v2
Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.
So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.
v2: use ~0l for jobs without isolation to distinct it from kernel
submissions which uses NULL for the owner. Add some warning when we
are OOM.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 23 Jan 2025 13:59:01 +0000 (14:59 +0100)]
drm/amdgpu: overwrite signaled fence in amdgpu_sync
This allows using amdgpu_sync even without peeking into the fences for a
long time.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 15 Jan 2025 14:10:13 +0000 (15:10 +0100)]
drm/amdgpu: use GFP_NOWAIT for memory allocations
In the critical submission path memory allocations can't wait for
reclaim since that can potentially wait for submissions to finish.
Finally clean that up and mark most memory allocations in the critical
path with GFP_NOWAIT. The only exception left is the dma_fence_array()
used when no VMID is available, but that will be cleaned up later on.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reason for revert: this causes some tests fail with call trace.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 14 Mar 2025 15:08:21 +0000 (11:08 -0400)]
drm/amdkfd: set precise mem ops caps to disabled for gfx 11 and 12
Clause instructions with precise memory enabled currently hang the
shader so set capabilities flag to disabled since it's unsafe to use
for debugging.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Lancelot Six <lancelot.six@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 7 Mar 2025 05:41:23 +0000 (11:11 +0530)]
drm/amd/pm: Add debug bit for smu pool allocation
In certain cases, it's desirable to avoid PMFW log transactions to
system memory. Add a mask bit to decide whether to allocate smu pool in
device memory or system memory.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 14 Mar 2025 13:45:51 +0000 (09:45 -0400)]
drm/amdgpu/vcn: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two. As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission. It should not
affect the reference counting.
v2: bail early if the the profile is already active (Lijo)
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 14 Mar 2025 13:29:59 +0000 (09:29 -0400)]
drm/amdgpu/gfx: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two. As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission. It should not
affect the reference counting.
v2: bail early if the the profile is already active (Lijo)
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 12 Mar 2025 13:48:47 +0000 (09:48 -0400)]
drm/amdgpu/vcn: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes: 1443dd3c67f6 ("drm/amd/pm: fix and simplify workload handling") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 12 Mar 2025 13:44:19 +0000 (09:44 -0400)]
drm/amdgpu/gfx: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload profile") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Tested-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Fri, 14 Mar 2025 16:03:53 +0000 (12:03 -0400)]
drm/amdkfd: Fix bug in config_dequeue_wait_counts
For certain ASICs where dequeue_wait_count don't need to be initialized,
pm_config_dequeue_wait_counts_v9 return without filling in the packet
information. However, the calling function interprets this as a success
and sends the uninitialized packet to firmware causing hang.
Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
ASICs that don't need the value to be initialized.
v2: Removed redudant code.
Tidy up code based on review comments
v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs
Fixes: ed962f8d0603 ("drm/amdkfd: Add pm_config_dequeue_wait_counts API") Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 14 Jan 2025 12:51:39 +0000 (13:51 +0100)]
drm/amdgpu: grab an additional reference on the gang fence v2
We keep the gang submission fence around in adev, make sure that it
stays alive.
v2: fix memory leak on retry
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Mon, 17 Mar 2025 01:14:36 +0000 (09:14 +0800)]
drm/amdgpu: Fix SDMA engine reset logic
The scheduler should restart only if the reset operation
succeeds This ensures that new tasks are only submitted
to the queues after a successful reset.
Fixes: 4c02f7301657 ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets") Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tomasz Pakuła [Tue, 11 Mar 2025 21:38:33 +0000 (22:38 +0100)]
drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
Currently, it seems like the code was carried over from RDNA3 because
it assumes two possible values to set. RDNA4, instead of having:
0: min SCLK
1: max SCLK
only has
0: SCLK offset
This change makes it so it only reports current offset value instead of
showing possible min/max values and their indices. Moreover, it now only
accepts the offset as a value, without the indice index.
Additionally, the lower bound was printed as %u by mistake.
Setting this offset:
Old: "s 1 <offset>"
New: "s <offset>"
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 9 Mar 2025 08:57:46 +0000 (04:57 -0400)]
drm/amd/display: 3.2.325
This version brings along following fixes:
- Use DPM table clk setting for dml2 soc dscclk
- Update static soc table
- Fix incorrect fw_state address in dmub_srv
- Use HW lock mgr for PSR1 when only one eDP
- Revert "Support for reg inbox0 for host->DMUB CMDs"
- Change notification of link BW allocation
- Fix message for support_edp0_on_dp1
- Guard against setting dispclk low for dcn31x
- Prevent VStartup Overflow
- Check pipe->stream before passing it to a function
Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Fri, 28 Feb 2025 13:02:20 +0000 (08:02 -0500)]
drm/amd/display: Use DPM table clk setting for dml2 soc dscclk
[WHY]
Not like dppclk/dispclk, dml2 will calculate the minimum required clocks.
For dscclk, it is used for pure comparision.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Fri, 7 Mar 2025 23:24:56 +0000 (18:24 -0500)]
drm/amd/display: Update static soc table
[WHY]
Update the static soc table dcn3_5_soc.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lo-an Chen [Mon, 10 Mar 2025 06:52:22 +0000 (14:52 +0800)]
drm/amd/display: Fix incorrect fw_state address in dmub_srv
[WHY]
The fw_state in dmub_srv was assigned with wrong address.
The address was pointed to the firmware region.
[HOW]
Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET
in dmub_cmd.h.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 7 Mar 2025 21:55:20 +0000 (15:55 -0600)]
drm/amd/display: Use HW lock mgr for PSR1 when only one eDP
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.
Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cruise Hung [Tue, 25 Feb 2025 15:03:29 +0000 (23:03 +0800)]
drm/amd/display: Change notification of link BW allocation
[WHY & HOW]
The response of DP BW allocation is handled in Outbox ISR.
When it failed to request the DP BW allocation, it sent another
DPCD request in Outbox ISR immediately. The DP AUX reply also
uses the Outbox ISR. So, no AUX reply happened in this case.
Change to use HPD IRQ for the notification.
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jing Zhou [Tue, 4 Mar 2025 15:15:56 +0000 (23:15 +0800)]
drm/amd/display: Guard against setting dispclk low for dcn31x
[WHY]
We should never apply a minimum dispclk value while in
prepare_bandwidth or while displays are active. This is
always an optimizaiton for when all displays are disabled.
[HOW]
Defer dispclk optimization until safe_to_lower = true
and display_count reaches 0.
Since 0 has a special value in this logic (ie. no dispclk
required) we also need adjust the logic that clamps it for
the actual request to PMFW.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Chris Park <chris.park@amd.com> Reviewed-by: Eric Yang <eric.yang@amd.com> Signed-off-by: Jing Zhou <Jing.Zhou@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ryan Seto [Fri, 28 Feb 2025 20:52:42 +0000 (15:52 -0500)]
drm/amd/display: Prevent VStartup Overflow
[WHY & HOW]
Fixed Overflow issue by clamping VStartup to max value of register.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kenneth Feng [Tue, 11 Mar 2025 08:46:39 +0000 (16:46 +0800)]
drm/amd/amdgpu: shorten the gfx idle worker timeout
Shorten the gfx idle worker timeout. This is to sync with
DAL when there is no activity on the screen. Original 1
second can not sync with DAL, so DAL can not apply MALL
when the workload type is not bootup default.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move to probe so we can check the PCI device type and
only apply the drm_firmware_drivers_only() check for
PCI DISPLAY classes. Also add a module parameter to
override the nomodeset kernel parameter as a workaround
for platforms that have this hardcoded on their kernel
command lines.
Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 13 Mar 2025 20:57:50 +0000 (16:57 -0400)]
drm/amdgpu: drop drm_firmware_drivers_only()
There are a number of systems and cloud providers out there
that have nomodeset hardcoded in their kernel parameters
to block nouveau for the nvidia driver. This prevents the
amdgpu driver from loading. Unfortunately the end user cannot
easily change this. The preferred way to block modules from
loading is to use modprobe.blacklist=<driver>. That is what
providers should be using to block specific drivers.
Drop the check to allow the driver to load even when nomodeset
is specified on the kernel command line.
Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Belanger [Tue, 2 Jul 2024 21:56:41 +0000 (17:56 -0400)]
drm/amdgpu: Restore uncached behaviour on GFX12
Always use MTYPE_UC if UNCACHED flag is specified.
This makes kernarg region uncached and it restores
usermode cache disable debug flag functionality.
Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by
shader code.
Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wentao Liang [Wed, 12 Mar 2025 06:31:06 +0000 (14:31 +0800)]
drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().
Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: b5c764d6ed55 ("drm/amd/display: Use HW lock mgr for PSR1") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Daniel Wheeler <daniel.wheeler@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Amber Lin [Thu, 13 Mar 2025 01:14:43 +0000 (21:14 -0400)]
drm/amdkfd: Correct F8_MODE for gfx950
Correct F8_MODE setting for gfx950 that was removed
Fixes: 61972cd93af7 ("drm/amdkfd: Set per-process flags only once for gfx9/10/11/12") Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviwanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Fri, 7 Feb 2025 21:40:34 +0000 (16:40 -0500)]
drm/amdkfd: Fix instruction hazard in gfx12 trap handler
VALU instructions with SGPR source need wait states to avoid hazard
with SALU using different SGPR.
v2: Eliminate some hazards to reduce code explosion
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Tue, 11 Mar 2025 17:10:17 +0000 (11:10 -0600)]
drm/amd/display: Remove incorrect macro guard
This macro guard "__cplusplus" is unnecessary and should not be there.
Signed-off-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 6 Feb 2025 12:10:42 +0000 (17:40 +0530)]
drm/amdgpu: Calculate IP specific xgmi bandwidth
Use IP version specific xgmi speed/width for bandwidth calculation.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Tue, 25 Feb 2025 20:50:30 +0000 (15:50 -0500)]
drm/amdgpu: Reduce dequeue retry timeout for gfx9 family
Dequeue retry timeout controls the interval between checks for unmet
conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The
cost of additional bandwidth consumed by CP when polling memory
shouldn't be substantial.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 26 Feb 2025 21:08:03 +0000 (16:08 -0500)]
drm/amdgpu/gfx12: don't read registers in mqd init
Just use the default values. There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 26 Feb 2025 20:55:33 +0000 (15:55 -0500)]
drm/amdgpu/gfx11: don't read registers in mqd init
Just use the default values. There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Emily Deng [Thu, 6 Mar 2025 00:40:01 +0000 (08:40 +0800)]
drm/amdgpu: Fix the race condition for draining retry fault
Issue:
In the scenario where svm_range_restore_pages is called, but
svm->checkpoint_ts has not been set and the retry fault has not been
drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free.
Meanwhile, svm_range_restore_pages continues execution and reaches
svm_range_from_addr. This results in a "failed to find prange..." error,
causing the page recovery to fail.
How to fix:
Move the timestamp check code under the protection of svm->lock.
v2:
Make sure all right locks are released before go out.
v3:
Directly goto out_unlock_svms, and return -EAGAIN.
v4:
Refine code.
Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 6 Feb 2025 12:03:46 +0000 (17:33 +0530)]
drm/amdgpu: Remove unsupported xgmi versions
XGMI v4.8.0 is not used in any SOCs. Remove the associated functions.
Also, ensure get_xgmi_info callback pointer is not NULL before calling
the function.
Nikita Zhandarovich [Tue, 11 Mar 2025 11:14:59 +0000 (14:14 +0300)]
drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()
On the off chance that command stream passed from userspace via
ioctl() call to radeon_vce_cs_parse() is weirdly crafted and
first command to execute is to encode (case 0x03000001), the function
in question will attempt to call radeon_vce_cs_reloc() with size
argument that has not been properly initialized. Specifically, 'size'
will point to 'tmp' variable before the latter had a chance to be
assigned any value.
Play it safe and init 'tmp' with 0, thus ensuring that
radeon_vce_cs_reloc() will catch an early error in cases like these.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yifan Zha [Wed, 5 Mar 2025 05:14:55 +0000 (13:14 +0800)]
drm/amd/amdkfd: Evict all queues even HWS remove queue failed
[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.
After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.
[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain->sem.
v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alexandre Demers [Sun, 9 Mar 2025 16:48:51 +0000 (12:48 -0400)]
drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6
By wiring up sid.h in GFX6, we end up with a few duplicated defines such as
the golden registers. Let's clean this up.
[TAHITI,VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h
and under si_enums.h, with different values. Keep the values used under radeon
and move them under gfx_v6_0.c where they are used (as it is done under cik)
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alexandre Demers [Sun, 9 Mar 2025 16:48:50 +0000 (12:48 -0400)]
drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10
Let's begin the cleanup in sid.h to prevent warnings and errors when wiring
sid.h into dce_v6_0.c.
This is a bigger cleanup.
Many defines found under sid.h have already been properly moved
into the different "_d.h" and "_sh_mask.h", so they should have been
already removed from sid.h and properly linked in where needed.
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
André Almeida [Tue, 25 Feb 2025 01:02:21 +0000 (22:02 -0300)]
drm/amdgpu: Trigger a wedged event for ring reset
Instead of only triggering a wedged event for complete GPU resets,
trigger for ring resets. Regardless of the reset, it's useful for
userspace to know that it happened because the kernel will reject
further submissions from that app.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:24 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dml1_validate()
We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:23 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn314_validate_bandwidth()
We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:22 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn31_validate_bandwidth()
We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethan Carter Edwards [Thu, 27 Feb 2025 23:16:21 +0000 (18:16 -0500)]
drm/amd/display: change kzalloc to kcalloc in dcn30_validate_bandwidth()
We are trying to get rid of all multiplications from allocation
functions to prevent integer overflows. Here the multiplication is
probably safe, but using kcalloc() is more appropriate and improves
readability. This patch has no effect on runtime behavior.
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Mon, 3 Mar 2025 16:10:19 +0000 (11:10 -0500)]
drm/amd/display: Promote DAL to 3.2.324
This version brings along following fixes:
- Fix some Replay/PSR issue
- Fix backlight brightness
- Fix suspend issue
- Fix visual confirm color
- Add scoped mutexes for amdgpu_dm_dhcp
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 3 Mar 2025 18:53:16 +0000 (13:53 -0500)]
drm/amd/display: remove minimum Dispclk and apply oem panel timing.
[why & how]
1. apply oem panel timing (not only on OLED)
2. remove MIN_DPP_DISP_CLK request in driver.
This fix will apply for dcn31x but not
sync with DML's output.
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 28 Feb 2025 19:32:57 +0000 (13:32 -0600)]
drm/amd/display: Drop unnecessary ret variable for enable_assr()
[Why]
enable_assr() has a res variable that only is changed in one block with
no cleanup necessary.
[How]
Remove variable and return early from failure cases.
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 28 Feb 2025 19:30:01 +0000 (13:30 -0600)]
drm/amd/display: Add scoped mutexes for amdgpu_dm_dhcp
[Why]
Guards automatically release mutex when it goes out of scope making
code easier to follow.
[How]
Replace all use of mutex_lock()/mutex_unlock() with guard(mutex).
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 28 Feb 2025 19:18:14 +0000 (13:18 -0600)]
drm/amd/display: Fix slab-use-after-free on hdcp_work
[Why]
A slab-use-after-free is reported when HDCP is destroyed but the
property_validate_dwork queue is still running.
[How]
Cancel the delayed work when destroying workqueue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006 Fixes: da3fd7ac0bcf ("drm/amd/display: Update CP property based on HW query") Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ryan Seto [Fri, 28 Feb 2025 19:24:57 +0000 (14:24 -0500)]
drm/amd/display: Prevent VStartup Overflow
[Why]
For some VR headsets with large blanks, it's possible
to overflow the OTG_VSTARTUP_PARAM:VSTARTUP_START
register. This can lead to incorrect DML calculations
and underflow downstream.
[How]
Min the calcualted max_vstartup_lines with the max
value of the register.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhongwei Zhang [Fri, 28 Feb 2025 02:35:23 +0000 (10:35 +0800)]
drm/amd/display: Correct timing_adjust_pending flag setting.
[Why&How]
stream->adjust will be overwritten by update->crtc_timing_adjust.
We should set update->crtc_timing_adjust->timing_adjust_pending
and then overwrite stream->adjust.
Reset update->crtc_timing_adjust->timing_adjust_pending after
the assignment.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Zhongwei Zhang <Zhongwei.Zhang@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhikai Zhai [Thu, 27 Feb 2025 12:09:14 +0000 (20:09 +0800)]
drm/amd/display: calculate the remain segments for all pipes
[WHY]
In some cases the remain de-tile buffer segments will be greater
than zero if we don't add the non-top pipe to calculate, at
this time the override de-tile buffer size will be valid and used.
But it makes the de-tile buffer segments used finally for all of pipes
exceed the maximum.
[HOW]
Add the non-top pipe to calculate the remain de-tile buffer segments.
Don't set override size to use the average according to pipe count
if the value exceed the maximum.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>