Kenneth Feng [Thu, 27 Feb 2025 02:13:53 +0000 (10:13 +0800)]
drm/amd/pm: add fan abnormal detection
add fan abnormal detection on smu v14.0.2&smu v14.0.3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Tue, 25 Feb 2025 23:08:02 +0000 (18:08 -0500)]
drm/amdkfd: clamp queue size to minimum
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.
Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
André Almeida [Wed, 26 Feb 2025 13:11:18 +0000 (10:11 -0300)]
drm/amdgpu: Create a debug option to disable ring reset
Prior to the addition of ring reset, the debug option
`debug_disable_soft_recovery` could be used to force a full device
reset. Now that we have ring reset, create a debug option to disable
them in amdgpu, forcing the driver to go with the full device
reset path again when both options are combined.
This option is useful for testing and debugging purposes when one wants
to test the full reset from userspace.
Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ma Ke [Wed, 26 Feb 2025 08:37:31 +0000 (16:37 +0800)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params
Null pointer dereference issue could occur when pipe_ctx->plane_state
is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not
null before accessing. This prevents a null pointer dereference.
Found by code review.
Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 15:42:30 +0000 (10:42 -0500)]
drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX11 for most firmware versions. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Wed, 26 Feb 2025 08:57:33 +0000 (08:57 +0000)]
drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar
There is a spelling mistake and a grammatical error in a dev_err
message. Fix it.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Le Ma [Tue, 11 Feb 2025 06:06:54 +0000 (14:06 +0800)]
drm/amdgpu: add sdma page queue irq processing for sdma442
Add the trap irq processing for page queue of sdma442
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 24 Feb 2025 08:16:32 +0000 (13:46 +0530)]
drm/amdkfd: Fix Circular Locking Dependency in 'svm_range_cpu_invalidate_pagetables'
This commit addresses a circular locking dependency in the
svm_range_cpu_invalidate_pagetables function. The function previously
held a lock while determining whether to perform an unmap or eviction
operation, which could lead to deadlocks.
Fixes the below:
[ 223.418794] ======================================================
[ 223.418820] WARNING: possible circular locking dependency detected
[ 223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 223.418869] ------------------------------------------------------
[ 223.418889] kfdtest/3939 is trying to acquire lock:
[ 223.418906] ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.419302]
but task is already holding lock:
[ 223.419303] ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.419447] Console: switching to colour dummy device 80x25
[ 223.419477] [IGT] amd_basic: executing
[ 223.419599]
which lock already depends on the new lock.
v2: To resolve this issue, the allocation of the process context buffer
(`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the
`pqm_create_queue` function. This change ensures that the buffer is
allocated only when the first queue for a process is created and only if
the Micro Engine Scheduler (MES) is enabled. (Felix)
Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd") Cc: Jesse Zhang <jesse.zhang@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 14:58:25 +0000 (09:58 -0500)]
drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX12. There is no suspend/resume all support
in firmware so the function doesn't do anything. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The last use of optc3_fpu_set_vrr_m_const() was removed in 2022's
commit 64f991590ff4 ("drm/amd/display: Fix a compilation failure on PowerPC
caused by FPU code")
which removed the only caller (with a similar) name.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luan Arcanjo [Tue, 25 Feb 2025 01:55:29 +0000 (22:55 -0300)]
drm/amd/display/dc: Refactor remove duplications
All dce command_table_helper's shares a copy-pasted collection
of copy-pasted functions, which are: phy_id_to_atom,
clock_source_id_to_atom_phy_clk_src_id, and engine_bp_to_atom.
This patch removes the multiple copy-pasted by moving them to
the command_table_helper.c and make the command_table_helper's
calls the functions implemented by the command_table_helper.c
instead.
The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to to look repeatedly if I am not actually changing any code.
This is the version 4 of the PATCH, fixed comments about
licence in the new files and the matches From email to
Signed-off-by email. Fixed comments about using
command_table_helper instead of creating a dce_common
Signed-off-by: Luan Icaro Pinto Arcanjo <luanicaro@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:26:32 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use per instance callbacks for idle work handler
Use the vcn instance power gating callbacks rather than
the IP powergating callback. This limits power gating to
only the instance in use rather than all of the instances.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle'
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Fri, 21 Feb 2025 10:18:07 +0000 (18:18 +0800)]
drm/amdgpu: update SDMA sysfs reset mask in late_init
- Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask.
- update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization
and capability setup are completed before checking the SDMA reset capability.
- For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset.
- Add a TODO comment for future support of per-queue reset for IP version 9.5.0.
This change ensures that per-queue reset is only enabled when the MEC and PMFW support it.
v2: fix ip version (9.5.4 -> 9.5.0)(Lijo)
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 10 Feb 2025 18:15:48 +0000 (13:15 -0500)]
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 21:27:26 +0000 (16:27 -0500)]
drm/amd/display: Promote DAL to 3.2.322
- Disable PSR-SU on eDP panels
- Fix HPD after GPU reset
- Fixes on dcn4x init, DML2 state policy on DCN36
- Various minor logic fixes
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 20:06:06 +0000 (15:06 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.255.0
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Wed, 12 Feb 2025 19:49:36 +0000 (14:49 -0500)]
drm/amd/display: Fix HPD after gpu reset
[Why]
DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts.
So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs,
HPD gets disabled.
[How]
Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mike Katsnelson [Thu, 13 Feb 2025 16:52:32 +0000 (11:52 -0500)]
drm/amd/display: stop DML2 from removing pipes based on planes
[Why]
Transitioning from low to high resolutions at high refresh rates caused grey corruption.
During the transition state, there is a period where plane size is based on low resultion
state and ODM slices are based on high resoultion state, causing the entire plane to be
contained in one ODM slice. DML2 would turn off the pipe for the ODM slice with no plane,
causing an underflow since the pixel rate for the higher resolution cannot be supported on
one pipe. This change stops DML2 from turning off pipes that are mapped to an ODM slice
with no plane. This is possible to do without negative consequences because pipes can now
take the minimum viewport and draw with zero recout size, removing the need to have the
pipe turned off.
[How]
In map_pipes_from_plane(), remove "check" that skips ODM slices that are not covered by
the plane. This prevents the pipes for those ODM slices from being freed.
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Thu, 13 Feb 2025 22:40:29 +0000 (17:40 -0500)]
drm/amd/display: Increase halt timeout for DMCUB to 1s
[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.
These can manifest as random or unexpected load/store violations.
[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
value to the scaling data structure to cause the zero divide in the DML validation.
[HOW]
Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
max downscale limit exceed.
Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 24 Jan 2025 20:02:27 +0000 (15:02 -0500)]
drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage
[WHY]
Previously the 128b/132b LTTPR support DPCD field was used to decide if
FIXED_VS training sequence required a rate toggle before initiating LT.
When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report
no-128b/132b support which is then forwarded by the FIXED_VS retimer.
As a result this test exposes the rate toggle again, erroneously causing
failures as certain compliance sinks don't expect this behaviour.
[HOW]
Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID.
Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI
which guarantees that we only perform the toggle on affected retimers.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 13 Feb 2025 17:37:10 +0000 (12:37 -0500)]
drm/amd/display: fix dcn4x init failed
[why]
failed due to cmdtable not created.
switch atombios cmdtable as default.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Mon, 20 Jan 2025 20:27:23 +0000 (15:27 -0500)]
drm/amd/display: Temporarily disable hostvm on DCN31
With HostVM enabled, DCN31 fails to pass validation for 3x4k60. Some Linux
userspace does not downgrade one of the monitors to 4k30, and the result
is that the monitor does not light up. Disable it until the bandwidth
calculation failure is resolved.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rafal Ostrowski [Wed, 12 Feb 2025 07:08:07 +0000 (08:08 +0100)]
drm/amd/display: ACPI Re-timer Programming
[Why]
We must implement an ACPI re-timer programming interface and notify
ACPI driver whenever a PHY transition is about to take place.
Because some trace lengths on certain platforms are very long,
then a re-timer may need to be programmed whenever a PHY transition
takes place. The implementation of this re-timer programming interface
will notify ACPI driver that PHY transition is taking place and it
will trigger the re-timer as needed.
First we need to gather retimer information from ACPI interface.
Then, in the PRE case, the re-timer interface needs to be called before we call
transmitter ENABLE.
In the POST case, it has to be called after we call transmitter DISABLE.
Yilin Chen [Fri, 7 Feb 2025 20:26:19 +0000 (15:26 -0500)]
drm/amd/display: add a quirk to enable eDP0 on DP1
[why]
some board designs have eDP0 connected to DP1, need a way to enable
support_edp0_on_dp1 flag, otherwise edp related features cannot work
[how]
do a dmi check during dm initialization to identify systems that
require support_edp0_on_dp1. Optimize quirk table with callback
functions to set quirk entries, retrieve_dmi_info can set quirks
according to quirk entries
Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Yilin Chen <Yilin.Chen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:17:15 +0000 (14:17 -0500)]
drm/amd/display: Fix unit test failure
[Why]
Some of unit tests use large scaling ratio such that when we
calculate optimal number of taps, max_taps is negative.
Then in recent change, we changed max_taps to uint instead
of int so now max_taps wraps and is positive. This change
changed the behaviour from returning back false to return
true and breaks unit test check
[How]
Add check to prevent max_taps from wrapping and set to 0
instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 21 Jan 2025 16:01:47 +0000 (11:01 -0500)]
drm/amd/display: fix check for identity ratio
[Why]
IDENTITY_RATIO check uses 2 bits for integer, which only allows
checking downscale ratios up to 3. But we support up to 6x
downscale
[How]
Update IDENTITY_RATIO to check 3 bits for integer
Add ASSERT to catch if we downscale more than 6x
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assadian, Navid [Thu, 19 Dec 2024 22:19:09 +0000 (17:19 -0500)]
drm/amd/display: Fix mismatch type comparison
The mismatch type comparison/assignment may cause data loss. Since the
values are always non-negative, it is safe to use unsigned variables to
resolve the mismatch.
Signed-off-by: Navid Assadian <navid.assadian@amd.com> Reviewed-by: Joshua Aberback <joshua.aberback@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:16:04 +0000 (14:16 -0500)]
drm/amd/display: Fix mismatch type comparison in custom_float
[Why & How]
Passing uint into uchar function param. Pass uint instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>