Jesse.Zhang [Fri, 9 May 2025 01:55:17 +0000 (09:55 +0800)]
drm/amdgpu: Add GFX 9.5.0 support for per-queue/pipe reset
This patch enables per-queue and per-pipe reset functionality for
GFX IP v9.5.0 when using MEC firmware version 21 (0x15) or later.
This change:
1. Refactors the pipe reset support check in gfx_v9_4_3_pipe_reset_support()
to use the compute_supported_reset flags instead of hardcoding
version checks.
2. Adds support for GFX9.5.0 (IP 9.5.0) with MEC firmware version >= 21
to enable per-queue and per-pipe reset capabilities.
v2: Replaced mec version check with !!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE) (Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sebastian Aguilera Novoa [Sat, 3 May 2025 03:59:46 +0000 (00:59 -0300)]
drm/amd/display/dc/irq: Remove duplications of hpd_ack function from IRQ
The major of dcn and dce irqs share a copy-pasted collection
of copy-pasted function, which is: hpd_ack.
This patch removes the multiple copy-pasted by moving them to
the irq_service.c and make the irq_service's
calls the functions implemented by the irq_service.c
instead.
The hpd_ack function is replaced by hpd0_ack and hpd1_ack, the
required constants are also added.
The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to look repeatedly if I am not actually changing any code.
Signed-off-by: Sebastian Aguilera Novoa <saguileran@ime.usp.br> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
Similar to commit 6a057072ddd1 ("drm/amd/display: Fix null check for
pipe_ctx->plane_state in dcn20_program_pipe") that addresses a null
pointer dereference on dcn20_update_dchubp_dpp. This is the same
function hooked for update_dchubp_dpp in dcn401, with the same issue.
Fix possible null pointer deference on dcn401_program_pipe too.
Fixes: 63ab80d9ac0a ("drm/amd/display: DML2.1 Post-Si Cleanup") Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 7 May 2025 13:57:59 +0000 (09:57 -0400)]
drm/amdkfd: drop warning in event_interrupt_isr_v1*()
Commit ded8b3c36f17 ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()")
enables all 16 vmids for MMHUB on GC 10 and newer for KGD since
there are no KFD resources using MMHUB. With this change, KFD
starts seeing MMHUB vmids in it's range with no pasid set. As
such there is no need to warn, we can just ignore those interrupts.
Fixes: aded8b3c36f1 ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()") Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 6 May 2025 11:08:27 +0000 (16:38 +0530)]
drm/amdgpu: Log RAS errors during load
During driver load, RAS event manager may not be initialized. This will
cause any ATHUB event during driver load to be skipped in dmesg log. Log
the error in dmesg log for easier diagnosis.
Jesse.Zhang [Fri, 9 May 2025 09:18:16 +0000 (17:18 +0800)]
drm/amdgpu: Fix user queue deadlock by reordering mutex locking
This resolves a deadlock between user queue management and GPU reset
paths by enforcing consistent lock ordering.
The deadlock occurred when:
1. Process exit path (amdgpu_userq_mgr_fini) would:
- Take uqm->userq_mutex
- Then try to take adev->userq_mutex for list operations
2. GPU reset path (amdgpu_userq_pre_reset) would:
- Take adev->userq_mutex first (for list traversal)
- Then take uqm->userq_mutex
The solution establishes a strict top-down locking order:
1. Always take adev->userq_mutex before any uqm->userq_mutex
2. Maintain this order consistently across all code paths
Changes made:
- Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
- Kept existing proper order in amdgpu_userq_pre_reset()
- Simplified the fini flow by removing redundant operations
This prevents circular dependencies while maintaining thread safety
during both normal operation and GPU reset scenarios.
Fixes: 4ce60dbada96 ("drm/amdgpu: store userq_managers in a list in adev") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Thu, 8 May 2025 13:09:29 +0000 (21:09 +0800)]
drm/amdgpu: Fix the kernel panic caused by RAS records exceed threshold
kernel panic caused by RAS records exceeding the threshold when load driver
specifying RMA(bad_page_threshold=128)
1.Fix the warnings caused by disabling the interrupt source
before it was enabled
2.Fix kernel panic when xcp sysfs is not initialized,null pointer
appears during fini
3.Fix the memory leak caused by the device's early exit due to rma
Tao Zhou [Wed, 2 Apr 2025 08:06:58 +0000 (16:06 +0800)]
drm/amdgpu: adjust high bits for RAS retired page
Per UMC address conversion algorithm, the high row bits of UMC MCA
address are changed when they're converted into normalized address
on specific ASICs.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Refine RAS bad page records counting and parsing in eeprom V3
there is only MCA records in V3, no need to care about PA records.
recalculate the value of ras_num_bad_pages when parsing failed and
go on with the left records instead of quit.
Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Mon, 5 May 2025 13:14:43 +0000 (09:14 -0400)]
drm/amd/display: Promote DC to 3.2.333
Summary
* Refactor DMI quirks
* Fix link-off issue triggered by quick unplug/replug
* Fix race condition when submitting DMUB commands
* Correct reply value when AUX Write incomplete
* Backup / restore plane config only on update
Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Mon, 4 Dec 2023 08:30:39 +0000 (16:30 +0800)]
drm/amd/display: Add early 8b/10b channel equalization test pattern sequence
[WHY]
Early EQ pattern sequence is required for some LTTPR + old dongle
combinations.
[HOW]
If DP_EARLY_8B10B_TPS2 chip cap is set, this new sequence programs phy
to output TPS2 before initiating link training and writes TPS1 to
LTTPR training pattern register as instructed by vendor.
Add function to get embedded LTTPR target address offset.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: TungYu Lu <tungyu.lu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sung Lee [Fri, 2 May 2025 20:07:44 +0000 (16:07 -0400)]
drm/amd/display: Program triplebuffer on all pipes
[WHY]
Triplebuffer should be programmed on all pipes.
Some code assumed it only needed to be called on top
pipe, but as the HWSS function does not account
for that, it must be called on every pipe.
[HOW]
Remove condition to not program triplebuffer
on non-top/next pipe. Call the function
unconditionally on all pipes.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Sung Lee <Sung.Lee@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 4 May 2025 15:35:56 +0000 (11:35 -0400)]
drm/amd/display: [FW Promotion] Release 0.1.10.0
Refactoring some IPS and panel replay structs
Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Default should be 1 to disable EASF narrow filter sharpening.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Wed, 30 Apr 2025 09:35:21 +0000 (17:35 +0800)]
drm/amd/display: Return the exact value for debugging
[Why]
It's unnecessary to set operation_result as invalid reply when
p_notify->result != AUX_RET_SUCCESS.
[How]
Set operation_result as p_notify->result to better understand
the reason for the error
Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Tue, 22 Apr 2025 20:58:41 +0000 (15:58 -0500)]
drm/amd/display: Restructure DMI quirks
[Why]
DMI quirks are relatively big code that makes amdgpu_dm 200 lines
larger.
[How]
Move DMI quirks into a dedicated source file and make all quirks
variables for `struct amdgpu_display_manager`.
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: check stream id dml21 wrapper to get plane_id
[Why & How]
Fix a false positive warning which occurs due to lack of correct checks
when querying plane_id in DML21. This fixes the warning when performing a
mode1 reset (cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover):
George Shen [Thu, 24 Apr 2025 14:02:59 +0000 (10:02 -0400)]
drm/amd/display: fix link_set_dpms_off multi-display MST corner case
[Why & How]
When MST config is unplugged/replugged too quickly, it can potentially
result in a scenario where previous DC state has not been reset before
the HPD link detection sequence begins. In this case, driver will
disable the streams/link prior to re-enabling the link for link
training.
There is a bug in the current logic that does not account for the fact
that current_state can be released and cleared prior to swapping to a
new state (resulting in the pipe_ctx stream pointers to be cleared) in
between disabling streams.
To resolve this, cache the original streams prior to committing any
stream updates.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Instead of dropping DRR updates, defer them. This fixes issues where
monitor continues to see incorrect refresh rate after VRR was turned off
by userspace.
Fixes: 32953485c558 ("drm/amd/display: Do not update DRR while BW optimizations pending") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3546 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: John Olender <john.olender@gmail.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gabe Teeger [Fri, 25 Apr 2025 14:31:39 +0000 (10:31 -0400)]
Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35"
This reverts commit 756c85e4d0dd ("drm/amd/display: Enable urgent latency adjustment on DCN35")
Reason for revert: Negative power impact.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix race in dmub_srv_wait_for_pending
[WHY]
If commands are being submitted to DMCUB while concurrently waiting for
pending commands to complete, rptr and wptr may never match again, and
reported command count will not update.
[HOW]
Modify dmub_srv_wait_for_pending to constantly check wptr and rptr
match, and update inbox status whenever a message is sent to avoid the
race and determine message completion or idle as quickly as possible.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 25 Apr 2025 06:44:02 +0000 (14:44 +0800)]
drm/amd/display: Correct the reply value when AUX write incomplete
[Why]
Now forcing aux->transfer to return 0 when incomplete AUX write is
inappropriate. It should return bytes have been transferred.
[How]
aux->transfer is asked not to change original msg except reply field of
drm_dp_aux_msg structure. Copy the msg->buffer when it's write request,
and overwrite the first byte when sink reply 1 byte indicating partially
written byte number. Then we can return the correct value without
changing the original msg.
Fixes: 3637e457eb00 ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Backup and restore plane configuration only on update
[WHY&HOW]
When backing up and restoring plane states for minimal transition
cases, only configuration should be backed up and restored. Information
only relevant to the object/allocation (like refcount) should be
excluded. Also move this interface to dc_plane.h.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
smu_v13_0_init_display_count() was added in 2020 by
commit c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)")
but is unused.
See discussion on:
https://lore.kernel.org/all/DM4PR12MB5165D85BD85BC8FC8BF7A3B48E88A@DM4PR12MB5165.namprd12.prod.outlook.com/
that it really isn't neede.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arvind Yadav [Wed, 7 May 2025 16:36:17 +0000 (22:06 +0530)]
drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure
the delayed work has finished executing before proceeding with
resource cleanup. This prevents a potential use-after-free or
NULL dereference if the resume_work is still running during finalization.
v2: Replace cancel_delayed_work() to cancel_delayed_work_sync()
in amdgpu_userq_destroy() and amdgpu_userq_evict().
Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 7 May 2025 15:04:32 +0000 (11:04 -0400)]
drm/amdgpu: csa unmap use uninterruptible lock
After process exit to unmap csa and free GPU vm, if signal is accepted
and then waiting to take vm lock is interrupted and return, it causes
memory leaking and below warning backtrace.
Change to use uninterruptible wait lock fix the issue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Tue, 6 May 2025 17:22:25 +0000 (13:22 -0400)]
drm/amd/display: use drm_dbg_driver() in amdgpu_dm.c
Replace all use of DRM_DEBUG_DRIVER in amdgpu_dm.c with
drm_dbg_driver(). The latter prints the instance of the drm device
associated with the error which would helpful in debugging scenarios
involving multiple GPUs
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Fri, 9 May 2025 20:10:04 +0000 (06:10 +1000)]
Merge tag 'drm-intel-next-2025-05-08' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Non-display related:
- Fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions'
Display related:
- More work towards display separation (Jani)
- Stop writing VRR_CTL_IGN_MAX_SHIFT for MTL onwards (Jouni)
- DSC checks for 3 engines (Ankit)
- Add link rate and lane count to i915_display_info (Khaled)
- PSR fixes and workaround for underrun on idle (Jouni)
- LOBF enablement and ALMP fixes (Animesh)
- Clean up VGA plane handling (Ville)
- Use an intel_connector pointer everywhere (Imre)
- Fix warning for coffeelake on SunrisePoint PCH (Jiajia)
- Rework/Correction on minimum hblank calculation (Arun)
- Dmesg clean up (Jani)
- Add a couple of simple display workarounds (Ankit, Vinod)
- Refactor HDCP GSC (Jani)
Dave Airlie [Fri, 9 May 2025 01:39:27 +0000 (11:39 +1000)]
Merge tag 'drm-intel-gt-next-2025-05-08-1' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Driver Changes:
- Fix SLPC wait boosting reference counting to avoid getting stuck on non-boost
frequency on power saving profile on DG1/DG2 (Vinay)
- Add 20ms delay to engine reset for robustness on HSW (Nitin)
- Use proper sleeping functions for timeouts shorter than 20ms (Andi)
- Fix fence not released on early probe errors for HuC (Janusz)
- Remove const from struct i915_wa list allocation (Kees)
- Apply SPDX license format where missing and use single-line format (Andi)
- Whitespace fixes (Dan, Andi)
- Selftest improvements (Mikolaj, Badal, Sk,
APU doesn't have second IH ring, so re-routing action here is a no-op.
It will take a lot of time to wait timeout from PSP during the
initialization. So remove the function in psp v12.
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Tue, 6 May 2025 20:49:46 +0000 (15:49 -0500)]
drm/amd: Add per-ring reset for vcn v4.0.5 use
There is a problem occurring on VCN 4.0.5 where in some situations a job
is timing out. This triggers a job timeout which then causes a GPU
reset for recovery. That has exposed a number of issues with GPU reset
that have since been fixed. But also a GPU reset isn't actually needed
for this circumstance. Just restarting the ring is enough.
Add a reset callback for the ring which will stop and start VCN if the
issue happens.
Dr. David Alan Gilbert [Wed, 7 May 2025 00:24:25 +0000 (01:24 +0100)]
drm/amd/pm/smu13: Remove unused smu_v3 functions
smu_v13_0_display_clock_voltage_request() and
smu_v13_0_set_min_deep_sleep_dcefclk() were added in 2020 by
commit c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)")
but have remained unused.
Remove them.
smu_v13_0_display_clock_voltage_request() was the only user
of smu_v13_0_set_hard_freq_limited_range(). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The last use of smu_v11_0_get_dpm_level_range() was removed in 2020 by
commit 46a301e14e8a ("drm/amd/powerplay: drop unnecessary Navi1x specific
APIs")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd/amdkfd: Trigger segfault for early userptr unmmapping
If applications unmap the memory before destroying the userptr, it needs
trigger a segfault to notify user space to correct the free sequence in
VM debug mode.
v2: Send gpu access fault to user space
v3: Report gpu address to user space, remove unnecessary params
v4: update pr_err into one line, remove userptr log info
Signed-off-by: Shane Xiao <shane.xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In VM debug mode, it is desirable to notify the application
to correct the freeing sequence by unmapping the memory before
destroying the userptr in the old userptr path. Add a bitmask
to decide whether to send gpu vm fault to the applition.
Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 6 May 2025 08:32:23 +0000 (16:32 +0800)]
drm/amdgpu: unreserve the gem BO before returning from attach error
It requires unlocking the reserved gem BO before returning from
attaching the eviction fence error.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: promote the implicit sync to the dependent read fences
The driver doesn't want to implicitly sync on the DMA_RESV_USAGE_BOOKKEEP
usage fences, and the BOOKEEP fences should be synced explicitly. So, as
the VM implicit syncing only need to return and sync the dependent read
fences.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Apr 2025 13:45:03 +0000 (09:45 -0400)]
drm/amdgpu/psp: mark securedisplay TA as optional
This is an optional TA which is only available on
certain embedded systems. Mark it as optional to avoid
user confusion. This mirrors what we already do for
other optional TAs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4181 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 1 May 2025 17:46:46 +0000 (13:46 -0400)]
drm/amdgpu: fix pm notifier handling
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip
the resource evictions properly in pm prepare based on whether
we are suspending or hibernating. Drop the eviction as processes
are not frozen at this time, we we can end up getting stuck trying
to evict VRAM while applications continue to submit work which
causes the buffers to get pulled back into VRAM.
v2: Move suspend flags out of pm notifier (Mario)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 21:18:44 +0000 (17:18 -0400)]
drm/amdgpu: Implement unrecoverable error message handling for VFs
This notification may arrive in VF mailbox while polling for response from
another event.
This patches covers the following scenarios:
- If VF is already in RMA state, then do not attempt to contact the host.
Host will ignore the VF after sending the notification.
- If the notification is detected during polling, then set the RMA status,
and return error to caller.
- If the notification arrives by interrupt, then set the RMA status and
queue a reset. This reset will fail and VF will stop runtime services.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 21:00:50 +0000 (17:00 -0400)]
drm/amdgpu: Add unrecoverable error message definitions for VFs
Host may stop runtime services after reaching a bad page threshold.
This notification will indicate to the VF that it no longer has
access to the GPU.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This breaks S4 because we end up setting the s3/s0ix flags
even when we are entering s4 since prepare is used by both
flows. The causes both the S3/s0ix and s4 flags to be set
which breaks several checks in the driver which assume they
are mutually exclusive.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The dma_resv_add_fence() already refers to the added fence.
So when attaching the evciton fence to the gem bo, it needn't
refer to it anymore.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 20:48:09 +0000 (16:48 -0400)]
drm/amdgpu: Implement Runtime Bad Page query for VFs
Host will send a notification when new bad pages are available.
Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 20:22:53 +0000 (16:22 -0400)]
drm/amdgpu: Add Runtime Bad Page message definitions for VFs
Currently VFs rely on poison consumption interrupt from HW
to kick off the bad page retirement process. Part of this process
includes a VF reset.
This patch adds the following:
1) Host Bad Pages notification message.
2) Guest request bad pages message.
When combined, VFs are able to reserve the pages early, and potentially
avoid future poison consumption that will disrupt user services
from consequent FLR.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Mon, 28 Apr 2025 20:26:20 +0000 (16:26 -0400)]
drm/amd/display: Don't check for NULL divisor in fixpt code
[Why]
We check for a NULL divisor but don't act on it.
This check does nothing other than throw a warning.
It does confuse static checkers though:
See https://lkml.org/lkml/2025/4/26/371
[How]
Drop the ASSERTs in both DC and SPL variants.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Fixes: 6efc0ab3b05d ("drm/amd/display: add back quality EASF and ISHARP and dc dependency changes") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Leo Li <sunpeng.li@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Shamliev [Thu, 24 Apr 2025 15:14:53 +0000 (18:14 +0300)]
drm/amd/display: Use true/false for boolean variables in DML2 core files
Replace 0 and 1 with false and true for boolean variables in
dml2_core_dcn4_calcs.c and dml2_core_utils.c to align with the Linux
kernel coding style guidelines, which recommend using C99 bool type
with true/false values.
Signed-off-by: Ivan Shamliev <ivan.shamliev.dev@abv.bg> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Flowers [Sat, 3 May 2025 21:18:51 +0000 (14:18 -0700)]
drm/amd/display: adds kernel-doc comment for dc_stream_remove_writeback()
Adds kernel-doc for externally linked dc_stream_remove_writeback function.
Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shuicheng Lin [Wed, 7 May 2025 02:23:02 +0000 (02:23 +0000)]
drm/xe: Release force wake first then runtime power
xe_force_wake_get() is dependent on xe_pm_runtime_get(), so for
the release path, xe_force_wake_put() should be called first then
xe_pm_runtime_put().
Combine the error path and normal path together with goto.
v2 (Matt):
refine commit message to have more details
add Fixes tag
move the code to xe_svm.h which already have the config
remove a blank line per codestyle suggestion
Fixes: 63f6e480d115 ("drm/xe: Add SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
Qiu-ji Chen [Wed, 6 Nov 2024 09:59:06 +0000 (17:59 +0800)]
drm/tegra: Fix a possible null pointer dereference
In tegra_crtc_reset(), new memory is allocated with kzalloc(), but
no check is performed. Before calling __drm_atomic_helper_crtc_reset,
state should be checked to prevent possible null pointer dereference.
Fixes: b7e0b04ae450 ("drm/tegra: Convert to using __drm_atomic_helper_crtc_reset() for reset.") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20241106095906.15247-1-chenqiuji666@gmail.com
Biju Das [Wed, 5 Feb 2025 11:21:35 +0000 (11:21 +0000)]
drm/tegra: rgb: Fix the unbound reference count
The of_get_child_by_name() increments the refcount in tegra_dc_rgb_probe,
but the driver does not decrement the refcount during unbind. Fix the
unbound reference count using devm_add_action_or_reset() helper.
Mikko Perttunen [Tue, 4 Feb 2025 02:45:46 +0000 (02:45 +0000)]
gpu: host1x: Remove mid-job CDMA flushes
The current code can issue CDMA flushes (DMAPUT bumps) in the middle
of a job, before all opcodes have been written into the pushbuffer.
This can happen when pushbuffer fills up. Presumably this made sense
at some point in the past, but it doesn't anymore, as it cannot lead
to more space appearing in the pushbuffer as it is only cleaned full
jobs at a time.
Mid-job flushes can also cause problems, as in an extreme situation
(seen in practice), the hardware can run through the entire pushbuffer
including the prefix of a partially written job without the driver
being able to process any CDMA updates. This can cause the engine
MLOCK to be taken and held for extended periods as the tail of the
job is not yet available to hardware.
Mikko Perttunen [Wed, 5 Feb 2025 06:10:27 +0000 (06:10 +0000)]
drm/tegra: falcon: Pipeline firmware copy
The Falcon DMA engine allows queueing multiple operations for
improved performance. Do this to optimize firmware loading.
A performance improvement of 4x to 6x is observed.
Co-developed-by: Ivan Raul Guadarrama <iguadarrama@nvidia.com> Signed-off-by: Ivan Raul Guadarrama <iguadarrama@nvidia.com> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20250205061027.1205748-1-mperttunen@nvidia.com
Jon Hunter [Mon, 28 Apr 2025 15:34:35 +0000 (16:34 +0100)]
drm/tegra: Remove unneeded include
The header file 'tegra_drm.h' is included in gem.c, but this file is
already include 'drm.h'. Although there is no harm in including this
file again, it is also not necessary. Furthermore, the header file is
located under 'include/uapi/drm' so ideally the full path would be
used to be explicit. For now, just remove from gem.c.
Changes to a plane's type after it has been registered aren't propagated
to userspace automatically. This could possibly be achieved by updating
the property, but since we can already determine which type this should
be before the registration, passing in the right type from the start is
a much better solution.
Jani Nikula [Wed, 7 May 2025 08:31:36 +0000 (11:31 +0300)]
drm/i915/rps: fix stale reference to i915->irq_lock
The RPS code has been switched from using i915->irq_lock to gt->irq_lock
years ago in commit d762043f7ab1 ("drm/i915: Extract GT powermanagement
interrupt handling"). Fix the stale comment referencing i915->irq_lock,
which has also ceased to exist.
drm/i915/gt: Remove const from struct i915_wa list allocation
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct i915_wa *". The returned type, while
technically matching, will be const qualified. As there is no general
way to remove const qualifiers, adjust the allocation type to match
the assignment.
Jani Nikula [Tue, 6 May 2025 13:06:50 +0000 (16:06 +0300)]
drm/i915/irq: move i915->irq_lock to display->irq.lock
Observe that i915->irq_lock is no longer used to protect anything
outside of display. Make it a display thing.
This allows us to remove the ugly #define irq_lock irq.lock hack from xe
compat header.
Note that this is slightly more subtle than it first looks. For i915,
there's no functional change here. The lock is moved. However, for xe,
we'll now have *two* locks, xe->irq.lock and display->irq.lock. These
should protect different things, though. Indeed, nesting in the past
would've lead to a deadlock because they were the same lock.
With the i915 references gone, we can make a handful more files
independent of i915_drv.h.
Jani Nikula [Tue, 6 May 2025 13:06:49 +0000 (16:06 +0300)]
drm/i915/rps: refactor display rps support
Make the gt rps code and display irq code interact via
intel_display_rps.[ch], instead of direct access. Add no-op static
inline stubs for xe instead of having a separate build unit doing
nothing. All of this clarifies the interfaces between i915 core and
display.