Jesse Agate [Fri, 13 Jun 2025 18:20:53 +0000 (14:20 -0400)]
drm/amd/display: Incorrect Mirror Cositing
[WHY]
hinit/vinit are incorrect in the case of mirroring.
[HOW]
Cositing sign must be flipped when image is mirrored in the vertical
or horizontal direction.
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Jesse Agate <jesse.agate@amd.com> Signed-off-by: Brendan Leder <breleder@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 6 Oct 2025 10:45:25 +0000 (12:45 +0200)]
drm/amdgpu: partially revert "revert to old status lock handling v3"
The CI systems are pointing out list corruptions, so we still need to
fix something here.
Keep the asserts, but revert the lock changes for now.
Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3") Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ard Biesheuvel [Thu, 2 Oct 2025 21:00:45 +0000 (23:00 +0200)]
drm/amd/display: Fix unsafe uses of kernel mode FPU
The point of isolating code that uses kernel mode FPU in separate
compilation units is to ensure that even implicit uses of, e.g., SIMD
registers for spilling occur only in a context where this is permitted,
i.e., from inside a kernel_fpu_begin/end block.
This is important on arm64, which uses -mgeneral-regs-only to build all
kernel code, with the exception of such compilation units where FP or
SIMD registers are expected to be used. Given that the compiler may
invent uses of FP/SIMD anywhere in such a unit, none of its code may be
accessible from outside a kernel_fpu_begin/end block.
This means that all callers into such compilation units must use the
DC_FP start/end macros, which must not occur there themselves. For
robustness, all functions with external linkage that reside there should
call dc_assert_fp_enabled() to assert that the FPU context was set up
correctly.
Fix this for the DCN35, DCN351 and DCN36 implementations.
Cc: Austin Zheng <austin.zheng@amd.com> Cc: Jun Lei <jun.lei@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <siqueira@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
After GPU reset with VRAM loss, a general protection fault occurs
during user queue restoration when accessing vm_bo->vm after
spinlock release in amdgpu_vm_bo_reset_state_machine.
The root cause is that vm_bo points to the last entry from the
list_for_each_entry loop, but this becomes invalid after the
spinlock is released. Accessing vm_bo->vm at this point leads
to memory corruption.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Fri, 3 Oct 2025 20:05:46 +0000 (16:05 -0400)]
drm/amdkfd: Fix two comments in kfd_ioctl.h
Queue read and write pointers are "to KFD", not "from KFD".
Suggested-by: Robert Liu <robert.liu@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Robert Liu <robert.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PMFW interface version is not used by some IP implementations like SMU
v13.0.6/12, instead rely on PMFW version checks. Avoid the log if
interface version is not used.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init
As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary.
Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking.
v2: remove superflous check
adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian)
v3: drop amdgpu_vm_assert_locked (Chritian)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614 Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3") Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Thu, 25 Sep 2025 01:40:57 +0000 (21:40 -0400)]
drm/amd/amdgpu: Fix the mes version that support inv_tlbs
MES version 0x83 is not stable to use the inv_tlbs API. Defer it to 0x84 vertsion.
Fixes: 85442bac8466 ("drm/amd/amdgpu: Fix the mes version that support inv_tlbs") Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 25 Sep 2025 19:10:57 +0000 (14:10 -0500)]
drm/amd: Check whether secure display TA loaded successfully
[Why]
Not all renoir hardware supports secure display. If the TA is present
but the feature isn't supported it will fail to load or send commands.
This shows ERR messages to the user that make it seems like there is
a problem.
[How]
Check the resp_status of the context to see if there was an error
before trying to send any secure display commands.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1415 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 15 Sep 2025 19:57:32 +0000 (15:57 -0400)]
drm/amdkfd: Fix mmap write lock not release
If mmap write lock is taken while draining retry fault, mmap write lock
is not released because svm_range_restore_pages calls mmap_read_unlock
then returns. This causes deadlock and system hangs later because mmap
read or write lock cannot be taken.
Downgrade mmap write lock to read lock if draining retry fault fix this
bug.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 27 May 2025 15:09:53 +0000 (11:09 -0400)]
drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
kfd_lookup_process_by_pid hold the kfd process reference to ensure it
doesn't get destroyed while sending the segfault event to user space.
Calling kfd_lookup_process_by_pid as function parameter leaks the kfd
process refcount and miss the NULL pointer check if app process is
already destroyed.
Fixes: 2d274bf7099b ("amd/amdkfd: Trigger segfault for early userptr unmmapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.
There is some probability that reset workqueue is blocked by KIQ I/O for 10+ seconds after gpu hangs.
So we need to add a in_reset check during each KIQ register poll.
Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Thu, 25 Sep 2025 18:45:25 +0000 (20:45 +0200)]
drm/amd/display: Disable scaling on DCE6 for now
Scaling doesn't work on DCE6 at the moment, the current
register programming produces incorrect output when using
fractional scaling (between 100-200%) on resolutions higher
than 1080p.
Disable it until we figure out how to program it properly.
Fixes: 7c15fd86aaec ("drm/amd/display: dc/dce: add initial DCE6 support (v10)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Thu, 25 Sep 2025 18:45:24 +0000 (20:45 +0200)]
drm/amd/display: Properly disable scaling on DCE6
SCL_SCALER_ENABLE can be used to enable/disable the scaler
on DCE6. Program it to 0 when scaling isn't used, 1 when used.
Additionally, clear some other registers when scaling is
disabled and program the SCL_UPDATE register as recommended.
This fixes visible glitches for users whose BIOS sets up a
mode with scaling at boot, which DC was unable to clean up.
Fixes: b70aaf5586f2 ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Thu, 25 Sep 2025 18:45:23 +0000 (20:45 +0200)]
drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6
Previously, the code would set a bit field which didn't exist
on DCE6 so it would be effectively a no-op.
Fixes: b70aaf5586f2 ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Tue, 7 Oct 2025 01:50:02 +0000 (11:50 +1000)]
Merge tag 'drm-xe-next-fixes-2025-10-03' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Cross-subsystem Changes:
- Fix userptr to not allow device private pages with SVM (Thomas
Hellström)
Driver Changes:
- Fix build with clang 16 (Michal Wajdeczko)
- Fix handling of invalid configfs syntax usage and spell out the
expected syntax in the documentation (Lucas De Marchi)
- Do not try late bind firmware when running as VF since it
shouldn't handle firmware loading (Michal Wajdeczko)
- Fix idle assertion for local BOs (Thomas Hellström)
- Fix uninitialized variable for late binding (Colin Ian King,
Mallesh Koujalagi)
- Do not require perfmon_capable to expose free memory at page
granularity. Handle it like other drm drivers do (Matthew Auld)
- Fix lock handling on suspend error path (Shuicheng Lin)
- Fix I2C controller resume after S3 (Raag Jadav)
drm/xe/i2c: Don't rely on d3cold.allowed flag in system PM path
In S3 and above sleep states, the device can loose power regardless of
d3cold.allowed flag. Bring up I2C controller explicitly in system PM
path to ensure its normal operation after losing power.
Shuicheng Lin [Thu, 25 Sep 2025 02:31:46 +0000 (02:31 +0000)]
drm/xe/hw_engine_group: Fix double write lock release in error path
In xe_hw_engine_group_get_mode(), a write lock is acquired before
calling switch_mode(), which in turn invokes
xe_hw_engine_group_suspend_faulting_lr_jobs().
On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(),
the write lock is released there, and then again in
xe_hw_engine_group_get_mode(), leading to a double release.
Fix this by keeping both acquire and release operation in
xe_hw_engine_group_get_mode().
Fixes: 770bd1d34113 ("drm/xe/hw_engine_group: Ensure safe transition between execution modes") Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 662d98b8b373007fa1b08ba93fee11f6fd3e387c) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Matthew Auld [Fri, 19 Sep 2025 12:20:53 +0000 (13:20 +0100)]
drm/xe/uapi: loosen used tracking restriction
Currently this is hidden behind perfmon_capable() since this is
technically an info leak, given that this is a system wide metric.
However the granularity reported here is always PAGE_SIZE aligned, which
matches what the core kernel is already willing to expose to userspace
if querying how many free RAM pages there are on the system, and that
doesn't need any special privileges. In addition other drm drivers seem
happy to expose this.
The motivation here if with oneAPI where they want to use the system
wide 'used' reporting here, so not the per-client fdinfo stats. This has
also come up with some perf overlay applications wanting this
information.
Fixes: 1105ac15d2a1 ("drm/xe/uapi: restrict system wide accounting") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com
(cherry picked from commit 4d0b035fd6dae8ee48e9c928b10f14877e595356) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Mallesh Koujalagi [Thu, 2 Oct 2025 00:56:48 +0000 (06:26 +0530)]
drm/xe/xe_late_bind_fw: Initialize uval variable in xe_late_bind_fw_num_fans()
Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix
a potential use of uninitialized variable warning and ensure predictable
behavior.
The variable is passed by reference to xe_pcode_read() which should
populate it on success, but initializing it to 0 provides a safe
default value and follows kernel coding best practices.
v2:
- uval = 0 which serves as both a safe default and the fallback
value when the pcode read operation fails.
v3:
- Handle MMIO failure (Rodrigo)
- The function should probably return the error and make the uval as
pointer-argument, like the pcode_read.
- Change the caller of this function to propagate the error
upwards if mmio failed.
Thomas Hellström [Tue, 30 Sep 2025 12:27:52 +0000 (14:27 +0200)]
drm/gpusvm, drm/xe: Fix userptr to not allow device private pages
When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.
Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.
Colin Ian King [Wed, 24 Sep 2025 10:22:08 +0000 (11:22 +0100)]
drm/xe/xe_late_bind_fw: Fix missing initialization of variable offset
The variable offset is not being initialized, and it is only set inside
a for-loop if entry->name is the same as manifest_entry. In the case
where it is not initialized a non-zero check on offset is potentialy checking
a bogus uninitalized value. Fix this by initializing offset to zero.
Fixes: efa29317a553 ("drm/xe/xe_late_bind_fw: Extract and print version info") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250924102208.9216-1-colin.i.king@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 20f3b28e2e07747fd27301f0f5deb3cb569ee15c) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Thomas Hellström [Mon, 29 Sep 2025 11:26:49 +0000 (13:26 +0200)]
drm/xe/bo: Fix an idle assertion for local bos
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.
So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.
This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel: <TASK>
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel: </TASK>
Lucas De Marchi [Wed, 24 Sep 2025 15:27:11 +0000 (08:27 -0700)]
drm/xe/configfs: Improve doc for ctx_restore* attributes
Spell out the syntax instead of only using examples. Particularly
important the <engine-class> part since that's different than
engines_allowed and may confuse users. The same batch buffer is used for
all engines of a certain class.
Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only") Link: https://lore.kernel.org/r/20250924152709.659483-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 47ca7acff4011fa322853a3612f464b959e88210) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Wed, 24 Sep 2025 15:27:10 +0000 (08:27 -0700)]
drm/xe/configfs: Fix engine class parsing
If mask is NULL, only the engine class should be accepted, so the
pattern string should be completely parsed. This should fix passing e.g.
rcs0 to ctx_restore_post_bb when it's only expecting the engine class.
Reported-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Closes: https://lore.kernel.org/r/20250922155544.67712-1-jonathan.cavitt@intel.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/aNJKnrCQmL9xS9Gv@stanley.mountain Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only") Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250924152709.659483-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit dd797967160b79cc0ca2d2eb05fc55436b66dce0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Mon, 22 Sep 2025 10:12:07 +0000 (12:12 +0200)]
drm/xe/tests: Fix build break on clang 16.0.6
The following error was reported when building with clang 16.0.6:
In file included from drivers/gpu/drm/xe/xe_pci.c:1104:
>> drivers/gpu/drm/xe/tests/xe_pci.c:214:2: error: initializer \
element is not a compile-time constant
graphics_ip_xelp,
^~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/tests/xe_pci.c:221:2: error: initializer \
element is not a compile-time constant
media_ip_xem,
^~~~~~~~~~~~
2 errors generated.
Fix that by explicit re-definition of pre-GMDID IPs, as there are
not so many of them.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509192041.tQwdE4DS-lkp@intel.com/ Fixes: 5bb5258e357e ("drm/xe/tests: Add pre-GMDID IP descriptors to param generators") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250922101207.192028-1-michal.wajdeczko@intel.com
(cherry picked from commit 2de80e2da74b402a9d838b8e729cd01cf94cdcbc) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Don't mix dma fence lock with the active_job lock. Use fence_lock to
protect the dma fence used by drm scheduler when signalling a job
completion and queue_lock to protect concurrent access to active bin job
in OOM and stats collection for a given file priv. The issue was
uncovered when PREEMPT_RT on with a system freeze when opening multiple
Chromium tabs on Raspberry Pi 5.
Dave Airlie [Fri, 26 Sep 2025 03:25:42 +0000 (13:25 +1000)]
Merge tag 'drm-habanalabs-next-2025-09-25' of https://github.com/HabanaAI/drivers.accel.habanalabs.kernel into drm-next
This tag contains habanalabs driver changes for v6.18.
It continues the previous upstream work from tags/drm-habanalabs-next-2024-06-23,
Including improvements in debug and visibility, alongside general code cleanups,
and new features such as vmalloc-backed coherent mmap, HLDIO infrastructure, etc.
Mario Limonciello [Wed, 24 Sep 2025 16:16:24 +0000 (11:16 -0500)]
drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
[Why]
When DC adds common modes it adds modes with a string to match what
they are. Non-DC doesn't. This can be inconsistent when turning on/off
DC support.
[How]
Add a name member to common_modes[] and copy it into the drm display
mode.
Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Link: https://lore.kernel.org/r/20250924161624.1975819-6-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 24 Sep 2025 16:16:23 +0000 (11:16 -0500)]
drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
[Why]
DC and non-DC codepaths have different sets of common modes that are
added for eDP and LVDS cases. This can cause different behaviors for
turning on DC on hardware that can support both.
[How]
Drop extra modes from amdgpu_connector_add_common_modes() not present
in amdgpu_dm_connector_add_common_modes().
Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-5-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 15 Nov 2024 13:56:33 +0000 (08:56 -0500)]
drm/amdgpu: update MODULE_PARM_DESC for freesync_video
To better describe what it does.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 24 Sep 2025 16:16:22 +0000 (11:16 -0500)]
drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
[Why]
Adding or removing a mode from common_modes[] can be fragile if a user
forgot to update the for loop boundaries.
[How]
Use ARRAY_SIZE() to detect size of the array and use that instead.
Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Link: https://lore.kernel.org/r/20250924161624.1975819-4-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Wed, 24 Sep 2025 11:38:36 +0000 (13:38 +0200)]
drm/amd/display: Share dce100_validate_global with DCE6-8
The dce100_validate_global function was verbatim exactly the
same as dce60_validate_global and dce80_validate_global.
Share dce100_validate_global between DCE6-10 to save code size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Wed, 24 Sep 2025 11:38:35 +0000 (13:38 +0200)]
drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
DCE6-8 have very similar capabilities to DCE10, they support the
same DP and HDMI versions and work similarly.
Share dce100_validate_bandwidth between DCE6-10 to reduce code
duplication in the DC driver.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix fence signaling race condition in userqueue
This commit fixes a potential race condition in the userqueue fence
signaling mechanism by replacing dma_fence_is_signaled_locked() with
dma_fence_is_signaled().
The issue occurred because:
1. dma_fence_is_signaled_locked() should only be used when holding
the fence's individual lock, not just the fence list lock
2. Using the locked variant without the proper fence lock could lead
to double-signaling scenarios:
- Hardware completion signals the fence
- Software path also tries to signal the same fence
By using dma_fence_is_signaled() instead, we properly handle the
locking hierarchy and avoid the race condition while still maintaining
the necessary synchronization through the fence_list_lock.
v2: drop the comment (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd/amdkfd: enhance kfd process check in switch partition
current switch partition only check if kfd_processes_table is empty.
kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but
kfd_process tear down is in kfd_process_wq_release.
consider two processes:
Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node member
Process B switch partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw
-> kfd_node tear down.
Process A and B may trigger a race as shown in dmesg log.
This patch is to resolve the race by adding an atomic kfd_process counter
kfd_processes_count, it increment as create kfd process, decrement as
finish kfd_process_wq_release.
v2: Put kfd_processes_count per kfd_dev, move decrement to kfd_process_destroy_pdds
and bug fix. (Philip Yang)
amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
There is race in amdgpu_amdkfd_device_fini_sw and interrupt.
if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and
kfree(kfd), and KGD interrupt generated.
kfd kfd: amdgpu: Total number of KFD nodes to be created: 4
CPU: 115 PID: @ Comm: swapper/115 Kdump: loaded Tainted: G S W OE K
RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40
Code: 89 e@ 41 5c c3 cc cc cc cc 66 66 2e Of 1f 84 00 00 00 00 00 OF 1f 40 00 Of 1f 44% 00 00 41 54 9c 41 5c fa 31 cO ba 01 00 00 00 <fO> OF b1 17 75 Ba 4c 89 e@ 41 Sc
Timur Kristóf [Wed, 24 Sep 2025 11:38:34 +0000 (13:38 +0200)]
drm/amd/display: Reject modes with too high pixel clock on DCE6-10
Reject modes with a pixel clock higher than the maximum display
clock. Use 400 MHz as a fallback value when the maximum display
clock is not known. Pixel clocks that are higher than the display
clock just won't work and are not supported.
With the addition of the YUV422 fallback, DC can now accidentally
select a mode requiring higher pixel clock than actually supported
when the DP version supports the required bandwidth but the clock
is otherwise too high for the display engine. DCE 6-10 don't
support these modes but they don't have a bandwidth calculation
to reject them properly.
Fixes: db291ed1732e ("drm/amd/display: Add fallback path for YCBCR422") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 24 Sep 2025 16:16:21 +0000 (11:16 -0500)]
drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
[Why]
amdgpu_connector_add_common_modes() has a check for the width and height
of common modes being too small, but the array of common_modes[] has fixed
values. The check is dead code.
[How]
Drop unnecessary check.
Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 24 Sep 2025 16:16:20 +0000 (11:16 -0500)]
drm/amd/display: Only enable common modes for eDP and LVDS
[Why]
The main reason common modes are added is for compatibility with
clone mode when a laptop is connected to a projector or external
monitor. Since commit 978fa2f6d0b12 ("drm/amd/display: Use scaling
for non-native resolutions on eDP") when non-native modes are picked
for eDP the GPU scalar will be used. This is because it is inconsistent
whether eDP panels have the capability to actually drive non-native
resolutions. With panels connected to other connectors this limitation
generally doesn't exist as we the EDID will advertise support for a
number of resolutions and monitors will use built in scaling hardware.
Comparing DC and non-DC code paths the non-DC code path only adds
common modes for LVDS and eDP whereas the DC codepath does it for
all connector types.
In the past there was an experiment done to disable common mode adding
for eDP and LVDS from commit 6d396e7ac1ce3 ("drm/amd/display: Disable
common modes for LVDS") and commit 7948afb46af92 ("drm/amd/display:
Disable common modes for eDP") but this was reverted in
commit a8b79b09185de ("drm/amd: Re-enable common modes for eDP and
LVDS") because it caused problems with Xorg.
[How]
Only add common modes for eDP and LVDS for DC, matching the behavior
of non-DC.
Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initially we used VMID reservation to enforce isolation between
processes. That has now been replaced by proper fence handling.
Both OpenGL, RADV and ROCm developers requested a way to reserve a VMID
for SPM, so restore that approach by reverting back to only allowing a
single process to use the reserved VMID.
Only compile tested for now.
v2: use -ENOENT instead of -EINVAL if VMID is not available
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 16 Sep 2025 14:07:35 +0000 (16:07 +0200)]
drm/amdgpu: remove leftover from enforcing isolation by VMID
Initially we enforced isolation by reserving a VMID, but that practice
was now removed.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
Add a fallback mechanism to attempt pipe reset when KCQ reset
fails to recover the ring. After performing the KCQ reset and
queue remapping, test the ring functionality. If the ring test
fails, initiate a pipe reset as an additional recovery step.
v2: fix the typo (Lijo)
v3: try pipeline reset when kiq mapping fails (Lijo)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pavan S [Wed, 2 Oct 2024 07:46:40 +0000 (10:46 +0300)]
accel/habanalabs: add Infineon version check
On HL338 ASICs, the Infineon first‑stage firmware is not present and
the reported version is 0. In this case printing a version number is
misleading, as it suggests valid firmware when it does not exist.
Fix this by printing the first‑stage Infineon firmware version only
if the reported value is non‑zero. This avoids confusing or incorrect
log messages on devices where the first stage is not applicable.
Konstantin Sinyuk [Tue, 1 Oct 2024 12:52:27 +0000 (15:52 +0300)]
accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
Dirty state can occur when the host VM undergoes a reset while the
device does not. In such a case, the driver must reset the device before
it can be used again. As part of this reset, the device capabilities
are zeroed. Therefore, the driver must read the Preboot status again to
learn the Preboot state, capabilities, and security configuration.
Konstantin Sinyuk [Mon, 16 Sep 2024 12:15:32 +0000 (15:15 +0300)]
accel/habanalabs: add debugfs interface for HLDIO testing
Add debugfs files for NVMe Direct I/O (HLDIO) functionality.
This interface allows userspace access to direct SSD ↔ device transfers
through debugfs nodes.
Four debugfs files are created under /sys/kernel/debug/habanalabs/hlN/:
- dio_ssd2hl : trigger SSD-to-device transfers
- dio_hl2ssd : trigger device-to-SSD transfers
(placeholder, not yet implemented)
- dio_stats : show transfer statistics
- dio_reset : reset statistics counters
Konstantin Sinyuk [Mon, 9 Sep 2024 15:21:22 +0000 (18:21 +0300)]
accel/habanalabs: add NVMe Direct I/O (HLDIO) infrastructure
Introduce NVMe Direct I/O (HLDIO) infrastructure to support
peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers
and data structures to enable direct transfers between NVMe storage
and device memory.
The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs
interface is also provided for functional validation.
accel/habanalabs: support mapping cb with vmalloc-backed coherent memory
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return
addresses from the vmalloc range. If such an address is mapped without
VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the
VM_PFNMAP restriction.
Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP
in the VMA before mapping. This ensures safe mapping and avoids kernel
crashes. The memory is still driver-allocated and cannot be accessed
directly by userspace.
Ilia Levi [Mon, 19 Aug 2024 09:13:07 +0000 (12:13 +0300)]
accel/habanalabs: remove old interface variation of 'access_ok()'
The access_ok() API no longer requires the VERIFY_WRITE argument,
and the use of the old interface with VERIFY_WRITE is deprecated.
Clean up the habanalabs memory manager to use the modern access_ok()
interface consistently. This removes old #ifdef guards and aligns the
driver with current upstream kernel APIs.
Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
accel/habanalabs: clarify ctx use after hl_ctx_put() in dmabuf release
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put()
to obtain the compute device file.
This is safe because the dma-buf object holds a file reference taken in
export_dmabuf(), and the file release (which drops another ctx reference)
can only happen after we drop that file reference via fput(). Thus, this
hl_ctx_put() call cannot be the last one at this point.
accel/habanalabs/gaudi2: add support for logging register accesses from debugfs
Add infrastructure for logging the last configuration register accesses
that occur via debugfs read/write operations. At interrupt time, these
log entries can be dumped to dmesg, which helps in diagnosing the cause
of RAZWI and ADDR_DEC interrupts.
The logging is implemented as a ring buffer of access entries, with each
entry recording timestamp and access details. To ensure correctness
under concurrent access, operations are now protected using spinlocks.
Entries are copied under lock and then printed after releasing it, which
minimizes time spent in the critical section.
Change the BMON_CR register value back to its original state before
enabling, so that BMON does not continue to collect information
after being disabled.
Tomer Tayar [Sun, 26 May 2024 13:32:32 +0000 (16:32 +0300)]
accel/habanalabs: return ENOMEM if less than requested pages were pinned
EFAULT is currently returned if less than requested user pages are
pinned. This value means a "bad address" which might be confusing to
the user, as the address of the given user memory is not necessarily
"bad".
Modify the return value to ENOMEM, as "out of memory" is more suitable
in this case.
drm/amd/pm: Add VCN reset message support for SMU v13.0.12
This commit adds support for VCN reset functionality in SMU v13.0.12 by:
1. Adding two new PPSMC messages in smu_v13_0_12_ppsmc.h:
- PPSMC_MSG_ResetVCN (0x5E)
- Updates PPSMC_Message_Count to 0x5F to account for new messages
2. Adding message mapping for ResetVCN in smu_v13_0_12_ppt.c:
- Maps SMU_MSG_ResetVCN to PPSMC_MSG_ResetVCN
These changes enable proper VCN reset handling through the SMU firmware
interface for compatible AMD GPUs.
v2: Added fw version check to support vcn queue reset.
Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Fri, 15 Aug 2025 15:44:11 +0000 (23:44 +0800)]
drm/amdgpu: Move VCN reset mask setup to late_init for VCN 5.0.1
This patch moves the initialization of the VCN supported_reset mask from
sw_init to a new late_init function for VCN 5.0.1. The change ensures
that all necessary hardware and firmware initialization is complete
before determining the supported reset types.
Key changes:
- Added vcn_v5_0_1_late_init() function to handle late initialization
- Moved supported_reset mask setup from sw_init to late_init
- Added check for per-queue reset support via amdgpu_dpm_reset_vcn_is_supported()
- Updated ip_funcs to use the new late_init function
This change helps ensure proper reset behavior by waiting until all
dependencies are initialized before determining available reset types.
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Wed, 6 Aug 2025 08:20:28 +0000 (16:20 +0800)]
drm/amdgpu: Add ring reset support for VCN v5.0.1
Implement the ring reset callback for VCN v5.0.1 to properly handle
hardware recovery when encountering GPU hangs. The new functionality:
1. Adds vcn_v5_0_1_ring_reset() function that:
- Prepares for reset using amdgpu_ring_reset_helper_begin()
- Performs VCN instance reset via amdgpu_dpm_reset_vcn()
- Re-initializes hardware through vcn_v5_0_1_hw_init_inst()
- Restarts DPG mode with vcn_v5_0_1_start_dpg_mode()
- Completes reset with amdgpu_ring_reset_helper_end()
2. Hooks the reset function into the unified ring functions via:
- Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs
3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status
This provides proper hardware recovery capabilities for VCN 5.0.1 IP block
during fault conditions, matching functionality available in other VCN versions.
v2: Remove the RRMT_ENABLED cap setting in the reset function
and replace adev->vcn.inst[ring->me].indirect_sram with vinst->indirect_sram (Lijo)
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Wed, 6 Aug 2025 08:03:13 +0000 (16:03 +0800)]
drm/amdgpu: Refactor VCN v5.0.1 HW init into separate instance function
Split the per-instance initialization code from vcn_v5_0_1_hw_init()
into a new vcn_v5_0_1_hw_init_inst() function. This improves code
organization by:
1. Separating the instance-specific initialization logic
2. Making the main init function more readable
3. Following the pattern used in queue reset
The SR-IOV specific initialization remains in the main function since
it has different requirements.
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove this flag as the driver stopped managing it individually since
commit a4056c2a6344 ("drm/amd/display: use HW hdr mult for brightness
boost"). After some back and forth it was reintroduced as a condition to
`set_output_transfer_func()` in [1]. Without direct management, this
flag only changes value when all surface update flags are set true on
UPDATE_TYPE_FULL with no output TF status meaning.
Fixes: bb622e0c0044 ("drm/amd/display: program output tf when required") [1] Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Optimize remove_duplicates() from O(N^2) to O(N)
Replace the previous O(N^2) implementation of remove_duplicates() with
a O(N) version using a fast/slow pointer approach. The new version
keeps only the first occurrence of each element and compacts the array
in place, improving efficiency without changing functionality.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: change dc stream color settings only in atomic commit
Don't update DC stream color components during atomic check. The driver
will continue validating the new CRTC color state but will not change DC
stream color components. The DC stream color state will only be
programmed at commit time in the `atomic_setup_commit` stage.
It fixes gamma LUT loss reported by KDE users when changing brightness
quickly or changing Display settings (such as overscan) with nightlight
on and HDR. As KWin can do a test commit with color settings different
from those that should be applied in a non-test-only commit, if the
driver changes DC stream color state in atomic check, this state can be
eventually HW programmed in commit tail, instead of the respective state
set by the non-blocking commit.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4444 Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Use kmalloc_array() instead of kmalloc()
Documentation/process/deprecated.rst recommends against the use of
kmalloc with dynamic size calculations due to the risk of overflow and
smaller allocation being made than the caller was expecting.
Replace kmalloc() with kmalloc_array() in amdgpu_amdkfd_gfx_v10.c,
amdgpu_amdkfd_gfx_v10_3.c, amdgpu_amdkfd_gfx_v11.c and
amdgpu_amdkfd_gfx_v12.c to make the intended allocation size clearer
and avoid potential overflow issues.
Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rahul Kumar <rk0006818@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: update color on atomic commit time
Use `atomic_commit_setup` to change the DC stream state. It's a
preparation to remove from `atomic_check` changes in CRTC color
components of DC stream state and prevent DC to commit TEST_ONLY
changes.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4444 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Li [Fri, 12 Sep 2025 15:01:50 +0000 (11:01 -0400)]
drm/amd/display: Init DCN35 clocks from pre-os HW values
[Why]
We did not initialize dc clocks with boot-time hw values during init.
This lead to incorrect clock values in dc, causing `dcn35_update_clocks`
to make incorrect updates.
[How]
Correctly initialize DC with pre-os clk values from HW.
s/dump/save/ as that accurately reflects the purpose of the functions.
Fixes: 8774029f76b9 ("drm/amd/display: Add DCN35 CLK_MGR") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 11 Sep 2025 23:20:45 +0000 (19:20 -0400)]
drm/amd/display: Correct sw cache timing to ensure dispclk ramping
[why]
Current driver will cache the dispclk right after send cmd to pmfw,
but actual clock not reached yet.
Change to only cache the dispclk setting after HW reached to the real clock.
Also give some range as it might be in bypass clock setting.
Reviewed-by: Yihan Zhu <yihan.zhu@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY&HOW]
Add new tracing and performance measurements for SMU messaging.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Fri, 22 Aug 2025 17:23:18 +0000 (13:23 -0400)]
drm/amd/display: Isolate dcn401 SMU functions
[WHY&HOW]
SMU interfaces are not backwards and forwards compatible, so they should
be isolated per version.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allen Li [Fri, 5 Sep 2025 08:58:38 +0000 (16:58 +0800)]
drm/amd/display: Add fast sync field in ultra sleep more for DMUB
[Why&How]
We need to inform DMUB whether fast sync in ultra sleep mode is supported,
so that it can disable desync error detection when the it is not enabled.
This helps prevent unexpected desync errors when transitioning out of
ultra sleep mode.
Add fast sync in ultra sleep mode field in replay copy setting command.
Reviewed-by: Robin Chen <robin.chen@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Allen Li <wei-guang.li@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alvin Lee [Tue, 9 Sep 2025 20:03:08 +0000 (16:03 -0400)]
drm/amd/display: Use mpc.preblend flag to indicate preblend
[Description]
Modifications in per asic capability means mpc.preblend flag should be used
to indicate preblend. Update relevant paths to use this flag.
Fixes: 39923050615c ("drm/amd/display: Clear DPP 3DLUT Cap") Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix for test crash due to power gating
[Why/How]
Call power gating routine only if it is defined.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Sridevi Arvindekar <sarvinde@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lo-an Chen [Mon, 25 Aug 2025 10:16:24 +0000 (18:16 +0800)]
drm/amd/display: Init dispclk from bootup clock for DCN314
[Why]
Driver does not pick up and save vbios's clocks during init clocks,
the dispclk in clk_mgr will keep 0 until the first update clocks.
In some cases, OS changes the timing in the second set mode
(lower the pixel clock), causing the driver to lower the dispclk
in prepare bandwidth, which is illegal and causes grey screen.
[How]
1. Dump and save the vbios's clocks, and init the dispclk in
dcn314_init_clocks.
2. Fix the condition in dcn314_update_clocks, regarding a 0kHz value.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 4 Sep 2025 18:49:35 +0000 (13:49 -0500)]
drm/amd/display: Handle interpolation for first data point
[Why]
If the first data point for a custom brightness curve is not 0% luminance
then the first few luminance values will be ignored.
[How]
Check signal is below first data point and if so do linear interpolation to
0 instead.
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>