Vitaly Prosyak [Mon, 11 Nov 2024 22:24:08 +0000 (17:24 -0500)]
drm/amdgpu: fix usage slab after free
[ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147
[ +0.000014] The buggy address belongs to the object at ffff8881b8605f80
which belongs to the cache kmalloc-64 of size 64
[ +0.000020] The buggy address is located 8 bytes inside of
freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0)
[ +0.000012] Memory state around the buggy address:
[ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ +0.000013] ^
[ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb
[ +0.000013] ==================================================================
The issue reproduced on VG20 during the IGT pci_unplug test.
The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill.
In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by
each entity within the run queue, leading to invalid memory access.
To resolve this, the order of cleanup calls is updated:
This updated order ensures that all entities in the IPs are cleaned up first, followed by proper
cleanup of the schedulers.
Additional Investigation:
During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo
buffer must be freed only as the final step in the cleanup process to prevent any premature
access during earlier cleanup stages.
v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini.
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Xiang Liu [Fri, 15 Nov 2024 08:59:30 +0000 (16:59 +0800)]
drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3
It is not necessarily corrupted. When there is RAS fatal error, device
memory access is blocked. Hence vcpu bo cannot be saved to system memory
as in a regular suspend sequence before going for reset. In other full
device reset cases, that gets saved and restored during resume.
v2: Remove redundant code like vcn_v4_0 did
v2: Refine commit message
v3: Drop the volatile
v3: Refine commit message
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Thu, 7 Nov 2024 14:53:47 +0000 (22:53 +0800)]
drm/amdgpu: Add sysfs interface for vcn reset mask
Add the sysfs interface for vcn:
vcn_reset_mask
The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
v4: s/sdma/vcn/ in the reset mask setup
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Brost [Wed, 13 Nov 2024 17:17:51 +0000 (09:17 -0800)]
drm/xe: Mark preempt fence workqueue as reclaim
Preempt fences are in the path of reclaim, and we signal these fences in
the preempt workqueue. With that, we need to mark the preempt fence
workqueue with reclaim so that this workqueue can make forward progress
during reclaim.
Nirmoy Das [Thu, 14 Nov 2024 15:05:37 +0000 (16:05 +0100)]
drm/xe/ufence: Wake up waiters after setting ufence->signalled
If a previous ufence is not signalled, vm_bind will return -EBUSY.
Delaying the modification of ufence->signalled can cause issues if the
UMD reuses the same ufence so update ufence->signalled before waking up
waiters.
Huacai Chen [Fri, 15 Nov 2024 15:02:25 +0000 (23:02 +0800)]
drm/amd/display: Allow building DC with clang on LoongArch
Clang on LoongArch (18+) appears to be unaffected by the bug causing
excessive stack usage in calculate_bandwidth(). But when building DC_FP
support the stack frame size can be as large as 2816 bytes, which causes
the FRAME_WARN build warnings. So on LoongArch we allow building DC with
clang, but disable DC_FP by default.
The help message is also updated.
Tested-by: Rui Wang <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zicheng Qu [Tue, 5 Nov 2024 14:01:37 +0000 (14:01 +0000)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dpp
This commit addresses a null pointer dereference issue in
hwss_setup_dpp(). The issue could occur when pipe_ctx->plane_state is
null. The fix adds a check to ensure `pipe_ctx->plane_state` is not null
before accessing. This prevents a null pointer dereference.
Fixes: 0baae6246307 ("drm/amd/display: Refactor fast update to use new HWSS build sequence") Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zicheng Qu [Tue, 5 Nov 2024 14:01:36 +0000 (14:01 +0000)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe
This commit addresses a null pointer dereference issue in
dcn20_program_pipe(). Previously, commit 8e4ed3cf1642 ("drm/amd/display:
Add null check for pipe_ctx->plane_state in dcn20_program_pipe")
partially fixed the null pointer dereference issue. However, in
dcn20_update_dchubp_dpp(), the variable pipe_ctx is passed in, and
plane_state is accessed again through pipe_ctx. Multiple if statements
directly call attributes of plane_state, leading to potential null
pointer dereference issues. This patch adds necessary null checks to
ensure stability.
Fixes: 8e4ed3cf1642 ("drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe") Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Steven 'Steve' Kendall [Fri, 15 Nov 2024 21:17:58 +0000 (21:17 +0000)]
drm/radeon: Fix spurious unplug event on radeon HDMI
On several HP models (tested on HP 3125 and HP Probook 455 G2),
spurious unplug events are emitted upon login on Chrome OS.
This is likely due to the way Chrome OS restarts graphics
upon login, so it's possible it's an issue on other
distributions but not as common, though I haven't
reproduced the issue elsewhere.
Use logic from an earlier version of the merged change
(see link below) which iterates over connectors and finds
matching encoders, rather than the other way around.
Also fixes an issue with screen mirroring on Chrome OS.
I've deployed this patch on Fedora and did not observe
any regression on these devices.
Christophe JAILLET [Fri, 15 Nov 2024 17:26:06 +0000 (18:26 +0100)]
drm/radeon: Constify struct pci_device_id
'struct pci_device_id' is not modified in this driver.
Constifying this structure moves some data to a read-only section, so
increase overall security.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
11984 28672 44 40700 9efc drivers/gpu/drm/radeon/radeon_drv.o
After:
=====
text data bss dec hex filename
40000 664 44 40708 9f04 drivers/gpu/drm/radeon/radeon_drv.o
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 15 Nov 2024 06:05:50 +0000 (11:35 +0530)]
drm/amdgpu: Use reset recovery state checks
Some in_reset checks are infact checking whether the state is
reinitialization after reset. Replace with reset_in_recovery calls to
identify that it's really checking for recovery stage after reset.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 15 Nov 2024 05:38:02 +0000 (11:08 +0530)]
drm/amdgpu: Add init level for post reset reinit
When device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.
Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Zhao [Thu, 14 Nov 2024 09:45:34 +0000 (17:45 +0800)]
drm/amdkfd: make sure ring buffer is flushed before update wptr
In a consecutive packet submission, for example unmap and query status,
when CP is reading wptr caused by unmap packet doorbell ring, if in some
case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated
to next packet (query status), but the query status packet content has
not been flushed to memory yet, it will cause CP fetched stalled data.
Adding mb to ensure ring buffer has been updated before updating wptr.
Also adding a mb to ensure wptr updated before doorbell ring.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Mon, 11 Nov 2024 01:09:27 +0000 (20:09 -0500)]
drm/amd/display: 3.2.310
This version brings along the following:
- DC core fixes
- DCN35 fix
- DCN4+ fixes
- DML2 fix
- New SPL features
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ovidiu Bunea [Wed, 6 Nov 2024 21:25:18 +0000 (16:25 -0500)]
drm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dto
There are cases where an OTG is remapped from driving a regular HDMI
display to a DP/eDP display. There are also cases where DTBCLK needs to
be enabled for HPO, but DTBCLK DTO programming may be done while OTG is
still enabled which is dangerous as the PIPE_DTO_SRC_SEL programming may
change the pixel clock generator source for a mapped and running OTG and
cause it to hang.
Remove the PIPE_DTO_SRC_SEL programming from this sequence since it is
already done in program_pixel_clk(). Additionally, make sure that
program_pixel_clk sets DTBCLK DTO as source for special HDMI cases.
Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Fri, 8 Nov 2024 00:05:03 +0000 (19:05 -0500)]
drm/amd/display: allow chroma 1:1 scaling when sharpness is off
[Why]
SPL code forces taps to 1 when ratio is 1:1 and sharpness is off
But for chroma 1:1, need taps > 1 to handle cositing
[How]
Do not force chroma taps to 1 when ratio is 1:1 for YUV420
Remove 420_CHROMA_BYPASS mode for scaler
Reviewed-by: Navid Assadian <navid.assadian@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Austin Zheng [Tue, 5 Nov 2024 15:22:02 +0000 (10:22 -0500)]
drm/amd/display: Populate Power Profile In Case of Early Return
Early return possible if context has no clk_mgr.
This will lead to an invalid power profile being returned
which looks identical to a profile with the lowest power level.
Add back logic that populated the power profile and overwrite
the value if needed.
Cc: stable@vger.kernel.org Fixes: d016d0dd5a57 ("drm/amd/display: Update Interface to Check UCLK DPM") Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 5 Nov 2024 15:15:19 +0000 (10:15 -0500)]
drm/amd/display: add public taps API in SPL
[Why]
Add public API to obtain number of taps in SPL.
[How]
Isolate function to calculate recout, ratios and viewport before
calculating taps. Call function in both public taps API call and private
scaling call.
Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Joshua Aberback [Mon, 28 Oct 2024 21:12:22 +0000 (17:12 -0400)]
drm/amd/display: Fix handling of plane refcount
[Why]
The mechanism to backup and restore plane states doesn't maintain
refcount, which can cause issues if the refcount of the plane changes
in between backup and restore operations, such as memory leaks if the
refcount was supposed to go down, or double frees / invalid memory
accesses if the refcount was supposed to go up.
[How]
Cache and re-apply current refcount when restoring plane states.
Cc: stable@vger.kernel.org Reviewed-by: Josip Pavic <josip.pavic@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chris Park [Mon, 4 Nov 2024 18:18:39 +0000 (13:18 -0500)]
drm/amd/display: Ignore scalar validation failure if pipe is phantom
[Why]
There are some pipe scaler validation failure when the pipe is phantom
and causes crash in DML validation. Since, scalar parameters are not
as important in phantom pipe and we require this plane to do successful
MCLK switches, the failure condition can be ignored.
[How]
Ignore scalar validation failure if the pipe validation is marked as
phantom pipe.
Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Chris Park <chris.park@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yihan Zhu [Wed, 30 Oct 2024 20:20:21 +0000 (16:20 -0400)]
drm/amd/display: update pipe selection policy to check head pipe
[Why]
No check on head pipe during the dml to dc hw mapping will allow illegal
pipe usage. This will result in a wrong pipe topology to cause mpcc tree
totally mess up then cause a display hang.
[How]
Avoid to use the pipe is head in all check and avoid ODM slice during
preferred pipe check.
Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bhavin Sharma [Thu, 14 Nov 2024 15:11:12 +0000 (20:41 +0530)]
drm/amd/pm: remove redundant tools_size check
The check for tools_size being non-zero is redundant as tools_size is
explicitly set to a non-zero value (0x19000). Removing the if condition
simplifies the code without altering functionality.
Signed-off-by: Bhavin Sharma <bhavin.sharma@siliconsignals.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huacai Chen [Wed, 13 Nov 2024 12:51:58 +0000 (20:51 +0800)]
drm/radeon: Use ttm_bo_move_null() in radeon_bo_move()
Since ttm_bo_move_null() is exactly the same as ttm_resource_free() +
ttm_bo_assign_mem(), we use ttm_bo_move_null() for the GTT --> SYSTEM
move case too. Then the code is more consistent as the SYSTEM --> GTT
move case.
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 11 Nov 2024 13:16:01 +0000 (21:16 +0800)]
drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for
SMU_v_13_0_6
v2: Get link status register value for each soc from separate
function (Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 11 Nov 2024 12:11:48 +0000 (20:11 +0800)]
drm/amd/pm: Add gpu_metrics_v1_7
Add new gpu_metrics_v1_7 to acquire xgmi link status,
application counter and max vram bandwidth
v2: Use gpu_metrics_v1_7 for SMU_v_13_0_6 (Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 13 Nov 2024 09:17:23 +0000 (17:17 +0800)]
drm/amd/pm: Update data types used for uapi i/f
Update member's data type in amdgpu_xcp_metrics from linux specific
to the ones compatible to uapi interface
Fixes: 4c07ff7d07f7 ("drm/amd/pm: Add gpu_metrics_v1_6") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suraj Kandpal [Mon, 4 Nov 2024 03:59:51 +0000 (09:29 +0530)]
drm/i915/hdcp: Fix when the first read and write are retried
Make sure that the first read/write in hdcp2_authentication_key_exchange
are only retried when we have either DP/DPMST encoder connected,
since we do this to give docks and dp encoders some extra time to
get their HDCP DPCD registers ready only for DP/DPMST encoders as
this issue is only observed here no need to burden other encoders
with extra retries as this causes the HDMI connector to have some
other timing issue and fails HDCP authentication.
--v2
-Add intent of patch [Chaitanya]
-Add reasoning for loop [Jani]
-Make sure we forfiet the 50ms wait for non DP/DPMST encoders.
--v3
-Remove the is_dp_encoder check [Jani/Chaitanya]
-Make the commit message more clearer [Jani]
Everest K.C. [Wed, 23 Oct 2024 23:33:55 +0000 (17:33 -0600)]
drm/xe/guc: Fix dereference before NULL check
The pointer list->list is dereferenced before the NULL check.
Fix this by moving the NULL check outside the for loop, so that
the check is performed before the dereferencing.
The list->list pointer cannot be NULL so this has no effect on runtime.
It's just a correctness issue.
This issue was reported by Coverity Scan.
https://scan7.scan.coverity.com/#/project-view/51525/11354?selectedIssue=1600335
Christian König [Tue, 4 Jun 2024 16:05:00 +0000 (18:05 +0200)]
drm/amdgpu: enable GTT fallback handling for dGPUs only
That is just a waste of time on APUs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704 Fixes: 216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Wed, 23 Oct 2024 15:12:00 +0000 (11:12 -0400)]
drm/amd/amdgpu: limit single process inside MES
This is for MES to limit only one process for the user queues
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Mon, 4 Nov 2024 06:14:09 +0000 (14:14 +0800)]
drm/amdgpu: Support vcn and jpeg error info parsing
Add vcn and jpeg error count parsing.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Fri, 18 Oct 2024 19:56:25 +0000 (15:56 -0400)]
drm/amd : Update MES API header file for v11 & v12
New features require the new fields defines
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Fri, 4 Oct 2024 16:00:36 +0000 (12:00 -0400)]
drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
Add back kfd queues in start scheduling that originally been
removed on stop scheduling.
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiaogang Chen [Mon, 28 Oct 2024 20:06:19 +0000 (15:06 -0500)]
drm/amdkfd: change kfd process kref count at creation
kfd process kref count(process->ref) is initialized to 1 by kref_init. After
it is created not need to increase its kref. Instad add kfd process kref at kfd
process mmu notifier allocation since we already decrease the kref at
free_notifier of mmu_notifier_ops, so pair them.
When user process opens kfd node multiple times the kfd process kref is
increased each time to balance with kfd node close operation.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Advait Dhamorikar [Tue, 8 Oct 2024 19:16:23 +0000 (00:46 +0530)]
drm/amdgpu: Cleanup shift coding style
Improves the coding style by updating bit-shift
operations in the amdgpu_jpeg.c driver file.
It ensures consistency and avoids potential issues
by explicitly using 1U and 1ULL for unsigned
and unsigned long long shifts in all relevant instances.
Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Wed, 30 Oct 2024 14:18:00 +0000 (10:18 -0400)]
drm/amdgpu: Implement virt req_ras_err_count
Enable RAS late init if VF RAS Telemetry is supported.
When enabled, the VF can use this interface to query total
RAS error counts from the host.
The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.
The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.
v2: Flip generate report & update statistics order for VF
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Wed, 30 Oct 2024 13:58:56 +0000 (09:58 -0400)]
drm/amdgpu: VF Query RAS Caps from Host if supported
If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Wed, 30 Oct 2024 13:45:27 +0000 (09:45 -0400)]
drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
Add message handlers for RAS telemetry.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Wed, 30 Oct 2024 13:33:51 +0000 (09:33 -0400)]
drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Tue, 5 Nov 2024 15:40:23 +0000 (08:40 -0700)]
drm/amd/display: Adjust VSDB parser for replay feature
At some point, the IEEE ID identification for the replay check in the
AMD EDID was added. However, this check causes the following
out-of-bounds issues when using KASAN:
[ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu]
[ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383
...
[ 27.821207] Memory state around the buggy address:
[ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 27.821243] ^
[ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 27.821268] ==================================================================
This is caused because the ID extraction happens outside of the range of
the edid lenght. This commit addresses this issue by considering the
amd_vsdb_block size.
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Tue, 5 Nov 2024 15:50:02 +0000 (08:50 -0700)]
drm/amd/display: Remove unused code
This commit removes a legacy debug_defaults_diags struct.
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Fri, 1 Nov 2024 16:00:14 +0000 (12:00 -0400)]
drm/amd/display: Require minimum VBlank size for stutter optimization
If the nominal VBlank is too small, optimizing for stutter can cause
the prefetch bandwidth to increase drasticaly, resulting in higher
clock and power requirements. Only optimize if it is >3x the stutter
latency.
Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Mon, 11 Nov 2024 02:10:48 +0000 (12:10 +1000)]
Merge tag 'drm-misc-next-2024-11-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.13:
UAPI Changes:
- Add 1X7X5 media-bus formats.
Cross-subsystem Changes:
- Maintainer updates for VKMS and IT6263.
- Add media-bus-fmt for MEDIA_BUS_FMT_RGB101010_1X7X5_*.
- Add IT6263 DT bindings and driver.
Core Changes:
- Add ABGR210101010 support to panic handler.
- Use ATOMIC64_INIT in drm_file.c
- Improve scheduler teardown documentation.
Driver Changes:
- Make mediatek compile on ARM again.
- Add missing drm/drm_bridge.h header include, already in drm-next.
- Small fixes and cleanups to vkms, bridge/it6505, panfrost, panthor.
- Add panic support to nouveau for nv50+.
Ryan Seto [Fri, 1 Nov 2024 14:19:56 +0000 (10:19 -0400)]
drm/amd/display: Handle dml allocation failure to avoid crash
[Why]
In the case where a dml allocation fails for any reason, the
current state's dml contexts would no longer be valid. Then
subsequent calls dc_state_copy_internal would shallow copy
invalid memory and if the new state was released, a double
free would occur.
[How]
Reset dml pointers in new_state to NULL and avoid invalid
pointer
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Austin Zheng [Thu, 31 Oct 2024 18:10:39 +0000 (14:10 -0400)]
drm/amd/display: Update SPL Taps Required For Integer Scaling
Number of taps is incorrectly being set when integer scaling is enabled.
Taps required when src_rect != dst_rect previously not considered.
Perform the calculations when integer scaling is enabled.
Set taps to 1 if the scaling ratio is 1:1.
Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Emily Nie [Wed, 16 Oct 2024 19:52:28 +0000 (15:52 -0400)]
drm/amd/display: disabling p-state checks for DCN31 and DCN314
[Why]
IGT displays Dmesg warnings which are likely false
[How]
Disabling p-state checks leading to this warning for DCN31 and DCN314
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Emily Nie <Emily.Nie@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fudongwang [Wed, 30 Oct 2024 08:45:56 +0000 (16:45 +0800)]
drm/amd/display: always blank stream before disable crtc
Garbage will show due to dig is on. So blank stream needed.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Fudongwang <Fudong.Wang@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Wed, 30 Oct 2024 18:06:18 +0000 (18:06 +0000)]
drm/amd/display: Read DP tunneling support only for DPIA endpoints
Unconditionally reading DP tunneling support results in extraneous
errors messages on certain devices. Fix this by guarding the DPCD read
for DP tunneling support for USB4 DPIA endpoints.
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Tue, 29 Oct 2024 09:28:23 +0000 (17:28 +0800)]
drm/amd/display: Fix Panel Replay not update screen correctly
[Why]
In certain use case such as KDE login screen, there will be no atomic
commit while do the frame update.
If the Panel Replay enabled, it will cause the screen not updated and
looks like system hang.
[How]
Delay few atomic commits before enabled the Panel Replay just like PSR.
Fixes: be64336307a6c ("drm/amd/display: Re-enable panel replay feature") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3686 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3682 Tested-By: Corey Hickey <bugfood-c@fatooh.org> Tested-By: James Courtier-Dutton <james.dutton@gmail.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Tue, 29 Oct 2024 07:38:16 +0000 (15:38 +0800)]
drm/amd/display: Change some variable name of psr
Panel Replay feature may also use the same variable with PSR.
Change the variable name and make it not specify for PSR.
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
HPD error status does not cover Replay desync error status
while executing autotests and CTS tests.
[How]
Refactor the checking flow, reporting the HPD error based on
different eDP feature.
Reviewed-by: Robin Chen <robin.chen@amd.com> Signed-off-by: Leon Huang <Leon.Huang1@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 6 Nov 2024 07:14:45 +0000 (12:44 +0530)]
drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs
Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data
isolation between GPU workloads. The cleaner shader is responsible for
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps
prevent data leakage and ensures accurate computation results.
This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 6 Nov 2024 13:55:50 +0000 (19:25 +0530)]
drm/amdgpu: Add documentation for enforce isolation feature
This feature enables process isolation on the graphics engine by
serializing access to it and adding a cleaner shader which clears LDS
(Local Data Store) and GPRs (General Purpose Registers) between jobs.
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yuan Can [Wed, 6 Nov 2024 01:35:41 +0000 (09:35 +0800)]
drm/amdkfd: Fix wrong usage of INIT_WORK()
In kfd_procfs_show(), the sdma_activity_work_handler is a local variable
and the sdma_activity_work_handler.sdma_activity_work should initialize
with INIT_WORK_ONSTACK() instead of INIT_WORK().
Fixes: 32cb59f31362 ("drm/amdkfd: Track SDMA utilization per process") Signed-off-by: Yuan Can <yuancan@huawei.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 31 Oct 2024 09:04:17 +0000 (10:04 +0100)]
drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
The coherency flags can only be determined when the BO is locked and that
in turn is only guaranteed when the mapping is validated.
Fix the check, move the resource check into the function and add an assert
that the BO is locked.
Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tim Huang [Mon, 28 Oct 2024 05:51:50 +0000 (13:51 +0800)]
drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0
Currently, the pp_dpm_mclk values are reported in descending order
on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency
with other clock interfaces.
Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ramesh Errabolu [Wed, 6 Nov 2024 00:50:02 +0000 (18:50 -0600)]
drm/amdgpu: Inform if PCIe based P2P links are not available
Raise an info message in kernel log if PCIe root complex
determines that a AMD GPU device D<i> cannot have P2P
communication with another AMD GPU device D<j>
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Tue, 29 Oct 2024 06:36:35 +0000 (14:36 +0800)]
drm/amdgpu: Add sysfs interface for jpeg reset mask
Add the sysfs interface for jpeg:
jpeg_reset_mask
The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Tue, 29 Oct 2024 06:33:05 +0000 (14:33 +0800)]
drm/amdgpu: Add sysfs interface for vpe reset mask
Add the sysfs interface for vpe:
vpe_reset_mask
The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Mon, 4 Nov 2024 05:15:49 +0000 (13:15 +0800)]
drm/amdgpu: Add sysfs interface for sdma reset mask
Add the sysfs interface for sdma:
sdma_reset_mask
The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 5 Nov 2024 05:34:09 +0000 (11:04 +0530)]
drm/amdgpu: Avoid kcq disable during reset
Reset sequence indicates that hardware already ran into a bad state.
Avoid sending unmap queue request to reset KCQ. This will also cover RAS
error scenarios which need a reset to recover, hence remove the check.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 5 Nov 2024 05:00:20 +0000 (10:30 +0530)]
drm/amdgpu: Fix map/unmap queue logic
In current logic, it calls ring_alloc followed by a ring_test. ring_test
in turn will call another ring_alloc. This is illegal usage as a
ring_alloc is expected to be closed properly with a ring_commit. Change
to commit the map/unmap queue packet first followed by a ring_test. Add a
comment about the usage of ring_test.
Also, reorder the current pre-condition checks of job hang or kiq ring
scheduler not ready. Without them being met, it is not useful to attempt
ring or memory allocations.
Fixes tag refers to the original patch which introduced this issue which
then got carried over into newer code.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c")
Yang Wang [Wed, 6 Nov 2024 06:49:56 +0000 (14:49 +0800)]
drm/amdgpu: fix ACA bank count boundary check error
fix ACA bank count boundary check error.
Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Tue, 5 Nov 2024 07:22:56 +0000 (15:22 +0800)]
drm/amdgpu: Add sysfs interface for gc reset mask
Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask
These interfaces are read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
chongli2 [Wed, 6 Nov 2024 03:43:09 +0000 (11:43 +0800)]
drm/amdgpu: fix return random value when multiple threads read registers via mes.
The currect code use the address "adev->mes.read_val_ptr" to
store the value read from register via mes.
So when multiple threads read register,
multiple threads have to share the one address,
and overwrite the value each other.
Assign an address by "amdgpu_device_wb_get" to store register value.
each thread will has an address to store register value.
Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Tue, 5 Nov 2024 17:38:30 +0000 (12:38 -0500)]
drm/amdkfd: remove gfx 12 trap handler page size cap
GFX 12 does not require a page size cap for the trap handler because
it does not require a CWSR work around like GFX 11 did.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Fri, 8 Nov 2024 02:05:45 +0000 (12:05 +1000)]
Merge tag 'drm-etnaviv-next-2024-11-07' of https://git.pengutronix.de/git/lst/linux into drm-next
- improve handling of DMA address limited systems
- improve GPU hangcheck
- fix address space collision on >= 4K CPU pages
- flush all known writeback caches before memory release
- various code cleanups
amdkfd:
- Add topology cap flag for per queue reset
- Add an interface to query whether KFD queues are present
- Use dynamic allocation for get_cu_occupancy