Dave Airlie [Tue, 11 Mar 2025 02:15:48 +0000 (12:15 +1000)]
Merge tag 'drm-intel-next-2025-03-10' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull #2 for v6.15:
Features and functionality:
- FBC dirty rectangle support for display version 30+ (Vinod)
- Update plane scalers via DSB based commits (Ville)
- Move runtime power status info to display power debugfs (Jani)
Refactoring and cleanups:
- Convert i915 and xe to DRM client setup (Thomas)
- Refactor and clean up CDCLK/bw/dbuf readout/sanitation (Ville)
- Conversions from drm_i915_private to struct intel_display (Jani, Suraj)
- Refactor display reset for better separation between display and core (Jani)
- Move panel fitter code together (Jani)
- Add mst and hdcp sub-structs to display structs for clarity (Jani)
- Header refactoring to clarify separation between display and i915 core (Jani)
Fixes:
- Fix DP MST max stream count to match number of pipes (Jani)
- Fix encoder HW state readout of DP MST UHBR (Imre)
- Fix ICL+ combo PHY cursor and coeff polarity programming (Ville)
- Fix pipeDMC and ATS fault handling (Ville)
- Display workarounds (Gustavo)
- Remove duplicate forward declaration (Vinod)
- Improve POWER_DOMAIN_*() macro type safety (Gustavo)
- Move CDCLK post plane programming later (Ville)
Dave Airlie [Tue, 11 Mar 2025 00:26:08 +0000 (10:26 +1000)]
Merge tag 'drm-xe-next-2025-03-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Expose per-engine activity via perf pmu (Riana, Lucas, Umesh)
- Add support for EU stall sampling (Harish, Ashutosh)
- Allow userspace to provide low latency hint for submission (Tejas)
- GPU SVM and Xe SVM implementation (Matthew Brost)
Cross-subsystem Changes:
- devres handling for component drivers (Lucas)
- Backmerge drm-next to allow cross dependent change with i915
- GPU SVM and Xe SVM implementation (Matthew Brost)
Core Changes:
Driver Changes:
- Fixes to userptr and missing validations (Matthew Auld, Thomas
Hellström, Matthew Brost)
- devcoredump typos and error handling improvement (Shuicheng)
- Allow oa_exponent value of 0 (Umesh)
- Finish moving device probe to devm (Lucas)
- Fix race between submission restart and scheduled being freed (Tejas)
- Fix counter overflows in gt_stats (Francois)
- Refactor and add missing workarounds and tunings for pre-Xe2 platforms
(Aradhya, Tvrtko)
- Fix PXP locks interaction with exec queues being killed (Daniele)
- Eliminate TIMESTAMP_OVERRIDE from xe (Matt Roper)
- Change xe_gen_wa_oob to allow building on MacOS (Daniel Gomez)
- New workarounds for Panther Lake (Tejas)
- Fix VF resume errors (Satyanarayana)
- Fix workaround infra skipping some workarounds dependent on engine
initialization (Tvrtko)
- Improve per-IP descriptors (Gustavo)
- Add more error injections to probe sequence (Francois)
Dave Airlie [Tue, 11 Mar 2025 00:19:06 +0000 (10:19 +1000)]
Merge tag 'drm-msm-next-2025-03-09' of https://gitlab.freedesktop.org/drm/msm into drm-next
Updates for v6.15
GPU:
- Fix obscure GMU suspend failure
- Expose syncobj timeline support
- Extend GPU devcoredump with pagetable info
- a623 support
- Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump
Display:
- Add cpu-cfg interconnect paths on SM8560 and SM8650
- Introduce KMS IOMMU fault handler, causing devcoredump snapshot
- Fixed error pointer dereference in msm_kms_init_aspace()
DPU:
- Fix mode_changing handling
- Add writeback support on SM6150 (QCS615)
- Fix DSC programming in 1:1:1 topology
- Reworked hardware resource allocation, moving it to the CRTC code
- Enabled support for Concurrent WriteBack (CWB) on SM8650
- Enabled CDM blocks on all relevant platforms
- Reworked debugfs interface for BW/clocks debugging
- Clear perf params before calculating bw
- Support YUV formats on writeback
- Fixed double inclusion
- Fixed writeback in YUV formats when using cloned output, Dropped
wb2_formats_rgb
- Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
kerneldocs
- Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
amdkfd:
- Fix possible NULL pointer in queue validation
- Remove unnecessary CP domain validation
- SDMA queue reset support
- Add per process flags
radeon:
- Fix spelling typos
- RS400 hyperZ fix
UAPI:
- Add KFD per process flags for setting precision
Proposed user space: https://github.com/ROCm/ROCR-Runtime/commit/2a64fa5e06e80e0af36df4ce0c76ae52eeec0a9d
Specific constraints in if:then: blocks for variable lists, like clocks
and clock-names, should have a fixed upper and lower size. Older
dtschema implied minItems, but that's not true since 2024 and missing
minItems means that the lower bound is not set.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/638751/ Link: https://lore.kernel.org/r/20250221-b4-sm8750-display-v3-2-3ea95b1630ea@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Correct commit 20972609d12c ("drm/msm/dpu: Require modeset if clone mode
status changes") and describe old_crtc_state and new_crtc_state params
instead of the single previously used parameter crtc_state.
Fixes: 20972609d12c ("drm/msm/dpu: Require modeset if clone mode status changes") Signed-off-by: Dmitry Baryshkov <lumag@kernel.org> Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/641313/ Link: https://lore.kernel.org/r/20250306-dpu-fix-docs-v1-1-e51b71e8ad84@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Harish Kasiviswanathan [Tue, 14 Jan 2025 21:02:21 +0000 (16:02 -0500)]
drm/amdkfd: Add support for more per-process flag
Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5
v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.
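A minimal sketch of how a per-process flag could be folded into a cached
SH_MEM_CONFIG value. The TOY_* names, the struct layout, the numeric flag
and shift values, and the direction of the bit (set means high precision)
are all assumptions made for illustration; the real definitions are
KFD_PROC_FLAG_MFMA_HIGH_PRECISION and SH_MEM_CONFIG__PRECISION_MODE__SHIFT
in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, not taken from the kernel headers. */
#define TOY_PROC_FLAG_MFMA_HIGH_PRECISION (1u << 0)
#define TOY_PRECISION_MODE_SHIFT 4u

/* Toy per-process state; not the real KFD process structure. */
struct toy_process {
	uint32_t flags;         /* per-process flags requested by userspace */
	uint32_t sh_mem_config; /* cached register value for the process */
};

/* Set or clear the precision-mode bit according to the process flags. */
void toy_apply_process_flags(struct toy_process *p)
{
	const uint32_t bit = UINT32_C(1) << TOY_PRECISION_MODE_SHIFT;

	assert(p != NULL);
	if (p->flags & TOY_PROC_FLAG_MFMA_HIGH_PRECISION)
		p->sh_mem_config |= bit;
	else
		p->sh_mem_config &= ~bit;
}
```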
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Tue, 14 Jan 2025 19:13:35 +0000 (14:13 -0500)]
drm/amdkfd: Set per-process flags only once for gfx9/10/11/12
Define set_cache_memory_policy() for these asics and move all static
changes from update_qpd() which is called each time a queue is created
to set_cache_memory_policy() which is called once during process
initialization
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Tue, 14 Jan 2025 19:07:24 +0000 (14:07 -0500)]
drm/amdkfd: Set per-process flags only once cik/vi
Set per-process static sh_mem config only once during process
initialization. Move all static changes from update_qpd() which is
called each time a queue is created to set_cache_memory_policy() which
is called once during process initialization.
set_cache_memory_policy() is currently defined only for cik and vi
family. So this commit only focuses on these two. A separate commit will
address other asics.
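The split described above can be sketched with a toy model: static
settings are programmed once at process initialization, and queue creation
only does per-queue work. The toy_* names and fields are hypothetical and
do not mirror the real qpd structure or the real function signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-process device state. */
struct toy_qpd {
	uint32_t sh_mem_config;     /* static per-process setting */
	unsigned int static_writes; /* times the static setting was programmed */
	unsigned int queue_count;
};

/* Called once during process initialization: program static settings. */
void toy_set_cache_memory_policy(struct toy_qpd *qpd, uint32_t config)
{
	assert(qpd != NULL);
	qpd->sh_mem_config = config;
	qpd->static_writes++;
}

/* Called on every queue creation: only per-queue bookkeeping remains. */
void toy_update_qpd(struct toy_qpd *qpd)
{
	assert(qpd != NULL);
	qpd->queue_count++;
}

/* Create a process with n queues and report how often the static
 * configuration was written. */
unsigned int toy_static_writes_for(unsigned int n_queues)
{
	struct toy_qpd qpd = { 0, 0, 0 };
	unsigned int i;

	toy_set_cache_memory_policy(&qpd, 0x1234u);
	for (i = 0; i < n_queues; i++)
		toy_update_qpd(&qpd);
	return qpd.static_writes;
}
```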
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 6 Mar 2025 18:51:24 +0000 (12:51 -0600)]
drm/amd: Keep display off while going into S4
When userspace invokes S4 the flow is:
1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system
Then on resume amdgpu_pmops_restore() is called.
The problem with this flow is that amdgpu_pmops_thaw() calls
amdgpu_device_resume(), which resumes all of the GPU.
This includes turning the display hardware back on and discovering
connectors again.
It is unexpected for the display to turn back on at this point.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.
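A toy model of the adjusted sequence, assuming a hypothetical in_s4 marker
that is set in prepare and cleared in restore. The toy_* names are
illustrative only and this is not the actual amdgpu implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device state. */
struct toy_gpu {
	bool in_s4;      /* hypothetical marker: hibernation in progress */
	bool display_on;
};

void toy_prepare(struct toy_gpu *gpu) { assert(gpu != NULL); gpu->in_s4 = true; }
void toy_freeze(struct toy_gpu *gpu) { assert(gpu != NULL); gpu->display_on = false; }

/* Thaw runs between creating the image and writing it to disk. */
void toy_thaw(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	/* The rest of the GPU would be resumed here, but the display
	 * stays off while the S4 sequence is still running. */
	if (!gpu->in_s4)
		gpu->display_on = true;
}

/* Restore runs on the real resume from hibernation. */
void toy_restore(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	gpu->in_s4 = false;
	gpu->display_on = true;
}

/* Walk steps 1, 2 and 4 of the flow and report the display state. */
bool toy_display_on_after_thaw(void)
{
	struct toy_gpu gpu = { false, true };

	toy_prepare(&gpu);
	toy_freeze(&gpu);
	toy_thaw(&gpu);
	return gpu.display_on;
}

/* Same walk, followed by the restore on resume. */
bool toy_display_on_after_restore(void)
{
	struct toy_gpu gpu = { false, true };

	toy_prepare(&gpu);
	toy_freeze(&gpu);
	toy_thaw(&gpu);
	toy_restore(&gpu);
	return gpu.display_on;
}
```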
Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom St Denis [Thu, 6 Mar 2025 17:31:56 +0000 (12:31 -0500)]
drm/amd/amdgpu: Add missing GC 11.5.0 register
Adds register needed for debugging purposes.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Sierra [Thu, 16 May 2024 22:06:48 +0000 (17:06 -0500)]
drm/amdkfd: clear F8_MODE for gfx950
Default F8_MODE should be OCP format on gfx950.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 5 Mar 2025 16:31:22 +0000 (22:01 +0530)]
drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function
Updated description for the 'other_mode' parameter. This parameter is
used to determine the display mode of another display controller that
may be sharing the line buffer.
Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wentao Liang [Thu, 6 Mar 2025 07:51:48 +0000 (15:51 +0800)]
drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()
Add error handling to propagate amdgpu_cgs_create_device() failures
to the caller. When amdgpu_cgs_create_device() fails, release hwmgr
and return -ENOMEM to prevent null pointer dereference.
[v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr.
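The error path described above follows a common pattern: release what was
already allocated, then return an error instead of continuing with a NULL
pointer. A self-contained sketch with stand-in functions follows; the
toy_* names are hypothetical and only the -ENOMEM return value is taken
from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_hwmgr {
	void *device;
};

/* Stand-in for the device creation call; fails on request. */
void *toy_create_device(int should_fail)
{
	return should_fail ? NULL : malloc(1);
}

/* Allocate the manager, then its device. On failure free the manager
 * and report -ENOMEM so the caller never sees a NULL device. */
int toy_powerplay_create(struct toy_hwmgr **out, int device_should_fail)
{
	struct toy_hwmgr *hwmgr;

	assert(out != NULL);
	*out = NULL;

	hwmgr = calloc(1, sizeof(*hwmgr));
	if (!hwmgr)
		return -ENOMEM;

	hwmgr->device = toy_create_device(device_should_fail);
	if (!hwmgr->device) {
		free(hwmgr);
		return -ENOMEM;
	}

	*out = hwmgr;
	return 0;
}

void toy_powerplay_destroy(struct toy_hwmgr *hwmgr)
{
	if (!hwmgr)
		return;
	free(hwmgr->device);
	free(hwmgr);
}
```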
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starting from 6.11, AMDGPU driver, while being loaded with amdgpu.dc=1,
due to lack of .is_two_pixels_per_container function in dce60_tg_funcs,
causes a NULL pointer dereference on PCs with old GPUs, such as R9 280X.
So this fix adds missing .is_two_pixels_per_container to dce60_tg_funcs.
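The failure mode can be illustrated with a toy callback table: common code
calls a hook unconditionally, so a table that leaves the hook unset leads
to a NULL function pointer call. The toy_* types below are hypothetical
and much simpler than the real timing generator structures; the return
value chosen for the old-hardware hook is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_tg;

/* Per-generation callbacks. */
struct toy_tg_funcs {
	bool (*is_two_pixels_per_container)(const struct toy_tg *tg);
};

struct toy_tg {
	const struct toy_tg_funcs *funcs;
};

/* Assumed behavior for old hardware: never two pixels per container. */
bool toy_old_is_two_pixels_per_container(const struct toy_tg *tg)
{
	(void)tg;
	return false;
}

/* Table with the hook left unset: calling through it would crash. */
const struct toy_tg_funcs toy_funcs_missing = { NULL };

/* Table with the hook filled in, which is the shape of the fix. */
const struct toy_tg_funcs toy_funcs_fixed = {
	.is_two_pixels_per_container = toy_old_is_two_pixels_per_container,
};

/* True when common code can call the hook without a NULL check. */
bool toy_hook_is_callable(const struct toy_tg *tg)
{
	assert(tg != NULL && tg->funcs != NULL);
	return tg->funcs->is_two_pixels_per_container != NULL;
}
```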
Reported-by: Rosen Penev <rosenp@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3942 Fixes: e6a901a00822 ("drm/amd/display: use even ODM slice width for two pixels per container") Signed-off-by: Aliaksei Urbanski <aliaksei.urbanski@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shiwu Zhang [Tue, 4 Mar 2025 03:13:48 +0000 (11:13 +0800)]
drm/amdgpu: fix the gb_addr_config_fields init value mismatch
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten
in gfx hw_init it is read out to populate the gb_addr_config_fields
in the sw_init stage, which causes a mismatch.
Fix it by using the golden value in sw_init as well.
v2: This is a driver-set golden reg and keep as it is (Lijo)
Tao Zhou [Thu, 6 Mar 2025 03:36:49 +0000 (11:36 +0800)]
drm/amdgpu: increase RAS bad page threshold
Under the default policy, the driver issues an RMA event when the number
of bad pages exceeds 8 physical rows, rather than when it reaches 8
physical rows. Don't rely on the configurable threshold parameters in
default mode.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Emily Deng [Mon, 3 Mar 2025 07:10:22 +0000 (15:10 +0800)]
drm/amdgpu: Fix missing drain retry fault the last entry
When the entry obtained in svm_range_unmap_from_cpu is the last entry and
that entry is a page fault, it also needs to be dropped. So the equal
case must be dropped as well.
v2:
Only modify the svm_range_restore_pages.
Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen <xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Lu [Thu, 13 Feb 2025 23:49:46 +0000 (18:49 -0500)]
drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV
Aldebaran SRIOV VF cannot access the power brake feature regs.
The accesses can be skipped to avoid a dmesg warning.
v2: Remove redundant asic type check
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Lu [Thu, 13 Feb 2025 23:41:26 +0000 (18:41 -0500)]
drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV
Aldebaran SRIOV VF does not have write permissions to GRBM_CNTL.
This access can be skipped to avoid a dmesg warning.
v2: Use GC IP version check instead of asic check
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ville Syrjälä [Thu, 6 Mar 2025 16:34:20 +0000 (18:34 +0200)]
drm/i915: Relocate intel_bw_crtc_update()
intel_bw_crtc_update() is only used by the readout path, so relocate
the function next to its only caller. Easier to read the code when related
things are nearby.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:17 +0000 (18:34 +0200)]
drm/i915: Split wm sanitize from readout
I'll need to move the wm readout to an earlier point in the
sequence (since the bw state readout will need ddb information
from the wm readout). But (at least for now) the wm sanitation
will need to stay put as it needs to also sanitize things for
any pipes/planes we disable later during the hw state takeover.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:16 +0000 (18:34 +0200)]
drm/i915: Simplify cdclk_disable_noatomic()
Instead of hand rolling the cdclk state disabling for a
pipe in noatomic() let's just recompute the whole thing
from scratch. Less code we have to remember to keep in sync.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:15 +0000 (18:34 +0200)]
drm/i915: Simplify intel_cdclk_update_hw_state()
intel_crtc_calculate_min_cdclk() can't return an error
(since commit 5ac860cc5254 ("drm/i915: Fix DBUF bandwidth vs.
cdclk handling")) so there is no point in checking for one.
Also we can just call it unconditionally since it itself
checks crtc_state->hw.enabled. We are currently checking
crtc_state->hw.active in the readout path, but active==enabled
during readout, and arguably enabled is the more correct thing
to check anyway.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:14 +0000 (18:34 +0200)]
drm/i915: Skip some bw_state readout on pre-icl
We only compute bw_state->data_rate and bw_state->num_active_planes
on icl+. Do the same during readout so that we don't leave random
junk inside the state.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:09 +0000 (18:34 +0200)]
drm/i915: Add skl_wm_plane_disable_noatomic()
Add skl_wm_plane_disable_noatomic() which will clear out all
the ddb and wm state for the plane. And let's do this _before_
we call plane->disable_arm() so that it'll actually clear out
the state in the hardware as well.
Currently this won't do anything new for most of the
intel_plane_disable_noatomic() calls since those are done before
wm readout, and thus everything wm/ddb related in the state
will still be zeroed anyway. The only difference will be for
skl_dbuf_sanitize() which happens after wm readout. But I'll be
reordering things so that wm readout happens earlier and at that
point this will guarantee that we still clear out the old
wm/ddb junk from the state.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:07 +0000 (18:34 +0200)]
drm/i915: Extract skl_wm_crtc_disable_noatomic()
Hoist the dbuf stuff into a separate function from
intel_crtc_disable_noatomic_complete() so that the details
are better hidden inside skl_watermark.c.
We can also skip the whole thing on pre-skl since the dbuf state
isn't actually used on those platforms. The readout path does
still fill dbuf_state->active_pipes but we'll remedy that later.
Ville Syrjälä [Thu, 6 Mar 2025 16:34:04 +0000 (18:34 +0200)]
drm/i915: Don't clobber crtc_state->cpu_transcoder for inactive crtcs
Inactive crtcs are supposed to have their crtc_state completely
cleared. Currently we are clobbering crtc_state->cpu_transcoder
before determining whether it's actually enabled or not. Don't
do that.
I want to rework the inherited flag handling for inactive crtcs
a bit, and having a bogus cpu_transcoder in the crtc state can
then cause confusing fastset mismatches even when the crtc never
changes state during the commit.
Jani Nikula [Wed, 5 Mar 2025 16:38:22 +0000 (18:38 +0200)]
drm/xe/compat: refactor compat i915_drv.h
The compat i915_drv.h contains things that aren't there in the original
i915_drv.h. Split out gem/i915_gem_object.h and i915_scheduler_types.h,
moving the corresponding pieces out, including FORCEWAKE_ALL to
intel_uncore.h.
Technically I915_PRIORITY_DISPLAY should be in i915_priolist_types.h,
but it's a bit overkill to split out another file just for
that. i915_scheduler_types.h shall do.
With this, the compat i915_drv.h becomes a strict subset of the
original.
Matthew Brost [Thu, 6 Mar 2025 01:26:56 +0000 (17:26 -0800)]
drm/xe: Add always_migrate_to_vram modparam
Used to show we can bounce memory multiple times which will happen once
a real migration policy is implemented. Can be removed once migration
policy is implemented.
v3:
- Pull some changes into the previous patch (Thomas)
- Better commit message (Thomas)
Matthew Brost [Thu, 6 Mar 2025 01:26:52 +0000 (17:26 -0800)]
drm/xe: Add SVM VRAM migration
Migration is implemented with range granularity, with VRAM backing being
a VM private TTM BO (i.e., shares dma-resv with VM). The lifetime of the
TTM BO is limited to when the SVM range is in VRAM (i.e., when a VRAM
SVM range is migrated to SRAM, the TTM BO is destroyed).
The design choice for using TTM BO for VRAM backing store, as opposed to
direct buddy allocation, is as follows:
- DRM buddy allocations are not at page granularity, offering no
advantage over a BO.
- Unified eviction is required (SVM VRAM and TTM BOs need to be able to
evict each other).
- For exhaustive eviction [1], SVM VRAM allocations will almost certainly
require a dma-resv.
- Likely allocation size is 2M which makes the size of the BO (872)
acceptable per allocation (872 / 2M == .0004158).
With this, using TTM BO for VRAM backing store seems to be an obvious
choice as it allows leveraging of the TTM eviction code.
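The overhead figure quoted in the list above can be checked with a few
lines of arithmetic. The helper name is hypothetical; 872 and 2M are the
numbers from the commit message, with 2M taken as 2 * 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Metadata bytes per allocation divided by payload bytes. */
double toy_bo_overhead_ratio(size_t bo_struct_bytes, size_t alloc_bytes)
{
	assert(alloc_bytes > 0);
	return (double)bo_struct_bytes / (double)alloc_bytes;
}
```

With 872 bytes of BO per 2M allocation the ratio is roughly 0.0004158,
i.e. about 0.04% of the allocation size.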
Current migration policy is migrate any SVM range greater than or equal
to 64k once.
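The current policy can be sketched as a simple predicate. The toy_* names
and the migration_attempted marker are hypothetical; only the 64k
threshold and the "once" rule come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MIGRATE_MIN_BYTES ((size_t)64 * 1024)

struct toy_svm_range {
	size_t size;
	bool migration_attempted; /* hypothetical "already tried" marker */
};

/* Decide whether to try a VRAM migration for this range and record
 * that the single allowed attempt has been used. */
bool toy_should_migrate(struct toy_svm_range *range)
{
	assert(range != NULL);
	if (range->size < TOY_MIGRATE_MIN_BYTES || range->migration_attempted)
		return false;
	range->migration_attempted = true;
	return true;
}
```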
v2:
- Rebase on latest GPU SVM
- Retry page fault on get pages returning mixed allocation
- Use drm_gpusvm_devmem
v3:
- Use new BO flags
- New range structure (Thomas)
- Hide migration behind Kconfig
- Kernel doc (Thomas)
- Use check_pages_threshold
v4:
- Don't evict partial unmaps in garbage collector (Thomas)
- Use %pe to print errors (Thomas)
- Use %p to print pointers (Thomas)
v5:
- Use range size helper (Thomas)
- Make BO external (Thomas)
- Set tile to NULL for BO creation (Thomas)
- Drop BO mirror flag (Thomas)
- Hold BO dma-resv lock across migration (Auld, Thomas)
v6:
- s/drm_info/drm_dbg (Thomas)
- s/migrated/skip_migrate (Himal)
- Better debug message on VRAM migration failure (Himal)
- Drop return BO from VRAM allocation function (Thomas)
Thomas Hellström [Thu, 6 Mar 2025 01:26:48 +0000 (17:26 -0800)]
drm/xe: Add drm_pagemap ops to SVM
Add support for mapping device pages to Xe SVM by attaching drm_pagemap
to a memory region, which is then linked to a GPU SVM devmem allocation.
This enables GPU SVM to derive the device page address.
Matthew Brost [Thu, 6 Mar 2025 01:26:46 +0000 (17:26 -0800)]
drm/xe: Add SVM device memory mirroring
Add SVM device memory mirroring which enables device pages for
migration. Enabled via CONFIG_XE_DEVMEM_MIRROR Kconfig. Kconfig option
defaults to enabled. If not enabled, SVM will work sans migration and
KMD memory footprint will be less.
v3:
- Add CONFIG_XE_DEVMEM_MIRROR
v4:
- Fix Kconfig (Himal)
- Use %pe to print errors (Thomas)
- Fix alignment issue (Checkpatch)
v5:
- s/xe_mem_region/xe_vram_region (Rebase)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
- s/drm_info/drm_dbg/
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Oak Zeng <oak.zeng@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-22-matthew.brost@intel.com
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag,
which indicates whether the device supports CPU address mirroring. The
intent is for UMDs to use this query to determine if a VM can be set up
with CPU address mirroring. This flag is implemented by checking if the
device supports GPU faults.
v7:
- Only report enabled if CONFIG_DRM_GPUSVM is selected (CI)
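A sketch of both sides of such a query flag: the kernel side reports the
flag only when GPU faults are supported and GPU SVM is built in, and the
UMD side tests the bit. The TOY_* macro, its bit position, and the struct
are assumptions; the real flag value should be taken from the xe uAPI
header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define TOY_QUERY_FLAG_HAS_CPU_ADDR_MIRROR (UINT64_C(1) << 2)

struct toy_query_config {
	uint64_t flags;
};

/* Kernel-side view: advertise the feature only when the device can
 * take GPU page faults and GPU SVM support is compiled in. */
uint64_t toy_build_config_flags(bool has_gpu_faults, bool gpusvm_enabled)
{
	if (has_gpu_faults && gpusvm_enabled)
		return TOY_QUERY_FLAG_HAS_CPU_ADDR_MIRROR;
	return 0;
}

/* UMD-side view: decide whether a CPU address mirror VM can be used. */
bool toy_supports_cpu_addr_mirror(const struct toy_query_config *cfg)
{
	assert(cfg != NULL);
	return (cfg->flags & TOY_QUERY_FLAG_HAS_CPU_ADDR_MIRROR) != 0;
}
```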
Matthew Brost [Thu, 6 Mar 2025 01:26:42 +0000 (17:26 -0800)]
drm/xe: Do not allow CPU address mirror VMA unbind if
uAPI is designed with the use case that only mapping a BO to a malloc'd
address will unbind a CPU-address mirror VMA. Therefore, allowing a
CPU-address mirror VMA to unbind when the GPU has bindings in the range
being unbound does not make much sense. This behavior is not supported,
as it simplifies the code. This decision can always be revisited if a
use case arises.
v3:
- s/arrises/arises (Thomas)
- s/system allocator/GPU address mirror (Thomas)
- Kernel doc (Thomas)
- Newline between function defs (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Matthew Brost [Thu, 6 Mar 2025 01:26:41 +0000 (17:26 -0800)]
drm/xe: Add unbind to SVM garbage collector
Add unbind to SVM garbage collector. To facilitate this, add an unbind
support function to the VM layer which unbinds an SVM range. Also teach
the PT layer to understand unbinds of SVM ranges.
v3:
- s/INVALID_VMA/XE_INVALID_VMA (Thomas)
- Kernel doc (Thomas)
- New GPU SVM range structure (Thomas)
- s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas)
v4:
- Use xe_vma_op_unmap_range (Himal)
v5:
- s/PY/PT (Thomas)
Matthew Brost [Thu, 6 Mar 2025 01:26:40 +0000 (17:26 -0800)]
drm/xe: Add SVM garbage collector
Add a basic SVM garbage collector which destroys an SVM range upon an MMU
UNMAP event. The garbage collector runs on a worker or in the GPU fault
handler and is required because locks in the path of reclaim are needed
and cannot be taken in the notifier.
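The deferral pattern can be sketched as follows: the notifier only queues
the range, and the worker or fault handler, which may take the heavier
locks, does the teardown. The toy_* names are hypothetical and all locking
around the list is omitted, so this shows the control flow only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_range {
	struct toy_range *gc_next;
	bool queued;
	bool destroyed;
};

struct toy_vm {
	struct toy_range *gc_head; /* ranges waiting for teardown */
	bool closed;
};

/* Notifier side: on UNMAP only record the range. Nothing here needs
 * the locks that teardown requires. */
void toy_notifier_unmap(struct toy_vm *vm, struct toy_range *range)
{
	assert(vm != NULL && range != NULL);
	if (vm->closed || range->queued)
		return;
	range->queued = true;
	range->gc_next = vm->gc_head;
	vm->gc_head = range;
}

/* Worker or fault-handler side: drain the list and destroy each
 * range. Returns the number of ranges destroyed. */
unsigned int toy_garbage_collect(struct toy_vm *vm)
{
	unsigned int n = 0;

	assert(vm != NULL);
	while (vm->gc_head) {
		struct toy_range *range = vm->gc_head;

		vm->gc_head = range->gc_next;
		range->gc_next = NULL;
		range->queued = false;
		range->destroyed = true;
		n++;
	}
	return n;
}
```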
v2:
- Flush garbage collector in xe_svm_close
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Use list_first_entry_or_null for garbage collector loop (Thomas)
- Don't add to garbage collector if VM is closed (Thomas)
v4:
- Use %pe to print error (Thomas)
v5:
- s/visable/visible (Thomas)
Matthew Brost [Thu, 6 Mar 2025 01:26:39 +0000 (17:26 -0800)]
drm/xe: Add (re)bind to SVM page fault handler
Add (re)bind to SVM page fault handler. To facilitate this, add a support
function to the VM layer which (re)binds an SVM range. Also teach the PT
layer to understand (re)binds of SVM ranges.
v2:
- Don't assert BO lock held for range binds
- Use xe_svm_notifier_lock/unlock helper in xe_svm_close
- Use drm_pagemap dma cursor
- Take notifier lock in bind code to check range state
v3:
- Use new GPU SVM range structure (Thomas)
- Kernel doc (Thomas)
- s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Matthew Brost [Thu, 6 Mar 2025 01:26:38 +0000 (17:26 -0800)]
drm/gpuvm: Add DRM_GPUVA_OP_DRIVER
Add DRM_GPUVA_OP_DRIVER which allows drivers to define their own gpuvm
ops. Useful for driver-created ops which can be passed into the bind
software pipeline.
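One way a driver could layer its own operations on a reserved op type is
sketched below: the core enum reserves a single DRIVER value and the
driver adds a sub-op field of its own. All toy_* and TOY_* names,
including the sub-op split, are hypothetical and are not the real GPUVM or
Xe definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Core op types plus one value reserved for drivers. */
enum toy_gpuva_op_type {
	TOY_GPUVA_OP_MAP,
	TOY_GPUVA_OP_UNMAP,
	TOY_GPUVA_OP_DRIVER,
};

/* Driver-private sub-operations, meaningful only for OP_DRIVER. */
enum toy_driver_subop {
	TOY_SUBOP_MAP_RANGE,
	TOY_SUBOP_UNMAP_RANGE,
};

/* Identifies which handler the pipeline would run. */
enum toy_handler {
	TOY_HANDLER_CORE_MAP,
	TOY_HANDLER_CORE_UNMAP,
	TOY_HANDLER_DRIVER_MAP_RANGE,
	TOY_HANDLER_DRIVER_UNMAP_RANGE,
};

struct toy_op {
	enum toy_gpuva_op_type type;
	enum toy_driver_subop subop;
};

/* A single pipeline handles both core and driver-defined ops. */
enum toy_handler toy_dispatch(const struct toy_op *op)
{
	assert(op != NULL);
	switch (op->type) {
	case TOY_GPUVA_OP_MAP:
		return TOY_HANDLER_CORE_MAP;
	case TOY_GPUVA_OP_UNMAP:
		return TOY_HANDLER_CORE_UNMAP;
	case TOY_GPUVA_OP_DRIVER:
		break;
	}
	return op->subop == TOY_SUBOP_MAP_RANGE ?
		TOY_HANDLER_DRIVER_MAP_RANGE : TOY_HANDLER_DRIVER_UNMAP_RANGE;
}
```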
Matthew Brost [Thu, 6 Mar 2025 01:26:37 +0000 (17:26 -0800)]
drm/xe: Add SVM range invalidation and page fault
Add SVM range invalidation vfunc which invalidates PTEs. A new PT layer
function which accepts a SVM range is added to support this. In
addition, add the basic page fault handler which allocates a SVM range
which is used by SVM range invalidation vfunc.
v2:
- Don't run invalidation if VM is closed
- Cycle notifier lock in xe_svm_close
- Drop xe_gt_tlb_invalidation_fence_fini
v3:
- Better commit message (Thomas)
- Add lockdep asserts (Thomas)
- Add kernel doc (Thomas)
- s/change/changed (Thomas)
- Use new GPU SVM range / notifier structures
- Ensure PTEs are zapped / dma mappings are unmapped on VM close (Thomas)
v4:
- Fix macro (Checkpatch)
v5:
- Use range start/end helpers (Thomas)
- Use notifier start/end helpers (Thomas)
v6:
- Use min/max helpers (Himal)
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Matthew Brost [Thu, 6 Mar 2025 01:26:36 +0000 (17:26 -0800)]
drm/xe: Nuke VM's mapping upon close
Clear root PT entry and invalidate entire VM's address space when
closing the VM. Will prevent the GPU from accessing any of the VM's
memory after closing.
v2:
- s/vma/vm in kernel doc (CI)
- Don't nuke migration VM as this occurs at driver unload (CI)
v3:
- Rebase and pull into SVM series (Thomas)
- Wait for pending binds (Thomas)
v5:
- Remove xe_gt_tlb_invalidation_fence_fini in error case (Matt Auld)
- Drop local migration bool (Thomas)
v7:
- Add drm_dev_enter/exit protecting invalidation (CI, Matt Auld)
Matthew Brost [Thu, 6 Mar 2025 01:26:34 +0000 (17:26 -0800)]
drm/xe: Add SVM init / close / fini to faulting VMs
Add SVM init / close / fini to faulting VMs. Minimal implementation
acting as a placeholder for follow-on patches.
v2:
- Add close function
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Update chunk array to be unsigned long (Thomas)
- Use new drm_gpusvm.h header location (Thomas)
- Newlines between functions in xe_svm.h (Thomas)
- Call drm_gpusvm_driver_set_lock in init (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
v7:
- Only select CONFIG_DRM_GPUSVM if DEVICE_PRIVATE (CI)
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.
CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.
It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.
Expected usage:
- Bind the entire virtual address (VA) space upon program load using the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
allocate a CPU address using mmap(PROT_NONE), bind the BO to the
mmapped address using existing bind IOCTLs. If a CPU map of the BO is
needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
space and then bind the mapping address with the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
remove GPU mappings.
Only 1 to 1 mapping between user address space and GPU address space is
supported at the moment, as that is the expected use case. The uAPI
defines an interface for non 1 to 1 mappings but enforces 1 to 1; this
restriction can be lifted if use cases arise for non 1 to 1 mappings.
This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.
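The CPU side of the expected usage for a runtime allocation can be
sketched with plain POSIX calls. The bind IOCTLs are not shown, anonymous
memory stands in for the BO, and the function name is hypothetical; only
the mmap(PROT_NONE) reservation followed by mmap(MAP_FIXED) at the same
address comes from the text above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns 0 on success, -1 on failure. */
int toy_reserve_then_map(size_t len)
{
	unsigned char *cpu_map;
	void *va;

	assert(len > 0);

	/* 1. Reserve a CPU address range with no access. In the described
	 *    flow the BO would now be bound to this address using the
	 *    existing bind IOCTLs (not shown). */
	va = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (va == MAP_FAILED)
		return -1;

	/* 2. If a CPU mapping is needed, map again at the same address.
	 *    Anonymous memory stands in for the BO here. */
	cpu_map = mmap(va, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (cpu_map == MAP_FAILED) {
		munmap(va, len);
		return -1;
	}
	cpu_map[0] = 0x5a; /* the range is now CPU accessible */

	/* 3. When the GPU mapping is no longer needed, unmap from the CPU
	 *    address space. The address would then be bound again with
	 *    the CPU address mirror flag (not shown). */
	return munmap(cpu_map, len);
}
```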
v3:
- Call vm_bind_ioctl_ops_fini on -ENODATA
- Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
- s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
- Rework commit message for expected usage (Thomas)
- Describe state of code after patch in commit message (Thomas)
v4:
- Fix alignment (Checkpatch)
Matthew Brost [Thu, 6 Mar 2025 01:26:31 +0000 (17:26 -0800)]
drm/gpusvm: Add support for GPU Shared Virtual Memory
This patch introduces support for GPU Shared Virtual Memory (SVM) in the
Direct Rendering Manager (DRM) subsystem. SVM allows for seamless
sharing of memory between the CPU and GPU, enhancing performance and
flexibility in GPU computing tasks.
The patch adds the necessary infrastructure for SVM, including data
structures and functions for managing SVM ranges and notifiers. It also
provides mechanisms for allocating, deallocating, and migrating memory
regions between system RAM and GPU VRAM.
This is largely inspired by GPUVM.
v2:
- Take order into account in check pages
- Clear range->pages in get pages error
- Drop setting dirty or accessed bit in get pages (Vetter)
- Remove mmap assert for cpu faults
- Drop mmap write lock abuse (Vetter, Christian)
- Decouple zdd from range (Vetter, Oak)
- Add drm_gpusvm_range_evict, make it work with coherent pages
- Export drm_gpusvm_evict_to_sram, only use in BO evict path (Vetter)
- mmget/put in drm_gpusvm_evict_to_sram
- Drop range->vram_allocation variable
- Don't return in drm_gpusvm_evict_to_sram until all pages detached
- Don't warn on mixing sram and device pages
- Update kernel doc
- Add coherent page support to get pages
- Use DMA_FROM_DEVICE rather than DMA_BIDIRECTIONAL
- Add struct drm_gpusvm_vram and ops (Thomas)
- Update the range's seqno if the range is valid (Thomas)
- Remove the is_unmapped check before hmm_range_fault (Thomas)
- Use drm_pagemap (Thomas)
- Drop kfree_mapping (Thomas)
- dma map pages under notifier lock (Thomas)
- Remove ctx.prefault
- Remove ctx.mmap_locked
- Add ctx.check_pages
- s/vram/devmem (Thomas)
v3:
- Fix memory leak drm_gpusvm_range_get_pages
- Only migrate pages with same zdd on CPU fault
- Loop over all VMAs in drm_gpusvm_range_evict
- Make GPUSVM a drm level module
- GPL or MIT license
- Update main kernel doc (Thomas)
- Prefer foo() vs foo for functions in kernel doc (Thomas)
- Prefer functions over macros (Thomas)
- Use unsigned long vs u64 for addresses (Thomas)
- Use standard interval_tree (Thomas)
- s/drm_gpusvm_migration_put_page/drm_gpusvm_migration_unlock_put_page (Thomas)
- Drop err_out label in drm_gpusvm_range_find_or_insert (Thomas)
- Fix kernel doc in drm_gpusvm_range_free_pages (Thomas)
- Newlines between functions defs in header file (Thomas)
- Drop shall language in driver vfunc kernel doc (Thomas)
- Move some static inlines from head to C file (Thomas)
- Don't allocate pages under page lock in drm_gpusvm_migrate_populate_ram_pfn (Thomas)
- Change check_pages to a threshold
v4:
- Fix NULL ptr deref in drm_gpusvm_migrate_populate_ram_pfn (Thomas, Himal)
- Fix check pages threshold
- Check for range being unmapped under notifier lock in get pages (Testing)
- Fix characters per line
- Drop WRITE_ONCE for zdd->devmem_allocation assignment (Thomas)
- Use completion for devmem_allocation->detached (Thomas)
- Make GPU SVM depend on ZONE_DEVICE (CI)
- Use hmm_range_fault for eviction (Thomas)
- Drop zdd worker (Thomas)
v5:
- Select Kconfig deps (CI)
- Set device to NULL in __drm_gpusvm_migrate_to_ram (Matt Auld, G.G.)
- Drop Thomas's SoB (Thomas)
- Add drm_gpusvm_range_start/end/size helpers (Thomas)
- Add drm_gpusvm_notifier_start/end/size helpers (Thomas)
- Absorb drm_pagemap name changes (Thomas)
- Fix driver lockdep assert (Thomas)
- Move driver lockdep assert to static function (Thomas)
- Assert mmap lock held in drm_gpusvm_migrate_to_devmem (Thomas)
- Do not retry forever on eviction (Thomas)
v6:
- Fix drm_gpusvm_get_devmem_page alignment (Checkpatch)
- Modify Kconfig (CI)
- Compile out lockdep asserts (CI)
v7:
- Add kernel doc for flags fields (CI, Auld)
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-7-matthew.brost@intel.com
Thomas Hellström [Thu, 6 Mar 2025 01:26:29 +0000 (17:26 -0800)]
drm/pagemap: Add DRM pagemap
Introduce drm_pagemap ops to map and unmap dma to VRAM resources. In the
local memory case it's a matter of merely providing an offset into the
device's physical address. For future p2p the map and unmap functions may
encode as needed.
Similar to how dma-buf works, let the memory provider (drm_pagemap) provide
the mapping functionality.
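The local-memory case described above can be sketched in user-space C as follows. This is only an illustration: every `sketch_*` and `local_*` name is a hypothetical stand-in and not the kernel's drm_pagemap interface. It shows a provider-owned ops table whose map callback merely adds an offset to the device's base address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's drm_pagemap types. */
struct sketch_pagemap;

struct sketch_pagemap_ops {
	/* Translate an offset inside the provider's memory to a device address. */
	uint64_t (*map)(const struct sketch_pagemap *pm, uint64_t offset);
	/* Undo whatever map() set up. */
	void (*unmap)(const struct sketch_pagemap *pm, uint64_t addr);
};

struct sketch_pagemap {
	const struct sketch_pagemap_ops *ops;
	uint64_t vram_base; /* assumed device-physical base of local memory */
	uint64_t vram_size;
};

/* Local memory: mapping is just base + offset. */
static uint64_t local_map(const struct sketch_pagemap *pm, uint64_t offset)
{
	assert(offset < pm->vram_size);
	return pm->vram_base + offset;
}

/* Local memory: there is nothing to tear down. */
static void local_unmap(const struct sketch_pagemap *pm, uint64_t addr)
{
	(void)pm;
	(void)addr;
}

static const struct sketch_pagemap_ops local_ops = {
	.map = local_map,
	.unmap = local_unmap,
};
```

A p2p provider would plug different callbacks into the same table, so callers never need to know how the address was produced.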
Matthew Brost [Thu, 6 Mar 2025 01:26:28 +0000 (17:26 -0800)]
mm/migrate: Trylock device page in do_swap_page
Avoid multiple CPU page faults to the same device page racing by trying
to lock the page in do_swap_page before taking an extra reference to the
page. This prevents scenarios where multiple CPU page faults each take
an extra reference to a device page, which could abort migration in
folio_migrate_mapping. With the device page being locked in
do_swap_page, the migrate_vma_* functions need to be updated to avoid
locking the fault_page argument.
Prior to this change, a livelock scenario could occur in Xe's (Intel GPU
DRM driver) SVM implementation if enough threads faulted the same device
page.
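The ordering described above (trylock first, take the extra reference only on success) can be modelled in user-space C. This is a toy sketch under stated assumptions: `struct sketch_page` and all `sketch_*` helpers are hypothetical and do not mirror the kernel's page or folio API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of a device page: one lock bit and a reference count. */
struct sketch_page {
	atomic_flag locked;
	atomic_int refcount;
};

static void sketch_page_init(struct sketch_page *p)
{
	atomic_flag_clear(&p->locked);
	atomic_init(&p->refcount, 1); /* base reference held by the owner */
}

/* Returns true if this fault owns the page and may start migration. */
static bool sketch_fault_begin(struct sketch_page *p)
{
	/* Trylock first: a losing fault backs off without touching refcount. */
	if (atomic_flag_test_and_set(&p->locked))
		return false;
	atomic_fetch_add(&p->refcount, 1);
	return true;
}

/* Unlock first, then drop the extra reference taken in sketch_fault_begin(). */
static void sketch_fault_end(struct sketch_page *p)
{
	atomic_flag_clear(&p->locked);
	int old = atomic_fetch_sub(&p->refcount, 1);
	assert(old > 1);
	(void)old;
}
```

Because a losing fault never raises the reference count, the winner sees only its own extra reference, which is the property the commit message relies on to keep migration from aborting.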
v3:
- Put page after unlocking page (Alistair)
- Warn on splitting a THP which is the fault page (Alistair)
- Warn on dst page == fault page (Alistair)
v6:
- Add more verbose comment around THP (Alistair)
v7:
- Fix migrate_device_finalize alignment (Checkpatch)
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-4-matthew.brost@intel.com
Matthew Brost [Thu, 6 Mar 2025 01:26:27 +0000 (17:26 -0800)]
mm/migrate: Add migrate_device_pfns
Add migrate_device_pfns which prepares an array of pre-populated device
pages for migration. This is needed for eviction of a known set of
non-contiguous device pages to CPU pages, which is a common case for SVM
in DRM drivers using TTM.
v2:
- s/migrate_device_vma_range/migrate_device_prepopulated_range
- Drop extra mmu invalidation (Vetter)
v3:
- s/migrate_device_prepopulated_range/migrate_device_pfns (Alistair)
- Use helper to lock device pages (Alistair)
- Update commit message with why this is required (Alistair)
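As a rough illustration of preparing a pre-populated, non-contiguous pfn array, the user-space sketch below tries to lock each page and rewrites the entry in place. The flag encoding, the lock table, and all `sketch_*` names are assumptions made for this example; they are not the kernel's MIGRATE_PFN_* layout or the actual migrate_device_pfns implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding for this sketch only. */
#define SKETCH_PFN_SHIFT   6
#define SKETCH_PFN_MIGRATE 1UL

#define SKETCH_NPAGES 64
static bool sketch_page_locked[SKETCH_NPAGES]; /* toy per-page lock state */

static bool sketch_trylock(unsigned long pfn)
{
	assert(pfn < SKETCH_NPAGES);
	if (sketch_page_locked[pfn])
		return false;
	sketch_page_locked[pfn] = true;
	return true;
}

/*
 * The caller fills pfns[] with an arbitrary (non-contiguous) set of page
 * frame numbers. Each entry is rewritten in place: an encoded pfn with the
 * MIGRATE flag if the page could be locked, or 0 to skip it.
 */
static void sketch_prepare_pfns(unsigned long *pfns, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		unsigned long pfn = pfns[i];

		pfns[i] = sketch_trylock(pfn) ?
			  (pfn << SKETCH_PFN_SHIFT) | SKETCH_PFN_MIGRATE : 0;
	}
}
```

The point of the array form is that the evicting driver already knows which device pages it wants moved, so no VMA walk is needed to find them.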
Matthew Brost [Thu, 6 Mar 2025 01:26:26 +0000 (17:26 -0800)]
drm/xe: Retry BO allocation
TTM doesn't support fair eviction via WW locking; this is mitigated by
using retry loops in exec and the preempt rebind worker. Extend this retry
loop to BO allocation. Once TTM supports fair eviction, this patch can be
reverted.
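The general shape of such a retry loop can be sketched in user-space C. This is a generic bounded retry, not the Xe code: the `-ENOMEM` retry condition, the attempt limit, and all `sketch_*` names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocation callback: returns 0 on success or a negative errno. */
typedef int (*sketch_alloc_fn)(void *ctx);

/* Retry while the allocator reports memory pressure, up to max_tries attempts. */
static int sketch_alloc_with_retry(sketch_alloc_fn try_alloc, void *ctx,
				   int max_tries)
{
	int err = -ENOMEM;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		err = try_alloc(ctx);
		if (err != -ENOMEM)
			break; /* success, or an error that retrying cannot fix */
	}
	return err;
}

/* Example allocator: fails with -ENOMEM until the counter in ctx reaches 0. */
static int sketch_fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -ENOMEM;
	}
	return 0;
}
```

Bounding the loop keeps a persistently failing allocation from spinning forever; how the real driver decides when to give up is not stated in the commit message.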
Ville Syrjälä [Tue, 18 Feb 2025 21:18:55 +0000 (23:18 +0200)]
drm/i915/cdclk: Do cdclk post plane programming later
We currently call intel_set_cdclk_post_plane_update() far
too early. When pipes are active during the reprogramming
the current spot only works for the cd2x divider update
case, as that is synchronized to the pipe's vblank. Squashing
and crawling are not synchronized in any way, so doing the
programming while the pipes/planes are potentially still using
the old hardware state could lead to underruns.
Move the post plane reprogramming to a spot where we know
that the pipes/planes have switched over to the new hardware
state.
Francois Dugast [Wed, 5 Mar 2025 15:06:59 +0000 (16:06 +0100)]
drm/xe: Allow fault injection in exec queue IOCTLs
Use fault injection infrastructure to allow specific functions to
be configured over debugfs for failing during the execution of
xe_exec_queue_create_ioctl(). xe_exec_queue_destroy_ioctl() and
xe_exec_queue_get_property_ioctl() are not considered as there is
no unwinding code to test with fault injection.
This allows more thorough testing from user space by going through
code paths for error handling and unwinding which cannot be reached
by simply injecting errors in IOCTL arguments. This can help
increase code robustness.
The corresponding IGT series is:
https://patchwork.freedesktop.org/series/144138/
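To illustrate why injecting failures inside a function exercises unwinding paths that bad IOCTL arguments cannot reach, here is a small user-space model. It does not use the kernel's fault injection infrastructure; the `sketch_inject_at` knob stands in for the debugfs configuration, and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Which setup step should fail; -1 disables injection (stands in for debugfs). */
static int sketch_inject_at = -1;
/* Number of resources currently held; must be 0 after any failed create. */
static int sketch_live_resources;

static int sketch_step(int idx)
{
	if (idx == sketch_inject_at)
		return -ENOMEM; /* injected failure */
	sketch_live_resources++;
	return 0;
}

static void sketch_undo(void)
{
	assert(sketch_live_resources > 0);
	sketch_live_resources--;
}

/* Three-step setup with goto-based unwinding, as is common in kernel code. */
static int sketch_create(void)
{
	int err;

	err = sketch_step(0);
	if (err)
		return err;
	err = sketch_step(1);
	if (err)
		goto undo0;
	err = sketch_step(2);
	if (err)
		goto undo1;
	return 0;

undo1:
	sketch_undo();
undo0:
	sketch_undo();
	return err;
}
```

Failing each step in turn and checking that nothing is left allocated is the kind of coverage the commit message describes.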
Thomas Zimmermann [Wed, 26 Feb 2025 17:03:13 +0000 (18:03 +0100)]
drm/prime: Use dma_buf from GEM object instance
Avoid dereferencing struct drm_gem_object.import_attach for the
imported dma-buf. The dma_buf field in the GEM object instance refers
to the same buffer. Prepares to make import_attach optional.
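The access-pattern change can be shown with simplified stand-in structs. These `sketch_*` types keep only the fields named in the commit message and are not the real struct drm_gem_object or dma-buf definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins that keep only the fields discussed here. */
struct sketch_dma_buf {
	int id;
};

struct sketch_attachment {
	struct sketch_dma_buf *dmabuf;
};

struct sketch_gem_object {
	struct sketch_dma_buf *dma_buf;          /* set for imported buffers */
	struct sketch_attachment *import_attach; /* may become optional */
};

/* Old pattern: reach the buffer through the import attachment. */
static struct sketch_dma_buf *
sketch_buf_via_attach(const struct sketch_gem_object *obj)
{
	assert(obj->import_attach);
	return obj->import_attach->dmabuf;
}

/* New pattern: use the dma_buf field stored in the object itself. */
static struct sketch_dma_buf *
sketch_buf_direct(const struct sketch_gem_object *obj)
{
	return obj->dma_buf;
}
```

Both accessors return the same buffer while the attachment exists, but only the direct one keeps working once import_attach is absent.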
Thomas Zimmermann [Wed, 26 Feb 2025 17:03:10 +0000 (18:03 +0100)]
drm/gem-framebuffer: Use dma_buf from GEM object instance
Avoid dereferencing struct drm_gem_object.import_attach for the
imported dma-buf. The dma_buf field in the GEM object instance refers
to the same buffer. Prepares to make import_attach optional.
Thomas Zimmermann [Wed, 26 Feb 2025 17:03:08 +0000 (18:03 +0100)]
drm/gem-shmem: Use dma_buf from GEM object instance
Avoid dereferencing struct drm_gem_object.import_attach for the
imported dma-buf. The dma_buf field in the GEM object instance refers
to the same buffer. Prepares to make import_attach optional.
Thomas Zimmermann [Wed, 26 Feb 2025 17:03:06 +0000 (18:03 +0100)]
drm/gem-dma: Use dma_buf from GEM object instance
Avoid dereferencing struct drm_gem_object.import_attach for the
imported dma-buf. The dma_buf field in the GEM object instance refers
to the same buffer. Prepares to make import_attach optional.