]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
3 months agodrm/xe: Support for I2C attached MCUs
Heikki Krogerus [Tue, 1 Jul 2025 12:22:50 +0000 (15:22 +0300)]
drm/xe: Support for I2C attached MCUs

Adding adaption/glue layer where the I2C host adapter
(Synopsys DesignWare I2C adapter) and the I2C clients (the
microcontroller units) are enumerated.

The microcontroller units (MCU) that are attached to the GPU
depend on the OEM. The initially supported MCU will be the
Add-In Management Controller (AMC).

Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags and SPDX format in the .c file]

3 months agoi2c: designware: Add quirk for Intel Xe
Heikki Krogerus [Tue, 1 Jul 2025 12:22:49 +0000 (15:22 +0300)]
i2c: designware: Add quirk for Intel Xe

The regmap is coming from the parent also in case of Xe
GPUs. Reusing the Wangxun quirk for that.

Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250701122252.2590230-3-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags while merging]

3 months agoi2c: designware: Use polling by default when there is no irq resource
Heikki Krogerus [Tue, 1 Jul 2025 12:22:48 +0000 (15:22 +0300)]
i2c: designware: Use polling by default when there is no irq resource

The irq resource itself can be used as a generic way to
determine when polling is needed.

This not only removes the need for special additional device
properties that would soon be needed when the platform may
or may not have the irq, but it also removes the need to
check the platform in the first place in order to determine
is polling needed or not.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250701122252.2590230-2-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/guc: Don't allocate temporary policies object
Michal Wajdeczko [Wed, 2 Jul 2025 14:25:04 +0000 (16:25 +0200)]
drm/xe/guc: Don't allocate temporary policies object

Since we are already using reusable buffer objects from the GuC
buffer cache, we can directly write into their CPU pointers and
spare unnecessary temporary allocation.

While around, also make sure to clear obtained buffer, to avoid
sending some stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
3 months agodrm/xe/pf: Print configuration KLVs using debug printer
Michal Wajdeczko [Thu, 3 Jul 2025 14:57:09 +0000 (16:57 +0200)]
drm/xe/pf: Print configuration KLVs using debug printer

While we print VF's configuration KLVs only under DEBUG_SRIOV
config, we should be doing it at debug level, not info level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com
3 months agodrm/xe/pf: Print runtime registers using debug printer
Michal Wajdeczko [Wed, 4 Jun 2025 19:00:20 +0000 (21:00 +0200)]
drm/xe/pf: Print runtime registers using debug printer

While we already print VF's runtime registers only under DEBUG_SRIOV
config, we should be still doing it at debug level, not info.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com
3 months agodrm/xe: Expose fan control and voltage regulator version
Raag Jadav [Wed, 9 Jul 2025 16:42:24 +0000 (22:12 +0530)]
drm/xe: Expose fan control and voltage regulator version

Add sysfs attributes for late binding features which expose bound version
to the user.

v2: Rework attribute and macro naming (Badal)
v3: Drop fancy formatting (Rodrigo)
v4: Form version string using local variables (Rodrigo)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe: Release runtime pm for error path of xe_devcoredump_read()
Shuicheng Lin [Mon, 7 Jul 2025 00:49:14 +0000 (00:49 +0000)]
drm/xe: Release runtime pm for error path of xe_devcoredump_read()

xe_pm_runtime_put() is missed to be called for the error path in
xe_devcoredump_read().
Add function description comments for xe_devcoredump_read() to help
understand it.

v2: more detail function comments and refine goto logic (Matt)

Fixes: c4a2e5f865b7 ("drm/xe: Add devcoredump chunking")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com
3 months agodrm/xe: Remove unused code in devcoredump_snapshot()
Shuicheng Lin [Mon, 7 Jul 2025 00:49:13 +0000 (00:49 +0000)]
drm/xe: Remove unused code in devcoredump_snapshot()

The deleted code is no longer needed because patch "drm/xe/guc: Plumb
GuC-capture into dev coredump" has removed the related usage code.
Remove the code to tidy up the function.

v2: s/bacause/because

Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com
3 months agodrm/xe/uc: Disable GuC communication on hardware initialization error
Zhanjun Dong [Mon, 7 Jul 2025 23:11:08 +0000 (19:11 -0400)]
drm/xe/uc: Disable GuC communication on hardware initialization error

Disable GuC communication on Xe micro controller hardware initialization
error.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com
3 months agodrm/xe/pm: Restore display pm if there is error after display suspend
Shuicheng Lin [Tue, 8 Jul 2025 03:54:25 +0000 (03:54 +0000)]
drm/xe/pm: Restore display pm if there is error after display suspend

xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there
is error with xe_bo_evict_all(), display pm should be restored.

Fixes: 51462211f4a9 ("drm/xe/pxp: add PXP PM support")
Fixes: cb8f81c17531 ("drm/xe/display: Make display suspend/resume work on discrete")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/ptl: Drop force_probe requirement
Matt Atwood [Mon, 7 Jul 2025 20:19:59 +0000 (13:19 -0700)]
drm/xe/ptl: Drop force_probe requirement

Panther Lake has proven to be stable through testing and use.

Remove the force_probe requirement and enable the platform by default.

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/ptl: Add HuC FW definition for PTL
Daniele Ceraolo Spurio [Thu, 26 Jun 2025 18:28:12 +0000 (11:28 -0700)]
drm/xe/ptl: Add HuC FW definition for PTL

Add the unversioned define for the PTL HuC FW.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-15-daniele.ceraolospurio@intel.com
3 months agodrm/xe/ptl: Add GuC FW definition for PTL
Daniele Ceraolo Spurio [Thu, 26 Jun 2025 18:28:11 +0000 (11:28 -0700)]
drm/xe/ptl: Add GuC FW definition for PTL

The first official GuC relase for PTL is 70.47.0, which maps to
API version 1.22.4.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-14-daniele.ceraolospurio@intel.com
3 months agodrm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2
Julia Filipchuk [Thu, 26 Jun 2025 18:28:10 +0000 (11:28 -0700)]
drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2

UAPI compatibility version 1.22.2

Resolves various bugs. Recommend newer version.

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com
3 months agodrm/xe/bmg: Add one additional PCI ID
Vodapalli, Ravi Kumar [Fri, 4 Jul 2025 10:35:27 +0000 (16:05 +0530)]
drm/xe/bmg: Add one additional PCI ID

One additional PCI ID is added in Bspec for BMG, Add it so that
driver recognizes this device with this new ID.

Bspec: 68090
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Vodapalli, Ravi Kumar <ravi.kumar.vodapalli@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250704103527.100178-1-ravi.kumar.vodapalli@intel.com
3 months agodrm/xe/bmg: fix compressed VRAM handling
Matthew Auld [Tue, 1 Jul 2025 10:39:50 +0000 (11:39 +0100)]
drm/xe/bmg: fix compressed VRAM handling

There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.

To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.

v2 (Jonathan): tweak the commit message

Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
3 months agoRevert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"
Matthew Brost [Wed, 2 Jul 2025 03:58:46 +0000 (20:58 -0700)]
Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"

This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371.

Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.

Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.

Cc: stable@vger.kernel.org
Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe/vf: Make multi-GT migration less error prone
Tomasz Lis [Mon, 30 Jun 2025 15:21:55 +0000 (17:21 +0200)]
drm/xe/vf: Make multi-GT migration less error prone

There is a remote chance that after migration,  some GTs will not
send the MIGRATED interrupt, or due to current VF KMD state the
interrupt will not lead to marking the GT for recovery.

Requiring IRQs from all GTs before starting migration introduces
the possibility that the process will get stalled due to one GuC.

One could argue it is also waste of time to wait for all IRQs,
but we should get them all IRQs as soon as VGPU starts, so that's
not really an impactful argument.

Still, not waiting for all GTs makes it easier to handle situations:
* where one GuC IRQ is missing
* where state before probe is unclean - getting MIGRATED IRQ as soon
  as interrupts are enabled
* where multiple migrations happen close to each other

To help with these cases, this patch alters the post-migration
recovery so that recovery task is started as soon as one GuC IRQ
is handled, and other GTs are included in recovery later as the
subsequent IRQs are serviced.

The post-migration recovery can now be called for any selection of
GTs, and it will perform recovery on all GTs for which IRQs have
arrived, even multiple times if necessary.

v2: Typos and style fixes
v3: Transferring gt_flags by value rather than reference to last
 function where it is used

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Allocate PF queue size on pow2 boundary
Matthew Brost [Wed, 2 Jul 2025 21:35:11 +0000 (14:35 -0700)]
drm/xe: Allocate PF queue size on pow2 boundary

CIRC_SPACE does not work unless the size argument is a power of 2,
allocate PF queue size on power of 2 boundary.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size")
Fixes: 29582e0ea75c ("drm/xe: Add page queue multiplier")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
3 months agodrm/xe/pf: Clear all LMTT pages on alloc
Michal Wajdeczko [Tue, 1 Jul 2025 22:00:52 +0000 (00:00 +0200)]
drm/xe/pf: Clear all LMTT pages on alloc

Our LMEM buffer objects are not cleared by default on alloc
and during VF provisioning we only setup LMTT PTEs for the
actually provisioned LMEM range. But beyond that valid range
we might leave some stale data that could either point to some
other VFs allocations or even to the PF pages.

Explicitly clear all new LMTT page to avoid the risk that a
malicious VF would try to exploit that gap.

While around add asserts to catch any undesired PTE overwrites
and low-level debug traces to track LMTT PT life-cycle.

Fixes: b1d204058218 ("drm/xe/pf: Introduce Local Memory Translation Table")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
3 months agodrm/xe/xe_pmu: Validate gt in event supported
Riana Tauro [Mon, 30 Jun 2025 09:37:41 +0000 (15:07 +0530)]
drm/xe/xe_pmu: Validate gt in event supported

Validate gt instead of checking gt_id is lesser
than max gts per tile

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/xe_query: Use separate iterator while filling GT list
Matt Roper [Tue, 1 Jul 2025 20:13:28 +0000 (13:13 -0700)]
drm/xe/xe_query: Use separate iterator while filling GT list

The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs
being iterated over, and may skip over values if a GT is not present on
the device.  Use a separate iterator for GT list array assignments to
ensure that the array will be filled properly on future platforms where
index in the GT query list may not match the uapi ID.

v2:
 - Include the missing increment of the iterator.  (Jonathan)

Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe: Don't compare GT ID to GT count when determining valid GTs
Matt Roper [Tue, 1 Jul 2025 20:13:27 +0000 (13:13 -0700)]
drm/xe: Don't compare GT ID to GT count when determining valid GTs

On current platforms with multiple GTs, all of the GT IDs are
consecutive; as a result we know that the GT IDs range from 0 to
gt_count-1 and can determine if a GT ID is valid by comparing against
the count.  The consecutive nature of GT IDs may not hold true on future
platforms if/when we have platforms that are both multi-tile and have
multiple GTs within each tile.  Once such platforms exist, it's quite
possible that we could wind up with something like a GT list composed of
IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with
media only on the second tile).

To future-proof the code we should stop comparing against the GT count
to determine whether a GT ID is valid or not.  Instead we should do an
actual lookup of the ID to determine whether the GT exists.  This also
means that our GT loop macro should not end at the GT count, but should
rather examine the entire space up to (# of tiles) * (max GT per tile)
to ensure it doesn't stop prematurely.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe: Assign GT IDs properly on multi-tile + multi-GT platforms
Matt Roper [Tue, 1 Jul 2025 20:13:26 +0000 (13:13 -0700)]
drm/xe: Assign GT IDs properly on multi-tile + multi-GT platforms

Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive
characteristics on all of our platforms today, this may not always be
true.  Assign GT IDs according to xe->info.max_gt_per_tile in a way that
should work even if future platforms have different configurations.

This patch should not change the behavior of current platforms; it only
future-proofs for potential future designs.

v2:
 - Re-calculate gt_count if tile count gets reduced by MTCFG.  (PVC CI)

Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/tests/pci: Ensure all platforms have a valid GT/tile count
Matt Roper [Tue, 1 Jul 2025 20:13:25 +0000 (13:13 -0700)]
drm/xe/tests/pci: Ensure all platforms have a valid GT/tile count

Add a simple kunit test to ensure each platform's GT per tile count is
non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition.

We need to move 'struct xe_subplatform_desc' from the .c file to the
types header to ensure it is accessible from the kunit test.

v2:
 - Rebase on latest xe_pci test rework from Michal and convert to
   a parameterized test that runs on each PCI ID supported by the
   driver.

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe: Track maximum GTs per tile on a per-platform basis
Matt Roper [Tue, 1 Jul 2025 20:13:24 +0000 (13:13 -0700)]
drm/xe: Track maximum GTs per tile on a per-platform basis

Today all of our platforms fall into one of three cases:
 * Single tile platforms with a single (primary) GT
 * Single tile platforms with two GTs (primary + media)
 * Two-tile platforms with a single GT (primary) in each

Our numbering of GTs has been a bit inconsistent between platforms
(e.g., GT1 is the media GT on some platforms, but the second tile's
primary GT on others).  In the future we'll likely have platforms that
are both multi-tile and multi-GT, which will make the situation more
confusing.  We could also wind up with more than just two types of GTs
at some point in the future.

Going forward we should standardize the way we assign uapi GT IDs to
internal GT structures.  Let's declare that for userspace GT ID n,

  GT[n]'s tile             = n / (max gt per tile)
  GT[n]'s slot within tile = n % (max gt per tile)

We don't want the GT numbering to change for any of our current
platforms since the current IDs are part of our ABI contract with
userspace so this means we should track the 'max gt per tile' value on a
per-platform basis rather than just using a single value across the
driver.  Encode this into device descriptors in xe_pci.c and use the
per-platform number for various checks in the code.  Constant
XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms
for easy of sizing array allocations.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe: Export xe_step_name for kunit tests
Matt Roper [Tue, 1 Jul 2025 20:13:23 +0000 (13:13 -0700)]
drm/xe: Export xe_step_name for kunit tests

xe_step_name() is used by xe_assert(), so adding assertions to functions
like xe_device_get_gt() will result in

  ERROR: modpost: "xe_step_name" [drivers/gpu/drm/xe/tests/xe_test.ko] undefined!

while building the kunit tests.  Export xe_step_name to avoid these
build failures when adding assertions.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-11-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/hw_engine_group: Fix potential leak
Michal Wajdeczko [Fri, 27 Jun 2025 18:41:43 +0000 (20:41 +0200)]
drm/xe/hw_engine_group: Fix potential leak

If we fail to allocate a workqueue we will leak kzalloc'ed group
object since it was designed to be kfree'ed in the drmm cleanup
action, but we didn't have a chance to register this action yet.

To avoid this leak allocate a group object using drmm_kzalloc()
and start using predefined drmm action to release the workqueue.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250627184143.1480-1-michal.wajdeczko@intel.com
3 months agodrm/xe: Consolidate LRC offset calculations
Tvrtko Ursulin [Mon, 30 Jun 2025 12:46:47 +0000 (13:46 +0100)]
drm/xe: Consolidate LRC offset calculations

Attempt to consolidate the LRC offsets calculations by aligning the
recently added wa_bb_offset with the naming scheme in the file and
also change the size stored in struct xe_lrc to not include the ring
buffer.

The former makes it somewhat visually easier to follow the layout of the
various logical blocks stored in the LRC bo, while the latter reduces the
number of sprinkled around calculations.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250630124711.8209-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Fix typo in Kconfig
Maarten Lankhorst [Fri, 27 Jun 2025 07:41:19 +0000 (09:41 +0200)]
drm/xe: Fix typo in Kconfig

doubut -> doubt.

Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250627074119.347826-1-dev@lankhorst.se
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Allow dropping kunit dependency as built-in
Harry Austen [Fri, 27 Jun 2025 20:30:35 +0000 (13:30 -0700)]
drm/xe: Allow dropping kunit dependency as built-in

Fix Kconfig symbol dependency on KUNIT, which isn't actually required
for XE to be built-in. However, if KUNIT is enabled, it must be built-in
too.

Fixes: 08987a8b6820 ("drm/xe: Fix build with KUNIT=m")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Harry Austen <hpausten@protonmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Drop bo->size
Matthew Brost [Wed, 25 Jun 2025 14:41:28 +0000 (07:41 -0700)]
drm/xe: Drop bo->size

bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.

v2:
 - Fix typo in kernel doc (Ashutosh)
 - Fix kunit (CI)
 - Fix line wrap (Checkpatch)
v3:
 - Fix sriov build (CI)
v4:
 - Fix display build (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
3 months agodrm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization
Daniele Ceraolo Spurio [Wed, 25 Jun 2025 20:54:07 +0000 (13:54 -0700)]
drm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization

The Dynamic Inhibit Context Switch is an optimization aimed at reducing
the amount of time the HW is stuck waiting on an unsatisfied semaphore.
When this optimization is enabled, the GuC will dynamically modify the
CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of
LRCs to enable immediate switching out on an unsatisfied semaphore wait
when multiple contexts are competing for time on the same engine.

This feature is available on recent HW from GuC 70.40.1 onwards and it
is enabled via a per-VF feature opt-in.

v2: rebase
v3: switch to using guc_buf_cache instead of dedicated alloc
v4: add helper to check for feature availability (Michal), don't enable
if multi-lrc is possible.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com
3 months agodrm/xe/guc: Enable extended CAT error reporting
Daniele Ceraolo Spurio [Wed, 25 Jun 2025 20:54:06 +0000 (13:54 -0700)]
drm/xe/guc: Enable extended CAT error reporting

On newer HW (Xe2 onwards + PVC) it is possible to get extra information
when a CAT error occurs, specifically a dword reporting the error type.
To enable this extra reporting, we need to opt-in with the GuC, which is
done via a specific per-VF feature opt-in H2G.

On platforms where the HW does not support the extra reporting, the GuC
will set the type to 0xdeadbeef, so we can keep the code simple and
opt-in to the feature on every platform and then just discard the data
if it is invalid.

Note that on native/PF we're guaranteed that the opt in is available
because we don't support any GuC old enough to not have it, but if we're
a VF we might be running on a non-XE PF with an older GuC, so we need to
handle that case. We can re-use the invalid type above to handle this
scenario the same way as if the feature was not supported in HW.

Given that this patch is the first user of the guc_buf_cache on native
and VF, it also extends that feature to non-PF use-cases.

v2: simpler print for the error type (John), rebase
v3: use guc_buf_cache instead of new alloc, simpler doc (Michal)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com
3 months agodrm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM
Jia Yao [Thu, 12 Jun 2025 22:46:20 +0000 (22:46 +0000)]
drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM

According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE.
The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the
condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum
valid value for x is 0x1FE, not 0x1FF.

v2
 - Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej)

v3
 - Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas)

Bspec: 60246

Fixes: 9c44fd5f6e8a ("drm/xe: Add migrate layer functions for SVM support")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian3 Nguyen <brian3.nguyen@intel.com>
Cc: Alex Zuo <alex.zuo@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Rename xe_uc_init_hw to xe_uc_load_hw
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:10 +0000 (12:49 +0200)]
drm/xe: Rename xe_uc_init_hw to xe_uc_load_hw

It feels to me like load is closer to the intention than init_hw.

It makes the init calls slightly less confusing to me. :)

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-24-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Remove xe_uc_fini_hw
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:09 +0000 (12:49 +0200)]
drm/xe: Remove xe_uc_fini_hw

xe_uc_init_hw() is called multiple times from xe_gt.c,
and that makes the name xe_uc_fini_hw(), called for a different
reason in xe_guc.c confusing.

Remove it and inline the xe_uc_sanitize_reset into xe_guc.c directly.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-23-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Remove xe_uc_init_hwconfig()
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:08 +0000 (12:49 +0200)]
drm/xe: Remove xe_uc_init_hwconfig()

This function is called immediately after xe_uc_init(), so just put it
there instead.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-22-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Move xe_ttm_sys_mgr_init() downwards.
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:07 +0000 (12:49 +0200)]
drm/xe: Move xe_ttm_sys_mgr_init() downwards.

Now that all previous allocations are gone, ensure no new allocations
will ever be done before xe_display_init_early(), by moving the call
that allows allocations downwards.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Split init of xe_gt_init_hwconfig to xe_gt_init and *_early
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:06 +0000 (12:49 +0200)]
drm/xe: Split init of xe_gt_init_hwconfig to xe_gt_init and *_early

Now that we added the separate step of initialising GUC in
xe_gt_init_early, it should be ok to initialise the minimum during early
init, and the rest after allocations are allowed.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Rename gt_init sub-functions
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:05 +0000 (12:49 +0200)]
drm/xe: Rename gt_init sub-functions

s/gt_fw_domain_init/gt_init_with_gt_forcewake()/
s/all_fw_domain_init/gt_init_with_all_forcewake()/

Clarify that the functions are the part of gt_init() that are called
with the respective power domains held. all_domain() of course only
works after discovering and initialisation of force_wake on all engines,
that's why the split is needed in the first place.

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-19-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Only dump PAT when xe_hw_engines_init_early fails
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:04 +0000 (12:49 +0200)]
drm/xe: Only dump PAT when xe_hw_engines_init_early fails

After discussion with Lucas De Marchi, it turns out that is the
specific caller requiring a dump. This allows us to cleanup
gt_init in a bit.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-18-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Make it possible to read instance0 MCR registers after xe_gt_mcr_init_early
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:03 +0000 (12:49 +0200)]
drm/xe: Make it possible to read instance0 MCR registers after xe_gt_mcr_init_early

After mcr_init_early, we need to be able to do VRAM and CCS probing
without hwconfig probe. Fortunately the relevant registers are all
instance 0, which fortunately means no dependencies on further initialization
is required.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-17-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Simplify GuC early initialization
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:02 +0000 (12:49 +0200)]
drm/xe: Simplify GuC early initialization

Add a 2-stage GuC init. An early one for everything that is needed
for VF, and a full one that loads GuC and is allowed to do allocations.

Link: https://lore.kernel.org/r/20250619104858.418440-16-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe/sriov: Move VF bootstrap and query_config to vf_guc_init
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:01 +0000 (12:49 +0200)]
drm/xe/sriov: Move VF bootstrap and query_config to vf_guc_init

We want to split up GUC init to an alloc and noalloc part to keep the
init path the same for VF and !VF as much as possible.

Everything in vf_guc_init should be done as early as possible, otherwise
VRAM probing becomes impossible.

Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up
the init in xe_device slightly.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
3 months agodrm/xe: Defer memirq init until needed
Maarten Lankhorst [Thu, 19 Jun 2025 10:49:00 +0000 (12:49 +0200)]
drm/xe: Defer memirq init until needed

memirqs require allocations into GGTT, which we cannot use until
after display is enabled.

Now that the initialisation of interrupts is postponed, move memirq
init too.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-14-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
4 months agodrm/xe: Implement and use the drm_pagemap populate_mm op
Thomas Hellström [Thu, 19 Jun 2025 13:40:35 +0000 (15:40 +0200)]
drm/xe: Implement and use the drm_pagemap populate_mm op

Add runtime PM since we might call populate_mm on a foreign device.

v3:
- Fix a kerneldoc failure (Matt Brost)
- Revert the bo type change from device to kernel (Matt Brost)
v4:
- Add an assert in xe_svm_alloc_vram (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-4-thomas.hellstrom@linux.intel.com
4 months agodrm/pagemap: Add a populate_mm op
Thomas Hellström [Thu, 19 Jun 2025 13:40:34 +0000 (15:40 +0200)]
drm/pagemap: Add a populate_mm op

Add an operation to populate a part of a drm_mm with device
private memory. Clarify how migration using it is intended
to work.

v3:
- Kerneldoc fixes and updates (Matt Brost).
v4:
- More kerneldoc fixes. Rebase.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-3-thomas.hellstrom@linux.intel.com
4 months agodrm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
Matthew Brost [Thu, 19 Jun 2025 13:40:33 +0000 (15:40 +0200)]
drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap

The migration functionality and track-keeping of per-pagemap VRAM
mapped to the CPU mm is not per GPU_vm, but rather per pagemap.
This is also reflected by the functions not needing the drm_gpusvm
structures. So move to drm_pagemap.

With this, drm_gpusvm shouldn't really access the page zone-device-data
since its meaning is internal to drm_pagemap. Currently it's used to
reject mapping ranges backed by multiple drm_pagemap allocations.
For now, make the zone-device-data a void pointer.

Alter the interface of drm_gpusvm_migrate_to_devmem() to ensure we don't
pass a gpusvm pointer.

Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.

Matt is listed as author of this commit since he wrote most of the code,
and it makes sense to retain his git authorship.
Thomas mostly moved the code around.

v3:
- Kerneldoc fixes (CI)
- Don't update documentation about how the drm_pagemap
  migration should be interpreted until upcoming
  patches where the functionality is implemented.
  (Matt Brost)
v4:
- More kerneldoc fixes around timeslice_ms
  (Himal Ghimiray, Matt Brost)
v6:
- Fix an uninitialized pagemap pointer (CI)

Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-2-thomas.hellstrom@linux.intel.com
4 months agodrm/xe: Do not wedge device on killed exec queues
Matthew Brost [Tue, 24 Jun 2025 17:41:03 +0000 (10:41 -0700)]
drm/xe: Do not wedge device on killed exec queues

When a user closes an exec queue or interrupts an app with Ctrl-C,
this does not warrant wedging the device in mode 2.

Avoid this by skipping the wedge check for killed exec queues in
the TDR and LR exec queue cleanup worker.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com
4 months agoRevert "drm/xe/ptl: Apply Wa_16026007364"
Daniele Ceraolo Spurio [Wed, 25 Jun 2025 00:12:03 +0000 (17:12 -0700)]
Revert "drm/xe/ptl: Apply Wa_16026007364"

This reverts commit 3972872e459d812ab5e481a231a6066cf4f4d0f4.

There are several things wrong with the way this WA was implemented:

- The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't
  apply it unconditionally.

- The KLV requires 2 DWs of data, which are not currently provided.

The GuC currently ignores any unknown KLVs, so on versions older that
70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts
to parse the KLV and fails due to the missing data, causing a GuC load
abort.

Given that 70.47.0 is the first GuC version approved for public release
for PTL, let's revert this patch so it doesn't cause the GuC load to
fail with that blob. We can then re-apply it properly fixed after the
GuC definition is merged, which will also have the added benefit of
running the KLV addition through CI with the right GuC version.

Fixes: 3972872e459d ("drm/xe/ptl: Apply Wa_16026007364")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: sanirban <sk.anirban@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/uapi: Correct sync type definition in comments
Shuicheng Lin [Sun, 8 Jun 2025 23:01:33 +0000 (23:01 +0000)]
drm/xe/uapi: Correct sync type definition in comments

Commit 37d078e51b4c ("drm/xe/uapi: Split xe_sync types from flags") renamed some DRM_XE_SYNC_*
defines but later commits kept using the old names. Correct them with the new definition.

v2: correct fixes tag and update commit message to explain why (Lucas)

Fixes: 9329f0667215 ("drm/xe/uapi: Use LR abbrev for long-running vms")
Fixes: 4b437893a826 ("drm/xe/uapi: More uAPI documentation additions and cosmetic updates")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Zongyao Bai <zongyao.bai@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250608230133.1250849-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/guc: Explicitly exit CT safe mode on unwind
Michal Wajdeczko [Thu, 12 Jun 2025 22:09:37 +0000 (00:09 +0200)]
drm/xe/guc: Explicitly exit CT safe mode on unwind

During driver probe we might be briefly using CT safe mode, which
is based on a delayed work, but usually we are able to stop this
once we have IRQ fully operational.  However, if we abort the probe
quite early then during unwind we might try to destroy the workqueue
while there is still a pending delayed work that attempts to restart
itself which triggers a WARN.

This was recently observed during unsuccessful VF initialization:

 [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
 [ ] ------------[ cut here ]------------
 [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
 [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
 [ ] RIP: 0010:__queue_work+0x287/0x710
 [ ] Call Trace:
 [ ]  delayed_work_timer_fn+0x19/0x30
 [ ]  call_timer_fn+0xa1/0x2a0

Exit the CT safe mode on unwind to avoid that warning.

Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com
4 months agodrm/xe: Process deferred GGTT node removals on device unwind
Michal Wajdeczko [Thu, 12 Jun 2025 22:09:36 +0000 (00:09 +0200)]
drm/xe: Process deferred GGTT node removals on device unwind

While we are indirectly draining our dedicated workqueue ggtt->wq
that we use to complete asynchronous removal of some GGTT nodes,
this happends as part of the managed-drm unwinding (ggtt_fini_early),
which could be later then manage-device unwinding, where we could
already unmap our MMIO/GMS mapping (mmio_fini).

This was recently observed during unsuccessful VF initialization:

 [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes)
 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes)
 [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin
 [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes)
 [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes)
 [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes)
 [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes)

and this was leading to:

 [ ] BUG: unable to handle page fault for address: ffffc900058162a0
 [ ] #PF: supervisor write access in kernel mode
 [ ] #PF: error_code(0x0002) - not-present page
 [ ] Oops: Oops: 0002 [#1] SMP NOPTI
 [ ] Tainted: [W]=WARN
 [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe]
 [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe]
 [ ] Call Trace:
 [ ]  <TASK>
 [ ]  xe_ggtt_clear+0xb0/0x270 [xe]
 [ ]  ggtt_node_remove+0xbb/0x120 [xe]
 [ ]  ggtt_node_remove_work_func+0x30/0x50 [xe]
 [ ]  process_one_work+0x22b/0x6f0
 [ ]  worker_thread+0x1e8/0x3d

Add managed-device action that will explicitly drain the workqueue
with all pending node removals prior to releasing MMIO/GSM mapping.

Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com
4 months agodrm/xe: move DPT l2 flush to a more sensible place
Matthew Auld [Fri, 6 Jun 2025 10:45:48 +0000 (11:45 +0100)]
drm/xe: move DPT l2 flush to a more sensible place

Only need the flush for DPT host updates here. Normal GGTT updates don't
need special flush.

Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Move DSB l2 flush to a more sensible place
Maarten Lankhorst [Fri, 6 Jun 2025 10:45:47 +0000 (11:45 +0100)]
drm/xe: Move DSB l2 flush to a more sensible place

Flushing l2 is only needed after all data has been written.

Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/bmg: Update Wa_22019338487
Vinay Belgaumkar [Wed, 18 Jun 2025 18:50:01 +0000 (11:50 -0700)]
drm/xe/bmg: Update Wa_22019338487

Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.

v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
 - User requests wait via wait_var_event_timeout (Lucas)
 - Close races on flush + user requests (Lucas)
 - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
   rather than root gt (Lucas)
v4:
 - Only apply the freq reducing part if a TDF is needed: L2 flush trumps
   the need for waiting a lower frequency

Fixes: aaa08078e725 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Split xe_device_td_flush()
Lucas De Marchi [Wed, 18 Jun 2025 18:50:00 +0000 (11:50 -0700)]
drm/xe: Split xe_device_td_flush()

xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/xe_guc_pc: Lock once to update stashed frequencies
Lucas De Marchi [Wed, 18 Jun 2025 18:49:59 +0000 (11:49 -0700)]
drm/xe/xe_guc_pc: Lock once to update stashed frequencies

pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times
to stash the current frequencies. It's not a problem since
xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later
in the init sequence. However, now that we have _locked() variants for
this functions, use them and avoid potential issues when called from
other places or using the same pattern.

While at it, prefer and early return for the WA check to reduce
indentation.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/guc_pc: Add _locked variant for min/max freq
Lucas De Marchi [Wed, 18 Jun 2025 18:49:58 +0000 (11:49 -0700)]
drm/xe/guc_pc: Add _locked variant for min/max freq

There are places in which the getters/setters are called one after the
other causing a multiple lock()/unlock(). These are not currently a
problem since they are all happening from the same thread, but there's a
race possibility as calls are added outside of the early init when the
max/min and stashed values need to be correlated.

Add the _locked() variants to prepare for that.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/nvm: add support for non-posted erase
Reuven Abliyev [Tue, 17 Jun 2025 14:51:58 +0000 (17:51 +0300)]
drm/xe/nvm: add support for non-posted erase

Erase command is slow on discrete graphics storage
and may overshot PCI completion timeout.
BMG introduces the ability to have non-posted erase.
Add driver support for non-posted erase with polling
for erase completion.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Reuven Abliyev <reuven.abliyev@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-9-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/nvm: add support for access mode
Alexander Usyskin [Tue, 17 Jun 2025 14:51:57 +0000 (17:51 +0300)]
drm/xe/nvm: add support for access mode

Check NVM access mode from GSC FW status registers
and overwrite access status read from SPI descriptor, if needed.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-8-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/nvm: add on-die non-volatile memory device
Alexander Usyskin [Tue, 17 Jun 2025 14:51:56 +0000 (17:51 +0300)]
drm/xe/nvm: add on-die non-volatile memory device

Enable access to internal non-volatile memory on DGFX
with GSC/CSC devices via a child device.
The nvm child device is exposed via auxiliary bus.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agomtd: intel-dg: align 64bit read and write
Alexander Usyskin [Tue, 17 Jun 2025 14:51:55 +0000 (17:51 +0300)]
mtd: intel-dg: align 64bit read and write

GSC NVM controller HW errors on quad access overlapping 1K border.
Align 64bit read and write to avoid readq/writeq over 1K border.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-6-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agomtd: intel-dg: register with mtd
Alexander Usyskin [Tue, 17 Jun 2025 14:51:54 +0000 (17:51 +0300)]
mtd: intel-dg: register with mtd

Register the on-die nvm device with the mtd subsystem.
Refcount nvm object on _get and _put mtd callbacks.
For erase operation address and size should be 4K aligned.
For write operation address and size has to be 4bytes aligned.

CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
CC: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Co-developed-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Tomas Winkler <tomasw@gmail.com>
Co-developed-by: Vitaly Lubart <lubvital@gmail.com>
Signed-off-by: Vitaly Lubart <lubvital@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-5-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agomtd: intel-dg: implement access functions
Alexander Usyskin [Tue, 17 Jun 2025 14:51:53 +0000 (17:51 +0300)]
mtd: intel-dg: implement access functions

Implement read(), erase() and write() functions.

CC: Lucas De Marchi <lucas.demarchi@intel.com>
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Co-developed-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Tomas Winkler <tomasw@gmail.com>
Co-developed-by: Vitaly Lubart <lubvital@gmail.com>
Signed-off-by: Vitaly Lubart <lubvital@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-4-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agomtd: intel-dg: implement region enumeration
Alexander Usyskin [Tue, 17 Jun 2025 14:51:52 +0000 (17:51 +0300)]
mtd: intel-dg: implement region enumeration

In intel-dg, there is no access to the spi controller,
the information is extracted from the descriptor region.

CC: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Co-developed-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-3-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agomtd: add driver for intel graphics non-volatile memory device
Alexander Usyskin [Tue, 17 Jun 2025 14:51:51 +0000 (17:51 +0300)]
mtd: add driver for intel graphics non-volatile memory device

Add auxiliary driver for intel discrete graphics
non-volatile memory device.

CC: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Co-developed-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Tomas Winkler <tomasw@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-2-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agoMerge drm/drm-next into drm-xe-next
Rodrigo Vivi [Mon, 23 Jun 2025 17:14:13 +0000 (13:14 -0400)]
Merge drm/drm-next into drm-xe-next

Catch up on i915 changes to be able to include mtd
driver for both xe and i915.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agoMerge tag 'drm-intel-next-2025-06-18' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Mon, 23 Jun 2025 00:49:25 +0000 (10:49 +1000)]
Merge tag 'drm-intel-next-2025-06-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

drm/i915 feature pull for v6.17:

Features and functionality:
- Add support for DSC fractional link bpp on DP MST (Imre)
- Add support for simultaneous Panel Replay and Adaptive Sync (Jouni)
- Add support for PTL+ double buffered LUT registers (Chaitanya, Ville)
- Add PIPEDMC event handling in preparation for flip queue (Ville)

Refactoring and cleanups:
- Rename lots of DPLL interfaces to unify them (Suraj)
- Allocate struct intel_display dynamically (Jani)
- Abstract VLV IOSF sideband better (Jani)
- Use str_true_false() helper (Yumeng Fang)
- Refactor DSB code in preparation for flip queue (Ville)
- Use drm_modeset_lock_assert_held() instead of open coding (Luca)
- Remove unused arg from skl_scaler_get_filter_select() (Luca)
- Split out a separate display register header (Jani)
- Abstract DRAM detection better (Jani)
- Convert LPT/WPT SBI sideband to struct intel_display (Jani)

Fixes:
- Fix DSI HS command dispatch with forced pipeline flush (Gareth Yu)
- Fix BMG and LNL+ DP adaptive sync SDP programming (Ankit)
- Fix error path for xe display workqueue allocation (Haoxiang Li)
- Disable DP AUX access probe where not required (Imre)
- Fix DKL PHY access if the port is invalid (Luca)
- Fix PSR2_SU_STATUS access on ADL+ (Jouni)
- Add sanity checks for porch and sync on BXT/GLK DSI (Ville)

DRM core changes:
- Change AUX DPCD access probe address (Imre)
- Refactor EDID quirks, amd make them available to drivers (Imre)
- Add quirk for DPCD access probe (Imre)
- Add DPCD definitions for Panel Replay capabilities (Jouni)

Merges:
- Backmerges to sync with v6.15-rcs and v6.16-rc1 (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/fff9f231850ed410bd81b53de43eff0b98240d31@intel.com
4 months agodrm/xe/ptl: Apply Wa_16026007364
sanirban [Thu, 19 Jun 2025 13:34:14 +0000 (19:04 +0530)]
drm/xe/ptl: Apply Wa_16026007364

As part of this WA GuC will save and restore value of two XE3_Media
control registers that were not included in the HW power context.

v2:
  - Update klv name (Badal)

Signed-off-by: sanirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agoMerge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kerne...
Dave Airlie [Fri, 20 Jun 2025 01:33:41 +0000 (11:33 +1000)]
Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.17:

UAPI Changes:
- Add Task Information for the wedge API

Cross-subsystem Changes:

Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests

Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
4 months agoMerge tag 'drm-xe-next-2025-06-18' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Thu, 19 Jun 2025 23:07:49 +0000 (09:07 +1000)]
Merge tag 'drm-xe-next-2025-06-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- Expose media OA units (Ashutosh)

Merge:
 - Restore GuC submit UAF fix around queue destruction
   accidentally removed in a drm-xe-fixes merge (Auld)

Core Changes:
- drm/gpusvm: Introduce devmem_only flag for allocation (Himal)
- drm/gpusvm: Add timeslicing support to GPU SVM (Brost)

Driver Changes:
 - Make gem shrinker drm managed (Thomas)
 - SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz)
 - Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele)
 - Prefetch Support for svm ranges (Himal, Brost)
 - Don't allocate managed BO for each policy change (Michal)
 - Simplify and fix diff calculation in GuC submit (Lucas)
 - Track FAST_REQ GuC H2Gs to report where errors came from (John)
 - SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr)
 - Check if all domains awake for MOCS dump (Tejas)
 - Make creation of SLPC debugfs files conditional (Aradhya)
 - Default auto_link_downgrade status to false (Aradhya)
 - Use xe_mmio_read32() to read mtcfg register (Shuicheng)
 - Updates in PCI ID tables (Atwood, Shekhar)
 - SRIOV VF:  Fail migration recovery if fixups needed but not supported (Tomasz)
 - Add missing documentation around freq and RPa (Rodrigo)
 - Some other SVM related fixes (Himal, Auld, Brost, Maarten)
 - Allow to trigger GT resets using debugfs writes (Michal)
 - Optimise CCS case for WB pages (Auld)
 - Create LRC BO without VM (Niranjana)
 - Initialize MOCS index early (Bala)
 - HWMON fixes for BMG (Karthik, Lucas)
 - Drop redundant conversion to bool (Raag)
 - Rework eviction rejection of bound external bos (Thomas)
 - Stop re-submitting signalled jobs (Auld)
 - Small fixes and cleanups for PXP (Daniele)
 - Convert some print messages to GT-oriented ones (Michal)
 - Resend potentially lost GuC H2G MMIO request (Michal)
 - Add configfs to load with fewer engines (Lucas)
 - Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej)
 - SRIOV VF: Small updates around GGTT handling (Michal)
 - Make VMA tile_present, tile_invalidated access rules clear (Brost)
 - Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin)
 - Fixes for VF GuC version (Daniele)
 - Don't store the xe device pointer inside xe_ttm_tt (Dave)
 - Small improvements in topology code (Michal)
 - Stop relying on GGTT internals (Maarten)
 - GSM size should be constant on most platforms (Roper)
 - Reorder 'Get pages failed' message (Brost)
 - WA BB related fixes and improvements (Lucas, Brost)
 - Fix early wedge on GuC load failure (Daniele)
 - Add helper function to inject fault into ct_dead_capture (Satyanarayana)
 - Determine ATS / PTA programming during early sw init (Roper)
 - Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper)
 - Fix kconfig prompt (Lucas)
 - Convert xe_pci tests to parametrized tests (Michal)
 - Do not kill VM in PT code on -ENODATA (Brost)
 - Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost)
 - Enable media OA (Ashutosh)
 - GuC log level tuning (Lucas)
 - Add xe_vm_has_valid_gpu_mapping helper (Brost)
 - Opportunistically skip TLB invalidaion on unbind (Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aFMb_NVF_oCW7UVl@intel.com
4 months agodrm/xe: Add Wildcat Lake device IDs to PTL list
Matt Roper [Fri, 13 Jun 2025 19:31:43 +0000 (01:01 +0530)]
drm/xe: Add Wildcat Lake device IDs to PTL list

Introduce wildcat lake device Id.
Wildcat Lake uses slightly different graphics and media IP versions
than Panther Lake, but can still be treated as PTL for general driver
flows.

Bspec: 73951
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-7-dnyaneshwar.bhadane@intel.com
4 months agodrm/xe/xe3: Add support for media IP version 30.02
Matt Roper [Fri, 13 Jun 2025 19:31:40 +0000 (01:01 +0530)]
drm/xe/xe3: Add support for media IP version 30.02

Media version 30.02 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.

-v2: Extend the range in existing WA entry (Bala)
-v3: Revert v2, Do not extend the range for the time being(Matt)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-4-dnyaneshwar.bhadane@intel.com
4 months agodrm/xe/xe3: Add support for graphics IP version 30.03
Matt Roper [Fri, 13 Jun 2025 19:31:39 +0000 (01:01 +0530)]
drm/xe/xe3: Add support for graphics IP version 30.03

Graphics version 30.03 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.

-v2: Merge and extend the WA onto existing entry (Bala)
-v3: Revert v2's feedback changes and keep entry saparate (Matt).

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@inte.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-3-dnyaneshwar.bhadane@intel.com
4 months agodrm/xe/hwmon: Fix xe_hwmon_power_max_write
Karthik Poosa [Tue, 17 Jun 2025 12:00:30 +0000 (17:30 +0530)]
drm/xe/hwmon: Fix xe_hwmon_power_max_write

Prevent other bits of mailbox power limit from being overwritten with 0.
This issue was due to a missing read and modify of current power limit,
before setting a requested mailbox power limit, which is added in this
patch.

v2:
 - Improve commit message. (Anshuman)

v3:
 - Rebase.
 - Rephrase commit message. (Riana)
 - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit()
   i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal)
 - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits.
 - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits
   writes use xe_hwmon_pcode_rmw_power_limit() only.

v4:
 - Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever
   applicable. (Riana)

Fixes: 7596d839f6228 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/format-helper: Move drm_fb_build_fourcc_list() to sysfb helpers
Thomas Zimmermann [Mon, 16 Jun 2025 08:37:06 +0000 (10:37 +0200)]
drm/format-helper: Move drm_fb_build_fourcc_list() to sysfb helpers

Only sysfb drivers use drm_fb_build_fourcc_list(). Move the function
to sysfb helpers and rename it accordingly. Update drivers and tests.

v3:
- update naming in tests
v2:
- select DRM_SYSFB_HELPER (kernel test robot)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20250616083846.221396-4-tzimmermann@suse.de
4 months agodrm/tests: Test drm_fb_build_fourcc_list() in separate test suite
Thomas Zimmermann [Mon, 16 Jun 2025 08:37:05 +0000 (10:37 +0200)]
drm/tests: Test drm_fb_build_fourcc_list() in separate test suite

Only sysfb drivers use drm_fb_build_fourcc_list(). The helper will
be moved from format helpers to sysfb helpers. Moving the related
tests to their own test suite.

v3:
- rename tests according to filename (José)
v2:
- rename filename to match tested code (Maxime)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250616083846.221396-3-tzimmermann@suse.de
4 months agodrm/tests: Do not use drm_fb_blit() in format-helper tests
Thomas Zimmermann [Mon, 16 Jun 2025 08:37:04 +0000 (10:37 +0200)]
drm/tests: Do not use drm_fb_blit() in format-helper tests

Export additional helpers from the format-helper library and open-code
drm_fb_blit() in tests. Prepares for the removal of drm_fb_blit(). Only
sysfb drivers use drm_fb_blit(). The function will soon be removed from
format helpers and be refactored within sysfb helpers.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250616083846.221396-2-tzimmermann@suse.de
4 months agodrm/vmwgfx: Fix Host-Backed userspace on Guest-Backed kernel
Ian Forbes [Tue, 29 Apr 2025 20:34:27 +0000 (15:34 -0500)]
drm/vmwgfx: Fix Host-Backed userspace on Guest-Backed kernel

Running 3D applications with SVGA_FORCE_HOST_BACKED=1 or using an
ancient version of mesa was broken because the buffer was pinned in
VMW_BO_DOMAIN_SYS and could not be moved to VMW_BO_DOMAIN_MOB during
validation.

The compat_shader buffer should not pinned.

Fixes: 668b206601c5 ("drm/vmwgfx: Stop using raw ttm_buffer_object's")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250429203427.1742331-1-ian.forbes@broadcom.com
4 months agodrm/vmwgfx: Implement dma_fence_ops properly
Ian Forbes [Fri, 30 May 2025 18:35:09 +0000 (13:35 -0500)]
drm/vmwgfx: Implement dma_fence_ops properly

vmwgfx's fencing predates dma_fence and as a result dma_fence_ops was never
properly implemented, especially with respect to enabling signaling.

Because of this dma_fence callbacks don't work properly. This change
implements enable_signaling properly so that dma_fence callbacks now
work as expected.

It also removes vmwgfx's custom implementation of fence callbacks
and removes vmwgfx's custom dma_fence_ops::wait function which is no
longer necessary now that enable_signaling works.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250530183510.733175-2-ian.forbes@broadcom.com
4 months agodrm/vmwgfx: Update last_read_seqno under the fence lock
Ian Forbes [Fri, 30 May 2025 18:35:08 +0000 (13:35 -0500)]
drm/vmwgfx: Update last_read_seqno under the fence lock

There was a possible race in vmw_update_seqno. Because of this race it
was possible for last_read_seqno to go backwards. Remove this function
and replace it with vmw_update_fences which now sets and returns the
last_read_seqno while holding the fence lock. This serialization via the
fence lock ensures that last_read_seqno is monotonic again.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250530183510.733175-1-ian.forbes@broadcom.com
4 months agodrm/xe: Opportunistically skip TLB invalidaion on unbind
Matthew Brost [Mon, 16 Jun 2025 06:30:24 +0000 (23:30 -0700)]
drm/xe: Opportunistically skip TLB invalidaion on unbind

If a range or VMA is invalidated and scratch page is disabled, there
is no reason to issue a TLB invalidation on unbind, skip TLB
innvalidation is this condition is true. This is an opportunistic check
as it is done without the notifier lock, thus it possible for the range
to be invalidated after this check is performed.

This should improve performance of the SVM garbage collector, for
example, xe_exec_system_allocator --r many-stride-new-prefetch, went
~20s to ~9.5s on a BMG.

v2:
 - Use helper for valid check (Thomas)
v3:
 - Avoid skipping TLB invalidation if PTEs are removed at a higher
   level than the range
 - Never skip TLB invalidations for VMA
 - Drop Himal's RB

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250616063024.2059829-3-matthew.brost@intel.com
4 months agodrm/xe: Add xe_vm_has_valid_gpu_mapping helper
Matthew Brost [Mon, 16 Jun 2025 06:30:23 +0000 (23:30 -0700)]
drm/xe: Add xe_vm_has_valid_gpu_mapping helper

Rather than having multiple READ_ONCE of the tile_* fields and comments
in code, use helper with kernel doc for single access point and clear
rules.

v3:
 - s/xe_vm_has_valid_gpu_pages/xe_vm_has_valid_gpu_mapping

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250616063024.2059829-2-matthew.brost@intel.com
4 months agoMerge tag 'drm-misc-next-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kerne...
Dave Airlie [Tue, 17 Jun 2025 22:09:27 +0000 (08:09 +1000)]
Merge tag 'drm-misc-next-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.17:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
 - atomic-helpers: Tune the enable / disable sequence
 - bridge: Add destroy hook
 - color management: Add helpers for hardware gamma LUT handling
 - HDMI: Add CEC handling, YUV420 output support
 - sched: tracing improvements

Driver Changes:
 - hyperv: Move out of simple-kms, drm_panic support
 - i915: drm_panel_follower support
 - imx: Add IMX8qxq Display Controller Support
 - lima: Add Rockchip RK3528 GPU Support
 - nouveau: fence handling cleanup
 - panfrost: Add BO labeling, 64-bit registers access
 - qaic: Add RAS Support
 - rz-du: Add RZ/V2H(P) Support, MIPI-DSI DCS Support
 - sun4i: Add H616 Support
 - tidss: Add TI AM62L Support
 - vkms: YUV and R* formats support

 - bridges:
   - Switched to reference counted drm_bridge allocations

 - panels:
   - Switched to reference counted drm_panel allocations
   - Add support for fwnode-based panel lookup
   - himax-hx8394: Support for Huiling hl055fhv028c
   - ilitek-ili9881c: Support for 7" Raspberry Pi 720x1280
   - panel-edp: Support for KDC KD116N3730A05, N160JCE-ELL CMN,
   - panel-simple: Support for AUO P238HAN01
   - st7701: Support for Winstar wf40eswaa6mnn0
   - visionox-rm69299: Support for rm69299-shift
   - New panels: Renesas R61307, Renesas R69328

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250612-coucal-of-impossible-cleaning-a5eecf@houat
4 months agodrm/xe: Extend WA 14018094691 to BMG
Daniele Ceraolo Spurio [Fri, 13 Jun 2025 23:11:29 +0000 (16:11 -0700)]
drm/xe: Extend WA 14018094691 to BMG

This WA is applicable to BMG as well.

Note that this is a GSC WA and we don't load the GSC on BMG, so
extending the WA to BMG won't do anything right now. However, it helps
future-proof the driver so that if we ever turn the GSC on we won't have
to remember to extend this WA.

v2: don't use VERSION_RANGE from 2001 to 2004 (Matt)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com
4 months agodrm/xe: Fix memset on iomem
Lucas De Marchi [Thu, 12 Jun 2025 22:14:12 +0000 (15:14 -0700)]
drm/xe: Fix memset on iomem

It should rather use xe_map_memset() as the BO is created with
XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init().

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Annotate default for guc_log_level param
Lucas De Marchi [Fri, 13 Jun 2025 20:00:38 +0000 (13:00 -0700)]
drm/xe: Annotate default for guc_log_level param

Reword the parameter description so it's clear what's the default and
what are the verbose levels.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-2-cb84a63e49fe@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/guc: Default log level to non-verbose
Lucas De Marchi [Fri, 13 Jun 2025 20:00:37 +0000 (13:00 -0700)]
drm/xe/guc: Default log level to non-verbose

Currently xe sets the guc log level to a verbose level since it's useful
to debug hangs and general development. However the verbose level may
already be too much and affect performance.

Michal Mrozek did some tests with the L0 compute stack for submission
latency with ULLS disabled. Below are the normalized numbers with log
level 3 (the current default) as baseline for each test:

                          Test \ Log Level                        3      0      1      2
 ----------------------------------------------------------- ------ ------ ------ ------
  BestWalkerNthCommandListSubmission(CmdListCount=2)           1.00   0.63   0.63   0.96
  BestWalkerNthSubmission(KernelCount=2)                       1.00   0.62   0.63   0.96
  BestWalkerNthSubmissionImmediate(KernelCount=2)              1.00   0.58   0.58   0.85
  BestWalkerSubmission                                         1.00   0.62   0.62   0.96
  BestWalkerSubmissionImmediate                                1.00   0.63   0.62   0.96
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2)   1.00   0.58   0.58   0.86
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4)   1.00   0.70   0.70   0.83
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8)   1.00   0.53   0.52   0.78

Log level 2 is the first "verbose level" for GuC, where the biggest
difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch
to 1, i.e.  GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds.

Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-1-cb84a63e49fe@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/oa: Enable OAM latency measurement
Ashutosh Dixit [Fri, 6 Jun 2025 19:26:17 +0000 (12:26 -0700)]
drm/xe/oa: Enable OAM latency measurement

Enable OAM latency measurement for Xe3+ platforms.

Bspec: 58840

v2: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v3: Also add LNCF_MISC_CONFIG_REGISTER0 needed by MDAPI

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-6-ashutosh.dixit@intel.com
4 months agodrm/xe/oa: Assign hwe for OAM_SAG
Ashutosh Dixit [Fri, 6 Jun 2025 19:26:16 +0000 (12:26 -0700)]
drm/xe/oa: Assign hwe for OAM_SAG

Because OAM_SAG doesn't have an attached hwe, assign another hwe belonging
to the same gt (and different OAM unit) to OAM_SAG. A hwe is needed for
batch submissions to program OA HW.

v2: Assign an engine with a valid OA unit for OAM_SAG (Umesh)

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-5-ashutosh.dixit@intel.com
4 months agodrm/xe/oa: Introduce stream->oa_unit
Ashutosh Dixit [Fri, 6 Jun 2025 19:26:15 +0000 (12:26 -0700)]
drm/xe/oa: Introduce stream->oa_unit

Previously, the oa_unit associated with an OA stream was derived from hwe
associated with the stream (stream->hwe->oa_unit). This breaks with OAM_SAG
since OAM_SAG does not have any attached hardware engines. Resolve this by
introducing stream->oa_unit and stop depending on stream->hwe.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-4-ashutosh.dixit@intel.com
4 months agodrm/xe/oa: Print hwe to OA unit mapping
Ashutosh Dixit [Fri, 6 Jun 2025 19:26:14 +0000 (12:26 -0700)]
drm/xe/oa: Print hwe to OA unit mapping

Print hwe to OA unit mapping to dmesg, to help debug for current and new
platforms.

v2: Separate out xe_oa_print_gt_oa_units() (Umesh)

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-3-ashutosh.dixit@intel.com
4 months agodrm/xe/oa/uapi: Expose media OA units
Ashutosh Dixit [Fri, 6 Jun 2025 19:26:13 +0000 (12:26 -0700)]
drm/xe/oa/uapi: Expose media OA units

On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM)
units. One or more SCMI OAM units might be present on a platform. In
addition there is another OAM unit for global events, called
OAM-SAG. Performance metrics for media workloads can be obtained from these
OAM units, similar to OAG.

Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA
unit without any attached engines.

Bspec: 70819, 67103, 63844, 72572, 74476, 61284

v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for < 12.7 platforms
v3: Return XE_OA_UNIT_INVALID for < 12.7 to indicate no OAM units
v4: Move xe_oa_print_oa_units() to separate patch
v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v6: Introduce DRM_XE_OA_CAPS_OAM

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com
4 months agodrm/i915/dsb: Disable the GOSUB interrupt
Ville Syrjälä [Thu, 12 Jun 2025 14:50:18 +0000 (17:50 +0300)]
drm/i915/dsb: Disable the GOSUB interrupt

Current DSB hardware is apparently a bit borked and likes to signal
spurious GOSUB errors. We already have most for the workarounds for
this in place, but the last part is simply not enabling the corresponding
interrupt.

While at it polish up the w/a comments with the w/a number,
and consistently take the short blurp from the w/a page.

Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250612145018.8735-7-ville.syrjala@linux.intel.com
4 months agodrm/i915/dsb: Move the DSB_PMCTRL* reset out of intel_dsb_finish()
Ville Syrjälä [Thu, 12 Jun 2025 14:50:17 +0000 (17:50 +0300)]
drm/i915/dsb: Move the DSB_PMCTRL* reset out of intel_dsb_finish()

When using the flip queue, due to the DMC vs. DSB register corruption
problem, we must not issue any register writes from the DSB after
unhalting the DMC. Currently we are doing just that by trying to
restore DSB_PMCTRL* back to a sane state from intel_dsb_finish().

Since the only place left that pokes at DSB_PMCTRL* is intel_dsb_chain()
we can just do DSB_PMCTRL_2/DSB_FORCE_DEWAKE reset in the same place.

The DSB_PMCTRL reset is trickier since we'd have to do it from the
chained DSB itself. But based on my earlier testing
DSB_PMCTRL/DSB_ENABLE_DEWAKE doesn't actually do anything if the DSB
isn't actually enabled, so we can omit the reset to keep things a bit
simpler. We do need to reset DSB_PMCTRL/DSB_ENABLE_DEWAKE before
tarting the DSB however, in case it was left enabled from a previous
use.

Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250612145018.8735-6-ville.syrjala@linux.intel.com
4 months agodrm/i915/dsb: Garbage collect the MMIO DEwake stuff
Ville Syrjälä [Thu, 12 Jun 2025 14:50:16 +0000 (17:50 +0300)]
drm/i915/dsb: Garbage collect the MMIO DEwake stuff

Since the introduction of DSB chaining we no longer need the
DEwake tricks in intel_dsb_commit().

I also need to relocate the DSB_PMCTRL* writes out of
intel_dsb_finish() (due to the flip queue DMC vs. DSB register
corruption issues), and it'll be a bit more straightforward if
I don't have to worry about the non-chained DSB path anymore.

Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250612145018.8735-5-ville.syrjala@linux.intel.com
4 months agodrm/i915/dsb: Introduce intel_dsb_exec_time_us()
Ville Syrjälä [Thu, 12 Jun 2025 14:50:15 +0000 (17:50 +0300)]
drm/i915/dsb: Introduce intel_dsb_exec_time_us()

Pull the magic 20 usec DSB execution deadline into
intel_dsb_arm_exec_time_us(), and also add its counterpart
for the non-arming register write section. For the non-arming
part we'll just throw in a random 80 usec for now so the total
is 100usec. The total exec time will be needed by the upcoming
flip queue code.

Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250612145018.8735-4-ville.syrjala@linux.intel.com