This WA is applied while initializing the media GT, but it a primary
GT WA (because it modifies a register on the primary GT), so the XE_WA
macro is returning false even when the WA should be applied.
Fix this by using the primary GT in the macro.
Note that this WA only applies to PXP and we don't yet support that in
Xe, so there are no negative effects to this bug, which is why we didn't
see any errors in testing.
v2: use the primary GT in the macro instead of marking the WA as
platform-wide (Lucas, Matt).
Enable workarounds for HW bug where render engine reset fails. Given
that we're bumping the minimum required GuC version to 70.29, we're
guaranteed to always have support for this KLV in the GuC.
v2: Enable KLV correctly for either workaround (Lucas)
v4: Add check for minimum supported GuC firmware version. Enable w/a for
hw version 20.01 too. (Daniele)
v5 (Daniele): remove now unneeded fw type and version checks (JohnH)
Julia Filipchuk [Fri, 2 Aug 2024 22:21:27 +0000 (15:21 -0700)]
drm/xe/guc: Bump minimum required GuC version to v70.29.2
The VF API version for this release is 1.13.4.
Bumping the minimum required GuC version just before force-probe
removal allows us to set a baseline for what API features are expected
to be available. I.e., at this point there is no need for any version
checking in the code before using a feature. Of course, if/when the
API is extended in future GuC releases, those new features will need
API version checks in the code.
Bump the recommended GuC versions to match.
Also add numerical comparison helpers to simplify the version number
checks.
v2: Reword commit message and make comparison helpers GuC specific -
review feedback from Daniele, done by JohnH
Matthew Brost [Thu, 1 Aug 2024 15:41:18 +0000 (08:41 -0700)]
drm/xe: Faster devcoredump
The current algorithm to read out devcoredump is O(N*N) where N is the
size of coredump due to usage of the drm_coredump_printer in
xe_devcoredump_read. Switch to a O(N) algorithm which prints the
devcoredump into a readable format in snapshot work and update
xe_devcoredump_read to memcpy from the readable format directly.
v2:
- Fix double free on devcoredump removal (Testing)
- Set read_data_size after snap work flush
- Adjust remaining in iterator upon realloc (Testing)
- Set read_data upon realloc (Testing)
v3:
- Kernel doc
v4:
- Two pass algorithm to determine size (Maarten)
v5:
- Use scope for reading variables (Johnathan)
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2408 Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240801154118.2547543-4-matthew.brost@intel.com
Matthew Brost [Thu, 1 Aug 2024 15:41:17 +0000 (08:41 -0700)]
drm/printer: Allow NULL data in devcoredump printer
We want to determine the size of the devcoredump before writing it out.
To that end, we will run the devcoredump printer with NULL data to get
the size, alloc data based on the generated offset, then run the
devcorecump again with a valid data pointer to print. This necessitates
not writing data to the data pointer on the initial pass, when it is
NULL.
Matthew Brost [Thu, 1 Aug 2024 15:41:16 +0000 (08:41 -0700)]
drm/xe: Take ref to VM in delayed snapshot
Kernel BO's don't take a ref to the VM, we need the VM for the
delayed snapshot, so take a ref to the VM in delayed snapshot.
v2:
- Check for lrc_bo before taking a VM ref (CI)
- Check lrc_bo->vm before taking / dropping a VM ref (CI)
- Drop VM in xe_lrc_snapshot_free
v5:
- Fix commit message wording (Johnathan)
Karthik Poosa [Thu, 1 Aug 2024 11:24:24 +0000 (16:54 +0530)]
drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
In xe_hwmon_power_max_write, for PL1 disable supported case, instead of
returning after PL1 disable, PL1 enable path was also being run.
Fixed it by returning after disable.
v2: Correct typo and grammar in commit message. (Jonathan)
Enable feature to allow memory reads to take a priority memory path.
This will reduce latency on the read path, but may introduce read after
write (RAW) hazards as read and writes will no longer be ordered.
To avoid RAW hazards, SW can use the MI_MEM_FENCE command or any other
MI command that generates non posted memory writes. This will ensure
data is coherent in memory prior to execution of commands which read
data from memory. RCS,BCS and CCS support this feature.
No pattern identified in KMD that could lead to a hazard.
v2: Modify commit message, enable priority mem read feature for media,
modify version range, modify bspec detail (Matt Roper)
v3: Rebase, fix cramped line-wrapping (jcavitt)
v4: Rebase
v5: Media does not support Priority Mem Read. Modify commit
to reflect the same.
Matthew Brost [Sat, 27 Jul 2024 01:22:16 +0000 (18:22 -0700)]
drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
A chain fence is uninitialized if not installed in a drm sync obj. Thus
if xe_sync_entry_cleanup is called and sync->chain_fence is non-NULL the
proper cleanup is dma_fence_chain_free rather than a dma-fence put.
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2411 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2261 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240727012216.2118276-1-matthew.brost@intel.com
In function ‘decode_oa_format.isra.26’,
inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6:
././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant
[...]
./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’
__BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \
^
drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’
u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
^
Lucas De Marchi [Sat, 27 Jul 2024 01:59:07 +0000 (18:59 -0700)]
drm/xe: Migrate OOB WAs to OR rules
Now that rtp has OR rules, it's not needed to extend it to process OOB
WAs. Previously if an entry had no name, it was considered as "a set of
rules OR'ed with the last named entry".
Instead of generating new entries, add OR rules. The syntax for
xe_wa_oob.rules remains the same, with xe_gen_wa_oob generating the
slightly different table. Object sizes delta are negligible, but having
just one logic makes it easier to maintain:
add/remove: 0/0 grow/shrink: 1/2 up/down: 160/-269 (-109)
Function old new delta
__compound_literal 6104 6264 +160
xe_wa_dump 1839 1810 -29
oob_was 816 576 -240
Total: Before=17257, After=17148, chg -0.63%
Lucas De Marchi [Sat, 27 Jul 2024 01:59:06 +0000 (18:59 -0700)]
drm/xe/rtp: Expand max rules/actions per entry again
Like commit 512660cd1f1a ("drm/xe/rtp: Expand max rules/actions per
entry") did, expand the maximum number of actions/rules. That commit was
too conservative, just incrementing 2. Other than the ugliness of these
macros and additional preprocessor steps when they are used, there are
no downsides on increasing the maximum: the tables in which they are
used use a sentinel to mark the last element.
With rtp processing now supporting OR rules, it's possible to migrate
the extension made for OOB WAs that "entries with name are OR'ed in
previous entry". For that the maximum number of rules needs to be
increased.
Just double it. Hopefully 12 is sufficient for longer than 6 was.
Lucas De Marchi [Sat, 27 Jul 2024 01:59:05 +0000 (18:59 -0700)]
drm/xe/rtp: Simplify marking active workarounds
Stop doing the calculation both in rtp_mark_active() and in its caller.
The caller easily knows the number of entries to mark, so just pass it
forward. That also simplifies rtp_mark_active() since now it doesn't
have a special case when handling 1 entry.
Lucas De Marchi [Sat, 27 Jul 2024 01:59:04 +0000 (18:59 -0700)]
drm/xe/kunit: Test rtp with no actions
The OOB WAs use xe_rtp_process(), without passing an sr to save result
of the actions since there are none. They are also executed in a gt-only
context, making it harder to share the implementation. Thus, introduce a
new set of tests to check these RTP entries. The only check that can be
done is if the entry was marked as active.
Before commit fd6797ec50c5 ("drm/xe/rtp: Fix off-by-one when processing
rules") several of these tests were failing: the processing of OR'ed
entries would make the subsequent entry to be inadvertently enabled.
Lucas De Marchi [Sat, 27 Jul 2024 01:59:01 +0000 (18:59 -0700)]
drm/xe/kunit: Rename count to count_sr_entries
The RTP tests check both the result of processing the RTP entries and
the outcome saved as SR entries. Rename "count" to be explicit about
what's being counted.
Matt Roper [Fri, 26 Jul 2024 17:17:58 +0000 (10:17 -0700)]
drm/xe/migrate: Future-proof compressed PAT check
Although all current Xe2 platforms support FlatCCS, we probably
shouldn't assume that will be universally true forever. In the past
we've had platforms like PVC that didn't support compression, and the
same could show up again at some point in the future. Future-proof the
migration code by adding an explicit check for FlatCCS support to the
condition that decides whether to use a compressed PAT index for
migration.
While we're at it, we can drop the IS_DGFX check since it's redundant
with the src_is_vram check (only dGPUs have VRAM).
Lucas De Marchi [Fri, 26 Jul 2024 06:43:35 +0000 (23:43 -0700)]
drm/xe/rtp: Fix off-by-one when processing rules
Gustavo noticed an odd "+ 2" in rtp_mark_active() while processing
rtp rules and pointed that it should be "+ 1". In fact, while processing
entries without actions (OOB workarounds), if the WA is activated and
has OR rules, it will also inadvertently activate the very next
workaround.
Test in a LNL B0 platform by moving 18024947630 on top of 16020292621,
makes the latter become active:
drm/xe/mmio: Use single logic for waiting functions
The implementations for xe_mmio_wait32() and xe_mmio_wait32_not() are
almost identical. Let us avoid duplication of logic by having them
calling a common __xe_mmio_wait32() function.
drm/xe: Remove stale declaration of xe_mmio_probe_vram()
The declaration of xe_mmio_probe_vram() became useless since
commit 638d1c79cbf1 ("drm/xe: Promote VRAM initialization function to
own file"). Remove it.
All new binaries are GSC-enabled (and even if they weren't the driver
can detect the type of HuC binary), so the new lnl HuC filename doesn't
use the _gsc postfix to avoid confusion with the GSC binary.
Matthew Brost [Tue, 23 Jul 2024 19:07:14 +0000 (12:07 -0700)]
drm/xe: Remove fence check from send_tlb_invalidation
'fence' argument in send_tlb_invalidation cannot be NULL, remove
non-NULL check from send_tlb_invalidation.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202407231049.esig0Fkb-lkp@intel.com/ Fixes: 61ac035361ae ("drm/xe: Drop xe_gt_tlb_invalidation_wait") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240723190714.1744653-1-matthew.brost@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Matthew Brost [Tue, 23 Jul 2024 15:10:45 +0000 (08:10 -0700)]
drm/xe: Store process name and pid in xe file
An xe file can outlive the associated process as the GPU cleanup is just
triggered upon file close (process kill) and completes sometime later.
If the file close triggers error conditions (GPU hangs) the process
cannot be safely referenced to retrieve the name and pid for debug
information. Store the process name and pid directly in the xe file to
be safe.
v2:
- Access file->pid via rcu_access_pointer (Matthew Auld)
Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump") Fixes: f6ca930d974e ("drm/xe: Add process name and PID to job timedout message") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240723151045.1725417-1-matthew.brost@intel.com
Matthew Brost [Tue, 23 Jul 2024 01:17:02 +0000 (18:17 -0700)]
drm/xe: Return -ENOBUFS if a kmalloc fails which is tied to an array of binds
The size of an array of binds is directly tied to several kmalloc in the
KMD, thus making these kmalloc more likely to fail. Return -ENOBUFS in
the case of these failures.
The expected UMD behavior upon returning -ENOBUFS is to split an array
of binds into a series of single binds.
Matthew Brost [Tue, 23 Jul 2024 01:02:30 +0000 (18:02 -0700)]
drm/xe: Fix xe_pt_abort_unbind
When restoring the children PT entries on a bind failure the incorrect
loop index has used resulting in PT entries being leaked. This is shown
by running xe_vm.bind-array-conflict-error-inject on a VRAM device going
into a suspend state after the test completes.
Lucas De Marchi [Fri, 19 Jul 2024 19:15:34 +0000 (12:15 -0700)]
drm/xe: Fix warning on unreachable statement
eu_type_to_str() relies on -Wswitch to warn (and -Werror) to make sure
it handles all enum values. However it's perfectly legal to pass an int
to that function so in the end that function may happen to return
nothing. There's too much implicit knowledge about the initialization
of eu_type for a compiler to notice eu_type is never assigned to
anything other than those values.
Trying to reproduce this issue, none of gcc-9, gcc-10 and gcc-13
triggered for me, but this was reported in a different system with
gcc-10:
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_gt_topology_init()
Also it was reported these warnings when building with clang:
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: sibling call from callable instruction with modified stack frame
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_dss_mask_group_ffs()
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: can't find jump dest instruction at .text.xe_gt_topology_dump+0xc0
Since that value is not really possible in real world, just take the
simple approach and return NULL.
Fixes: 7108b4a589cd ("drm/xe/uapi: Expose SIMD16 EU mask in topology query") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719191534.3845469-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Mon, 15 Jul 2024 18:05:38 +0000 (20:05 +0200)]
drm/xe: Add NEEDS_2M BO flag
In addition of NEEDS_64K BO flag, add similar one to force 2 MiB
alignment of the buffer objects. Explicitly use this flag during
VF LMEM provisioning as LMTT uses 2 MiB pages and one day we may
drop requirement of allocating pinned objects as contiguous.
Michal Wajdeczko [Sat, 20 Jul 2024 14:25:23 +0000 (16:25 +0200)]
drm/xe/tests: Add helpers for use in live tests
Instead of iterating over available Xe devices within a testcase,
without being able to distinguish potential failures from different
devices on system with many Xe devices, introduce helpers that will
allow to treat each Xe device as a parameter for the testcase like:
Michal Wajdeczko [Sat, 20 Jul 2024 14:25:22 +0000 (16:25 +0200)]
drm/xe: Introduce const cast helper
Typically we want to preserve pointer constness when converting
from one xe pointer to another, but in some rare cases, like kunit
parameter conversions, we might want to discard this constness.
Add a helper that we will use to clearly indicate our intention.
Matthew Brost [Fri, 19 Jul 2024 17:29:05 +0000 (10:29 -0700)]
drm/xe: Build PM into GuC CT layer
Take PM ref when any G2H are outstanding, drop when none are
outstanding.
To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs
to be held when scheduler messages are pending too.
v2:
- Add outer PM protections to xe_file_close (CI)
v3:
- Only take PM ref 0->1 and drop on 1->0 (Matthew Auld)
v4:
- Add assert to G2H increment function
v5:
- Rebase
v6:
- Declare xe as local variable in xe_file_close (CI)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-5-matthew.brost@intel.com
Matthew Brost [Fri, 19 Jul 2024 17:29:03 +0000 (10:29 -0700)]
drm/xe: Drop xe_gt_tlb_invalidation_wait
Having two methods to wait on GT TLB invalidations is not ideal. Remove
xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences.
In addition to two methods being less than ideal, once GT TLB
invalidations are coalesced the seqno cannot be assigned during
xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait
would not have a seqno to wait one. A fence however can be armed and
later signaled.
v3:
- Add explaination about coalescing to commit message
v4:
- Don't put dma fence if defined on stack (CI)
v5:
- Initialize ret to zero (CI)
v6:
- Use invalidation_fence_signal helper in tlb timeout (Matthew Auld)
Michal Wajdeczko [Thu, 18 Jul 2024 20:31:55 +0000 (22:31 +0200)]
drm/xe/vf: Fix register value lookup
We should use the number of actual entries stored in the runtime
register buffer, not the maximum number of entries that this buffer
can hold, otherwise bsearch() may fail and we may miss the data and
wrongly report unexpected access to some registers.
Fixes: 4edadc41a3a4 ("drm/xe/vf: Use register values obtained from the PF") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718203155.486-1-michal.wajdeczko@intel.com
drm/xe: Fix use after free when client stats are captured
xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.
While at it, revert an earlier change that contained a partial work
around for this issue.
v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)
v3: Fix typo (Lucas)
Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks") Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-5-umesh.nerlige.ramappa@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Wed, 10 Jul 2024 22:02:27 +0000 (15:02 -0700)]
drm/xe/uapi: Expose SIMD16 EU mask in topology query
PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly
reporting for PVC the number of 16-wide EUs without giving userspace any
hint that they were different than for other platforms. Xe2 and later
also have 16-wide, but in those cases the reported number would
correspond to the 8-wide count.
To avoid confusion and make sure the right number is used by userspace
depending on the platform, add a new item to the topology query and drop
the one that is not available. The new mask reported for both PVC and
Xe2 should now match the numbers reported via hwconfig.
v2: Use a different topo item with EU type in its name to report the
new mask instead of adding the type itself as the item (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com> Acked-by: Wenbin Lu <wenbin.lu@intel.com> Acked-by: Effie Yu <effie.yu@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240710220446.2169797-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Uma Shankar [Wed, 17 Jul 2024 08:22:52 +0000 (13:52 +0530)]
drm/xe/fbdev: Limit the usage of stolen for LNL+
As per recommendation in the workarounds:
WA_22019338487
There is an issue with accessing Stolen memory pages due a
hardware limitation. Limit the usage of stolen memory for
fbdev for LNL+. Don't use BIOS FB from stolen on LNL+ and
assign the same from system memory.
v2: Corrected the WA Number, limited WA to LNL and
Adopted XE_WA framework as suggested by Lucas and Matt.
v3: Introduced the waxxx_display to implement display side
of WA changes on Lunarlake. Used xe_root_mmio_gt and
avoid the for loop (Suggested by Lucas)
In xe2+ dgfx, we don't need to handle the copying of ccs
metadata during migration. This test validates the ccs data post
clear and copy during evict/restore operation. Thus, we can skip
this test on xe2+ dgfx.
drm/xe/migrate: Add kunit to test migration functionality for BMG
This part of kunit verifies that
- main data is decompressed and ccs data is clear post bo eviction.
- main data is raw copied and ccs data is clear post bo restore.
drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx
During eviction (vram->sysmem), we use compressed -> uncompressed mapping.
During restore (sysmem->vram), we need to use mapping from
uncompressed -> uncompressed.
Handle logic for selecting the compressed identity map for eviction,
and selecting uncompressed map for restore operations.
v2: Move check of xe_migrate_ccs_emit() before calling
xe_migrate_ccs_copy(). (Nirmoy)
drm/xe/xe2: Introduce identity map for compressed pat for vram
Xe2+ has unified compression (exactly one compression mode/format),
where compression is now controlled via PAT at PTE level.
This simplifies KMD operations, as it can now decompress freely
without concern for the buffer's original compression format—unlike DG2,
which had multiple compression formats and thus required copying the
raw CCS state during VRAM eviction. In addition mixed VRAM and system
memory buffers were not supported with compression enabled.
On Xe2 dGPU compression is still only supported with VRAM, however we
can now support compression with VRAM and system memory buffers,
with GPU access being seamless underneath. So long as when doing
VRAM -> system memory the KMD uses compressed -> uncompressed,
to decompress it. This also allows CPU access to such buffers,
assuming that userspace first decompress the corresponding
pages being accessed.
If the pages are already in system memory then KMD would have already
decompressed them. When restoring such buffers with sysmem -> VRAM
the KMD can't easily know which pages were originally compressed,
so we always use uncompressed -> uncompressed here.
With this it also means we can drop all the raw CCS handling on such
platforms (including needing to allocate extra CCS storage).
In order to support this we now need to have two different identity
mappings for compressed and uncompressed VRAM.
In this patch, we set up the additional identity map for the VRAM with
compressed pat_index. We then select the appropriate mapping during
migration/clear. During eviction (vram->sysmem), we use the mapping
from compressed -> uncompressed. During restore (sysmem->vram), we need
the mapping from uncompressed -> uncompressed.
Therefore, we need to have two different mappings for compressed and
uncompressed vram. We set up an additional identity map for the vram
with compressed pat_index.
We then select the appropriate mapping during migration/clear.
v2: Formatting nits, Updated code to match recent changes in
xe_migrate_prepare_vm(). (Matt)
v3: Move identity map loop to a helper function. (Matt Brost)
v4: Split helper function in different patch, and
add asserts and nits. (Matt Brost)
v5: Convert the 2 bool arguments of pte_update_size to flags
argument (Matt Brost)
drm/xe/migrate: Add kunit to test clear functionality
This test verifies if the main and ccs data are cleared during bo creation.
The motivation to use Kunit instead of IGT is that, although we can verify
whether the data is zero following bo creation,
we cannot confirm whether the zero value after bo creation is the result of
our clear function or simply because the initial data present was zero.
v2: Updated the mutex_lock and unlock logic,
Changed out_unlock to out_put. (Matt)
drm/xe/migrate: Handle clear ccs logic for xe2 dgfx
For Xe2 dGPU, we clear the bo by modifying the VRAM using an
uncompressed pat index which then indirectly updates the
compression status as uncompressed i.e zeroed CCS.
So xe_migrate_clear() should be updated for BMG to not
emit CCS surf copy commands.
v2: Moved xe_device_needs_ccs_emit() to xe_migrate.c and changed
name to xe_migrate_needs_ccs_emit() since its very specific to
migration.(Matt)
Matthew Brost [Tue, 16 Jul 2024 06:39:01 +0000 (23:39 -0700)]
drm/xe: Wedge the entire device
Wedge the entire device, not just GT which may have triggered the wedge.
To implement this, cleanup the layering so xe_device_declare_wedged()
calls into the lower layers (GT) to ensure entire device is wedged.
While we are here, also signal any pending GT TLB invalidations upon
wedging device.
Lastly, short circuit reset wait if device is wedged.
v2:
- Short circuit reset wait if device is wedged (Local testing)
Michal Wajdeczko [Sat, 13 Jul 2024 14:26:43 +0000 (16:26 +0200)]
drm/xe/vf: Track writes to inaccessible registers from VF
Only limited set of registers is accessible for the VF driver and
the hardware will silently drop writes to inaccessible registers.
To improve our VF driver lets intercept all such writes to warn
about such unexpected writes on debug builds or optionally allow
to provide some substitution (as a potential future extension).
Wa_15015404425 asks us to perform four "dummy" writes to a
non-existent register offset before every real register read.
Although the specific offset of the writes doesn't directly
matter, the workaround suggests offset 0x130030 as a good target
so that these writes will be easy to recognize and filter out in
debugging traces.
V5(MattR):
- Avoid negating an equality comparison
V4(MattR):
- Use writel and remove xe_reg usage
V3(MattR):
- Define dummy reg local to function
- Avoid tracing dummy writes
- Update commit message
V2:
- Add WA to 8/16/32bit reads also - MattR
- Corrected dummy reg address - MattR
- Use for loop to avoid mental pause - JaniN
Michal Wajdeczko [Thu, 11 Jul 2024 19:23:19 +0000 (21:23 +0200)]
drm/xe/pf: Limit fair VF LMEM provisioning
Due to the current design of the BO and VRAM manager, any object
with XE_BO_FLAG_PINNED flag, which the PF driver uses during VF
LMEM provisionining, is created with the TTM_PL_FLAG_CONTIGUOUS
flag, which may cause VRAM fragmentation that prevents subsequent
allocations of larger objects, like fair VF LMEM provisioning.
To avoid such failures, round down fair VF LMEM provisioning size
to next power of two size, to compensate what xe_ttm_vram_mgr is
doing to achieve contiguous allocations.
Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240711192320.1198-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Mon, 8 Jul 2024 11:12:10 +0000 (13:12 +0200)]
drm/xe/kunit: Simplify xe_mocs live tests code layout
The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.
But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.
Michal Wajdeczko [Mon, 8 Jul 2024 11:12:09 +0000 (13:12 +0200)]
drm/xe/kunit: Simplify xe_migrate live tests code layout
The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.
But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.
Michal Wajdeczko [Mon, 8 Jul 2024 11:12:08 +0000 (13:12 +0200)]
drm/xe/kunit: Simplify xe_dma_buf live tests code layout
The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.
But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.
Michal Wajdeczko [Mon, 8 Jul 2024 11:12:07 +0000 (13:12 +0200)]
drm/xe/kunit: Simplify xe_bo live tests code layout
The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.
But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.
Lucas De Marchi [Mon, 8 Jul 2024 21:29:06 +0000 (14:29 -0700)]
drm/xe: Generate oob before compiling anything
Instead of keep adding more dependencies as WAs are needed in different
places of the driver, just add a rule with all the objects so the code
generation happens before anything else.
While at it, group lines related to wa_oob in the Makefile.
Matthew Brost [Mon, 8 Jul 2024 21:10:08 +0000 (14:10 -0700)]
drm/xe: Drop trace_xe_hw_fence_free
fence->ctx may be stale memory when trace_xe_hw_fence_free is called
resuling UAF bug when deriving the device name. This tracepoint is not
all that useful, so just drop it.
Fixes: 501c4255c409 ("drm/xe/trace: Print device_id in xe_trace events") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240708211008.956384-1-matthew.brost@intel.com
Thomas Hellström [Fri, 5 Jul 2024 13:28:28 +0000 (15:28 +0200)]
drm/xe: Use write-back caching mode for system memory on DGFX
The caching mode for buffer objects with VRAM as a possible
placement was forced to write-combined, regardless of placement.
However, write-combined system memory is expensive to allocate and
even though it is pooled, the pool is expensive to shrink, since
it involves global CPU TLB flushes.
Moreover write-combined system memory from TTM is only reliably
available on x86 and DGFX doesn't have an x86 restriction.
So regardless of the cpu caching mode selected for a bo,
internally use write-back caching mode for system memory on DGFX.
Coherency is maintained, but user-space clients may perceive a
difference in cpu access speeds.
v2:
- Update RB- and Ack tags.
- Rephrase wording in xe_drm.h (Matt Roper)
v3:
- Really rephrase wording.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Effie Yu <effie.yu@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Matthew Auld <matthew.auld@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Effie Yu <effie.yu@intel.com> #On chat Link: https://patchwork.freedesktop.org/patch/msgid/20240705132828.27714-1-thomas.hellstrom@linux.intel.com
Matthew Auld [Wed, 3 Jul 2024 12:43:39 +0000 (13:43 +0100)]
drm/i915: disable fbc due to Wa_16023588340
On BMG-G21 we need to disable fbc due to complications around the WA.
v2:
- Try to handle with i915_drv.h and compat layer. (Rodrigo)
v3:
- For simplicity retreat back to the original design for now.
- Drop the extra \ from the Makefile (Jani)
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703124338.208220-4-matthew.auld@intel.com
Matthew Auld [Wed, 3 Jul 2024 12:43:38 +0000 (13:43 +0100)]
drm/xe/bmg: implement Wa_16023588340
This involves enabling l2 caching of host side memory access to VRAM
through the CPU BAR. The main fallout here is with display since VRAM
writes from CPU can now be cached in GPU l2, and display is never
coherent with caches, so needs various manual flushing. In the case of
fbc we disable it due to complications in getting this to work
correctly (in a later patch).
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703124338.208220-3-matthew.auld@intel.com