]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
11 months agodrm/xe/vf: Use only assigned GGTT region
Michal Wajdeczko [Mon, 27 May 2024 11:20:15 +0000 (13:20 +0200)]
drm/xe/vf: Use only assigned GGTT region

Each VF is assigned a limited range of the GGTT address space.
To ensure that the VF driver does not use GGTT allocations outside
of the assigned region, explicitly reserve GGTT space below and
above this region when initializing GGTT.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527112015.1020-1-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Read VF configuration prior to GGTT initialization
Michal Wajdeczko [Fri, 24 May 2024 11:37:13 +0000 (13:37 +0200)]
drm/xe/vf: Read VF configuration prior to GGTT initialization

Each VF will be assigned with only a limited range of the GGTT
address space. Make sure that VF driver will read its own GGTT
configuration before starting any GGTT initialization.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524113714.932-2-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Treat GMDID as another runtime register
Michal Wajdeczko [Thu, 23 May 2024 19:22:40 +0000 (21:22 +0200)]
drm/xe/vf: Treat GMDID as another runtime register

While the GMDID registers are not part of the runtime register list
shared by the PF driver, we may still return cached values from our
VF specific read32() helper function.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-7-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Cache value of the GMDID register
Michal Wajdeczko [Thu, 23 May 2024 19:22:39 +0000 (21:22 +0200)]
drm/xe/vf: Cache value of the GMDID register

Read and cache value of the GMDID register as part of the config
query that VF driver is doing over MMIO.

While the VF driver likely already obtained the value of the GMDID
register once during the early driver probe, we couldn't cache it
then as the GT structures were not ready yet.

Cache it now, in case the driver needs it later when the GuC MMIO
communication, required to query GMDID from GuC, could be no longer
desired as it will be replaced by the CTB communication.

While around, assert that we will query GMDID only when applicable.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-6-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Provide early access to GMDID register
Michal Wajdeczko [Thu, 23 May 2024 22:30:42 +0000 (00:30 +0200)]
drm/xe/vf: Provide early access to GMDID register

VFs do not have direct access to the GMDID register and must obtain
its value from the GuC. Since we need GMDID value very early in the
driver probe flow, before we even start the full setup of GT and GuC
data structures, we must do some early initializations ourselves.

Additionally, since we also need GMDID for the media GT, which isn't
created yet, temporarly tweak the root GT type into MEDIA to allow
communication with the correct GuC, as only it can provide the value
of the media GMDID register.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523223042.888-1-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Obtain value of GMDID register from GuC
Michal Wajdeczko [Thu, 23 May 2024 19:22:37 +0000 (21:22 +0200)]
drm/xe/vf: Obtain value of GMDID register from GuC

VFs don't have access to the GMDID register and must obtain it
value using GuC VF ABI KLV query. Add function for doing that.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-4-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition
Michal Wajdeczko [Thu, 23 May 2024 19:22:36 +0000 (21:22 +0200)]
drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition

VF drivers can't access GMD_ID register over MMIO.
The value of the GMD_ID register must be queried from GuC.
It is available as GLOBAL_CFG_GMD_ID KLV.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-3-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Use register values obtained from the PF
Michal Wajdeczko [Thu, 23 May 2024 19:22:35 +0000 (21:22 +0200)]
drm/xe/vf: Use register values obtained from the PF

As part of the its initialization, the VF driver has already
obtained a list of the runtime (fuse) register values from the
PF driver. When VF driver is attempting to read register that is
inaccessible to the VF, use the values from this list instead.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-2-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Port over the slow GuC loading support from i915
John Harrison [Sat, 18 May 2024 04:36:59 +0000 (21:36 -0700)]
drm/xe/guc: Port over the slow GuC loading support from i915

GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.

Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!

v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com
11 months agodrm/xe: Make read_perf_limit_reasons globally accessible
John Harrison [Sat, 18 May 2024 04:36:58 +0000 (21:36 -0700)]
drm/xe: Make read_perf_limit_reasons globally accessible

Other driver code beyond the sysfs interface wants to know about
throttling. So make the query function globally accessible.

v2: Revert include order change (review feedback from Lucas)
v3: Remove '_sysfs' from throttle file names and keep limit query in
the same file rather than moving elsewhere (review feedback from
Rodrigo).
v4: Correct #include while renaming header file (review feedback
from Lucas).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-2-John.C.Harrison@Intel.com
11 months agodrm/xe: Nuke simple error capture
José Roberto de Souza [Wed, 22 May 2024 20:34:31 +0000 (13:34 -0700)]
drm/xe: Nuke simple error capture

This error capture prints into dmesg HW state when a gpu hang happens.
It was useful when we did not had devcoredump, now it is a incompleted
version of devcoredump that has potential to flood dmesg.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522203431.191594-1-jose.souza@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Add process name to devcoredump
José Roberto de Souza [Wed, 22 May 2024 20:12:03 +0000 (13:12 -0700)]
drm/xe: Add process name to devcoredump

Process name help us track what application caused the gpug hang, this
is crucial when running several applications at the same time.

v2:
- handle Xe KMD exec_queues without VM

v3:
- use get_pid_task() (suggested by Nirmoy)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: remove unused struct 'xe_gt_desc'
Dr. David Alan Gilbert [Wed, 22 May 2024 17:58:40 +0000 (18:58 +0100)]
drm/xe: remove unused struct 'xe_gt_desc'

'xe_gt_desc' is unused since
commit 1e6c20be6c83 ("drm/xe: Drop extra_gts[] declarations and
XE_GT_TYPE_REMOTE").

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link:
https://patchwork.freedesktop.org/patch/msgid/20240522175840.382107-1-linux@treblig.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Enable D3Cold on 'low' VRAM utilization
Rodrigo Vivi [Wed, 22 May 2024 17:01:05 +0000 (13:01 -0400)]
drm/xe: Enable D3Cold on 'low' VRAM utilization

Now that we eliminated all the mem_access get/put with its
locking issues from the inner calls of migration, we can
allow D3Cold.

Enable it when VRAM utilization is lower then 300Mb. On
higher utilization we only allow D3hot so we don't increase
so much the latency on runtime resume due to the memory
restoration.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Stop checking for power_lost on D3Cold
Rodrigo Vivi [Wed, 22 May 2024 17:01:04 +0000 (13:01 -0400)]
drm/xe: Stop checking for power_lost on D3Cold

GuC reset status is not reliable for this purpose and it is
once in a while ending up in a situation of D3Cold, where
power_reset is false and without the proper memory restoration
the GuC reload and Display will fail to come back from D3Cold.

So, let's do a full restoration of everything if we have a risk
of losing power, without further optimizations.

v2: also remove the gut_in_reset function (Anshuman)

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Prepare display for D3Cold
Rodrigo Vivi [Wed, 22 May 2024 17:01:03 +0000 (13:01 -0400)]
drm/xe: Prepare display for D3Cold

Prepare power-well and DC handling for a full power
lost during D3Cold, then sanitize it upon D3->D0.
Otherwise we get a bunch of state mismatch.

Ideally we could leave DC9 enabled and wouldn't need
to move DC9->DC0 on every runtime resume, however,
the disable_DC is part of the power-well checks and
intrinsic to the dc_off power well. In the future that
can be detangled so we can have even bigger power savings.
But for now, let's focus on getting a D3Cold, which saves
much more power by itself.

v2: create new functions to avoid full-suspend-resume path,
which would result in a deadlock between xe_gem_fault and the
modeset-ioctl.

v3: Only avoid the full modeset to avoid the race, for a more
robust suspend-resume.

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Relax runtime pm protection around VM
Rodrigo Vivi [Wed, 22 May 2024 17:01:02 +0000 (13:01 -0400)]
drm/xe: Relax runtime pm protection around VM

In the regular use case scenario, user space will create a
VM, and keep it alive for the entire duration of its workload.

For the regular desktop cases, it means that the VM
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Limit the VM protection solely for long-running workloads that
are not protected by the scheduler references.
By design, run_job for long-running workloads returns NULL and
the scheduler drops all the references of it, hence protecting
the VM for this case is necessary.

v2: Update commit message to a more imperative language and to
    reflect why the VM protection is really needed.
    Also add a comment in the code to let the reason visbible.

v3: Remove vma_access case and the mentions to mmap. Mmap cases
    are already protected by the gem page fault.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Relax runtime pm protection during execution
Rodrigo Vivi [Wed, 22 May 2024 17:01:01 +0000 (13:01 -0400)]
drm/xe: Relax runtime pm protection during execution

Limit the protection only during moments of actual job execution,
and introduce protection for guc submit fini, which is currently
unprotected due to the absence of exec_queue life protection.

In the regular use case scenario, user space will create an
exec queue, and keep it alive to reuse that until it is done
with that kind of workload.

For the regular desktop cases, it means that the exec_queue
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Cc: Matthew Brost <matthew.brost@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Fix xe_pm_runtime_get_if_in_use documentation
Rodrigo Vivi [Wed, 22 May 2024 17:01:00 +0000 (13:01 -0400)]
drm/xe: Fix xe_pm_runtime_get_if_in_use documentation

Let's be clear on what it is actually doing and align with
xe_pm_runtime_get_if_active doc style.

Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Fix xe_pm_runtime_get_if_active return
Rodrigo Vivi [Wed, 22 May 2024 17:00:59 +0000 (13:00 -0400)]
drm/xe: Fix xe_pm_runtime_get_if_active return

Current callers of this function are already taking the result
to a boolean and using in an if. It might be a problem because
current function might return negative error codes on failure,
without increasing the reference counter.

In this scenario we could end up with extra 'put' call ending
in unbalanced scenarios.

Let's fix it, while aligning with the current xe_pm_get_if_in_use
style.

Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Properly handle alloc_guc_id() failure
Niranjana Vishwanathapura [Tue, 21 May 2024 20:17:11 +0000 (13:17 -0700)]
drm/xe: Properly handle alloc_guc_id() failure

Release the submission_state lock if alloc_guc_id() fails.

v2: Add Fixes tag and CC stable kernel

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjana.vishwanathapura@intel.com
11 months agodrm/xe/uc: Don't emit false error if running in execlist mode
Michal Wajdeczko [Tue, 21 May 2024 11:48:57 +0000 (13:48 +0200)]
drm/xe/uc: Don't emit false error if running in execlist mode

When running in execlist mode (using force_execlist=1 modparam)
we incorrectly select the error path in xe_uc_init(), leading to
an unwanted error message like this:

[ ] xe 0000:00:00.0: [drm] *ERROR* GT0: Failed to initialize uC (0000000000000000)

Fix that by doing early return like we do in other similar cases.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521114857.712-1-michal.wajdeczko@intel.com
11 months agodrm/xe/display: move device_remove over to drmm
Matthew Auld [Wed, 22 May 2024 10:22:01 +0000 (11:22 +0100)]
drm/xe/display: move device_remove over to drmm

i915 display calls this when releasing the drm_device, match this also
in xe by using drmm. intel_display_device_remove() is freeing purely
software state for the drm_device.

v2: fix build error

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-36-matthew.auld@intel.com
11 months agodrm/xe/display: stop calling domains_driver_remove twice
Matthew Auld [Wed, 22 May 2024 10:22:00 +0000 (11:22 +0100)]
drm/xe/display: stop calling domains_driver_remove twice

Unclear why we call this twice.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matthew.auld@intel.com
11 months agodrm/xe/display: move display fini stuff to devm
Matthew Auld [Wed, 22 May 2024 10:21:59 +0000 (11:21 +0100)]
drm/xe/display: move display fini stuff to devm

Match the i915 display handling here with calling both no_irq and
noaccel when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-34-matthew.auld@intel.com
11 months agodrm/xe: reset mmio mappings with devm
Matthew Auld [Wed, 22 May 2024 10:21:58 +0000 (11:21 +0100)]
drm/xe: reset mmio mappings with devm

Set our various mmio mappings to NULL. This should make it easier to
catch something rogue trying to mess with mmio after device removal. For
example, we might unmap everything and then start hitting some mmio
address which has already been unmamped by us and then remapped by
something else, causing all kinds of carnage.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com
11 months agodrm/xe/mmio: move mmio_fini over to devm
Matthew Auld [Wed, 22 May 2024 10:21:57 +0000 (11:21 +0100)]
drm/xe/mmio: move mmio_fini over to devm

Not valid to touch mmio once the device is removed, so make sure we
unmap on removal and not just when driver instance goes away. Also set
the mmio pointers to NULL to hopefully catch such issues more easily.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-32-matthew.auld@intel.com
11 months agodrm/xe: make gt_remove use devm
Matthew Auld [Wed, 22 May 2024 10:21:56 +0000 (11:21 +0100)]
drm/xe: make gt_remove use devm

No need to hand roll the onion unwind here, just move gt_remove over to
devm which will already have the correct ordering.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-31-matthew.auld@intel.com
11 months agodrm/xe/gt: break out gt_fini into sw vs hw state
Matthew Auld [Wed, 22 May 2024 10:21:55 +0000 (11:21 +0100)]
drm/xe/gt: break out gt_fini into sw vs hw state

Have a cleaner separation between hw vs sw.

v2: Fix missing return

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-30-matthew.auld@intel.com
11 months agodrm/xe/coredump: move over to devm
Matthew Auld [Wed, 22 May 2024 10:21:54 +0000 (11:21 +0100)]
drm/xe/coredump: move over to devm

Here we are using drmm to ensure we release the coredump when unloading
the module, however the coredump is very much tied to the struct device
underneath. We can see this when we hotunplug the device, for which we
have already got a coredump attached. In such a case the coredump still
remains and adding another is not possible. However we still register
the release action via xe_driver_devcoredump_fini(), so in effect two or
more releases for one dump.  The other consideration is that the
coredump state is embedded in the xe_driver instance, so technically
once the drmm release action fires we might free the coredumpe state
from a different driver instance, assuming we have two release actions
and they can race. Rather use devm here to remove the coredump when the
device is released.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-29-matthew.auld@intel.com
11 months agodrm/xe/device: move xe_device_sanitize over to devm
Matthew Auld [Wed, 22 May 2024 10:21:53 +0000 (11:21 +0100)]
drm/xe/device: move xe_device_sanitize over to devm

Disable GuC submission when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-28-matthew.auld@intel.com
11 months agodrm/xe/device: move flr to devm
Matthew Auld [Wed, 22 May 2024 10:21:52 +0000 (11:21 +0100)]
drm/xe/device: move flr to devm

Should be called when driver is removed, not when this particular driver
instance is destroyed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-27-matthew.auld@intel.com
11 months agodrm/xe/irq: move irq_uninstall over to devm
Matthew Auld [Wed, 22 May 2024 10:21:51 +0000 (11:21 +0100)]
drm/xe/irq: move irq_uninstall over to devm

Makes sense to trigger this when the device is removed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-26-matthew.auld@intel.com
11 months agodrm/xe/guc_pc: s/pc_fini/pc_fini_hw/
Matthew Auld [Wed, 22 May 2024 10:21:50 +0000 (11:21 +0100)]
drm/xe/guc_pc: s/pc_fini/pc_fini_hw/

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-25-matthew.auld@intel.com
11 months agodrm/xe/guc_pc: move pc_fini to devm
Matthew Auld [Wed, 22 May 2024 10:21:49 +0000 (11:21 +0100)]
drm/xe/guc_pc: move pc_fini to devm

Here we are touching the HW/GuC and presumably this should happen when
the device is removed. Currently if you hotunplug the device this is
skipped if there is already open driver instance.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-24-matthew.auld@intel.com
11 months agodrm/xe/guc: s/guc_fini/guc_fini_hw/
Matthew Auld [Wed, 22 May 2024 10:21:48 +0000 (11:21 +0100)]
drm/xe/guc: s/guc_fini/guc_fini_hw/

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-23-matthew.auld@intel.com
11 months agodrm/xe/guc: move guc_fini over to devm
Matthew Auld [Wed, 22 May 2024 10:21:47 +0000 (11:21 +0100)]
drm/xe/guc: move guc_fini over to devm

Make sure to actually call this when the device is removed. Currently we
only trigger it when the driver instance goes away, but that doesn't
work too well with hotunplug, since device can be removed and re-probed
with a new driver instance, where the guc_fini() is called too late.
Move the fini over to devm to ensure this is called when device is
removed.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-22-matthew.auld@intel.com
11 months agodrm/xe/ggtt: use drm_dev_enter to mark device section
Matthew Auld [Wed, 22 May 2024 10:21:46 +0000 (11:21 +0100)]
drm/xe/ggtt: use drm_dev_enter to mark device section

Device can be hotunplugged before we start destroying gem objects. In
such a case don't touch the GGTT entries, trigger any invalidations or
mess around with rpm.  This should already be taken care of when
removing the device, we just need to take care of dealing with the
software state, like removing the mm node.

v2: (Andrzej)
  - Avoid some duplication by tracking the bound status and checking
    that instead.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-21-matthew.auld@intel.com
11 months agodrm/xe: covert sysfs over to devm
Matthew Auld [Wed, 22 May 2024 10:21:45 +0000 (11:21 +0100)]
drm/xe: covert sysfs over to devm

Hotunplugging the device seems to result in stuff like:

kobject_add_internal failed for tile0 with -EEXIST, don't try to
register things with the same name in the same directory.

We only remove the sysfs as part of drmm, however that is tied to the
lifetime of the driver instance and not the device underneath. Attempt
to fix by using devm for all of the remaining sysfs stuff related to the
device.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1667
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1432
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-20-matthew.auld@intel.com
11 months agodrm/xe/pci: remove broken driver_release
Matthew Auld [Wed, 22 May 2024 10:21:44 +0000 (11:21 +0100)]
drm/xe/pci: remove broken driver_release

This is quite broken since we are nuking the pdev link to the private
driver struct, but note here that driver_release is called when the
drm_device is released (poor mans drmm), which can be long after the
device has been removed. So here what we are actually doing is nuking
the pdev link for what is potentially bound to a different drm_device.
If that happens before our pci remove callback is triggered (for the new
drm_device) we silently exit and skip some important cleanup steps,
resulting in hilarity.

There should be no reason to implement driver_release, when we already
have nicer stuff like drmm, so just remove completely. The actual pdev
link is already nuked when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-19-matthew.auld@intel.com
11 months agodrm/xe/vf: Custom GuC initialization if VF
Michal Wajdeczko [Tue, 21 May 2024 09:25:18 +0000 (11:25 +0200)]
drm/xe/vf: Custom GuC initialization if VF

The GuC firmware is loaded and initialized by the PF driver. Make
sure VF drivers only perform permitted operations. For submission
initialization, use number of GuC context IDs from self config.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-3-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Allow to initialize submission with limited set of IDs
Michal Wajdeczko [Tue, 21 May 2024 09:25:17 +0000 (11:25 +0200)]
drm/xe/guc: Allow to initialize submission with limited set of IDs

While PF and native drivers may initialize submission code to use
all available GuC contexts IDs, the VF driver may only use limited
number of IDs. Update init function to accept number of context
IDs available for use.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-2-michal.wajdeczko@intel.com
11 months agodrm/xe: Cleanup xe_mmio.h
Michal Wajdeczko [Mon, 20 May 2024 18:18:14 +0000 (20:18 +0200)]
drm/xe: Cleanup xe_mmio.h

We don't need <linux/delay.h> include since commit 5c09bd6ccd41
("drm/xe/mmio: Move xe_mmio_wait32() to xe_mmio.c").

We don't need <linux/io-64-nonatomic-lo-hi.h> include since commit
54c659660d63 ("drm/xe: Make xe_mmio_read|write() functions non-inline").

And since commit 924e6a9789a0 ("drm/xe/uapi: Remove MMIO ioctl")
we don't need forward declarations of drm_device and drm_file.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-4-michal.wajdeczko@intel.com
11 months agodrm/xe: Don't rely on indirect includes from xe_mmio.h
Michal Wajdeczko [Mon, 20 May 2024 18:18:13 +0000 (20:18 +0200)]
drm/xe: Don't rely on indirect includes from xe_mmio.h

These compilation units use udelay() or some GT oriented printk
functions without explicitly including proper header files, and
relying on #includes from the xe_mmio.h instead. Fix that.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com
11 months agodrm/i915/display: Add missing include to intel_vga.c
Michal Wajdeczko [Mon, 20 May 2024 18:18:12 +0000 (20:18 +0200)]
drm/i915/display: Add missing include to intel_vga.c

This compilation unit uses udelay() function without including
it's header file. Fix that to break dependency on other code.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-2-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_guc_pc.h
Michal Wajdeczko [Tue, 21 May 2024 10:28:28 +0000 (12:28 +0200)]
drm/xe: Fix xe_guc_pc.h

Prefer forward declaration over #include xe_guc_pc_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-5-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_huc.h
Michal Wajdeczko [Tue, 21 May 2024 10:28:27 +0000 (12:28 +0200)]
drm/xe: Fix xe_huc.h

Prefer forward declaration over #include xe_huc_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-4-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_gsc.h
Michal Wajdeczko [Tue, 21 May 2024 10:28:26 +0000 (12:28 +0200)]
drm/xe: Fix xe_gsc.h

Prefer forward declaration over #include xe_gsc_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-3-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_uc.h
Michal Wajdeczko [Tue, 21 May 2024 10:28:25 +0000 (12:28 +0200)]
drm/xe: Fix xe_uc.h

Prefer forward declaration over #include xe_uc_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-2-michal.wajdeczko@intel.com
11 months agodrm/xe/tests: Use uninterruptible VM lock
Nirmoy Das [Tue, 21 May 2024 10:27:15 +0000 (12:27 +0200)]
drm/xe/tests: Use uninterruptible VM lock

Interruptible lock can return error and needed a return value
check. This test should finish quick enough so use a uninterruptible
lock instead.

Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521102715.22700-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
11 months agodrm/xe: Add warn when level can not be zero.
Nirmoy Das [Tue, 21 May 2024 10:36:23 +0000 (12:36 +0200)]
drm/xe: Add warn when level can not be zero.

At xe_pt_zap_ptes_entry() and xe_pt_stage_unbind_entry, the level cannot
be 0. Therefore, add an independent check for the level. Since the level
cannot be zero at this point, there is no need to check for `is_compact`,
so remove that instead.

Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521103623.11645-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
11 months agodrm/xe/uapi: Expose the L3 bank mask
Francois Dugast [Tue, 16 Apr 2024 14:50:37 +0000 (14:50 +0000)]
drm/xe/uapi: Expose the L3 bank mask

The L3 bank mask is already generated and stored internally with
the rest of the GT topology. In user space, the compute runtime
now needs this information to be added to the device properties
therefore the topology mask query is extended to provide a new
mask which represents the L3 banks enabled on the GT.

The changes in the compute runtime are ready and approved, see
link below.

v2: Rewrite commit message and add a link to the compute
    runtime PR (Francois Dugast)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Robert Krzemien <robert.krzemien@intel.com>
Cc: Mateusz Jablonski <mateusz.jablonski@intel.com>
Link: https://github.com/intel/compute-runtime/pull/722
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240416145037.7-2-francois.dugast@intel.com
11 months agodrm/xe/client: Print runtime to fdinfo
Lucas De Marchi [Fri, 17 May 2024 20:43:10 +0000 (13:43 -0700)]
drm/xe/client: Print runtime to fdinfo

Print the accumulated runtime for client when printing fdinfo.
Each time a query is done it first does 2 things:

1) loop through all the exec queues for the current client and
   accumulate the runtime, per engine class. CTX_TIMESTAMP is used for
   that, being read from the context image.

2) Read a "GPU timestamp" that can be used for considering "how much GPU
   time has passed" and that has the same unit/refclock as the one
   recording the runtime. RING_TIMESTAMP is used for that via MMIO.

Since for all current platforms RING_TIMESTAMP follows the same
refclock, just read it once, using any first engine available.

This is exported to userspace as 2 numbers in fdinfo:

drm-cycles-<class>: <RUNTIME>
drm-total-cycles-<class>: <TIMESTAMP>

Userspace is expected to collect at least 2 samples, which allows to
know the client engine busyness as per:

    RUNTIME1 - RUNTIME0
busyness = ---------------------
  T1 - T0

Since drm-cycles-<class> always starts at 0, it's also possible to know
if and engine was ever used by a client.

It's expected that userspace will read any 2 samples every few seconds.
Given the update frequency of the counters involved and that
CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap
around (assuming 100% utilization) after ~200s. The wraparound is not
perceived by userspace since it's just accumulated for all the
exec_queues in a 64-bit counter) but the measurement will not be
accurate if the samples are too far apart.

This could be mitigated by adding a workqueue to accumulate the counters
every so often, but it's additional complexity for something that is
done already by userspace every few seconds in tools like gputop (from
igt), htop, nvtop, etc, with none of them really defaulting to 1 sample
per minute or more.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Add helper to return any available hw engine
Lucas De Marchi [Fri, 17 May 2024 20:43:09 +0000 (13:43 -0700)]
drm/xe: Add helper to return any available hw engine

Get the first available engine from a gt, which helps in the case any
engine serves as a context, like when reading RING_TIMESTAMP.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Cache data about user-visible engines
Lucas De Marchi [Fri, 17 May 2024 20:43:08 +0000 (13:43 -0700)]
drm/xe: Cache data about user-visible engines

gt->info.engine_mask used to indicate the available engines, but that
is not always true anymore: some engines are reserved to kernel and some
may be exposed as a single engine (e.g. with ccs_mode).

Runtime changes only happen when no clients exist, so it's safe to cache
the list of engines in the gt and update that when it's needed. This
will help implementing per client engine utilization so this (mostly
constant) information doesn't need to be re-calculated on every query.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Add helper to accumulate exec queue runtime
Umesh Nerlige Ramappa [Fri, 17 May 2024 20:43:07 +0000 (13:43 -0700)]
drm/xe: Add helper to accumulate exec queue runtime

Add a helper to accumulate per-client runtime of all its
exec queues. This is called every time a sched job is finished.

v2:
  - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate
    runtime when job is finished since xe_sched_job_completed() is not a
    notification that job finished.
  - Stop trying to update runtime from xe_exec_queue_fini() - that is
    redundant and may happen after xef is closed, leading to a
    use-after-free
  - Do not special case the first timestamp read: the default LRC sets
    CTX_TIMESTAMP to zero, so even the first sample should be a valid
    one.
  - Handle the parallel submission case by multiplying the runtime by
    width.
v3: Update comments

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Add helper to capture engine timestamp
Lucas De Marchi [Fri, 17 May 2024 20:43:06 +0000 (13:43 -0700)]
drm/xe: Add helper to capture engine timestamp

Just like CTX_TIMESTAMP is used to calculate runtime, add a helper to
get the timestamp for the engine so it can be used to calculate the
"engine time" with the same unit as the runtime is recorded.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe/lrc: Add helper to capture context timestamp
Umesh Nerlige Ramappa [Fri, 17 May 2024 20:43:05 +0000 (13:43 -0700)]
drm/xe/lrc: Add helper to capture context timestamp

Add a helper to capture CTX_TIMESTAMP from the context image so it can
be used to calculate the runtime.

v2: Add kernel-doc to clarify expectation from caller

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion
Lucas De Marchi [Fri, 17 May 2024 20:43:04 +0000 (13:43 -0700)]
drm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion

XE_ENGINE_CLASS_OTHER was missing from the str conversion. Add it and
remove the default handling so it's protected by -Wswitch.
Currently the only user is xe_hw_engine_class_sysfs_init(), which
already skips XE_ENGINE_CLASS_OTHER, so there's no change in behavior.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Promote xe_hw_engine_class_to_str()
Lucas De Marchi [Fri, 17 May 2024 20:43:03 +0000 (13:43 -0700)]
drm/xe: Promote xe_hw_engine_class_to_str()

Move it out of the sysfs compilation unit so it can be re-used in other
places.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Replace RING_START_UDW by u64 RING_START
José Roberto de Souza [Fri, 10 May 2024 15:01:08 +0000 (08:01 -0700)]
drm/xe: Replace RING_START_UDW by u64 RING_START

Other u64 registers are printed in a single line so RING_START
needs to follow that too.
As there is no upstream decoder tool parsing RING_START this will
not break any decoder application.

Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510150108.80679-1-jose.souza@intel.com
11 months agodrm/xe/vf: Expose SR-IOV VF attributes to GT debugfs
Michal Wajdeczko [Thu, 16 May 2024 11:05:46 +0000 (13:05 +0200)]
drm/xe/vf: Expose SR-IOV VF attributes to GT debugfs

For debug purposes we might want to view actual VF configuration
(including GGTT range, LMEM size, number of GuC contexts IDs or
doorbells) and the negotiated ABI versions (with GuC and PF).

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-7-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Custom hardware config load step if VF
Michal Wajdeczko [Thu, 16 May 2024 11:05:45 +0000 (13:05 +0200)]
drm/xe/vf: Custom hardware config load step if VF

The VF drivers may immediately communicate with the GuC to obtain
the hardware config since the firmware shall already be running.

With the GuC communication established, VFs can also obtain the
values of the runtime registers (fuses) from the PF driver.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-6-michal.wajdeczko@intel.com
11 months agodrm/xe/vf: Add support for VF to query its configuration
Michal Wajdeczko [Thu, 16 May 2024 11:05:44 +0000 (13:05 +0200)]
drm/xe/vf: Add support for VF to query its configuration

The VF driver doesn't know which GuC firmware was loaded by the PF
driver and must perform GuC ABI version handshake prior to sending
any other H2G actions to the GuC to submit workloads.

The VF driver also doesn't have access to the fuse registers and
must rely on the runtime info, which includes values of the fuse
registers, that the PF driver is exposing to the VFs.

Add functions to cover that functionality. We will use these
functions in upcoming patches.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-5-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add VF2GUC_QUERY_SINGLE_KLV to ABI
Michal Wajdeczko [Thu, 16 May 2024 11:05:43 +0000 (13:05 +0200)]
drm/xe/guc: Add VF2GUC_QUERY_SINGLE_KLV to ABI

In upcoming patches we will add support to the VF driver to
read its configuration from the GuC using special H2G actions.
Add necessary definitions to our GuC firmware ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-4-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add VF2GUC_VF_RESET to ABI
Michal Wajdeczko [Thu, 16 May 2024 11:05:42 +0000 (13:05 +0200)]
drm/xe/guc: Add VF2GUC_VF_RESET to ABI

The version negotiation between the VF driver and the GuC firmware
must start with explicit soft reset of the GuC state initiated by
the VF driver. Add VF2GUC action definitions to the ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-3-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add VF2GUC_MATCH_VERSION to ABI
Michal Wajdeczko [Thu, 16 May 2024 11:05:41 +0000 (13:05 +0200)]
drm/xe/guc: Add VF2GUC_MATCH_VERSION to ABI

In upcoming patches we will add a version negotiation between
the VF driver and the GuC firmware. Add necessary definitions
to our GuC firmware ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-2-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Expose PF monitor details via debugfs
Michal Wajdeczko [Tue, 14 May 2024 19:00:15 +0000 (21:00 +0200)]
drm/xe/pf: Expose PF monitor details via debugfs

For debug purposes we might want to view statistics maintained by
the PF driver about VFs activity.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-9-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Track adverse events notifications from GuC
Michal Wajdeczko [Tue, 14 May 2024 19:00:14 +0000 (21:00 +0200)]
drm/xe/pf: Track adverse events notifications from GuC

When thresholds used to monitor VFs activities are configured,
then GuC may send GUC2PF_ADVERSE_EVENT messages informing the
PF driver about exceeded thresholds. Start handling such messages.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-8-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add GUC2PF_ADVERSE_EVENT to ABI
Michal Wajdeczko [Tue, 14 May 2024 19:00:13 +0000 (21:00 +0200)]
drm/xe/guc: Add GUC2PF_ADVERSE_EVENT to ABI

When thresholds used to monitor VFs activities are configured,
then GuC may send GUC2PF_ADVERSE_EVENT messages informing the
PF driver about exceeded thresholds. Add necessary definitions
to our GuC firmware ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-7-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Allow configuration of VF thresholds over debugfs
Michal Wajdeczko [Tue, 14 May 2024 19:00:12 +0000 (21:00 +0200)]
drm/xe/pf: Allow configuration of VF thresholds over debugfs

Initial values of all thresholds used by the GuC to monitor VF's
activity is zero (disabled) and we need to explicitly configure
them per each VF. Expose additional attributes over debugfs.

Definitions of all attributes are generated so we will not need
to make any changes if new thresholds would be added to the set.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-6-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Introduce functions to configure VF thresholds
Michal Wajdeczko [Tue, 14 May 2024 19:00:11 +0000 (21:00 +0200)]
drm/xe/pf: Introduce functions to configure VF thresholds

The GuC firmware monitors VF's activity and notifies the PF driver
once any configured threshold related to such activity is exceeded.
Add functions to allow configuration of these thresholds per VF.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-5-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add support for threshold KLVs in to_string() helper
Michal Wajdeczko [Tue, 14 May 2024 19:00:10 +0000 (21:00 +0200)]
drm/xe/guc: Add support for threshold KLVs in to_string() helper

Use MAKE_XE_GUC_KLV_THRESHOLDS_SET to generate missing conversion
of threshold KLV keys to string.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-4-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Introduce GuC KLV thresholds set
Michal Wajdeczko [Tue, 14 May 2024 19:00:09 +0000 (21:00 +0200)]
drm/xe/guc: Introduce GuC KLV thresholds set

The GuC firmware monitors VF's activity and notifies the PF driver
once any configured threshold related to such activity is exceeded.

The available thresholds are defined in the GuC ABI as part of the
GuC VF Configuration KLVs.  Threshold configurations performed by
the PF driver and notifications sent by the GuC rely on the KLV keys,
which are not zero-based and might not guarantee continuity.

To simplify the driver code and eliminate the need to repeat very
similar code for each threshold, introduce the threshold set macro
that allows to generate required code based on unique threshold tag.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-3-michal.wajdeczko@intel.com
11 months agodrm/xe/guc: Add more KLV helper macros
Michal Wajdeczko [Tue, 14 May 2024 19:00:08 +0000 (21:00 +0200)]
drm/xe/guc: Add more KLV helper macros

In upcoming patches we will want to generate some of the KLV keys
from other macros. Add MAKE_GUC_KLV_{KEY|LEN} macros for that and
make sure they will correctly expand provided TAG parameter. Also
fix PREP_GUC_KLV_TAG to also work correctly within other macros.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-2-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Implement pci_driver.sriov_configure callback
Michal Wajdeczko [Mon, 6 May 2024 18:41:21 +0000 (20:41 +0200)]
drm/xe/pf: Implement pci_driver.sriov_configure callback

The PCI subsystem already exposes the "sriov_numvfs" attribute
that users can use to enable or disable SR-IOV VFs. Add custom
implementation of the .sriov_configure callback defined by the
pci_driver to perform additional steps, including fair VFs
provisioning with the resources, as required by our platforms.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> #v2
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506184121.2615-1-michal.wajdeczko@intel.com
11 months agodrm/xe/pf: Don't advertise support to enable VFs if not ready
Michal Wajdeczko [Tue, 7 May 2024 16:57:57 +0000 (18:57 +0200)]
drm/xe/pf: Don't advertise support to enable VFs if not ready

Even if we have not enabled SR-IOV support using the platform
specific has_sriov flag, the hardware may still report SR-IOV
capability and the PCI layer may wrongly advertise driver support
to enable VFs.  Explicitly reset the number of supported VFs to
zero to avoid confusion.

Applications may read the /sys/bus/pci/devices/.../sriov_totalvfs
prior to enabling VFs using the sriov_numvfs to check if such an
operation is possible.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240507165757.2835-1-michal.wajdeczko@intel.com
11 months agodrm/xe: Only zap PTEs as needed
Matthew Brost [Tue, 14 May 2024 23:23:25 +0000 (16:23 -0700)]
drm/xe: Only zap PTEs as needed

If PTEs are already invalidated no need to invalidate again.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514232325.84508-1-matthew.brost@intel.com
11 months agodrm/xe/xe_guc_submit: Declare reset if banned or killed or wedged
Jonathan Cavitt [Fri, 10 May 2024 19:45:40 +0000 (12:45 -0700)]
drm/xe/xe_guc_submit: Declare reset if banned or killed or wedged

Add an additional condition to the reset_status guc_exec_queue_op that
returns true if the exec queue has been banned or killed or wedged.  The
reset_status op is only used for exiting any xe_wait_user_fence_ioctl
that waits on an exec queue without timing out, so doing this will exit
the ioctl early in cases where the exec queue can no longer function,
such as after a GuC stop during a reset.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510194540.3246991-3-jonathan.cavitt@intel.com
11 months agodrm/xe/xe_guc_submit: Allow lr exec queues to be banned
Jonathan Cavitt [Fri, 10 May 2024 19:45:39 +0000 (12:45 -0700)]
drm/xe/xe_guc_submit: Allow lr exec queues to be banned

LR queues currently don't get banned during a GT/GuC reset because they
lack a job.  Though they don't have a job to detect the reset status of,
it's still possible to tell when they should be banned by looking at the
LRC: if the LRC head and tail don't match, then the exec queue should be
banned and cleaned up.

This also requires swapping the usage of xe_sched_tdr_queue_imm with
xe_guc_exec_queue_trigger_cleanup, as the former is specific to non-lr
exec queues.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510194540.3246991-2-jonathan.cavitt@intel.com
11 months agodrm/xe/xe_guc_submit: Fix exec queue stop race condition
Jonathan Cavitt [Fri, 10 May 2024 19:45:38 +0000 (12:45 -0700)]
drm/xe/xe_guc_submit: Fix exec queue stop race condition

Reorder the xe_sched_tdr_queue_imm and set_exec_queue_banned calls in
guc_exec_queue_stop.  This prevents a possible race condition between
the two events in which it's possible for xe_sched_tdr_queue_imm to
wake the ufence waiter before the exec queue is banned, causing the
ufence waiter to miss the banned state.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510194540.3246991-1-jonathan.cavitt@intel.com
11 months agodrm/xe/uc: Move GuC submission init to post hwconfig step
Michal Wajdeczko [Fri, 10 May 2024 20:38:10 +0000 (22:38 +0200)]
drm/xe/uc: Move GuC submission init to post hwconfig step

We shouldn't need anything from the GuC submission code until we
finish GuC initialization in post hwconfig step.

While around add diagnostic message if we fail uC init.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510203810.1952-3-michal.wajdeczko@intel.com
11 months agodrm/xe/uc: Reorder post hwconfig uC initialization step
Michal Wajdeczko [Fri, 10 May 2024 20:38:09 +0000 (22:38 +0200)]
drm/xe/uc: Reorder post hwconfig uC initialization step

We want to move the GuC submission initialization to the post
hwconfig step, but now this step is done too late as migration
initialization uses exec_queue that would crash due to a unset
exec_queue_ops. We can easily fix that by small function reorder.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510203810.1952-2-michal.wajdeczko@intel.com
11 months agodrm/xe: Only use reserved BCS instances for usm migrate exec queue
Matthew Brost [Mon, 15 Apr 2024 19:04:53 +0000 (12:04 -0700)]
drm/xe: Only use reserved BCS instances for usm migrate exec queue

The GuC context scheduling queue is 2 entires deep, thus it is possible
for a migration job to be stuck behind a fault if migration exec queue
shares engines with user jobs. This can deadlock as the migrate exec
queue is required to service page faults. Avoid deadlock by only using
reserved BCS instances for usm migrate exec queue.

Fixes: a043fbab7af5 ("drm/xe/pvc: Use fast copy engines as migrate engine on PVC")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415190453.696553-2-matthew.brost@intel.com
Reviewed-by: Brian Welty <brian.welty@intel.com>
11 months agodrm/xe: Fix the warning conditions
Himal Prasad Ghimiray [Wed, 8 May 2024 15:22:16 +0000 (20:52 +0530)]
drm/xe: Fix the warning conditions

The maximum timeout display uses in xe_pcode_request is 3 msec, add the
warning in cases the function is misused with higher timeouts.

Add a warning if pcode_try_request is not passed the timeout parameter
greater than 0.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508152216.3263109-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Change pcode timeout to 50msec while polling again
Himal Prasad Ghimiray [Wed, 8 May 2024 15:22:15 +0000 (20:52 +0530)]
drm/xe: Change pcode timeout to 50msec while polling again

Polling is initially attempted with timeout_base_ms enabled for
preemption, and if it exceeds this timeframe, another attempt is made
without preemption, allowing an additional 50 ms before timing out.

v2
- Rebase

v3
- Move warnings to separate patch (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Fixes: 7dc9b92dcfef ("drm/xe: Remove i915_utils dependency from xe_pcode.")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508152216.3263109-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Move sw-only pcode initialization
Lucas De Marchi [Mon, 13 May 2024 21:37:51 +0000 (14:37 -0700)]
drm/xe: Move sw-only pcode initialization

Move it to xe_gt_init_early() that initializes the sw-only part for each
gt.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Move xe_force_wake_init_gt() inside gt initialization
Lucas De Marchi [Mon, 13 May 2024 21:37:50 +0000 (14:37 -0700)]
drm/xe: Move xe_force_wake_init_gt() inside gt initialization

xe_force_wake_init_gt() is a software-only initialization and doesn't
need to be called from xe_device_probe(). Move it to initialize
together with the gt.

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Move xe_gt_init_early() where it belongs
Lucas De Marchi [Mon, 13 May 2024 21:37:49 +0000 (14:37 -0700)]
drm/xe: Move xe_gt_init_early() where it belongs

Early shall be early enough, stop doing other things with gt before it.
Now that xe_gt_init_early() doesn't need forcewake and doesn't depend on
the fake engine_mask initialization, move it where it belongs: it
doesn't need to be after hwconfig config anymore.

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Drop useless forcewake get/put
Lucas De Marchi [Mon, 13 May 2024 21:37:48 +0000 (14:37 -0700)]
drm/xe: Drop useless forcewake get/put

Forcewake used to be needed in xe_gt_init_early() since it was calling
xe_gt_topology_init(). That call was dropped in commit 4c47049d93b7
("drm/xe/guc: Fix missing topology init"), but the forcewake calls were
left behind. Remove them.

Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Drop __engine_mask
Lucas De Marchi [Mon, 13 May 2024 21:37:47 +0000 (14:37 -0700)]
drm/xe: Drop __engine_mask

Not really used, it's just a copy of engine_mask, which already reads
the fuses to mark engines as available/not-available.

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe: Fix xe_reg_sr.h
Michal Wajdeczko [Mon, 13 May 2024 08:42:18 +0000 (10:42 +0200)]
drm/xe: Fix xe_reg_sr.h

Prefer forward declarations over #include xe_reg_sr_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513084218.2084-5-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_lrc.h
Michal Wajdeczko [Mon, 13 May 2024 08:42:17 +0000 (10:42 +0200)]
drm/xe: Fix xe_lrc.h

Prefer forward declarations over #include xe_lrc_types.h

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513084218.2084-4-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_guc_ads.h
Michal Wajdeczko [Mon, 13 May 2024 08:42:16 +0000 (10:42 +0200)]
drm/xe: Fix xe_guc_ads.h

We don't need to include xe_guc_ads_types.h here.
Use forward declaration instead.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513084218.2084-3-michal.wajdeczko@intel.com
11 months agodrm/xe: Fix xe_gt_throttle_sysfs.h
Michal Wajdeczko [Mon, 13 May 2024 08:42:15 +0000 (10:42 +0200)]
drm/xe: Fix xe_gt_throttle_sysfs.h

We don't need to include drm/drm_managed.h here.
We don't need to comment final #endif.
Also remove empty line at the end.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513084218.2084-2-michal.wajdeczko@intel.com
11 months agodrm/xe: Rename few xe_args.h macros
Michal Wajdeczko [Wed, 8 May 2024 17:10:00 +0000 (19:10 +0200)]
drm/xe: Rename few xe_args.h macros

To minimize the risk of future name collisions, rename macros to
always include the ARG or ARGS tag:

  DROP_FIRST to DROP_FIRST_ARG
  PICK_FIRST to FIRST_ARG
  PICK_LAST to LAST_ARG

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> #v2
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508171000.1864-1-michal.wajdeczko@intel.com
11 months agodrm/xe: Move xe_gpu_commands.h file to instructions/
Michal Wajdeczko [Wed, 8 May 2024 17:48:56 +0000 (19:48 +0200)]
drm/xe: Move xe_gpu_commands.h file to instructions/

All other files with commands definitions are in instructions/
folder. Move xe_gpu_commands.h also there.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com
11 months agodrm/xe/hwmon: Remove unwanted write permission for currN_label
Karthik Poosa [Fri, 19 Apr 2024 12:59:45 +0000 (18:29 +0530)]
drm/xe/hwmon: Remove unwanted write permission for currN_label

Change umode of currN_label from 0644 to 0444 as write permission
not needed for label.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419125945.4085629-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
11 months agodrm/xe: Fix UBSAN shift-out-of-bounds failure
Shuicheng Lin [Tue, 7 May 2024 13:04:11 +0000 (13:04 +0000)]
drm/xe: Fix UBSAN shift-out-of-bounds failure

Here is the failure stack:
[   12.988209] ------------[ cut here ]------------
[   12.988216] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[   12.988232] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[   12.988235] CPU: 4 PID: 1310 Comm: gnome-shell Tainted: G     U             6.9.0-rc6+prerelease1158+ #19
[   12.988237] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.3301.A02.2208050712 08/05/2022
[   12.988239] Call Trace:
[   12.988240]  <TASK>
[   12.988242]  dump_stack_lvl+0xd7/0xf0
[   12.988248]  dump_stack+0x10/0x20
[   12.988250]  ubsan_epilogue+0x9/0x40
[   12.988253]  __ubsan_handle_shift_out_of_bounds+0x10e/0x170
[   12.988260]  dma_resv_reserve_fences.cold+0x2b/0x48
[   12.988262]  ? ww_mutex_lock_interruptible+0x3c/0x110
[   12.988267]  drm_exec_prepare_obj+0x45/0x60 [drm_exec]
[   12.988271]  ? vm_bind_ioctl_ops_execute+0x5b/0x740 [xe]
[   12.988345]  vm_bind_ioctl_ops_execute+0x78/0x740 [xe]

It is caused by the value 0 of parameter num_fences in function
drm_exec_prepare_obj.  And lead to in function __rounddown_pow_of_two,
"0 - 1" causes the shift-out-of-bounds.

By design drm_exec_prepare_obj() should be called only when there are
fences to be reserved. If num_fences is 0, calling drm_exec_lock_obj()
is sufficient as was done in commit 9377de4cb3e8 ("drm/xe/vm: Avoid
reserving zero fences")

Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/all/24d4a9a9-c622-4f56-8672-21f4c6785476@amd.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240507130411.630361-1-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
11 months agodrm/xe/xe2: Enable Indirect Ring State support for Xe2
Niranjana Vishwanathapura [Tue, 7 May 2024 22:42:53 +0000 (15:42 -0700)]
drm/xe/xe2: Enable Indirect Ring State support for Xe2

Indirect Ring State is the recommended mode for Xe2 platforms,
enable it by default.

v2: Set has_indirect_ring_state to '1' instead of 'true'

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240507224255.5059-5-niranjana.vishwanathapura@intel.com