www.infradead.org Git - users/willy/linux.git/log
Evan Quan [Mon, 8 May 2023 08:57:02 +0000 (16:57 +0800)]
drm/amd/pm: fulfill the OD support for SMU13.0.0

Implement the interfaces for retrieving and setting the OD settings.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 11 Apr 2023 03:49:09 +0000 (11:49 +0800)]
drm/amd/pm: fulfill SMU13 OD settings init and restore

Gfxclk fmin/fmax, Uclk fmin/fmax and Gfx v/f curve voltage offset
OD settings are supported for SMU13.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Tue, 10 May 2022 16:51:26 +0000 (12:51 -0400)]
drm/amdkfd: bump kfd ioctl minor version for debug api availability

Bump the minor version to declare that the debugging capability is now
available.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Tue, 10 May 2022 16:47:45 +0000 (12:47 -0400)]
drm/amdkfd: add debug device snapshot operation

Similar to the queue snapshot, return an array of device information
using the same entry_size check and return.
Unlike queue snapshots, the debugger needs to pass the correct number of
devices that exist.  If it fails to do so, the KFD will return the
actual number of devices so that the debugger can make a subsequent
successful call.
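The count-and-retry contract can be sketched in C as below. This is a minimal, hypothetical model: struct dev_entry and device_snapshot() are illustrative names, not the KFD uAPI, and it assumes a wrong device count is rejected without copying anything while the real count and the kernel's own entry size are still written back.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-device record; not the real KFD device info layout. */
struct dev_entry {
	uint32_t gpu_id;
	uint32_t num_xcc;
};

/*
 * Copy ndevs entries into buf, each truncated to the caller's entry_size.
 * Always writes back the actual device count and the kernel-side entry
 * size so the caller can retry. Returns 0 on success, -1 if the caller
 * passed the wrong device count (nothing is copied in that case).
 */
static int device_snapshot(const struct dev_entry *devs, uint32_t ndevs,
			   void *buf, uint32_t *num_devices,
			   uint32_t *entry_size)
{
	uint32_t user_size = *entry_size;
	uint32_t copy_size = user_size < sizeof(struct dev_entry) ?
			     user_size : (uint32_t)sizeof(struct dev_entry);
	int ret = 0;

	if (*num_devices != ndevs)
		ret = -1; /* caller must retry with the real count */
	else
		for (uint32_t i = 0; i < ndevs; i++)
			memcpy((char *)buf + (size_t)i * user_size,
			       &devs[i], copy_size);

	*num_devices = ndevs;
	*entry_size = (uint32_t)sizeof(struct dev_entry);
	return ret;
}
```

A debugger following this model calls once with a guessed count, then retries with the count that came back.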

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Tue, 10 May 2022 15:15:29 +0000 (11:15 -0400)]
drm/amdkfd: add debug queue snapshot operation

Allow the debugger to get a snapshot of a specified number of queues
containing various queue property information that is copied to the
debugger.

Since the debugger doesn't know how many queues exist at any given time,
allow the debugger to pass the requested number of snapshots as 0 to get
the actual number of potential snapshots to use for a subsequent snapshot
request for actual information.

To prevent future ABI breakage, pass in the requested entry_size.
The KFD will return its own entry_size in case the debugger still wants
to log the information in a core dump on sizing failure.

Also allow the debugger to clear exceptions when doing a snapshot.
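A rough C model of the queue snapshot semantics follows: a requested count of 0 only reports how many queues exist, each copied entry is truncated to the caller's entry_size so an older debugger with a smaller struct keeps working, and reported exceptions can optionally be cleared. struct queue_entry and queue_snapshot() are hypothetical names; the real KFD layout and ioctl differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical queue record; the real KFD layout differs. */
struct queue_entry {
	uint64_t exception_status;
	uint32_t queue_id;
	uint32_t gpu_id;
};

/*
 * If *num_queues is 0, nothing is copied and only the queue count is
 * returned. Otherwise copy up to *num_queues entries, each truncated to
 * the caller's entry_size, and clear clear_mask bits from the reported
 * exception status. The kernel-side entry size is always written back.
 */
static void queue_snapshot(struct queue_entry *qs, uint32_t nq, void *buf,
			   uint32_t *num_queues, uint32_t *entry_size,
			   uint64_t clear_mask)
{
	uint32_t user_size = *entry_size;
	uint32_t copy_size = user_size < sizeof(*qs) ?
			     user_size : (uint32_t)sizeof(*qs);
	uint32_t want = *num_queues < nq ? *num_queues : nq;

	for (uint32_t i = 0; i < want; i++) {
		memcpy((char *)buf + (size_t)i * user_size, &qs[i], copy_size);
		qs[i].exception_status &= ~clear_mask; /* clear after copying */
	}
	*num_queues = nq;
	*entry_size = (uint32_t)sizeof(*qs);
}
```

The first call (count 0) sizes the buffer; the second retrieves the data.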

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 9 May 2022 17:37:36 +0000 (13:37 -0400)]
drm/amdkfd: add debug query exception info operation

Allow the debugger to query additional info based on an exception code.
For device exceptions, it's currently only memory violation information.
For process exceptions, it's currently only runtime information.
Queue exceptions only report the queue exception status.

The debugger has the option of clearing the target exception on query.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 9 May 2022 15:10:32 +0000 (11:10 -0400)]
drm/amdkfd: add debug query event operation

Allow the debugger to query a single queue, device or process
exception.
The KFD should also return the GPU or queue ID of the exception.
The debugger also has the option of clearing exceptions after
they have been queried.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 9 May 2022 14:51:56 +0000 (10:51 -0400)]
drm/amdkfd: add debug set flags operation

Allow the debugger to set the single memory and single ALU operation flags.

Some exceptions are imprecise (memory violations, address watch) in the
sense that a trap occurs only when the exception interrupt is raised and
not at the non-halting faulty instruction.  Trap temporaries 0 & 1 save
the program counter address, which means that these values will not point
to the faulty instruction address but to wherever the wave was when the
interrupt was raised.

Setting the Single Memory Operations flag will inject an automatic wait
on every memory operation instruction, forcing imprecise memory exceptions
to become precise at the cost of performance.  This setting is not
permitted on debug devices that support only a global setting of this
option.

Also return the previously set flags to the debugger.
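A minimal C sketch of the set-flags contract, assuming hypothetical flag bit values and a hypothetical set_debug_flags() helper (not the real uAPI): the previous flags are always handed back, and the per-process single memory operation flag is refused on devices that only support a global setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real uAPI defines its own values. */
#define DBG_FLAG_SINGLE_MEM_OP (1u << 0)
#define DBG_FLAG_SINGLE_ALU_OP (1u << 1)

/*
 * Replace the process debug flags and report the old value in *prev.
 * Devices that only support a device-global single memory op setting
 * reject a per-process request for it and leave the flags unchanged.
 */
static int set_debug_flags(uint32_t *flags, uint32_t new_flags,
			   bool global_only, uint32_t *prev)
{
	*prev = *flags;
	if (global_only && (new_flags & DBG_FLAG_SINGLE_MEM_OP))
		return -1;
	*flags = new_flags;
	return 0;
}
```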

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 6 May 2022 18:58:55 +0000 (14:58 -0400)]
drm/amdkfd: add debug set and clear address watch points operation

Shader read, write and atomic memory operations can be reported to the
debugger as address watch exceptions.

Allow the debugger to set a watch point on a particular memory
address per device.

Note that, to date, there are only 4 watch points per device, so have
the KFD keep track of which watch points are allocated.
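Tracking a small fixed pool of watch points fits naturally in a bitmap. The sketch below is a hypothetical model (struct watch_state, watch_point_alloc() and watch_point_free() are illustrative names, not KFD functions), using the 4-per-device limit stated above.

```c
#include <stdint.h>

#define MAX_WATCH_POINTS 4 /* per device, per the commit message */

/* Hypothetical per-device state: bit i set means watch point i is in use. */
struct watch_state {
	uint32_t alloc_mask;
};

/* Claim the lowest free watch point; returns its ID or -1 if all are taken. */
static int watch_point_alloc(struct watch_state *ws)
{
	for (int id = 0; id < MAX_WATCH_POINTS; id++) {
		if (!(ws->alloc_mask & (1u << id))) {
			ws->alloc_mask |= 1u << id;
			return id;
		}
	}
	return -1;
}

/* Release a watch point; returns -1 if the ID was not allocated. */
static int watch_point_free(struct watch_state *ws, int id)
{
	if (id < 0 || id >= MAX_WATCH_POINTS || !(ws->alloc_mask & (1u << id)))
		return -1;
	ws->alloc_mask &= ~(1u << id);
	return 0;
}
```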

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 5 May 2022 20:15:37 +0000 (16:15 -0400)]
drm/amdkfd: add debug suspend and resume process queues operation

In order to inspect waves from the saved context at any point during a
debug session, the debugger must be able to preempt queues to trigger
context save by suspending them.

On queue suspend, the KFD will copy the context save header information
so that the debugger can correctly crawl the appropriate size of the saved
context. The debugger must then also be allowed to resume suspended queues.

A queue that is newly created cannot be suspended because queue IDs are
recycled after destruction, so the debugger needs to know that this has
occurred.  Query functions that clear a given queue's new queue status
will be added later.

A queue cannot be destroyed while it is suspended, to preserve its saved
context during debugger inspection.  Have queue destruction block while
a queue is suspended and unblock when it is resumed.  Likewise, if a
queue is about to be destroyed, it cannot be suspended.

Return the number of queues successfully suspended or resumed along with
a per-queue status array where the upper bits of each queue status show
that the request was invalid (new/destroyed queue suspend request, missing
queue) or that an error occurred (HWS in a fatal state so it can't suspend
or resume queues).
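The per-queue status reporting can be modelled as tagging the upper bits of each requested queue ID. The bit positions, enum q_state and suspend_queues() below are assumptions for illustration only; the real uAPI defines its own encoding.

```c
#include <stdint.h>

/* Hypothetical status bits; the real KFD uAPI defines its own values. */
#define QUEUE_STATUS_INVALID (1u << 31) /* new/destroyed or missing queue */
#define QUEUE_STATUS_ERROR   (1u << 30) /* e.g. HWS in a fatal state */
#define QUEUE_STATUS_FLAGS   (QUEUE_STATUS_INVALID | QUEUE_STATUS_ERROR)

enum q_state { Q_OK, Q_NEW, Q_MISSING, Q_HWS_FATAL };

/*
 * Walk the requested queue IDs, tag the upper bits of each entry that
 * could not be suspended, and return how many were suspended.
 */
static int suspend_queues(uint32_t *queue_ids, const enum q_state *state,
			  int n)
{
	int suspended = 0;

	for (int i = 0; i < n; i++) {
		switch (state[i]) {
		case Q_OK:
			suspended++;
			break;
		case Q_NEW:
		case Q_MISSING:
			queue_ids[i] |= QUEUE_STATUS_INVALID;
			break;
		case Q_HWS_FATAL:
			queue_ids[i] |= QUEUE_STATUS_ERROR;
			break;
		}
	}
	return suspended;
}
```

Masking with ~QUEUE_STATUS_FLAGS recovers the original queue ID from a tagged entry.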

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 2 May 2022 15:45:05 +0000 (11:45 -0400)]
drm/amdkfd: add debug wave launch mode operation

Allow the debugger to set the wave launch behaviour to one of: normal
operation, halt at launch, trap on every instruction, terminate
immediately or stall on allocation.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 27 Apr 2022 17:18:10 +0000 (13:18 -0400)]
drm/amdkfd: add debug wave launch override operation

This operation allows the debugger to override the enabled HW
exceptions on the device.

On debug devices that only support the debugging of a single process,
the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK
register.
Because they are global, only address watch exceptions are allowed to
be enabled.  In other words, all non-address watch exception states must
be preserved for normal mode operation by barring full replacement
override requests and non-address watch override requests.

For multi-process debugging, all HW exception overrides are per-VMID so
all exceptions can be overridden or fully replaced.

In order for the debugger to know what is permissible, return the
supported override mask to the debugger along with the previously
enabled overrides.
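The permission rule can be sketched as follows. The exception bits, enum override_mode and set_override() are hypothetical: single-process devices only accept an OR-style override of the address watch bit, per-VMID devices accept anything, and both the supported mask and the previous overrides are reported back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception override bits. */
#define EXCP_ADDR_WATCH (1u << 0)
#define EXCP_FP_INVALID (1u << 1)
#define EXCP_FP_DIV0    (1u << 2)
#define EXCP_ALL        (EXCP_ADDR_WATCH | EXCP_FP_INVALID | EXCP_FP_DIV0)

enum override_mode { OVERRIDE_OR, OVERRIDE_REPLACE };

/* Device-global (single-process) HW may only override address watch. */
static uint32_t supported_override_mask(bool per_vmid)
{
	return per_vmid ? EXCP_ALL : EXCP_ADDR_WATCH;
}

/*
 * Apply an override request. Always reports the supported mask and the
 * previous overrides; rejects unsupported bits, and full replacement on
 * single-process devices.
 */
static int set_override(uint32_t *current, bool per_vmid,
			enum override_mode mode, uint32_t request,
			uint32_t *prev, uint32_t *supported)
{
	uint32_t ok = supported_override_mask(per_vmid);

	*supported = ok;
	*prev = *current;
	if ((request & ~ok) || (!per_vmid && mode == OVERRIDE_REPLACE))
		return -1;
	*current = mode == OVERRIDE_REPLACE ? request : (*current | request);
	return 0;
}
```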

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 27 Apr 2022 14:24:37 +0000 (10:24 -0400)]
drm/amdkfd: add debug set exceptions enabled operation

The debugger subscribes to notification for requested exceptions on attach.
Allow the debugger to change its subscription later on.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 22 Apr 2022 16:26:18 +0000 (12:26 -0400)]
drm/amdkfd: update process interrupt handling for debug events

The debugger must be notified of any debugger-subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint.  Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.
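The marker-based drain can be pictured with a small model: while the drain is open, stale debug interrupts are discarded, and the injected custom IV acts as a fence that closes it. struct iv, the source IDs and process_iv() are hypothetical; the real IH ring format and KFD processing are considerably more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IV layout and source IDs; not the real IH format. */
struct iv {
	uint32_t source;
};
#define IV_SRC_DEBUG  1u
#define IV_SRC_MARKER 0xffu /* the injected custom IV */

struct drain_state {
	bool draining;
	int dropped;   /* stale debug interrupts discarded while draining */
	int delivered; /* debug interrupts forwarded to the debugger */
};

/* Process one IV; debug events are dropped until the marker is seen. */
static void process_iv(struct drain_state *s, const struct iv *iv)
{
	if (iv->source == IV_SRC_MARKER) {
		/* Everything queued before the marker has been consumed. */
		s->draining = false;
		return;
	}
	if (iv->source != IV_SRC_DEBUG)
		return;
	if (s->draining)
		s->dropped++;
	else
		s->delivered++;
}
```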

The drain must also be performed on debug disable, as SW interrupts may
still be processed.  Drain at this time and clear all the exception status.

The debugger may also be neither attached nor subscribed to certain
exceptions, so forward those directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c.  This is because the IVs from SQ interrupts are
packed into a new contiguous format, unlike GFX9. To make this clear,
a separate interrupt handling code file was created.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 11 Apr 2023 03:25:52 +0000 (11:25 +0800)]
drm/amd/pm: update SMU13 header files for coming OD support

Correct the data structures for OD feature support.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Tue, 2 Mar 2021 00:34:39 +0000 (18:34 -0600)]
drm/amdkfd: add debug trap enabled flag to tma

Trap handler behavior will differ when a debugger is attached.

Make the debug trap flag available in the trap handler TMA.
Update it when the debug trap ioctl is invoked.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 8 Apr 2022 17:12:24 +0000 (13:12 -0400)]
drm/amdkfd: add runtime enable operation

The debugger can attach to a process prior to HSA enablement (i.e. the
inferior is spawned by the debugger and attached to immediately, before
the target process has been enabled for HSA dispatches) or it
can attach to a running target that is already HSA enabled.  Either
way, the debugger needs to know the enablement status to know when
it can inspect queues.

For the scenario where the debugger spawns the target process,
it will have to wait for ROCr's runtime enable request from the target.
The runtime enable request will be able to see that its process has been
debug attached.  ROCr raises an EC_PROCESS_RUNTIME signal to the
debugger, then blocks the target process while waiting for the debugger's
response. Once the debugger has received the runtime signal, it will
unblock the target process.

For the scenario where the debugger attaches to a running target
process, ROCr will set the target process' runtime status as enabled so
that on an attach request, the debugger will be able to see this
status and will continue with debug enablement as normal.

A secondary requirement is to conditionally enable the trap temporaries only
if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger
attaches with HSA runtime enabled.  This is because setting up the trap
temporaries incurs a performance overhead that is unacceptable for
microbench performance in normal mode for certain customers.

In the scenario where the debugger spawns the target process, when ROCr
detects that the debugger has attached during the runtime enable
request, it will enable the trap temporaries before it blocks the target
process while waiting for the debugger to respond.

In the scenario where the debugger attaches to a running target process,
the debugger will enable the trap temporaries itself.

Finally, there is an additional restriction that is required to be
enforced with runtime enable and HW debug mode setting. The debugger must
first ensure that HW debug mode has been enabled before permitting HW debug
mode operations.

With single process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that debug HW mode setting can
occur before the KFD has reserved the debug VMID (0xf) from the hardware
scheduler's VMID allocation resource pool.  This can result in the
hardware scheduler assigning VMID 0xf to a non-debugged process and
having that process inherit debug HW mode settings intended for the
debugged target process instead, which is both incorrect and potentially
fatal for normal mode operation.

With multi process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that non-debugged processes
migrating to a new VMID could inherit unintended debug settings.

All debug operations that touch HW settings must require trap activation
where trap activation is triggered by both debug attach and runtime
enablement (target has KFD opened and is ready to dispatch work).
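The gating rule reduces to a simple predicate: HW debug operations are only allowed once both debug attach and runtime enablement have happened. struct dbg_proc, trap_activated() and hw_debug_op() are hypothetical names for this sketch, not KFD functions.

```c
#include <stdbool.h>

/* Hypothetical per-process debug state. */
struct dbg_proc {
	bool debug_attached;  /* the debugger has attached */
	bool runtime_enabled; /* target has KFD opened and can dispatch */
};

/* Trap activation requires both conditions. */
static bool trap_activated(const struct dbg_proc *p)
{
	return p->debug_attached && p->runtime_enabled;
}

/* Gate for any debug operation that touches HW debug settings. */
static int hw_debug_op(const struct dbg_proc *p)
{
	if (!trap_activated(p))
		return -1; /* settings could leak to non-debugged processes */
	/* ... program the HW debug mode here ... */
	return 0;
}
```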

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 8 Apr 2022 16:49:48 +0000 (12:49 -0400)]
drm/amdkfd: add send exception operation

Add a debug operation that allows the debugger to send an exception
directly to runtime through a payload address.

For memory violations, the normal vmfault signals will be applied to
notify the runtime instead, after the debugger passes back the exception
data that was saved when the memory violation was raised to it.

For runtime exceptions, this will unblock the runtime enable
function, which will be explained and implemented in a follow-up
patch.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 6 Apr 2022 16:03:31 +0000 (12:03 -0400)]
drm/amdkfd: add raise exception event function

Exception events can be generated from interrupts or queue activity.

The raise event function will save the exception status of a queue, device
or process, then notify the debugger of the status change by writing to
a debugger-polled file descriptor that the debugger provides during
debug attach.

For memory violation exceptions, extra exception data will be saved.

The debugger will be able to query the saved exception states through
query operations that will be provided by follow-up patches.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 1 Sep 2022 15:27:15 +0000 (11:27 -0400)]
drm/amdkfd: apply trap workaround for gfx11

Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. a user CU mask that requests only the
2nd shader array as an available resource leaves zero HW resources
available), nor can the runtime be alerted of any of these changes during
execution.

Make user CU masking and debugging mutually exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.
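The zero-resource example above can be made concrete with a hypothetical CU mask layout: for each shader engine, cus_per_sa bits for shader array 0 followed by cus_per_sa bits for shader array 1. Both the layout and the helper names are assumptions, and this only illustrates why a user mask can become unusable; it is not the attach policy itself.

```c
#include <stdint.h>

/*
 * Mask keeping only shader array 0 of every shader engine, under the
 * assumed layout [SE0.SA0][SE0.SA1][SE1.SA0][SE1.SA1]...
 * Assumes num_se * 2 * cus_per_sa <= 64 and cus_per_sa < 64.
 */
static uint64_t first_sa_only_mask(int num_se, int cus_per_sa)
{
	uint64_t sa0 = (1ull << cus_per_sa) - 1;
	uint64_t mask = 0;

	for (int se = 0; se < num_se; se++)
		mask |= sa0 << (se * 2 * cus_per_sa);
	return mask;
}

/* Nonzero if a user CU mask still leaves some CU usable while debugging. */
static int user_cu_mask_usable(uint64_t user_mask, int num_se, int cus_per_sa)
{
	return (user_mask & first_sa_only_mask(num_se, cus_per_sa)) != 0;
}
```

With 2 shader engines and 4 CUs per array, the debug mask is 0xf0f, so a user mask of 0xf0 (only the 2nd array of SE0) leaves nothing to run on.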

In addition, like any other multi-process debug supported device,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Tue, 5 Apr 2022 16:34:55 +0000 (12:34 -0400)]
drm/amdkfd: add per process hw trap enable and disable functions

To enable HW debug mode per process, all devices must be debug enabled
successfully.  If a failure occurs, rewind the enablement of debug mode
on the enabled devices.
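The all-or-nothing enablement with rewind follows the usual kernel goto-unwind idiom. The sketch below is a hypothetical model: struct dev, its enable_fails field (used to simulate a failure) and the helper names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical device handle; enable_fails simulates a per-device failure. */
struct dev {
	bool debug_enabled;
	bool enable_fails;
};

static int dev_debug_enable(struct dev *d)
{
	if (d->enable_fails)
		return -1;
	d->debug_enabled = true;
	return 0;
}

static void dev_debug_disable(struct dev *d)
{
	d->debug_enabled = false;
}

/* Enable every device; on the first failure, undo those already enabled. */
static int process_debug_enable(struct dev *devs, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (dev_debug_enable(&devs[i]))
			goto unwind;
	return 0;

unwind:
	while (--i >= 0)
		dev_debug_disable(&devs[i]);
	return -1;
}
```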

A power management scenario that needs to be considered is HW
debug mode setting during GFXOFF.  During GFXOFF, these registers
will be unreachable so we have to transiently disable GFXOFF when
setting.  Also, some devices don't support the RLC save restore
function for these debug registers so we have to disable GFXOFF
completely during a debug session.

Cooperative launch also has debugging restrictions based on HW/FW bugs.
If such bugs exist, the debugger cannot attach to a process that uses GWS
resources, nor can GWS resources be requested if a process is being
debugged.

Multi-process debug devices can only enable trap temporaries based
on certain runtime scenarios, which will be explained when the
runtime enable functions are implemented in a follow-up patch.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Sat, 27 Aug 2022 02:04:15 +0000 (22:04 -0400)]
drm/amdgpu: expose debug api for mes

Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.

The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore.  The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 4 Apr 2022 17:38:11 +0000 (13:38 -0400)]
drm/amdgpu: prepare map process for multi-process debug devices

Unlike single process debug devices, multi-process debug devices allow
debug mode setting per-VMID (non-device-global).

Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows
the KFD to forward the required SPI debug register write requests.

To request a new debug mode setting change, the KFD must be able to
preempt all queues then remap all queues with these new setting
requests for MAP_PROCESS to take effect.

Note that by default, trap enablement in non-debug mode must be disabled
for performance reasons for multi-process debug devices due to setup
overhead in FW.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 4 Apr 2022 16:27:43 +0000 (12:27 -0400)]
drm/amdkfd: prepare map process for single process debug devices

Older HW only supports debugging on a single process because the
SPI debug mode setting registers are device global.

The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS
for debug purposes. To pin the VMID, the KFD will remove the VMID from
the HWS dynamic VMID allocation via SET_RESOURCES so that a debugged
process will never migrate away from its pinned VMID.

The KFD is responsible for reserving and releasing this pinned VMID
accordingly whenever the debugger attaches and detaches respectively.
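Pinning the debug VMID amounts to clearing its bit from the mask of VMIDs the HWS may allocate dynamically, and setting it again on detach. The mask value used here (0xff00, i.e. VMIDs 8 to 15) and the helper names are assumptions for illustration, not the actual KFD code.

```c
#include <stdint.h>

#define DEBUG_VMID 0xf /* pinned debug VMID named in the commit message */

/* Remove the debug VMID from the HWS dynamic allocation mask (on attach). */
static uint32_t reserve_debug_vmid(uint32_t hws_vmid_mask)
{
	return hws_vmid_mask & ~(1u << DEBUG_VMID);
}

/* Hand the debug VMID back to the HWS pool (on detach). */
static uint32_t release_debug_vmid(uint32_t hws_vmid_mask)
{
	return hws_vmid_mask | (1u << DEBUG_VMID);
}
```

The resulting mask would then be passed to the HWS, which in this model can no longer assign VMID 0xf to other processes.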

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 23 Mar 2023 21:17:20 +0000 (17:17 -0400)]
drm/amdgpu: add configurable grace period for unmap queues

The HWS allows a grace period for wave completion prior to
preemption for better performance by avoiding CWSR on waves that can
potentially complete quickly. The debugger, on the other hand, will
want to inspect wave status immediately after it actively triggers
preemption (a suspend function to be provided).

To minimize latency between preemption and debugger wave inspection, allow
immediate preemption by setting the grace period to 0.

Note that setting the preemption grace period to 0 will result in an
infinite grace period being set due to a CP FW bug, so set it to 1 for now.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Sat, 27 Aug 2022 02:35:50 +0000 (22:35 -0400)]
drm/amdgpu: add gfx11 hw debug mode enable and disable calls

Implement the per-device calls to enable or disable HW debug mode
for GFX11.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 1 Apr 2022 17:31:57 +0000 (13:31 -0400)]
drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

GFX9.4.2 now supports per-VMID debug mode control registers
(SPI_GDBG_PER_VMID_CNTL).

Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will
forward all debug mode setting register writes to the HWS scheduler
using a new MAP_PROCESS API, so instead of writing to registers, return
the required register values that the HWS needs to write on debug enable
and disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 31 Mar 2022 17:14:01 +0000 (13:14 -0400)]
drm/amdgpu: add gfx10 hw debug mode enable and disable calls

Similar to GFX9 debug devices, set the hardware debug mode by draining
the SPI appropriately prior to the mode setting request.

Because GFX10 has waves allocated by the work group boundary and each
SE's SPI instances do not communicate, the SPI drain time is much longer.
This long drain time will be fixed for GFX11 onwards.

Also remove a bunch of deprecated misplaced references for GFX10.3.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 24 Mar 2023 20:19:27 +0000 (16:19 -0400)]
drm/amdkfd: fix kfd_suspend_all_processes

Flush delayed restore work in kfd_suspend_all_queues instead of
cancelling. Cancelling the work before it runs results in the queues
becoming permanently disabled. Flushing the work ensures that the
queue suspend/resume state stays balanced.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 30 Mar 2022 19:31:00 +0000 (15:31 -0400)]
drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the device global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 30 Mar 2022 19:09:11 +0000 (15:09 -0400)]
drm/amdgpu: add gfx9 hw debug mode enable and disable calls

Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequences,
provided as follow-on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regard to debug mode setting transaction requests and wave inheritance
of that mode is upheld.

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Li [Wed, 31 May 2023 02:08:11 +0000 (10:08 +0800)]
drm/amdkfd: clean up one inconsistent indenting

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 31 May 2023 03:33:27 +0000 (09:03 +0530)]
drm/amd/display: Drop unused DCN_BASE variable in dcn314_resource.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn314/dcn314_resource.c:128:29: warning: ‘DCN_BASE’ defined but not used [-Wunused-const-variable=]
  128 | static const struct IP_BASE DCN_BASE = { { { { 0x00000012, 0x000000C0, 0x000034C0, 0x00009000, 0x02403C00, 0, 0, 0 } },
      |                             ^~~~~~~~

Suggested-by: Roman Li <Roman.Li@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Tue, 30 May 2023 17:44:30 +0000 (12:44 -0500)]
drm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder during suspend path

Users have reported that s2idle wasn't working on OEM Phoenix systems,
but the root cause was that `CONFIG_AMD_PMC` wasn't set in
the distribution kernel config.

To make this more apparent, raise the messaging to err instead of warn.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217497
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 31 Mar 2022 16:05:00 +0000 (12:05 -0400)]
drm/amdgpu: setup hw debug registers on driver initialization

Add missing debug trap register references and initialize all debug
registers on boot by clearing the hardware exception overrides and the
wave allocation ID index.

The debugger requires that TTMPs 6 & 7 save the dispatch ID to map
waves onto dispatch during compute context inspection.
In order to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initialized.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 30 Mar 2022 18:54:16 +0000 (14:54 -0400)]
drm/amdgpu: add kgd hw debug mode setting interface

Introduce the required KGD debug calls that will execute hardware debug
mode setting.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 25 Mar 2022 18:55:30 +0000 (14:55 -0400)]
drm/amdkfd: prepare per-process debug enable and disable

The ROCm debugger attaches to the process to be debugged via ptrace and
expects the KFD to prepare a process for the target PID, whether or not
the target PID has opened the KFD device.

This patch explicitly handles this requirement.  Further HW mode
setting and runtime coordination requirements will be handled in
following patches.

In the case where the target process has not opened the KFD device,
a new KFD process must be created for the target PID.
In this case neither the debugger nor the target process will have
acquired any VMs, so handle process restoration to correctly account
for this.

To coordinate with HSA runtime, the debugger must be aware of the target
process' runtime enablement status and will copy the runtime status
information into the debugged KFD process for later query.

On enablement, the debugger subscribes to a set of exceptions; each
exception event notifies the debugger through a pollable FIFO file
descriptor that the debugger provides for the KFD to manage.

Finally, on termination of either the debugger or the target process,
debugging must be disabled if it has not been already.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: display debug capabilities
Jonathan Kim [Fri, 25 Mar 2022 16:39:06 +0000 (12:39 -0400)]
drm/amdkfd: display debug capabilities

Expose debug capabilities in the KFD topology node's HSA capabilities and
debug properties flags.

Ensure correct capabilities are exposed based on firmware support.

Flag definitions can be referenced in uapi/linux/kfd_sysfs.h.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: add debug and runtime enable interface
Jonathan Kim [Wed, 2 Mar 2022 19:30:12 +0000 (14:30 -0500)]
drm/amdkfd: add debug and runtime enable interface

Introduce the GPU debug operations interface.

For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the necessary interface to allow the debugger
to set the HW debug mode and to query exceptions per HSA queue, process
or device.

The runtime_enable interface coordinates exception handling with the
HSA runtime.

Usage is documented in the kernel-doc comments in uapi/linux/kfd_ioctl.h.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agoamd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flag
Alex Deucher [Fri, 2 Jun 2023 16:58:05 +0000 (12:58 -0400)]
amd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flag

It was left over from GC 9.4.3 bring-up and is currently
unused.  Drop it for now.

Cc: Philip.Yang@amd.com
Cc: rajneesh.bhardwaj@amd.com
Cc: Felix.Kuehling@amd.com
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs
Evan Quan [Thu, 6 Apr 2023 04:08:21 +0000 (12:08 +0800)]
drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs

Disable PCIe lane switching for some sienna_cichlid SKUs since it
may not work well on some platforms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/pm: Fix power context allocation in SMU13
Lijo Lazar [Fri, 31 Mar 2023 11:00:01 +0000 (16:30 +0530)]
drm/amd/pm: Fix power context allocation in SMU13

Use the right data structure for allocation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/pm: add unique serial number support for smu_v13_0_6
Yang Wang [Wed, 24 May 2023 05:54:26 +0000 (13:54 +0800)]
drm/amd/pm: add unique serial number support for smu_v13_0_6

Add unique serial number support for smu_v13_0_6
(the aid0 serial number is used by default).

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/pm: Fix SMUv13.0.6 throttle status report
Lijo Lazar [Fri, 31 Mar 2023 11:04:15 +0000 (16:34 +0530)]
drm/amd/pm: Fix SMUv13.0.6 throttle status report

Add a throttle status indicator to the SMUv13 power context.

v2: Removed dummy definition

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/pm: Update SMUv13.0.6 PMFW headers
Lijo Lazar [Mon, 3 Apr 2023 06:08:17 +0000 (11:38 +0530)]
drm/amd/pm: Update SMUv13.0.6 PMFW headers

Update PMFW interface headers for the new metrics table format and
throttling information.

v2: Added dummy definition to fix a compilation error

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
Horatio Zhang [Mon, 29 May 2023 18:23:37 +0000 (14:23 -0400)]
drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram

Use amdgpu_bo_vm_destroy to release the resources of the shadow BO.
During amdgpu_mes_self_test, the shadow BO was released but its
vmbo->shadow_list entry was not removed, which caused a NULL pointer
dereference in amdgpu_device_recover_vram on GPU reset.

Fixes: 6c032c37ac3e ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)")
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add function parameter 'event' to kdoc in svm_range_evict()
Srinivasan Shanmugam [Wed, 31 May 2023 18:03:35 +0000 (23:33 +0530)]
drm/amdgpu: Add function parameter 'event' to kdoc in svm_range_evict()

Fixes the following gcc warning with W=1:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1841: warning: Function parameter or member 'event' not described in 'svm_range_evict'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up kdoc in amdgpu_device.c
Srinivasan Shanmugam [Thu, 25 May 2023 17:26:17 +0000 (22:56 +0530)]
drm/amdgpu: Fix up kdoc in amdgpu_device.c

Fix these warnings by deleting the stale parameter descriptions.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg64'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg64'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up kdoc 'ring' parameter in sdma_v6_0_ring_pad_ib
Srinivasan Shanmugam [Tue, 30 May 2023 18:42:22 +0000 (00:12 +0530)]
drm/amdgpu: Fix up kdoc 'ring' parameter in sdma_v6_0_ring_pad_ib

Fix this warning by adding the 'ring' parameter to the kdoc.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1128: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Fix up kdoc formatting in display_mode_vba.c
Srinivasan Shanmugam [Tue, 30 May 2023 19:04:10 +0000 (00:34 +0530)]
drm/amd/display: Fix up kdoc formatting in display_mode_vba.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_vba.c:936: warning: Cannot understand  * *************************************************************************

Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: introduce DRM_AMDGPU_WERROR
Hamza Mahfooz [Wed, 24 May 2023 18:59:32 +0000 (14:59 -0400)]
drm/amd/amdgpu: introduce DRM_AMDGPU_WERROR

We want to do -Werror builds on our CI. However, non-amdgpu breakages
have prevented us from doing so thus far. Also, a number of additional
checks that the community cares about, and that we should enable, are
hidden behind -Wextra. So, define DRM_AMDGPU_WERROR to enable -Werror
for the amdgpu kernel module only, and enable -Wextra while disabling
the checks that are too noisy.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Kenny Ho <kenny.ho@amd.com>
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Kenny Ho <Kenny.Ho@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: remove unused sq_int_priv variable
Tom Rix [Thu, 30 Mar 2023 15:20:40 +0000 (11:20 -0400)]
drm/amdkfd: remove unused sq_int_priv variable

clang with W=1 reports
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v11.c:282:38: error: variable
  'sq_int_priv' set but not used [-Werror,-Wunused-but-set-variable]
        uint8_t sq_int_enc, sq_int_errtype, sq_int_priv;
                                            ^
This variable is set but never used, so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd: Disallow s0ix without BIOS support again
Mario Limonciello [Tue, 30 May 2023 16:57:59 +0000 (11:57 -0500)]
drm/amd: Disallow s0ix without BIOS support again

commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") showed
improvements to power consumption over suspend when s0ix wasn't enabled in
BIOS and the system didn't support S3.

However, this patch was misguided: the system didn't support S3
because SMT was disabled in the OEM BIOS setup, which prevented the
BIOS from allowing S3.

Allowing GPUs to use the s2idle path also causes problems on systems
whose platform firmware may not support s2idle. `systemd` has a
tendency to try `s2idle` if `deep` fails for any reason, which could
lead to unexpected flows.

The original commit also fixed a problem during resume from suspend to idle
without hardware support, but this is no longer necessary with commit
ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven").

Revert commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
to make it match the expected behavior again.

Cc: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Correct kdoc formatting for DCN32_CRB_SEGMENT_SIZE_KB in dcn32_hubbub.c
Srinivasan Shanmugam [Wed, 31 May 2023 03:52:36 +0000 (09:22 +0530)]
drm/amd/display: Correct kdoc formatting for DCN32_CRB_SEGMENT_SIZE_KB in dcn32_hubbub.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_hubbub.c:45: warning: Cannot understand  * @DCN32_CRB_SEGMENT_SIZE_KB: Maximum Configurable Return Buffer size for
 on line 45 - I thought it was a doc line

Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Fix up missing 'dc' & 'pipe_ctx' kdoc parameters in delay_cursor_unt...
Srinivasan Shanmugam [Wed, 31 May 2023 05:11:59 +0000 (10:41 +0530)]
drm/amd/display: Fix up missing 'dc' & 'pipe_ctx' kdoc parameters in delay_cursor_until_vupdate()

Fixes the following gcc warnings with W=1:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1904: warning: Function parameter or member 'dc' not described in 'delay_cursor_until_vupdate'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1904: warning: Function parameter or member 'pipe_ctx' not described in 'delay_cursor_until_vupdate'

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Fix up kdoc formatting in dcn32_resource_helpers.c
Srinivasan Shanmugam [Wed, 31 May 2023 09:22:02 +0000 (14:52 +0530)]
drm/amd/display: Fix up kdoc formatting in dcn32_resource_helpers.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:97: warning: Cannot understand  * **************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:264: warning: Cannot understand  * *************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:435: warning: Cannot understand  * *************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:475: warning: Cannot understand  * *************************************************************************

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:599:
warning: Function parameter or member 'dc' not described in
'dcn32_can_support_mclk_switch_using_fw_based_vblank_stretch'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:599:
warning: Function parameter or member 'context' not described in
'dcn32_can_support_mclk_switch_using_fw_based_vblank_stretch'

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:587:
warning: Function parameter or member 'dc' not described in
'dcn32_can_support_mclk_switch_using_fw_based_vblank_stretch'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:587:
warning: Function parameter or member 'context' not described in
'dcn32_can_support_mclk_switch_using_fw_based_vblank_stretch'

Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdxcp: fix Makefile to build amdxcp module
Bob Zhou [Tue, 30 May 2023 06:48:02 +0000 (14:48 +0800)]
drm/amdxcp: fix Makefile to build amdxcp module

After the drm Makefile processes the amdgpu Makefile, amdgpu.ko has
already been created, so the "amdgpu-y +=" in the amdxcp Makefile isn't
used. Change amdgpu-y to amdxcp-y so that the amdxcp module is built.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up missing parameters kdoc in svm_migrate_vma_to_ram
Srinivasan Shanmugam [Mon, 29 May 2023 22:33:16 +0000 (04:03 +0530)]
drm/amdgpu: Fix up missing parameters kdoc in svm_migrate_vma_to_ram

Fix these warnings by adding the missing parameter descriptions and
deleting the stale 'adev' description.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'node' not described in 'svm_migrate_vma_to_ram'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'trigger' not described in 'svm_migrate_vma_to_ram'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'fault_page' not described in 'svm_migrate_vma_to_ram'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Excess function parameter 'adev' description in 'svm_migrate_vma_to_ram'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:771: warning: Function parameter or member 'fault_page' not described in 'svm_migrate_vram_to_ram'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: set finished fence error if job timedout
ZhenGuo Yin [Tue, 9 May 2023 09:42:11 +0000 (17:42 +0800)]
drm/amdgpu: set finished fence error if job timedout

Set the error of the job's finished fence to -ETIME if the job timed out.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix missing parameter desc for 'xcp_id' in amdgpu_amdkfd_reserve_mem_limit
Srinivasan Shanmugam [Mon, 29 May 2023 14:53:33 +0000 (20:23 +0530)]
drm/amdgpu: Fix missing parameter desc for 'xcp_id' in amdgpu_amdkfd_reserve_mem_limit

Fix this warning by adding the 'xcp_id' parameter description.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:160: warning: Function parameter or member 'xcp_id' not described in 'amdgpu_amdkfd_reserve_mem_limit'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up missing parameter in kdoc for 'inst' in gmc_ v7, v8, v9, v10,...
Srinivasan Shanmugam [Tue, 30 May 2023 09:13:14 +0000 (14:43 +0530)]
drm/amdgpu: Fix up missing parameter in kdoc for 'inst' in gmc_ v7, v8, v9, v10, v11.c

Fix these warnings by adding the 'inst' parameter to the kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:428: warning: Function parameter or member 'inst' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:626: warning: Function parameter or member 'inst' not described in 'gmc_v8_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:423: warning: Function parameter or member 'inst' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:328: warning: Function parameter or member 'inst' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:950: warning: Function parameter or member 'inst' not described in 'gmc_v9_0_flush_gpu_tlb_pasid'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up missing kdoc parameter 'inst' in get_wave_count() & kgd_gfx_v9_get...
Srinivasan Shanmugam [Tue, 30 May 2023 12:47:17 +0000 (18:17 +0530)]
drm/amdgpu: Fix up missing kdoc parameter 'inst' in get_wave_count() & kgd_gfx_v9_get_cu_occupancy()

Fix these warnings by adding the 'inst' parameter to the kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:692: warning: Function parameter or member 'inst' not described in 'get_wave_count'
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:763: warning: Function parameter or member 'inst' not described in 'kgd_gfx_v9_get_cu_occupancy'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix missing parameter desc for 'xcc_id' in gfx_v7_0.c & amdgpu_rlc.c
Srinivasan Shanmugam [Mon, 29 May 2023 13:59:35 +0000 (19:29 +0530)]
drm/amdgpu: Fix missing parameter desc for 'xcc_id' in gfx_v7_0.c & amdgpu_rlc.c

Fix these warnings by adding the 'xcc_id' parameter to the kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:1557: warning: Function parameter or member 'xcc_id' not described in 'gfx_v7_0_select_se_sh'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:38: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_enter_safe_mode'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:62: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_exit_safe_mode'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: flag added to handle errors from svm validate and map
Alex Sierra [Mon, 29 May 2023 21:01:37 +0000 (16:01 -0500)]
drm/amdkfd: flag added to handle errors from svm validate and map

This flag is set if an error is returned during validation and mapping
of a prange. This is rare, but it can happen when
`amdgpu_hmm_range_get_pages_done` returns true. In such cases, the
caller should retry, and it is important to ensure that the prange is
updated correctly during the retry.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize xcc mask
Lijo Lazar [Tue, 30 May 2023 06:22:45 +0000 (11:52 +0530)]
drm/amdgpu: Initialize xcc mask

For ASICs that are not initialized through IP discovery, initialize the
XCC mask for a single GFX cluster (mask = 1).

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Fix up kdoc formats in dcn32_fpu.c
Srinivasan Shanmugam [Sat, 27 May 2023 16:35:40 +0000 (22:05 +0530)]
drm/amd/display: Fix up kdoc formats in dcn32_fpu.c

Fixes the following gcc warnings with W=1:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2806: warning: Cannot understand  * *************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2855: warning: Cannot understand  * *************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2900: warning: Function parameter or member 'dc' not described in 'dcn32_assign_fpo_vactive_candidate'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2900: warning: Function parameter or member 'context' not described in 'dcn32_assign_fpo_vactive_candidate'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2900: warning: Function parameter or member 'fpo_candidate_stream' not described in 'dcn32_assign_fpo_vactive_candidate'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2929: warning: Function parameter or member 'dc' not described in 'dcn32_find_vactive_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2929: warning: Function parameter or member 'context' not described in 'dcn32_find_vactive_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2929: warning: Function parameter or member 'vactive_margin_req_us' not described in 'dcn32_find_vactive_pipe'

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Add missing kdoc entries in update_planes_and_stream_adapter
Srinivasan Shanmugam [Sat, 27 May 2023 14:15:52 +0000 (19:45 +0530)]
drm/amd/display: Add missing kdoc entries in update_planes_and_stream_adapter

Fixes the following gcc warnings with W=1:

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'dc' not described in 'update_planes_and_stream_adapter'
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'update_type' not described in 'update_planes_and_stream_adapter'
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'planes_count' not described in 'update_planes_and_stream_adapter'
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'stream' not described in 'update_planes_and_stream_adapter'
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'stream_update' not described in 'update_planes_and_stream_adapter'
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:374: warning: Function parameter or member 'array_of_surface_update' not described in 'update_planes_and_stream_adapter'

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix create_dmamap_sg_bo kdoc warnings
Srinivasan Shanmugam [Sat, 27 May 2023 16:54:53 +0000 (22:24 +0530)]
drm/amdgpu: Fix create_dmamap_sg_bo kdoc warnings

Fix the following gcc warning with W=1:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:292: warning: Cannot understand  * @create_dmamap_sg_bo: Creates a amdgpu_bo object to reflect information

Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Fix MEC pipe interrupt enablement
Lijo Lazar [Mon, 29 May 2023 13:48:54 +0000 (19:18 +0530)]
drm/amdkfd: Fix MEC pipe interrupt enablement

for_each_inst modifies xcc_mask, so with the XCC loop nested inside the
pipe loop the mask is exhausted after its first pass, and interrupts are
not initialized properly on all pipes. Loop through the XCCs in the
outer loop instead to fix this issue.

Fixes: c4050ff1a43e ("drm/amdkfd: Use xcc mask for identifying xcc")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Add new gfx_target_versions for GC 9.4.3
Graham Sider [Mon, 3 Apr 2023 19:31:53 +0000 (15:31 -0400)]
drm/amdkfd: Add new gfx_target_versions for GC 9.4.3

For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU
or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up missing kdoc in sdma_v6_0.c
Srinivasan Shanmugam [Thu, 25 May 2023 16:07:35 +0000 (21:37 +0530)]
drm/amdgpu: Fix up missing kdoc in sdma_v6_0.c

Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'job' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'flags' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:946: warning: Function parameter or member 'timeout' not described in 'sdma_v6_0_ring_test_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1125: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'vmid' not described in 'sdma_v6_0_ring_emit_vm_flush'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'pd_addr' not described in 'sdma_v6_0_ring_emit_vm_flush'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up kdoc in amdgpu_acpi.c
Srinivasan Shanmugam [Thu, 25 May 2023 18:00:59 +0000 (23:30 +0530)]
drm/amdgpu: Fix up kdoc in amdgpu_acpi.c

Fix these warnings by adding the missing 'numa_info' description and
deleting the stale 'nid' description.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Function parameter or member 'numa_info' not described in 'amdgpu_acpi_get_node_id'
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Excess function parameter 'nid' description in 'amdgpu_acpi_get_node_id'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix up kdoc in sdma_v4_4_2.c
Srinivasan Shanmugam [Thu, 25 May 2023 17:02:35 +0000 (22:32 +0530)]
drm/amdgpu: Fix up kdoc in sdma_v4_4_2.c

Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:426: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_gfx_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:457: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:470: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_page_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:506: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_ctx_switch_enable'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:794: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_resume'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:810: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_load_microcode'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:854: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_start'
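Each warning concerns inst_mask, a bitmask argument that selects which SDMA instances a function acts on. The sketch below is a minimal illustration, assuming a hypothetical demo_inst_stop() that is not part of amdgpu. It shows two things: a kernel-doc header in which every parameter has its own "@name:" line (a missing line is what W=1 reports), and a loop that visits only the instances whose bit is set in the mask.

```c
#include <stdint.h>

/**
 * demo_inst_stop - stop the instances selected by a bitmask (hypothetical)
 * @stopped: per-instance flags; set to 1 for each instance that is stopped
 * @num_inst: number of entries in @stopped
 * @inst_mask: bitmask of instances to act on (bit i selects instance i)
 *
 * Return: number of instances that were stopped.
 */
static int demo_inst_stop(int *stopped, unsigned int num_inst, uint32_t inst_mask)
{
	int count = 0;

	for (unsigned int i = 0; i < num_inst && i < 32; i++) {
		if (!(inst_mask & (UINT32_C(1) << i)))
			continue; /* instance i is not selected */
		stopped[i] = 1;
		count++;
	}
	return count;
}
```

For example, a mask of 0x5 selects instances 0 and 2 and leaves 1 and 3 untouched.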

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: enable tmz by default for GC 11.0.1
Ikshwaku Chauhan [Thu, 25 May 2023 05:27:26 +0000 (10:57 +0530)]
drm/amdgpu: enable tmz by default for GC 11.0.1

Add IP GC 11.0.1 to the list of targets that have
TMZ enabled by default.
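A per-IP default like this is commonly expressed as a lookup against a list of IP versions. The sketch below is a rough illustration, assuming a hypothetical DEMO_IP_VERSION() packing of major.minor.revision and a hypothetical allow-list. Only GC 11.0.1 comes from this commit; the entries that were already on the real list are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing of a major.minor.revision IP version into one integer. */
#define DEMO_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* Illustrative allow-list: only GC 11.0.1 is taken from this commit. */
static const uint32_t demo_tmz_default_on[] = {
	DEMO_IP_VERSION(11, 0, 1),
	/* ...versions already on the real list are omitted here... */
};

/* Returns true if TMZ should be enabled by default for this GC version. */
static bool demo_tmz_default_enabled(uint32_t gc_version)
{
	unsigned int n = sizeof(demo_tmz_default_on) / sizeof(demo_tmz_default_on[0]);

	for (unsigned int i = 0; i < n; i++)
		if (demo_tmz_default_on[i] == gc_version)
			return true;
	return false;
}
```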

Signed-off-by: Ikshwaku Chauhan <ikshwaku.chauhan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: fix gfx_target_version for certain 11.0.3 devices
Alex Deucher [Wed, 24 May 2023 18:30:12 +0000 (14:30 -0400)]
drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices

Certain boards with GC IP 11.0.3 need slightly different handling
in the shader compiler due to board-specific bounding box
optimizations.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: keep irq count in amdgpu_irq_disable_all
Guchun Chen [Thu, 25 May 2023 09:24:31 +0000 (17:24 +0800)]
drm/amdgpu: keep irq count in amdgpu_irq_disable_all

This cleans up all the irq warnings caused by unbalanced
amdgpu_irq_get/put calls when unplugging/unbinding the device, and
leaves the irq count decrement to each IP's fini function.
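The underlying idea is a reference-counted interrupt source. If a "disable all" step zeroes the count, every later put() from an IP block's fini path looks unbalanced. The sketch below is a simplified model, assuming hypothetical demo_irq_* helpers rather than the real amdgpu_irq structures. In it, disable_all only masks the hardware and leaves the count intact, so each fini can still drop its own reference cleanly.

```c
#include <stdbool.h>

/* Hypothetical refcounted interrupt source (not the amdgpu structures). */
struct demo_irq_src {
	int enabled_refs; /* number of outstanding get() calls */
	bool hw_enabled;  /* whether the interrupt is unmasked in hardware */
};

static void demo_irq_get(struct demo_irq_src *src)
{
	if (src->enabled_refs++ == 0)
		src->hw_enabled = true; /* first user unmasks */
}

/* Returns -1 on an unbalanced put, the case that triggers the warning. */
static int demo_irq_put(struct demo_irq_src *src)
{
	if (src->enabled_refs == 0)
		return -1;
	if (--src->enabled_refs == 0)
		src->hw_enabled = false; /* last user masks */
	return 0;
}

/* Masks the interrupt in hardware but keeps the reference count, so each
 * IP block's fini can still balance its earlier get() with a put(). */
static void demo_irq_disable_all(struct demo_irq_src *src)
{
	src->hw_enabled = false;
}
```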

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl()
Dan Carpenter [Thu, 25 May 2023 08:04:51 +0000 (11:04 +0300)]
drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl()

There are two bugs here.
1) Drop the lock if copy_from_user() fails.
2) If the copy fails then the correct error code is -EFAULT instead of
   -EINVAL.

I also broke up the long line and changed "sizeof rd->id" to
"sizeof(rd->id)".
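Both bugs follow a common kernel pattern: every early return taken while holding a lock must release it, and copy_from_user() returns the number of bytes it could not copy, which maps to -EFAULT. The sketch below is a userspace model, assuming a plain flag in place of a kernel mutex, a hypothetical demo_copy_from_user() that mimics the real helper's return convention (a NULL source simulates a faulting pointer), and a hypothetical demo_id payload.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a kernel mutex: only records whether it is held. */
struct demo_lock { int held; };

static void demo_lock_acquire(struct demo_lock *l) { l->held = 1; }
static void demo_lock_release(struct demo_lock *l) { l->held = 0; }

/* Hypothetical stand-in for copy_from_user(): returns the number of bytes
 * NOT copied (0 on success). A NULL source simulates a bad user pointer. */
static size_t demo_copy_from_user(void *dst, const void *src, size_t n)
{
	if (!src)
		return n;
	memcpy(dst, src, n);
	return 0;
}

struct demo_id { unsigned int v[4]; }; /* hypothetical payload */

static int demo_read_id(struct demo_lock *lock, struct demo_id *id,
			const void *user_arg)
{
	demo_lock_acquire(lock);
	if (demo_copy_from_user(id, user_arg, sizeof(*id))) {
		/* Fix 1: drop the lock on the error path as well. */
		demo_lock_release(lock);
		/* Fix 2: a failed copy means a bad address, so -EFAULT. */
		return -EFAULT;
	}
	demo_lock_release(lock);
	return 0;
}
```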

Fixes: 553f973a0d7b ("drm/amd/amdgpu: Update debugfs for XCC support (v3)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:52 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn201/dcn201_resource.c:899:43: warning: unused variable 'res_create_maximus_funcs'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: use amdxcp platform device as spatial partition
James Zhu [Tue, 25 Apr 2023 21:02:48 +0000 (17:02 -0400)]
drm/amdgpu: use amdxcp platform device as spatial partition

Use amdxcp platform device as spatial partition device.

-v2: remove unused variable

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:48 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn314/dcn314_resource.c:1390:43: warning: unused variable 'res_create_maximus_funcs'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdxcp: add platform device driver for amdxcp
James Zhu [Tue, 25 Apr 2023 21:00:50 +0000 (17:00 -0400)]
drm/amdxcp: add platform device driver for amdxcp

Add platform device driver for amdxcp to support
amdgpu spatial partition.

-v2: fix build warning

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Mark mmhub_v1_8_mmea_err_status_reg as __maybe_unused
Srinivasan Shanmugam [Fri, 19 May 2023 12:40:40 +0000 (18:10 +0530)]
drm/amdgpu: Mark mmhub_v1_8_mmea_err_status_reg as __maybe_unused

Silence the following compilation error:

drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c:704:23: error: variable 'mmhub_v1_8_mmea_err_status_reg' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static const uint32_t mmhub_v1_8_mmea_err_status_reg[] = {
                      ^
1 error generated.

Mark the variable as __maybe_unused to make it clear to clang that this
is expected, which silences the warning.
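The sketch below shows the annotation on a hypothetical register table. The __maybe_unused definition mirrors the one the kernel uses, an alias for the GCC/clang unused attribute. Clang's -Wunneeded-internal-declaration can fire when a static array is referenced only in an unevaluated context such as sizeof(), so the example uses the table only through an ARRAY_SIZE-style macro. The table contents and the demo_* names are illustrative assumptions.

```c
#include <stdint.h>

/* Mirrors the kernel's definition: an alias for the unused attribute. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register table; the attribute tells the compiler that
 * having no evaluated use of it is intentional. */
static const uint32_t demo_err_status_reg[] __maybe_unused = {
	0x10, 0x14, 0x18,
};

static unsigned int demo_num_err_regs(void)
{
	/* Only an unevaluated sizeof() use of the table. */
	return DEMO_ARRAY_SIZE(demo_err_status_reg);
}
```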

Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:51 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn321/dcn321_resource.c:1346:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn321/dcn321_resource.c:735:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: clean up some inconsistent indenting
Jiapeng Chong [Wed, 24 May 2023 08:57:09 +0000 (16:57 +0800)]
drm/amd/display: clean up some inconsistent indenting

No functional modification involved.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn321/dcn321_fpu.c:556 dcn321_update_bw_bounding_box_fpu() warn: inconsistent indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5304
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:50 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn316/dcn316_resource.c:1355:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn316/dcn316_resource.c:899:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: clean up some inconsistent indenting
Jiapeng Chong [Wed, 24 May 2023 08:57:08 +0000 (16:57 +0800)]
drm/amd/display: clean up some inconsistent indenting

No functional modification involved.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/dcn314_fpu.c:269 dcn314_update_bw_bounding_box_fpu() warn: inconsistent indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5305
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:49 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn315/dcn315_resource.c:1357:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn315/dcn315_resource.c:893:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:46 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn302/dcn302_resource.c:957:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn302/dcn302_resource.c:101:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:47 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn303/dcn303_resource.c:884:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn303/dcn303_resource.c:84:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add the accelerator pcie class
Shiwu Zhang [Mon, 22 May 2023 07:58:10 +0000 (15:58 +0800)]
drm/amdgpu: add the accelerator pcie class

v2: add the base class id for accelerator (lijo)
v3: add the new pci class in amdgpu tree (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: save/restore part of xcp drm_device fields
James Zhu [Tue, 25 Apr 2023 20:55:56 +0000 (16:55 -0400)]
drm/amdgpu: save/restore part of xcp drm_device fields

Redirect the xcp-allocated drm_device::rdev/pdev/driver to the
amdgpu pci_device/drm_device settings. They need to be saved
before the redirect and restored after the xcp drm_device is
unregistered.

-v2: fix warning discarded-qualifiers
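The save, redirect, and restore sequence can be sketched as below. This is a minimal model under stated assumptions: demo_drm_device and demo_saved_fields are hypothetical stand-ins holding only the three redirected pointers, not the real drm_device layout. The driver pointer is const in the model, and the saved copy keeps that qualifier. Dropping it when saving is the kind of mistake -Wdiscarded-qualifiers reports.

```c
/* Hypothetical stand-ins for the three redirected drm_device fields. */
struct demo_drm_device {
	void *dev;
	void *pdev;
	const void *driver; /* const, so the saved copy must be const too */
};

struct demo_saved_fields {
	void *dev;
	void *pdev;
	const void *driver;
};

/* Save the partition device's own fields, then point them at the parent's. */
static void demo_redirect(struct demo_drm_device *xcp,
			  struct demo_saved_fields *saved,
			  const struct demo_drm_device *parent)
{
	saved->dev = xcp->dev;
	saved->pdev = xcp->pdev;
	saved->driver = xcp->driver;
	xcp->dev = parent->dev;
	xcp->pdev = parent->pdev;
	xcp->driver = parent->driver;
}

/* After unregistering, put the original values back before freeing. */
static void demo_restore(struct demo_drm_device *xcp,
			 const struct demo_saved_fields *saved)
{
	xcp->dev = saved->dev;
	xcp->pdev = saved->pdev;
	xcp->driver = saved->driver;
}
```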

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: set the APU flag based on package type
Shiwu Zhang [Mon, 22 May 2023 09:11:59 +0000 (17:11 +0800)]
drm/amdgpu: set the APU flag based on package type

Currently APU and dGPU share the same pcie class, while gmc init
needs the APU flag to be set correctly for upcoming memory
allocations.

v2: calling get_pkg_type in smuio 13_0_3 is enough (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/jpeg: add init value for num_jpeg_rings
James Zhu [Wed, 24 May 2023 14:48:40 +0000 (10:48 -0400)]
drm/jpeg: add init value for num_jpeg_rings

Initialize the new num_jpeg_rings to 1 for JPEG.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Richard Liang <rliang1@amd.com>
Tested-by: Ying Li <ying.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: complement the 4, 6 and 8 XCC cases
Shiwu Zhang [Wed, 17 May 2023 05:40:04 +0000 (13:40 +0800)]
drm/amdgpu: complement the 4, 6 and 8 XCC cases

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: golden settings for ASIC rev_id 0
Shiwu Zhang [Tue, 16 May 2023 02:31:49 +0000 (10:31 +0800)]
drm/amdgpu: golden settings for ASIC rev_id 0

As suggested by the FW team, GB_ADDR_CONFIG is handled by golden
settings in the driver to get the expected value.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: bypass bios dependent operations
Shiwu Zhang [Wed, 17 May 2023 06:15:05 +0000 (14:15 +0800)]
drm/amdgpu: bypass bios dependent operations

Since reading the BIOS does not currently work, bypass all
BIOS-related operations.

v2: hardcode the vram info for APP_APU case (hawking)
v3: correct the vram_width with channel number * channel size (lijo)
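The v3 correction is plain arithmetic: the memory bus width in bits is the number of channels times the width of each channel. The one-line helper below is hypothetical and only restates that formula; it is not the driver's code.

```c
/* Hypothetical helper: total bus width in bits is the channel count
 * multiplied by the per-channel width in bits. */
static unsigned int demo_vram_width(unsigned int num_channels,
				    unsigned int channel_size_bits)
{
	return num_channels * channel_size_bits;
}
```

For example, 16 channels of 64 bits each give a 1024-bit bus. These numbers are illustrative, not the values the commit hardcodes.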

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Program gds backup address as zero if no gds allocated
Jiadong Zhu [Wed, 24 May 2023 08:51:32 +0000 (16:51 +0800)]
drm/amdgpu: Program gds backup address as zero if no gds allocated

It is a firmware requirement that gds_backup_addrlo and gds_backup_addrhi
of the DE meta both be zero if no gds partition is allocated for the frame.
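The rule can be sketched as a small helper that either splits a 64-bit backup address into its low and high 32-bit halves or zeroes both. This is a minimal model under stated assumptions: demo_de_meta contains only the two fields named above, and demo_set_gds_backup() with its gds_size parameter is a hypothetical helper, not the driver's function.

```c
#include <stdint.h>

/* Hypothetical subset of the DE metadata: only the two fields named above. */
struct demo_de_meta {
	uint32_t gds_backup_addrlo;
	uint32_t gds_backup_addrhi;
};

static void demo_set_gds_backup(struct demo_de_meta *de, uint64_t gds_addr,
				uint32_t gds_size)
{
	if (gds_size == 0) {
		/* No GDS partition for this frame: both halves must be zero. */
		de->gds_backup_addrlo = 0;
		de->gds_backup_addrhi = 0;
		return;
	}
	de->gds_backup_addrlo = (uint32_t)gds_addr;         /* low 32 bits */
	de->gds_backup_addrhi = (uint32_t)(gds_addr >> 32); /* high 32 bits */
}
```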

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
Jiadong Zhu [Wed, 24 May 2023 03:42:19 +0000 (11:42 +0800)]
drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled

When MEC executes unmap_queue for mid command buffer preemption, it
kicks the write pointer of the gfx ring, sets CP_VMID_PREEMPT to trigger
the preemption, and waits for CP_VMID_PREEMPT to become zero once the
preemption is done. There is a race condition in which PFP may execute
the resetting command before MEC sets CP_VMID_PREEMPT. As a result, a
hang occurs because CP_VMID_PREEMPT stays at 0xffff.

To avoid this, send the CP_VMID_PREEMPT resetting command after the
trailing fence is signaled, and update the gfx write pointer explicitly.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:45 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn301/dcn301_resource.c:1050:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn301/dcn301_resource.c:705:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: remove unused definition
Yang Li [Wed, 24 May 2023 03:59:44 +0000 (11:59 +0800)]
drm/amd/display: remove unused definition

Eliminate the following warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource.c:1360:43: warning: unused variable 'res_create_maximus_funcs'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource.c:737:38: warning: unused variable 'debug_defaults_diags'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5296
Fixes: 25879d7b4986 ("drm/amd/display: Clean FPGA code in dc")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>