Add svm_range_bo_unref_async to schedule work to wait for svm_bo
eviction work done and then free svm_bo. __do_munmap put_page
is atomic context, call svm_range_bo_unref_async to avoid warning
invalid wait context. Other non atomic context call svm_range_bo_unref.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huang Rui [Thu, 16 Dec 2021 18:35:27 +0000 (13:35 -0500)]
drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
The job embedded fence donesn't initialize the flags at
dma_fence_init(). Then we will go a wrong way in
amdgpu_fence_get_timeline_name callback and trigger a null pointer panic
once we enabled the trace event here. So introduce new amdgpu_fence
object to indicate the job embedded fence.
v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot)
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Mon, 13 Dec 2021 08:17:02 +0000 (16:17 +0800)]
drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
In the s0ix entry need retain gfx in the gfxoff state,so here need't
set gfx cgpg in the S0ix suspend-resume process. Moreover move the S0ix
check into SMU12 can simplify the code condition check.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang wangx [Tue, 14 Dec 2021 13:52:17 +0000 (21:52 +0800)]
drm/radeon: Fix syntax errors in comments
Delete the redundant word 'we'.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 14 Dec 2021 07:04:47 +0000 (15:04 +0800)]
drm/amd/pm: Skip power state allocation
Power states are not valid for arcturus and aldebaran, no need to
allocate memory.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 13 Dec 2021 06:38:38 +0000 (14:38 +0800)]
drm/amdgpu: correct the wrong cached state for GMC on PICASSO
Pair the operations did in GMC ->hw_init and ->hw_fini. That
can help to maintain correct cached state for GMC and avoid
unintention gate operation dropping due to wrong cached state.
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828 Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 13 Dec 2021 03:37:56 +0000 (11:37 +0800)]
drm/amdgpu: move smu_debug_mask to a more proper place
As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Mon, 13 Dec 2021 21:38:20 +0000 (21:38 +0000)]
drm/amdgpu: SRIOV flr_work should use down_write
Host initiated VF FLR may fail if someone else is
already holding a read_lock. Change from down_write_trylock
to down_write to guarantee the reset goes through.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Martin Leung [Fri, 10 Dec 2021 23:04:07 +0000 (15:04 -0800)]
drm/amd/display: implement dc_mode_memclk
why:
Need interface to lower clocks when in dc (power save)
mode. Must be able to work with p_state unsupported cases
Can cause flicker when OS notifies us of dc state change
how:
added dal3 interface for KMD
added pathway to query smu for this softmax
added blank before clock change to override underflow
added logic to change clk based on pstatesupport and softmax
added logic in prepare/optimize_bw to conform while changing
clocks
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Martin Leung <Martin.Leung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eric Bernstein [Fri, 10 Dec 2021 23:04:06 +0000 (15:04 -0800)]
drm/amd/display: ODM + MPO window on only one half of ODM
[Why]
For ODM + MPO window on one half of ODM, only 3 pipes should
be allocated and scaling parameters adjusted to handle this case
[How]
Fix pipe allocation when MPO viewport is only on one side of ODM
split, and modify scaling paramters.
Added diags test cases for ODM + windows MPO, where MPO window is
on right half, left half, and both halves or ODM.
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Eric Bernstein <eric.bernstein@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Fri, 10 Dec 2021 23:04:05 +0000 (15:04 -0800)]
drm/amd/display: Reset DMCUB before HW init
[Why]
If the firmware wasn't reset by PSP or HW and is currently running
then the firmware will hang or perform underfined behavior when we
modify its firmware state underneath it.
[How]
Reset DMCUB before setting up cache windows and performing HW init.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 10 Dec 2021 23:04:03 +0000 (15:04 -0800)]
drm/amd/display: Force det buf size to 192KB with 3+ streams and upscaling
[WHY]
This workaround resolves underflow caused by incorrect DST_Y_PREFETCH.
Overriding to 192KB DET buf size until the DST_Y_PREFETCH calc is fixed.
Reviewed-by: Eric Yang <Eric.Yang2@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mikita Lipski [Fri, 10 Dec 2021 23:04:02 +0000 (15:04 -0800)]
drm/amd/display: parse and check PSR SU caps
[why]
Adding a function to read PSR capabilities
and ALPM capabilities.
Also adding a helper function to validate if
the sink and the driver support PSR SU.
[how]
- isolated all PSR and ALPM reading calls to a separate funciton
- set all required PSR caps
- added a helper function to check if PSR SU is supported by sink
and the driver
Reviewed-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Mikita Lipski <mikita.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Solomon Chiu [Fri, 10 Dec 2021 23:04:01 +0000 (15:04 -0800)]
drm/amd/display: Add src/ext ID info for dummy service
[Why]
Current error log of dummy irq service doesn't have
src/ext ID info in the log.
[How]
Add src/ext ID in ack/set of dummy irq service.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 10 Dec 2021 23:04:00 +0000 (15:04 -0800)]
drm/amd/display: Add debugfs entry for ILR
[Why & How]
In order to know the intermediate link rates supported by the eDP
panel and test to select the optimized link rate to save power,
create a new debugfs entry "ilr_setting" for
setting ILR.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Mon, 13 Dec 2021 15:05:09 +0000 (09:05 -0600)]
drivers/amd/pm: drop statement to print FW version for smu_v13
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is
fetched from debugfs rather than in kernel logs on every boot.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Dec 2021 18:13:53 +0000 (12:13 -0600)]
drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC
This value does not get cached into adev->pm.fw_version during
startup for smu13 like it does for other SMU like smu12.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 17:46:51 +0000 (14:46 -0300)]
drm/amd/display: fix function scopes
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'get_highest_allowed_voltage_level'
[-Wmissing-prototypes]
742 | unsigned int get_highest_allowed_voltage_level(uint32_t chip_family, uint32_t hw_internal_rev, uint32_t pci_revision_id)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: no previous prototype for 'rv1_vbios_smu_send_msg_with_param'
[-Wmissing-prototypes]
102 | int rv1_vbios_smu_send_msg_with_param(struct clk_mgr_internal *clk_mgr, unsigned int msg_id, unsigned int param)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Rewrite function signatures to make them more readable.
2. Get rid of unused functions in order to remove 'defined but not
used' warnings.
Michel Dänzer [Thu, 9 Dec 2021 16:46:44 +0000 (17:46 +0100)]
drm/amd/display: Reduce stack size for dml31 UseMinimumDCFCLK
Use the struct display_mode_lib pointer instead of passing lots of large
arrays as parameters by value.
Addresses this warning (resulting in failure to build a RHEL debug kernel
with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘UseMinimumDCFCLK’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:7478:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
NOTE: AFAICT this function previously had no observable effect, since it
only modified parameters passed by value and doesn't return anything.
Now it may modify some values in struct display_mode_lib passed in by
reference.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Thu, 9 Dec 2021 16:46:43 +0000 (17:46 +0100)]
drm/amd/display: Reduce stack size for dml31_ModeSupportAndSystemConfigurationFull
Move code using the Pipe struct to a new helper function.
Works around[0] this warning (resulting in failure to build a RHEL debug
kernel with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘dml31_ModeSupportAndSystemConfigurationFull’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:5740:1: warning: the frame size of 2144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
The culprit seems to be the Pipe struct, so pull the relevant block out
into its own sub-function. (This is porting
commit a62427ef9b55 ("drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull")
from dml31 to dml21)
[0] AFAICT this doesn't actually reduce the total amount of stack which
can be used, just moves some of it from
dml31_ModeSupportAndSystemConfigurationFull to the new helper function,
so the former happens to no longer exceed the limit for a single
function.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Thu, 9 Dec 2021 15:47:21 +0000 (12:47 -0300)]
drm/amdgpu: remove unnecessary variables
This fixes the warnings below, and also drops the display_count
variable, as it's unused.
In function 'svm_range_map_to_gpu':
warning: variable 'bo_va' set but not used [-Wunused-but-set-variable]
1172 | struct amdgpu_bo_va bo_va;
| ^~~~~
...
In function 'dcn201_update_clocks':
warning: variable 'enter_display_off' set but not used [-Wunused-but-set-variable]
132 | bool enter_display_off = false;
| ^~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Drop display_count variable.
- As suggested by Felix Kuehling:
1. Remove block surrounding amdgpu_xgmi_same_hive.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 10 Dec 2021 15:49:52 +0000 (09:49 -0600)]
drm/amd: move variable to local scope
`edp_stream` is only used when backend is enabled on eDP, don't
declare the variable outside that scope.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 10 Dec 2021 15:45:23 +0000 (09:45 -0600)]
drm/amd: add some extra checks that is_dig_enabled is defined
There are a few places that this isn't checked that could potentially
be a NULL pointer access.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Dec 2021 02:17:14 +0000 (21:17 -0500)]
drm/amdgpu: Reduce SG bo memory usage for mGPUs
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map
to GPU, multiple GPUs use same system memory dma mapping address, they
can share the original mem->bo in attachment to reduce dma address array
memory usage.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Dec 2021 02:10:52 +0000 (21:10 -0500)]
drm/amdgpu: Detect if amdgpu in IOMMU direct map mode
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode,
set adev->ram_is_direct_mapped flag which will be used to optimize
memory usage for multi GPU mappings.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:30 +0000 (10:38 -0500)]
Documentation/gpu: Add amdgpu and dc glossary
In the DC driver, we have multiple acronyms that are not obvious most of
the time; the same idea is valid for amdgpu. This commit introduces a DC
and amdgpu glossary in order to make it easier to navigate through our
driver.
Changes since V3:
- Yann: Add new acronyms to amdgpu glossary
- Daniel: Add link between dc and amdgpu glossary
Changes since V2:
- Add MMHUB
Changes since V1:
- Yann: Divide glossary based on driver context.
- Alex: Make terms more consistent and update CPLIB
- Add new acronyms to the glossary
Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:29 +0000 (10:38 -0500)]
Documentation/gpu: Add basic overview of DC pipeline
This commit describes how DCN works by providing high-level diagrams
with an explanation of each component. In particular, it details the
Global Sync signals.
Change since V2:
- Add a comment about MMHUBBUB.
Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Display core provides a feature that makes it easy for users to debug
Multiple planes by enabling a visual notification at the bottom of each
plane. This commit introduces how to use such a feature.
Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:25 +0000 (10:38 -0500)]
Documentation/gpu: Reorganize DC documentation
Display core documentation is not well organized, and it is hard to find
information due to the lack of sections. This commit reorganizes the
documentation layout, and it is preparation work for future changes.
Changes since V1:
- Christian: Group amdgpu documentation together.
- Daniel: Drop redundant amdgpu prefix.
- Jani: Create index pages.
- Yann: Mirror display folder in the documentation.
Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Tue, 30 Nov 2021 03:06:40 +0000 (11:06 +0800)]
drm/amdgpu: add support for SMU debug option
SMU firmware expects the driver maintains error context
and doesn't interact with SMU any more when SMU errors
occurred. That will aid in debugging SMU firmware issues.
Add SMU debug option support for this request, it can be
enabled or disabled via amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate corresponding debug modes.
Currently, only one mode(HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the envent of SMU
errors.
The dirver interacts with SMU via sending messages. And
threre are three ways to sending messages to SMU in current
implementation. Handle them respectively as following:
1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
Halt on any error.
2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases
Halt on errors apart from ETIME. Otherwise this way won't work.
Let the user handle ETIME error in such a case.
3, smu_cmn_send_msg_without_waiting() for no waiting cases
Halt on errors apart from ETIME. Otherwise second way won't work.
Lang Yu [Thu, 9 Dec 2021 07:10:32 +0000 (15:10 +0800)]
drm/amdgpu: introduce a kind of halt state for amdgpu device
It is useful to maintain error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It will bring hardware to a kind of halt state,
so that no one can touch it any more.
Compare to a simple hang, the system will keep stable
at least for SSH access. Then it should be trivial to
inspect the hardware state and see what's going on.
v2:
- Set adev->no_hw_access earlier to avoid potential crashes.(Christian)
Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leslie Shi [Wed, 8 Dec 2021 07:47:41 +0000 (15:47 +0800)]
drm/amdgpu: fix incorrect VCN revision in SRIOV
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and
execute initialization work on it, but this causes VCN ib ring test failure
on the disabled VCN instance during modprobe:
amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110).
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to
vcn_config
v3: modify VCN's revision in SR-IOV and bare-metal
Leslie Shi [Mon, 6 Dec 2021 08:44:54 +0000 (16:44 +0800)]
drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()
Fix following warning in SRIOV during modprobe:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 10 Dec 2021 03:01:22 +0000 (22:01 -0500)]
drm/amdgpu: disable default navi2x co-op kernel support
This patch reverts the following:
commit 48733b224fa7ba ("drm/amdkfd: add Navi2x to GWS init conditions")
Disable GWS usage in default settings for now due to FW bugs.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Thu, 9 Dec 2021 18:26:05 +0000 (13:26 -0500)]
drm/amdkfd: add Navi2x to GWS init conditions
Initalize GWS on Navi2x with mec2_fw_version >= 0x42.
Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-and-tested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Fri, 3 Dec 2021 06:30:16 +0000 (14:30 +0800)]
drm/amdgpu: only hw fini SMU fisrt for ASICs need that
We found some headaches on ASICs don't need that,
so remove that for them.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 03:20:35 +0000 (22:20 -0500)]
drm/amdkfd: Make KFD support on Hawaii experimental
Hawaii support is mostly untested these days. ROCm user mode also
depends on custom firmware for AQL packet processing, that was never
pushed upstream due to quality regressions in graphics driver testing.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 02:04:06 +0000 (21:04 -0500)]
drm/amdkfd: Don't split unchanged SVM ranges
If an existing SVM range overlaps an svm_range_set_attr call, we would
normally split it in order to update only the overlapping part.
However, if the attributes of the existing range would not be changed
splitting it is unnecessary.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 02:02:42 +0000 (21:02 -0500)]
drm/amdkfd: Fix svm_range_is_same_attrs
The existing function doesn't compare the access bitmaps and flags.
This can result in failure to update those attributes in existing
ranges when all other attributes remained unchanged.
Because the access and flags attributes modify only some bits in the
respective bitmaps, we cannot compare them directly. Instead we need to
check whether applying the attributes to a particular range would
change the bitmaps.
A PREFETCH_LOC attribute must always trigger a migration, even if the
attribute value remains unchanged. E.g. if some pages were migrated due
to a CPU page fault, a prefetch must still be executed to migrate pages
back to VRAM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 01:23:14 +0000 (20:23 -0500)]
drm/amdkfd: Fix error handling in svm_range_add
Add null-pointer check after the last svm_range_new call. This was
originally reported by Zhou Qingyang <zhou1615@umn.edu> based on a
static analyzer.
To avoid duplicating the unwinding code from svm_range_handle_overlap,
I merged the two functions into one.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Zhou Qingyang <zhou1615@umn.edu> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 8 Dec 2021 19:55:15 +0000 (14:55 -0500)]
drm/amdgpu: Handle fault with same timestamp
Remove not unique timestamp WARNING as same timestamp interrupt happens
on some chips,
Drain fault need to wait for the processed_timestamp to be truly greater
than the checkpoint or the ring to be empty to be sure no stale faults
are handled.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:27 +0000 (22:25 -0300)]
drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctl
This fixes the warning below by changing the prototype to a location
that's actually included by the .c files that call
amdgpu_kms_compat_ioctl:
warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’
[-Wmissing-prototypes]
37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:23 +0000 (22:25 -0300)]
drm/amdgpu: fix function scopes
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes]
399 | int amdgpu_vkms_output_init(struct drm_device *dev,
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:21 +0000 (22:25 -0300)]
drm/amd: fix improper docstring syntax
This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:
warning: Function parameter or member 'adev' not described in
'amdgpu_atomfirmware_ras_rom_addr'
...
warning: expecting prototype for amdgpu_atpx_validate_functions().
Prototype was for amdgpu_atpx_validate() instead
...
warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
...
warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of
waves in-flight on this device
...
warning: This comment starts with '/**', but isn't a kernel-doc
comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Mon, 6 Dec 2021 21:21:03 +0000 (16:21 -0500)]
drm/amdgpu: recover XGMI topology for SRIOV VF after reset
For SRIOV VF, the XGMI topology was not recovered after reset. This
change added code to SRIOV VF reset function to update XGMI topology
for SRIOV VF after reset.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Mon, 6 Dec 2021 19:52:00 +0000 (14:52 -0500)]
drm/amdgpu: added PSP XGMI initialization for SRIOV VF during recover
For SRIOV VF, XGMI was not initialized in PSP during recover. This change
added PSP XGMI initialization for SRIOV VF during recover.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Fri, 26 Nov 2021 17:16:45 +0000 (12:16 -0500)]
drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF
On SRIOV, host driver can support FLR(function level reset) on individual VF
within the hive which might bring the individual device back to normal without
the necessary to execute the hive reset. If the FLR failed , host driver will
trigger the hive reset, each guest VF will get reset notification before the
real hive reset been executed. The VF device can handle the reset request
individually in it's reset work handler.
This change updated gpu recover sequence to skip reset other device in
the same hive for SRIOV VF.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Claudio Suarez [Sun, 17 Oct 2021 11:35:00 +0000 (13:35 +0200)]
drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
Once EDID is parsed, the monitor HDMI support information is available
through drm_display_info.is_hdmi. The amdgpu driver still calls
drm_detect_hdmi_monitor() to retrieve the same information, which
is less efficient. Change to drm_display_info.is_hdmi
This is a TODO task in Documentation/gpu/todo.rst
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Claudio Suarez [Sun, 17 Oct 2021 11:34:58 +0000 (13:34 +0200)]
drm/amdgpu: update drm_display_info correctly when the edid is read
drm_display_info is updated by drm_get_edid() or
drm_connector_update_edid_property(). In the amdgpu driver it is almost
always updated when the edid is read in amdgpu_connector_get_edid(),
but not always. Change amdgpu_connector_get_edid() and
amdgpu_connector_free_edid() to keep drm_display_info updated.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Tue, 7 Dec 2021 14:46:39 +0000 (09:46 -0500)]
drm/amd/display: Fix out of bounds access on DNC31 stream encoder regs
[Why]
During dcn31_stream_encoder_create, if PHYC/D get remapped to F/G on B0
then we'll index 5 or 6 into a array of length 5 - leading to an
access violation on some configs during device creation.
[How]
Software won't be touching PHYF/PHYG directly, so just extend the
array to cover all possible engine IDs.
Even if it does by try to access one of these registers by accident
the offset will be 0 and we'll get a warning during the access.
Fixes: 2fe9a0e1173f ("drm/amd/display: Fix DCN3 B0 DP Alt Mapping") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Thu, 2 Dec 2021 07:47:56 +0000 (15:47 +0800)]
drm/amdgpu: free vkms_output after use
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Wed, 24 Nov 2021 07:53:17 +0000 (15:53 +0800)]
drm/amdgpu: drop the critial WARN_ON in amdgpu_vkms
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 26 Nov 2021 07:19:57 +0000 (15:19 +0800)]
drm/amd/display: Fix bug in debugfs crc_win_update entry
[Why]
crc_rd_wrk shouldn't be null in crc_win_update_set(). Current programming
logic is inconsistent in crc_win_update_set().
[How]
Initially, return if crc_rd_wrk is NULL. Later on, we can use member of
crc_rd_wrk safely.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 9a65df193108 ("drm/amd/display: Use PSP TA to read out crc") Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wyatt Wood [Wed, 24 Nov 2021 17:50:20 +0000 (12:50 -0500)]
drm/amd/display: Prevent PSR disable/reenable in HPD IRQ
[Why]
When HPD IRQ occurs, it triggers a PSR disable and reenable
directly through dc layer.
Since it does not pass through the power layer, the layer
that tracks whether PSR is enabled or disabled and which
masks are set, this layer is now out of sync with the real
PSR state in FW.
Theoretically PSR can be enabled during hw programming
sequences or any other situation where we must disable PSR.
[How]
Check if PSR is enabled before doing PSR disable/reenable.
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Wyatt Wood <wyatt.wood@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Tue, 23 Nov 2021 16:56:38 +0000 (11:56 -0500)]
drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset
[Why]
The HW interrupt gets disabled after S3/S4/reset so we don't receive
notifications for HPD or AUX from DMUB - leading to timeout and
black screen with (or without) DPIA links connected.
[How]
Re-enable the interrupt after S3/S4/reset like we do for the other
DC interrupts.
Guard both instances of the outbox interrupt enable or we'll hang
during restore on ASIC that don't support it.
Fixes: 524a0ba6fab955 ("drm/amd/display: Fix DPIA outbox timeout after GPU reset") Reviewed-by: Jude Shih <Jude.Shih@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>