Alex Sierra [Wed, 23 Feb 2022 17:34:51 +0000 (11:34 -0600)]
drm/amdgpu: Add use_xgmi_p2p module parameter
This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sung Joon Kim [Tue, 8 Feb 2022 22:13:57 +0000 (17:13 -0500)]
drm/amd/display: increasing DRAM BW percent for DCN315
[why]
DML validation fails when we connect two or
more displays with HDR. Need to increase
DRAM BW to make the validation passing.
Following the value from DCN31.
[how]
Change the max DRAM BW DML field to 60%.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Sung Joon Kim <sungkim@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Duncan Ma [Tue, 8 Feb 2022 20:05:09 +0000 (15:05 -0500)]
drm/amd/display: Set compbuf size to min at prep prevent overbook crb
[Why]
Detbuffer size is dynamically set for dcn31x. At certain moment,
compbuf+(def size * num pipes) > config return buffer size causing
flickering. This is easily reproducible when MPO is
enabled with two displays.
[How]
At prepare BW, use the min comp buffer size. When it is to
optimize BW, set compbuf size back to maximum possible size.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Duncan Ma <duncanma@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dmytro Laktyushkin [Fri, 21 Jan 2022 22:28:07 +0000 (17:28 -0500)]
drm/amd/display: revert populating dcn315 clk table based on dcfclk
[Why & How]
Due to how pmfw fills out the table when dcfclk states are disabled,
using dcfclk based clk table would cause a no read situation.
Revert the change to prevent underflow until a better solution is coded.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Somalapuram Amaranath [Wed, 23 Feb 2022 10:50:31 +0000 (16:20 +0530)]
drm/amdgpu: add debugfs for reset registers list
List of register populated for dump collection during the GPU reset.
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Qiang Yu [Mon, 21 Feb 2022 09:53:56 +0000 (17:53 +0800)]
drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
Workstation application ANSA/META v21.1.4 get this error dmesg when
running CI test suite provided by ANSA/META:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU map the buffer and access it causes vm_fault and try to move
it to visible VRAM
3. force visible VRAM space and traverse all VRAM bos to check if
evicting this bo is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
will set amdgpu_vm->evicting, but latter due to not in visible
VRAM, won't really evict it so not add it to amdgpu_vm->evicted
5. before next CS to clear the amdgpu_vm->evicting, user VM ops
ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
but fail in amdgpu_vm_bo_update_mapping() (check
amdgpu_vm->evicting) and get this error log
This error won't affect functionality as next CS will finish the
waiting VM ops. But we'd better clear the error log by checking
the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling
amdgpu_vm_bo_update_mapping() later.
Another reason is amdgpu_vm->evicted list holds all BOs (both
user buffer and page table), but only page table BOs' eviction
prevent VM ops. amdgpu_vm->evicting flag is set only for page
table BOs, so we should use evicting flag instead of evicted list
in amdgpu_vm_ready().
The side effect of this change is: previously blocked VM op (user
buffer in "evicted" list but no page table in it) gets done
immediately.
v2: update commit comments.
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Qiang Yu <qiang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Mon, 21 Feb 2022 03:14:52 +0000 (11:14 +0800)]
drm/amdgpu/nv: enable gfx10.3.7 clock gating support
This will enable the following gfx clock gating.
- Fine clock gating
- Medium Grain clock gating
- 3D Coarse clock gating
- Coarse Grain clock gating
- RLC/CP light sleep clock gating
Aric Cyr [Mon, 14 Feb 2022 04:09:36 +0000 (23:09 -0500)]
drm/amd/display: 3.2.174
This version brings along following fixes:
- add debug option to bypass ssinfo from bios.
- Refactor fixed VS logic for non-transparent mode
- add cable ID support for usb c connector
- clear remote dc_sink when stop mst
- Ignore Transitional Invalid Link Rate Error Message
- Fix wrong resolution with DP/VGA adapter
- Refactor PSR DPCD caps detection
- Set compbuf size to min at prep prevent overbook crb
- lock/un-lock cursor if odm pipe split used
Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Sat, 12 Feb 2022 01:55:25 +0000 (20:55 -0500)]
drm/amd/display: add debug option to bypass ssinfo from bios.
[Why&How]
add debug option to bypass ssinfo from bios.
Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
George Shen [Fri, 11 Feb 2022 23:37:39 +0000 (18:37 -0500)]
drm/amd/display: Refactor fixed VS logic for non-transparent mode
[Why]
All fixed VS/PE link training sequence should be refactored
into a separate function outside of the standard link training
sequence. This includes the sequence for non-transparent
mode.
[How]
Isolate link training sequence for fixed VS/PE non-transparent
mode into a separate function.
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: George Shen <George.Shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wenjing Liu [Tue, 8 Feb 2022 23:38:26 +0000 (18:38 -0500)]
drm/amd/display: add cable ID support for usb c connector
[how]
Call to DMUB to retrieve usb c cable ID data from PD firmware.
If cable id is retrieved from DMUB, skip reading cable ID from RX.
Reviewed-by: George Shen <George.Shen@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Thu, 10 Feb 2022 10:38:26 +0000 (18:38 +0800)]
drm/amd/display: clear remote dc_sink when stop mst
[Why]
Currently, we don't have code path to release remote dc_sink when unplug
MST hub from the system. After few times hotplug, we hit the limition of
maximum number of remote dc_sink and can't light up new connected monitor
anymore.
[How]
Releasing all remote dc_sink at dm_helpers_dp_mst_stop_top_mgr() was
removed by previous patch. Restore it.
Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Wed, 9 Feb 2022 21:43:45 +0000 (16:43 -0500)]
drm/amd/display: Ignore Transitional Invalid Link Rate Error Message
[Why]
When hotplug or unplug happens, each stream disabled one by one, and then
enable any alived streams. Link phy and payload table is cleared when 1st
stream is disabled. That causes the error message pops up when disable 2nd
stream. There is no active stream after link_rate is cleared.
After all streams are disabled, link will be trained again and link rate is
assigned to any alived streams.
Therefore there is no harm for the error message that represents invalid
link rate value in the atomic reset transitional time period.
[How]
Downgrade the log level from ERROR to DEBUG.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ilya [Thu, 10 Feb 2022 16:09:33 +0000 (11:09 -0500)]
drm/amd/display: Fix wrong resolution with DP/VGA adapter
[Why]
Hotplugging the VGA side of some DP/VGA adapters caused the display to
light up with the wrong (non-native) resolution.
This is caused by the adapter misbehaving by reporting the wrong number
of downstream ports when the VGA side is unplugged (reports 1 instead of
0), but only if the SINK_COUNT DPCD register is read more than once.
[How]
To work around the adapter behavior, remove the sink if link is
detected, but EDID cannot be read.
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Ilya <Ilya.Bakoulin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Duncan Ma [Tue, 8 Feb 2022 20:05:09 +0000 (15:05 -0500)]
drm/amd/display: Set compbuf size to min at prep prevent overbook crb
[Why]
Detbuffer size is dynamically set for dcn31x. At certain moment,
compbuf+(def size * num pipes) > config return buffer size causing
flickering. This is easily reproducible when MPO is
enabled with two displays.
[How]
At prepare BW, use the min comp buffer size. When it is to
optimize BW, set compbuf size back to maximum possible size.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Duncan Ma <duncanma@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Paul Hsieh [Mon, 7 Feb 2022 03:45:24 +0000 (11:45 +0800)]
drm/amd/display: lock/un-lock cursor if odm pipe split used
[Why]
When system resume from sleep, the cursor lock will be reset
to default(lock status). And the cursor programming sequence
doesn't consider about odm pipe split cause cursor can't be
enabled.
[How]
If odm pipe split has been used, lock/un-lock on each pipes.
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Paul Hsieh <paul.hsieh@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:17:01 +0000 (10:17 -0300)]
drm/amd/display: Turn global functions into static functions
Turn previously global functions into static functions to avoid
-Wmissing-prototype warnings, such as:
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn30/irq_service_dcn30.c:50:20:
warning: no previous prototype for function 'to_dal_irq_source_dcn30'
[-Wmissing-prototypes]
enum dc_irq_source to_dal_irq_source_dcn30(
^
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn30/irq_service_dcn30.c:50:1:
note: declare 'static' if the function is not intended to be used outside
of this translation unit
enum dc_irq_source to_dal_irq_source_dcn30(
^
static
1 warning generated.
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c:488:6:
warning: no previous prototype for function
'dcn316_clk_mgr_helper_populate_bw_params' [-Wmissing-prototypes]
void dcn316_clk_mgr_helper_populate_bw_params(
^
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c:488:1:
note: declare 'static' if the function is not intended to be used outside
of this translation unit
void dcn316_clk_mgr_helper_populate_bw_params(
^
static
1 warning generated.
v2: drop is_timing_changed hunk (Alex)
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:17:00 +0000 (10:17 -0300)]
drm/amd/display: Add missing prototypes to dcn201_init
Include the header with the prototype to silence the following clang
warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn201/dcn201_init.c:127:6:
warning: no previous prototype for function 'dcn201_hw_sequencer_construct'
[-Wmissing-prototypes]
void dcn201_hw_sequencer_construct(struct dc *dc)
^
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:16:59 +0000 (10:16 -0300)]
drm/amd/display: Remove unused variable
Remove the variable clamshell_closed from the function
dcn10_align_pixel_clocks.
This was pointed by clang with the following warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:2063:7:
warning: variable 'clamshell_closed' set but not used
[-Wunused-but-set-variable]
bool clamshell_closed = false;
^
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:16:56 +0000 (10:16 -0300)]
drm/amd/display: Remove unused dcn316_smu_set_voltage_via_phyclk function
Remove dcn316_smu_set_voltage_via_phyclk function, which is not used in the
codebase.
This was pointed by clang with the following warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn316/dcn316_smu.c:171:5:
warning: no previous prototype for function
'dcn316_smu_set_voltage_via_phyclk' [-Wmissing-prototypes]
int dcn316_smu_set_voltage_via_phyclk(struct clk_mgr_internal *clk_mgr, int
requested_phyclk_khz)
^
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:16:55 +0000 (10:16 -0300)]
drm/amd/display: Remove unused temp variable
Remove unused temp variable from the dmub_rb_flush_pending function by
using arithmetic to remove the loop.
The -Wunused-but-set-variable warning was pointed out by Clang with the
following warning:
drivers/gpu/drm/amd/amdgpu/../display/dmub/inc/dmub_cmd.h:2921:12: warning:
variable 'temp' set but not used [-Wunused-but-set-variable]
uint64_t temp;
^
v2: squash in revert and comment update (Alex)
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maíra Canal [Tue, 22 Feb 2022 13:16:52 +0000 (10:16 -0300)]
drm/amdgpu: Change amdgpu_ras_block_late_init_default function scope
Turn previously global function into a static function to avoid the
following Clang warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:5: warning: no previous prototype
for function 'amdgpu_ras_block_late_init_default' [-Wmissing-prototypes]
int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev,
^
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:1: note: declare 'static' if the
function is not intended to be used outside of this translation unit
int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev,
^
static
Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Fri, 18 Feb 2022 22:20:53 +0000 (17:20 -0500)]
drm/amdkfd: Use real device for messages
kfd_chardev() doesn't provide much useful information in dev_... messages
on multi-GPU systems because there is only one KFD device, which doesn't
correspond to any particular GPU. Use the actual GPU device to indicate
the GPU that caused a message.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Fri, 18 Feb 2022 22:53:43 +0000 (17:53 -0500)]
drm/amdkfd: Fix for possible integer overflow
Fix for possible integer overflow when doing addition.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 18 Feb 2022 20:55:15 +0000 (15:55 -0500)]
drm/amdgpu/benchmark: use dev_info rather than DRM macros for logging
So we can tell which output goes to which device when multiple GPUs
are present. Also while we are here, convert DRM_ERROR to dev_info.
The error cases are not critical.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 18 Feb 2022 20:40:12 +0000 (15:40 -0500)]
drm/amdkfd: make CRAT table missing message informational only
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Fri, 18 Feb 2022 03:57:51 +0000 (11:57 +0800)]
drm/amdgpu: read harvest bit per IP data on legacy GPUs
Based on firmware team's input, harvest table in VBIOS does
not apply well to legacy products like Navi1x, so seperate
harvest mask configuration retrieve from different places.
On legacy GPUs, scan harvest bit per IP data stuctures,
while for newer ones, still read IP harvest info from harvest
table.
v2: squash in fix to limit it to specific skus (Guchun)
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's always miss the SMU feature enabled checked in the NPI phase,
so let validate the SMU feature enable message directly rather than
add more and more MP1 version check.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Fri, 18 Feb 2022 05:05:26 +0000 (13:05 +0800)]
drm/amdgpu: bypass tiling flag check in virtual display case (v2)
vkms leverages common amdgpu framebuffer creation, and
also as it does not support FB modifier, there is no need
to check tiling flags when initing framebuffer when virtual
display is enabled.
This can fix below calltrace:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
v2: check adev->enable_virtual_display instead as vkms can be
enabled in bare metal as well.
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to support modifier in virtual kms, otherwise, in SRIOV
mode, when lanuching X server, set crtc will fail due to mismatch
between primary plane modifier and framebuffer modifier.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Fri, 18 Feb 2022 22:25:23 +0000 (17:25 -0500)]
drm/amdkfd: Fix criu_restore_bo error handling
Clang static analysis reports this problem
kfd_chardev.c:2327:2: warning: 1st function call argument
is an uninitialized value
kvfree(bo_privs);
^~~~~~~~~~~~~~~~
Make sure bo_buckets and bo_privs are initialized so freeing them in the
error handling code path will never result in undefined behaviour.
Fixes: 73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl") Reported-by: Tom Rix <trix@redhat.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kent Russell [Fri, 18 Feb 2022 21:59:23 +0000 (16:59 -0500)]
drm/amdkfd: Drop IH ring overflow message to dbg
When this was first implemented, overflows weren't expected in regular
operations, and tests weren't in place to cause said overflow. Now there
are cases where overflows occur with real workloads, but we know that
the kernel can handle this robustly, so move the message to a debug
message.
Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chen Gong [Thu, 17 Feb 2022 07:29:41 +0000 (15:29 +0800)]
drm/amdgpu: do not enable asic reset for raven2
The GPU reset function of raven2 is not maintained or tested, so it should be
very unstable.
Now the amdgpu_asic_reset function is added to amdgpu_pmops_suspend, which
causes the S3 test of raven2 to fail, so the asic_reset of raven2 is ignored
here.
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Signed-off-by: Chen Gong <curry.gong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nathan Chancellor [Thu, 17 Feb 2022 16:21:42 +0000 (09:21 -0700)]
drm/amdkfd: Use proper enum in pm_unmap_queues_v9()
Clang warns:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:267:3:
error: implicit conversion from enumeration type 'enum
mes_map_queues_extended_engine_sel_enum' to different enumeration type
'enum mes_unmap_queues_extended_engine_sel_enum'
[-Werror,-Wenum-conversion]
extended_engine_sel__mes_map_queues__sdma0_to_7_sel :
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Use 'extended_engine_sel__mes_unmap_queues__sdma0_to_7_sel' to eliminate
the warning, which is the same numeric value of the proper type.
Fixes: 009e9a158505 ("drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues") Link: https://github.com/ClangBuiltLinux/linux/issues/1596 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang build fails with
amdgpu_ras.c:2416:7: error: variable 'ras_obj' is used uninitialized
whenever 'if' condition is true
if (adev->in_suspend || amdgpu_in_reset(adev)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
amdgpu_ras.c:2453:6: note: uninitialized use occurs here
if (ras_obj->ras_cb)
^~~~~~~
There is a logic error in the error handler's labels.
ex/ The sysfs: is the last goto label in the normal code but
is the middle of error handler. Rework the error handler.
cleanup: is the first error, so it's handler should be last.
interrupt: is the second error, it's handler is next. interrupt:
handles the failure of amdgpu_ras_interrupt_add_hander() by
calling amdgpu_ras_interrupt_remove_handler(). This is wrong,
remove() assumes the interrupt has been setup, not torn down by
add(). Change the goto label to cleanup.
sysfs is the last error, it's handler should be first. sysfs:
handles the failure of amdgpu_ras_sysfs_create() by calling
amdgpu_ras_sysfs_remove(). But when the create() fails there
is nothing added so there is nothing to remove. This error
handler is not needed. Remove the error handler and change
goto label to interrupt.
Fixes: b293e891b057 ("drm/amdgpu: add helper function to do common ras_late_init/fini (v3)") Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luben Tuikov [Thu, 17 Feb 2022 16:12:55 +0000 (11:12 -0500)]
drm/amdgpu: Dynamically initialize IP instance attributes
Dynamically initialize IP instance attributes. This eliminates bugs
stemming from adding new attributes to an IP instance.
Cc: Alex Deucher <Alexander.Deucher@amd.com> Reported-by: Tom StDenis <tom.stdenis@amd.com> Fixes: 4d7ba312dd1f ("drm/amdgpu: Add "harvest" to IP discovery sysfs") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom St Denis [Tue, 15 Feb 2022 14:59:07 +0000 (09:59 -0500)]
drm/amd/amdgpu: Add APU flag to gca_config debugfs data (v3)
Needed by umr to detect if ip discovered ASIC is an APU or not.
(v2): Remove asic type from packet it's not strictly needed
(v3): Correct comment
Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 17 Feb 2022 05:19:58 +0000 (23:19 -0600)]
drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do. This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.
Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luben Tuikov [Wed, 16 Feb 2022 21:47:32 +0000 (16:47 -0500)]
drm/amdgpu: Fix ARM compilation warning
Fix this ARM warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:664:35: warning: format '%ld'
expects argument of type 'long int', but argument 4 has type 'size_t' {aka
'unsigned int'} [-Wformat=]
Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: kbuild-all@lists.01.org Cc: linux-kernel@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: a6c40b178092 ("drm/amdgpu: Show IP discovery in sysfs") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Tue, 1 Feb 2022 16:26:33 +0000 (10:26 -0600)]
drm/amd: Check if ASPM is enabled from PCIe subsystem
commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") enabled ASPM
by default but a variety of hardware configurations it turns out that this
caused a regression.
* PPC64LE hardware does not support ASPM at a hardware level.
CONFIG_PCIEASPM is often disabled on these architectures.
* Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem
disables it
Check with the PCIe subsystem to see that ASPM has been enabled
or not.
yipechai [Tue, 15 Feb 2022 06:38:13 +0000 (14:38 +0800)]
drm/amdgpu: Remove redundant .ras_late_init initialization in some ras blocks
1. Define amdgpu_ras_block_late_init_default in amdgpu_ras.c as
.ras_late_init common function, which is called when
.ras_late_init of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_init to
initialize .ras_late_init in ras blocks.
Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
yipechai [Mon, 14 Feb 2022 06:38:02 +0000 (14:38 +0800)]
drm/amdgpu: Optimize xxx_ras_late_init function of each ras block
1. Move calling ras block instance members from module internal
function to the top calling xxx_ras_late_init.
2. Module internal function calls can only use parameter variables
of xxx_ras_late_init instead of ras block instance members.
Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>