George Shen [Fri, 1 Oct 2021 14:36:09 +0000 (22:36 +0800)]
drm/amd/display: Skip override for preferred link settings during link training
[Why]
Overriding link setting inside override_training_settings
result in fallback link settings being ignored. This can
potentially cause link training to always fail and consequently
result in an infinite loop of link training to occur in
dp_verify_link_cap during detection.
[How]
Since preferred link settings are already considered inside
decide_link_settings, skip the check in override_training_settings
to avoid infinite link training loops.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Fri, 1 Oct 2021 01:48:50 +0000 (09:48 +0800)]
drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume
In current code, when a PCI error state pci_channel_io_normal is detectd,
it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI
driver will continue the execution of PCI resume callback report_resume by
pci_walk_bridge, and the callback will go into amdgpu_pci_resume
finally, where write lock is releasd unconditionally without acquiring
such lock first. In this case, a deadlock will happen when other threads
start to acquire the read lock.
To fix this, add a member in amdgpu_device strucutre to cache
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it's pci_channel_io_frozen.
Christian König [Thu, 30 Sep 2021 09:59:14 +0000 (11:59 +0200)]
drm/amdgpu: print warning and taint kernel if lockup timeout is disabled
Make sure that we notice this in error reports.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.
GPU reset can never be delayed by userspace even for debugging or
otherwise we can run into in kernel deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to fix clinfo failure in Raven/Picasso:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.2 AMD-APP (3364.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Prevent using DMUB rptr that is out-of-bounds
[Why]
Running into bugchecks during stress test where rptr is 0xFFFFFFFF.
Typically this is caused by a hard hang, and can come from HW outside
of DCN.
[How]
To prevent bugchecks when writing the DMUB rptr, fist check that the
rptr is valid.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Wyatt Wood <wyatt.wood@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guo Zhengkui [Fri, 1 Oct 2021 10:13:46 +0000 (18:13 +0800)]
drm/amdgpu: remove some repeated includings
Remove two repeated includings in line 46 and 47.
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 1 Oct 2021 08:49:07 +0000 (16:49 +0800)]
drm/amdgpu: During s0ix don't wait to signal GFXOFF
In the rare event when GFX IP suspend coincides with a s0ix entry, don't
schedule a delayed work, instead signal PMFW immediately to allow GFXOFF
entry. GFXOFF is a prerequisite for s0ix entry. PMFW needs to be
signaled about GFXOFF status before amd-pmc module passes OS HINT
to PMFW telling that everything is ready for a safe s0ix entry.
Alex Deucher [Fri, 17 Sep 2021 15:23:45 +0000 (11:23 -0400)]
drm/amdgpu: add an option to override IP discovery table from a file
If you set amdgpu.discovery=2 you can force the the driver to
fetch the IP discovery table from a file rather than from the
table shipped on the device. This is useful for debugging and
for device bring up and emulation when the tables may be in flux.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 4 Oct 2021 19:19:10 +0000 (15:19 -0400)]
drm/amdgpu: convert IP version array to include instances
Allow us to query instances versions more cleanly.
Instancing support is not consistent unfortunately. SDMA is a
good example. Sienna cichlid has 4 total SDMA instances, each
enumerated separately (HWIDs 42, 43, 68, 69). Arcturus has 8
total SDMA instances, but they are enumerated as multiple
instances of the same HWIDs (4x HWID 42, 4x HWID 43). UMC
is another example. On most chips there are multiple
instances with the same HWID. This allows us to support both
forms.
v2: rebase
v3: clarify instancing support
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 9 Aug 2021 21:26:21 +0000 (17:26 -0400)]
drm/amdgpu: add new asic_type for IP discovery
Add a new asic type for asics where we don't have an
explicit entry in the PCI ID list. We don't need
an asic type for these asics, other than something higher
than the existing ones, so just use this for all new
asics.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 9 Aug 2021 15:50:23 +0000 (11:50 -0400)]
drm/amdgpu: get VCN and SDMA instances from IP discovery table
Rather than hardcoding it. We already have the number of VCN
instances from a previous patch, so just update the VCN
instances for chips with static tables.
v2: squash in checks for SDMA3,4 (Guchun)
v3: clarify VCN changes
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Mon, 9 Aug 2021 07:44:29 +0000 (15:44 +0800)]
drm/amd/display: fix error case handling
Otherwise, we will run into error case path.
v2: fix build when CONFIG_DRM_AMD_DC_DCN is not set
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 3 Aug 2021 21:39:01 +0000 (17:39 -0400)]
drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support
We are not going to support any new chips with the old
non-DC code so make it the default.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 30 Jul 2021 16:44:07 +0000 (12:44 -0400)]
drm/amdgpu: add DCI HWIP
So we can track grab the appropriate DCE info out of the
IP discovery table. This is a separare IP from DCN.
Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Wed, 22 Sep 2021 17:17:28 +0000 (13:17 -0400)]
drm/amd/display: Only define DP 2.0 symbols if not already defined
[Why]
For some reason we're defining DP 2.0 definitions inside our
driver. Now that patches to introduce relevant definitions
are slated to be merged into drm-next this is causing conflicts.
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c:33:
In file included from ./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:70:
In file included from ./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_mode.h:36:
./include/drm/drm_dp_helper.h:1322:9: error: 'DP_MAIN_LINK_CHANNEL_CODING_PHY_REPEATER' macro redefined [-Werror,-Wmacro-redefined]
^
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dp_types.h:881:9: note: previous definition is here
^
1 error generated.
[How]
Guard all display driver defines with #ifndef for now. Once we pull
in the new definitions into amd-staging-drm-next we will follow
up and drop definitions from our driver and provide follow-up
header updates for any addition DP 2.0 definitions required
by our driver.
We also ensure drm_dp_helper.h is included before dc_dp_types.h.
v3: Ensure drm_dp_helper.h is included before dc_dp_types.h
v2: Add one missing endif
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Wed, 25 Aug 2021 05:36:38 +0000 (13:36 +0800)]
drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix
In the s2idle stress test sdma resume fail occasionally,in the
failed case GPU is in the gfxoff state.This issue may introduce
by firmware miss handle doorbell S/R and now temporary fix the issue
by forcing exit gfxoff for sdma resume.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhan Liu [Sat, 25 Sep 2021 07:01:48 +0000 (00:01 -0700)]
drm/amd/display: add cyan_skillfish display support
[Why]
add display related cyan_skillfish files in.
makefile controlled by CONFIG_DRM_AMD_DC_DCN201 flag.
v2: squash in clang fixes from Harry, Nathan
v3: squash in missing CONFIG_DRM_AMD_DC check (Alex)
Signed-off-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Zhan Liu <zhan.liu@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Jun Lei <jun.lei@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhan Liu [Sat, 25 Sep 2021 07:51:08 +0000 (00:51 -0700)]
drm/amdgpu: add cyan_skillfish asic header files
This patch is to add cyan_skillfish asic header files.
Signed-off-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Zhan Liu <zhan.liu@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Jun Lei <jun.lei@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andrey Grodzovsky [Tue, 24 Aug 2021 20:38:20 +0000 (16:38 -0400)]
drm/amdgpu: Add a UAPI flag for hot plug/unplug
To support libdrm tests.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andrey Grodzovsky [Tue, 24 Aug 2021 20:15:48 +0000 (16:15 -0400)]
drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case
Handle all DMA IOMMU group related dependencies before the
group is removed and we try to access it after free.
v2:
Move the actul handling function to TTM
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gpu: amd: replace open-coded offsetof() with builtin
The two AMD drivers have their own custom offsetof() implementation
that now triggers a warning with recent versions of clang:
drivers/gpu/drm/radeon/radeon_atombios.c:133:14: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]
Change all the instances to use the normal offsetof() provided
by the kernel that does not have this problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Li [Sun, 26 Sep 2021 07:16:20 +0000 (15:16 +0800)]
drm/amdkfd: fix resource_size.cocci warnings
Use resource_size function on resource object
instead of explicit computation.
Clean up coccicheck warning:
./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:905:10-13: ERROR: Missing
resource_size with res
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The overflow check in amdgpu_bo_list_create() causes a warning with
clang-14 on 64-bit architectures, since the limit can never be
exceeded.
drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c:74:18: error: result of comparison of constant 256204778801521549 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list))
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The check remains useful for 32-bit architectures, so just avoid the
warning by using size_t as the type for the count.
Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Simon Ser [Mon, 27 Sep 2021 15:08:44 +0000 (15:08 +0000)]
drm/amdgpu: check tiling flags when creating FB on GFX8-
On GFX9+, format modifiers are always enabled and ensure the
frame-buffers can be scanned out at ADDFB2 time.
On GFX8-, format modifiers are not supported and no other check
is performed. This means ADDFB2 IOCTLs will succeed even if the
tiling isn't supported for scan-out, and will result in garbage
displayed on screen [1].
Fix this by adding a check for tiling flags for GFX8 and older.
The check is taken from radeonsi in Mesa (see how is_displayable
is populated in gfx6_compute_surface).
Changes in v2: use drm_WARN_ONCE instead of drm_WARN (Michel)
Signed-off-by: Simon Ser <contact@emersion.fr> Acked-by: Michel Dänzer <mdaenzer@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The commit 2766534b766e1b12e0fa0a4e2e26929e808fde71 added the offset
header but didn't add the masks. This adds the masks based on what
was selected for the offsets.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 20 Sep 2021 18:30:02 +0000 (14:30 -0400)]
drm/amd/display: Pass PCI deviceid into DC
[why]
pci deviceid not passed to dal dc, without proper break,
dcn2.x falls into dcn3.x code path
[how]
pass in pci deviceid, and break once dal_version initialized.
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
George Shen [Thu, 16 Sep 2021 23:59:34 +0000 (19:59 -0400)]
drm/amd/display: Update VCP X.Y logging to improve usefulness
[Why]
Recently debugging efforts have involved setting/checking the
X.Y value used during payload allocation. Current output for
Y was calculated with incorrect bitshift. Y value is also not
human readable.
[How]
Refactor logging into separate function. Fix Y calculation error
and format output to be human readable.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>