www.infradead.org Git - users/hch/block.git/log
20 months ago Documentation/gpu: Add simple doc page for DCHUBBUB
Rodrigo Siqueira [Mon, 22 Jan 2024 21:24:55 +0000 (14:24 -0700)]
Documentation/gpu: Add simple doc page for DCHUBBUB

Enable the documentation to extract code documentation from dchubbub.h
file.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago Documentation/gpu: Add basic page for HUBP
Rodrigo Siqueira [Mon, 22 Jan 2024 21:24:54 +0000 (14:24 -0700)]
Documentation/gpu: Add basic page for HUBP

Create the HUBP documentation page and add the doc references to extract
the HUBP code documentation.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: update documentation on new chips
Alex Deucher [Thu, 18 Jan 2024 18:52:28 +0000 (13:52 -0500)]
drm/amdgpu: update documentation on new chips

These have been released now, so add them to the documentation.

Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago amdgpu/drm: Use vram manager for virtualization page retirement
Victor Skvortsov [Mon, 22 Jan 2024 17:45:09 +0000 (12:45 -0500)]
amdgpu/drm: Use vram manager for virtualization page retirement

At runtime, use the vram manager for virtualization page retirement.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Add RAS_POISON_READY host response message
Victor Skvortsov [Sun, 21 Jan 2024 15:25:24 +0000 (10:25 -0500)]
drm/amdgpu: Add RAS_POISON_READY host response message

In a non-FLR page avoidance scenario, the host driver will
provide the bad pages in the pf2vf exchange region.

Adding a new host response message to indicate when the
pf2vf exchange region has been updated.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Support passing poison consumption ras block to SRIOV
YiPeng Chai [Tue, 23 Jan 2024 08:08:11 +0000 (16:08 +0800)]
drm/amdgpu: Support passing poison consumption ras block to SRIOV

Support passing poison consumption ras blocks
to SRIOV.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: "Enable IPS by default"
Roman Li [Tue, 23 Jan 2024 20:18:24 +0000 (15:18 -0500)]
drm/amd/display: "Enable IPS by default"

[Why]
IPS was temporarily disabled due to instability.
It was fixed in dmub firmware and with:
- "drm/amd/display: Add IPS checks before dcn register access"
- "drm/amd/display: Disable ips before dc interrupt setting"

[How]
Enable IPS by default.
Disable IPS if the 0x800 bit is set in the amdgpu.dcdebugmask module parameter.

Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd: Add a DC debug mask for IPS
Roman Li [Tue, 23 Jan 2024 20:14:28 +0000 (15:14 -0500)]
drm/amd: Add a DC debug mask for IPS

For debugging IPS-related issues, expose a new debug mask
that allows disabling IPS.
Usage:
amdgpu.dcdebugmask=0x800

Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
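
The mask test described in the commit above can be sketched in plain C. This is a minimal userspace illustration, not the driver's code: only the 0x800 value comes from the commit message, while the macro name DEBUG_MASK_DISABLE_IPS and the helper ips_enabled() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value taken from the commit message; the macro name is hypothetical. */
#define DEBUG_MASK_DISABLE_IPS 0x800u

/* Hypothetical helper: IPS stays enabled unless the disable bit is set. */
bool ips_enabled(uint32_t dcdebugmask)
{
	return (dcdebugmask & DEBUG_MASK_DISABLE_IPS) == 0;
}
```

With this shape, other bits in the mask do not affect the IPS decision, so amdgpu.dcdebugmask=0x810 would disable IPS just like 0x800.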
20 months ago drm/amd/display: Disable ips before dc interrupt setting
Roman Li [Mon, 22 Jan 2024 22:45:41 +0000 (17:45 -0500)]
drm/amd/display: Disable ips before dc interrupt setting

[Why]
While in IPS2, access to dcn registers is not allowed.
If an interrupt results in a dc call, we should disable IPS.

[How]
Safeguard register access in IPS2 by disabling idle optimization
before calling dc interrupt setting api.

Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: adjust aca init/fini sequence to match gpu reset
Yang Wang [Wed, 24 Jan 2024 02:15:10 +0000 (10:15 +0800)]
drm/amdgpu: adjust aca init/fini sequence to match gpu reset

- move aca init/fini function into ras init/fini to adapt gpu reset
  sequence.
- add new function amdgpu_aca_reset()

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: add aca sysfs remove support
Yang Wang [Tue, 23 Jan 2024 06:31:26 +0000 (14:31 +0800)]
drm/amdgpu: add aca sysfs remove support

add aca sysfs remove support.

Fixes: 37973b69eab4 ("drm/amdgpu: add aca sysfs support")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: Fix a potential buffer overflow in 'dp_dsc_clock_en_read()'
Srinivasan Shanmugam [Tue, 23 Jan 2024 14:48:07 +0000 (20:18 +0530)]
drm/amd/display: Fix a potential buffer overflow in 'dp_dsc_clock_en_read()'

Tell snprintf() to store at most 10 bytes in the output buffer
instead of 30.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c:1508 dp_dsc_clock_en_read() error: snprintf() is printing too much 30 vs 10

Fixes: c06e09b76639 ("drm/amd/display: Add DSC parameters logging to debugfs")
Cc: Alex Hung <alex.hung@amd.com>
Cc: Qingqing Zhuo <qingqing.zhuo@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
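
The class of bug fixed above can be illustrated with a small standalone sketch. It assumes a 10-byte destination as in the smatch report; the function name format_flag() is hypothetical and is not the driver's function. The point is that the size argument passed to snprintf() should be the real buffer size, so that oversized output is truncated rather than written past the end.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format a value into a caller-supplied buffer. Passing the real
 * buffer size (rather than a larger constant) lets snprintf()
 * truncate instead of writing past the end of the buffer.
 * Returns the length the full output would have had.
 */
int format_flag(char *buf, size_t buf_size, unsigned int value)
{
	return snprintf(buf, buf_size, "%u\n", value);
}
```

A caller would typically pass sizeof(buf) for a local array; a return value greater than or equal to the buffer size signals that the output was truncated.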
20 months ago drm/amdgpu: Fix module unload hang with RAS enabled
Mukul Joshi [Wed, 24 Jan 2024 02:14:51 +0000 (10:14 +0800)]
drm/amdgpu: Fix module unload hang with RAS enabled

The driver unload hangs because the page retirement
kthread cannot be stopped as it is sleeping and waiting
on page retirement event to occur. Add kthread_should_stop()
to the event condition to wake up the kthread when kthread
stop is called during driver unload.

Fixes: 3fdcd0a31d7a ("drm/amdgpu: Prepare for asynchronous processing of umc page retirement")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
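
The change to the wait condition can be modelled without any kernel APIs. The sketch below is a single-threaded illustration under stated assumptions: struct retire_state and both predicate functions are hypothetical, and the stop_requested field merely stands in for what kthread_should_stop() reports in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the retirement thread sleeps on. */
struct retire_state {
	int pending_events;   /* page retirement events not yet handled */
	bool stop_requested;  /* set when the thread is asked to stop */
};

/* Before the fix: only a pending event ends the wait. */
bool wake_condition_old(const struct retire_state *s)
{
	return s->pending_events > 0;
}

/* After the fix: a stop request also ends the wait. */
bool wake_condition_new(const struct retire_state *s)
{
	return s->pending_events > 0 || s->stop_requested;
}
```

With the old predicate, a stop request that arrives while no event is pending never satisfies the condition, which matches the hang described in the commit message.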
20 months ago drm/amdgpu/pm: Add default case for smu IH process func
Ma Jun [Mon, 22 Jan 2024 06:21:11 +0000 (14:21 +0800)]
drm/amdgpu/pm: Add default case for smu IH process func

Add default case for smu IH process func.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: convert some variable sized arrays to [] style
Alex Deucher [Wed, 17 Jan 2024 23:09:00 +0000 (18:09 -0500)]
drm/amdgpu: convert some variable sized arrays to [] style

Replace [1] with [].  Silences UBSAN warnings.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3107
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
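
The "[1]" to "[]" conversion refers to C99 flexible array members. A minimal sketch follows, using a hypothetical struct clock_table and allocator that are not taken from the driver headers. Declaring the trailing array as "[]" tells the compiler and bounds checkers that the length is determined at allocation time, whereas "[1]" makes any index above 0 look out of bounds.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table layout: a count followed by that many entries. */
struct clock_table {
	uint8_t num_entries;
	uint32_t entries[];   /* C99 flexible array member, previously "[1]" */
};

/* Allocate a table with room for n entries; returns NULL on failure. */
struct clock_table *clock_table_alloc(uint8_t n)
{
	struct clock_table *t =
		malloc(sizeof(*t) + (size_t)n * sizeof(t->entries[0]));

	if (t)
		t->num_entries = n;
	return t;
}
```

One consequence of such a conversion is that sizeof(struct clock_table) no longer includes one element, so any size computation that relied on the old "[1]" layout has to be adjusted.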
20 months ago drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs
Alex Deucher [Fri, 19 Jan 2024 17:32:59 +0000 (12:32 -0500)]
drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs

This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
20 months ago drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
Alex Deucher [Fri, 19 Jan 2024 17:23:55 +0000 (12:23 -0500)]
drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs

This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0.  This can lead to hangs in some mixed
graphics and compute workloads.  Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
20 months ago drm/amd/amdgpu: Assign GART pages to AMD device mapping
Tom St Denis [Wed, 17 Jan 2024 17:47:37 +0000 (12:47 -0500)]
drm/amd/amdgpu: Assign GART pages to AMD device mapping

This allows kernel mapped pages like the PDB and PTB to be
read via the iomem debugfs when there is no vram in the system.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Fix return type in 'aca_bank_hwip_is_matched()'
Srinivasan Shanmugam [Tue, 23 Jan 2024 07:16:36 +0000 (12:46 +0530)]
drm/amdgpu: Fix return type in 'aca_bank_hwip_is_matched()'

Change the value returned by the "if (!bank || type == ACA_HWIP_TYPE_UNKNOW)"
branch to be a bool instead of an int.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c:185 aca_bank_hwip_is_matched() warn: signedness bug returning '(-22)'

Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
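
Why returning a negative error code from a bool function is a bug can be shown in a few lines. The sketch below is illustrative only: both function names are hypothetical, and it assumes the '(-22)' in the warning corresponds to -EINVAL, as it does on Linux.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Buggy shape: a negative errno is converted to bool and becomes true. */
bool is_matched_buggy(const void *bank)
{
	if (!bank)
		return -EINVAL;   /* nonzero, so callers see "true" */
	return true;
}

/* Fixed shape: the error path reports "not matched". */
bool is_matched_fixed(const void *bank)
{
	if (!bank)
		return false;
	return true;
}
```

Any nonzero value converts to true in C, so the buggy variant reports a match exactly in the case where the input is invalid.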
20 months ago drm/amd/pm: Fetch current power limit from FW
Lijo Lazar [Thu, 18 Jan 2024 08:55:35 +0000 (14:25 +0530)]
drm/amd/pm: Fetch current power limit from FW

Power limit of SMUv13.0.6 SOCs can be updated by out-of-band ways. Fetch
the limit from firmware instead of using cached values.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu/pptable: convert some variable sized arrays to [] style
Alex Deucher [Mon, 22 Jan 2024 15:48:39 +0000 (10:48 -0500)]
drm/amdgpu/pptable: convert some variable sized arrays to [] style

Replace [1] with [].  Silences UBSAN warnings.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039926
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago Revert "drm/amd/pm: fix the high voltage and temperature issue"
Mario Limonciello [Fri, 19 Jan 2024 09:08:37 +0000 (03:08 -0600)]
Revert "drm/amd/pm: fix the high voltage and temperature issue"

This reverts commit 5f38ac54e60562323ea4abb1bfb37d043ee23357.
This causes issues with rebooting and the 7800XT.

Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: stable@vger.kernel.org
Fixes: 5f38ac54e605 ("drm/amd/pm: fix the high voltage and temperature issue")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: hook up DCN20 color blocks data to DTN log
Melissa Wen [Tue, 28 Nov 2023 17:52:57 +0000 (16:52 -0100)]
drm/amd/display: hook up DCN20 color blocks data to DTN log

Color caps changed between HW versions, which caused the DCN10 color
state sections in the DTN log to no longer match DCN2+ state. Create a
color state log specific to DCN2.0 and hook it up to DCN2 family
drivers. Instead of reading gamut remap reg values, display gamut remap
matrix data in fixed 31.32.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: add DPP and MPC color caps to DTN log
Melissa Wen [Tue, 28 Nov 2023 17:52:56 +0000 (16:52 -0100)]
drm/amd/display: add DPP and MPC color caps to DTN log

Add color caps information for DPP and MPC block to show HW color caps.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Address kdoc for eDP Panel Replay feature in 'amdgpu_dm_crtc_set_pan...
Srinivasan Shanmugam [Mon, 22 Jan 2024 15:17:32 +0000 (20:47 +0530)]
drm/amd/display: Address kdoc for eDP Panel Replay feature in 'amdgpu_dm_crtc_set_panel_sr_feature()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_crtc.c:100: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * The DRM vblank counter enable/disable action is used as the trigger
   to enable

Cc: Sun peng Li <sunpeng.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: hook up DCN30 color blocks data to DTN log
Melissa Wen [Tue, 28 Nov 2023 17:52:55 +0000 (16:52 -0100)]
drm/amd/display: hook up DCN30 color blocks data to DTN log

Color caps changed between HW versions, which caused the DCN10 color
state sections in the DTN log to no longer match DCN3+ state. Create a
color state log specific to DCN3.0 and hook it up to DCN3.0+ and DCN3.1+
drivers.

rfc-v2:
- detail RAM mode for gamcor and blnd gamma blocks
- add MPC gamut remap matrix log

v3:
- read MPC gamut remap matrix in fixed 31.32 format
- extend to DCN3.0+ and DCN3.1+ drivers (Harry)

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu: Fix null pointer dereference
Hawking Zhang [Mon, 22 Jan 2024 09:38:23 +0000 (17:38 +0800)]
drm/amdgpu: Fix null pointer dereference

amdgpu_reg_state_sysfs_fini could be invoked before
asic_func has even been initialized, i.e., when
amdgpu_discovery_init fails for some reason.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu: skip call ras_late_init if ras block is not supported
Yang Wang [Mon, 22 Jan 2024 04:09:31 +0000 (12:09 +0800)]
drm/amdgpu: skip call ras_late_init if ras block is not supported

Skip calling the ras_late_init callback if the ras block is not supported.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu: Show vram vendor only if available
Lijo Lazar [Sat, 20 Jan 2024 08:18:09 +0000 (13:48 +0530)]
drm/amdgpu: Show vram vendor only if available

Show the vram vendor in sysfs only if the vendor info is available.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu: Avoid fetching vram vendor information
Lijo Lazar [Sat, 20 Jan 2024 08:02:51 +0000 (13:32 +0530)]
drm/amdgpu: Avoid fetching vram vendor information

For GFX 9.4.3 APUs, the current method of fetching vram vendor
information is not reliable. Avoid fetching the information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/pm: update the power cap setting
Kenneth Feng [Fri, 19 Jan 2024 08:12:00 +0000 (16:12 +0800)]
drm/amd/pm: update the power cap setting

update the power cap setting for smu_v13.0.0/smu_v13.0.7

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2356
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu/pm: Fix the power source flag error
Ma Jun [Wed, 17 Jan 2024 06:35:29 +0000 (14:35 +0800)]
drm/amdgpu/pm: Fix the power source flag error

The power source flag should be updated when
[1] System receives an interrupt indicating that the power source
has changed.
[2] System resumes from suspend or runtime suspend

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Promote DAL to 3.2.269
Aric Cyr [Mon, 15 Jan 2024 13:49:49 +0000 (08:49 -0500)]
drm/amd/display: Promote DAL to 3.2.269

- FW Release 0.0.201.0
- Fix resizing video window for dcn321
- Fix timing bandwidth calculation for HDMI
- Fix null-deref in dml2 assigned pipe search
- Add GART memory support for dmcub
- Add power_state and pme_pending flag
- Add usb4_bw_alloc_support flag
- Revert "Rework DC Z10 restore"

Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: [FW Promotion] Release 0.0.201.0
Anthony Koo [Sat, 13 Jan 2024 21:32:02 +0000 (16:32 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.201.0

 - Add debug flag for Replay IPS visual confirm
 - Remove unused debug flags that should not
   be controlled inside Replay FSM

Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Anthony Koo <anthony.koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Replay + IPS + ABM in Full Screen VPB
ChunTao Tso [Mon, 8 Jan 2024 05:46:59 +0000 (13:46 +0800)]
drm/amd/display: Replay + IPS + ABM in Full Screen VPB

[Why]
Because ABM waits for VStart before it starts collecting histogram data,
 we can't enter IPS while full-screen video is playing.

[How]
Modify the panel refresh rate to the maximum multiple of the current
 refresh rate.

Reviewed-by: Dennis Chan <dennis.chan@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: ChunTao Tso <chuntao.tso@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: turn off windowed Mpo ODM feature for dcn321
Wenjing Liu [Fri, 12 Jan 2024 17:31:33 +0000 (12:31 -0500)]
drm/amd/display: turn off windowed Mpo ODM feature for dcn321

[why]
A regression was found with this feature enabled: it occurs during the ODM to
MPC combine switch when the user is resizing a video window. The transition is
only needed when the feature is enabled. During the transition the driver
temporarily switches to the max dppclk level through the SMU set hard min interface.
The interface times out and fails to configure the max dpp clock level, which causes
a system issue because the desired clock can't be set. We will continue investigating
and root-cause why the max dppclk level can't be reached.
For now we have to disable this feature, as it causes us to hit this
problem in common use cases during video playback. The issue
is dcn321 specific, so it won't impact other dcn revisions.

Reviewed-by: Martin Leung <martin.leung@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Add GART memory support for dmcub
Fudongwang [Tue, 19 Dec 2023 02:20:12 +0000 (10:20 +0800)]
drm/amd/display: Add GART memory support for dmcub

[Why]
In dump file, GART memory can be accessed while frame buffer cannot.

[How]
Add GART memory support for dmcub.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Fudongwang <fudong.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Revert "Rework DC Z10 restore"
Charlene Liu [Thu, 11 Jan 2024 19:40:28 +0000 (14:40 -0500)]
drm/amd/display: Revert "Rework DC Z10 restore"

This reverts commit e6f82bd44b401049367fcdee3328c7c720351419.

It caused intermittent hangs when enabling IPS on static screen.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: add power_state and pme_pending flag
Muhammad Ahmed [Thu, 7 Dec 2023 04:41:33 +0000 (23:41 -0500)]
drm/amd/display: add power_state and pme_pending flag

[what]
Adding power_state to dc.h and pme_pending flag to clk_mgr_internal.h

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Add IPS checks before dcn register access
Roman Li [Tue, 9 Jan 2024 22:31:33 +0000 (17:31 -0500)]
drm/amd/display: Add IPS checks before dcn register access

[Why]
With IPS enabled, the system hangs once PSR is active.
An active PSR triggers a transition to the IPS2 state.
While in IPS2, an access to dcn registers results in a hard hang.
The existing check doesn't cover the PSR sequence.

[How]
Safeguard register access by disabling idle optimization in atomic commit
and crtc scanout. It will be re-enabled on next vblank.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Add NULL-checks in dml2 assigned pipe search
Allen Pan [Tue, 9 Jan 2024 19:54:02 +0000 (14:54 -0500)]
drm/amd/display: Add NULL-checks in dml2 assigned pipe search

[Why]
NULL-deref regression after:
"drm/amd/display: Fix dml2 assigned pipe search"

[How]
Add verification for potential NULLs

Fixes: d451b534e0b4 ("drm/amd/display: Fix dml2 assigned pipe search")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Allen Pan <allen.pan@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Add usb4_bw_alloc_support flag
Peichen Huang [Tue, 9 Jan 2024 06:13:16 +0000 (14:13 +0800)]
drm/amd/display: Add usb4_bw_alloc_support flag

[Why]
dc should have a flag for DM to enable usb4_bw_alloc in dptx

[How]
- Add usb4_bw_alloc_support flag in dc_config

Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Peichen Huang <peichen.huang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Promote DAL to 3.2.268
Aric Cyr [Mon, 8 Jan 2024 15:59:39 +0000 (10:59 -0500)]
drm/amd/display: Promote DAL to 3.2.268

Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: create DCN3-specific log for MPC state
Melissa Wen [Tue, 28 Nov 2023 17:52:54 +0000 (16:52 -0100)]
drm/amd/display: create DCN3-specific log for MPC state

Logging DCN3 MPC state was following DCN1 implementation that doesn't
consider new DCN3 MPC color blocks. Create new elements according to
DCN3 MPC color caps and a new DCN3-specific function for reading MPC
data.

v3:
- remove gamut remap reg reading in favor of fixed31_32 matrix data

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Fix timing bandwidth calculation for HDMI
Leo (Hanghong) Ma [Thu, 4 Jan 2024 18:29:32 +0000 (13:29 -0500)]
drm/amd/display: Fix timing bandwidth calculation for HDMI

[Why && How]
The current bandwidth calculation for timing doesn't account for
the overhead of certain HDMI modes, which prevents DSC from being enabled.
Add support to calculate the actual bandwidth for these HDMI modes.

Reviewed-by: Chris Park <chris.park@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: add get_gamut_remap helper for MPC3
Melissa Wen [Tue, 28 Nov 2023 17:52:53 +0000 (16:52 -0100)]
drm/amd/display: add get_gamut_remap helper for MPC3

We want to be able to read the MPC's gamut remap matrix similar to
what we do with .dpp_get_gamut_remap functions. On the other hand, we
don't need a hook here because only DCN3+ has the MPC gamut remap
block, being absent in previous families.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: fill up DCN3 DPP color state
Melissa Wen [Tue, 28 Nov 2023 17:52:52 +0000 (16:52 -0100)]
drm/amd/display: fill up DCN3 DPP color state

DCN3 DPP color state was not collected and some state elements from DCN1
don't fit DCN3. Create new elements according to DCN3 color caps and
fill them up for DTN log output.

rfc-v2:
- fix reading of gamcor and blnd gamma states
- remove gamut remap register in favor of gamut remap matrix reading

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: read gamut remap matrix in fixed-point 31.32 format
Melissa Wen [Tue, 28 Nov 2023 17:52:51 +0000 (16:52 -0100)]
drm/amd/display: read gamut remap matrix in fixed-point 31.32 format

Instead of reading gamut remap data as raw hw values, convert HW register
values (S2D13) into a fixed-point 31.32 matrix for color state log.
Change DCN10 log to print data in the format of the gamut remap matrix.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
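
The S2D13 to 31.32 conversion mentioned above can be sketched as plain integer arithmetic. This is an illustration under explicit assumptions, not the driver's helper: it assumes S2D13 is a 16-bit two's complement value with 1 sign bit, 2 integer bits and 13 fractional bits, and it represents the 31.32 result as a raw int64_t. The function name s2d13_to_fix31_32() is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a 16-bit S2D13 value (assumed two's complement: 1 sign bit,
 * 2 integer bits, 13 fractional bits) to signed 31.32 fixed point,
 * returned as a raw 64-bit integer.
 */
int64_t s2d13_to_fix31_32(uint16_t raw)
{
	int32_t v = raw;

	if (v & 0x8000)
		v -= 0x10000;   /* sign-extend from 16 bits */

	/* 13 fractional bits -> 32 fractional bits: scale by 2^19. */
	return (int64_t)v * (INT64_C(1) << 19);
}
```

The multiplication is used instead of a left shift so that negative inputs do not rely on shifting a negative value, which is undefined behaviour in C.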
21 months ago drm/amd/display: Add dpp_get_gamut_remap functions
Harry Wentland [Tue, 28 Nov 2023 17:52:50 +0000 (16:52 -0100)]
drm/amd/display: Add dpp_get_gamut_remap functions

We want to be able to read the DPP's gamut remap matrix.

v2:
- code-style and doc comments clean-up (Melissa)

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: decouple color state from hw state log
Melissa Wen [Tue, 28 Nov 2023 17:52:49 +0000 (16:52 -0100)]
drm/amd/display: decouple color state from hw state log

Prepare to hook up color state log according to the DCN version.

v3:
- put functions in single line (Siqueira)

Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_...
Srinivasan Shanmugam [Wed, 17 Jan 2024 03:11:52 +0000 (08:41 +0530)]
drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions

The 'status' variable in 'core_link_read_dpcd()' &
'core_link_write_dpcd()' was uninitialized.

Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'.
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'.

Cc: stable@vger.kernel.org
Cc: Jerry Zuo <jerry.zuo@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
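
One way an "uninitialized symbol" report of this kind can arise, and how a default value addresses it, is sketched below. Everything here is hypothetical (the enum, the helper and the loop structure are not taken from link_dpcd.c); only the idea of defaulting the status to an "unexpected error" value comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for the driver's enum. */
enum status { STATUS_OK = 1, STATUS_ERROR_UNEXPECTED = -1 };

/* Hypothetical transfer step: pretend each chunk succeeds. */
static enum status transfer_chunk(size_t offset)
{
	(void)offset;
	return STATUS_OK;
}

enum status read_in_chunks(size_t size, size_t chunk)
{
	/* Default covers every path on which no transfer is attempted. */
	enum status status = STATUS_ERROR_UNEXPECTED;
	size_t off;

	if (chunk == 0)
		return status;

	for (off = 0; off < size; off += chunk) {
		status = transfer_chunk(off);
		if (status != STATUS_OK)
			break;
	}
	return status;
}
```

Without the initializer, a call with size 0 would return an indeterminate value; with it, the caller gets a well-defined error.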
21 months ago drm/amdgpu: update check condition of query for ras page retire
Tao Zhou [Thu, 18 Jan 2024 06:25:07 +0000 (14:25 +0800)]
drm/amdgpu: update check condition of query for ras page retire

Support page retirement handling in debug mode.

v2: revert smu_v13_0_6_get_ecc_info directly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago Revert "drm/amd/pm: smu v13_0_6 supports ecc info by default"
Tao Zhou [Thu, 18 Jan 2024 06:15:32 +0000 (14:15 +0800)]
Revert "drm/amd/pm: smu v13_0_6 supports ecc info by default"

This reverts commit 6fe08f56db798659beca41ab5b1727a31518f794.
We use debug mode flag instead of this interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amd/display: Drop kdoc markers for some Panel Replay functions
Srinivasan Shanmugam [Thu, 18 Jan 2024 13:47:16 +0000 (19:17 +0530)]
drm/amd/display: Drop kdoc markers for some Panel Replay functions

Fixes the below gcc with W=1:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:262: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Set REPLAY power optimization flags and coasting vtotal.
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:284: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * send Replay general cmd to DMUB.

Fixes: e379787cbc2a ("drm/amd/display: Add some functions for Panel Replay")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months ago drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'
Srinivasan Shanmugam [Thu, 18 Jan 2024 03:14:42 +0000 (08:44 +0530)]
drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting

Cc: Le Ma <Le.Ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/pm: update smu v13.0.6 message permission
Yang Wang [Fri, 19 Jan 2024 03:32:41 +0000 (11:32 +0800)]
drm/amd/pm: update smu v13.0.6 message permission

Update the smu v13.0.6 message permission to allow the guest driver
to set the gfx clock.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Support retiring multiple MCA error address pages
YiPeng Chai [Mon, 15 Jan 2024 03:02:52 +0000 (11:02 +0800)]
drm/amdgpu: Support retiring multiple MCA error address pages

Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: add interface to check mca umc status
YiPeng Chai [Mon, 15 Jan 2024 02:56:02 +0000 (10:56 +0800)]
drm/amdgpu: add interface to check mca umc status

Add interface to check mca umc status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning
YiPeng Chai [Mon, 15 Jan 2024 02:14:23 +0000 (10:14 +0800)]
drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning

Use asynchronous polling to handle umc_v12_0 poisoning.

v2:
  1. Change function name.
  2. Change the debugging information content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Fix ras features value calltrace
Stanley.Yang [Wed, 17 Jan 2024 07:23:41 +0000 (15:23 +0800)]
drm/amdgpu: Fix ras features value calltrace

The high three bits of the ras features mask indicate the socket
id, so the check should skip those three bits before disabling
all ras features.
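
A minimal sketch of the masking idea, assuming a 32-bit mask whose
bits 31..29 hold the socket id. The macro and function names here are
hypothetical and are not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 31..29 carry the socket id, bits 28..0 the features. */
#define RAS_SOCKET_ID_SHIFT 29u
#define RAS_SOCKET_ID_MASK  (0x7u << RAS_SOCKET_ID_SHIFT)

static_assert(RAS_SOCKET_ID_MASK == 0xe0000000u,
	      "socket id occupies the high three bits");

/* Strip the socket id so that only the feature bits remain. */
static inline uint32_t ras_feature_bits(uint32_t mask)
{
	return mask & ~RAS_SOCKET_ID_MASK;
}

/* Extract the socket id stored in the high three bits. */
static inline uint32_t ras_socket_id(uint32_t mask)
{
	return (mask & RAS_SOCKET_ID_MASK) >> RAS_SOCKET_ID_SHIFT;
}

/* True when no feature bit is set, whatever the socket id is. */
static inline int ras_no_features_enabled(uint32_t mask)
{
	return ras_feature_bits(mask) == 0;
}
```

With this layout, a mask that only carries a socket id counts as "no
features enabled", which is the comparison the fix relies on.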

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Prepare for asynchronous processing of umc page retirement
YiPeng Chai [Thu, 18 Jan 2024 06:57:22 +0000 (14:57 +0800)]
drm/amdgpu: Prepare for asynchronous processing of umc page retirement

Preparing for asynchronous processing of umc page retirement.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Add log info for umc_v12_0
YiPeng Chai [Mon, 15 Jan 2024 01:52:22 +0000 (09:52 +0800)]
drm/amdgpu: Add log info for umc_v12_0

Add log info for umc_v12_0.

v2:
 Delete redundant logs.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: fix wrong sizeof argument
Samasth Norway Ananda [Thu, 18 Jan 2024 07:53:53 +0000 (23:53 -0800)]
drm/amdgpu: fix wrong sizeof argument

voltage_parameters is a pointer to a struct of type
SET_VOLTAGE_PARAMETERS_V1_3. Passing just voltage_parameters to
sizeof() would yield the size of the pointer, not the size of the
struct. So we need to pass *voltage_parameters to sizeof().
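
The difference can be sketched in plain C. The struct below is a
hypothetical stand-in, not the real SET_VOLTAGE_PARAMETERS_V1_3
layout, and the helper names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real parameter struct. */
struct voltage_params {
	unsigned char bytes[32];
};

static_assert(sizeof(struct voltage_params) > sizeof(void *),
	      "demo struct must be larger than a pointer");

/* Buggy form: sizeof(p) is the size of the pointer itself. */
static size_t size_via_pointer(const struct voltage_params *p)
{
	return sizeof(p);
}

/* Fixed form: sizeof(*p) is the size of the pointed-to struct. */
static size_t size_via_deref(const struct voltage_params *p)
{
	return sizeof(*p);
}

/* Zero the whole struct, using the dereferenced form for the length. */
static void clear_params(struct voltage_params *p)
{
	memset(p, 0, sizeof(*p));
}
```

Any length check or copy that uses the pointer form covers only the
first pointer-sized bytes of the struct.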

Fixes: 4630d5031cd8 ("drm/amdgpu: check PS, WS index")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Enable seq64 manager and fix bugs
Arunpravin Paneer Selvam [Fri, 12 Jan 2024 07:21:13 +0000 (23:21 -0800)]
drm/amdgpu: Enable seq64 manager and fix bugs

- Enable the seq64 mapping sequence.
- Fix wflinfo va conflict and other bugs.

v1:
  - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE
    otherwise the areas will conflict with user space allocations (Alex)

  - It needs to be mapped read only in the user VM (Alex)

v2:
  - Instead of just one define for TOP/BOTTOM
    reserved space separate them into two (Christian)

  - Fix the CPU and VA calculations and while at it
    also cleanup error handling and kerneldoc (Christian)

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
21 months agodrm/radeon/ni_dpm: remove redundant NULL check
Nikita Zhandarovich [Wed, 17 Jan 2024 14:45:14 +0000 (06:45 -0800)]
drm/radeon/ni_dpm: remove redundant NULL check

'leakage_table' will always be successfully initialized as a pointer
to '&rdev->pm.dpm.dyn_state.cac_leakage_table'.

Remove the unnecessary check, if only to silence static checkers.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool Svace.

Fixes: 69e0b57a91ad ("drm/radeon/kms: add dpm support for cayman (v5)")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: remove dead code in ni_mc_load_microcode()
Nikita Zhandarovich [Wed, 17 Jan 2024 14:44:36 +0000 (06:44 -0800)]
drm/radeon: remove dead code in ni_mc_load_microcode()

Inside the if block with (running == 0), the checks for 'running'
possibly being non-zero are redundant. Remove them altogether.

This change is similar to the one authored by Heinrich Schuchardt
<xypron.glpk@gmx.de> in commit
ddbbd3be9679 ("drm/radeon: remove dead code, si_mc_load_microcode (v2)")

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool Svace.

Fixes: 0af62b016804 ("drm/radeon/kms: add ucode loader for NI")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/pm: enable amdgpu smu send message log
Yang Wang [Wed, 26 Apr 2023 08:17:05 +0000 (16:17 +0800)]
drm/amd/pm: enable amdgpu smu send message log

v1:
enable amdgpu smu driver message log.

v2:
add smu/pmfw response value into debug log.

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: update error condition check for umc_v12_0_query_error_address
Tao Zhou [Wed, 17 Jan 2024 08:35:59 +0000 (16:35 +0800)]
drm/amdgpu: update error condition check for umc_v12_0_query_error_address

Deferred error is also taken into account.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Skip PCI error slot reset during RAS recovery
Stanley.Yang [Wed, 10 Jan 2024 08:13:50 +0000 (16:13 +0800)]
drm/amdgpu: Skip PCI error slot reset during RAS recovery

Why:
    The PCI error slot reset may be triggered after injecting UE into UMC multiple times,
    which caused a system hang.
    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
    [  557.373718] [drm] PCIE GART of 512M enabled.
    [  557.373722] [drm] PTB located at 0x0000031FED700000
    [  557.373788] [drm] VRAM is lost due to GPU reset!
    [  557.373789] [drm] PSP is resuming...
    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
    [  557.547067] [drm] PCI error: detected callback, state(1)!!
    [  557.547069] [drm] No support for XGMI hive yet...
    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
    [  557.610492] [drm] PCI error: slot reset callback!!
    ...
    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu
    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    [  560.843444] PKRU: 55555554
    [  560.846480] Call Trace:
    [  560.849225]  <TASK>
    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.867778]  ? show_regs.part.0+0x23/0x29
    [  560.872293]  ? __die_body.cold+0x8/0xd
    [  560.876502]  ? die_addr+0x3e/0x60
    [  560.880238]  ? exc_general_protection+0x1c5/0x410
    [  560.885532]  ? asm_exc_general_protection+0x27/0x30
    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.904520]  process_one_work+0x228/0x3d0
How:
    In RAS recovery, a mode-1 reset is issued from RAS fatal error handling and is expected
    to reset all the nodes in a hive, so there is no need to issue another mode-1 reset
    during this procedure.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Show deferred error count for UMC
Stanley.Yang [Wed, 17 Jan 2024 03:49:35 +0000 (11:49 +0800)]
drm/amdgpu: Show deferred error count for UMC

Show deferred error count for the UMC sysfs node.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Enable GFXOFF for Compute on GFX11
Ori Messinger [Wed, 22 Nov 2023 05:12:13 +0000 (00:12 -0500)]
drm/amdgpu: Enable GFXOFF for Compute on GFX11

On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware
issue, which has since been fixed after version 64.
This patch only re-enables GFXOFF for GFX version 11 if the GPU's
MES KIQ firmware version is newer than version 64.

V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64.
V3: Add parentheses to avoid GCC warning for parentheses:
"suggest parentheses around comparison in operand of ‘&’"
V4: Remove "V3" from commit title
V5: Change commit description and insert 'Acked-by'
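
The parentheses warning mentioned in V3 comes from operator
precedence: relational operators bind tighter than '&'. A minimal
sketch of a masked version gate follows. The mask value, the
threshold macro and the function name are assumptions for
illustration, and the strict "newer than 64" comparison follows the
description above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: the low 12 bits hold the firmware version. */
#define FW_VERSION_MASK    0x00000fffu
#define FW_MIN_GFXOFF_VER  64u

static_assert(FW_MIN_GFXOFF_VER <= FW_VERSION_MASK,
	      "threshold must fit in the version field");

/*
 * Without the inner parentheses,
 *     raw_version & FW_VERSION_MASK > FW_MIN_GFXOFF_VER
 * parses as raw_version & (FW_VERSION_MASK > FW_MIN_GFXOFF_VER),
 * which is raw_version & 1. The parentheses apply the mask first.
 */
static inline int gfxoff_allowed(uint32_t raw_version)
{
	return (raw_version & FW_VERSION_MASK) > FW_MIN_GFXOFF_VER;
}
```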

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[]
Yang Wang [Tue, 16 Jan 2024 10:58:39 +0000 (18:58 +0800)]
drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[]

Fix an array-index-out-of-bounds issue in the ras_block_string[] array.

Fixes: 30df05fb74f6 ("drm/amdgpu: Align ras block enum with firmware")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest
YuanShang [Thu, 11 Jan 2024 14:03:30 +0000 (22:03 +0800)]
drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest

Submit a wreg command on the GFX and COMPUTE rings to update
RLC_SPM_MC_CNT in the guest machine at runtime.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for writeback requests.
Srinivasan Shanmugam [Sat, 13 Jan 2024 09:02:27 +0000 (14:32 +0530)]
drm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for writeback requests.

The return value of 'to_amdgpu_crtc', which is container_of(...), can't
be NULL, so the NULL check on 'acrtc' is dropped.

Fixing the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9302 amdgpu_dm_atomic_commit_tail() error: we previously assumed 'acrtc' could be null (see line 9299)

Add a NULL check on 'new_crtc_state', the return value of
'drm_atomic_get_new_crtc_state', which retrieves the new state for a
CRTC, while enabling writeback requests.

Cc: stable@vger.kernel.org
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Remove unnecessary NULL check
Felix Kuehling [Mon, 15 Jan 2024 21:51:46 +0000 (16:51 -0500)]
drm/amdgpu: Remove unnecessary NULL check

A static checker pointed out that bo_va->base.bo was already dereferenced
earlier in the same scope. Therefore this check is unnecessary here.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"
Christian König [Wed, 10 Jan 2024 14:19:29 +0000 (15:19 +0100)]
drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"

Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
frees and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both are complete no-gos for driver unload, so completely revert the
workaround for now.

This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdgpu: Remove usage of the deprecated ida_simple_xx() API
Christophe JAILLET [Sun, 14 Jan 2024 15:14:27 +0000 (16:14 +0100)]
drm/amdgpu: Remove usage of the deprecated ida_simple_xx() API

ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_range() is inclusive. So a -1 has been added when needed.
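
The off-by-one between the two bound conventions can be sketched in
userspace C. The toy pool below only mimics the bound semantics
described above; it is not the kernel IDA, and all names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 8

/* Toy ID pool standing in for the kernel IDA. */
struct toy_ida {
	bool used[POOL_SIZE];
};

/* Inclusive upper bound, like ida_alloc_range(): IDs in [min, max]. */
static int toy_alloc_range(struct toy_ida *ida, int min, int max)
{
	assert(min >= 0);
	for (int id = min; id <= max && id < POOL_SIZE; id++) {
		if (!ida->used[id]) {
			ida->used[id] = true;
			return id;
		}
	}
	return -1; /* no free ID in the requested range */
}

/* Exclusive upper bound, like the old ida_simple_get(): IDs in [start, end). */
static int toy_simple_get(struct toy_ida *ida, int start, int end)
{
	/* The "- 1" converts the exclusive end into an inclusive max. */
	return toy_alloc_range(ida, start, end - 1);
}
```

Dropping the "- 1" in such a conversion would silently allow one more
ID than the old call did.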

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amdkfd: init drm_client with funcs hook
Flora Cui [Wed, 10 Jan 2024 11:23:56 +0000 (19:23 +0800)]
drm/amdkfd: init drm_client with funcs hook

Otherwise, drm_client_dev_unregister() would try to
kfree(&adev->kfd.client).

Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state()
Christophe JAILLET [Sat, 13 Jan 2024 14:58:21 +0000 (15:58 +0100)]
drm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state()

It is likely that the statement related to 'dml_edp' is misplaced. So move
it into the correct "case SIGNAL_TYPE_EDP".

Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/ni_dpm: Clean up errors in nislands_smc.h
XueBing Chen [Thu, 11 Jan 2024 09:49:53 +0000 (09:49 +0000)]
drm/radeon/ni_dpm: Clean up errors in nislands_smc.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/evergreen_cs: Clean up errors in evergreen_cs.c
XueBing Chen [Thu, 11 Jan 2024 09:40:13 +0000 (09:40 +0000)]
drm/radeon/evergreen_cs: Clean up errors in evergreen_cs.c

Fix the following errors reported by checkpatch:

ERROR: space required after that ',' (ctx:VxV)
ERROR: spaces required around that '>' (ctx:VxV)
ERROR: spaces required around that '<' (ctx:VxV)

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in si.c
XueBing Chen [Thu, 11 Jan 2024 09:38:01 +0000 (09:38 +0000)]
drm/radeon: Clean up errors in si.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: trailing statements should be on next line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in radeon.h
XueBing Chen [Thu, 11 Jan 2024 09:34:50 +0000 (09:34 +0000)]
drm/radeon: Clean up errors in radeon.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in ci_dpm.h
XueBing Chen [Thu, 11 Jan 2024 09:33:01 +0000 (09:33 +0000)]
drm/radeon: Clean up errors in ci_dpm.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/dpm: Clean up errors in evergreen_smc.h
XueBing Chen [Thu, 11 Jan 2024 09:32:10 +0000 (09:32 +0000)]
drm/radeon/dpm: Clean up errors in evergreen_smc.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in clearstate_cayman.h
XueBing Chen [Thu, 11 Jan 2024 09:30:39 +0000 (09:30 +0000)]
drm/radeon: Clean up errors in clearstate_cayman.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in clearstate_ci.h
XueBing Chen [Thu, 11 Jan 2024 09:21:43 +0000 (09:21 +0000)]
drm/radeon: Clean up errors in clearstate_ci.h

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/kms: Clean up errors in radeon_pm.c
XueBing Chen [Thu, 11 Jan 2024 09:11:59 +0000 (09:11 +0000)]
drm/radeon/kms: Clean up errors in radeon_pm.c

Fix the following errors reported by checkpatch:

ERROR: space required before the open parenthesis '('

Signed-off-by: XueBing Chen <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/kms: Clean up errors in smu7.h
GuoHua Chen [Thu, 11 Jan 2024 09:05:36 +0000 (09:05 +0000)]
drm/radeon/kms: Clean up errors in smu7.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/kms: Clean up errors in smu7_fusion.h
GuoHua Chen [Thu, 11 Jan 2024 09:04:03 +0000 (09:04 +0000)]
drm/radeon/kms: Clean up errors in smu7_fusion.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line
ERROR: space prohibited before open square bracket '['

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in r600_dpm.c
GuoHua Chen [Thu, 11 Jan 2024 09:00:17 +0000 (09:00 +0000)]
drm/radeon: Clean up errors in r600_dpm.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in rv515.c
GuoHua Chen [Thu, 11 Jan 2024 08:59:19 +0000 (08:59 +0000)]
drm/radeon: Clean up errors in rv515.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in radeon_mode.h
GuoHua Chen [Thu, 11 Jan 2024 08:58:07 +0000 (08:58 +0000)]
drm/radeon: Clean up errors in radeon_mode.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in evergreen_reg.h
GuoHua Chen [Thu, 11 Jan 2024 08:56:54 +0000 (08:56 +0000)]
drm/radeon: Clean up errors in evergreen_reg.h

Fix the following errors reported by checkpatch:

ERROR: space prohibited before that close parenthesis ')'
ERROR: need consistent spacing around '<<' (ctx:WxV)
ERROR: need consistent spacing around '-' (ctx:WxV)

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in kv_smc.c
GuoHua Chen [Thu, 11 Jan 2024 08:54:45 +0000 (08:54 +0000)]
drm/radeon: Clean up errors in kv_smc.c

Fix the following errors reported by checkpatch:

ERROR: spaces required around that '=' (ctx:VxW)

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agogpu/drm/radeon: Clean up errors in evergreen.c
GuoHua Chen [Thu, 11 Jan 2024 08:53:19 +0000 (08:53 +0000)]
gpu/drm/radeon: Clean up errors in evergreen.c

Fix the following errors reported by checkpatch:

ERROR: space prohibited after that open parenthesis '('
ERROR: space prohibited before that close parenthesis ')'

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in evergreen.c
GuoHua Chen [Thu, 11 Jan 2024 08:51:58 +0000 (08:51 +0000)]
drm/radeon: Clean up errors in evergreen.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: spaces required around that '&=' (ctx:WxO)
ERROR: space required before that '~' (ctx:OxV)
ERROR: space prohibited before that close parenthesis ')'
ERROR: space required after that ',' (ctx:WxO)
ERROR: space required before that '&' (ctx:OxV)
ERROR: need consistent spacing around '*' (ctx:VxW)

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in rv770_smc.h
GuoHua Chen [Thu, 11 Jan 2024 08:46:00 +0000 (08:46 +0000)]
drm/radeon: Clean up errors in rv770_smc.h

Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line
ERROR: open brace '{' following union go on the same line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon/ci_dpm: Clean up errors in ci_dpm.c
GuoHua Chen [Thu, 11 Jan 2024 08:43:25 +0000 (08:43 +0000)]
drm/radeon/ci_dpm: Clean up errors in ci_dpm.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: need consistent spacing around '-' (ctx:WxV)
ERROR: space required before the open parenthesis '('
ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/radeon: Clean up errors in r600.c
GuoHua Chen [Thu, 11 Jan 2024 08:36:54 +0000 (08:36 +0000)]
drm/radeon: Clean up errors in r600.c

Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: GuoHua Chen <chenguohua_716@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>