www.infradead.org Git - users/jedix/linux-maple.git/log

drm/amdgpu: fix channel index mapping for SIENNA_CICHLID

Pmfw read ecc info registers in the following order,
     umc0: ch_inst 0, 1, 2 ... 7
     umc1: ch_inst 0, 1, 2 ... 7
The position of the register value stored in eccinfo
table is calculated according to the below formula,
     channel_index = umc_inst * channel_in_umc + ch_inst
Driver directly use the index of eccinfo table array
as channel index, it's not correct, driver needs convert
eccinfo table array index to channel index according to
channel_idx_tbl.

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to common helper to read bios from rom

create a common helper function for soc15 and onwards
to read bios image from rom

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retire rlc callbacks sriov_rreg/wreg

Not needed anymore.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_sriov_rreg/wreg

Instead of ip specific implementation for rlcg
indirect register access

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add helper for rlcg indirect reg access

The helper will be used to access registers from sriov
guest in full access time

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: init rlcg_reg_access_ctrl for gfx10

Initialize all the register offsets that will be
used in rlcg indirect reg access path for gfx10
in sw_init phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: init rlcg_reg_access_ctrl for gfx9

Initialize all the register offsets that will be
used in rlcg indirect reg access path for gfx9 in
sw_init phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add structures for rlcg indirect reg access

Add structures that are used to cache registers offsets
for rlcg indirect reg access ctrl and flag availability
of such interface

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx10

Switch to common helper to query rlcg access flag
specified by sriov host driver for gfx10

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9

Switch to common helper to query rlcg access flag
specified by sriov host driver for gfx9

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add helper to query rlcg reg access flag

Query rlc indirect register access approach specified
by sriov host driver per ip blocks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: clean up some inconsistent indenting

Eliminate the follow smatch warning:
drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c:3415 bw_calcs() warn: inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix memory leak

[why]
Resource release is needed on the error handling path
to prevent memory leak.

[how]
Fix this by adding kfree on the error handling path.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: remove useless if

Clean the following coccicheck warning:

./drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c:7035:2-4: WARNING: possible
condition with no effect (if == else).

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj

This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence obj, which is bumped
earlier by amdgpu_cs_get_fence(). This may result in reference count
leaks.

Fix it by decreasing the refcount of specific object before returning
the error code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/display: use msleep rather than udelay for long delays

Some architectures (e.g., ARM) throw an compilation error if the
udelay is too long. In general udelays of longer than 2000us are
not recommended on any architecture. Switch to msleep in these
cases.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/display: adjust msleep limit in dp_wait_for_training_aux_rd_interval

Some architectures (e.g., ARM) have relatively low udelay limits.
On most architectures, anything longer than 2000us is not recommended.
Change the check to align with other similar checks in DC.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: filter out radeon secondary ids as well

Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements. Add them to the unsupported list
as well so we don't attempt to bind to them. The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: use dev_*** to print output in multiple GPUs

In multiple GPU configuration, when failed to send a SMU
message, it's hard to figure out which GPU has such problem.
So it's not comfortable to user.

[40190.142181] amdgpu: [powerplay]
last message was failed ret is 65535
[40190.242420] amdgpu: [powerplay]
failed to send message 201 ret is 65535
[40190.392763] amdgpu: [powerplay]
last message was failed ret is 65535
[40190.492997] amdgpu: [powerplay]
failed to send message 200 ret is 65535
[40190.743575] amdgpu: [powerplay]
last message was failed ret is 65535
[40190.843812] amdgpu: [powerplay]
failed to send message 282 ret is 65535

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop WARN_ON in amdgpu_gart_bind/unbind

NULL pointer check has guarded it already.

calltrace:
amdgpu_ttm_gart_bind+0x49/0xa0 [amdgpu]
amdgpu_ttm_alloc_gart+0x13f/0x180 [amdgpu]
amdgpu_bo_create_reserved+0x139/0x2c0 [amdgpu]
? amdgpu_ttm_debugfs_init+0x120/0x120 [amdgpu]
amdgpu_bo_create_kernel+0x17/0x80 [amdgpu]
amdgpu_ttm_init+0x542/0x5e0 [amdgpu]

Fixes: 1b08dfb889b2c5 ("drm/amdgpu: remove gart.ready flag")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Not to call dpcd_set_source_specific_data during resume.

[Why]
During resume path, dpcd_set_source_specific_data is taking
extra time when core_link_write_dpcd fails on DP_SOURCE_OUI+0x03
and DP_SOURCE_MINIMUM_HBLANK_SUPPORTED. Here,aux->transfer fails
with multiple retries and consume significant amount time during
S0i3 resume.

[How]
Not to call dpcd_set_source_specific_data during resume path
when there is no oled panel connected and achieve faster resume
during S0i3.

Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Rajib Mahapatra <rajib.mahapatra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded hwmgr->smu_lock

As all those related APIs are already well protected by adev->pm.mutex.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded feature->mutex

As all those related APIs are already well protected by adev->pm.mutex.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded smu_baco->mutex

As those APIs related are already well protected by adev->pm.mutex.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded smu->sensor_lock

As all those related APIs are already well protected by
adev->pm.mutex and smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded smu->metrics_lock

As all those related APIs are already well protected by
adev->pm.mutex and smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded vcn/jpeg_gate_lock

As those related APIs are already protected by adev->pm.mutex.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: drop unneeded lock protection smu->mutex

As all those APIs are already protected either by adev->pm.mutex
or smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: suppress the warning about enum value 'AMD_IP_BLOCK_TYPE_NUM'

Suppress the warning below on building htmldocs:
drivers/gpu/drm/amd/include/amd_shared.h:103: warning: Enum value
'AMD_IP_BLOCK_TYPE_NUM' not described in enum 'amd_ip_block_type'

Fixes: 6ee27ee27ba8 ("drm/amd/pm: avoid duplicate powergate/ungate setting")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable amdgpu_dc module parameter

It doesn't work under IP discovery mode. Make it work!

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Alex Deucher <aleander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Fix MSB of SMU version printing

Yellow carp has been outputting versions like `1093.24.0`, but this
is supposed to be 69.24.0. That is the MSB is being interpreted
incorrectly.

The MSB is not part of the major version, but has generally been
treated that way thus far. It's actually the program, and used to
distinguish between two programs from a similar family but different
codebase.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Disable FRU EEPROM access for SRIOV

VF acces the EEPROM is blocked by security policy, we might need other way
to get SKUs info for VF

v2: squash in compilation fix from Luben

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix the page fault caused by uninitialized variables

This patch will fix the page fault caused by uninitialized variables.

Error Log:
......
[ 130.246323] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 131.963112] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[ 131.963130] BUG: unable to handle page fault for address: 000000000002db80
[ 131.963181] #PF: supervisor write access in kernel mode
[ 131.963210] #PF: error_code(0x0002) - not-present page
[ 131.963233] PGD 0 P4D 0
[ 131.963253] Oops: 0002 [#1] SMP NOPTI
[ 131.963273] CPU: 3 PID: 1411 Comm: modprobe Not tainted 5.13.0+ #1
[ 131.963338] RIP: 0010:osq_lock+0x4d/0x120
[ 131.963381] Code: 10 00 00 00 00 48 c7 02 00 00 00 00 89 42 14 87 07 85 c0 0f 84 d0 00 00 00 83 e8 01 48 98 48 03 0c c5 00 d9 ea 9c 48 89 4a 08 <48> 89 11 44 8b 42 10 45 85 c0 0f 85 af 00 00 00 55 48 89 fe 65 4c
[ 131.963460] RSP: 0018:ffffa40481717768 EFLAGS: 00010202
[ 131.963483] RAX: fffffffffffffffe RBX: ffffa40481717920 RCX: 000000000002db80
[ 131.963520] RDX: ffff9256fecedb80 RSI: ffff9256cbed2e80 RDI: ffffa40481717ac4
[ 131.963547] RBP: ffffa40481717808 R08: ffffa40481717920 R09: 00000000ffffffff
[ 131.963582] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 131.963609] R13: ffffa40481717ac4 R14: ffffa40481717ab8 R15: ffff9256c9480000
[ 131.963646] FS: 00007f23d9b9c540(0000) GS:ffff9256fecc0000(0000) knlGS:0000000000000000
[ 131.963687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 131.963721] CR2: 000000000002db80 CR3: 0000000008444000 CR4: 00000000000506e0
[ 131.963758] Call Trace:
[ 131.963772] ? __ww_mutex_lock.isra.0+0x3a2/0x760
[ 131.963810] ? prb_read_valid+0x1c/0x20
[ 131.963830] ? console_unlock+0x2fe/0x4f0
[ 131.963849] __ww_mutex_lock_interruptible_slowpath+0x16/0x20
[ 131.963882] ww_mutex_lock_interruptible+0x83/0x90
[ 131.963908] amdgpu_bo_create_reserved+0xf0/0x1e0 [amdgpu]
[ 131.964237] amdgpu_bo_create_kernel+0x17/0x80 [amdgpu]
[ 131.964509] amdgpu_gmc_vram_checking+0x41/0xf0 [amdgpu]
[ 131.964807] gmc_v10_0_hw_init+0x105/0x120 [amdgpu]
[ 131.965108] amdgpu_device_init.cold+0x1aa4/0x1e3e [amdgpu]
......

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix convert bad page retiremt

Pmfw read ecc info registers and store values in
eccinfo_table in the following order

umc0 ch_inst 0, 1, 2 ... 7
umc1 ch_inst 0, 1, 2 ... 7
...
umc3 ch_inst 0, 1, 2 ... 7

Driver should convert eccinfo_table_idx to channel_index according
to channel_idx_tbl.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: change FIFO reset condition to embedded display only

[Why]
FIFO reset is only necessary for fast boot sequence, where otg is disabled
and dig fe is enabled when changing dispclk. Fast boot is only enabled
on embedded displays.

[How]
Change FIFO reset condition to "embedded display only".

Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Correct MPC split policy for DCN301

[Why]
DCN301 has seamless boot enabled. With MPC split enabled
at the same time, system will hang.

[How]
Revert MPC split policy back to "MPC_SPLIT_AVOID". Since we have
ODM combine enabled on DCN301, pipe split is not necessary here.

Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: enable heavy-weight TLB flush on Arcturus

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix broken debug sdma vram access function

Debug VRAM access through SDMA has several broken parts resulting in
silent MMIO fallback.

BO kernel creation takes the location of the cpu addr pointer, not
the pointer itself for address kmap.

drm_dev_enter return true on success so change access check.

The source BO is reserved but not pinned so find the address using the
cursor offset relative to its memory domain start.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove gart.ready flag

That's just a leftover from old radeon days and was preventing CS and GART
bindings before the hardware was initialized. But nowdays that is
perfectly valid.

The only thing we need to warn about are GART binding before the table
is even allocated.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unused variable warning

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unused variable

Remove set but unused variable.
warning: variable 'umc_reg_offset' set but not used

Signed-off-by: mziya <Mohammadzafar.ziya@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove repeated calls

Remove repeated calls.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: modify a pair of functions for the pcie port wreg/rreg

This patch will modify a pair of functions for pcie port wreg/rreg.
AMD GPU have had an independent NBIO block from SOC15 arch.
If the dirver wants to read/write the address space of the pcie devices,
it has to go through the NBIO block.
This patch will move the pcie port wreg/rreg functions to
"amdgpu_device.c", so that to reuse the functions on the
future GPU ASICs.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add vram check function for GMC

This patch will add vram check function for GMC block.
It will write pattern data to the vram and then read back from the vram,
so that to verify the work status of vram.
This patch will cover gmc v6/7/8/9/10.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu/amdgpu_psp: remove unneeded ret variable

Return value from amdgpu_bo_create_kernel() directly instead
of taking this in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix UVD suspend error

I met a bug recently and the kernel log:

[  330.171875] radeon 0000:03:00.0: couldn't schedule ib
[  330.175781] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!

In radeon drivers, using UVD suspend is as follows:

if (rdev->has_uvd) {
        uvd_v1_0_fini(rdev);
        radeon_uvd_suspend(rdev);
}

In radeon_ib_schedule function, we check the 'ring->ready' state,
but in uvd_v1_0_fini funciton, we've cleared the ready state.
So, just modify the suspend code flow to fix error.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add missing pm_runtime_put_autosuspend

pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code, thus a matching decrement is needed
on the error handling path to keep the counter balanced.

Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: move calcs folder into DML

The calcs folder has FPU code on it, which should be isolated inside the
DML folder as per https://patchwork.freedesktop.org/series/93042/.

This commit aims single-handedly to correct the location of such FPU
code and does not refactor any functions.

Changes since v2:
- Corrected problems to compile when DCN was disabled.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: 3.2.169

This version brings along following fixes:

- Organize FPU associated code to DML
- Modify SMU_TIMEOUT macro
- Organize dcn201 code
- Address DS stays disabled problem under specific scenario
- Fix black screen issue
- Update DML to rev.99
- Address problem of eDP hot-plug feature

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.0.100.0

Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add signal type check when verify stream backends same

[Why]
For allow eDP hot-plug feature, the stream signal may change to VIRTUAL
when plug-out and back to eDP when plug-in. OS will still setPathMode
with same timing for each plugging, but eDP gets no stream update as we
don't check signal type changing back as keeping it VIRTUAL. It's also
unsafe for future cases that stream signal is switched with same timing.

[How]
Check stream signal type change include previous HDMI signal case.

Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Dale Zhao <dale.zhao@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: update dml to rev.99 and smu clk_table w/a

[why]
1. update dml to rev.99
2. add smu clk table w/a: smu gives 1 dtm level with mismatch votage
table which causes multiple issues.

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix black screen issue on memory clock switch en

[WHY]
With some monitors when multi plane overlay is enabled the memory
clock switching mechanism has to change and, due to an error in the
initialization sequence, it may cause a black screen.

[HOW]
Change the firmware assisted memory clock switch initialization and
tear-down sequence utilizing the prepare_bandwidth and
optimize_bandwidth contexts.

Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Felipe Clark <feclark@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: DCEFCLK DS on CLK init

[Why]
On HG APU + dGPU scenario with no display to dGPU,
DS stays disabled due to no display present.
This problem can be worked around by DAL calling
DCEFCLK DS message to SMU on clk init.

[How]
Call DCEFCLK DS message to SMU on clk init.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Chris Park <Chris.Park@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: modify SMU_TIMEOUT macro.

[WHY]
If some SMU features are not enabled, SMU will return fail to that
message.

[HOW]
SMU_TIMEOUT macro will treat "return fail" as timeout also.
Correct the macro to only report timeout case.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Ian Chen <ian.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: move FPU associated DCN302 code to DML folder (#2266)

[Why & How]
As part of the FPU isolation work documented in
https://patchwork.freedesktop.org/series/93042/, isolate
code that uses FPU in DCN302 to DML, where all FPU code
should locate.

Co-authored-by: Jasdeep Dhillon <jdhillon@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Jasdeep Dhillon <jdhillon@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: 3.2.168

This version brings improvements in the following:

- Drop unnecessary DCN guards
- Improve Z9 interface

Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: support new PMFW interface to disable Z9 only

[Why]
Need to disable Z9 on configurations that only support Z10

[How]
Support new PMFW interface to disable Z9

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: adjust bit comparison to be more type safe

Might potentially have truncation problem with the implicit casting

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Disabled seamless boots on DP and renamed power_down_display_on_boot

[WHY]
- We only ever want seamless boots on eDPs
- The naming and logic did not match the context

[HOW]
- Removed unnecessary if statements
- Renamed power_down_display_on_boot to seamless_boot_edp_requested and
swapped the logic

Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Jarif Aftab <jaraftab@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix error handling in radeon_driver_open_kms

The return value was never initialized so the cleanup code executed when
it isn't even necessary.

Just add proper error handling.

Fixes: ab50cb9df889 ("drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()")
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Enable sysfs required by rocm-smi tool for One VF mode

Enable power level, power limit and fan speed
information retrieval in one VF mode.
This is required so that tool ROCM-SMI
can provide this information to users.

Signed-off-by: Marina Nikolic <Marina.Nikolic@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV

[Why]
This fixes 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange").
we should read pf2vf data based at mman.fw_vram_usage_va after gmc
sw_init. commit 892deb48269c breaks this logic.

[How]
calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to
set the right base in the right sequence.

v2:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run

v3:
clean up the code logic

v4:
add some comment and make the code more readable

Fixes: 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange")
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the code style warnings in hdp xgmi mca and umc

drm/amdgpu: Fix the code style warnings in hdp xgmi mca and umc:
1. WARNING: missing space after struct definition.
2. WARNING: please, no space before tabs.
3. WARNING: line length of xxx exceeds 100 columns.
4. ERROR: "foo* bar" should be "foo *bar".
5. ERROR: space required before the open parenthesis '('.
6. ERROR: space prohibited after that open parenthesis '('.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the code style warnings in sdma

Fix the code style warnings in sdma:
1. WARNING: Missing a blank line after declarations.
2. ERROR: that open brace { should be on the previous line.
3. WARNING: unnecessary whitespace before a quoted newline.
4. ERROR: space required after that ',' (ctx:VxV).

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the code style warnings in gmc

Fix the code style warnings in gmc:
ERROR: space required after that ',' (ctx:VxV).

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the code style warnings in gfx

Fix the code style warnings in gfx:
1. WARNING: suspect code indent for conditional statements.
2. ERROR: spaces required around that '=' (ctx:WxV).

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the code style warnings in amdgpu_ras

Fix the code style warnings in amdgpu_ras:
1. ERROR: space required before the open parenthesis '('.
2. WARNING: line length of xxx exceeds 100 columns.
3. ERROR: "foo* bar" should be "foo *bar".
4. WARNING: unnecessary whitespace before a quoted newline.
5. WARNING: space prohibited before semicolon.
6. WARNING: suspect code indent for conditional statements.
7. WARNING: braces {} are not necessary for single statement blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: apply vcn harvest quirk

This is a following patch to apply the workaround only on
those boards with a bad harvest table in ip discovery.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop redundant check of ip discovery_bin

Early check in amdgpu_discovery_reg_base_init promises this.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: handle denied inject error into critical regions v2

Changed from v1:
remove unused brace

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add message smu to get ecc_table

support ECC TABLE message, this table include umc ras error count
and error address

V2:
Return after smu version check fail

V3:
Return -EOPNOTSUPP, if fail to get smc ver.

V4:
ECCTABLE typo corrected and sentence rephrased.

Signed-off-by: mziya <Mohammadzafar.ziya@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add new query interface for umc_v8_7 block

add smu message query error information interface,

function name align with IP version number

V2:
Removed unused err cnt entry

Signed-off-by: mziya <Mohammadzafar.ziya@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update smu driver interface for sienna cichlid

update smu driver if version to 0x40

V2:
Interface version append with sienna_cichlid
V3:
Aligned with latest driver interface.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: mziya <Mohammadzafar.ziya@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21

The WA from commit 2a50edbf10c8 ("drm/amd/display: Apply w/a for hard hang
on HPD") and commit 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard
hang on HPD to dcn20") causes a regression in s0ix where the system will
fail to resume properly on many laptops. Pull the workarounds out to
avoid that s0ix regression in the common case. This HPD hang happens with
an external device in special circumstances and a new W/A will need to be
developed for this in the future.

Cc: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reported-by: Scott Bruce <smbruce@gmail.com>
Reported-by: Chris Hixon <linux-kernel-bugs@hixontech.com>
Reported-by: spasswolf@web.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215436
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1821
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1852
Fixes: 2a50edbf10c8 ("drm/amd/display: Apply w/a for hard hang on HPD")
Fixes: 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to dcn20")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY

Support for IP based discovery is in place now so this
check is no longer required.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix rejecting Tahiti GPUs

eb4fd29afd4a ("drm/amdgpu: bind to any 0x1002 PCI diplay class device") added
generic bindings to amdgpu so that that it binds to all display class devices
with VID 0x1002 and then rejects those in amdgpu_pci_probe.

Unfortunately it reuses a driver_data value of 0 to detect those new bindings,
which is already used to denote CHIP_TAHITI ASICs.

The driver_data value given to those new bindings was changed in
dd0761fd24ea1 ("drm/amdgpu: set CHIP_IP_DISCOVERY as the asic type by default")
to CHIP_IP_DISCOVERY (=36), but it seems that the check in amdgpu_pci_probe
was forgotten to be changed. Therefore, it still rejects Tahiti GPUs.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1860
Fixes: eb4fd29afd4a ("drm/amdgpu: bind to any 0x1002 PCI diplay class device")
Signed-off-by: Lukas Fink <lukas.fink1@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: don't do resets on APUs which don't support it

It can cause a hang. This is normally not enabled for GPU
hangs on these asics, but was recently enabled for handling
aborted suspends. This causes hangs on some platforms
on suspend.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1858
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu()

Rather than opting into GPU recovery support, default to on, and
opt out if it's not working on a particular GPU. This avoids the
need to add new asics to this list since this is a core feature.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable recovery on yellow carp

Add yellow carp to devices which support recovery

Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove redundant initialization of dpg_width

dpg_width is being initialized to width but this is never read
as dpg_width is overwritten later on. Remove the redundant
initialization.

Cleans up the following clang-analyzer warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:6020:8:
warning: Value stored to 'dpg_width' during its initialization is never
read [clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members" for these cases. The older
style of one-element or zero-length arrays should no longer be used.
Reference:
https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix null ptr access

check null ptr first before access its element

v2: check adev->pm.dpm_enabled early in amdgpu_debugfs_pm_init()

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix compile warning for ras_block_match_default

fix compile warning for ras_block_match_default

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use ARRAY_SIZE to get array length

Use ARRAY_SIZE to get array length.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up some inconsistent indenting

Eliminate the follow smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init()
warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716
amdgpu_ras_error_status_query() warn: if statement not indented
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject()
warn: inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unneeded semicolon

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list

No longer insert ras blocks into ras_list if it already exists in ras_list.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add ras supported check for register_ras_block

Add ras supported check for register_ras_block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add interface to load SRIOV cap FW

- Add interface to load SRIOV cap FW. If the FW does not
  exist, simply skip this FW loading routine.
  This FW will only be loaded under SRIOV. Other driver
  configuration will not be affected.
  By adding this interface, it will make us easier to
  prepare SRIOV Linux guest driver for different users.

- Update sysfs interface to read cap FW version.

- Refactor PSP FW loading routine under SRIOV to use a
  unified SWITCH statement instead of using IF statement

- Remove redundant amdgpu_sriov_vf() check in FW loading
  routine

Acked-by: Monk Liu <monk.liu@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix indentation on switch statement

Cases should be same indentation as switch. Also fix string spanning
across multiple lines.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup ttm debug sdma vram access function

Some suggested cleanups to declutter ttm when doing debug VRAM access over
SDMA.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: improve debug VRAM access performance using sdma

For better performance during VRAM access for debugged processes, do
read/write copies over SDMA.

In order to fulfill post mortem debugging on a broken device, fallback to
stable MMIO access when gpu recovery is disabled or when job submission
time outs are set to max. Failed SDMA access should automatically fall
back to MMIO access.

Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
page-table updates and TLB flushes on access.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Removed redundant ras code

Removed redundant ras code.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Adjust error inject function code style in amdgpu_ras.c

1. Move xgmi special error inject function from amdgpu_ras.c to xgmi block.
2. Support to use psp_ras_trigger_error as default error inject function in amdgpu_ras.c. If .ras_error_inject isn't defined in ras block, default error inject function will take effect.

v2: squash in warning fix (Alex)

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Modify mca block to fit for the unified ras block data and ops

1.Modify mca block to fit for the unified ras block data and ops.
2.Define special .ras_block_match function for mca block to identify itself.
3.Change amdgpu_mca_ras_funcs to amdgpu_mca_ras_block(amdgpu_mca_ras had been used), and the corresponding variable name remove _funcs suffix.
4.Remove the const flag of cma ras variable so that cma ras block can be able to be inserted into amdgpu device ras block link list.
5.Invoke amdgpu_ras_register_ras_block function to register cma ras block into amdgpu device ras block link list.
6.Remove the redundant code about cma in amdgpu_ras.c after using the unified ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Modify sdma block to fit for the unified ras block data and ops

1.Modify sdma block to fit for the unified ras block data and ops.
2.Change amdgpu_sdma_ras_funcs to amdgpu_sdma_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of sdma ras variable so that sdma ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register sdma ras block into amdgpu device ras block link list.
5.Remove the redundant code about sdma in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of sdma versions. If .ras_late_init and .ras_fini had been defined by the selected sdma version, the defined functions will take effect; if not defined, default fill them with amdgpu_sdma_ras_late_init and amdgpu_sdma_ras_fini.

v2: squash in warning fix (Alex)

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Modify umc block to fit for the unified ras block data and ops

1.Modify umc block to fit for the unified ras block data and ops.
2.Change amdgpu_umc_ras_funcs to amdgpu_umc_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of umc ras variable so that umc ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register umc ras block into amdgpu device ras block link list.
5.Remove the redundant code about umc in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of umc versions. If .ras_late_init and .ras_fini had been defined by the selected umc version, the defined functions will take effect; if not defined, default fill them with amdgpu_umc_ras_late_init and amdgpu_umc_ras_fini.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Modify nbio block to fit for the unified ras block data and ops

1.Modify nbio block to fit for the unified ras block data and ops.
2.Change amdgpu_nbio_ras_funcs to amdgpu_nbio_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of mmhub ras variable so that nbio ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register nbio ras block into amdgpu device ras block link list.
5.Remove the redundant code about nbio in amdgpu_ras.c after using the unified ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>