]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
5 years agodrm/amd/dc: Kill dc_conn_log_hex_linux()
Lyude Paul [Tue, 31 Mar 2020 21:22:24 +0000 (17:22 -0400)]
drm/amd/dc: Kill dc_conn_log_hex_linux()

DRM already supports tracing DPCD transactions, there's no reason for
the existence of this function. Also, it prints one byte per-line which
is way too loud. So, just remove it.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu_dm/mst: Remove useless sideband tracing
Lyude Paul [Tue, 31 Mar 2020 21:22:23 +0000 (17:22 -0400)]
drm/amd/amdgpu_dm/mst: Remove useless sideband tracing

We already trace DPCD reads/writes on both MST and SST, there's no
reason to have this code here (plus, toggling these things with a
define at the top of the file isn't how we do things in the kernel).

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn: fix spelling mistake "fimware" -> "firmware"
Colin Ian King [Wed, 1 Apr 2020 16:35:45 +0000 (17:35 +0100)]
drm/amdgpu/vcn: fix spelling mistake "fimware" -> "firmware"

There is a spelling mistake in a dev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix and cleanup amdgpu_gem_object_close v4
Christian König [Thu, 12 Mar 2020 11:03:34 +0000 (12:03 +0100)]
drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4

The problem is that we can't add the clear fence to the BO
when there is an exclusive fence on it since we can't
guarantee the the clear fence will complete after the
exclusive one.

To fix this refactor the function and also add the exclusive
fence as shared to the resv object.

v2: fix warning
v3: add excl fence as shared instead
v4: squash in fix for fence handling in amdgpu_gem_object_close

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: code cleanup of dc_link file on func dc_link_construct
Melissa Wen [Tue, 31 Mar 2020 11:00:32 +0000 (08:00 -0300)]
drm/amd/display: code cleanup of dc_link file on func dc_link_construct

Removes codestyle issues in dc_link file, on dc_link_construct and
translate_encoder_to_transmitter as suggested by checkpatch.pl.

Types covered:

CHECK: Lines should not end with a '('
WARNING: Missing a blank line after declarations
CHECK: Alignment should match open parenthesis
CHECK: Comparison to NULL could be written
CHECK: Logical continuations should be on the previous line
CHECK: Blank lines aren't necessary after an open brace '{'

Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: code cleanup on dc_link from is_same_edid to get_ddc_line
Melissa Wen [Tue, 31 Mar 2020 10:59:59 +0000 (07:59 -0300)]
drm/amd/display: code cleanup on dc_link from is_same_edid to get_ddc_line

Removes codestyle issues on the file dc_link between is_same_edid and
get_ddc_line as suggested by checkpatch.pl.

Types covered:

CHECK: Blank lines aren't necessary after an open brace '{'
CHECK: Blank lines aren't necessary before a close brace '}'
WARNING: braces {} are not necessary for single statement blocks
CHECK: Comparison to NULL could be written
CHECK: Lines should not end with a '('
CHECK: Alignment should match open parenthesis
CHECK: Using comparison to false is error prone
CHECK: Using comparison to true is error prone
WARNING: Avoid multiple line dereference - prefer 'link->dpcd_caps.sink_count.bits.SINK_COUNT'
CHECK: Unnecessary parentheses around
WARNING: Missing a blank line after declarations

Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: codestyle cleanup on dc_link file until detect_dp func
Melissa Wen [Tue, 31 Mar 2020 10:59:20 +0000 (07:59 -0300)]
drm/amd/display: codestyle cleanup on dc_link file until detect_dp func

Removes codestyle issues on the file dc_link until detect_dp func as
suggested by checkpatch.pl.

Types covered:

CHECK: Please don't use multiple blank lines
CHECK: Comparison to NULL could be written
ERROR: space required before the open parenthesis '('
CHECK: Alignment should match open parenthesis
CHECK: Lines should not end with a '('
WARNING: please, no space before tabs
WARNING: Comparisons should place the constant on the right side of the test
WARNING: braces {} are not necessary for single statement blocks
CHECK: Please don't use multiple blank lines

Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: cleanup codestyle type BLOCK_COMMENT_STYLE on dc_link
Melissa Wen [Tue, 31 Mar 2020 10:57:34 +0000 (07:57 -0300)]
drm/amd/display: cleanup codestyle type BLOCK_COMMENT_STYLE on dc_link

Solve comments alignment problems on dc_link file

Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: enable VCN2.5 DPG mode for Arcturus
James Zhu [Mon, 10 Feb 2020 15:28:00 +0000 (10:28 -0500)]
drm/amdgpu: enable VCN2.5 DPG mode for Arcturus

Enable VCN2.5 DPG mode for arcturus after below items are applied.
ASD: 0x21000023
SOS: 0x17003B
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
VBIOS: 23

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn2.5: Add firmware w/r ptr reset sync
James Zhu [Mon, 30 Mar 2020 00:15:44 +0000 (20:15 -0400)]
drm/amdgpu/vcn2.5: Add firmware w/r ptr reset sync

Add firmware write/read point reset sync through shared memory

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn2.0: Add firmware w/r ptr reset sync
James Zhu [Mon, 30 Mar 2020 00:11:34 +0000 (20:11 -0400)]
drm/amdgpu/vcn2.0: Add firmware w/r ptr reset sync

Add firmware write/read point reset sync through shared memory

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn: Add firmware share memory support
James Zhu [Mon, 30 Mar 2020 00:00:22 +0000 (20:00 -0400)]
drm/amdgpu/vcn: Add firmware share memory support

Added firmware share memory support for VCN. Current multiple
queue mode is enabled only.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn2.5: stall DPG when WPTR/RPTR reset
James Zhu [Tue, 18 Feb 2020 22:46:29 +0000 (17:46 -0500)]
drm/amdgpu/vcn2.5: stall DPG when WPTR/RPTR reset

Add vcn dpg harware synchronization to fix race condition
issue between vcn driver and hardware.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn2.0: stall DPG when WPTR/RPTR reset
James Zhu [Tue, 18 Feb 2020 22:44:39 +0000 (17:44 -0500)]
drm/amdgpu/vcn2.0: stall DPG when WPTR/RPTR reset

Add vcn dpg harware synchronization to fix race condition
issue between vcn driver and hardware.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn: fix race condition issue for dpg unpause mode switch
James Zhu [Mon, 10 Feb 2020 17:52:16 +0000 (12:52 -0500)]
drm/amdgpu/vcn: fix race condition issue for dpg unpause mode switch

Couldn't only rely on enc fence to decide switching to dpg unpaude mode.
Since a enc thread may not schedule a fence in time during multiple
threads running situation.

v3: 1. Rename enc_submission_cnt to dpg_enc_submission_cnt
    2. Add dpg_enc_submission_cnt check in idle_work_handler

v4:  Remove extra counter check, and reduce counter before idle
    work schedule

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/vcn: fix race condition issue for vcn start
James Zhu [Mon, 10 Feb 2020 17:41:41 +0000 (12:41 -0500)]
drm/amdgpu/vcn: fix race condition issue for vcn start

Fix race condition issue when multiple vcn starts are called.

v2: Removed checking the return value of cancel_delayed_work_sync()
to prevent possible races here.

v3: Add total_submission_cnt to avoid gate power unexpectedly.

v4: Remove extra counter check, and reduce counter before idle
work schedule

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: skip access sdma_v5_0 registers under SRIOV (v2)
Yintian Tao [Tue, 10 Mar 2020 15:24:53 +0000 (23:24 +0800)]
drm/amdgpu: skip access sdma_v5_0 registers under SRIOV (v2)

Due to the new L1.0b0c011b policy, many SDMA registers are blocked which raise
the violation warning. There are total 6 pair register needed to be skipped
when driver init and de-init.
mmSDMA0/1_CNTL
mmSDMA0/1_F32_CNTL
mmSDMA0/1_UTCL1_PAGE
mmSDMA0/1_UTCL1_CNTL
mmSDMA0/1_CHICKEN_BITS,
mmSDMA0/1_SEM_WAIT_FAIL_TIMER_CNTL

v2: squash in warning fix

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: stop disable the scheduler during HW fini
Christian König [Fri, 21 Feb 2020 14:10:31 +0000 (15:10 +0100)]
drm/amdgpu: stop disable the scheduler during HW fini

When we stop the HW for example for GPU reset we should not stop the
front-end scheduler. Otherwise we run into intermediate failures during
command submission.

The scheduler should only be stopped in very few cases:
1. We can't get the hardware working in ring or IB test after a GPU reset.
2. The KIQ scheduler is not used in the front-end and should be disabled during GPU reset.
3. In amdgpu_ring_fini() when the driver unloads.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Test-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: added mutex protection on msg issuing
Evan Quan [Fri, 27 Mar 2020 03:20:29 +0000 (11:20 +0800)]
drm/amd/powerplay: added mutex protection on msg issuing

This could avoid the possible race condition.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: unified interfaces for message issuing and response checking
Evan Quan [Fri, 27 Mar 2020 02:48:20 +0000 (10:48 +0800)]
drm/amd/powerplay: unified interfaces for message issuing and response checking

This can avoid potential race condition between them.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling Vega20 specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 10:01:25 +0000 (18:01 +0800)]
drm/amd/powerplay: avoid calling Vega20 specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling SMU10 specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 09:52:50 +0000 (17:52 +0800)]
drm/amd/powerplay: avoid calling SMU10 specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling SMU9 specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 09:44:45 +0000 (17:44 +0800)]
drm/amd/powerplay: avoid calling SMU9 specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling SMU8 specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 08:47:45 +0000 (16:47 +0800)]
drm/amd/powerplay: avoid calling SMU8 specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerpaly: drop unused APIs
Evan Quan [Thu, 26 Mar 2020 08:35:25 +0000 (16:35 +0800)]
drm/amd/powerpaly: drop unused APIs

Drop unused smu7 message APIs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling SMU7 specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 08:30:52 +0000 (16:30 +0800)]
drm/amd/powerplay: avoid calling SMU7 specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: avoid calling CI specific SMU message implemention
Evan Quan [Thu, 26 Mar 2020 08:09:05 +0000 (16:09 +0800)]
drm/amd/powerplay: avoid calling CI specific SMU message implemention

Prepare for coming lock protection for SMU message issuing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: reroute VMC and UMD to IH ring 1 for oss v5
Alex Sierra [Mon, 23 Mar 2020 19:00:43 +0000 (14:00 -0500)]
drm/amdgpu: reroute VMC and UMD to IH ring 1 for oss v5

[Why]
Due Page faults can easily overwhelm the interrupt handler.
So to make sure that we never lose valuable interrupts on the primary ring
we re-route page faults to IH ring 1.
It also facilitates the recovery page process, since it's already
running from a process context.
This is valid for Arcturus and future Navi generation GPUs.

[How]
Setting IH_CLIENT_CFG_DATA for VMC and UMD IH clients.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: call psp to program ih cntl in SR-IOV for Navi
Alex Sierra [Mon, 23 Mar 2020 18:53:39 +0000 (13:53 -0500)]
drm/amdgpu: call psp to program ih cntl in SR-IOV for Navi

call psp to program ih cntl in SR-IOV if supported on Navi and Arcturus.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: enable IH ring 1 and ring 2 for navi
Alex Sierra [Mon, 23 Mar 2020 18:28:15 +0000 (13:28 -0500)]
drm/amdgpu: enable IH ring 1 and ring 2 for navi

Support added into IH to enable ring1 and ring2 for navi10_ih.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: ih doorbell size of range changed for nbio v7.4
Alex Sierra [Wed, 18 Mar 2020 23:26:19 +0000 (18:26 -0500)]
drm/amdgpu: ih doorbell size of range changed for nbio v7.4

[Why]
nbio v7.4 size of ih doorbell range is 64 bit. This requires 2 DWords per register.

[How]
Change ih doorbell size from 2 to 4. This means two Dwords per ring.
Current configuration uses two ih rings.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: infinite retries fix from UTLC1 RB SDMA
Alex Sierra [Wed, 18 Mar 2020 22:57:10 +0000 (17:57 -0500)]
drm/amdgpu: infinite retries fix from UTLC1 RB SDMA

[Why]
Previously these registers were set to 0. This was causing an
infinite retry on the UTCL1 RB, preventing higher priority RB such as paging RB.

[How]
Set to one the SDMAx_UTLC1_TIMEOUT registers for all SDMAs on Vega10, Vega12,
Vega20 and Arcturus.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix non-pointer dereference for non-RAS supported
Evan Quan [Fri, 27 Mar 2020 07:39:06 +0000 (15:39 +0800)]
drm/amdgpu: fix non-pointer dereference for non-RAS supported

Backtrace on gpu recover test on Navi10.

[ 1324.516681] RIP: 0010:amdgpu_ras_set_error_query_ready+0x15/0x20 [amdgpu]
[ 1324.523778] Code: 4c 89 f7 e8 cd a2 a0 d8 e9 99 fe ff ff 45 31 ff e9 91 fe ff ff 0f 1f 44 00 00 55 48 85 ff 48 89 e5 74 0e 48 8b 87 d8 2b 01 00 <40> 88 b0 38 01 00 00 5d c3 66 90 0f 1f 44 00 00 55 31 c0 48 85 ff
[ 1324.543452] RSP: 0018:ffffaa1040e4bd28 EFLAGS: 00010286
[ 1324.549025] RAX: 0000000000000000 RBX: ffff911198b20000 RCX: 0000000000000000
[ 1324.556217] RDX: 00000000000c0a01 RSI: 0000000000000000 RDI: ffff911198b20000
[ 1324.563514] RBP: ffffaa1040e4bd28 R08: 0000000000001000 R09: ffff91119d0028c0
[ 1324.570804] R10: ffffffff9a606b40 R11: 0000000000000000 R12: 0000000000000000
[ 1324.578413] R13: ffffaa1040e4bd70 R14: ffff911198b20000 R15: 0000000000000000
[ 1324.586464] FS:  00007f4441cbf540(0000) GS:ffff91119ed80000(0000) knlGS:0000000000000000
[ 1324.595434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1324.601345] CR2: 0000000000000138 CR3: 00000003fcdf8004 CR4: 00000000003606e0
[ 1324.608694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1324.616303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1324.623678] Call Trace:
[ 1324.626270]  amdgpu_device_gpu_recover+0x6e7/0xc50 [amdgpu]
[ 1324.632018]  ? seq_printf+0x4e/0x70
[ 1324.636652]  amdgpu_debugfs_gpu_recover+0x50/0x80 [amdgpu]
[ 1324.643371]  seq_read+0xda/0x420
[ 1324.647601]  full_proxy_read+0x5c/0x90
[ 1324.652426]  __vfs_read+0x1b/0x40
[ 1324.656734]  vfs_read+0x8e/0x130
[ 1324.660981]  ksys_read+0xa7/0xe0
[ 1324.665201]  __x64_sys_read+0x1a/0x20
[ 1324.669907]  do_syscall_64+0x57/0x1c0
[ 1324.674517]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1324.680654] RIP: 0033:0x7f44417cf081

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: Include headers for PWR and SMUIO registers
Tom St Denis [Fri, 27 Mar 2020 13:30:52 +0000 (09:30 -0400)]
drm/amd/amdgpu: Include headers for PWR and SMUIO registers

Clean up the smu10, smu12, and gfx9 drivers to use headers for
registers instead of hardcoding in the C source files.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: implement more ib pools (v2)
xinhui pan [Thu, 26 Mar 2020 00:38:29 +0000 (08:38 +0800)]
drm/amdgpu: implement more ib pools (v2)

We have three ib pools, they are normal, VM, direct pools.

Any jobs which schedule IBs without dependence on gpu scheduler should
use DIRECT pool.

Any jobs schedule direct VM update IBs should use VM pool.

Any other jobs use NORMAL pool.

v2: squash in coding style fix

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Move backlight pwm enable function call
Wyatt Wood [Wed, 11 Mar 2020 19:46:26 +0000 (15:46 -0400)]
drm/amd/display: Move backlight pwm enable function call

[Why]
Can't call dmub_abm_enable_fractional_pwm from dmub_abm_create as
dmub_srv is still null at this init stage, and therefore can't call to
fw.

[How]
Move call to dmub_abm_init_backlight. This should be the first call from
the driver for ABM.

Signed-off-by: Wyatt Wood <wyatt.wood@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Add ABM driver implementation
Wyatt Wood [Thu, 20 Feb 2020 16:50:44 +0000 (11:50 -0500)]
drm/amd/display: Add ABM driver implementation

[Why]
Moving ABM from DMCU to DMCUB.

[How]
Add ABM driver files and implementation.

Signed-off-by: Wyatt Wood <wyatt.wood@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: 3.2.77
Aric Cyr [Mon, 9 Mar 2020 15:08:49 +0000 (11:08 -0400)]
drm/amd/display: 3.2.77

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: extend compute job timeout
Jiawei [Thu, 26 Mar 2020 07:10:51 +0000 (15:10 +0800)]
drm/amdgpu: extend compute job timeout

extend compute lockup timeout to 60000 for SR-IOV.

Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: No need support vcn decode
Emily Deng [Thu, 26 Mar 2020 05:41:57 +0000 (13:41 +0800)]
drm/amdgpu: No need support vcn decode

As no need to support vcn decode feature, so disable the
ring for SR-IOV.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: postpone entering fullaccess mode
Monk Liu [Tue, 10 Mar 2020 10:12:13 +0000 (18:12 +0800)]
drm/amdgpu: postpone entering fullaccess mode

if host support new handshake we only need to enter
fullaccess_mode in ip_init() part, otherwise we need
to do it before reading vbios (becuase host prepares vbios
for VF only after received REQ_GPU_INIT event under
legacy handshake)

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: adjust sequence of ip_discovery init and timeout_setting
Monk Liu [Wed, 4 Mar 2020 13:33:27 +0000 (21:33 +0800)]
drm/amdgpu: adjust sequence of ip_discovery init and timeout_setting

what:
1)move timtout setting before ip_early_init to reduce exclusive mode
cost for SRIOV

2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
it is a prepare for the later upcoming patches.

why:
in later upcoming patches we would use a new mailbox event --
"req_gpu_init_data", which is a callback hooked in adev->virt.ops and
this callback send a new event "REQ_GPU_INIT_DAT" to host to notify
host to do some preparation like "IP discovery/vbios on the VF FB"
and this callback must be:

A) invoked after set_ip_block() because virt.ops is configured during
set_ip_block()

B) invoked before ip_discovery_init() becausen ip_discovery_init()
need host side prepares everything in VF FB first.

current place of ip_discovery_init() is before we can invoke callback
of adev->virt.ops, thus we must move ip_discovery_init() to a place
after the adev->virt.ops all settle done, and the perfect place is in
amdgpu_discovery_reg_base_init()

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: equip new req_init_data handshake
Monk Liu [Wed, 4 Mar 2020 15:51:51 +0000 (23:51 +0800)]
drm/amdgpu: equip new req_init_data handshake

by this new handshake host side can prepare vbios/ip-discovery
and pf&vf exchange data upon recieving this request without
stopping world switch.

this way the world switch is less impacted by VF's exclusive mode
request

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: use static mmio offset for NV mailbox
Monk Liu [Wed, 4 Mar 2020 15:46:45 +0000 (23:46 +0800)]
drm/amdgpu: use static mmio offset for NV mailbox

what:
with the new "req_init_data" handshake we need to use mailbox
before do IP discovery, so in mxgpu_nv.c file the original
SOC15_REG method won'twork because that depends on IP discovery
complete first.

how:
so the solution is to always use static MMIO offset for NV+ mailbox
registers.
HW team confirm us all MAILBOX registers will be at the same
offset for all ASICs, no IP discovery needed for those registers

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: introduce new request and its function
Monk Liu [Wed, 4 Mar 2020 03:38:36 +0000 (11:38 +0800)]
drm/amdgpu: introduce new request and its function

1) modify xgpu_nv_send_access_requests to support
new idh request

2) introduce new function: req_gpu_init_data() which
is used to notify host to prepare vbios/ip-discovery/pfvf exchange

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: introduce new idh_request/event enum
Monk Liu [Tue, 3 Mar 2020 10:13:51 +0000 (18:13 +0800)]
drm/amdgpu: introduce new idh_request/event enum

new idh_request and ihd_event to prepare for the
new handshake protocol implementation later

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: cleanup idh event/req for NV headers
Monk Liu [Tue, 3 Mar 2020 08:40:00 +0000 (16:40 +0800)]
drm/amdgpu: cleanup idh event/req for NV headers

1) drop the headers from AI in mxgpu_nv.c, should refer to mxgpu_nv.h

2) the IDH_EVENT_MAX is not used and not aligned with host side
   so drop it
3) the IDH_TEXT_MESSAG was provided in host but not defined in guest

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: Fix SMUIO/PWR Confusion (v2)
Tom St Denis [Wed, 25 Mar 2020 19:07:01 +0000 (15:07 -0400)]
drm/amd/amdgpu: Fix SMUIO/PWR Confusion (v2)

The PWR block was merged into the SMUIO block by revision 12 so we add
that to the smuio_12_0_0 headers.

(v2): Drop nonsensical smuio_10_0_0 header

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: Move PWR_MISC_CNTL_STATUS to its own header
Tom St Denis [Wed, 25 Mar 2020 17:44:54 +0000 (13:44 -0400)]
drm/amd/amdgpu: Move PWR_MISC_CNTL_STATUS to its own header

The register is part of the PWR block not the GC block.  Move to
its own header.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: Add missing SMUIO v12 register to headers
Tom St Denis [Wed, 25 Mar 2020 13:33:29 +0000 (09:33 -0400)]
drm/amd/amdgpu: Add missing SMUIO v12 register to headers

This register is needed by umr.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/uvd7: remove unnecessary conversion to bool
Chen Zhou [Wed, 25 Mar 2020 02:32:50 +0000 (10:32 +0800)]
drm/amdgpu/uvd7: remove unnecessary conversion to bool

The conversion to bool is not needed, remove it.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/radeon: align short build log
Masahiro Yamada [Thu, 13 Feb 2020 15:39:27 +0000 (00:39 +0900)]
drm/radeon: align short build log

This beautifies the build log.

[Before]

  HOSTCC  drivers/gpu/drm/radeon/mkregtable
  MKREGTABLE drivers/gpu/drm/radeon/r100_reg_safe.h
  MKREGTABLE drivers/gpu/drm/radeon/rn50_reg_safe.h
  CC [M]  drivers/gpu/drm/radeon/r100.o
  MKREGTABLE drivers/gpu/drm/radeon/r300_reg_safe.h
  CC [M]  drivers/gpu/drm/radeon/r300.o

[After]

  HOSTCC  drivers/gpu/drm/radeon/mkregtable
  MKREG   drivers/gpu/drm/radeon/r100_reg_safe.h
  MKREG   drivers/gpu/drm/radeon/rn50_reg_safe.h
  CC [M]  drivers/gpu/drm/radeon/r100.o
  MKREG   drivers/gpu/drm/radeon/r300_reg_safe.h
  CC [M]  drivers/gpu/drm/radeon/r300.o

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/radeon: use pattern rule to avoid code duplication in Makefile
Masahiro Yamada [Thu, 13 Feb 2020 15:39:26 +0000 (00:39 +0900)]
drm/radeon: use pattern rule to avoid code duplication in Makefile

This Makefile repeats similar build rules. Use a pattern rule.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/radeon: fix build rules of *_reg_safe.h
Masahiro Yamada [Wed, 25 Mar 2020 20:48:35 +0000 (16:48 -0400)]
drm/radeon: fix build rules of *_reg_safe.h

if_changed must have FORCE as a prerequisite, and the targets must be
added to 'targets'.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/radeon: remove unneeded header include path
Masahiro Yamada [Wed, 25 Mar 2020 20:47:16 +0000 (16:47 -0400)]
drm/radeon: remove unneeded header include path

A header include path without $(srctree)/ is suspicious because it does
not work with O= builds.

You can build drivers/gpu/drm/radeon/ without this include path.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Ignore the not supported error from psp
Emily Deng [Wed, 25 Mar 2020 10:58:02 +0000 (18:58 +0800)]
drm/amdgpu: Ignore the not supported error from psp

As the VCN firmware will not use
vf vmr now. And new psp policy won't support set tmr
now.
For driver compatible issue, ignore the not support error.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add 4k resolution for virtual display
Emily Deng [Wed, 25 Mar 2020 10:59:16 +0000 (18:59 +0800)]
drm/amdgpu: Add 4k resolution for virtual display

Add 4k resolution for virtual connector.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Virtual display need to support multiple ctrcs
Emily Deng [Wed, 25 Mar 2020 11:01:58 +0000 (19:01 +0800)]
drm/amdgpu: Virtual display need to support multiple ctrcs

The crtc num is determined by virtual_display parameter.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: disable ras query and iject during gpu reset
John Clements [Wed, 25 Mar 2020 08:01:14 +0000 (16:01 +0800)]
drm/amdgpu: disable ras query and iject during gpu reset

added flag to ras context to indicate if ras query functionality is ready

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: added xgmi ras error reset sequence
John Clements [Wed, 25 Mar 2020 07:56:31 +0000 (15:56 +0800)]
drm/amdgpu: added xgmi ras error reset sequence

added mechanism to clear xgmi ras status inbetween error queries

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: cleanup all virtualization detection routine
Monk Liu [Wed, 4 Mar 2020 06:02:55 +0000 (14:02 +0800)]
drm/amdgpu: cleanup all virtualization detection routine

we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
0x1503

2) we need to acknowledged we are SRIOV VF before we do IP discovery because
the IP discovery content will be updated by host everytime after it recieved
a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches
for this new handshake soon).

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: amends feature bits for MM bandwidth mgr
Monk Liu [Tue, 3 Mar 2020 11:23:24 +0000 (19:23 +0800)]
drm/amdgpu: amends feature bits for MM bandwidth mgr

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: purge ip_discovery headers
Monk Liu [Wed, 4 Mar 2020 05:46:09 +0000 (13:46 +0800)]
drm/amdgpu: purge ip_discovery headers

those two headers are not needed for ip discovery

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Fix FRU data checking
Kent Russell [Tue, 24 Mar 2020 09:29:46 +0000 (05:29 -0400)]
drm/amdgpu: Fix FRU data checking

Ensure that when we memcpy, we don't end up copying more data than
the struct supports. For now, this is 16 characters for product number
and serial number, and 32 chars for product name

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Expose TA FW version in fw_version file
Kent Russell [Tue, 24 Mar 2020 11:40:20 +0000 (07:40 -0400)]
drm/amdgpu: Expose TA FW version in fw_version file

Reporting the fw_version just returns 0, the actual version is kept as
ta_*_ucode_version. This is the same as the feature reported in
the amdgpu_firmware_info debugfs file.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: disabled fru eeprom access
John Clements [Mon, 23 Mar 2020 09:22:01 +0000 (17:22 +0800)]
drm/amdgpu: disabled fru eeprom access

added asic support checking function to be filled in by supported asic types

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: Add GFX9.1 PWR_MISC_CNTL_STATUS register to headers
Tom St Denis [Fri, 20 Mar 2020 18:21:58 +0000 (14:21 -0400)]
drm/amd/amdgpu: Add GFX9.1 PWR_MISC_CNTL_STATUS register to headers

The registers are needed for umr and not in the headers.  I left them
in the gfx_v9_0.c since it includes 9.0 and 9.4 headers and including
9.1 headers would result in a lot of duplicate registers clashing.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add documentation for unique_id
Kent Russell [Fri, 20 Mar 2020 13:19:01 +0000 (09:19 -0400)]
drm/amdgpu: Add documentation for unique_id

Add the amdgpu.rst tie-ins for the unique_id documentation

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add documentation for PCIe accounting
Kent Russell [Fri, 20 Mar 2020 13:17:21 +0000 (09:17 -0400)]
drm/amdgpu: Add documentation for PCIe accounting

Add the amdgpu.rst tie-ins for the pcie accounting documentation

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add documentation for memory info
Kent Russell [Thu, 19 Mar 2020 19:03:40 +0000 (15:03 -0400)]
drm/amdgpu: Add documentation for memory info

Add the amdgpu.rst tie-ins for the mem_info documentation

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Enable reading FRU chip via I2C v3
Kent Russell [Fri, 13 Mar 2020 13:21:55 +0000 (09:21 -0400)]
drm/amdgpu: Enable reading FRU chip via I2C v3

Allow for reading of information like manufacturer, product number
and serial number from the FRU chip. Report the serial number as
the new sysfs file serial_number. Note that this only works on
server cards, as consumer cards do not feature the FRU chip, which
contains this information.

v2: Add documentation to amdgpu.rst, add helper functions,
    rename functions for consistency, fix bad starting offset
v3: Remove testing definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: kfree the wrong pointer
Jack Zhang [Wed, 1 Apr 2020 12:06:58 +0000 (20:06 +0800)]
drm/amdkfd: kfree the wrong pointer

Originally, it kfrees the wrong pointer for mem_obj.
It would cause memory leak under stress test.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: increase HDCP authentication delay
Bhawanpreet Lakha [Mon, 30 Mar 2020 17:43:23 +0000 (13:43 -0400)]
drm/amd/display: increase HDCP authentication delay

[Why]
Some displays have an issue where the hdcp chips are initialized after the
display has already lit up. This means we can sometimes authentication too early
and cause authentication failures.

This happens when HDCP is enabled and the display is power cycled. Normally we
will authenticate 2 seconds after the display is lit, but some displays need a
bit more time.

[How]
Increase delay to 3 second before we start authentication.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Correctly cancel future watchdog and callback events
Bhawanpreet Lakha [Mon, 30 Mar 2020 17:37:07 +0000 (13:37 -0400)]
drm/amd/display: Correctly cancel future watchdog and callback events

[Why]
-We need to cancel future callbacks/watchdogs events when a callback/watchdog event happens

[How]
-fix typo in event_callback()
-cancel callback, not watchdog
-cancel watchdog events in event_watchdog_timer().

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Don't try hdcp1.4 when content_type is set to type1
Bhawanpreet Lakha [Mon, 30 Mar 2020 17:29:46 +0000 (13:29 -0400)]
drm/amd/display: Don't try hdcp1.4 when content_type is set to type1

[Why]
When content type property is set to 1. We should enable hdcp2.2 and if we cant
then stop. Currently the way it works in DC is that if we fail hdcp2, we will
try hdcp1 after.

[How]
Use link config to force disable hdcp1.4 when type1 is set.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c
Evan Quan [Fri, 27 Mar 2020 07:33:00 +0000 (15:33 +0800)]
drm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c

This is ASIC specific and should be placed in _ppt.c of each ASIC.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: drop redundant BIF doorbell interrupt operations
Evan Quan [Fri, 27 Mar 2020 07:05:09 +0000 (15:05 +0800)]
drm/amd/powerplay: drop redundant BIF doorbell interrupt operations

This is already done in soc15.c. And this is really ASIC specific
and should not be placed here.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Fix dcn21 num_states
Dmytro Laktyushkin [Mon, 9 Mar 2020 21:11:16 +0000 (17:11 -0400)]
drm/amd/display: Fix dcn21 num_states

[Why]
DML expects num_states to exclude the duplicate state.

[How]
Set num_states to correct value to prevent array off-by-one error.  Also
refactor max clock level code for diags.

Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Enable BT2020 in COLOR_ENCODING property
Stylon Wang [Fri, 13 Mar 2020 14:21:38 +0000 (10:21 -0400)]
drm/amd/display: Enable BT2020 in COLOR_ENCODING property

[Why]
BT2020 is not supported in COLOR_ENCODING property of planes.  Only
BT601 and BT709 was available.

[How]
Allow BT2020 as legit value in setting COLOR_ENCODING property.

Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: LFC not working on 2.0x range monitors (v2)
Aric Cyr [Wed, 11 Mar 2020 22:10:20 +0000 (18:10 -0400)]
drm/amd/display: LFC not working on 2.0x range monitors (v2)

[Why]
Nominal pixel clock and EDID information differ in precision so although
monitor reports maximum refresh is 2x minimum, LFC was not being
enabled.

[How]
Use minimum refresh rate as nominal/2 when EDID dictates that min
refresh = max refresh/2.

v2: squash in 64 bit divide fix

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Support plane level CTM
Stylon Wang [Tue, 10 Mar 2020 19:09:29 +0000 (15:09 -0400)]
drm/amd/display: Support plane level CTM

[Why]
CTM was only supported at CRTC level and we need color space conversion
in linear space at plane level.

[How]
- Add plane-level CTM to dc interface
- Program plane-level CTM in DCN

Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Revert change to HDCP display states
Isabel Zhang [Wed, 11 Mar 2020 19:59:41 +0000 (15:59 -0400)]
drm/amd/display: Revert change to HDCP display states

[Why]
Change is causing a regression where the OPC app no longer functions
properly.

[How]
Revert the changelist causing the issue.

Signed-off-by: Isabel Zhang <isabel.zhang@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Not doing optimize bandwidth if flip pending.
Yongqiang Sun [Mon, 9 Mar 2020 21:13:02 +0000 (17:13 -0400)]
drm/amd/display: Not doing optimize bandwidth if flip pending.

[Why]
In some scenario like 1366x768 VSR enabled connected with a 4K monitor
and playing 4K video in clone mode, underflow will be observed due to
decrease dppclk when previouse surface scan isn't finished

[How]
In this use case, surface flip is switching between 4K and 1366x768,
1366x768 needs smaller dppclk, and when decrease the clk and previous
surface scan is for 4K and scan isn't done, underflow will happen.  Not
doing optimize bandwidth in case of flip pending.

Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Use double buffered DRR timing update by default
Nicholas Kazlauskas [Thu, 5 Mar 2020 19:43:00 +0000 (14:43 -0500)]
drm/amd/display: Use double buffered DRR timing update by default

[Why]
For some monitors extreme flickering can occur while using LFC for if
we're not doing the DRR timing update for V_TOTAL_MIN / V_TOTAL_MAX at
the DP start of frame.

Hardware can default to any time in the frame which isn't the behavior
we want.

[How]
Add a new function for setting the double buffering mode for DRR timing.

Default to DP start of frame double buffering on timing generator init.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Support P010 pixel format
Stylon Wang [Fri, 6 Mar 2020 14:55:29 +0000 (09:55 -0500)]
drm/amd/display: Support P010 pixel format

[Why]
P010 pixel format is not declared as supported in DRM and DM.

[How]
Add P010 format to the support list presented to DRM and checked in DM

Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Update function to get optimal number of taps
Eric Bernstein [Fri, 6 Mar 2020 22:07:12 +0000 (17:07 -0500)]
drm/amd/display: Update function to get optimal number of taps

[Why]
Diagnostics scaling test failing to set required number of vertical taps
in 4:2:0 surface case

[How]
In dpp3_get_optimal_number_of_taps() need to use LB_MEMORY_CONFIG_3 for
4:2:0 surface case. In resource_build_scaling_params() make sure to also
set plane res alpha enable based on updated surface state

Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix hpd bo size calculation error
Kevin Wang [Wed, 25 Mar 2020 09:06:14 +0000 (17:06 +0800)]
drm/amdgpu: fix hpd bo size calculation error

the HPD bo size calculation error.
the "mem.size" can't present actual BO size all time.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <Christian.Koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agoMerge tag 'drm-msm-next-2020-03-22' of https://gitlab.freedesktop.org/drm/msm into...
Dave Airlie [Tue, 31 Mar 2020 06:34:49 +0000 (16:34 +1000)]
Merge tag 'drm-msm-next-2020-03-22' of https://gitlab.freedesktop.org/drm/msm into drm-next

A bit smaller this time around.. there are still a couple uabi
additions for vulkan waiting in the wings, but I punted on them this
cycle due to running low on time.  (They should be easy enough to
rebase, and if it is a problem for anyone I can push a next+uabi
branch so that tu work can proceed.)

The bigger change is refactoring dpu resource manager and moving dpu
to use atomic global state.  Other than that, it is mostly cleanups
and fixes.

From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/
Signed-off-by: Dave Airlie <airlied@redhat.com>
5 years agoMerge v5.6 into drm-next
Dave Airlie [Tue, 31 Mar 2020 05:15:47 +0000 (15:15 +1000)]
Merge v5.6 into drm-next

msm needed rc6, so I just went and merged release
(msm has been in drm-next outside of this tree)

Signed-off-by: Dave Airlie <airlied@redhat.com>
5 years agoMerge tag 'drm-intel-next-fixes-2020-03-27' of git://anongit.freedesktop.org/drm...
Dave Airlie [Mon, 30 Mar 2020 05:56:03 +0000 (15:56 +1000)]
Merge tag 'drm-intel-next-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Fixes for instability on Baytrail and Haswell;
Ice Lake RPS; Sandy Bridge RC6; and few others around
GT hangchec/reset; livelock; and a null dereference.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200327081607.GA3082710@intel.com
5 years agoMerge tag 'amd-drm-next-5.7-2020-03-26' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Mon, 30 Mar 2020 05:21:02 +0000 (15:21 +1000)]
Merge tag 'amd-drm-next-5.7-2020-03-26' of git://people.freedesktop.org/~agd5f/linux into drm-next

amd-drm-next-5.7-2020-03-26:

amdgpu:
- Remove a dpm quirk that is not necessary
- Fix handling of AC/DC mode in newer SMU firmwares on navi
- SR-IOV fixes
- RAS fixes

scheduler:
- Fix a race condition

radeon:
- Remove a dpm quirk that is not necessary

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200326155310.5486-1-alexander.deucher@amd.com
5 years agoLinux 5.6 v5.6
Linus Torvalds [Sun, 29 Mar 2020 22:25:41 +0000 (15:25 -0700)]
Linux 5.6

5 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 29 Mar 2020 17:40:31 +0000 (10:40 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge vm fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/sparse: fix kernel crash with pfn_section_valid check
  mm: fork: fix kernel_stack memcg stats for various stack implementations
  hugetlb_cgroup: fix illegal access to memory
  drivers/base/memory.c: indicate all memory blocks as removable
  mm/swapfile.c: move inode_lock out of claim_swapfile

5 years agoMerge tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 29 Mar 2020 17:36:29 +0000 (10:36 -0700)]
Merge tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the Hyper-V clocksource driver to make sched clock
  actually return nanoseconds and not the virtual clock value which
  increments at 10e7 HZ (100ns)"

* tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly

5 years agoMerge tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 29 Mar 2020 17:07:00 +0000 (10:07 -0700)]
Merge tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single bugfix to prevent reference leaks in irq affinity notifiers"

* tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Fix reference leaks on irq affinity notifiers

5 years agomm/sparse: fix kernel crash with pfn_section_valid check
Aneesh Kumar K.V [Sun, 29 Mar 2020 02:17:29 +0000 (19:17 -0700)]
mm/sparse: fix kernel crash with pfn_section_valid check

Fix the crash like this:

    BUG: Kernel NULL pointer dereference on read at 0x00000000
    Faulting instruction address: 0xc000000000c3447c
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
    ...
    NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
    LR [c000000000088354] vmemmap_free+0x144/0x320
    Call Trace:
       section_deactivate+0x220/0x240
       __remove_pages+0x118/0x170
       arch_remove_memory+0x3c/0x150
       memunmap_pages+0x1cc/0x2f0
       devm_action_release+0x30/0x50
       release_nodes+0x2f8/0x3e0
       device_release_driver_internal+0x168/0x270
       unbind_store+0x130/0x170
       drv_attr_store+0x44/0x60
       sysfs_kf_write+0x68/0x80
       kernfs_fop_write+0x100/0x290
       __vfs_write+0x3c/0x70
       vfs_write+0xcc/0x240
       ksys_write+0x7c/0x140
       system_call+0x5c/0x68

The crash is due to NULL dereference at

test_bit(idx, ms->usage->subsection_map);

due to ms->usage = NULL in pfn_section_valid()

With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem().  This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP.  With that
config pfn_to_page does

__section_mem_map_addr(__sec) + __pfn;

where

  static inline struct page *__section_mem_map_addr(struct mem_section *section)
  {
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
  }

Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()).  Since section_deactivate
release mem_section->usage if a section is fully deactivated,
pfn_valid() check after a subsection_deactivate cause a kernel crash.

  static inline int pfn_valid(unsigned long pfn)
  {
  ...
return early_section(ms) || pfn_section_valid(ms, pfn);
  }

where

  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
  {
int idx = subsection_map_index(pfn);

return test_bit(idx, ms->usage->subsection_map);
  }

Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed.  For architectures like ppc64 where large pages are used for
vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
sections.  Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memap enables
this.

[aneesh.kumar@linux.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.comLink:
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: fork: fix kernel_stack memcg stats for various stack implementations
Roman Gushchin [Sun, 29 Mar 2020 02:17:25 +0000 (19:17 -0700)]
mm: fork: fix kernel_stack memcg stats for various stack implementations

Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node().

In the first and the second cases page->mem_cgroup pointer is set, but
in the third it's not: memcg membership of a slab page should be
determined using the memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg .  In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.

It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).

In order to fix it, let's introduce a mod_memcg_obj_state() helper,
which takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state().  It allows to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .

Note: This is a special version of the patch created for stable
backports.  It contains code from the following two patches:
  - mm: memcg/slab: introduce mem_cgroup_from_obj()
  - mm: fork: fix kernel_stack memcg stats for various stack implementations

[guro@fb.com: introduce mem_cgroup_from_obj()]
Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohugetlb_cgroup: fix illegal access to memory
Mina Almasry [Sun, 29 Mar 2020 02:17:22 +0000 (19:17 -0700)]
hugetlb_cgroup: fix illegal access to memory

This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb
controller for cgroups v2").

Essentially that commit does a hugetlb_cgroup_from_counter assuming that
page_counter_try_charge has initialized counter.

But if that has failed then it seems will not initialize counter, so
hugetlb_cgroup_from_counter(counter) ends up pointing to random memory,
causing kasan to complain.

The solution is to simply use 'h_cg', instead of
hugetlb_cgroup_from_counter(counter), since that is a reference to the
hugetlb_cgroup anyway.  After this change kasan ceases to complain.

Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2")
Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Giuseppe Scrivano <gscrivan@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agodrivers/base/memory.c: indicate all memory blocks as removable
David Hildenbrand [Sun, 29 Mar 2020 02:17:19 +0000 (19:17 -0700)]
drivers/base/memory.c: indicate all memory blocks as removable

We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).

1. It runs basically lockless. While this might be good for performance,
   we see possible races with memory offlining that will require at
   least some sort of locking to fix.

2. Nowadays, more false positives are possible. No arch-specific checks
   are performed that validate if memory offlining will not be denied
   right away (and such check will require locking). For example, arm64
   won't allow to offline any memory block that was added during boot -
   which will imply a very high error rate. Other archs have other
   constraints.

3. The interface is inherently racy. E.g., if a memory block is detected
   to be removable (and was not a false positive at that time), there is
   still no guarantee that offlining will actually succeed. So any
   caller already has to deal with false positives.

4. It is unclear which performance benefit this interface actually
   provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add
   sysfs removable attribute for hotplug memory remove") mentioned

"A user-level agent must be able to identify which sections
 of memory are likely to be removable before attempting the
 potentially expensive operation."

   However, no actual performance comparison was included.

Known users:

 - lsmem: Will group memory blocks based on the "removable" property. [1]

 - chmem: Indirect user. It has a RANGE mode where one can specify
          removable ranges identified via lsmem to be offlined. However,
          it also has a "SIZE" mode, which allows a sysadmin to skip the
          manual "identify removable blocks" step. [2]

 - powerpc-utils: Uses the "removable" attribute to skip some memory
          blocks right away when trying to find some to offline+remove.
          However, with ballooning enabled, it already skips this
          information completely (because it once resulted in many false
          negatives). Therefore, the implementation can deal with false
          positives properly already. [3]

According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils).  Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar.  So the
affected legacy userspace handling is only active on old kernels.  Only
very old versions of drmgr on a new kernel (unlikely) might execute
slower - totally acceptable.

With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool.  We implement a very bad heuristic now.
Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.

Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").

Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.

[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com

Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com

Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Nathan Fontenot <ndfont@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/swapfile.c: move inode_lock out of claim_swapfile
Naohiro Aota [Sun, 29 Mar 2020 02:17:15 +0000 (19:17 -0700)]
mm/swapfile.c: move inode_lock out of claim_swapfile

claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -EBUSY).  And, on the other error
cases, it does not lock the inode.

This inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile().  The inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  Thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     print_unlock_imbalance_bug.cold+0x114/0x123
     lock_release+0x562/0xed0
     up_write+0x2d/0x490
     __do_sys_swapon+0x94b/0x3550
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Qais Youef <qais.yousef@arm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>