www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
5 years agoRevert "drm/amdkfd: Unify gfx9/gfx10 context save area layouts"
Felix Kuehling [Fri, 7 Aug 2020 22:23:56 +0000 (18:23 -0400)]
Revert "drm/amdkfd: Unify gfx9/gfx10 context save area layouts"

This reverts commit 0a5baee415000a3e18730ac98e19d046c3cebbe6.

The change introduced a regression on some chips. Reverting until
a proper solution can be found.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agoRevert "drm/amdkfd: Fix spurious debug exception on gfx10"
Felix Kuehling [Fri, 7 Aug 2020 22:22:27 +0000 (18:22 -0400)]
Revert "drm/amdkfd: Fix spurious debug exception on gfx10"

This reverts commit ea368183ae900e376b66d3f23da22acde48e385a.

Needed due to conflicts when reverting "drm/amdkfd: Unify gfx9/gfx10
context save area layouts".

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm: amdgpu: Use the correct size when allocating memory
Christophe JAILLET [Sun, 9 Aug 2020 20:34:06 +0000 (22:34 +0200)]
drm: amdgpu: Use the correct size when allocating memory

When '*sgt' is allocated, we must allocate 'sizeof(**sgt)' bytes instead
of 'sizeof(*sg)'.

The sizeof(*sg) is bigger than sizeof(**sgt) so this wastes memory but
it won't lead to corruption.

Fixes: f44ffd677fb3 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
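
A minimal sketch of the sizing rule described above, assuming two
hypothetical stand-in types (small_table and big_entry are not the real
kernel structures). Taking sizeof of the dereferenced destination
pointer keeps the allocation size tied to the type actually stored:

```c
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel types differ. */
struct small_table { void *entries; unsigned int nents; };
struct big_entry { unsigned long words[8]; };

/* Allocate *out using the size of the object it will point to. */
static int alloc_table(struct small_table **out)
{
	*out = calloc(1, sizeof(**out));	/* not sizeof(some other type) */
	return *out ? 0 : -1;
}

/* Bytes over-allocated per call if sizeof(struct big_entry) were used. */
static size_t wasted_bytes(void)
{
	return sizeof(struct big_entry) - sizeof(struct small_table);
}
```

Over-allocating in this direction only wastes memory; the opposite
mismatch would under-allocate and risk corruption.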
5 years agodrm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume
Sandeep Raghuraman [Thu, 6 Aug 2020 17:22:20 +0000 (22:52 +0530)]
drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume

Reproducing bug report here:
After hibernating and resuming, DPM is not enabled. This remains the case
even if you test hibernate using the steps here:
https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html

I debugged the problem, and figured out that in the file hardwaremanager.c,
in the function, phm_enable_dynamic_state_management(), the check
'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)'
returns true for the hibernate case, and false for the suspend case.

This means that for the hibernate case, the AMDGPU driver doesn't enable DPM
(even though it should) and simply returns from that function.
In the suspend case, it goes ahead and enables DPM, even though it doesn't need to.

I debugged further, and found out that in the case of suspend, for the
CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of
hibernate, smum_is_dpm_running(hwmgr) returns true.

For CIK, the ci_is_dpm_running() function calls the ci_is_smc_ram_running() function,
which is ultimately used to determine if DPM is currently enabled or not,
and this seems to provide the wrong answer.

I've changed the ci_is_dpm_running() function to instead use the same method that
some other AMD GPU chips do (e.g. Fiji), which seems to read the voltage controller.
I've tested on my R9 390 and it seems to work correctly for both suspend and
hibernate use cases, and has been stable so far.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208839
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
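
The check quoted in the message above can be modelled as a pure
function. This is only a sketch: struct dpm_inputs and skips_dpm_enable
are hypothetical names, and the fields simply record what each term of
the quoted condition evaluated to.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the quoted condition. */
struct dpm_inputs {
	bool pp_one_vf;
	bool dpm_running;	/* what smum_is_dpm_running() reported */
	bool passthrough;
	bool in_suspend;
};

/* Mirrors the quoted condition: true means "return without enabling DPM". */
static bool skips_dpm_enable(const struct dpm_inputs *in)
{
	return !in->pp_one_vf && in->dpm_running &&
	       !in->passthrough && in->in_suspend;
}
```

With the other inputs held equal, a wrong "running" report is enough to
flip the result, which matches the hibernate behaviour described above.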
5 years agodrm/amdgpu: unlock mutex on error
Dennis Li [Tue, 4 Aug 2020 04:32:13 +0000 (12:32 +0800)]
drm/amdgpu: unlock mutex on error

Make sure to unlock the mutex when an error happens.

v2:
1. correct syntax error in the commit comments
2. remove change-Id

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
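
A sketch of the usual single-exit pattern for this kind of fix. The
toy_lock type only tracks whether it is held and stands in for a kernel
mutex; do_work and its fail argument are hypothetical.

```c
#include <errno.h>

/* Toy lock standing in for a kernel mutex; it only tracks state. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

static int do_work(struct toy_lock *l, int fail)
{
	int ret = 0;

	toy_lock_acquire(l);
	if (fail) {
		ret = -EIO;
		goto out_unlock;	/* never return with the lock held */
	}
	/* ... normal work under the lock ... */
out_unlock:
	toy_lock_release(l);
	return ret;
}
```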
5 years agodrm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3)
Evan Quan [Wed, 5 Aug 2020 09:24:41 +0000 (17:24 +0800)]
drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3)

VCN-related dpm table setup needs VCN to be in the PG ungate state. The
same logic applies to JPEG.

V2: fix paste typo
V3: code cosmetic

Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: update swSMU VCN/JPEG PG logics
Evan Quan [Mon, 3 Aug 2020 03:15:14 +0000 (11:15 +0800)]
drm/amd/powerplay: update swSMU VCN/JPEG PG logics

Add lock protections and avoid unnecessary actions
if the PG state is already the same as required.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
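
The "avoid unnecessary actions" part can be sketched as an early return
when the requested state already matches. The pg_block type and set_gate
are hypothetical, and the caller is assumed to hold whatever lock
protects the block.

```c
/* Hypothetical power-gating state for one IP block. */
struct pg_block {
	int gated;		/* current PG state */
	int hw_requests;	/* counts requests that reached "hardware" */
};

/* Caller is assumed to hold the lock protecting b. */
static void set_gate(struct pg_block *b, int gate)
{
	if (b->gated == gate)
		return;		/* already in the requested state */
	b->hw_requests++;	/* stands in for the firmware message */
	b->gated = gate;
}
```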
5 years agodrm/amdgpu: use mode1 reset by default for sienna_cichlid
Likun Gao [Thu, 6 Aug 2020 09:37:28 +0000 (17:37 +0800)]
drm/amdgpu: use mode1 reset by default for sienna_cichlid

Switch default gpu reset method for sienna_cichlid to MODE1 reset.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Drop dm_determine_update_type_for_commit
Nicholas Kazlauskas [Tue, 28 Jul 2020 15:08:02 +0000 (11:08 -0400)]
drm/amd/display: Drop dm_determine_update_type_for_commit

[Why]
This was added in the past to solve the issue of not knowing when
to stall for medium and full updates in DM.

Since DC ultimately decides what requires bandwidth changes we
wanted to make use of it directly to determine this.

The problem is that we can't actually pass any of the stream or surface
updates into DC global validation, so we don't actually check if the new
configuration is valid - we just validate the old existing config
instead and stall for outstanding commits to finish.

There's also the problem of grabbing the DRM private object for
pageflips which can lead to page faults in the case where commits
execute out of order and free a DRM private object state that was
still required for commit tail.

[How]
Now that we reset the plane in DM with the same conditions DC checks
we can have planes go through DC validation and we know when we need
to check and stall based on whether the stream or planes changed.

We mark lock_and_validation_needed whenever we've done this, so just
go back to using that instead of dm_determine_update_type_for_commit.

Since we'll skip resetting the plane for a pageflip we will no longer
grab the DRM private object for pageflips as well, avoiding the
page fault issue caused by pageflipping under load with commits
executing out of order.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Reset plane for anything that's not a FAST update
Nicholas Kazlauskas [Tue, 28 Jul 2020 14:48:21 +0000 (10:48 -0400)]
drm/amd/display: Reset plane for anything that's not a FAST update

[Why]
MEDIUM or FULL updates can require global validation or affect
bandwidth. By treating these all simply as surface updates we aren't
actually passing this through DC global validation.

[How]
There's currently no way to pass surface updates through DC global
validation, nor do I think it's a good idea to change the interface
to accept these.

DC global validation itself is currently stateless, and we can move
our update type checking to be stateless as well by duplicating DC
surface checks in DM based on DRM properties.

We wanted to rely on DC automatically determining this since DC knows
best, but DM is ultimately what fills in everything into DC plane
state so it does need to know as well.

There are basically only three paths that we exercise in DM today:

1) Cursor (async update)
2) Pageflip (fast update)
3) Full pipe programming (medium/full updates)

Which means that anything that's more than a pageflip really needs to
go down path #3.

So this change duplicates all the surface update checks based on DRM
state instead inside of should_reset_plane().

Next step is dropping dm_determine_update_type_for_commit and we no
longer require the old DC state at all for global validation.

Optimization can come later so we don't reset DC planes at all for
MEDIUM updates and avoid validation, but we might require some extra
checks in DM to achieve this.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
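
The three paths listed above can be sketched as a small classifier. The
enum and function names here are hypothetical and are not the DM or DC
identifiers; the point is only that anything beyond a fast update falls
through to full pipe programming.

```c
/* Hypothetical update kinds and DM paths, named for illustration only. */
enum update_kind { UPDATE_CURSOR, UPDATE_FAST, UPDATE_MEDIUM, UPDATE_FULL };
enum dm_path { PATH_ASYNC_CURSOR, PATH_PAGEFLIP, PATH_FULL_PROGRAMMING };

static enum dm_path pick_path(enum update_kind kind)
{
	switch (kind) {
	case UPDATE_CURSOR:
		return PATH_ASYNC_CURSOR;
	case UPDATE_FAST:
		return PATH_PAGEFLIP;
	default:
		/* MEDIUM and FULL both take path #3 */
		return PATH_FULL_PROGRAMMING;
	}
}

/* A plane is reset exactly when it goes down the full path. */
static int needs_plane_reset(enum update_kind kind)
{
	return pick_path(kind) == PATH_FULL_PROGRAMMING;
}
```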
5 years agodrm/amd/display: Use validated tiling_flags and tmz_surface in commit_tail
Nicholas Kazlauskas [Thu, 6 Aug 2020 19:48:10 +0000 (15:48 -0400)]
drm/amd/display: Use validated tiling_flags and tmz_surface in commit_tail

[Why]
So we're not racing with userspace or deadlocking DM.

[How]
These flags are now stored on dm_plane_state itself and acquired and
validated during commit_check, so just use those instead.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes
Nicholas Kazlauskas [Tue, 28 Jul 2020 14:03:10 +0000 (10:03 -0400)]
drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes

[Why]
We're racing with userspace as the flags could potentially change
from when we acquired and validated them in commit_check.

[How]
We unfortunately can't drop this function in its entirety from
prepare_planes since we don't know the afb->address at commit_check
time yet.

So instead of querying new tiling_flags and tmz_surface use the ones
from the plane_state directly.

While we're at it, also update the force_disable_dcc option based
on the state from atomic check.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Reset plane when tiling flags change
Nicholas Kazlauskas [Tue, 28 Jul 2020 13:59:53 +0000 (09:59 -0400)]
drm/amd/display: Reset plane when tiling flags change

[Why]
Enabling or disabling DCC or switching between tiled and linear formats
can require bandwidth updates.

They're currently skipping all DC validation by being treated as purely
surface updates.

[How]
Treat tiling_flag changes (which encode DCC state) as a condition for
resetting the plane.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state
Nicholas Kazlauskas [Tue, 28 Jul 2020 13:44:26 +0000 (09:44 -0400)]
drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state

[Why]
Store these in advance so we can reuse them later in commit_tail without
having to reserve the fbo again.

These will also be used for checking for tiling changes when deciding
to reset the plane or not.

[How]
This change should mostly be a refactor. Only commit check is affected
for now and I'll drop the get_fb_info calls in prepare_planes and
commit_tail after.

This runs a prepass loop once we think that all planes have been added
to the context and replaces the get_fb_info calls with accessing the
dm_plane_state instead.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: update driver if file for sienna_cichlid
Likun Gao [Thu, 6 Aug 2020 06:41:06 +0000 (14:41 +0800)]
drm/amd/powerplay: update driver if file for sienna_cichlid

Update driver if file for sienna_cichlid.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: new ids flag for tmz (v2)
Pierre-Eric Pelloux-Prayer [Thu, 30 Jul 2020 13:54:59 +0000 (15:54 +0200)]
drm/amdgpu: new ids flag for tmz (v2)

Allows UMD to know if TMZ is supported and enabled.

This commit also bumps KMS_DRIVER_MINOR because, if we don't,
UMD can't tell if "(ids_flags & AMDGPU_IDS_FLAGS_TMZ) == 0" means
"tmz is not enabled" or "tmz may be enabled but the kernel doesn't
report it".

v2: use amdgpu_is_tmz() and reworded commit message.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
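
A userspace-side sketch of why the minor version bump matters. The bit
value in EXAMPLE_IDS_FLAGS_TMZ and the min_minor argument are
placeholders, not values taken from this commit; a real UMD would use
the uapi header and the minor version that introduced the flag.

```c
#include <stdint.h>

/* Placeholder bit for illustration; use the real uapi definition. */
#define EXAMPLE_IDS_FLAGS_TMZ 0x4u

enum tmz_status { TMZ_UNKNOWN, TMZ_DISABLED, TMZ_ENABLED };

/* min_minor: first driver minor version known to report the flag. */
static enum tmz_status query_tmz_status(uint64_t ids_flags,
					int drm_minor, int min_minor)
{
	if (drm_minor < min_minor)
		return TMZ_UNKNOWN;	/* kernel predates the flag */
	/* note the parentheses: == binds tighter than & in C */
	if ((ids_flags & EXAMPLE_IDS_FLAGS_TMZ) == 0)
		return TMZ_DISABLED;
	return TMZ_ENABLED;
}
```

Without the version check, a cleared bit from an older kernel would be
indistinguishable from "tmz is not enabled".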
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Vega12
Evan Quan [Thu, 30 Jul 2020 07:28:40 +0000 (15:28 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Vega12

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
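
A sketch of what a cache bypass control could look like. All names here
(metrics_src, get_metric, CACHE_LIFETIME_MS) are hypothetical and the
lifetime value is arbitrary; the driver's actual cache policy is not
described in this message.

```c
#include <stdbool.h>

#define CACHE_LIFETIME_MS 100	/* arbitrary value for illustration */

/* Hypothetical metrics source with a one-entry cache. */
struct metrics_src {
	int hw_value;		/* what the hardware would report now */
	int cached_value;
	long cached_at_ms;
	bool valid;
};

static int get_metric(struct metrics_src *s, long now_ms, bool bypass_cache)
{
	if (bypass_cache || !s->valid ||
	    now_ms - s->cached_at_ms >= CACHE_LIFETIME_MS) {
		s->cached_value = s->hw_value;	/* refresh from "hardware" */
		s->cached_at_ms = now_ms;
		s->valid = true;
	}
	return s->cached_value;
}
```

With bypass_cache set, every call reflects the current hardware value,
so the caller's polling rate alone decides how fresh the data is.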
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Vega20
Evan Quan [Thu, 30 Jul 2020 07:24:08 +0000 (15:24 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Vega20

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Renoir
Evan Quan [Thu, 30 Jul 2020 07:02:11 +0000 (15:02 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Renoir

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Sienna Cichlid
Evan Quan [Thu, 30 Jul 2020 07:09:57 +0000 (15:09 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Sienna Cichlid

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Navi10
Evan Quan [Thu, 30 Jul 2020 06:55:32 +0000 (14:55 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Navi10

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add control method to bypass metrics cache on Arcturus
Evan Quan [Thu, 30 Jul 2020 06:31:21 +0000 (14:31 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Arcturus

For the gpu metrics export, a metrics cache makes no sense. It's up to
the user to decide how often the metrics should be retrieved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Vega12 support for gpu metrics export
Evan Quan [Thu, 30 Jul 2020 04:39:58 +0000 (12:39 +0800)]
drm/amd/powerplay: add Vega12 support for gpu metrics export

Add Vega12 gpu metrics export interface.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Vega20 support for gpu metrics export
Evan Quan [Thu, 30 Jul 2020 04:23:42 +0000 (12:23 +0800)]
drm/amd/powerplay: add Vega20 support for gpu metrics export

Add Vega20 gpu metrics export interface.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: enable gpu_metrics export on legacy powerplay routines
Evan Quan [Thu, 30 Jul 2020 03:40:07 +0000 (11:40 +0800)]
drm/amd/powerplay: enable gpu_metrics export on legacy powerplay routines

Enable gpu_metrics support on legacy powerplay routines.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Renoir support for gpu metrics export(V2)
Evan Quan [Mon, 27 Jul 2020 08:24:46 +0000 (16:24 +0800)]
drm/amd/powerplay: add Renoir support for gpu metrics export(V2)

Add Renoir gpu metrics export interface.

V2: use memcpy to make code more compact

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Sienna Cichlid support for gpu metrics export
Evan Quan [Mon, 27 Jul 2020 02:00:47 +0000 (10:00 +0800)]
drm/amd/powerplay: add Sienna Cichlid support for gpu metrics export

Add Sienna Cichlid gpu metrics export interface.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Navi1x support for gpu metrics export
Evan Quan [Fri, 24 Jul 2020 09:24:34 +0000 (17:24 +0800)]
drm/amd/powerplay: add Navi1x support for gpu metrics export

Add Navi1x gpu metrics export interface.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: update the data structure for NV12 SmuMetrics
Evan Quan [Fri, 24 Jul 2020 09:47:03 +0000 (17:47 +0800)]
drm/amd/powerplay: update the data structure for NV12 SmuMetrics

Although it does not bring any problem for now, the coming gpu
metrics interface needs to handle them differently based on the
asic type.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add Arcturus support for gpu metrics export
Evan Quan [Fri, 24 Jul 2020 02:42:39 +0000 (10:42 +0800)]
drm/amd/powerplay: add Arcturus support for gpu metrics export

Add Arcturus gpu metrics export interface.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: implement SMU V11 common APIs for retrieving link speed/width
Evan Quan [Fri, 24 Jul 2020 10:39:33 +0000 (18:39 +0800)]
drm/amd/powerplay: implement SMU V11 common APIs for retrieving link speed/width

This will be shared across all SMU V11 asics.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2)
Evan Quan [Thu, 23 Jul 2020 10:03:35 +0000 (18:03 +0800)]
drm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2)

A new interface for UMD to retrieve gpu metrics data.

V2: enrich the documentation

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: define an universal data structure for gpu metrics (V4)
Evan Quan [Thu, 23 Jul 2020 08:07:01 +0000 (16:07 +0800)]
drm/amd/powerplay: define an universal data structure for gpu metrics (V4)

Thus we can provide an interface for UMD to retrieve gpu metrics data.

V2: better naming and comments
V3: two structures created for dGPU and APU separately
V4: add driver attached timestamp

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
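
One way a consumer might handle a versioned metrics blob is to read a
small common header first and only then interpret the rest. The header
layout below is an assumption made for illustration (this message does
not spell out the fields), and parse_header is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout, for illustration only. */
struct example_metrics_header {
	uint16_t structure_size;	/* total size of the table in bytes */
	uint8_t format_revision;
	uint8_t content_revision;
};

/* Copy the header out of buf and sanity-check the advertised size. */
static int parse_header(const void *buf, size_t len,
			struct example_metrics_header *out)
{
	if (len < sizeof(*out))
		return -1;		/* too short to hold a header */
	memcpy(out, buf, sizeof(*out));
	if (out->structure_size < sizeof(*out) || out->structure_size > len)
		return -1;		/* truncated or inconsistent table */
	return 0;
}
```

A reader would then pick the dGPU or APU layout based on the revision
fields before touching any further bytes.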
5 years agodrm/amdgpu: fix spelling mistake "paramter" -> "parameter"
Colin Ian King [Wed, 5 Aug 2020 12:15:27 +0000 (13:15 +0100)]
drm/amdgpu: fix spelling mistake "paramter" -> "parameter"

There is a spelling mistake in a dev_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: grant Arcturus softmin/max setting on latest PM firmware
Evan Quan [Tue, 4 Aug 2020 08:58:30 +0000 (16:58 +0800)]
drm/amd/powerplay: grant Arcturus softmin/max setting on latest PM firmware

For Arcturus, the softmin/max settings from the driver are permitted on
the latest SMU firmware (54.26 and later). Thus enable them in the driver.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: option to disable system mem limit
Philip Yang [Mon, 27 Jul 2020 13:06:18 +0000 (09:06 -0400)]
drm/amdkfd: option to disable system mem limit

If multiple processes share system memory through /dev/shm, KFD memory
allocation should not fail when it reaches the system memory limit,
because a single copy of the physical system memory is shared by all of
those processes.

Add the module parameter no_system_mem_limit to give the user the option
to disable the system memory limit check, either at runtime using sysfs
or during driver module init using a kernel boot argument. By default
the system memory limit is on.

Print a debug message to warn the user if a KFD memory allocation fails
because system memory has reached the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
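
A sketch of a limit check that can be switched off. Only the parameter
name no_system_mem_limit comes from the message above; the mem_acct type
and reserve_system_mem are hypothetical, and the accounting is reduced
to a single counter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accounting state. */
struct mem_acct {
	uint64_t used;
	uint64_t limit;
};

/* no_limit models the no_system_mem_limit module parameter. */
static int reserve_system_mem(struct mem_acct *a, uint64_t size, bool no_limit)
{
	if (!no_limit && a->used + size > a->limit)
		return -ENOMEM;	/* a real driver would also log a debug message */
	a->used += size;
	return 0;
}
```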
5 years agodrm/amd/display: Fix wrong return value in dm_update_plane_state()
Tianjia Zhang [Sun, 2 Aug 2020 11:15:36 +0000 (19:15 +0800)]
drm/amd/display: Fix wrong return value in dm_update_plane_state()

On an error exit path, a negative error code should be returned
instead of a positive return value.

Fixes: 9e869063b0021 ("drm/amd/display: Move iteration out of dm_update_planes")
Cc: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
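
A short illustration of why the sign matters, assuming the common
convention that callers test for "ret < 0". The function names are
hypothetical and do not correspond to the code touched by this commit.

```c
#include <errno.h>

/* Wrong: a positive errno value looks like success to "ret < 0" checks. */
static int check_buggy(int valid)
{
	return valid ? 0 : EINVAL;
}

/* Right: error exit paths return a negative error code. */
static int check_fixed(int valid)
{
	return valid ? 0 : -EINVAL;
}

/* Typical caller-side test for failure. */
static int caller_sees_error(int ret)
{
	return ret < 0;
}
```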
5 years agodrm/amd/display: Constify dcn30_res_pool_funcs
Rikard Falkeborn [Tue, 4 Aug 2020 20:06:55 +0000 (22:06 +0200)]
drm/amd/display: Constify dcn30_res_pool_funcs

The only usage of dcn30_res_pool_funcs is to assign its address to a
const pointer. Make it const to allow the compiler to put it in
read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
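
The constification pattern, sketched with hypothetical types. Because
the only consumer stores a pointer-to-const, the function table itself
can be declared const, which lets the compiler place it in read-only
memory.

```c
/* Hypothetical function table and owner, for illustration only. */
struct pool_funcs {
	int (*validate)(int value);
};

struct pool {
	const struct pool_funcs *funcs;	/* consumer only needs read access */
};

static int validate_positive(int value)
{
	return value > 0;
}

/* const: the table is never modified after initialization. */
static const struct pool_funcs example_pool_funcs = {
	.validate = validate_positive,
};

static void pool_init(struct pool *p)
{
	p->funcs = &example_pool_funcs;
}
```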
5 years agodrm/amd/display: Constify dcn21_res_pool_funcs
Rikard Falkeborn [Tue, 4 Aug 2020 20:06:54 +0000 (22:06 +0200)]
drm/amd/display: Constify dcn21_res_pool_funcs

The only usage of dcn21_res_pool_funcs is to assign its address to a
const pointer. Make it const to allow the compiler to put it in
read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Constify dcn20_res_pool_funcs
Rikard Falkeborn [Tue, 4 Aug 2020 20:06:53 +0000 (22:06 +0200)]
drm/amd/display: Constify dcn20_res_pool_funcs

The only usage of dcn20_res_pool_funcs is to assign its address to a
const pointer. Make it const to allow the compiler to put it in
read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Indent an if statement
Dan Carpenter [Mon, 3 Aug 2020 14:35:19 +0000 (17:35 +0300)]
drm/amd/display: Indent an if statement

The if statement wasn't indented so it's confusing.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move vram usage by vbios to mman (v2)
Alex Deucher [Wed, 29 Jul 2020 17:14:17 +0000 (13:14 -0400)]
drm/amdgpu: move vram usage by vbios to mman (v2)

It's related to the memory manager so move it there.

v2: inline the structure

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move IP discovery data to mman
Alex Deucher [Wed, 29 Jul 2020 17:02:25 +0000 (13:02 -0400)]
drm/amdgpu: move IP discovery data to mman

It's related to the memory manager so move it there.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move stolen memory from gmc to mman
Alex Deucher [Wed, 29 Jul 2020 16:53:56 +0000 (12:53 -0400)]
drm/amdgpu: move stolen memory from gmc to mman

It's more related to memory management than memory
controller.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc: disable keep_stolen_vga_memory on arcturus
Alex Deucher [Tue, 28 Jul 2020 19:35:56 +0000 (15:35 -0400)]
drm/amdgpu/gmc: disable keep_stolen_vga_memory on arcturus

I suspect the only reason this was set was to avoid touching
the display related registers on arcturus.  Someone should
double check this on arcturus with S3.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: drop the CPU pointers for the stolen vga bos
Alex Deucher [Tue, 28 Jul 2020 22:34:50 +0000 (18:34 -0400)]
drm/amdgpu: drop the CPU pointers for the stolen vga bos

We never use them.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc10: switch to using amdgpu_gmc_get_vbios_allocations
Alex Deucher [Tue, 28 Jul 2020 22:30:14 +0000 (18:30 -0400)]
drm/amdgpu/gmc10: switch to using amdgpu_gmc_get_vbios_allocations

The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc9: switch to using amdgpu_gmc_get_vbios_allocations
Alex Deucher [Tue, 28 Jul 2020 22:29:55 +0000 (18:29 -0400)]
drm/amdgpu/gmc9: switch to using amdgpu_gmc_get_vbios_allocations

The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc8: switch to using amdgpu_gmc_get_vbios_allocations
Alex Deucher [Tue, 28 Jul 2020 22:29:39 +0000 (18:29 -0400)]
drm/amdgpu/gmc8: switch to using amdgpu_gmc_get_vbios_allocations

The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc7: switch to using amdgpu_gmc_get_vbios_allocations
Alex Deucher [Tue, 28 Jul 2020 22:29:20 +0000 (18:29 -0400)]
drm/amdgpu/gmc7: switch to using amdgpu_gmc_get_vbios_allocations

The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc6: switch to using amdgpu_gmc_get_vbios_allocations
Alex Deucher [Tue, 28 Jul 2020 22:27:46 +0000 (18:27 -0400)]
drm/amdgpu/gmc6: switch to using amdgpu_gmc_get_vbios_allocations

The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc: add new helper to get the FB size used by pre-OS console
Alex Deucher [Tue, 28 Jul 2020 19:04:52 +0000 (15:04 -0400)]
drm/amdgpu/gmc: add new helper to get the FB size used by pre-OS console

This adds a new gmc callback to get the size reserved by the pre-OS
console and provides a helper function for use by gmc IP drivers.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add support for extended stolen vga memory
Alex Deucher [Tue, 28 Jul 2020 22:05:11 +0000 (18:05 -0400)]
drm/amdgpu: add support for extended stolen vga memory

This will allow us to split the allocation for systems
where we have to keep the stolen memory around to avoid
S3 issues.  This way we don't waste as much memory and
still avoid any screen artifacts during the bios to
driver transition.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move keep stolen memory check into gmc core
Alex Deucher [Tue, 28 Jul 2020 21:55:30 +0000 (17:55 -0400)]
drm/amdgpu: move keep stolen memory check into gmc core

Rather than leaving this as a gmc v9 specific hack.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move stolen vga bo from amdgpu to amdgpu.gmc
Alex Deucher [Tue, 28 Jul 2020 21:46:00 +0000 (17:46 -0400)]
drm/amdgpu: move stolen vga bo from amdgpu to amdgpu.gmc

Since that is where we store the other data related to
the stolen vga memory.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: use a define for the memory size of the vga emulator
Alex Deucher [Tue, 28 Jul 2020 18:10:46 +0000 (14:10 -0400)]
drm/amdgpu: use a define for the memory size of the vga emulator

Rather than open coding it everywhere.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: use create_at for the stolen pre-OS buffer
Alex Deucher [Tue, 28 Jul 2020 17:57:20 +0000 (13:57 -0400)]
drm/amdgpu: use create_at for the stolen pre-OS buffer

Should be functionally the same since nothing else is
allocated at that point, but let's be exact.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: handle bo size 0 in amdgpu_bo_create_kernel_at (v2)
Alex Deucher [Tue, 28 Jul 2020 21:38:29 +0000 (17:38 -0400)]
drm/amdgpu: handle bo size 0 in amdgpu_bo_create_kernel_at (v2)

Just return early to match other bo_create functions.

v2: check if the bo_ptr is NULL rather than checking the size.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/smu: rework i2c adapter registration
Alex Deucher [Thu, 30 Jul 2020 19:21:33 +0000 (15:21 -0400)]
drm/amdgpu/smu: rework i2c adapter registration

The i2c init/fini functions just register the i2c adapter.
There is no need to call them during hw init/fini.  They only
need to be called once per driver init/fini.  The previous
behavior broke runtime pm because we unregistered the i2c
adapter during suspend.

Tested-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: 3.2.97
Aric Cyr [Mon, 27 Jul 2020 14:53:38 +0000 (10:53 -0400)]
drm/amd/display: 3.2.97

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: [FW Promotion] Release 0.0.27
Anthony Koo [Sat, 25 Jul 2020 01:37:56 +0000 (21:37 -0400)]
drm/amd/display: [FW Promotion] Release 0.0.27

| [Header Changes]
|       - Reworked the FW versioning to include hotfix
|         and test bits

Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Separate pipe disconnect from rest of programming
Alvin Lee [Wed, 22 Jul 2020 04:32:14 +0000 (00:32 -0400)]
drm/amd/display: Separate pipe disconnect from rest of programming

[Why]
When changing pixel formats for HDR (e.g. ARGB -> FP16)
there are configurations that change from 2 pipes to 1 pipe.
In these cases, it seems that disconnecting MPCC and doing
a surface update at the same time (after unlocking) causes
some registers to be updated slightly faster than others
after unlocking (e.g. if the pixel format is updated to FP16
before the new surface address is programmed, we get
corruption on the screen because the pixel formats don't
match). We separate disconnecting MPCC from the rest
of the pipe programming sequence to prevent this.

[How]
Move MPCC disconnect into an operation separate from the
rest of the pipe programming.

Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Add debugfs for forcing stream timing sync
Victor Lu [Tue, 21 Jul 2020 16:08:34 +0000 (12:08 -0400)]
drm/amd/display: Add debugfs for forcing stream timing sync

[why]
There's currently no method to enable multi-stream synchronization from
userspace and we don't check the VSDB bits to know whether or not
specific displays should have the feature enabled.

[how]
Add a debugfs entry that controls a new DM debug option,
"force_timing_sync". This debug option will be set on any newly created
stream following the change to the debug option.
Expose a new interface from DC that performs the timing sync and a helper
to the "force_timing_sync" debugfs that iterates over the current streams
and modifies the current synchronization state and grouping.

Example usage to force a resync (from an X based desktop):

echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_force_timing_sync
xset dpms force off && xset dpms force on

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Display goes blank after inst
Igor Kravchenko [Fri, 24 Jul 2020 15:10:40 +0000 (11:10 -0400)]
drm/amd/display: Display goes blank after inst

[why]
Display goes blank after driver installation.
Aux tuning parameters must be used for 2.x only.
Wrong dc_golden_table offset was used.

[How]
Implement a new enc3_hw_init function without VBIOS constants usage to
be called for 3.x
Calculate dc_golden_table offset using sum of
base dce_info offset and golden table offset

Signed-off-by: Igor Kravchenko <Igor.Kravchenko@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Change null plane state swizzle mode to 4kb_s
George Shen [Fri, 17 Jul 2020 17:19:27 +0000 (13:19 -0400)]
drm/amd/display: Change null plane state swizzle mode to 4kb_s

[Why]
During SetPathMode and UpdatePlanes, the plane state can be null. We default
to linear swizzle mode when plane state is null. This resulted in bandwidth
validation failing when trying to set 8K60 mode (which previously passed validation
during rebuild timing list).

[How]
Change the default swizzle mode from linear to 4kb_s and update pitch accordingly.

Signed-off-by: George Shen <george.shen@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Use helper function to check for HDMI signal
JinZe.Xu [Tue, 21 Jul 2020 09:52:41 +0000 (17:52 +0800)]
drm/amd/display: Use helper function to check for HDMI signal

[How]
Use dc_is_hdmi_signal to determine signal type.

Signed-off-by: JinZe.Xu <JinZe.Xu@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: AMD OUI (DPCD 0x00300) skipped on some sink
Aric Cyr [Thu, 23 Jul 2020 17:06:23 +0000 (13:06 -0400)]
drm/amd/display: AMD OUI (DPCD 0x00300) skipped on some sink

[Why]
Sink OUI supported cap is not set so driver skips programming it.

[How]
Revert the change that skips OUI programming if the cap is not set.

Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Comments on how to use DSC debugfs some entries
Eryk Brol [Fri, 19 Jun 2020 18:07:03 +0000 (14:07 -0400)]
drm/amd/display: Comments on how to use DSC debugfs some entries

[why]
Some of the DSC debugfs read entries are missing comments
explaining how to use them and how to interpret the results.

Signed-off-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Fix logger context
Harry Wentland [Tue, 30 Jun 2020 15:16:05 +0000 (11:16 -0400)]
drm/amd/display: Fix logger context

[Why&How]
use correct logger context

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: DSC Bit target rate debugfs write entry
Eryk Brol [Fri, 19 Jun 2020 18:02:38 +0000 (14:02 -0400)]
drm/amd/display: DSC Bit target rate debugfs write entry

[Why]
We need to be able to specify bits per pixel for DSC on any
connector.

[How]
Overwrite the computed DSC target rate in dsc_cfg with the requested value.
The overwrite happens in different places for SST and MST connectors, but
the process is identical. It applies only if DSC is going to be enabled on
that connector.
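
The override rule described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: the struct and
function names are hypothetical, and treating 0 as "no value requested"
is an assumption, not something this commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the computed DSC configuration. */
struct demo_dsc_cfg {
	uint32_t bits_per_pixel;
};

/* Hypothetical debugfs request; 0 is assumed to mean "nothing requested". */
struct demo_dsc_request {
	uint32_t bits_per_pixel;
};

/*
 * Replace the computed target rate with the requested one, but only when
 * DSC is going to be enabled on the connector and a value was requested.
 */
void demo_apply_bpp_override(bool dsc_enabled,
			     const struct demo_dsc_request *req,
			     struct demo_dsc_cfg *cfg)
{
	if (dsc_enabled && req->bits_per_pixel != 0)
		cfg->bits_per_pixel = req->bits_per_pixel;
}
```

The same helper could be called from both the SST and the MST path, which
matches the note that the process is identical in both places.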

Signed-off-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: populate new dml variable
Dmytro Laktyushkin [Fri, 26 Jun 2020 18:30:29 +0000 (14:30 -0400)]
drm/amd/display: populate new dml variable

Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Read VBIOS Golden Settings Tbl
Igor Kravchenko [Mon, 20 Jul 2020 00:45:28 +0000 (20:45 -0400)]
drm/amd/display: Read VBIOS Golden Settings Tbl

[Why]
For version 4.4 and higher, the VBIOS contains a default settings table.

[How]
Read Golden Settings Table from VBIOS, apply Aux tuning parameters.

Signed-off-by: Igor Kravchenko <Igor.Kravchenko@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Use parameter for call to set output mux
Eric Bernstein [Mon, 20 Jul 2020 23:18:43 +0000 (19:18 -0400)]
drm/amd/display: Use parameter for call to set output mux

Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Update virtual stream encoder
Eric Bernstein [Mon, 20 Jul 2020 20:10:22 +0000 (16:10 -0400)]
drm/amd/display: Update virtual stream encoder

Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: DSC Slice height debugfs write entry
Eryk Brol [Wed, 17 Jun 2020 19:28:04 +0000 (15:28 -0400)]
drm/amd/display: DSC Slice height debugfs write entry

[Why]
We need to be able to specify slice height for any connector's DSC

[How]
Overwrite the computed parameters in dsc_cfg with the value needed.
The overwrite happens in different places for SST and MST connectors, but
the process is identical. It applies only if DSC is going to be enabled on
that connector.

Signed-off-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: added RAS EEPROM device support check
John Clements [Mon, 3 Aug 2020 07:52:52 +0000 (15:52 +0800)]
drm/amdgpu: added RAS EEPROM device support check

updated RAS EEPROM init/threshold sequences to check for device support

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: enable RAS support for sienna cichlid
John Clements [Mon, 3 Aug 2020 06:24:50 +0000 (14:24 +0800)]
drm/amdgpu: enable RAS support for sienna cichlid

enabled GECC error injection and query support

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)
Monk Liu [Mon, 27 Jul 2020 07:20:12 +0000 (15:20 +0800)]
drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)

what:
the MQD save and restore of KCQs (kernel compute queues)
costs a lot of clocks during world switch, which hurts
multi-VF performance

how:
introduce a parameter to control the number of KCQs, to avoid
the performance drop when no kernel compute queue is needed

notes:
this parameter only affects gfx 8/9/10

v2:
refine naming

v3:
choose queues for each ring so that they are spread across pipes as
evenly as possible.

v4:
fix indentation
some cleanups in gfx_compute_queue_acquire()

v5:
further indentation fixes
more cleanups in gfx_compute_queue_acquire()

TODO:
in the future we will let the hypervisor driver set this parameter
automatically, so the user no longer needs to configure it through
modprobe in the virtual machine
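
As an illustration of the v3 note about spreading queues across pipes,
here is a minimal round-robin sketch. It is not taken from the patch: the
type and function names are hypothetical, and the mapping (pipe first,
then queue) is just one plausible way to get an even spread.

```c
/* Hypothetical (pipe, queue) pair chosen for one compute ring. */
struct demo_queue_slot {
	unsigned int pipe;
	unsigned int queue;
};

/*
 * Walk the pipes first and only then move on to the next queue of each
 * pipe, so that a small number of rings lands on distinct pipes.
 * num_pipes and queues_per_pipe must be non-zero.
 */
struct demo_queue_slot demo_pick_slot(unsigned int ring,
				      unsigned int num_pipes,
				      unsigned int queues_per_pipe)
{
	struct demo_queue_slot slot;

	slot.pipe = ring % num_pipes;
	slot.queue = (ring / num_pipes) % queues_per_pipe;
	return slot;
}
```

With 4 pipes, rings 0 to 3 each get queue 0 of a different pipe, and ring 4
wraps around to queue 1 of pipe 0.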

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: update eeprom once specifying one bigger threshold(v3)
Guchun Chen [Mon, 27 Jul 2020 07:51:05 +0000 (15:51 +0800)]
drm/amdgpu: update eeprom once specifying one bigger threshold(v3)

During driver probe, when the eeprom i2c init call hits the bad gpu
tag (the tag was set when the number of reported bad pages reached
the bad page threshold during the driver's previous run), there are
several strategies to deal with the situation:

1. When the module parameter amdgpu_bad_page_threshold = 0,
the page retirement feature is disabled, so just resetting
the eeprom is fine.
2. When amdgpu_bad_page_threshold is not 0 and the user has
set a bigger valid value so that the current boot can
succeed, correct the eeprom header tag and do not break booting.
3. In all other cases, driver probe is aborted.
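
The three cases above can be read as a small decision function. The sketch
below is hypothetical (names are made up for illustration) and assumes
that "a bigger value" means a threshold larger than the number of bad
pages already recorded in the eeprom; the commit message does not spell
that comparison out.

```c
/* Hypothetical outcomes when the bad gpu tag is found during probe. */
enum demo_probe_action {
	DEMO_RESET_EEPROM,          /* case 1: page retirement disabled */
	DEMO_FIX_TAG_AND_CONTINUE,  /* case 2: user raised the threshold */
	DEMO_FAIL_PROBE             /* case 3: everything else */
};

/*
 * threshold:      resolved bad page threshold (0 means feature disabled)
 * recorded_pages: bad pages already stored in the eeprom (assumed input)
 */
enum demo_probe_action demo_on_bad_gpu_tag(unsigned int threshold,
					   unsigned int recorded_pages)
{
	if (threshold == 0)
		return DEMO_RESET_EEPROM;
	if (threshold > recorded_pages)
		return DEMO_FIX_TAG_AND_CONTINUE;
	return DEMO_FAIL_PROBE;
}
```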

v2: Just update eeprom header tag instead of resetting the whole
    table header when user sets one bigger threshold data.

v3: Use dev_info/dev_err to print PCI device information, which
    helps in mGPU case.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: disable page reservation when amdgpu_bad_page_threshold = 0
Guchun Chen [Mon, 27 Jul 2020 06:56:27 +0000 (14:56 +0800)]
drm/amdgpu: disable page reservation when amdgpu_bad_page_threshold = 0

When amdgpu_bad_page_threshold = 0, bad page reservation is
skipped in both the UMC ECC irq handler and the page retirement
call from the sync flood isr.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: decouple sysfs creating of bad page node
Guchun Chen [Fri, 31 Jul 2020 07:06:32 +0000 (15:06 +0800)]
drm/amdgpu: decouple sysfs creating of bad page node

Bad page information should not be exposed by sysfs when
bad page retirement is disabled, so decouple it from ras
sysfs group creating, and add one guard before creating.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add one definition for RAS's sysfs/debugfs name(v2)
Guchun Chen [Fri, 31 Jul 2020 06:39:57 +0000 (14:39 +0800)]
drm/amdgpu: add one definition for RAS's sysfs/debugfs name(v2)

Add one definition for the RAS module's FS name. It's used
in both debugfs and sysfs cases.

v2: Use static variable instead of macro definition.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: restore ras flags when user resets eeprom(v2)
Guchun Chen [Wed, 22 Jul 2020 02:37:01 +0000 (10:37 +0800)]
drm/amdgpu: restore ras flags when user resets eeprom(v2)

RAS flags need to be cleared as well when the user requests
a clean eeprom.

v2: RAS flags shall be restored after eeprom reset succeeds.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: break GPU recovery once it's in bad state(v4)
Guchun Chen [Thu, 23 Jul 2020 08:20:02 +0000 (16:20 +0800)]
drm/amdgpu: break GPU recovery once it's in bad state(v4)

When the GPU executes recovery and retrieves the bad GPU tag
from the external eeprom device, the recovery is aborted
and an error message is printed to inform the user.

v2: Refine warning message in threshold reaching case, and
    fix spelling typo.

v3: Fix explicit calling of bad gpu.

v4: Rename function names.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: schedule ras recovery when reaching bad page threshold(v2)
Guchun Chen [Thu, 23 Jul 2020 08:05:00 +0000 (16:05 +0800)]
drm/amdgpu: schedule ras recovery when reaching bad page threshold(v2)

Once the bad page saved to eeprom reaches the configured
threshold, ras recovery will be issued to notify user.

v2: Fix spelling typo.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: skip bad page reservation once issuing from eeprom write
Guchun Chen [Thu, 23 Jul 2020 07:50:42 +0000 (15:50 +0800)]
drm/amdgpu: skip bad page reservation once issuing from eeprom write

Once the ras recovery is issued from the eeprom write itself,
bad page reservation should be skipped; otherwise the eeprom
write would end up being called recursively.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: break driver init process when it's bad GPU(v5)
Guchun Chen [Thu, 23 Jul 2020 07:42:19 +0000 (15:42 +0800)]
drm/amdgpu: break driver init process when it's bad GPU(v5)

When retrieving bad gpu tag from eeprom, GPU init should
fail as the GPU needs to be retired for further check.

v2: Fix spelling typo, correct the condition to detect
    bad gpu tag and refine error message.

v3: Refine function argument name.

v4: Fix missing check of returning value of i2c
    initialization error case.

v5: Use dev_err to print PCI information in dmesg instead
    of DRM_ERROR.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add bad gpu tag definition
Guchun Chen [Thu, 23 Jul 2020 07:35:53 +0000 (15:35 +0800)]
drm/amdgpu: add bad gpu tag definition

This tag will be used for bad gpu detection during eeprom access.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: validate bad page threshold in ras(v3)
Guchun Chen [Wed, 22 Jul 2020 02:00:27 +0000 (10:00 +0800)]
drm/amdgpu: validate bad page threshold in ras(v3)

The bad page threshold value must lie in the range between
-1 and the max records length of the eeprom. It is used to detect
when the number of saved bad pages exceeds the threshold, so that
the corresponding actions can be taken.

v2: When using the default typical value, it should be min
value between typical value and eeprom max records length.

v3: drop the case of setting bad_page_cnt_threshold to be
    0xFFFFFFFF, as it confuses user.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add bad page count threshold in module parameter(v3)
Guchun Chen [Tue, 21 Jul 2020 10:02:00 +0000 (18:02 +0800)]
drm/amdgpu: add bad page count threshold in module parameter(v3)

bad_page_threshold could be configured to enable/disable the
associated bad page retirement feature in RAS.

When it's -1, ras will use typical bad page failure value to
handle bad page retirement.

When it's 0, disable bad page retirement, and no bad page
will be recorded and saved.

For any other valid value, the driver will use this manual value
as the threshold for the total number of bad pages.

v2: correct documentation of this parameter.

v3: remove confused statement in documentation.
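
The three parameter cases above, combined with the range rule from the
"validate bad page threshold in ras" commit in this log (-1 up to the
eeprom's max records length, with the typical value capped at that
length), can be sketched as follows. The function is hypothetical, and
rejecting out-of-range values with an error is an assumption of this
sketch rather than documented driver behaviour.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the driver's code.
 * param:       raw module parameter (-1, 0, or a manual threshold)
 * typical:     the "typical" default, whose value is assumed here
 * max_records: how many records the eeprom can hold
 * Returns 0 and fills *threshold and *enabled, or -1 when the value is
 * out of range (rejecting is an assumption of this sketch).
 */
int demo_resolve_threshold(int param, uint32_t typical, uint32_t max_records,
			   uint32_t *threshold, int *enabled)
{
	if (param < -1 || (param > 0 && (uint32_t)param > max_records))
		return -1;

	if (param == -1) {
		/* default: typical value, capped by the eeprom capacity */
		*threshold = typical < max_records ? typical : max_records;
		*enabled = 1;
	} else if (param == 0) {
		/* bad page retirement disabled, nothing is recorded */
		*threshold = 0;
		*enabled = 0;
	} else {
		/* manual threshold supplied by the user */
		*threshold = (uint32_t)param;
		*enabled = 1;
	}
	return 0;
}
```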

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: Replace bitmask with event idx in SMI event msg
Mukul Joshi [Thu, 30 Jul 2020 22:04:33 +0000 (18:04 -0400)]
drm/amdkfd: Replace bitmask with event idx in SMI event msg

Event bitmask is a 64-bit mask with only 1 bit set. Sending this
event bitmask in KFD SMI event message is both wasteful of memory
and potentially limiting to only 64 events. Instead send event
index in SMI event message.
Please note this change does not break the ABI for the two event
types defined so far. The new index is identical to the mask used
before.
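
To see why the two existing event types are unaffected, assume the
indices are 1-based, so that index 1 corresponds to bit 0 and index 2 to
bit 1. Under that assumption the index values 1 and 2 coincide with the
old mask values 0x1 and 0x2. The helper name below is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a 1-based event index into the single-bit mask it replaces.
 * Index 0 and indices above 64 have no mask and yield 0.
 */
uint64_t demo_event_mask_from_index(unsigned int index)
{
	if (index == 0 || index > 64)
		return 0;
	return (uint64_t)1 << (index - 1);
}
```

From index 3 onward the two encodings diverge (index 3 maps to mask 0x4),
which is where sending the index starts to save space and lifts the limit
of 64 events.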

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agoRevert "drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"
Alex Deucher [Thu, 30 Jul 2020 15:02:30 +0000 (11:02 -0400)]
Revert "drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"

This regressed some working configurations so revert it.  Will
fix this properly for 5.9 and backport then.

This reverts commit 38e0c89a19fd13f28d2b4721035160a3e66e270b.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 years agodrm/amdgpu: enable GFXOFF for navy_flounder
Jiansong Chen [Thu, 30 Jul 2020 10:09:47 +0000 (18:09 +0800)]
drm/amdgpu: enable GFXOFF for navy_flounder

Enable GFXOFF for navy_flounder.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm amdgpu: Skip tmr load for SRIOV
Liu ChengZhe [Fri, 24 Jul 2020 07:55:33 +0000 (15:55 +0800)]
drm amdgpu: Skip tmr load for SRIOV

1. For Navi12, CHIP_SIENNA_CICHLID, skip tmr load operation;
2. Check pointer before release firmware.

v2: use CHIP_SIENNA_CICHLID instead
v3: remove local "bool ret"; fix grammar issue
v4: use my name instead of "root"
v5: fix grammar issue and indent issue

Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix PSP autoload twice in FLR
Liu ChengZhe [Fri, 24 Jul 2020 09:22:15 +0000 (17:22 +0800)]
drm/amdgpu: fix PSP autoload twice in FLR

Assigning false to block->status.hw overwrites PSP's previous
hardware status, which causes the PSP to Resume operation after
hardware init.

Remove this assignment and let the PSP execute Resume operation
when it is told to.

v2: Remove the braces.
v3: Modify the description.

Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail
Daniel Vetter [Mon, 27 Jul 2020 21:30:18 +0000 (23:30 +0200)]
drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail

Trying to grab dma_resv_lock while in commit_tail before we've done
all the code that leads to the eventual signalling of the vblank event
(which can be a dma_fence) is deadlock-y. Don't do that.

Here the solution is easy because just grabbing locks to read
something races anyway. We don't need to bother, READ_ONCE is
equivalent. And avoids the locking issue.

v2: Also take into account tmz_surface boolean, plus just delete the
old code.
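
A user-space sketch of the idea: if a lock was taken only to read a
couple of fields, a single non-torn load of each field gives the same
(inherently racy) snapshot without the lock. The macro below is a
simplified approximation of the kernel's READ_ONCE() that relies on the
GNU __typeof__ extension; the struct and its fields are hypothetical
stand-ins, apart from tmz_surface, which the v2 note mentions.

```c
#include <stdbool.h>

/* Simplified approximation of READ_ONCE(): one volatile load of x. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical buffer object holding the fields that used to be read
 * under the reservation lock. */
struct demo_plane_bo {
	unsigned long flags;
	bool tmz_surface;
};

/* Snapshot both fields without taking any lock. */
void demo_snapshot_bo(struct demo_plane_bo *bo,
		      unsigned long *flags, bool *tmz_surface)
{
	*flags = DEMO_READ_ONCE(bo->flags);
	*tmz_surface = DEMO_READ_ONCE(bo->tmz_surface);
}
```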

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: Remove unneeded cast from memory allocation
Li Heng [Wed, 29 Jul 2020 08:34:01 +0000 (16:34 +0800)]
drm/amd/powerplay: Remove unneeded cast from memory allocation

Remove casting the values returned by memory allocation function.

Coccinelle emits WARNING:

./drivers/gpu/drm/amd/powerplay/hwmgr/vega20_processpptables.c:893:37-46: WARNING: casting value returned by memory allocation function to (PPTable_t *) is useless.

Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl()
Peilin Ye [Tue, 28 Jul 2020 19:29:24 +0000 (15:29 -0400)]
drm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl()

Compiler leaves a 4-byte hole near the end of `dev_info`, causing
amdgpu_info_ioctl() to copy uninitialized kernel stack memory to userspace
when `size` is greater than 356.

In 2015 we tried to fix this issue by doing `= {};` on `dev_info`, which
unfortunately does not initialize that 4-byte hole. Fix it by using
memset() instead.
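
The effect can be reproduced in user space with any struct that contains
padding. The sketch below uses a hypothetical two-member struct, not the
real `dev_info` layout; on typical 64-bit ABIs the compiler places a
4-byte hole between the members. An initializer such as `= {}` is only
required to initialize the named members, whereas memset() writes every
byte of the object, including the hole.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply struct; not the real dev_info layout. */
struct demo_info {
	uint32_t flags;	/* 4 bytes */
	/* typical 64-bit ABIs insert a 4-byte hole here */
	uint64_t size;	/* 8-byte aligned */
};

/*
 * Zero every byte of the object, padding included, and only then fill in
 * the named members.
 */
void demo_info_fill(struct demo_info *info, uint32_t flags, uint64_t size)
{
	memset(info, 0, sizeof(*info));
	info->flags = flags;
	info->size = size;
}

/* Return 1 if every byte between the two members is zero, else 0. */
int demo_info_hole_is_clean(const struct demo_info *info)
{
	const unsigned char *bytes = (const unsigned char *)info;
	size_t i;

	for (i = sizeof(info->flags); i < offsetof(struct demo_info, size); i++)
		if (bytes[i] != 0)
			return 0;
	return 1;
}
```

Copying such a struct to another address space byte for byte is only safe
once the hole is known to be clean, which is what the memset() guarantees
in practice.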

Cc: stable@vger.kernel.org
Fixes: c193fa91b918 ("drm/amdgpu: information leak in amdgpu_info_ioctl()")
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Clear dm_state for fast updates
Mazin Rezk [Mon, 27 Jul 2020 05:40:46 +0000 (05:40 +0000)]
drm/amd/display: Clear dm_state for fast updates

This patch fixes a race condition that causes a use-after-free during
amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits
are requested and the second one finishes before the first. Essentially,
this bug occurs when the following sequence of events happens:

1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is
deferred to the workqueue.

2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is
deferred to the workqueue.

3. Commit #2 starts before commit #1, dm_state #1 is used in the
commit_tail and commit #2 completes, freeing dm_state #1.

4. Commit #1 starts after commit #2 completes, uses the freed
dm_state #1 and dereferences a freelist pointer while setting the context.

Since this bug has only been spotted with fast commits, this patch fixes
the bug by clearing the dm_state instead of using the old dc_state for
fast updates. In addition, since dm_state is only used for its dc_state
and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found,
removing the dm_state should not have any consequences in fast updates.

This use-after-free bug has existed for a while now, but only caused a
noticeable issue starting from 5.7-rc1 due to 3202fa62f ("slub: relocate
freelist pointer to middle of object") moving the freelist pointer from
dm_state->base (which was unused) to dm_state->context (which is
dereferenced).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383
Fixes: bd200d190f45 ("drm/amd/display: Don't replace the dc_state for fast updates")
Reported-by: Duncan <1i5t5.duncan@cox.net>
Signed-off-by: Mazin Rezk <mnrzk@protonmail.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: update GC golden setting for navy_flounder
Jiansong Chen [Wed, 29 Jul 2020 03:58:21 +0000 (11:58 +0800)]
drm/amdgpu: update GC golden setting for navy_flounder

Update GC golden setting for navy_flounder.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>