Leo Liu [Wed, 15 Feb 2017 15:24:55 +0000 (10:24 -0500)]
uapi/drm: add AMDGPU_HW_IP_VCN_ENC for encode CS
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Thu, 11 May 2017 20:27:33 +0000 (16:27 -0400)]
drm/amdgpu: move amdgpu_vcn structure to vcn header
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 21 Dec 2016 18:56:44 +0000 (13:56 -0500)]
drm/amdgpu: add encode tests for vcn
Add encode ring and ib tests.
Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 21 Dec 2016 18:21:52 +0000 (13:21 -0500)]
drm/amdgpu: add initial vcn support and decode tests
VCN is the new media block on Raven. Add core support
and the ring and ib tests for decode.
Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Thu, 4 May 2017 19:06:25 +0000 (15:06 -0400)]
drm/amdgpu: apply nbio7 for Raven (v3)
nbio handles misc bus io operations. Handle
differences between different nbio bus versions.
v2: switch checks from RAVEN to APU (Alex)
squash in raven rev id fetch
squash in fix uninitalized hdp flush reg index for raven
v3: add some missed RAVEN to APU checks (Alex)
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Thu, 15 Dec 2016 03:15:27 +0000 (11:15 +0800)]
drm/amdgpu/gmc9: set mc vm fb offset for raven
APU fb offset is set by sbios, which is different with DGPU.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Tue, 13 Dec 2016 07:58:33 +0000 (15:58 +0800)]
drm/amdgpu: add module firmware for raven
Fetch correct firmware for raven for gfx and sdma.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Thu, 11 May 2017 05:59:15 +0000 (13:59 +0800)]
drm/amdgpu:use job's list instead of check fence
because if the fence is really signaled, it could already
released so the fence pointer is a wild pointer, but if
we use job->base.node we are safe because job will not
be released untill amdgpu_job_timedout finished.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Thu, 11 May 2017 05:36:44 +0000 (13:36 +0800)]
drm/amdgpu/SRIOV:implement guilty job TDR for(V2)
1,TDR will kickout guilty job if it hang exceed the threshold
of the given one from kernel paramter "job_hang_limit", that
way a bad command stream will not infinitly cause GPU hang.
by default this threshold is 1 so a job will be kicked out
after it hang.
2,if a job timeout TDR routine will not reset all sched/ring,
instead if will only reset on the givn one which is indicated
by @job of amdgpu_sriov_gpu_reset, that way we don't need to
reset and recover each sched/ring if we already know which job
cause GPU hang.
3,unblock sriov_gpu_reset for AI family.
V2:
1:put kickout guilty job after sched parked.
2:since parking scheduler prior to kickout already occupies a
while, we can do last check on the in question job before
doing hw_reset.
TODO:
1:when a job is considered as guilty, we should mark some flag
in its fence status flag, and let UMD side aware that this
fence signaling is not due to job complete but job hang.
2:if gpu reset cause all video memory lost, we need introduce
a new policy to implement TDR, like drop all jobs not yet
signaled, and all IOCTL on this device will return ERROR
DEVICE_LOST.
this will be implemented later.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Thu, 11 May 2017 05:36:33 +0000 (13:36 +0800)]
drm/amdgpu:don't init entity for KIQ
We don't need a scheduler for KIQ.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Wed, 26 Apr 2017 06:51:54 +0000 (14:51 +0800)]
drm/amdgpu:only call flr_work under infinite timeout
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Wed, 26 Apr 2017 06:51:54 +0000 (14:51 +0800)]
drm/amdgpu:use job* to replace voluntary
that way we can know which job cause hang and
can do per sched reset/recovery instead of all
sched.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Fri, 5 May 2017 07:09:42 +0000 (15:09 +0800)]
drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset (v2)
because we don't want to do sriov-gpu-reset under certain
cases, so just split those two funtion and don't invoke
sr-iov one from bare-metal one.
V2:
remove debugfs_gpu_reset routine on SRIOV case.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Wed, 10 May 2017 05:02:39 +0000 (13:02 +0800)]
drm/amdgpu: id reset count only is updated when used end v2
before that, we have function to check if reset happens by using reset count.
v2: always update reset count after vm flush
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Thu, 11 May 2017 18:52:48 +0000 (14:52 -0400)]
drm/amdgpu: make pipeline sync be in same place v2
v2: directly return for 'if' case.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Tue, 9 May 2017 07:50:22 +0000 (15:50 +0800)]
drm/amdgpu: add sched sync for amdgpu job v2
this is an improvement for previous patch, the sched_sync is to store fence
that could be skipped as scheduled, when job is executed, we didn't need
pipeline_sync if all fences in sched_sync are signalled, otherwise insert
pipeline_sync still.
v2: handle error when adding fence to sync failed.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> (v1) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This kind of reset handling was removed a long time ago.
v2: fix warning (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Fri, 5 May 2017 02:50:09 +0000 (10:50 +0800)]
drm/amdgpu: print when gpu reset successed
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Roger.He <Hongbo.He@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chunming Zhou [Fri, 5 May 2017 02:33:33 +0000 (10:33 +0800)]
drm/amdgpu: fix ring0 failed on pro card
the root cause is vram content is lost completely after pci reset.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Roger.He <Hongbo.He@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roger.He [Fri, 5 May 2017 05:27:10 +0000 (13:27 +0800)]
drm/amdgpu: extend lock range for race condition when gpu reset
to cover below case:
1. A task gart bind/unbind but not add to adev->gtt_list yet
2. at this time gpu reset, gtt only recover those gtt in adev->gtt_list
Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Roger.He <Hongbo.He@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Xie [Tue, 9 May 2017 01:36:03 +0000 (21:36 -0400)]
drm/amdgpu: Fix comments in source code
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Xie [Mon, 8 May 2017 17:41:11 +0000 (13:41 -0400)]
drm/amdgpu: fix errors in comments.
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 4 May 2017 14:32:33 +0000 (10:32 -0400)]
drm/amdgpu/gfx9: move define to header file
rather than defining it locally.
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nikola Pajkovsky [Thu, 4 May 2017 16:39:50 +0000 (12:39 -0400)]
drm/amd/amdgpu: get rid of else branch
else branch is pointless if it's right at the end of function and use
unlikely() on err path.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Wed, 3 May 2017 06:55:07 +0000 (14:55 +0800)]
drm/amdgpu:cleanup flag not used
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 1 May 2017 10:09:22 +0000 (18:09 +0800)]
drm/amdgpu:use FRAME_CNTL for new GFX ucode (v2)
AI affected:
CP/HW team requires KMD insert FRAME_CONTROL(end) after
the last IB and before the fence of this DMAframe.
this is to make sure the cache are flushed, and it's a must
change no matter MCBP/SR-IOV or bare-metal case because new
CP hw won't do the cache flush for each IB anymore, it just
leaves it to KMD now.
with this patch, certain MCBP hang issue when rendering
vulkan/chained-ib are resolved.
v2: drop gfx8 changes. gfx8 is not affected (Alex)
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 1 May 2017 09:23:44 +0000 (17:23 +0800)]
drm/amdgpu:new PM4 entry for VI/AI
TMZ package will be used for VULKAN/CHAINED-IB MCBP
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 1 May 2017 09:05:02 +0000 (17:05 +0800)]
drm/amdgpu:change SR-IOV DMAframe scheme
According to CP/hw team requirment, to support PAL/CHAINED-IB
MCBP, kernel driver must guarantee DE_META must be inserted
right prior to the work_load DE IB (with PREEMPT flag), there
cannot be any non-work_load DE IB between-in DE_META and
work_load DE IB.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 1 May 2017 10:05:06 +0000 (18:05 +0800)]
drm/amdgpu:unify gfx8/9 ce/de meta_data
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 1 May 2017 09:00:13 +0000 (17:00 +0800)]
drm/amdgpu:cleanup indent/format for gfx_v9_0.c
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Frank Min [Tue, 2 May 2017 11:49:32 +0000 (19:49 +0800)]
drm/amdgpu: clean doorbell after sending init table to mmsch
According to HW design, need to clean doorbell after setup MMSCH
table.
Signed-off-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiangliang Yu [Thu, 4 May 2017 03:05:13 +0000 (11:05 +0800)]
drm/amdgpu/psp: Do not load asd for SRIOV
If psp version doesn't match asd version, asd loading will be
failed. Add workaround to bypass it for sriov.
Signed-off-by: Daniel Wang <Daniel.Wang2@amd.com> Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Wed, 26 Apr 2017 04:00:49 +0000 (12:00 +0800)]
drm/amdgpu:re-write sriov_reinit_early/late (v2)
1,this way we make those routines compatible with the sequence
requirment for both Tonga and Vega10
2,ignore PSP hw init when doing TDR, because for SR-IOV device
the ucode won't get lost after VF FLR, so no need to invoke PSP
doing the ucode reloading again.
v2: squash in ARRAY_SIZE fix
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Fri, 21 Apr 2017 11:35:11 +0000 (19:35 +0800)]
drm/amdgpu:need som change on vega10 mailbox
if sriov gpu reset is invoked by job timeout, it is run
in a global work-queue which is very slow and better not call
msleep ortherwise it takes long time to get back CPU.
so make below changes:
1: Change msleep 1 to mdelay 5
2: Ignore the ack fail from pf after time out,
because VF FLR will clear ack, sometime VF FLR is done
prior to the beginning of poll_ack so we can ignore this ack
TODO:
Put job_timedout (and the following gpu reset) in a driver thread,
instead of the global work_struct.
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>