www.infradead.org Git - users/hch/misc.git/commit
drm/amdgpu: Fix the race condition for draining retry fault
author Emily Deng <Emily.Deng@amd.com>
Thu, 6 Mar 2025 00:40:01 +0000 (08:40 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Fri, 14 Mar 2025 03:10:16 +0000 (23:10 -0400)
commit f844732e3ad9c4b78df7436232949b8d2096d1a6
tree 97bbb9b221dcec7b77fc845c362f35bedac43bc2
parent b9e75bcb2b39e1202364d958ee4f27fd8a6f1313

Issue:
svm_range_restore_pages can be entered before svm->checkpoint_ts has
been set and before the retry faults have been drained. If
svm_range_unmap_from_cpu runs at that point, it calls svm_range_free.
svm_range_restore_pages then continues to svm_range_from_addr, which
fails with a "failed to find prange..." error, so the page recovery
fails.

How to fix:
Move the timestamp check code under the protection of svm->lock.
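
A minimal userspace sketch of the idea, not the driver code: the
stale-fault check and the range lookup sit inside one critical section,
so the unmap path cannot free the range between them. Everything named
sketch_* is hypothetical, a pthread mutex stands in for svm->lock, and
the "fault timestamp at or before the checkpoint is stale" rule is a
simplifying assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process SVM state (not the real struct). */
struct sketch_svms {
	pthread_mutex_t lock;   /* plays the role of svm->lock */
	uint64_t checkpoint_ts; /* 0 means no drain checkpoint is recorded */
	int range_present;      /* 1 while the range exists, 0 once freed */
};

/* Unmap path: record the checkpoint and free the range under the lock. */
static void sketch_unmap(struct sketch_svms *s, uint64_t now)
{
	pthread_mutex_lock(&s->lock);
	s->checkpoint_ts = now;
	s->range_present = 0;
	pthread_mutex_unlock(&s->lock);
}

/* Fault path: timestamp check and lookup share one critical section. */
static int sketch_restore(struct sketch_svms *s, uint64_t fault_ts)
{
	int r = 0;

	pthread_mutex_lock(&s->lock);
	/* Assumption: a fault at or before the checkpoint is stale. */
	if (s->checkpoint_ts && fault_ts <= s->checkpoint_ts) {
		r = -EAGAIN;
		goto out_unlock;
	}
	/* Stand-in for the range lookup; fails if the range was freed. */
	if (!s->range_present)
		r = -EFAULT;
out_unlock:
	pthread_mutex_unlock(&s->lock);
	return r;
}
```

With the check inside the lock, a fault that raced with the unmap is
reported as -EAGAIN instead of reaching the lookup and failing there.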

v2:
Make sure all the right locks are released before leaving the function.

v3:
Go directly to out_unlock_svms and return -EAGAIN.

v4:
Refine code.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c