www.infradead.org Git - users/hch/misc.git/commit

author	Emily Deng <Emily.Deng@amd.com>
	Thu, 6 Mar 2025 00:40:01 +0000 (08:40 +0800)
committer	Alex Deucher <alexander.deucher@amd.com>
	Fri, 14 Mar 2025 03:10:16 +0000 (23:10 -0400)
commit	f844732e3ad9c4b78df7436232949b8d2096d1a6
tree	97bbb9b221dcec7b77fc845c362f35bedac43bc2
parent	b9e75bcb2b39e1202364d958ee4f27fd8a6f1313

drm/amdgpu: Fix the race condition for draining retry fault

Issue:
svm_range_restore_pages can be called before svm->checkpoint_ts has
been set and before the retry fault has been drained. If
svm_range_unmap_from_cpu is triggered at that point, it calls
svm_range_free while svm_range_restore_pages continues execution and
reaches svm_range_from_addr. The lookup then fails with a
"failed to find prange..." error, causing the page recovery to fail.

How to fix:
Move the timestamp check code under the protection of svm->lock.
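
As a rough illustration of the idea only, the sketch below models the
pattern in userspace C with a pthread mutex. It is not the amdgpu/KFD
code: struct range_set, model_unmap and model_restore are hypothetical
names, the single range_present flag stands in for the prange lookup,
and recording the checkpoint in the same critical section as the free
is an assumption of the model. The point it shows is that the stale
fault check and the lookup happen under one lock, so a fault that
predates the checkpoint is turned away with -EAGAIN before the lookup
can fail.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not the kernel structures. */
struct range_set {
	pthread_mutex_t lock;     /* stands in for the lock named in the fix */
	uint64_t checkpoint_ts;   /* 0 means no checkpoint has been recorded */
	bool range_present;       /* stands in for the prange lookup result */
};

/* Unmap path (assumed): free the range and record the checkpoint
 * inside one critical section. */
static void model_unmap(struct range_set *s, uint64_t now)
{
	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	s->range_present = false;
	s->checkpoint_ts = now;
	pthread_mutex_unlock(&s->lock);
}

/* Fault path: returns 0 on success, -EAGAIN for a stale fault,
 * -EFAULT when the range is missing for a fault that is not stale. */
static int model_restore(struct range_set *s, uint64_t fault_ts)
{
	int r = 0;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);

	/* Timestamp check performed while holding the lock. */
	if (s->checkpoint_ts != 0 && fault_ts <= s->checkpoint_ts) {
		r = -EAGAIN;
		goto out_unlock;
	}

	/* Lookup happens under the same lock as the check above. */
	if (!s->range_present) {
		r = -EFAULT;
		goto out_unlock;
	}

	/* ... the restore work would go here ... */

out_unlock:
	pthread_mutex_unlock(&s->lock);
	return r;
}
```

In this model, a fault stamped at or before the checkpoint returns
-EAGAIN through the single unlock label, and only a fault newer than the
checkpoint can reach the lookup.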

v2:
Make sure all the right locks are released before exiting.

v3:
Jump directly to out_unlock_svms and return -EAGAIN.

v4:
Refine code.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>