www.infradead.org Git - users/hch/misc.git/commit
KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault
author Sean Christopherson <seanjc@google.com>
Fri, 22 Aug 2025 07:03:47 +0000 (15:03 +0800)
committer Sean Christopherson <seanjc@google.com>
Wed, 10 Sep 2025 19:06:30 +0000 (12:06 -0700)
commit 3ccbf6f47098f5d5e247d1b7739d0fd90802187b
tree cf538cde3b7ad951ac7034e8d0399d614da4fd3d
parent a57750909580a4e3f7278ea6c13336677ea46af6

Return -EAGAIN if userspace attempts to delete or move a memslot while also
prefaulting memory for that same memslot, i.e. force userspace to retry
instead of trying to handle the scenario entirely within KVM.  Unlike
KVM_RUN, which needs to handle the scenario entirely within KVM because
userspace has come to depend on such behavior, KVM_PRE_FAULT_MEMORY can
return -EAGAIN without breaking userspace as this scenario can't have ever
worked (and there's no sane use case for prefaulting to a memslot that's
being deleted/moved).

And also unlike KVM_RUN, the prefault path doesn't naturally guarantee
forward progress.  E.g. to handle such a scenario, KVM would need to drop
and reacquire SRCU to break the deadlock between the memslot update
(synchronizes SRCU) and the prefault (waits for the memslot update to
complete).

However, dropping SRCU creates more problems, as completing the memslot
update will bump the memslot generation, which in turn will invalidate the
MMU root.  To handle that, prefaulting would need to handle pending
KVM_REQ_MMU_FREE_OBSOLETE_ROOTS requests and do kvm_mmu_reload() prior to
mapping each individual page.

I.e. to fully handle this scenario, prefaulting would eventually need to
look a lot like vcpu_enter_guest().  Given that there's no reasonable use
case and practically zero risk of breaking userspace, punt the problem to
userspace and avoid adding unnecessary complexity to the prefault path.

Note, TDX's guest_memfd post-populate path is unaffected as slots_lock is
held for the entire duration of populate(), i.e. any memslot modifications
will be fully serialized against TDX's flavor of prefaulting.
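
As an illustration of what "force userspace to retry" could mean in
practice, here is a minimal sketch of a userspace retry loop.  It is not
taken from any VMM.  The names prefault_range, prefault_fn,
prefault_with_retry and fake_prefault are hypothetical, and the callback
merely stands in for ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) so that
the sketch builds without KVM headers.  It assumes the callback returns
0 on success or -1 with errno set, and that it advances gpa and shrinks
size as progress is made.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the ioctl argument. */
struct prefault_range {
	uint64_t gpa;	/* start of the range still to be prefaulted */
	uint64_t size;	/* bytes still to be prefaulted */
};

/*
 * Hypothetical callback standing in for the real ioctl.  Assumed to
 * return 0 on success, or -1 with errno set on failure, and to update
 * *range to describe whatever remains to be done.
 */
typedef int (*prefault_fn)(struct prefault_range *range, void *ctx);

/*
 * Retry on EAGAIN, e.g. because a memslot covering the range was being
 * deleted or moved.  Gives up after max_retries retries; any other
 * error is returned immediately as a negative errno value.
 */
static int prefault_with_retry(prefault_fn do_prefault, void *ctx,
			       struct prefault_range *range, int max_retries)
{
	int retries = 0;

	while (range->size > 0) {
		int err;

		if (do_prefault(range, ctx) == 0)
			continue;	/* loop re-checks the remaining size */

		err = errno;
		if (err != EAGAIN)
			return -err;
		if (retries++ >= max_retries)
			return -err;
	}
	return 0;
}

/* Stand-in used only to exercise the loop without a real VM. */
struct fake_state {
	int eagain_left;	/* how many calls should fail with EAGAIN */
	int calls;		/* how many times the callback was invoked */
};

static int fake_prefault(struct prefault_range *range, void *ctx)
{
	struct fake_state *st = ctx;

	st->calls++;
	if (st->eagain_left > 0) {
		st->eagain_left--;
		errno = EAGAIN;
		return -1;
	}
	/* Pretend the whole remaining range was mapped. */
	range->gpa += range->size;
	range->size = 0;
	return 0;
}
```

Whether to bound the number of retries, back off between attempts, or
first wait for the in-flight memslot update to finish is a userspace
policy decision; the bound above is only there to keep the sketch from
spinning forever.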

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250822070347.26451-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
arch/x86/kvm/mmu/mmu.c