]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
4 years agoselftests: kvm: add hardware_disable test
Ignacio Alvarado [Sat, 13 Feb 2021 00:14:52 +0000 (00:14 +0000)]
selftests: kvm: add hardware_disable test

This test launches 512 VMs in serial and kills them after a random
amount of time.

The test was original written to exercise KVM user notifiers in
the context of1650b4ebc99d:
- KVM: Disable irq while unregistering user notifier
- https://lore.kernel.org/kvm/CACXrx53vkO=HKfwWwk+fVpvxcNjPrYmtDZ10qWxFvVX_PTGp3g@mail.gmail.com/

Recently, this test piqued my interest because it proved useful to
for AMD SNP in exercising the "in-use" pages, described in APM section
15.36.12, "Running SNP-Active Virtual Machines".

Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Message-Id: <20210213001452.1719001-1-marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmar...
Paolo Bonzini [Fri, 12 Feb 2021 16:23:44 +0000 (11:23 -0500)]
Merge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.12

- Make the nVHE EL2 object relocatable, resulting in much more
  maintainable code
- Handle concurrent translation faults hitting the same page
  in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling

4 years agoMerge branch 'kvm-arm64/pmu-debug-fixes-5.11' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:08:41 +0000 (14:08 +0000)]
Merge branch 'kvm-arm64/pmu-debug-fixes-5.11' into kvmarm-master/next

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoMerge branch 'kvm-arm64/rng-5.12' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:08:25 +0000 (14:08 +0000)]
Merge branch 'kvm-arm64/rng-5.12' into kvmarm-master/next

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoMerge branch 'kvm-arm64/hyp-reloc' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:08:18 +0000 (14:08 +0000)]
Merge branch 'kvm-arm64/hyp-reloc' into kvmarm-master/next

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoMerge branch 'kvm-arm64/concurrent-translation-fault' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:08:13 +0000 (14:08 +0000)]
Merge branch 'kvm-arm64/concurrent-translation-fault' into kvmarm-master/next

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoMerge branch 'kvm-arm64/misc-5.12' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:08:07 +0000 (14:08 +0000)]
Merge branch 'kvm-arm64/misc-5.12' into kvmarm-master/next

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoMerge tag 'kvmarm-fixes-5.11-2' into kvmarm-master/next
Marc Zyngier [Fri, 12 Feb 2021 14:07:39 +0000 (14:07 +0000)]
Merge tag 'kvmarm-fixes-5.11-2' into kvmarm-master/next

KVM/arm64 fixes for 5.11, take #2

- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions

Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoKVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
Sean Christopherson [Wed, 10 Feb 2021 18:26:09 +0000 (10:26 -0800)]
KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes

Add a 2 byte pad to struct compat_vcpu_info so that the sum size of its
fields is actually 64 bytes.  The effective size without the padding is
also 64 bytes due to the compiler aligning evtchn_pending_sel to a 4-byte
boundary, but depending on compiler alignment is subtle and unnecessary.

Opportunistically replace spaces with tables in the other fields.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210182609.435200-6-seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Don't bother mapping GVA for Xen shinfo test
Sean Christopherson [Wed, 10 Feb 2021 18:26:08 +0000 (10:26 -0800)]
KVM: selftests: Don't bother mapping GVA for Xen shinfo test

Don't bother mapping the Xen shinfo pages into the guest, they don't need
to be accessed using the GVAs and passing a define with "GPA" in the name
to addr_gva2hpa() is confusing.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210182609.435200-5-seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Fix hex vs. decimal snafu in Xen test
Sean Christopherson [Wed, 10 Feb 2021 18:26:07 +0000 (10:26 -0800)]
KVM: selftests: Fix hex vs. decimal snafu in Xen test

The Xen shinfo selftest uses '40' when setting the GPA of the vCPU info
struct, but checks for the result at '0x40'.  Arbitrarily use the hex
version to resolve the bug.

Fixes: 8d4e7e80838f ("KVM: x86: declare Xen HVM shared info capability and add test case")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210182609.435200-4-seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Fix size of memslots created by Xen tests
Sean Christopherson [Wed, 10 Feb 2021 18:26:06 +0000 (10:26 -0800)]
KVM: selftests: Fix size of memslots created by Xen tests

For better or worse, the memslot APIs take the number of pages, not the
size in bytes.  The Xen tests need 2 pages, not 8192 pages.

Fixes: 8d4e7e80838f ("KVM: x86: declare Xen HVM shared info capability and add test case")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210182609.435200-3-seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Ignore recently added Xen tests' build output
Sean Christopherson [Wed, 10 Feb 2021 18:26:05 +0000 (10:26 -0800)]
KVM: selftests: Ignore recently added Xen tests' build output

Add the new Xen test binaries to KVM selftest's .gitnore.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210182609.435200-2-seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Add missing header file needed by xAPIC IPI tests
Peter Shier [Wed, 10 Feb 2021 01:17:47 +0000 (17:17 -0800)]
KVM: selftests: Add missing header file needed by xAPIC IPI tests

Fixes: 678e90a349a4 ("KVM: selftests: Test IPI to halted vCPU in xAPIC while backing page moves")
Cc: Andrew Jones <drjones@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210210011747.240913-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
Ricardo Koller [Wed, 10 Feb 2021 03:17:19 +0000 (03:17 +0000)]
KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c

Building the KVM selftests with LLVM's integrated assembler fails with:

  $ CFLAGS=-fintegrated-as make -C tools/testing/selftests/kvm CC=clang
  lib/x86_64/svm.c:77:16: error: too few operands for instruction
          asm volatile ("vmsave\n\t" : : "a" (vmcb_gpa) : "memory");
                        ^
  <inline asm>:1:2: note: instantiated into assembly here
          vmsave
          ^
  lib/x86_64/svm.c:134:3: error: too few operands for instruction
                  "vmload\n\t"
                  ^
  <inline asm>:1:2: note: instantiated into assembly here
          vmload
          ^
This is because LLVM IAS does not currently support calling vmsave,
vmload, or vmload without an explicit %rax operand.

Add an explicit operand to vmsave, vmload, and vmrum in svm.c. Fixing
this was suggested by Sean Christopherson.

Tested: building without this error in clang 11. The following patch
(not queued yet) needs to be applied to solve the other remaining error:
"selftests: kvm: remove reassignment of non-absolute variables".

Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/X+Df2oQczVBmwEzi@google.com/
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Message-Id: <20210210031719.769837-1-ricarkol@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
Wei Yongjun [Wed, 10 Feb 2021 07:59:58 +0000 (07:59 +0000)]
KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static

The sparse tool complains as follows:

arch/x86/kvm/svm/svm.c:204:6: warning:
 symbol 'svm_gp_erratum_intercept' was not declared. Should it be static?

This symbol is not used outside of svm.c, so this
commit marks it static.

Fixes: 82a11e9c6fa2b ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Message-Id: <20210210075958.1096317-1-weiyongjun1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'kvm-ppc-next-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Thu, 11 Feb 2021 13:00:04 +0000 (08:00 -0500)]
Merge tag 'kvm-ppc-next-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.12

- Support for second data watchpoint on POWER10, from Ravi Bangoria
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes from Nick Piggin and Fabiano Rosas

4 years agolocking/arch: Move qrwlock.h include after qspinlock.h
Waiman Long [Wed, 10 Feb 2021 18:16:31 +0000 (13:16 -0500)]
locking/arch: Move qrwlock.h include after qspinlock.h

include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via
asm-generic/qspinlock.h.  However, this does not work because architectures
might be using queued rwlocks but not queued spinlocks (csky), or because they
might be defining their own queued_* macros before including asm/qspinlock.h.

To fix this, ensure that asm/spinlock.h always includes qrwlock.h after
defining arch_spin_is_locked (either directly for csky, or via
asm/qspinlock.h for other architectures).  The only inclusion elsewhere
is in kernel/locking/qrwlock.c.  That one is really unnecessary because
the file is only compiled in SMP configurations (config QUEUED_RWLOCKS
depends on SMP) and in that case linux/spinlock.h already includes
asm/qrwlock.h if needed, via asm/spinlock.h.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Waiman Long <longman@redhat.com>
Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ben Gardon <bgardon@google.com>
[Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
Nicholas Piggin [Thu, 11 Feb 2021 05:38:30 +0000 (15:38 +1000)]
KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests

Commit 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side
channel") incorrectly removed the radix host instruction patch to skip
re-loading the host SLB entries when exiting from a hash
guest. Restore it.

Fixes: 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
Paul Mackerras [Thu, 11 Feb 2021 06:09:45 +0000 (17:09 +1100)]
KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries

Commit 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side
channel") changed the older guest entry path, with the side effect
that vcpu->arch.slb_max no longer gets cleared for a radix guest.
This means that a HPT guest which loads some SLB entries, switches to
radix mode, runs the guest using the old guest entry path (e.g.,
because the indep_threads_mode module parameter has been set to
false), and then switches back to HPT mode would now see the old SLB
entries being present, whereas previously it would have seen no SLB
entries.

To avoid changing guest-visible behaviour, this adds a store
instruction to clear vcpu->arch.slb_max for a radix guest using the
old guest entry path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
Fabiano Rosas [Fri, 5 Feb 2021 16:41:54 +0000 (13:41 -0300)]
KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2

These machines don't support running both MMU types at the same time,
so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is
using Radix MMU.

[paulus@ozlabs.org - added defensive check on
 kvmppc_hv_ops->hash_v3_possible]

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
Fabiano Rosas [Thu, 4 Feb 2021 20:05:17 +0000 (17:05 -0300)]
KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path

The Facility Status and Control Register is a privileged SPR that
defines the availability of some features in problem state. Since it
can be written by the guest, we must restore it to the previous host
value after guest exit.

This restoration is currently done by taking the value from
current->thread.fscr, which in the P9 path is not enough anymore
because the guest could context switch the QEMU thread, causing the
guest-current value to be saved into the thread struct.

The above situation manifested when running a QEMU linked against a
libc with System Call Vectored support, which causes scv
instructions to be run by QEMU early during the guest boot (during
SLOF), at which point the FSCR is 0 due to guest entry. After a few
scv calls (1 to a couple hundred), the context switching happens and
the QEMU thread runs with the guest value, resulting in a Facility
Unavailable interrupt.

This patch saves and restores the host value of FSCR in the inner
guest entry loop in a way independent of current->thread.fscr. The old
way of doing it is still kept in place because it works for the old
entry path.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: remove unneeded semicolon
Yang Li [Tue, 2 Feb 2021 03:30:53 +0000 (11:30 +0800)]
KVM: PPC: remove unneeded semicolon

Eliminate the following coccicheck warning:
./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
Nicholas Piggin [Mon, 18 Jan 2021 06:28:09 +0000 (16:28 +1000)]
KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB

IH=6 may preserve hypervisor real-mode ERAT entries and is the
recommended SLBIA hint for switching partitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
Nicholas Piggin [Mon, 18 Jan 2021 06:28:08 +0000 (16:28 +1000)]
KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Fix radix guest SLB side channel
Nicholas Piggin [Mon, 18 Jan 2021 06:28:07 +0000 (16:28 +1000)]
KVM: PPC: Book3S HV: Fix radix guest SLB side channel

The slbmte instruction is legal in radix mode, including radix guest
mode. This means radix guests can load the SLB with arbitrary data.

KVM host does not clear the SLB when exiting a guest if it was a
radix guest, which would allow a rogue radix guest to use the SLB as
a side channel to communicate with other guests.

Fix this by ensuring the SLB is cleared when coming out of a radix
guest. Only the first 4 entries are a concern, because radix guests
always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia
is not used (except in a non-performance-critical path) because it
can clear cached translations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed...
Nicholas Piggin [Mon, 18 Jan 2021 06:28:06 +0000 (16:28 +1000)]
KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support

This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT
guests on POWER9 radix hosts"), which was required to run HPT guests on
RPT hosts on early POWER9 CPUs without support for "mixed mode", which
meant the host could not run with MMU on while guests were running.

This code has some corner case bugs, e.g., when the guest hits a machine
check or HMI the primary locks up waiting for secondaries to switch LPCR
to host, which they never do. This could all be fixed in software, but
most CPUs in production have mixed mode support, and those that don't
are believed to be all in installations that don't use this capability.
So simplify things and remove support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
Ravi Bangoria [Wed, 16 Dec 2020 10:42:19 +0000 (16:12 +0530)]
KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR

Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether
KVM supports 2nd DAWR or not. The capability is by default disabled
even when the underlying CPU supports 2nd DAWR. QEMU needs to check
and enable it manually to use the feature.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
Ravi Bangoria [Wed, 16 Dec 2020 10:42:18 +0000 (16:12 +0530)]
KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR

KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.
Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Rename current DAWR macros and variables
Ravi Bangoria [Wed, 16 Dec 2020 10:42:17 +0000 (16:12 +0530)]
KVM: PPC: Book3S HV: Rename current DAWR macros and variables

Power10 is introducing a second DAWR (Data Address Watchpoint
Register). Use real register names (with suffix 0) from ISA for
current macros and variables used by kvm.  One exception is
KVM_REG_PPC_DAWR.  Keep it as it is because it's uapi so changing it
will break userspace.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoKVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1
Ravi Bangoria [Wed, 16 Dec 2020 10:42:16 +0000 (16:12 +0530)]
KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1

On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED
hcall to load L2 guest state in cpu. L1 hypervisor prepares the
L2 state in struct hv_guest_state and passes a pointer to it via
hcall. Using that pointer, L0 reads/writes that state directly
from/to L1 memory. Thus L0 must be aware of hv_guest_state layout
of L1. Currently it uses version field to achieve this. i.e. If
L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't
allow nested kvm guest.

This restriction can be loosened up a bit. L0 can be taught to
understand older layout of hv_guest_state, if we restrict the
new members to be added only at the end, i.e. we can allow
nested guest even when L0 hv_guest_state.version > L1
hv_guest_state.version. Though, the other way around is not
possible.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years agoDocumentation: kvm: fix warning
Paolo Bonzini [Tue, 9 Feb 2021 10:07:45 +0000 (05:07 -0500)]
Documentation: kvm: fix warning

Documentation/virt/kvm/api.rst:4927: WARNING: Title underline too short.

4.130 KVM_XEN_VCPU_GET_ATTR
--------------------------

Fixes: e1f68169a4f8 ("KVM: Add documentation for Xen hypercall and shared_info updates")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/xen: Allow reset of Xen attributes
David Woodhouse [Mon, 8 Feb 2021 23:23:25 +0000 (23:23 +0000)]
KVM: x86/xen: Allow reset of Xen attributes

In order to support Xen SHUTDOWN_soft_reset (for guest kexec, etc.) the
VMM needs to be able to tear everything down and return the Xen features
to a clean slate.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210208232326.1830370-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Make HVA handler retpoline-friendly
Maciej S. Szmigiero [Mon, 8 Feb 2021 18:51:32 +0000 (19:51 +0100)]
KVM: x86/mmu: Make HVA handler retpoline-friendly

When retpolines are enabled they have high overhead in the inner loop
inside kvm_handle_hva_range() that iterates over the provided memory area.

Let's mark this function and its TDP MMU equivalent __always_inline so
compiler will be able to change the call to the actual handler function
inside each of them into a direct one.

This significantly improves performance on the unmap test on the existing
kernel memslot code (tested on a Xeon 8167M machine):
30 slots in use:
Test       Before   After     Improvement
Unmap      0.0353s  0.0334s   5%
Unmap 2M   0.00104s 0.000407s 61%

509 slots in use:
Test       Before   After     Improvement
Unmap      0.0742s  0.0740s   None
Unmap 2M   0.00221s 0.00159s  28%

Looks like having an indirect call in these functions (and, so, a
retpoline) might have interfered with unrolling of the whole loop in the
CPU.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Drop hv_vcpu_to_vcpu() helper
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:16 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Drop hv_vcpu_to_vcpu() helper

hv_vcpu_to_vcpu() helper is only used by other helpers and
is not very complex, we can drop it without much regret.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-16-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Allocate Hyper-V context lazily
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:15 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Allocate Hyper-V context lazily

Hyper-V context is only needed for guests which use Hyper-V emulation in
KVM (e.g. Windows/Hyper-V guests) so we don't actually need to allocate
it in kvm_arch_vcpu_create(), we can postpone the action until Hyper-V
specific MSRs are accessed or SynIC is enabled.

Once allocated, let's keep the context alive for the lifetime of the vCPU
as an attempt to free it would require additional synchronization with
other vCPUs and normally it is not supposed to happen.

Note, Hyper-V style hypercall enablement is done by writing to
HV_X64_MSR_GUEST_OS_ID so we don't need to worry about allocating Hyper-V
context from kvm_hv_hypercall().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-15-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Make Hyper-V emulation enablement conditional
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:14 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional

Hyper-V emulation is enabled in KVM unconditionally. This is bad at least
from security standpoint as it is an extra attack surface. Ideally, there
should be a per-VM capability explicitly enabled by VMM but currently it
is not the case and we can't mandate one without breaking backwards
compatibility. We can, however, check guest visible CPUIDs and only enable
Hyper-V emulation when "Hv#1" interface was exposed in
HYPERV_CPUID_INTERFACE.

Note, VMMs are free to act in any sequence they like, e.g. they can try
to set MSRs first and CPUIDs later so we still need to allow the host
to read/write Hyper-V specific MSRs unconditionally.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-14-vkuznets@redhat.com>
[Add selftest vcpu_set_hv_cpuid API to avoid breaking xen_vmcall_test. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Allocate 'struct kvm_vcpu_hv' dynamically
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:13 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Allocate 'struct kvm_vcpu_hv' dynamically

Hyper-V context is only needed for guests which use Hyper-V emulation in
KVM (e.g. Windows/Hyper-V guests). 'struct kvm_vcpu_hv' is, however, quite
big, it accounts for more than 1/4 of the total 'struct kvm_vcpu_arch'
which is also quite big already. This all looks like a waste.

Allocate 'struct kvm_vcpu_hv' dynamically. This patch does not bring any
(intentional) functional change as we still allocate the context
unconditionally but it paves the way to doing that only when needed.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-13-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:12 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context

Currently, Hyper-V context is part of 'struct kvm_vcpu_arch' and is always
available. As a preparation to allocating it dynamically, check that it is
not NULL at call sites which can normally proceed without it i.e. the
behavior is identical to the situation when Hyper-V emulation is not being
used by the guest.

When Hyper-V context for a particular vCPU is not allocated, we may still
need to get 'vp_index' from there. E.g. in a hypothetical situation when
Hyper-V emulation was enabled on one CPU and wasn't on another, Hyper-V
style send-IPI hypercall may still be used. Luckily, vp_index is always
initialized to kvm_vcpu_get_idx() and can only be changed when Hyper-V
context is present. Introduce kvm_hv_get_vpindex() helper for
simplification.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-12-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct kvm_vcpu_hv'
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:11 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct kvm_vcpu_hv'

As a preparation to allocating Hyper-V context dynamically, make it clear
who's the user of the said context.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-11-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Stop shadowing global 'current_vcpu' variable
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:10 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Stop shadowing global 'current_vcpu' variable

'current_vcpu' variable in KVM is a per-cpu pointer to the currently
scheduled vcpu. kvm_hv_flush_tlb()/kvm_hv_send_ipi() functions used
to have local 'vcpu' variable to iterate over vCPUs but it's gone
now and there's no need to use anything but the standard 'vcpu' as
an argument.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Introduce to_kvm_hv() helper
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:09 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Introduce to_kvm_hv() helper

Spelling '&kvm->arch.hyperv' correctly is hard. Also, this makes the code
more consistent with vmx/svm where to_kvm_vmx()/to_kvm_svm() are already
being used.

Opportunistically change kvm_hv_msr_{get,set}_crash_{data,ctl}() and
kvm_hv_msr_set_crash_data() to take 'kvm' instead of 'vcpu' as these
MSRs are partition wide.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Rename vcpu_to_hv_syndbg() to to_hv_syndbg()
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:08 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Rename vcpu_to_hv_syndbg() to to_hv_syndbg()

vcpu_to_hv_syndbg()'s argument is  always 'vcpu' so there's no need to have
an additional prefix. Also, this makes the code more consistent with
vmx/svm where to_vmx()/to_svm() are being used.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Rename vcpu_to_stimer()/stimer_to_vcpu()
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:07 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Rename vcpu_to_stimer()/stimer_to_vcpu()

vcpu_to_stimers()'s argument is almost always 'vcpu' so there's no need to
have an additional prefix. Also, this makes the naming more consistent with
to_hv_vcpu()/to_hv_synic().

Rename stimer_to_vcpu() to hv_stimer_to_vcpu() for consitency.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Rename vcpu_to_synic()/synic_to_vcpu()
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:06 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Rename vcpu_to_synic()/synic_to_vcpu()

vcpu_to_synic()'s argument is almost always 'vcpu' so there's no need to
have an additional prefix. Also, as this is used outside of hyper-v
emulation code, add '_hv_' part to make it clear what this s. This makes
the naming more consistent with to_hv_vcpu().

Rename synic_to_vcpu() to hv_synic_to_vcpu() for consistency.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Rename vcpu_to_hv_vcpu() to to_hv_vcpu()
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:05 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Rename vcpu_to_hv_vcpu() to to_hv_vcpu()

vcpu_to_hv_vcpu()'s argument is almost always 'vcpu' so there's
no need to have an additional prefix. Also, this makes the code
more consistent with vmx/svm where to_vmx()/to_svm() are being
used.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: hyper-v: Drop unused kvm_hv_vapic_assist_page_enabled()
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:04 +0000 (14:48 +0100)]
KVM: x86: hyper-v: Drop unused kvm_hv_vapic_assist_page_enabled()

kvm_hv_vapic_assist_page_enabled() seems to be unused since its
introduction in commit 10388a07164c1 ("KVM: Add HYPER-V apic access MSRs"),
drop it.

Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoselftests: kvm: Properly set Hyper-V CPUIDs in evmcs_test
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:03 +0000 (14:48 +0100)]
selftests: kvm: Properly set Hyper-V CPUIDs in evmcs_test

Generally, when Hyper-V emulation is enabled, VMM is supposed to set
Hyper-V CPUID identifications so the guest knows that Hyper-V features
are available. evmcs_test doesn't currently do that but so far Hyper-V
emulation in KVM was enabled unconditionally. As we are about to change
that, proper Hyper-V CPUID identification should be set in selftests as
well.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoselftests: kvm: Move kvm_get_supported_hv_cpuid() to common code
Vitaly Kuznetsov [Tue, 26 Jan 2021 13:48:02 +0000 (14:48 +0100)]
selftests: kvm: Move kvm_get_supported_hv_cpuid() to common code

kvm_get_supported_hv_cpuid() may come handy in all Hyper-V related tests.
Split it off hyperv_cpuid test, create system-wide and vcpu versions.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210126134816.1880136-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Raise the maximum number of user memslots
Vitaly Kuznetsov [Thu, 28 Jan 2021 18:01:31 +0000 (13:01 -0500)]
KVM: Raise the maximum number of user memslots

Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86,
32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are
allocated dynamically in KVM when added so the only real limitation is
'id_to_index' array which is 'short'. We don't have any other
KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures.

Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations.
In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC
enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per
vCPU and the guest is free to pick any GFN for each of them, this fragments
memslots as QEMU wants to have a separate memslot for each of these pages
(which are supposed to act as 'overlay' pages).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoselftests: kvm: Raise the default timeout to 120 seconds
Vitaly Kuznetsov [Wed, 27 Jan 2021 17:57:31 +0000 (18:57 +0100)]
selftests: kvm: Raise the default timeout to 120 seconds

With the updated maximum number of user memslots (32)
set_memory_region_test sometimes takes longer than the default 45 seconds
to finish. Raise the value to an arbitrary 120 seconds.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210127175731.2020089-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: move kvm_inject_gp up from kvm_set_dr to callers
Paolo Bonzini [Mon, 14 Dec 2020 12:49:54 +0000 (07:49 -0500)]
KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers

Push the injection of #GP up to the callers, so that they can just use
kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use
together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop
the old kvm_set_dr wrapper.

This also allows nested VMX code, which really wanted to use __kvm_set_dr,
to use the right function.

While at it, remove the kvm_require_dr() check from the SVM interception.
The APM states:

  All normal exception checks take precedence over the SVM intercepts.

which includes the CR4.DE=1 #UD.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: reading DR cannot fail
Paolo Bonzini [Wed, 3 Feb 2021 08:42:41 +0000 (03:42 -0500)]
KVM: x86: reading DR cannot fail

kvm_get_dr and emulator_get_dr except an in-range value for the register
number so they cannot fail.  Change the return type to void.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: SVM: Remove an unnecessary forward declaration
Sean Christopherson [Fri, 5 Feb 2021 00:57:43 +0000 (16:57 -0800)]
KVM: SVM: Remove an unnecessary forward declaration

Drop a defunct forward declaration of svm_complete_interrupts().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: SVM: Move AVIC vCPU kicking snippet to helper function
Sean Christopherson [Fri, 5 Feb 2021 00:57:42 +0000 (16:57 -0800)]
KVM: SVM: Move AVIC vCPU kicking snippet to helper function

Add a helper function to handle kicking non-running vCPUs when sending
virtual IPIs.  A future patch will change SVM's interception functions
to take @vcpu instead of @svm, at which piont declaring and modifying
'vcpu' in a case statement is confusing, and potentially dangerous.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
Sean Christopherson [Fri, 5 Feb 2021 01:24:58 +0000 (17:24 -0800)]
KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64

Restore the full 64-bit values of DR6 and DR7 when emulating RSM on
x86-64, as defined by both Intel's SDM and AMD's APM.

Note, bits 63:32 of DR6 and DR7 are reserved, so this is a glorified nop
unless the SMM handler is poking into SMRAM, which it most definitely
shouldn't be doing since both Intel and AMD list the DR6 and DR7 fields
as read-only.

Fixes: 660a5d517aaa ("KVM: x86: save/load state on SMM switch")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205012458.3872687-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Remove misleading DR6/DR7 adjustments from RSM emulation
Sean Christopherson [Fri, 5 Feb 2021 01:24:57 +0000 (17:24 -0800)]
KVM: x86: Remove misleading DR6/DR7 adjustments from RSM emulation

Drop the DR6/7 volatile+fixed bits adjustments in RSM emulation, which
are redundant and misleading.  The necessary adjustments are made by
kvm_set_dr(), which properly sets the fixed bits that are conditional
on the vCPU model.

Note, KVM incorrectly reads only bits 31:0 of the DR6/7 fields when
emulating RSM on x86-64.  On the plus side for this change, that bug
makes removing "& DRx_VOLATILE" a nop.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205012458.3872687-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/xen: Use hva_t for holding hypercall page address
Sean Christopherson [Mon, 8 Feb 2021 20:15:02 +0000 (12:15 -0800)]
KVM: x86/xen: Use hva_t for holding hypercall page address

Use hva_t, a.k.a. unsigned long, for the local variable that holds the
hypercall page address.  On 32-bit KVM, gcc complains about using a u64
due to the implicit cast from a 64-bit value to a 32-bit pointer.

  arch/x86/kvm/xen.c: In function â€˜kvm_xen_write_hypercall_page’:
  arch/x86/kvm/xen.c:300:22: error: cast to pointer from integer of
                             different size [-Werror=int-to-pointer-cast]
  300 |   page = memdup_user((u8 __user *)blob_addr, PAGE_SIZE);

Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Fixes: 23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201502.1239867-1-seanjc@google.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/xen: Remove extra unlock in kvm_xen_hvm_set_attr()
David Woodhouse [Mon, 8 Feb 2021 23:23:26 +0000 (23:23 +0000)]
KVM: x86/xen: Remove extra unlock in kvm_xen_hvm_set_attr()

This accidentally ended up locking and then immediately unlocking kvm->lock
at the beginning of the function. Fix it.

Fixes: a76b9641ad1c ("KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210208232326.1830370-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
Sean Christopherson [Mon, 8 Feb 2021 20:19:40 +0000 (12:19 -0800)]
KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()

Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair.  In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.

This bug was inadvertantly exposed by commit bd2fae8da794 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to comlain about overflowing the unsigned long.

  arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function â€˜hva_to_pfn_remapped’:
  include/linux/kvm_host.h:89:30: error: conversion from â€˜long long unsigned int’
                                  to â€˜long unsigned int’ changes value from
                                  â€˜9218868437227405314’ to â€˜2’ [-Werror=overflow]
   89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
      |                              ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro â€˜KVM_PFN_ERR_RO_FAULT’

Cc: stable@vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201940.1258328-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agomm: provide a saner PTE walking API for modules
Paolo Bonzini [Fri, 5 Feb 2021 10:07:11 +0000 (05:07 -0500)]
mm: provide a saner PTE walking API for modules

Currently, the follow_pfn function is exported for modules but
follow_pte is not.  However, follow_pfn is very easy to misuse,
because it does not provide protections (so most of its callers
assume the page is writable!) and because it returns after having
already unlocked the page table lock.

Provide instead a simplified version of follow_pte that does
not have the pmdpp and range arguments.  The older version
survives as follow_invalidate_pte() for use by fs/dax.c.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: compile out TDP MMU on 32-bit systems
Paolo Bonzini [Sat, 6 Feb 2021 14:53:33 +0000 (09:53 -0500)]
KVM: x86: compile out TDP MMU on 32-bit systems

The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs.
Rather than just disabling it, compile it out completely so that it
is possible to use for example 64-bit xchg.

To limit the number of stubs, wrap all accesses to tdp_mmu_enabled
or tdp_mmu_page with a function.  Calls to all other functions in
tdp_mmu.c are eliminated and do not even reach the linker.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoi915: kvmgt: the KVM mmu_lock is now an rwlock
Paolo Bonzini [Mon, 8 Feb 2021 11:29:28 +0000 (06:29 -0500)]
i915: kvmgt: the KVM mmu_lock is now an rwlock

Adjust the KVMGT page tracking callbacks.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Add helper to consolidate "raw" reserved GPA mask calculations
Sean Christopherson [Thu, 4 Feb 2021 00:01:15 +0000 (16:01 -0800)]
KVM: x86: Add helper to consolidate "raw" reserved GPA mask calculations

Add a helper to generate the mask of reserved GPA bits _without_ any
adjustments for repurposed bits, and use it to replace a variety of
open coded variants in the MTRR and APIC_BASE flows.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Add helper to generate mask of reserved HPA bits
Sean Christopherson [Thu, 4 Feb 2021 00:01:14 +0000 (16:01 -0800)]
KVM: x86/mmu: Add helper to generate mask of reserved HPA bits

Add a helper to generate the mask of reserved PA bits in the host.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Use reserved_gpa_bits to calculate reserved PxE bits
Sean Christopherson [Thu, 4 Feb 2021 00:01:13 +0000 (16:01 -0800)]
KVM: x86: Use reserved_gpa_bits to calculate reserved PxE bits

Use reserved_gpa_bits, which accounts for exceptions to the maxphyaddr
rule, e.g. SEV's C-bit, for the page {table,directory,etc...} entry (PxE)
reserved bits checks.  For SEV, the C-bit is ignored by hardware when
walking pages tables, e.g. the APM states:

  Note that while the guest may choose to set the C-bit explicitly on
  instruction pages and page table addresses, the value of this bit is a
  don't-care in such situations as hardware always performs these as
  private accesses.

Such behavior is expected to hold true for other features that repurpose
GPA bits, e.g. KVM could theoretically emulate SME or MKTME, which both
allow non-zero repurposed bits in the page tables.  Conceptually, KVM
should apply reserved GPA checks universally, and any features that do
not adhere to the basic rule should be explicitly handled, i.e. if a GPA
bit is repurposed but not allowed in page tables for whatever reason.

Refactor __reset_rsvds_bits_mask() to take the pre-generated reserved
bits mask, and opportunistically clean up its code, e.g. to align lines
and comments.

Practically speaking, this is change is a likely a glorified nop given
the current KVM code base.  SEV's C-bit is the only repurposed GPA bit,
and KVM doesn't support shadowing encrypted page tables (which is
theoretically possible via SEV debug APIs).

Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode
Sean Christopherson [Thu, 4 Feb 2021 00:01:12 +0000 (16:01 -0800)]
KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode

Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA
legality checks.  AMD's APM states:

  If the C-bit is an address bit, this bit is masked from the guest
  physical address when it is translated through the nested page tables.

Thus, any access that can conceivably be run through NPT should ignore
the C-bit when checking for validity.

For features that KVM emulates in software, e.g. MTRRs, there is no
clear direction in the APM for how the C-bit should be handled.  For
such cases, follow the SME behavior inasmuch as possible, since SEV is
is essentially a VM-specific variant of SME.  For SME, the APM states:

  In this case the upper physical address bits are treated as reserved
  when the feature is enabled except where otherwise indicated.

Collecting the various relavant SME snippets in the APM and cross-
referencing the omissions with Linux kernel code, this leaves MTTRs and
APIC_BASE as the only flows that KVM emulates that should _not_ ignore
the C-bit.

Note, this means the reserved bit checks in the page tables are
technically broken.  This will be remedied in a future patch.

Although the page table checks are technically broken, in practice, it's
all but guaranteed to be irrelevant.  NPT is required for SEV, i.e.
shadowing page tables isn't needed in the common case.  Theoretically,
the checks could be in play for nested NPT, but it's extremely unlikely
that anyone is running nested VMs on SEV, as doing so would require L1
to expose sensitive data to L0, e.g. the entire VMCB.  And if anyone is
running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1
would need to put its NPT in shared memory, in which case the C-bit will
never be set.  Or, L1 could use shadow paging, but again, if L0 needs to
read page tables, e.g. to load PDPTRs, the memory can't be encrypted if
L1 has any expectation of L0 doing the right thing.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: nSVM: Use common GPA helper to check for illegal CR3
Sean Christopherson [Thu, 4 Feb 2021 00:01:11 +0000 (16:01 -0800)]
KVM: nSVM: Use common GPA helper to check for illegal CR3

Replace an open coded check for an invalid CR3 with its equivalent
helper.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: VMX: Use GPA legality helpers to replace open coded equivalents
Sean Christopherson [Thu, 4 Feb 2021 00:01:10 +0000 (16:01 -0800)]
KVM: VMX: Use GPA legality helpers to replace open coded equivalents

Replace a variety of open coded GPA checks with the recently introduced
common helpers.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Add a helper to handle legal GPA with an alignment requirement
Sean Christopherson [Thu, 4 Feb 2021 00:01:09 +0000 (16:01 -0800)]
KVM: x86: Add a helper to handle legal GPA with an alignment requirement

Add a helper to genericize checking for a legal GPA that also must
conform to an arbitrary alignment, and use it in the existing
page_address_valid().  Future patches will replace open coded variants
in VMX and SVM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Add a helper to check for a legal GPA
Sean Christopherson [Thu, 4 Feb 2021 00:01:08 +0000 (16:01 -0800)]
KVM: x86: Add a helper to check for a legal GPA

Add a helper to check for a legal GPA, and use it to consolidate code
in existing, related helpers.  Future patches will extend usage to
VMX and SVM code, properly handle exceptions to the maxphyaddr rule, and
add more helpers.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
Sean Christopherson [Thu, 4 Feb 2021 00:01:07 +0000 (16:01 -0800)]
KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs

Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is
in the guest domain.

Barring a bizarre paravirtual use case, this is likely a benign bug.  SME
is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't
use the correct key to read guest memory, and setting guest MAXPHYADDR
higher than the host, i.e. overlapping the C-bit, would cause faults in
the guest.

Note, for SEV guests, stripping the C-bit is technically aligned with CPU
behavior, but for KVM it's the greater of two evils.  Because KVM doesn't
have access to the guest's encryption key, ignoring the C-bit would at
best result in KVM reading garbage.  By keeping the C-bit, KVM will
fail its read (unless userspace creates a memslot with the C-bit set).
The guest will still undoubtedly die, as KVM will use '0' for the PDPTR
value, but that's preferable to interpreting encrypted data as a PDPTR.

Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset
Sean Christopherson [Thu, 4 Feb 2021 00:01:06 +0000 (16:01 -0800)]
KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset

Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
reset.  The reserved bits check needs to be done even if userspace never
configures the guest's CPUID model.

Cc: stable@vger.kernel.org
Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Add documentation for Xen hypercall and shared_info updates
David Woodhouse [Fri, 4 Dec 2020 09:03:56 +0000 (09:03 +0000)]
KVM: Add documentation for Xen hypercall and shared_info updates

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86: declare Xen HVM shared info capability and add test case
David Woodhouse [Fri, 4 Dec 2020 01:02:04 +0000 (01:02 +0000)]
KVM: x86: declare Xen HVM shared info capability and add test case

Instead of adding a plethora of new KVM_CAP_XEN_FOO capabilities, just
add bits to the return value of KVM_CAP_XEN_HVM.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Add event channel interrupt vector upcall
David Woodhouse [Wed, 9 Dec 2020 20:08:30 +0000 (20:08 +0000)]
KVM: x86/xen: Add event channel interrupt vector upcall

It turns out that we can't handle event channels *entirely* in userspace
by delivering them as ExtINT, because KVM is a bit picky about when it
accepts ExtINT interrupts from a legacy PIC. The in-kernel local APIC
has to have LVT0 configured in APIC_MODE_EXTINT and unmasked, which
isn't necessarily the case for Xen guests especially on secondary CPUs.

To cope with this, add kvm_xen_get_interrupt() which checks the
evtchn_pending_upcall field in the Xen vcpu_info, and delivers the Xen
upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if it's
set regardless of LAPIC LVT0 configuration. This gives us the minimum
support we need for completely userspace-based implementation of event
channels.

This does mean that vcpu_enter_guest() needs to check for the
evtchn_pending_upcall flag being set, because it can't rely on someone
having set KVM_REQ_EVENT unless we were to add some way for userspace to
do so manually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: register vcpu time info region
Joao Martins [Mon, 23 Jul 2018 15:20:57 +0000 (11:20 -0400)]
KVM: x86/xen: register vcpu time info region

Allow the Xen emulated guest the ability to register secondary
vcpu time information. On Xen guests this is used in order to be
mapped to userspace and hence allow vdso gettimeofday to work.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: setup pvclock updates
Joao Martins [Fri, 1 Feb 2019 18:01:45 +0000 (13:01 -0500)]
KVM: x86/xen: setup pvclock updates

Parameterise kvm_setup_pvclock_page() a little bit so that it can be
invoked for different gfn_to_hva_cache structures, and with different
offsets. Then we can invoke it for the normal KVM pvclock and also for
the Xen one in the vcpu_info.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: register vcpu info
Joao Martins [Fri, 29 Jun 2018 14:52:52 +0000 (10:52 -0400)]
KVM: x86/xen: register vcpu info

The vcpu info supersedes the per vcpu area of the shared info page and
the guest vcpus will use this instead.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Add KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR
David Woodhouse [Tue, 2 Feb 2021 16:53:25 +0000 (16:53 +0000)]
KVM: x86/xen: Add KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR

This will be used for per-vCPU setup such as runstate and vcpu_info.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: update wallclock region
Joao Martins [Thu, 28 Jun 2018 19:06:43 +0000 (15:06 -0400)]
KVM: x86/xen: update wallclock region

Wallclock on Xen is written in the shared_info page.

To that purpose, export kvm_write_wall_clock() and pass on the GPA of
its location to populate the shared_info wall clock data.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoxen: add wc_sec_hi to struct shared_info
David Woodhouse [Thu, 3 Dec 2020 21:02:23 +0000 (21:02 +0000)]
xen: add wc_sec_hi to struct shared_info

Xen added this in 2015 (Xen 4.6). On x86_64 and Arm it fills what was
previously a 32-bit hole in the generic shared_info structure; on
i386 it had to go at the end of struct arch_shared_info.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: register shared_info page
Joao Martins [Sat, 16 Jun 2018 01:17:14 +0000 (21:17 -0400)]
KVM: x86/xen: register shared_info page

Add KVM_XEN_ATTR_TYPE_SHARED_INFO to allow hypervisor to know where the
guest's shared info page is.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info
David Woodhouse [Thu, 3 Dec 2020 18:45:22 +0000 (18:45 +0000)]
KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info

There aren't a lot of differences for the things that the kernel needs
to care about, but there are a few.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: latch long_mode when hypercall page is set up
David Woodhouse [Thu, 3 Dec 2020 16:20:32 +0000 (16:20 +0000)]
KVM: x86/xen: latch long_mode when hypercall page is set up

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR
Joao Martins [Thu, 3 Dec 2020 15:52:25 +0000 (15:52 +0000)]
KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR

This will be used to set up shared info pages etc.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Add kvm_xen_enabled static key
David Woodhouse [Tue, 2 Feb 2021 15:48:05 +0000 (15:48 +0000)]
KVM: x86/xen: Add kvm_xen_enabled static key

The code paths for Xen support are all fairly lightweight but if we hide
them behind this, they're even *more* lightweight for any system which
isn't actually hosting Xen guests.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Move KVM_XEN_HVM_CONFIG handling to xen.c
David Woodhouse [Tue, 2 Feb 2021 13:19:35 +0000 (13:19 +0000)]
KVM: x86/xen: Move KVM_XEN_HVM_CONFIG handling to xen.c

This is already more complex than the simple memcpy it originally had.
Move it to xen.c with the rest of the Xen support.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls
Joao Martins [Wed, 13 Jun 2018 13:55:44 +0000 (09:55 -0400)]
KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls

Disambiguate Xen vs. Hyper-V calls by adding 'orl $0x80000000, %eax'
at the start of the Hyper-V hypercall page when Xen hypercalls are
also enabled.

That bit is reserved in the Hyper-V ABI, and those hypercall numbers
will never be used by Xen (because it does precisely the same trick).

Switch to using kvm_vcpu_write_guest() while we're at it, instead of
open-coding it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: intercept xen hypercalls if enabled
Joao Martins [Wed, 13 Jun 2018 13:55:44 +0000 (09:55 -0400)]
KVM: x86/xen: intercept xen hypercalls if enabled

Add a new exit reason for emulator to handle Xen hypercalls.

Since this means KVM owns the ABI, dispense with the facility for the
VMM to provide its own copy of the hypercall pages; just fill them in
directly using VMCALL/VMMCALL as we do for the Hyper-V hypercall page.

This behaviour is enabled by a new INTERCEPT_HCALL flag in the
KVM_XEN_HVM_CONFIG ioctl structure, and advertised by the same flag
being returned from the KVM_CAP_XEN_HVM check.

Rename xen_hvm_config() to kvm_xen_write_hypercall_page() and move it
to the nascent xen.c while we're at it, and add a test case.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: Fix __user pointer handling for hypercall page installation
David Woodhouse [Tue, 2 Feb 2021 11:05:10 +0000 (11:05 +0000)]
KVM: x86/xen: Fix __user pointer handling for hypercall page installation

The address we give to memdup_user() isn't correctly tagged as __user.
This is harmless enough as it's a one-off use and we're doing exactly
the right thing, but fix it anyway to shut the checker up. Otherwise
it'll whine when the (now legacy) code gets moved around in a later
patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/xen: fix Xen hypercall page msr handling
Joao Martins [Wed, 13 Jun 2018 10:10:37 +0000 (06:10 -0400)]
KVM: x86/xen: fix Xen hypercall page msr handling

Xen usually places its MSR at 0x40000000 or 0x40000200 depending on
whether it is running in viridian mode or not. Note that this is not
ABI guaranteed, so it is possible for Xen to advertise the MSR some
place else.

Given the way xen_hvm_config() is handled, if the former address is
selected, this will conflict with Hyper-V's MSR
(HV_X64_MSR_GUEST_OS_ID) which unconditionally uses the same address.

Given that the MSR location is arbitrary, move the xen_hvm_config()
handling to the top of kvm_set_msr_common() before falling through.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
4 years agoKVM: x86/mmu: Allow parallel page faults for the TDP MMU
Ben Gardon [Tue, 2 Feb 2021 18:57:29 +0000 (10:57 -0800)]
KVM: x86/mmu: Allow parallel page faults for the TDP MMU

Make the last few changes necessary to enable the TDP MMU to handle page
faults in parallel while holding the mmu_lock in read mode.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-24-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Mark SPTEs in disconnected pages as removed
Ben Gardon [Tue, 2 Feb 2021 18:57:28 +0000 (10:57 -0800)]
KVM: x86/mmu: Mark SPTEs in disconnected pages as removed

When clearing TDP MMU pages what have been disconnected from the paging
structure root, set the SPTEs to a special non-present value which will
not be overwritten by other threads. This is needed to prevent races in
which a thread is clearing a disconnected page table, but another thread
has already acquired a pointer to that memory and installs a mapping in
an already cleared entry. This can lead to memory leaks and accounting
errors.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-23-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Flush TLBs after zap in TDP MMU PF handler
Ben Gardon [Tue, 2 Feb 2021 18:57:27 +0000 (10:57 -0800)]
KVM: x86/mmu: Flush TLBs after zap in TDP MMU PF handler

When the TDP MMU is allowed to handle page faults in parallel there is
the possiblity of a race where an SPTE is cleared and then imediately
replaced with a present SPTE pointing to a different PFN, before the
TLBs can be flushed. This race would violate architectural specs. Ensure
that the TLBs are flushed properly before other threads are allowed to
install any present value for the SPTE.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-22-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map
Ben Gardon [Tue, 2 Feb 2021 18:57:26 +0000 (10:57 -0800)]
KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map

To prepare for handling page faults in parallel, change the TDP MMU
page fault handler to use atomic operations to set SPTEs so that changes
are not lost if multiple threads attempt to modify the same SPTE.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-21-bgardon@google.com>
[Document new locking rules. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Factor out functions to add/remove TDP MMU pages
Ben Gardon [Tue, 2 Feb 2021 18:57:25 +0000 (10:57 -0800)]
KVM: x86/mmu: Factor out functions to add/remove TDP MMU pages

Move the work of adding and removing TDP MMU pages to/from  "secondary"
data structures to helper functions. These functions will be built on in
future commits to enable MMU operations to proceed (mostly) in parallel.

No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-20-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/mmu: Use an rwlock for the x86 MMU
Ben Gardon [Tue, 2 Feb 2021 18:57:24 +0000 (10:57 -0800)]
KVM: x86/mmu: Use an rwlock for the x86 MMU

Add a read / write lock to be used in place of the MMU spinlock on x86.
The rwlock will enable the TDP MMU to handle page faults, and other
operations in parallel in future commits.

Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-19-bgardon@google.com>
[Introduce virt/kvm/mmu_lock.h - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agosched: Add cond_resched_rwlock
Ben Gardon [Tue, 2 Feb 2021 18:57:14 +0000 (10:57 -0800)]
sched: Add cond_resched_rwlock

Safely rescheduling while holding a spin lock is essential for keeping
long running kernel operations running smoothly. Add the facility to
cond_resched rwlocks.

CC: Ingo Molnar <mingo@redhat.com>
CC: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agosched: Add needbreak for rwlocks
Ben Gardon [Tue, 2 Feb 2021 18:57:13 +0000 (10:57 -0800)]
sched: Add needbreak for rwlocks

Contention awareness while holding a spin lock is essential for reducing
latency when long running kernel operations can hold that lock. Add the
same contention detection interface for read/write spin locks.

CC: Ingo Molnar <mingo@redhat.com>
CC: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>