KVM/x86: Add IBPB support
author     Ashok Raj <ashok.raj@intel.com>
           Thu, 1 Feb 2018 21:59:43 +0000 (22:59 +0100)
committer  Brian Maly <brian.maly@oracle.com>
           Tue, 8 Jan 2019 16:12:19 +0000 (11:12 -0500)
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB share the same CPUID enumeration,
IBPB is very different.
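
In practice the barrier is a single write of bit 0 to the write-only
IA32_PRED_CMD MSR. A minimal sketch of how that looks (issue_ibpb() is an
illustrative name; FEATURE_SET_IBPB is the constant this tree uses in the
diff below, known upstream as PRED_CMD_IBPB):

  #include <asm/msr.h>            /* wrmsrl() */
  #include <asm/msr-index.h>      /* MSR_IA32_PRED_CMD */

  /*
   * Sketch only: writing bit 0 of the write-only IA32_PRED_CMD MSR acts
   * as the barrier, so indirect branch targets learned before this point
   * cannot steer predictions after it.
   */
  static inline void issue_ibpb(void)
  {
          wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
  }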

IBPB helps mitigate three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch (see
    the sketch below, after these bullets).

* Mitigate attacks from guest/ring3->host/ring3.
  These would require an IBPB during a context switch in the host, or after
  VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline
  - If it's going through a context switch and has set !dumpable, then
    there is an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - Returning to Qemu after a VMEXIT may leave Qemu attackable from the
    guest when Qemu isn't compiled with retpoline.
  Issuing an IBPB on every VMEXIT has been reported to cause TSC
  calibration problems in the guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When the host kernel is using retpoline it is safe against these attacks.
  If the host kernel isn't using retpoline we might need to do an IBPB
  flush on every VMEXIT.

Even when using retpoline for indirect calls, 'ret' can still use the BTB
on Skylake-era CPUs under certain conditions (e.g. on RSB underflow).
Other mitigations, such as RSB stuffing/clearing, are available for that case.

* IBPB is issued only for SVM during svm_free_vcpu(); VMX has a vmclear
  and SVM doesn't.  See the discussion here:
  https://lkml.org/lkml/2018/1/15/146
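
To illustrate the first (guest->guest) case above: the barrier only needs
to be issued when a physical CPU switches from running one vCPU's control
block (VMCS/VMCB) to another's. A hedged sketch of that idea follows;
kvm_vcpu_switch_barrier() and last_loaded_guest_cb are made-up names, not
this patch's actual hook points:

  #include <linux/percpu.h>
  #include <asm/msr.h>

  /* Illustrative per-CPU record of which guest control block ran last. */
  static DEFINE_PER_CPU(void *, last_loaded_guest_cb);

  /*
   * If this physical CPU is about to run a different vCPU than it ran
   * last time, issue an IBPB so branch targets trained by the previous
   * guest cannot influence the next one.
   */
  static void kvm_vcpu_switch_barrier(void *guest_cb)
  {
          if (__this_cpu_read(last_loaded_guest_cb) != guest_cb) {
                  __this_cpu_write(last_loaded_guest_cb, guest_cb);
                  wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
          }
  }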

Please refer to the following for more details on the enumeration and
control, and for documentation about the mitigations:

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
(cherry picked from commit 15d45071523d89b3fb7372e2135fbd72f6af9506)

Orabug: 28069548

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/cpuid.c
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c

All the conflicts were contextual. There are major differences in the code
between UEK4 and upstream (also, UEK4 only has the IBRS feature, not
SPEC_CTRL). We had to introduce guest_cpuid_has_* functions in cpuid.h for
each feature, and moved defines into cpuid.h that are needed by both
cpuid.h and cpuid.c.
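
For reference, the guest_cpuid_has_* helpers all follow the pattern of the
existing guest_cpuid_has_mpx(): find the relevant CPUID leaf in the guest's
CPUID table and test the feature bit. A sketch of one more helper in the
same shape (guest_cpuid_has_virt_ssbd() is illustrative only; the patch
below adds only the IBPB and IBRS helpers):

  /*
   * Illustrative only -- same pattern as the IBPB/IBRS helpers added to
   * cpuid.h below, applied to VIRT_SSBD (CPUID 0x80000008 EBX bit 25).
   */
  static inline bool guest_cpuid_has_virt_ssbd(struct kvm_vcpu *vcpu)
  {
          struct kvm_cpuid_entry2 *best;

          best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
          return best && (best->ebx & KF(VIRT_SSBD));
  }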

Signed-off-by: Brian Maly <brian.maly@oracle.com>
arch/x86/kvm/cpuid.c
arch/x86/kvm/cpuid.h
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 58bfdbd02a61c39998c80cfe29b32a4c0f9eec04..feda28e6f539f62bff0d6a70a1b1127dad8d0188 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -56,20 +56,6 @@ u64 kvm_supported_xcr0(void)
        return xcr0;
 }
 
-#define F(x) bit(X86_FEATURE_##x)
-
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_IBRS             26
-#define KVM_CPUID_BIT_STIBP            27
-#define KVM_CPUID_BIT_IA32_ARCH_CAPS   29
-#define KVM_CPUID_BIT_SSBD             31
-
-
-/* CPUID[eax=0x80000008].ebx */
-#define KVM_CPUID_BIT_IBPB_SUPPORT     12
-#define KVM_CPUID_BIT_VIRT_SSBD                25
-
-#define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
@@ -372,7 +358,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
        /* cpuid 0x80000008.ebx */
        const u32 kvm_cpuid_80000008_ebx_x86_features =
-               KF(IBPB_SUPPORT) | KF(VIRT_SSBD);
+               KF(IBPB) | KF(VIRT_SSBD);
 
        /* all calls to cpuid_count() should be made on the same cpu */
        get_cpu();
@@ -609,7 +595,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
                entry->ebx &= kvm_cpuid_80000008_ebx_x86_features;
 
                if ( !boot_cpu_has(X86_FEATURE_IBPB) )
-                       entry->ebx &= ~(1u << KVM_CPUID_BIT_IBPB_SUPPORT);
+                       entry->ebx &= ~(1u << KVM_CPUID_BIT_IBPB);
 
                if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
                        entry->ebx |= KF(VIRT_SSBD);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 496b3695d3d3c96fd2687b2b6bc013d9ee8d96e5..97f1c35eb7611101484f28172cb851d66fec1b52 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -125,4 +125,31 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu *vcpu)
        best = kvm_find_cpuid_entry(vcpu, 7, 0);
        return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+#define F(x) bit(X86_FEATURE_##x)
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_IBPB             12
+#define KVM_CPUID_BIT_VIRT_SSBD                25
+#define KVM_CPUID_BIT_IBRS             26
+#define KVM_CPUID_BIT_STIBP            27
+#define KVM_CPUID_BIT_IA32_ARCH_CAPS   29
+#define KVM_CPUID_BIT_SSBD             31
+
+static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu)
+{
+       struct kvm_cpuid_entry2 *best;
+
+       best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+       return best && (best->ebx & KF(IBPB));
+}
+
+static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu)
+{
+       struct kvm_cpuid_entry2 *best;
+
+       best = kvm_find_cpuid_entry(vcpu, 7, 0);
+       return best && (best->edx & KF(IBRS));
+}
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b5195e02c549318d016b7a724ede35dac111dfa0..09f118462999842e580b5f8d490ebf10d8ba52bd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -194,7 +194,7 @@ static const struct svm_direct_access_msrs {
        { .index = MSR_IA32_LASTINTFROMIP,              .always = false },
        { .index = MSR_IA32_LASTINTTOIP,                .always = false },
        { .index = MSR_IA32_SPEC_CTRL,                  .always = true },
-       { .index = MSR_IA32_PRED_CMD,                   .always = true },
+       { .index = MSR_IA32_PRED_CMD,                   .always = false },
        { .index = MSR_INVALID,                         .always = false },
 };
 
@@ -3304,6 +3304,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
                svm->spec_ctrl = data;
                break;
        case MSR_IA32_PRED_CMD:
+               if (!msr->host_initiated &&
+                   !guest_cpuid_has_ibpb(vcpu))
+                       return 1;
+
                if (data & ~FEATURE_SET_IBPB)
                        return 1;
 
@@ -3312,6 +3316,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
                if (ibpb_inuse)
                        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+
+               if (is_guest_mode(vcpu))
+                       break;
+               set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
                break;
        case MSR_AMD64_VIRT_SPEC_CTRL:
                if (data & ~SPEC_CTRL_SSBD)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9c83350297c53dac4198b54c50fcd85ee37c8675..e5a13224b919e6daa4294c4c3b50997bb18dc4bc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -986,6 +986,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static int alloc_identity_pagetable(struct kvm *kvm);
 
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+                                                         u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1724,6 +1726,29 @@ static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg)
        return *p;
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+       unsigned long *msr_bitmap;
+       int f = sizeof(unsigned long);
+
+       if (!cpu_has_vmx_msr_bitmap())
+               return true;
+
+       msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+       if (msr <= 0x1fff) {
+               return !!test_bit(msr, msr_bitmap + 0x800 / f);
+       } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+               msr &= 0x1fff;
+               return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+       }
+
+       return true;
+}
+
 static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
        u32 eb;
@@ -2934,6 +2959,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                to_vmx(vcpu)->spec_ctrl = data;
                break;
        case MSR_IA32_PRED_CMD:
+               if (!msr_info->host_initiated &&
+                   !guest_cpuid_has_ibpb(vcpu) &&
+                   !guest_cpuid_has_ibrs(vcpu))
+                       return 1;
+
                if (data & ~FEATURE_SET_IBPB)
                        return 1;
 
@@ -2942,6 +2972,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
                if (ibpb_inuse)
                        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+
+               /*
+                * For non-nested:
+                * When it's written (to non-zero) for the first time, pass
+                * it through.
+                *
+                * For nested:
+                * The handling of the MSR bitmap for L2 guests is done in
+                * nested_vmx_merge_msr_bitmap. We should not touch the
+                * vmcs02.msr_bitmap here since it gets completely overwritten
+                * in the merging.
+                */
+               vmx_disable_intercept_for_msr(to_vmx(vcpu)->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+                                             MSR_TYPE_W);
                break;
        case MSR_IA32_ARCH_CAPABILITIES:
                vmx->arch_capabilities = data;
@@ -9073,8 +9117,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
        unsigned long *msr_bitmap_l1;
        unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
-       /* This shortcut is ok because we support only x2APIC MSRs so far. */
-       if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+       /*
+        * pred_cmd is trying to verify two things:
+        *
+        * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+        *    ensures that we do not accidentally generate an L02 MSR bitmap
+        *    from the L12 MSR bitmap that is too permissive.
+        * 2. That L1 or L2s have actually used the MSR. This avoids
+        *    unnecessarily merging of the bitmap if the MSR is unused. This
+        *    works properly because we only update the L01 MSR bitmap lazily.
+        *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+        *    updated to reflect this when L1 (or its L2s) actually write to
+        *    the MSR.
+        */
+       bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+
+       if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+           !pred_cmd)
                return false;
 
        page = nested_get_page(vcpu, vmcs12->msr_bitmap);
@@ -9114,6 +9173,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
                                MSR_TYPE_W);
                }
        }
+
+       if (pred_cmd)
+               nested_vmx_disable_intercept_for_msr(
+                                       msr_bitmap_l1, msr_bitmap_l0,
+                                       MSR_IA32_PRED_CMD,
+                                       MSR_TYPE_W);
+
        kunmap(page);
        nested_release_page_clean(page);