]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
7 years agox86: more ibrs/pti fixes
Pavel Tatashin [Sun, 7 Jan 2018 18:39:15 +0000 (10:39 -0800)]
x86: more ibrs/pti fixes

Restore IBRS before cr3 is restored, and save IBRS3 after
switching to kernel cr3.

Orabug: 27333760
CVE: CVE-2017-5754

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/spec: Actually do the check for in_use on ENABLE_IBRS
Konrad Rzeszutek Wilk [Sun, 7 Jan 2018 16:30:51 +0000 (11:30 -0500)]
x86/spec: Actually do the check for in_use on ENABLE_IBRS

Orabug: 27344012
CVE: CVE-2017-5715
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokvm: svm: Expose the CPUID.0x80000008 ebx flag.
Konrad Rzeszutek Wilk [Thu, 4 Jan 2018 18:13:30 +0000 (13:13 -0500)]
kvm: svm: Expose the CPUID.0x80000008 ebx flag.

If it is enabled of course.

Orabug: 27344012
CVE: CVE-2017-5715

Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Acked-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/spec_ctrl: Provide the sysfs version of the ibrs_enabled
Konrad Rzeszutek Wilk [Fri, 5 Jan 2018 03:37:27 +0000 (22:37 -0500)]
x86/spec_ctrl: Provide the sysfs version of the ibrs_enabled

as well as the proc version.

This is rather important as customers are asking about the sysfs
now. But this may change in the future. See

https://lkml.org/lkml/2018/1/6/303

Orabug: 27344012
CVE: CVE-2017-5715

Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Acked-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86: Use better #define for FEATURE_ENABLE_IBRS and 0
Konrad Rzeszutek Wilk [Fri, 5 Jan 2018 03:34:14 +0000 (22:34 -0500)]
x86: Use better #define for FEATURE_ENABLE_IBRS and 0

Upstream patches use:
 SPEC_CTRL_FEATURE_DISABLE_IBRS for 0
 SPEC_CTRL_FEATURE_ENABLE_IBRS for  1

Lets use those fancy names so that it is easier to look
in the code and compare to upstream.

Orabug: 27344012
CVE: CVE-2017-5715

Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Acked-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86: Instead of 0x2, 0x4, and 0x1 use #defines.
Konrad Rzeszutek Wilk [Fri, 5 Jan 2018 02:33:07 +0000 (21:33 -0500)]
x86: Instead of 0x2, 0x4, and 0x1 use #defines.

This mirrors what the posted upstream patches are doing.

Orabug: 27344012
CVE: CVE-2017-5715

Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokpti: Disable when running under Xen PV
Konrad Rzeszutek Wilk [Sun, 7 Jan 2018 05:17:15 +0000 (00:17 -0500)]
kpti: Disable when running under Xen PV

Very very partial backport from aa8c6248f8c75 where
there is a check to see if this is an Xen PV guest - and
if so disable it.

The reason is that the PV ABI would require a major
overhaul to be Meltdown resistent.

Instead there are mitigations (PV in HVM) which are far more
suitable.

Orabug: 27333760
CVE: CVE-2017-5754

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86: Don't ENABLE_IBRS in nmi when we are still running on user cr3
Konrad Rzeszutek Wilk [Sun, 7 Jan 2018 04:53:40 +0000 (23:53 -0500)]
x86: Don't ENABLE_IBRS in nmi when we are still running on user cr3

It won't end well - especially as we need to be careful about
touching kernel variables and can only do that in the kernel cr3.

The code:
        movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

may lead one to believe you can access kernel variables, but in fact
we haven't yet switched over the kernel cr3.

Orabug: 27344012
CVE: CVE-2017-5715

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/enter: Use IBRS on syscall and interrupts - fix ia32 path
Konrad Rzeszutek Wilk [Sun, 7 Jan 2018 04:35:11 +0000 (23:35 -0500)]
x86/enter: Use IBRS on syscall and interrupts - fix ia32 path

The backports missed a tiny bit of changes.

The easier of them is the ia32_syscall - there are two ways it returns
back to userspace - to int_ret_from_sys_call and there eventually
end up either in syscall_return_via_sysret or opportunistic_sysret_failed.

syscall_return_via_sysret had it, but opportunistic_sysret_failed failed
to have it. That is b/c we optimized a bit and stuck the DISABLE_IBRS
on restore_c_regs_and_iret which was called from opportunistic_sysret_failed
and retint_swapgs.

But with KPTI, doing IBRS_DISABLE from within restore_c_regs_and_iret is
not good - as we are touching an kernel variable and restore_c_regs_and_iret is
running with user-mode cr3!

So "x86: Fix spectre/kpti integration" fixed it by adding the DISABLE_IBRS
syscall_return_via_sysret.
(If you look at the original commit you would think that we should
also fix opportunistic_sysret_failed, but that is fixed in
"x86: Fix spectre/kpti integration")

The seconday issue is that we did not call DISABLE_IBRS from
sysexit_from_sys_call. This patch adds that in too.

Orabug: 27344012
CVE: CVE-2017-5715

Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86: Fix spectre/kpti integration
Konrad Rzeszutek Wilk [Sun, 7 Jan 2018 04:25:30 +0000 (23:25 -0500)]
x86: Fix spectre/kpti integration

The issue is that DISABLE_IBRS (and pretty much all of the _IBRS) first
operation is touching an kernel variable. The restore_c_regs_and_iret is
already in user-space cr3 so we page fault.

The fix is simple - do not run any of the IBRS macros from within
restore_c_regs_and_iret. Which means that the three functions that
used to call it now have to call IBRS_DISABLE by themselves:
retint_swapgs, opportunistic_sysret_failed, and nmi.

Adding in the IBRS_DISABLE in opportunistic_sysret_failed also
fixes another bug - which is more clearly explained in
"x86/enter: Use IBRS on syscall and interrupts  - fix ia32 path"

Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoPTI: unbreak EFI old_memmap
Jiri Kosina [Fri, 5 Jan 2018 19:21:38 +0000 (11:21 -0800)]
PTI: unbreak EFI old_memmap

old_memmap's efi_call_phys_prolog() calls set_pgd() with swapper PGD that
has PAGE_USER set, which makes PTI set NX on it, and therefore EFI can't
execute it's code.

Fix that by forcefully clearing _PAGE_NX from the PGD (this can't be done
by the pgprot API).

_PAGE_NX will be automatically reintroduced in efi_call_phys_epilog(), as
_set_pgd() will again notice that this is _PAGE_USER, and set _PAGE_NX on
it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoKAISER KABI tweaks.
Martin K. Petersen [Thu, 4 Jan 2018 21:45:34 +0000 (16:45 -0500)]
KAISER KABI tweaks.

Makes KPTI KABI compatible.

Orabug: 27333760
CVE: CVE-2017-5754

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/ldt: fix crash in ldt freeing.
Jamie Iles [Fri, 5 Jan 2018 18:13:10 +0000 (18:13 +0000)]
x86/ldt: fix crash in ldt freeing.

94b1f3e2c4b7 (kaiser: merged update) factored out __free_ldt_struct() to
use vfree/free_page, but in the page allocation case it is actually
allocated with kmalloc so needs to be freed with kfree and not
free_page().

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/entry: Define 'cpu_current_top_of_stack' for 64-bit code
Denys Vlasenko [Fri, 24 Apr 2015 15:31:35 +0000 (17:31 +0200)]
x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code

32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).

Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
expression can be used in both 32-bit and 64-bit code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3a23208e69679597e767cf3547b1a30dd845d9b5)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/ia32/ia32entry.S

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/entry: Remove unused 'kernel_stack' per-cpu variable
Denys Vlasenko [Fri, 24 Apr 2015 15:31:34 +0000 (17:31 +0200)]
x86/entry: Remove unused 'kernel_stack' per-cpu variable

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit fed7c3f0f750f225317828d691e9eb76eec887b3)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/entry: Stop using PER_CPU_VAR(kernel_stack)
Denys Vlasenko [Fri, 24 Apr 2015 15:31:33 +0000 (17:31 +0200)]
x86/entry: Stop using PER_CPU_VAR(kernel_stack)

PER_CPU_VAR(kernel_stack) is redundant:

  - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
  - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).

PER_CPU_VAR(kernel_stack) will be deleted by a separate change.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 63332a8455d8310b77d38779c6c21a660a8d9feb)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: Set _PAGE_NX only if supported
Guenter Roeck [Thu, 4 Jan 2018 21:41:55 +0000 (13:41 -0800)]
kaiser: Set _PAGE_NX only if supported

This resolves a crash if loaded under qemu + haxm under windows.
See https://www.spinics.net/lists/kernel/msg2689835.html for details.
Here is a boot log (the log is from chromeos-4.4, but Tao Wu says that
the same log is also seen with vanilla v4.4.110-rc1).

[    0.712750] Freeing unused kernel memory: 552K
[    0.721821] init: Corrupted page table at address 57b029b332e0
[    0.722761] PGD 80000000bb238067 PUD bc36a067 PMD bc369067 PTE 45d2067
[    0.722761] Bad pagetable: 000b [#1] PREEMPT SMP
[    0.722761] Modules linked in:
[    0.722761] CPU: 1 PID: 1 Comm: init Not tainted 4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761] task: ffff8800bc290000 ti: ffff8800bc28c000 task.ti: ffff8800bc28c000
[    0.722761] RIP: 0010:[<ffffffff83f4129e>]  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761] RSP: 0000:ffff8800bc28fcf8  EFLAGS: 00010202
[    0.722761] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4
[    0.722761] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000057b029b332e0
[    0.722761] RBP: ffff8800bc28fd08 R08: ffff8800bc290000 R09: ffff8800bb2f4000
[    0.722761] R10: ffff8800bc290000 R11: ffff8800bb2f4000 R12: 000057b029b332e0
[    0.722761] R13: 0000000000000000 R14: 000057b029b33340 R15: ffff8800bb1e2a00
[    0.722761] FS:  0000000000000000(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
[    0.722761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.722761] CR2: 000057b029b332e0 CR3: 00000000bb2f8000 CR4: 00000000000006e0
[    0.722761] Stack:
[    0.722761]  000057b029b332e0 ffff8800bb95fa80 ffff8800bc28fd18 ffffffff83f4120c
[    0.722761]  ffff8800bc28fe18 ffffffff83e9e7a1 ffff8800bc28fd68 0000000000000000
[    0.722761]  ffff8800bc290000 ffff8800bc290000 ffff8800bc290000 ffff8800bc290000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.722761]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.722761]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.722761]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.722761]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.722761]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.722761]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761] Code: 86 84 be 12 00 00 00 e8 87 0d e8 ff 66 66 90 48 89 d8 48 c1
eb 03 4c 89 e7 83 e0 07 48 89 d9 be 08 00 00 00 31 d2 48 85 c9 74 0a <48> 89 17
48 01 f7 ff c9 75 f6 48 89 c1 85 c9 74 09 88 17 48 ff
[    0.722761] RIP  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761]  RSP <ffff8800bc28fcf8>
[    0.722761] ---[ end trace def703879b4ff090 ]---
[    0.722761] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/kernel/locking/rwsem.c:21
[    0.722761] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: init
[    0.722761] CPU: 1 PID: 1 Comm: init Tainted: G      D         4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761]  0000000000000086 dcb5d76098c89836 ffff8800bc28fa30 ffffffff83f34004
[    0.722761]  ffffffff84839dc2 0000000000000015 ffff8800bc28fa40 ffffffff83d57dc9
[    0.722761]  ffff8800bc28fa68 ffffffff83d57e6a ffffffff84a53640 0000000000000000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f34004>] dump_stack+0x4d/0x63
[    0.722761]  [<ffffffff83d57dc9>] ___might_sleep+0x13a/0x13c
[    0.722761]  [<ffffffff83d57e6a>] __might_sleep+0x9f/0xa6
[    0.722761]  [<ffffffff84502788>] down_read+0x20/0x31
[    0.722761]  [<ffffffff83cc5d9b>] __blocking_notifier_call_chain+0x35/0x63
[    0.722761]  [<ffffffff83cc5ddd>] blocking_notifier_call_chain+0x14/0x16
[    0.800374] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[    0.722761]  [<ffffffff83cefe97>] profile_task_exit+0x1a/0x1c
[    0.802309]  [<ffffffff83cac84e>] do_exit+0x39/0xe7f
[    0.802309]  [<ffffffff83ce5938>] ? vprintk_default+0x1d/0x1f
[    0.802309]  [<ffffffff83d7bb95>] ? printk+0x57/0x73
[    0.802309]  [<ffffffff83c46e25>] oops_end+0x80/0x85
[    0.802309]  [<ffffffff83c7b747>] pgtable_bad+0x8a/0x95
[    0.802309]  [<ffffffff83ca7f4a>] __do_page_fault+0x8c/0x352
[    0.802309]  [<ffffffff83eefba5>] ? file_has_perm+0xc4/0xe5
[    0.802309]  [<ffffffff83ca821c>] do_page_fault+0xc/0xe
[    0.802309]  [<ffffffff84507682>] page_fault+0x22/0x30
[    0.802309]  [<ffffffff83f4129e>] ? __clear_user+0x42/0x67
[    0.802309]  [<ffffffff83f4127f>] ? __clear_user+0x23/0x67
[    0.802309]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.802309]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.802309]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.802309]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.802309]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.802309]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.802309]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.802309]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.830559] Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009
[    0.830559]
[    0.831305] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    0.831305] ---[ end Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009

The crash part of this problem may be solved with the following patch
(thanks to Hugh for the hint). There is still another problem, though -
with this patch applied, the qemu session aborts with "VCPU Shutdown
request", whatever that means.

Cc: lepton <ytht.net@gmail.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b33c3c64c4786cd724ccde6fa97c87ada49f6a73)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
Andy Lutomirski [Fri, 11 Dec 2015 03:20:20 +0000 (19:20 -0800)]
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap

commit dac16fba6fc590fa7239676b35ed75dae4c4cd2b upstream.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 755bd549d9328d6d1e949a0a213f9a78e84d11fc)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/vdso/vclock_gettime.c (not in this tree)
arch/x86/vdso/vclock_gettime.c (patched instead of that)
arch/x86/entry/vdso/vdso2c.c (not in this tree)
arch/x86/vdso/vdso2c.c (patched instead of that)
arch/x86/entry/vdso/vma.c (not in this tree)
arch/x86/vdso/vma.c (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoKPTI: Report when enabled
Kees Cook [Wed, 3 Jan 2018 18:43:32 +0000 (10:43 -0800)]
KPTI: Report when enabled

Make sure dmesg reports when KPTI is enabled.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bfd51a4d715b6ef44bd01b9fbfc13da936f93d76)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/mm/kaiser.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoKPTI: Rename to PAGE_TABLE_ISOLATION
Kees Cook [Wed, 3 Jan 2018 18:43:15 +0000 (10:43 -0800)]
KPTI: Rename to PAGE_TABLE_ISOLATION

This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3e1457d6bf26d9ec300781f84cd0057e44deb45d)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/kaiser: Move feature detection up
Borislav Petkov [Mon, 25 Dec 2017 12:57:16 +0000 (13:57 +0100)]
x86/kaiser: Move feature detection up

... before the first use of kaiser_enabled as otherwise funky
things happen:

  about to get started...
  (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
  (XEN) Pagetable walk from ffff88022a449090:
  (XEN)  L4[0x110] = 0000000229e0e067 0000000000001e0e
  (XEN)  L3[0x008] = 0000000000000000 ffffffffffffffff
  (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
  entry.o#create_bounce_frame+0x135/0x14d
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  (XEN) ----[ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e033:[<ffffffff81007460>]
  (XEN) RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d0v0)

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7f79599df9c4a36130f7a4f6778b334a97632477)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/kernel/setup.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/kaiser: Reenable PARAVIRT
Borislav Petkov [Tue, 2 Jan 2018 13:19:49 +0000 (14:19 +0100)]
x86/kaiser: Reenable PARAVIRT

Now that the required bits have been addressed, reenable
PARAVIRT.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 750fb627d764eb66430c36961b94ab0002694c02)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/paravirt: Dont patch flush_tlb_single
Thomas Gleixner [Mon, 4 Dec 2017 14:07:30 +0000 (15:07 +0100)]
x86/paravirt: Dont patch flush_tlb_single

commit a035795499ca1c2bd1928808d1a156eda1420383 upstream

native_flush_tlb_single() will be changed with the upcoming
PAGE_TABLE_ISOLATION feature. This requires to have more code in
there than INVLPG.

Remove the paravirt patching for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Cc: michael.schwarz@iaik.tugraz.at
Cc: moritz.lipp@iaik.tugraz.at
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3e809caffdd7beeac731feb16788873c3bdb811e)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: kaiser_flush_tlb_on_return_to_user() check PCID
Hugh Dickins [Sun, 5 Nov 2017 01:43:06 +0000 (18:43 -0700)]
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID

Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes.  Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8eaca4c7d9f167209a9cc568ff028c0a3b0deb2d)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: asm/tlbflush.h handle noPGE at lower level
Hugh Dickins [Sun, 5 Nov 2017 01:23:24 +0000 (18:23 -0700)]
kaiser: asm/tlbflush.h handle noPGE at lower level

I found asm/tlbflush.h too twisty, and think it safer not to avoid
__native_flush_tlb_global_irq_disabled() in the kaiser_enabled case,
but instead let it handle kaiser_enabled along with cr3: it can just
use __native_flush_tlb() for that, no harm in re-disabling preemption.

(This is not the same change as Kirill and Dave have suggested for
upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge
check; cr3 is enough for kaiser, and thought to be cheaper than cr4.)

Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals()
preference from __native_flush_tlb(): unlike the invpcid_flush_all()
preference in __native_flush_tlb_global(), it's not seen in upstream
4.14, and was recently reported to be surprisingly slow.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0651b3ad99dd59269e2ec883338ab8fba617e203)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: drop is_atomic arg to kaiser_pagetable_walk()
Hugh Dickins [Sun, 29 Oct 2017 18:36:19 +0000 (11:36 -0700)]
kaiser: drop is_atomic arg to kaiser_pagetable_walk()

I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 28c6de5441740f868a5b371804a0e8dde03757fb)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/mm/kaiser.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
Hugh Dickins [Wed, 4 Oct 2017 03:49:04 +0000 (20:49 -0700)]
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush

Now that we're playing the ALTERNATIVE game, use that more efficient
method: instead of user-mapping an extra page, and reading an extra
cacheline each time for x86_cr3_pcid_noflush.

Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.

Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE?  I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2dff99eb0335f9e0817410696a180dba25ca7371)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/kaiser: Check boottime cmdline params
Borislav Petkov [Tue, 2 Jan 2018 13:19:48 +0000 (14:19 +0100)]
x86/kaiser: Check boottime cmdline params

AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.

Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e405a064bd7d6eca88935342ddb71057a9d6ceab)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
Borislav Petkov [Tue, 2 Jan 2018 13:19:48 +0000 (14:19 +0100)]
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling

Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti".

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dea9aa9ffae11c91285335cc3215b4f0e48e8139)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: add "nokaiser" boot option, using ALTERNATIVE
Hugh Dickins [Sun, 24 Sep 2017 23:59:49 +0000 (16:59 -0700)]
kaiser: add "nokaiser" boot option, using ALTERNATIVE

Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime.  That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.

Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something add PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.

It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h).  I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.

Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments.  But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e345dcc9481543edf4a0a5df4c4c2f9597b0a997)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: fix unlikely error in alloc_ldt_struct()
Hugh Dickins [Tue, 5 Dec 2017 04:13:35 +0000 (20:13 -0800)]
kaiser: fix unlikely error in alloc_ldt_struct()

An error from kaiser_add_mapping() here is not at all likely, but
Eric Biggers rightly points out that __free_ldt_struct() relies on
new_ldt->size being initialized: move that up.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 500943e57db8d3e298e98f595f835c5b613e843b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
Hugh Dickins [Fri, 13 Oct 2017 19:10:00 +0000 (12:10 -0700)]
kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls

Synthetic filesystem mempressure testing has shown softlockups, with
hour-long page allocation stalls, and pgd_alloc() trying for order:1
with __GFP_REPEAT in one of the backtraces each time.

That's _pgd_alloc() going for a Kaiser double-pgd, using the __GFP_REPEAT
common to all page table allocations, but actually having no effect on
order:0 (see should_alloc_oom() and should_continue_reclaim() in this
tree, but beware that ports to another tree might behave differently).

Order:1 stack allocation has been working satisfactorily without
__GFP_REPEAT forever, and page table allocation only asks __GFP_REPEAT
for awkward occasions in a long-running process: it's not appropriate
at fork or exec time, and seems to be doing much more harm than good:
getting those contiguous pages under very heavy mempressure can be
hard (though even without it, Kaiser does generate more mempressure).

Mask out that __GFP_REPEAT inside _pgd_alloc().  Why not take it out
of the PGALLOG_GFP altogether, as v4.7 commit a3a9a59d2067 ("x86: get
rid of superfluous __GFP_REPEAT") did?  Because I think that might
make a difference to our page table memcg charging, which I'd prefer
not to interfere with at this time.

hughd adds: __alloc_pages_slowpath() in the 4.4.89-stable tree handles
__GFP_REPEAT a little differently than in prod kernel or 3.18.72-stable,
so it may not always be exactly a no-op on order:0 pages, as said above;
but I think still appropriate to omit it from Kaiser or non-Kaiser pgd.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d41f46f778951b0ea851ca52b88b2549c6336b47)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: paranoid_entry pass cr3 need to paranoid_exit
Hugh Dickins [Wed, 27 Sep 2017 01:43:07 +0000 (18:43 -0700)]
kaiser: paranoid_entry pass cr3 need to paranoid_exit

Neel Natu points out that paranoid_entry() was wrong to assume that
an entry that did not need swapgs would not need SWITCH_KERNEL_CR3:
paranoid_entry (used for debug breakpoint, int3, double fault or MCE;
though I think it's only the MCE case that is cause for concern here)
can break in at an awkward time, between cr3 switch and swapgs, but
its handling always needs kernel gs and kernel cr3.

Easy to fix in itself, but paranoid_entry() also needs to convey to
paranoid_exit() (and my reading of macro idtentry says paranoid_entry
and paranoid_exit are always paired) how to restore the prior state.
The swapgs state is already conveyed by %ebx (0 or 1), so extend that
also to convey when SWITCH_USER_CR3 will be needed (2 or 3).

(Yes, I'd much prefer that 0 meant no swapgs, whereas it's the other
way round: and a convention shared with error_entry() and error_exit(),
which I don't want to touch.  Perhaps I should have inverted the bit
for switch cr3 too, but did not.)

paranoid_exit() would be straightforward, except for TRACE_IRQS: it
did TRACE_IRQS_IRETQ when doing swapgs, but TRACE_IRQS_IRETQ_DEBUG
when not: which is it supposed to use when SWITCH_USER_CR3 is split
apart from that?  As best as I can determine, commit 5963e317b1e9
("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")
missed the swapgs case, and should have used TRACE_IRQS_IRETQ_DEBUG
there too (the discrepancy has nothing to do with the liberal use
of _NO_STACK and _UNSAFE_STACK hereabouts: TRACE_IRQS_OFF_DEBUG has
just been used in all cases); discrepancy lovingly preserved across
several paranoid_exit() cleanups, but I'm now removing it.

Neel further indicates that to use SWITCH_USER_CR3_NO_STACK there in
paranoid_exit() is now not only unnecessary but unsafe: might corrupt
syscall entry's unsafe_stack_register_backup of %rax.  Just use
SWITCH_USER_CR3: and delete SWITCH_USER_CR3_NO_STACK altogether,
before we make the mistake of using it again.

hughd adds: this commit fixes an issue in the Kaiser-without-PCIDs
part of the series, and ought to be moved earlier, if you decided
to make a release of Kaiser-without-PCIDs.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fc8334e6b3e5d28afd4eec8a74493933f73b2784)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
Hugh Dickins [Sun, 27 Aug 2017 23:24:27 +0000 (16:24 -0700)]
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user

Mostly this commit is just unshouting X86_CR3_PCID_KERN_VAR and
X86_CR3_PCID_USER_VAR: we usually name variables in lower-case.

But why does x86_cr3_pcid_noflush need to be __aligned(PAGE_SIZE)?
Ah, it's a leftover from when kaiser_add_user_map() once complained
about mapping the same page twice.  Make it __read_mostly instead.
(I'm a little uneasy about all the unrelated data which shares its
page getting user-mapped too, but that was so before, and not a big
deal: though we call it user-mapped, it's not mapped with _PAGE_USER.)

And there is a little change around the two calls to do_nmi().
Previously they set the NOFLUSH bit (if PCID supported) when
forcing to kernel context before do_nmi(); now they also have the
NOFLUSH bit set (if PCID supported) when restoring context after:
nothing done in do_nmi() should require a TLB to be flushed here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 20268a10ffecd9fcc04880b21fc99a9192394599)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: PCID 0 for kernel and 128 for user
Hugh Dickins [Sat, 9 Sep 2017 02:26:30 +0000 (19:26 -0700)]
kaiser: PCID 0 for kernel and 128 for user

Why was 4 chosen for kernel PCID and 6 for user PCID?
No good reason in a backport where PCIDs are only used for Kaiser.

If we continue with those, then we shall need to add Andy Lutomirski's
4.13 commit 6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa()
and __read_cr3()"), which deals with the problem of read_cr3() callers
finding stray bits in the cr3 that they expected to be page-aligned;
and for hibernation, his 4.14 commit f34902c5c6c0 ("x86/hibernate/64:
Mask off CR3's PCID bits in the saved CR3").

But if 0 is used for kernel PCID, then there's no need to add in those
commits - whenever the kernel looks, it sees 0 in the lower bits; and
0 for kernel seems an obvious choice.

And I naughtily propose 128 for user PCID.  Because there's a place
in _SWITCH_TO_USER_CR3 where it takes note of the need for TLB FLUSH,
but needs to reset that to NOFLUSH for the next occasion.  Currently
it does so with a "movb $(0x80)" into the high byte of the per-cpu
quadword, but that will cause a machine without PCID support to crash.
Now, if %al just happened to have 0x80 in it at that point, on a
machine with PCID support, but 0 on a machine without PCID support...

(That will go badly wrong once the pgd can be at a physical address
above 2^56, but even with 5-level paging, physical goes up to 2^52.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b4ce0e1a17228eec71815d7997e49e403ebf2a7)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
Hugh Dickins [Thu, 17 Aug 2017 22:00:37 +0000 (15:00 -0700)]
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user

We have many machines (Westmere, Sandybridge, Ivybridge) supporting
PCID but not INVPCID: on these load_new_mm_cr3() simply crashed.

Flushing user context inside load_new_mm_cr3() without the use of
invpcid is difficult: momentarily switch from kernel to user context
and back to do so?  I'm not sure whether that can be safely done at
all, and would risk polluting user context with kernel internals,
and kernel context with stale user externals.

Instead, follow the hint in the comment that was there: change
X86_CR3_PCID_USER_VAR to be a per-cpu variable, then load_new_mm_cr3()
can leave a note in it, for SWITCH_USER_CR3 on return to userspace to
flush user context TLB, instead of default X86_CR3_PCID_USER_NOFLUSH.

Which works well enough that there's no need to do it this way only
when invpcid is unsupported: it's a good alternative to invpcid here.
But there's a couple of inlines in asm/tlbflush.h that need to do the
same trick, so it's best to localize all this per-cpu business in
mm/kaiser.c: moving that part of the initialization from setup_pcid()
to kaiser_setup_pcid(); with kaiser_flush_tlb_on_return_to_user() the
function for noting an X86_CR3_PCID_USER_FLUSH.  And let's keep a
KAISER_SHADOW_PGD_OFFSET in there, to avoid the extra OR on exit.

I did try to make the feature tests in asm/tlbflush.h more consistent
with each other: there seem to be far too many ways of performing such
tests, and I don't have a good grasp of their differences.  At first
I converted them all to be static_cpu_has(): but that proved to be a
mistake, as the comment in __native_flush_tlb_single() hints; so then
I reversed and made them all this_cpu_has().  Probably all gratuitous
change, but that's the way it's working at present.

I am slightly bothered by the way non-per-cpu X86_CR3_PCID_KERN_VAR
gets re-initialized by each cpu (before and after these changes):
no problem when (as usual) all cpus on a machine have the same
features, but in principle incorrect.  However, my experiment
to per-cpu-ify that one did not end well...

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0731188fc74cc2237975a2b5bedd36e2463ef10b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/include/asm/tlbflush.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: enhanced by kernel and user PCIDs
Dave Hansen [Wed, 30 Aug 2017 23:23:00 +0000 (16:23 -0700)]
kaiser: enhanced by kernel and user PCIDs

Merged performance improvements to Kaiser, using distinct kernel
and user Process Context Identifiers to minimize the TLB flushing.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit eb82151d0b1df53d1ad8d060ecd554ca12eb552a)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)
arch/x86/entry/entry_64_compat.S (not in this tree)
arch/x86/ia32/ia32entry.S (patched instead of that)
arch/x86/include/asm/tlbflush.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: vmstat show NR_KAISERTABLE as nr_overhead
Hugh Dickins [Sun, 10 Sep 2017 04:27:32 +0000 (21:27 -0700)]
kaiser: vmstat show NR_KAISERTABLE as nr_overhead

The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3e3d38fd9832e82a8cb1a5b1154acfa43ac08d15)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: delete KAISER_REAL_SWITCH option
Hugh Dickins [Mon, 4 Sep 2017 01:30:43 +0000 (18:30 -0700)]
kaiser: delete KAISER_REAL_SWITCH option

We fail to see what CONFIG_KAISER_REAL_SWITCH is for: it seems to be
left over from early development, and now just obscures tricky parts
of the code.  Delete it before adding PCIDs, or nokaiser boot option.

(Or if there is some good reason to keep the option, then it needs
a help text - and a "depends on KAISER", so that all those without
KAISER are not asked the question.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b9d2ccc54e17b5aa50dd0c036d3f4fb4e5248d54)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
Hugh Dickins [Sun, 10 Sep 2017 00:31:18 +0000 (17:31 -0700)]
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET

There's a 0x1000 in various places, which looks better with a name.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit aeda21d77e22fb382c51fd3f6bbb18df69bc032f)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: cleanups while trying for gold link
Hugh Dickins [Tue, 22 Aug 2017 03:11:43 +0000 (20:11 -0700)]
kaiser: cleanups while trying for gold link

While trying to get our gold link to work, four cleanups:
matched the gdt_page declaration to its definition;
in fiddling unsuccessfully with PERCPU_INPUT(), lined up backslashes;
lined up the backslashes according to convention in percpu-defs.h;
deleted the unused irq_stack_pointer addition to irq_stack_union.

Sad to report that aligning backslashes does not appear to help gold
align to 8192: but while these did not help, they are worth keeping.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c52e55a2a82d3a44189810d35717d81cb4cf61d4)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: kaiser_remove_mapping() move along the pgd
Hugh Dickins [Mon, 2 Oct 2017 17:57:24 +0000 (10:57 -0700)]
kaiser: kaiser_remove_mapping() move along the pgd

When removing the bogus comment from kaiser_remove_mapping(),
I really ought to have checked the extent of its bogosity: as
Neel points out, there is nothing to stop unmap_pud_range_nofree()
from continuing beyond the end of a pud (and starting in the wrong
position on the next).

Fix kaiser_remove_mapping() to constrain the extent and advance pgd
pointer correctly: use pgd_addr_end() macro as used throughout base
mm (but don't assume page-rounded start and size in this case).

But this bug was very unlikely to trigger in this backport: since
any buddy allocation is contained within a single pud extent, and
we are not using vmapped stacks (and are only mapping one page of
stack anyway): the only way to hit this bug here would be when
freeing a large modified ldt.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f127705d26b34c053e59b47aef84b3ea564dd743)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: tidied up kaiser_add/remove_mapping slightly
Hugh Dickins [Mon, 4 Sep 2017 02:23:08 +0000 (19:23 -0700)]
kaiser: tidied up kaiser_add/remove_mapping slightly

Yes, unmap_pud_range_nofree()'s declaration ought to be in a
header file really, but I'm not sure we want to use it anyway:
so for now just declare it inside kaiser_remove_mapping().
And there doesn't seem to be such a thing as unmap_p4d_range(),
even in a 5-level paging tree.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0c68228f7b39c96cabd89bee3e1d6bd55926df80)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: tidied up asm/kaiser.h somewhat
Hugh Dickins [Mon, 4 Sep 2017 02:18:07 +0000 (19:18 -0700)]
kaiser: tidied up asm/kaiser.h somewhat

Mainly deleting a surfeit of blank lines, and reflowing header comment.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5fbd46c4be78174656b52e1b04d3057a5dd7af66)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: ENOMEM if kaiser_pagetable_walk() NULL
Hugh Dickins [Mon, 4 Sep 2017 01:48:02 +0000 (18:48 -0700)]
kaiser: ENOMEM if kaiser_pagetable_walk() NULL

kaiser_add_user_map() took no notice when kaiser_pagetable_walk() failed.
And avoid its might_sleep() when atomic (though atomic at present unused).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 407c3ff6a24c7cb418b77a124d17e282f9622037)
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Orabug: 27333760
CVE: CVE-2017-5754

Conflicts:
arch/x86/mm/kaiser.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: fix perf crashes
Hugh Dickins [Wed, 23 Aug 2017 21:21:14 +0000 (14:21 -0700)]
kaiser: fix perf crashes

Avoid perf crashes: place debug_store in the user-mapped per-cpu area
instead of allocating, and use page allocator plus kaiser_add_mapping()
to keep the BTS and PEBS buffers user-mapped (that is, present in the
user mapping, though visible only to kernel and hardware).  The PEBS
fixup buffer does not need this treatment.

The need for a user-mapped struct debug_store showed up before doing
any conscious perf testing: in a couple of kernel paging oopses on
Westmere, implicating the debug_store offset of the per-cpu area.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 20cbe9a3aa2e341824da57ce0ac6d52cbffaa570)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/kernel/cpu/perf_event_intel_ds.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
Hugh Dickins [Fri, 22 Sep 2017 03:39:56 +0000 (20:39 -0700)]
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER

pjt has observed that nmi's second (nmi_from_kernel) call to do_nmi()
adjusted the %rdi regs arg, rightly when CONFIG_KAISER, but wrongly
when not CONFIG_KAISER.

Although the minimal change is to add an #ifdef CONFIG_KAISER around
the addq line, that looks cluttered, and I prefer how the first call
to do_nmi() handled it: prepare args in %rdi and %rsi before getting
into the CONFIG_KAISER block, since it does not touch them at all.

And while we're here, place the "#ifdef CONFIG_KAISER" that follows
each, to enclose the "Unconditionally restore CR3" comment: matching
how the "Unconditionally use kernel CR3" comment above is enclosed.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 487f0b73d82611a2dc48d7d78409e2e9d994006a)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: KAISER depends on SMP
Hugh Dickins [Wed, 13 Sep 2017 21:03:10 +0000 (14:03 -0700)]
kaiser: KAISER depends on SMP

It is absurd that KAISER should depend on SMP, but apparently nobody
has tried a UP build before: which breaks on implicit declaration of
function 'per_cpu_offset' in arch/x86/mm/kaiser.c.

Now, you would expect that to be trivially fixed up; but looking at
the System.map when that block is #ifdef'ed out of kaiser_init(),
I see that in a UP build __per_cpu_user_mapped_end is precisely at
__per_cpu_user_mapped_start, and the items carefully gathered into
that section for user-mapping on SMP, dispersed elsewhere on UP.

So, some other kind of section assignment will be needed on UP,
but implementing that is not a priority: just make KAISER depend
on SMP for now.

Also inserted a blank line before the option, tidied up the
brief Kconfig help message, and added an "If unsure, Y".

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d94df20135ccfdfb77b1479c501564e9b4ab5bc9)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: fix build and FIXME in alloc_ldt_struct()
Hugh Dickins [Mon, 4 Sep 2017 00:09:44 +0000 (17:09 -0700)]
kaiser: fix build and FIXME in alloc_ldt_struct()

Include linux/kaiser.h instead of asm/kaiser.h to build ldt.c without
CONFIG_KAISER.  kaiser_add_mapping() does already return an error code,
so fix the FIXME.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9b94cf97f42ca30fe9b5010900fa6e1d6855a9f6)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Hugh Dickins [Mon, 4 Sep 2017 01:57:03 +0000 (18:57 -0700)]
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE

Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 003e476716906afa135faf605ae0a5c3598c0293)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
init/main.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: do not set _PAGE_NX on pgd_none
Hugh Dickins [Tue, 5 Sep 2017 19:05:01 +0000 (12:05 -0700)]
kaiser: do not set _PAGE_NX on pgd_none

native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).

The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too.  So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:

A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.

This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.

What we need is a flag to set_pgd(), to tell it we're dealing with
userspace.  Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that.  But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.

But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit edde73205b3fdde8c8a3adfce78cc6d0de72386b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokaiser: merged update
Dave Hansen [Wed, 30 Aug 2017 23:23:00 +0000 (16:23 -0700)]
kaiser: merged update

Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bed9bb7f3e6d4045013d2bb9e4004896de57f02b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)
arch/x86/kernel/ldt.c
kernel/fork.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoKAISER: Kernel Address Isolation
Richard Fellner [Thu, 4 May 2017 12:26:50 +0000 (14:26 +0200)]
KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>
After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8a43ddfb93a0c6ae1a6e1f5c25705ec5d1843c40)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)
arch/x86/entry/entry_64_compat.S (not in this tree)
arch/x86/ia32/ia32entry.S (patched instead of that)
arch/x86/include/asm/hw_irq.h
arch/x86/kernel/irqinit.c
kernel/fork.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/boot: Add early cmdline parsing for options with arguments
Tom Lendacky [Mon, 17 Jul 2017 21:10:33 +0000 (16:10 -0500)]
x86/boot: Add early cmdline parsing for options with arguments

commit e505371dd83963caae1a37ead9524e8d997341be upstream.

Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0fa147b407478e73fe7a478677ff2b12bb824014)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm/64: Fix reboot interaction with CR4.PCIDE
Andy Lutomirski [Mon, 9 Oct 2017 04:53:05 +0000 (21:53 -0700)]
x86/mm/64: Fix reboot interaction with CR4.PCIDE

commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream.

Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.)  The result is a GPF and
a hang instead of a reboot.

I didn't catch this in testing because neither my computer nor my VM
reboots this way.  I can trigger it with reboot=bios, though.

Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6c4db09c291a19da66512f99c4bcb378a862f9e6)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Enable CR4.PCIDE on supported systems
Andy Lutomirski [Thu, 29 Jun 2017 15:53:21 +0000 (08:53 -0700)]
x86/mm: Enable CR4.PCIDE on supported systems

commit 660da7c9228f685b2ebe664f9fd69aaddcc420b5 upstream.

We can use PCID if the CPU has PCID and PGE and we're not on Xen.

By itself, this has no effect. A followup patch will start using PCID.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fd0504525efd2ce2063cd4229baabd3e3a56ecbc)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Add the 'nopcid' boot option to turn off PCID
Andy Lutomirski [Thu, 29 Jun 2017 15:53:20 +0000 (08:53 -0700)]
x86/mm: Add the 'nopcid' boot option to turn off PCID

commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.

The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dcccd3c266e24ce80ecf592765315c54c222ac33)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Disable PCID on 32-bit kernels
Andy Lutomirski [Thu, 29 Jun 2017 15:53:19 +0000 (08:53 -0700)]
x86/mm: Disable PCID on 32-bit kernels

commit cba4671af7550e008f7a7835f06df0763825bf3e upstream.

32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode.  Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 78043e5b6fb2921d836b31f23e89e52925191153)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
Andy Lutomirski [Sun, 28 May 2017 17:00:14 +0000 (10:00 -0700)]
x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code

commit ce4a4e565f5264909a18c733b864c3f74467f69e upstream.

The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

 - flush_tlb_mm_range() didn't flush individual pages if the range
   was small.

 - The lazy TLB code was much weaker.  This usually wouldn't matter,
   but, if a kernel thread flushed its lazy "active_mm" more than
   once (due to reclaim or similar), it wouldn't be unlazied and
   would instead pointlessly flush repeatedly.

 - Tracepoints were missing.

Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b2e24274d50e0ecdf560ebe06dbed0cc648ad3f9)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/Kconfig

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
Andy Lutomirski [Mon, 22 May 2017 22:30:01 +0000 (15:30 -0700)]
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()

commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.

flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:

 - It was missing an smp_mb() in the case where
   current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)

 - It was missing tracepoints and vm counter updates.

The only reason that I can see for keeping it at as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page.  This
hardly seems worthwhile.  If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3efba6062a410a2a65fc9d6f53dca63db2602e65)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Make flush_tlb_mm_range() more predictable
Andy Lutomirski [Sat, 22 Apr 2017 07:01:21 +0000 (00:01 -0700)]
x86/mm: Make flush_tlb_mm_range() more predictable

commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.

I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9f4d1ba1d407e56dac833aa0b11c60f952939e1c)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Remove flush_tlb() and flush_tlb_current_task()
Andy Lutomirski [Sat, 22 Apr 2017 07:01:20 +0000 (00:01 -0700)]
x86/mm: Remove flush_tlb() and flush_tlb_current_task()

commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.

I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 227d6f0e79f809e448d3157fbfd00eb54dcbb54e)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
Andy Lutomirski [Sat, 22 Apr 2017 07:01:19 +0000 (00:01 -0700)]
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()

commit 9ccee2373f0658f234727700e619df097ba57023 upstream.

mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6ce9d1e6819e53c4de0bf980555c4e07bbedb4ce)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/irq: Do not substract irq_tlb_count from irq_call_count
Aaron Lu [Thu, 11 Aug 2016 07:44:30 +0000 (15:44 +0800)]
x86/irq: Do not substract irq_tlb_count from irq_call_count

commit 82ba4faca1bffad429f15c90c980ffd010366c25 upstream.

Since commit:

  52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")

the TLB remote shootdown is done through call function vector. That
commit didn't take care of irq_tlb_count, which a later commit:

  fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")

... tried to fix.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count. So the irq_call_count is always bigger than
irq_tlb_count and we could substract irq_tlb_count from irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
adding a csd into it in generic_exec_single. That means if two threads
are both adding flush tlb csds to the same CPU's call_single_queue, only
one IPI is sent. In other words, the irq_call_count is incremented by 1
but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
bigger than irq_call_count and the substract will produce a very large
irq_call_count value due to overflow.

Considering that:

  1) it's not worth to send more IPIs for the sake of accurate counting of
     irq_call_count in generic_exec_single();

  2) it's not easy to tell if the call function interrupt is for TLB
     shootdown in __smp_call_function_single_interrupt().

Not to exclude TLB shootdown from call function count seems to be the
simplest fix and this patch just does that.

This bug was found by LKP's cyclic performance regression tracking recently
with the vm-scalability test suite. I have bisected to commit:

  3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")

This commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
then multiple threads could queue flush TLB to the same CPU but only
one IPI will be sent.

Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b9d9ec0d8261bb9b12f858e66f0c84cd2a6a3bb)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agosched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
Andy Lutomirski [Fri, 9 Jun 2017 18:49:15 +0000 (11:49 -0700)]
sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()

commit 252d2a4117bc181b287eeddf848863788da733ae upstream.

idle_task_exit() can be called with IRQs on x86 on and therefore
should use switch_mm(), not switch_mm_irqs_off().

This doesn't seem to cause any problems right now, but it will
confuse my upcoming TLB flush changes.  Nonetheless, I think it
should be backported because it's trivial.  There won't be any
meaningful performance impact because idle_task_exit() is only
used when offlining a CPU.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 18a5348d49afcfc2b95da939143c9420edd78b9e)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoARM: Hide finish_arch_post_lock_switch() from modules
Steven Rostedt [Fri, 13 May 2016 13:30:13 +0000 (15:30 +0200)]
ARM: Hide finish_arch_post_lock_switch() from modules

commit ef0491ea17f8019821c7e9c8e801184ecf17f85a upstream.

The introduction of switch_mm_irqs_off() brought back an old bug
regarding the use of preempt_enable_no_resched:

As part of:

  62b94a08da1b ("sched/preempt: Take away preempt_enable_no_resched() from modules")

the definition of preempt_enable_no_resched() is only available in
built-in code, not in loadable modules, so we can't generally use
it from header files.

However, the ARM version of finish_arch_post_lock_switch()
calls preempt_enable_no_resched() and is defined as a static
inline function in asm/mmu_context.h. This in turn means we cannot
include asm/mmu_context.h from modules.

With today's tip tree, asm/mmu_context.h gets included from
linux/mmu_context.h, which is normally the exact pattern one would
expect, but unfortunately, linux/mmu_context.h can be included from
the vhost driver that is a loadable module, now causing this compile
time error with modular configs:

  In file included from ../include/linux/mmu_context.h:4:0,
                   from ../drivers/vhost/vhost.c:18:
  ../arch/arm/include/asm/mmu_context.h: In function 'finish_arch_post_lock_switch':
  ../arch/arm/include/asm/mmu_context.h:88:3: error: implicit declaration of function 'preempt_enable_no_resched' [-Werror=implicit-function-declaration]
     preempt_enable_no_resched();

Andy already tried to fix the bug by including linux/preempt.h
from asm/mmu_context.h, but that didn't help. Arnd suggested reordering
the header files, which wasn't popular, so let's use this
workaround instead:

The finish_arch_post_lock_switch() definition is now also hidden
inside of #ifdef MODULE, so we don't see anything referencing
preempt_enable_no_resched() from a header file. I've built a
few hundred randconfig kernels with this, and did not see any
new problems.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/1463146234-161304-1-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c22d4b4d1c7fcc0d9eb4d8618d86c554c48ed9c0)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm, sched/core: Turn off IRQs in switch_mm()
Andy Lutomirski [Tue, 26 Apr 2016 16:39:09 +0000 (09:39 -0700)]
x86/mm, sched/core: Turn off IRQs in switch_mm()

commit 078194f8e9fe3cf54c8fd8bded48a1db5bd8eb8a upstream.

Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy.  AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.

Fix it with a big hammer: run switch_mm() with IRQs off.

To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4ead44fd2525ed97e5362a806d312a0e3b0ea445)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/include/asm/mmu_context.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm, sched/core: Uninline switch_mm()
Andy Lutomirski [Tue, 26 Apr 2016 16:39:08 +0000 (09:39 -0700)]
x86/mm, sched/core: Uninline switch_mm()

commit 69c0319aabba45bcf33178916a2f06967b4adede upstream.

It's fairly large and it has quite a few callers.  This may also
help untangle some headers down the road.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 70a39c7fd167399fde76aeac314dce026a255b49)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/include/asm/mmu_context.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Build arch/x86/mm/tlb.c even on !SMP
Andy Lutomirski [Tue, 26 Apr 2016 16:39:07 +0000 (09:39 -0700)]
x86/mm: Build arch/x86/mm/tlb.c even on !SMP

commit e1074888c326038340a1ada9129d679e661f2ea6 upstream.

Currently all of the functions that live in tlb.c are inlined on
!SMP builds.  One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).

Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 83cc4b50e3a977915666ade0b951ba446e7181bd)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agosched/core: Add switch_mm_irqs_off() and use it in the scheduler
Andy Lutomirski [Tue, 26 Apr 2016 16:39:06 +0000 (09:39 -0700)]
sched/core: Add switch_mm_irqs_off() and use it in the scheduler

commit f98db6013c557c216da5038d9c52045be55cd039 upstream.

By default, this is the same thing as switch_mm().

x86 will override it as an optimization.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 425f13a36652523d604fd96413d6c438d415dd70)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agomm/mmu_context, sched/core: Fix mmu_context.h assumption
Ingo Molnar [Thu, 28 Apr 2016 09:39:12 +0000 (11:39 +0200)]
mm/mmu_context, sched/core: Fix mmu_context.h assumption

commit 8efd755ac2fe262d4c8d5c9bbe054bb67dae93da upstream.

Some architectures (such as Alpha) rely on include/linux/sched.h definitions
in their mmu_context.h files.

So include sched.h before mmu_context.h.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dfe513a4e8ddde75ffc6abd3f139c5d65bf925d7)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: If INVPCID is available, use it to flush global mappings
Andy Lutomirski [Fri, 29 Jan 2016 19:42:59 +0000 (11:42 -0800)]
x86/mm: If INVPCID is available, use it to flush global mappings

commit d8bced79af1db6734f66b42064cc773cada2ce99 upstream.

On my Skylake laptop, INVPCID function 2 (flush absolutely
everything) takes about 376ns, whereas saving flags, twiddling
CR4.PGE to flush global mappings, and restoring flags takes about
539ns.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 85d3700c744a11ee2989252acf50ccbbd814167a)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
Andy Lutomirski [Fri, 29 Jan 2016 19:42:58 +0000 (11:42 -0800)]
x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID

commit d12a72b844a49d4162f24cefdab30bed3f86730e upstream.

This adds a chicken bit to turn off INVPCID in case something goes
wrong.  It's an early_param() because we do TLB flushes before we
parse __setup() parameters.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 791a0f3fecdabe18cc291e5f9b7ebbdc81895975)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Fix INVPCID asm constraint
Borislav Petkov [Wed, 10 Feb 2016 14:51:16 +0000 (15:51 +0100)]
x86/mm: Fix INVPCID asm constraint

commit e2c7698cd61f11d4077fdb28148b2d31b82ac848 upstream.

So we want to specify the dependency on both @pcid and @addr so that the
compiler doesn't reorder accesses to them *before* the TLB flush. But
for that to work, we need to express this properly in the inline asm and
deref the whole desc array, not the pointer to it. See clwb() for an
example.

This fixes the build error on 32-bit:

  arch/x86/include/asm/tlbflush.h: In function __invpcid
  arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable

which gcc4.7 caught but 5.x didn't. Which is strange. :-\

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Michael Matz <matz@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 04ec428b15f161ce8449756fb64b6f380c8d95fd)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/mm: Add INVPCID helpers
Andy Lutomirski [Fri, 29 Jan 2016 19:42:57 +0000 (11:42 -0800)]
x86/mm: Add INVPCID helpers

commit 060a402a1ddb551455ee410de2eadd3349f2801b upstream.

This adds helpers for each of the four currently-specified INVPCID
modes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit becf292446e9f2dc8842c448836bbe8005e24db0)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/ibrs: Remove 'ibrs_dump' and remove the pr_debug
Konrad Rzeszutek Wilk [Fri, 5 Jan 2018 04:29:31 +0000 (20:29 -0800)]
x86/ibrs: Remove 'ibrs_dump' and remove the pr_debug

Orabug: 27351274

There is no business in having ibrs_dump exposed to user-space
and it allowing to write to it. Reading that entry ends up
doing an IPI across all CPUs reading an MSR and that (on large
machines) is not something user-space should be able to do.

And also remove the pr_debug statements.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokABI: Revert kABI: Make the boot_cpu_data look normal
Konrad Rzeszutek Wilk [Fri, 5 Jan 2018 04:29:31 +0000 (20:29 -0800)]
kABI: Revert kABI: Make the boot_cpu_data look normal

.. which was the wrong way about it. The 'struct task_struct'
embeds boot_cpu_data in it, and the increase from 13 to 14
meant that any offset's in the 'struct task_struct' are now
off. Which is definitly an kABI breakage!

This fix puts the structure back to the original size,
and moves the 'ipbp' in the 'Linux custom' word and all is good.

Orabug: 27344012
CVE: CVE-2017-5715

Reported-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agouserns: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:38:15 +0000 (02:38 -0800)]
userns: prevent speculative execution

From: Elena Reshetova <elena.reshetova@intel.com>

Since the pos value in function m_start()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
map->extent, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoudf: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:35:57 +0000 (02:35 -0800)]
udf: prevent speculative execution

Since the eahd->appAttrLocation value in function
udf_add_extendedattr() seems to be controllable by
userspace and later on conditionally (upon bound check)
used in following memmove, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agonet: mpls: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:12:38 +0000 (02:12 -0800)]
net: mpls: prevent speculative execution

Since the index value in function mpls_route_input_rcu()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
platform_label, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agofs: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:10:20 +0000 (02:10 -0800)]
fs: prevent speculative execution

Since the fd value in function __fcheck_files()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
fdt->fd, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoipv6: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:07:33 +0000 (02:07 -0800)]
ipv6: prevent speculative execution

Since the offset value in function raw6_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the
following memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoipv4: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 10:03:54 +0000 (02:03 -0800)]
ipv4: prevent speculative execution

Since the offset value in function raw_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the following
memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoThermal/int340x: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:53:58 +0000 (01:53 -0800)]
Thermal/int340x: prevent speculative execution

Since the trip value in function int340x_thermal_get_trip_temp()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
d->aux_trips, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
        patch refers to arch/x86/include/asm/msr-index.h
        code base has arch/x86/include/uapi/asm/msr-index.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agocw1200: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:50:27 +0000 (01:50 -0800)]
cw1200: prevent speculative execution

Since the queue value in function cw1200_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in
WSM_TX_QUEUE_SET, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
        patch refers to drivers/net/wireless/st/cw1200/sta.c
        code base has drivers/net/wireless/cw1200/sta.c

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoqla2xxx: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:42:47 +0000 (01:42 -0800)]
qla2xxx: prevent speculative execution

Since the handle value in functions qlafx00_status_entry()
and qlafx00_multistatus_entry() seems to be controllable
by userspace and later on conditionally (upon bound check)
used to resolve req->outstanding_cmds, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid kernel
memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agop54: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:38:52 +0000 (01:38 -0800)]
p54: prevent speculative execution

Since the queue value in function p54_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
priv->qos_params, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to drivers/net/wireless/intersil/p54/main.c
code base has drivers/net/wireless/p54/main.c

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agocarl9170: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:31:31 +0000 (01:31 -0800)]
carl9170: prevent speculative execution

Since the queue value in function carl9170_op_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
ar9170_qmap and following ar->edcf, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agouvcvideo: prevent speculative execution
Elena Reshetova [Thu, 4 Jan 2018 09:25:32 +0000 (01:25 -0800)]
uvcvideo: prevent speculative execution

Since the index value in function uvc_ioctl_enum_input()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
selector->baSourceID, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agobpf: prevent speculative execution in eBPF interpreter
Elena Reshetova [Thu, 4 Jan 2018 08:05:42 +0000 (00:05 -0800)]
bpf: prevent speculative execution in eBPF interpreter

This adds an observable speculation barrier before LD_IMM_DW and
LDX_MEM_B/H/W/DW eBPF instructions during eBPF program
execution in order to prevent speculative execution on out
of bound BFP_MAP array indexes. This way an arbitary kernel
memory is not exposed through side channel attacks.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
kernel/bpf/core.c code base differences

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agolocking/barriers: introduce new observable speculation barrier
Elena Reshetova [Thu, 4 Jan 2018 07:56:24 +0000 (23:56 -0800)]
locking/barriers: introduce new observable speculation barrier

The new observable speculation barrier, osb(), ensures
that any user observable speculation doesn't cross the boundary.

Any user observable speculative activity on this CPU
thread before this point either completes, reaches a
state it can no longer cause an observable activity, or
is aborted before instructions after the barrier execute.

In x86 case, osb() resolves in lfence if X86_FEATURE_LFENCE_RDTSC
is present. Other architectures can define their variants.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
include/asm-generic/barrier.h code base differences

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/cpu/AMD: Remove now unused definition of MFENCE_RDTSC feature
Elena Reshetova [Thu, 4 Jan 2018 07:43:33 +0000 (23:43 -0800)]
x86/cpu/AMD: Remove now unused definition of MFENCE_RDTSC feature

With the switch to using LFENCE_RDTSC on AMD platforms there is no longer
a need for the MFENCE_RDTSC feature.  Remove its usage and definition.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
Patch refers to arch/x86/include/asm/cpufeatures.h
Code base has   arch/x86/include/asm/cpufeature.h
Patch references X86_FEATURE_MFENCE_RDTSC in arch/x86/include/asm/msr.h
Code base references it in:
arch/x86/include/asm/barrier.h
arch/x86/um/asm/barrier.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/cpu/AMD: Make the LFENCE instruction serialized
Elena Reshetova [Thu, 4 Jan 2018 07:19:32 +0000 (23:19 -0800)]
x86/cpu/AMD: Make the LFENCE instruction serialized

In order to reduce the impact of using MFENCE, make the execution of the
LFENCE instruction serialized.  This is done by setting bit 1 of MSR
0xc0011029 (DE_CFG).

Some families that support LFENCE do not have this MSR.  For these
families, the LFENCE instruction is already serialized.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to arch/x86/include/asm/msr-index.h
code base has arch/x86/include/uapi/asm/msr-index.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokABI: Make the boot_cpu_data look normal.
Konrad Rzeszutek Wilk [Thu, 4 Jan 2018 17:26:58 +0000 (12:26 -0500)]
kABI: Make the boot_cpu_data look normal.

It is statically allocated and we only grow it - so having an
GENKSYMS around it is fine. This fixes
aff7641cb9f37c7aa6897a7b51faa6e20b08013f
"x86/cpu/AMD: Add speculative control support for AMD" breaking the kABI

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agokernel.spec: Require the new microcode_ctl.
Konrad Rzeszutek Wilk [Thu, 4 Jan 2018 15:38:42 +0000 (10:38 -0500)]
kernel.spec: Require the new microcode_ctl.

Which provides the new CPUID and MSRs to combat
CVE-2017-5715

Orabug: 27344012
CVE: CVE-2017-5715

Suggested-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/microcode/AMD: Add support for fam17h microcode loading
Tom Lendacky [Thu, 30 Nov 2017 22:46:40 +0000 (16:46 -0600)]
x86/microcode/AMD: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agox86/spec_ctrl: Disable if running as Xen PV guest.
Konrad Rzeszutek Wilk [Fri, 29 Dec 2017 19:45:40 +0000 (14:45 -0500)]
x86/spec_ctrl: Disable if running as Xen PV guest.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoSet IBPB when running a different VCPU
Dave Hansen [Wed, 20 Dec 2017 20:54:52 +0000 (12:54 -0800)]
Set IBPB when running a different VCPU

Picking up a change from:
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 30 Nov 2017 15:00:12 +0100
[RHEL7.5 PATCH 07/35] kvm: vmx: Set IBPB when running a different
 VCPU

Ensure an IBPB (Indirect branch prediction barrier) before every VCPU
switch.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoClear the host registers after setbe
Jun Nakajima [Wed, 20 Dec 2017 16:04:54 +0000 (08:04 -0800)]
Clear the host registers after setbe

The original patch cleared the host registers before setbe doing XOR,
and it set a false flag as VM enry failure.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
7 years agoUse the ibpb_inuse variable.
Jun Nakajima [Wed, 20 Dec 2017 16:04:53 +0000 (08:04 -0800)]
Use the ibpb_inuse variable.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>