www.infradead.org Git - users/jedix/linux-maple.git/log

kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE

Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
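
As a rough user-space illustration of the wrapper idea (not the kernel code), the sketch below maps only the page at THREAD_SIZE - PAGE_SIZE. The sketch_* names, the sizes, and the recording stubs standing in for kaiser_add_mapping() and kaiser_remove_mapping() are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; real values depend on architecture and config. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_THREAD_SIZE (4UL * SKETCH_PAGE_SIZE)

/* Records the last range handed to the stubs so a caller can inspect it. */
struct sketch_mapping {
	uintptr_t start;
	size_t size;
	int mapped;
};

static struct sketch_mapping sketch_last;

/* Hypothetical stand-in for asm/kaiser.h's kaiser_add_mapping(). */
static int sketch_add_mapping(uintptr_t start, size_t size)
{
	sketch_last.start = start;
	sketch_last.size = size;
	sketch_last.mapped = 1;
	return 0;
}

/* Hypothetical stand-in for asm/kaiser.h's kaiser_remove_mapping(). */
static void sketch_remove_mapping(uintptr_t start, size_t size)
{
	if (sketch_last.start == start && sketch_last.size == size)
		sketch_last.mapped = 0;
}

/* Map only the topmost page of the stack, not the whole THREAD_SIZE. */
static int sketch_map_thread_stack(uintptr_t stack)
{
	assert(stack % SKETCH_PAGE_SIZE == 0);
	return sketch_add_mapping(stack + SKETCH_THREAD_SIZE - SKETCH_PAGE_SIZE,
				  SKETCH_PAGE_SIZE);
}

/* Undo the mapping made by sketch_map_thread_stack(). */
static void sketch_unmap_thread_stack(uintptr_t stack)
{
	sketch_remove_mapping(stack + SKETCH_THREAD_SIZE - SKETCH_PAGE_SIZE,
			      SKETCH_PAGE_SIZE);
}
```

Keeping the address arithmetic in one pair of wrappers means callers such as kernel/fork.c need no knowledge of page protection flags.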

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 003e476716906afa135faf605ae0a5c3598c0293)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
init/main.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kaiser: do not set _PAGE_NX on pgd_none

native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).

The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too.  So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:

A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.

This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.

What we need is a flag to set_pgd(), to tell it we're dealing with
userspace.  Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that.  But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.
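
A minimal user-space sketch of the two ideas above, under stated assumptions: that the shadow pgd is located by setting the PAGE_SIZE bit of the pgd address (which would explain the "even-numbered page" remark), and that only entries carrying _PAGE_USER are mirrored and given NX in the kernel copy. The sketch_* names are hypothetical and this is not the code in arch/x86/mm/kaiser.c.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_USER (1ULL << 2)   /* x86 user/supervisor bit */
#define SKETCH_PAGE_NX   (1ULL << 63)  /* x86-64 no-execute bit */

/* Assumed lookup: the shadow sits in the odd page of an aligned page pair. */
static uint64_t sketch_shadow_addr(uint64_t pgd_addr)
{
	return pgd_addr | SKETCH_PAGE_SIZE;
}

/* Returns 1 only when the lookup lands on a different, following page. */
static int sketch_shadow_follows(uint64_t pgd_addr)
{
	return sketch_shadow_addr(pgd_addr) != pgd_addr;
}

/*
 * Decide what to store for a new pgd value. Returns 1 when the shadow
 * must be updated too; *shadow_val is only written in that case.
 */
static int sketch_set_pgd(uint64_t val, uint64_t *kernel_val,
			  uint64_t *shadow_val)
{
	assert(kernel_val && shadow_val);

	if (!(val & SKETCH_PAGE_USER)) {
		/* Empty or kernel-internal entry: no NX, no shadow write. */
		*kernel_val = val;
		return 0;
	}

	*shadow_val = val;                  /* entry userspace will run on */
	*kernel_val = val | SKETCH_PAGE_NX; /* kernel copy is non-executable */
	return 1;
}
```

With this test in place, an empty entry (value 0) is stored unchanged, so pgd_none() still holds for it.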

But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit edde73205b3fdde8c8a3adfce78cc6d0de72386b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kaiser: merged update

Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bed9bb7f3e6d4045013d2bb9e4004896de57f02b)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)
arch/x86/kernel/ldt.c
kernel/fork.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found at:

https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8a43ddfb93a0c6ae1a6e1f5c25705ec5d1843c40)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/entry/entry_64.S (not in this tree)
arch/x86/kernel/entry_64.S (patched instead of that)
arch/x86/entry/entry_64_compat.S (not in this tree)
arch/x86/ia32/ia32entry.S (patched instead of that)
arch/x86/include/asm/hw_irq.h
arch/x86/kernel/irqinit.c
kernel/fork.c

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/boot: Add early cmdline parsing for options with arguments

commit e505371dd83963caae1a37ead9524e8d997341be upstream.

Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.
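
The return convention described above can be sketched in user space as follows. This is a simplified illustration, not the kernel's implementation: sketch_find_option() is a hypothetical name, it splits on whitespace only, and it stops at the first match.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Look for "option=value" as a whitespace-delimited word in cmdline.
 * Copies as much of value as fits into buffer (always NUL-terminated)
 * and returns the full length of value, or -1 if the option is absent.
 */
static int sketch_find_option(const char *cmdline, const char *option,
			      char *buffer, int bufsize)
{
	size_t optlen;
	const char *p = cmdline;

	assert(cmdline && option && buffer && bufsize > 0);
	optlen = strlen(option);

	while (*p) {
		const char *word;
		size_t wordlen;

		/* Skip whitespace between words. */
		while (*p && isspace((unsigned char)*p))
			p++;
		word = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		wordlen = (size_t)(p - word);

		/* A match is the option name immediately followed by '='. */
		if (wordlen > optlen && word[optlen] == '=' &&
		    strncmp(word, option, optlen) == 0) {
			size_t arglen = wordlen - optlen - 1;
			size_t copy = arglen;

			if (copy > (size_t)bufsize - 1)
				copy = (size_t)bufsize - 1;
			memcpy(buffer, word + optlen + 1, copy);
			buffer[copy] = '\0';
			return (int)arglen; /* full length, even if truncated */
		}
	}
	return -1;
}
```

Returning the untruncated length lets a caller detect truncation by comparing the result against its buffer size.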

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0fa147b407478e73fe7a478677ff2b12bb824014)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm/64: Fix reboot interaction with CR4.PCIDE

commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream.

Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.)  The result is a GPF and
a hang instead of a reboot.

I didn't catch this in testing because neither my computer nor my VM
reboots this way.  I can trigger it with reboot=bios, though.

Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6c4db09c291a19da66512f99c4bcb378a862f9e6)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Enable CR4.PCIDE on supported systems

commit 660da7c9228f685b2ebe664f9fd69aaddcc420b5 upstream.

We can use PCID if the CPU has PCID and PGE and we're not on Xen.

By itself, this has no effect. A followup patch will start using PCID.
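
The enabling condition can be written down as a small user-space sketch. The struct and function names are hypothetical; the only hardware fact assumed is that CR4.PCIDE is bit 17 of CR4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CR4_PCIDE (1ULL << 17) /* CR4 bit 17 */

/* Hypothetical summary of what the boot code would know about the CPU. */
struct sketch_cpu {
	bool has_pcid; /* CPUID reports PCID */
	bool has_pge;  /* CPUID reports PGE */
	bool on_xen;   /* running on Xen */
};

/* PCID is usable only with PCID and PGE present and not on Xen. */
static bool sketch_can_use_pcid(const struct sketch_cpu *cpu)
{
	assert(cpu != NULL);
	return cpu->has_pcid && cpu->has_pge && !cpu->on_xen;
}

/* Return the CR4 value to load: PCIDE is set only when usable. */
static uint64_t sketch_setup_pcid(const struct sketch_cpu *cpu, uint64_t cr4)
{
	if (sketch_can_use_pcid(cpu))
		cr4 |= SKETCH_CR4_PCIDE;
	return cr4;
}
```

Setting the bit alone changes nothing visible, which matches the note that a followup patch is what starts using PCID.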

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fd0504525efd2ce2063cd4229baabd3e3a56ecbc)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Add the 'nopcid' boot option to turn off PCID

commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.

The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dcccd3c266e24ce80ecf592765315c54c222ac33)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Disable PCID on 32-bit kernels

commit cba4671af7550e008f7a7835f06df0763825bf3e upstream.

32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode. Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 78043e5b6fb2921d836b31f23e89e52925191153)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code

commit ce4a4e565f5264909a18c733b864c3f74467f69e upstream.

The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

- flush_tlb_mm_range() didn't flush individual pages if the range
   was small.

- The lazy TLB code was much weaker.  This usually wouldn't matter,
   but, if a kernel thread flushed its lazy "active_mm" more than
   once (due to reclaim or similar), it wouldn't be unlazied and
   would instead pointlessly flush repeatedly.

- Tracepoints were missing.

Aside from that, simply having the UP code around was a maintenance
burden, since it meant that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b2e24274d50e0ecdf560ebe06dbed0cc648ad3f9)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/Kconfig

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()

commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.

flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:

- It was missing an smp_mb() in the case where
   current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)

- It was missing tracepoints and vm counter updates.

The only reason that I can see for keeping it as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page.  This
hardly seems worthwhile.  If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.
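
The shape of the reimplementation, as a user-space sketch: the single-page flush becomes a range flush over exactly one page. The sketch_* types and the recording stub are assumptions for illustration; the real flush_tlb_mm_range() takes more arguments and does real work.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct sketch_mm {
	unsigned long flushes;    /* number of range flushes requested */
	unsigned long last_start; /* start of the most recent range */
	unsigned long last_end;   /* end of the most recent range */
};

struct sketch_vma {
	struct sketch_mm *mm;
};

/* Stub range flush: records the request instead of touching a TLB. */
static void sketch_flush_tlb_mm_range(struct sketch_mm *mm,
				      unsigned long start, unsigned long end)
{
	assert(mm != NULL && start < end);
	mm->last_start = start;
	mm->last_end = end;
	mm->flushes++;
}

/* A page flush is a range flush covering [addr, addr + PAGE_SIZE). */
static void sketch_flush_tlb_page(struct sketch_vma *vma, unsigned long addr)
{
	sketch_flush_tlb_mm_range(vma->mm, addr, addr + SKETCH_PAGE_SIZE);
}
```

Routing both entry points through one function means barriers, tracepoints and counters only need to be right in one place.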

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3efba6062a410a2a65fc9d6f53dca63db2602e65)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Make flush_tlb_mm_range() more predictable

commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.

I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.
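
A sketch of the calculation as described, in user space and under assumptions: SKETCH_FLUSH_ALL plays the role of TLB_FLUSH_ALL as an all-ones value, the ceiling is passed in as a parameter, and huge-page handling is left out. It is meant to show why always computing the page count also closes the corner case, not to reproduce the kernel function.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_FLUSH_ALL  (~0UL) /* stand-in for TLB_FLUSH_ALL */

/*
 * Return how many pages to flush one by one, or SKETCH_FLUSH_ALL for a
 * full flush. The count is computed unconditionally, independent of
 * whether the local TLB needs flushing.
 */
static unsigned long sketch_pages_to_flush(unsigned long start,
					   unsigned long end,
					   unsigned long ceiling)
{
	unsigned long pages = SKETCH_FLUSH_ALL;

	if (end != SKETCH_FLUSH_ALL) {
		assert(start <= end);
		pages = (end - start) >> SKETCH_PAGE_SHIFT;
	}

	/* Too many single-page flushes cost more than one full flush. */
	if (pages > ceiling)
		pages = SKETCH_FLUSH_ALL;

	return pages;
}
```

Because a full-range request starts out as SKETCH_FLUSH_ALL and is never converted to a page count, a ceiling equal to SKETCH_FLUSH_ALL can no longer lead to a page-by-page walk of the whole address space.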

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9f4d1ba1d407e56dac833aa0b11c60f952939e1c)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Remove flush_tlb() and flush_tlb_current_task()

commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.

I was trying to figure out how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 227d6f0e79f809e448d3157fbfd00eb54dcbb54e)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()

commit 9ccee2373f0658f234727700e619df097ba57023 upstream.

mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6ce9d1e6819e53c4de0bf980555c4e07bbedb4ce)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/irq: Do not substract irq_tlb_count from irq_call_count

commit 82ba4faca1bffad429f15c90c980ffd010366c25 upstream.

Since commit:

  52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")

the TLB remote shootdown is done through call function vector. That
commit didn't take care of irq_tlb_count, which a later commit:

  fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")

... tried to fix.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count. So the irq_call_count is always bigger than
irq_tlb_count and we could subtract irq_tlb_count from irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
adding a csd into it in generic_exec_single. That means if two threads
are both adding flush tlb csds to the same CPU's call_single_queue, only
one IPI is sent. In other words, the irq_call_count is incremented by 1
but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
bigger than irq_call_count and the subtraction will produce a very large
irq_call_count value due to unsigned wraparound.
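
The arithmetic can be reproduced in a few lines of user-space C. The struct below is a hypothetical stand-in for the per-CPU statistics, assuming both counters are unsigned int; the helper models two queued flushes being served by a single IPI.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU interrupt statistics. */
struct sketch_irq_stats {
	unsigned int irq_call_count;
	unsigned int irq_tlb_count;
};

/* Several TLB flushes queued to one CPU but delivered by a single IPI. */
static void sketch_coalesced_flushes(struct sketch_irq_stats *s,
				     unsigned int queued)
{
	assert(s && queued > 0);
	s->irq_call_count += 1;     /* only one IPI is sent */
	s->irq_tlb_count += queued; /* but every flush is counted */
}

/* Old reporting: subtract TLB shootdowns from call function interrupts. */
static unsigned int sketch_reported_old(const struct sketch_irq_stats *s)
{
	return s->irq_call_count - s->irq_tlb_count;
}

/* New reporting: leave the call function count alone. */
static unsigned int sketch_reported_new(const struct sketch_irq_stats *s)
{
	return s->irq_call_count;
}
```

After one coalesced pair, the old formula computes 1 - 2 in unsigned arithmetic, which wraps to the largest unsigned int value.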

Considering that:

  1) it's not worth sending more IPIs for the sake of accurate counting of
     irq_call_count in generic_exec_single();

  2) it's not easy to tell if the call function interrupt is for TLB
     shootdown in __smp_call_function_single_interrupt().

Not excluding TLB shootdowns from the call function count seems to be the
simplest fix, and this patch does just that.

This bug was found recently by LKP's cyclic performance regression tracking
with the vm-scalability test suite. I bisected it to commit:

  3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")

This commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
then multiple threads could queue flush TLB to the same CPU but only
one IPI will be sent.

Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b9d9ec0d8261bb9b12f858e66f0c84cd2a6a3bb)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()

commit 252d2a4117bc181b287eeddf848863788da733ae upstream.

On x86, idle_task_exit() can be called with IRQs on and therefore
should use switch_mm(), not switch_mm_irqs_off().

This doesn't seem to cause any problems right now, but it will
confuse my upcoming TLB flush changes. Nonetheless, I think it
should be backported because it's trivial. There won't be any
meaningful performance impact because idle_task_exit() is only
used when offlining a CPU.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 18a5348d49afcfc2b95da939143c9420edd78b9e)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

ARM: Hide finish_arch_post_lock_switch() from modules

commit ef0491ea17f8019821c7e9c8e801184ecf17f85a upstream.

The introduction of switch_mm_irqs_off() brought back an old bug
regarding the use of preempt_enable_no_resched:

As part of:

  62b94a08da1b ("sched/preempt: Take away preempt_enable_no_resched() from modules")

the definition of preempt_enable_no_resched() is only available in
built-in code, not in loadable modules, so we can't generally use
it from header files.

However, the ARM version of finish_arch_post_lock_switch()
calls preempt_enable_no_resched() and is defined as a static
inline function in asm/mmu_context.h. This in turn means we cannot
include asm/mmu_context.h from modules.

With today's tip tree, asm/mmu_context.h gets included from
linux/mmu_context.h, which is normally the exact pattern one would
expect, but unfortunately, linux/mmu_context.h can be included from
the vhost driver that is a loadable module, now causing this compile
time error with modular configs:

  In file included from ../include/linux/mmu_context.h:4:0,
                   from ../drivers/vhost/vhost.c:18:
  ../arch/arm/include/asm/mmu_context.h: In function 'finish_arch_post_lock_switch':
  ../arch/arm/include/asm/mmu_context.h:88:3: error: implicit declaration of function 'preempt_enable_no_resched' [-Werror=implicit-function-declaration]
     preempt_enable_no_resched();

Andy already tried to fix the bug by including linux/preempt.h
from asm/mmu_context.h, but that didn't help. Arnd suggested reordering
the header files, which wasn't popular, so let's use this
workaround instead:

The finish_arch_post_lock_switch() definition is now also hidden
inside of #ifdef MODULE, so we don't see anything referencing
preempt_enable_no_resched() from a header file. I've built a
few hundred randconfig kernels with this, and did not see any
new problems.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/1463146234-161304-1-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c22d4b4d1c7fcc0d9eb4d8618d86c554c48ed9c0)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm, sched/core: Turn off IRQs in switch_mm()

commit 078194f8e9fe3cf54c8fd8bded48a1db5bd8eb8a upstream.

Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy. AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.

Fix it with a big hammer: run switch_mm() with IRQs off.

To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().
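
The split between the two entry points can be modelled in user space as below. Everything here is a simulation under assumptions: the IRQ state is a plain boolean, an mm is an integer id, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU state: interrupt flag and currently loaded mm id. */
static bool sketch_irqs_enabled = true;
static int sketch_current_mm;

/* Disable simulated IRQs and return the previous state. */
static bool sketch_irq_save(void)
{
	bool was_enabled = sketch_irqs_enabled;

	sketch_irqs_enabled = false;
	return was_enabled;
}

/* Restore the state returned by sketch_irq_save(). */
static void sketch_irq_restore(bool was_enabled)
{
	sketch_irqs_enabled = was_enabled;
}

/* Fast path for callers that already run with IRQs off (the scheduler). */
static void sketch_switch_mm_irqs_off(int next)
{
	assert(!sketch_irqs_enabled);
	sketch_current_mm = next;
}

/* General entry point: the "big hammer" wrapper that turns IRQs off. */
static void sketch_switch_mm(int next)
{
	bool was_enabled = sketch_irq_save();

	sketch_switch_mm_irqs_off(next);
	sketch_irq_restore(was_enabled);
}
```

The scheduler calls the irqs-off variant directly and so avoids the save and restore, while every other caller pays for it once per switch.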

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4ead44fd2525ed97e5362a806d312a0e3b0ea445)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/include/asm/mmu_context.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm, sched/core: Uninline switch_mm()

commit 69c0319aabba45bcf33178916a2f06967b4adede upstream.

It's fairly large and it has quite a few callers. This may also
help untangle some headers down the road.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 70a39c7fd167399fde76aeac314dce026a255b49)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Conflicts:
arch/x86/include/asm/mmu_context.h

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Build arch/x86/mm/tlb.c even on !SMP

commit e1074888c326038340a1ada9129d679e661f2ea6 upstream.

Currently all of the functions that live in tlb.c are inlined on
!SMP builds. One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).

Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 83cc4b50e3a977915666ade0b951ba446e7181bd)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

sched/core: Add switch_mm_irqs_off() and use it in the scheduler

commit f98db6013c557c216da5038d9c52045be55cd039 upstream.

By default, this is the same thing as switch_mm().

x86 will override it as an optimization.
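The default can be sketched with a preprocessor fallback. This is an illustrative userspace model with a simplified, argument-free signature; the counter and schedule_sketch() are assumptions made for the example, not kernel code.

```c
/* Counts calls so the fallback can be observed; illustrative only. */
static int switch_mm_calls;

/* Simplified stand-in for the generic switch_mm(). */
void switch_mm(void)
{
	switch_mm_calls++;
}

/*
 * An architecture that wants an optimized variant would define
 * switch_mm_irqs_off before this point. Otherwise it is simply
 * another name for switch_mm().
 */
#ifndef switch_mm_irqs_off
# define switch_mm_irqs_off switch_mm
#endif

/* Hypothetical scheduler-side caller that uses the IRQs-off variant. */
void schedule_sketch(void)
{
	switch_mm_irqs_off();
}

int switch_mm_call_count(void)
{
	return switch_mm_calls;
}
```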

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 425f13a36652523d604fd96413d6c438d415dd70)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

mm/mmu_context, sched/core: Fix mmu_context.h assumption

commit 8efd755ac2fe262d4c8d5c9bbe054bb67dae93da upstream.

Some architectures (such as Alpha) rely on include/linux/sched.h definitions
in their mmu_context.h files.

So include sched.h before mmu_context.h.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dfe513a4e8ddde75ffc6abd3f139c5d65bf925d7)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: If INVPCID is available, use it to flush global mappings

commit d8bced79af1db6734f66b42064cc773cada2ce99 upstream.

On my Skylake laptop, INVPCID function 2 (flush absolutely
everything) takes about 376ns, whereas saving flags, twiddling
CR4.PGE to flush global mappings, and restoring flags takes about
539ns.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 85d3700c744a11ee2989252acf50ccbbd814167a)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID

commit d12a72b844a49d4162f24cefdab30bed3f86730e upstream.

This adds a chicken bit to turn off INVPCID in case something goes
wrong. It's an early_param() because we do TLB flushes before we
parse __setup() parameters.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 791a0f3fecdabe18cc291e5f9b7ebbdc81895975)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Fix INVPCID asm constraint

commit e2c7698cd61f11d4077fdb28148b2d31b82ac848 upstream.

So we want to specify the dependency on both @pcid and @addr so that the
compiler doesn't reorder accesses to them *before* the TLB flush. But
for that to work, we need to express this properly in the inline asm and
deref the whole desc array, not the pointer to it. See clwb() for an
example.

This fixes the build error on 32-bit:

arch/x86/include/asm/tlbflush.h: In function __invpcid
arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable

which gcc4.7 caught but 5.x didn't. Which is strange. :-\

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Michael Matz <matz@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 04ec428b15f161ce8449756fb64b6f380c8d95fd)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Add INVPCID helpers

commit 060a402a1ddb551455ee410de2eadd3349f2801b upstream.

This adds helpers for each of the four currently-specified INVPCID
modes.
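As a rough sketch of what the four modes look like, the following userspace code only builds the request (type plus 128-bit descriptor) and never executes the privileged INVPCID instruction. The struct and function names are hypothetical and chosen for illustration; the type numbers follow the architectural definition.

```c
#include <stdint.h>

/* The four INVPCID types defined by the architecture. */
enum invpcid_type {
	INVPCID_INDIV_ADDR      = 0, /* one address within one PCID */
	INVPCID_SINGLE_CTXT     = 1, /* one PCID, global mappings kept */
	INVPCID_ALL_INCL_GLOBAL = 2, /* all PCIDs, including globals */
	INVPCID_ALL_NON_GLOBAL  = 3, /* all PCIDs, global mappings kept */
};

/* In-memory descriptor: PCID in the low quadword, address in the high one. */
struct invpcid_desc {
	uint64_t pcid;
	uint64_t addr;
};

/* Hypothetical container pairing a type with its descriptor. */
struct invpcid_req {
	enum invpcid_type type;
	struct invpcid_desc desc;
};

static struct invpcid_req make_req(enum invpcid_type type,
				   uint64_t pcid, uint64_t addr)
{
	struct invpcid_req req = { type, { pcid, addr } };

	return req;
}

/* One helper per mode; unused descriptor fields are passed as zero. */
struct invpcid_req req_flush_one(uint64_t pcid, uint64_t addr)
{
	return make_req(INVPCID_INDIV_ADDR, pcid, addr);
}

struct invpcid_req req_flush_single_context(uint64_t pcid)
{
	return make_req(INVPCID_SINGLE_CTXT, pcid, 0);
}

struct invpcid_req req_flush_all(void)
{
	return make_req(INVPCID_ALL_INCL_GLOBAL, 0, 0);
}

struct invpcid_req req_flush_all_nonglobals(void)
{
	return make_req(INVPCID_ALL_NON_GLOBAL, 0, 0);
}
```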

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit becf292446e9f2dc8842c448836bbe8005e24db0)
Orabug: 27333760
CVE: CVE-2017-5754
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/ibrs: Remove 'ibrs_dump' and remove the pr_debug

Orabug: 27351274

There is no reason to expose ibrs_dump to user-space, let alone
allow writes to it. Reading that entry ends up sending an IPI to
all CPUs to read an MSR, and that (on large machines) is not
something user-space should be able to do.

And also remove the pr_debug statements.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kABI: Revert kABI: Make the boot_cpu_data look normal

.. which was the wrong way to go about it. The 'struct task_struct'
embeds boot_cpu_data in it, and the increase from 13 to 14
meant that any offsets in the 'struct task_struct' are now
off. Which is definitely a kABI breakage!

This fix puts the structure back to the original size,
and moves the 'ibpb' into the 'Linux custom' word and all is good.
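The offset problem can be illustrated with a small, generic sketch. The struct names below are hypothetical and far simpler than the real kernel structures; they only show that growing an embedded capability array from 13 to 14 words shifts every field that follows it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct cpuinfo_13 {
	uint32_t x86_capability[13];
	int next_field;
};

struct cpuinfo_14 {
	uint32_t x86_capability[14];
	int next_field;
};

/* An outer structure that embeds the CPU info by value. */
struct holder_13 {
	struct cpuinfo_13 cpu;
	char tail;
};

struct holder_14 {
	struct cpuinfo_14 cpu;
	char tail;
};

/* How far a field inside the grown structure moves. */
size_t inner_offset_shift(void)
{
	return offsetof(struct cpuinfo_14, next_field) -
	       offsetof(struct cpuinfo_13, next_field);
}

/* How far a field after the embedded structure moves. */
size_t outer_offset_shift(void)
{
	return offsetof(struct holder_14, tail) -
	       offsetof(struct holder_13, tail);
}
```

Any binary module built against the 13-word layout would read such fields at the old offsets, which is the breakage described above.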

Orabug: 27344012
CVE: CVE-2017-5715

Reported-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

userns: prevent speculative execution

From: Elena Reshetova <elena.reshetova@intel.com>

Since the pos value in function m_start()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
map->extent, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

udf: prevent speculative execution

Since the eahd->appAttrLocation value in function
udf_add_extendedattr() seems to be controllable by
userspace and later on conditionally (upon bound check)
used in following memmove, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

net: mpls: prevent speculative execution

Since the index value in function mpls_route_input_rcu()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
platform_label, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

fs: prevent speculative execution

Since the fd value in function __fcheck_files()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
fdt->fd, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

ipv6: prevent speculative execution

Since the offset value in function raw6_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the
following memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

ipv4: prevent speculative execution

Since the offset value in function raw_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the following
memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Thermal/int340x: prevent speculative execution

Since the trip value in function int340x_thermal_get_trip_temp()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
d->aux_trips, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to arch/x86/include/asm/msr-index.h
code base has arch/x86/include/uapi/asm/msr-index.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

cw1200: prevent speculative execution

Since the queue value in function cw1200_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in
WSM_TX_QUEUE_SET, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to drivers/net/wireless/st/cw1200/sta.c
code base has drivers/net/wireless/cw1200/sta.c

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

qla2xxx: prevent speculative execution

Since the handle value in functions qlafx00_status_entry()
and qlafx00_multistatus_entry() seems to be controllable
by userspace and later on conditionally (upon bound check)
used to resolve req->outstanding_cmds, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid kernel
memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

p54: prevent speculative execution

Since the queue value in function p54_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
priv->qos_params, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to drivers/net/wireless/intersil/p54/main.c
code base has drivers/net/wireless/p54/main.c

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

carl9170: prevent speculative execution

Since the queue value in function carl9170_op_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
ar9170_qmap and following ar->edcf, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

uvcvideo: prevent speculative execution

Since the index value in function uvc_ioctl_enum_input()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
selector->baSourceID, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

bpf: prevent speculative execution in eBPF interpreter

This adds an observable speculation barrier before LD_IMM_DW and
LDX_MEM_B/H/W/DW eBPF instructions during eBPF program
execution in order to prevent speculative execution on
out-of-bounds BPF_MAP array indexes. This way arbitrary kernel
memory is not exposed through side-channel attacks.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
kernel/bpf/core.c code base differences

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

locking/barriers: introduce new observable speculation barrier

The new observable speculation barrier, osb(), ensures
that any user observable speculation doesn't cross the boundary.

Any user observable speculative activity on this CPU
thread before this point either completes, reaches a
state it can no longer cause an observable activity, or
is aborted before instructions after the barrier execute.

On x86, osb() resolves to lfence if X86_FEATURE_LFENCE_RDTSC
is present. Other architectures can define their variants.
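A minimal userspace sketch of how such a barrier is meant to be used, assuming a hypothetical osb_sketch() macro. Unlike the real osb(), which is patched in depending on the CPU feature, this version emits LFENCE unconditionally on x86 and degrades to a plain compiler barrier elsewhere, which is NOT a speculation barrier. The table and lookup function are made up for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for osb(); see the caveats in the text above. */
#if defined(__x86_64__) || defined(__i386__)
# define osb_sketch() __asm__ __volatile__("lfence" ::: "memory")
#else
# define osb_sketch() __asm__ __volatile__("" ::: "memory")
#endif

#define TABLE_SIZE 4

static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/*
 * Bounds check, then barrier, then the dependent load. The barrier sits
 * between the check and the array access so that the access is not
 * performed speculatively with an out-of-range index.
 */
int table_lookup(size_t index)
{
	if (index >= TABLE_SIZE)
		return -1;

	osb_sketch();
	return table[index];
}
```

The "prevent speculative execution" patches in this series follow the same shape: a caller-influenced index, a bounds check, and a barrier before the index is used.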

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
include/asm-generic/barrier.h code base differences

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpu/AMD: Remove now unused definition of MFENCE_RDTSC feature

With the switch to using LFENCE_RDTSC on AMD platforms there is no longer
a need for the MFENCE_RDTSC feature. Remove its usage and definition.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
Patch refers to arch/x86/include/asm/cpufeatures.h
Code base has arch/x86/include/asm/cpufeature.h
Patch references X86_FEATURE_MFENCE_RDTSC in arch/x86/include/asm/msr.h
Code base references it in:
arch/x86/include/asm/barrier.h
arch/x86/um/asm/barrier.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpu/AMD: Make the LFENCE instruction serialized

In order to reduce the impact of using MFENCE, make the execution of the
LFENCE instruction serialized. This is done by setting bit 1 of MSR
0xc0011029 (DE_CFG).

Some families that support LFENCE do not have this MSR. For these
families, the LFENCE instruction is already serialized.
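The bit manipulation itself is small. A sketch, assuming hypothetical macro and function names; actually reading and writing the MSR needs kernel privileges and is not shown.

```c
#include <stdint.h>

/* MSR address and bit position as given in the text above. */
#define DE_CFG_MSR_SKETCH            0xc0011029u
#define DE_CFG_LFENCE_SERIALIZE_BIT  1

/* Return the value to write back so that LFENCE becomes serializing. */
uint64_t de_cfg_set_lfence_serialize(uint64_t old_value)
{
	return old_value | (UINT64_C(1) << DE_CFG_LFENCE_SERIALIZE_BIT);
}
```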

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to arch/x86/include/asm/msr-index.h
code base has arch/x86/include/uapi/asm/msr-index.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kABI: Make the boot_cpu_data look normal.

It is statically allocated and we only grow it - so having a
GENKSYMS around it is fine. This fixes
aff7641cb9f37c7aa6897a7b51faa6e20b08013f
"x86/cpu/AMD: Add speculative control support for AMD" breaking the kABI

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kernel.spec: Require the new microcode_ctl.

Which provides the new CPUID and MSRs to combat
CVE-2017-5715

Orabug: 27344012
CVE: CVE-2017-5715

Suggested-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/microcode/AMD: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
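A sketch of the size selection, using only the two figures named above. The function names are hypothetical, and other families may have their own limits, which are left out here.

```c
#include <stdint.h>

#define DEFAULT_MPB_MAX_SIZE 2048u /* fallback size from the text above */
#define F17H_MPB_MAX_SIZE    3200u /* family 17h size from the text above */

/* Pick the largest acceptable patch size for a CPU family. */
unsigned int max_patch_size(uint8_t family)
{
	switch (family) {
	case 0x17:
		return F17H_MPB_MAX_SIZE;
	default:
		/* Other per-family limits are omitted in this sketch. */
		return DEFAULT_MPB_MAX_SIZE;
	}
}

/* A patch larger than the family's limit would be rejected. */
int patch_fits(uint8_t family, unsigned int patch_size)
{
	return patch_size <= max_patch_size(family);
}
```

Without the 0x17 case, a 3200-byte patch would be compared against 2048 and the load would fail.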

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Disable if running as Xen PV guest.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Set IBPB when running a different VCPU

Picking up a change from:
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 30 Nov 2017 15:00:12 +0100
[RHEL7.5 PATCH 07/35] kvm: vmx: Set IBPB when running a different
VCPU

Ensure an IBPB (Indirect branch prediction barrier) before every VCPU
switch.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Clear the host registers after setbe

The original patch cleared the host registers with XOR before setbe,
so setbe saw the wrong flags and reported a false VM entry failure.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Use the ibpb_inuse variable.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

KVM: x86: add SPEC_CTRL to MSR and CPUID lists

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kvm: vmx: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD

[RHEL7.5 PATCH 08/35] kvm: vmx: add MSR_IA32_SPEC_CTRL and
MSR_IA32_PRED_CMD

Allow load/store of MSR_IA32_SPEC_CTRL, restore guest IBRS on VM entry
and set it to 1 on VM exit.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Use the "ibrs_inuse" variable.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kvm: svm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/svm: Set IBPB when running a different VCPU

[RHEL7.5 PATCH 09/35] x86/svm: Set IBPB when running a different VCPU

Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
going to run a VCPU different from what was previously run. Nested
virtualization uses the same VMCB for the second level guest, but the
L1 hypervisor should be using IBRS to protect itself.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: Pad RSB on VM transition

This is a patch from:

From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 30 Nov 2017 15:00:10 +0100
Subject: [RHEL7.5 PATCH 05/35] x86/kvm: Pad RSB on VM transition

Add code to pad the local CPU's RSB entries to protect
against the previous, less privileged mode.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpu/AMD: Add speculative control support for AMD

Add speculative control support for AMD processors. For AMD, speculative
control is indicated as follows:

  CPUID EAX=0x00000007, ECX=0x00 return EDX[26] indicates support for
  both IBRS and IBPB.

  CPUID EAX=0x80000008, ECX=0x00 return EBX[12] indicates support for
  just IBPB.
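The two checks can be sketched as pure bit tests on register values that a CPUID query has already returned. The function names are hypothetical; only the leaf numbers and bit positions come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID EAX=0x00000007, ECX=0: EDX bit 26 means both IBRS and IBPB. */
bool cpuid7_edx_has_ibrs_ibpb(uint32_t edx)
{
	return ((edx >> 26) & 1u) != 0;
}

/* CPUID EAX=0x80000008, ECX=0: EBX bit 12 means just IBPB. */
bool cpuid80000008_ebx_has_ibpb(uint32_t ebx)
{
	return ((ebx >> 12) & 1u) != 0;
}
```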

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: We don't have 39c06df4dc10a "x86/cpufeature: Cleanup get_cpu_cap()"
which adds a nice enum, nor do we have 2167ceabf3416
"x86/cpu: Add CLZERO detection". As such we just do a partial backport
of the last one and only look for one specific bit (12).]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/microcode: Recheck IBRS and IBPB feature on microcode reload

On new microcode write, check whether IBRS and IBPB features
are present by rescanning scattered CPU features.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Move IBRS/IBPB feature detection to scattered.c

Move IBRS/IBPB to scattered features for easier feature rescan.
This helps to rescan features on microcode reload later.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

There are two ways to control IBPB and IBRS:

1. At boot time
        noibrs kernel boot parameter will disable IBRS usage
        noibpb kernel boot parameter will disable IBPB usage
If neither of the above parameters is specified, the system
will enable IBRS and IBPB usage if the CPU supports them.

2. At run time
        echo 0 > /proc/sys/kernel/ibrs_enabled will turn off IBRS
        echo 1 > /proc/sys/kernel/ibrs_enabled will turn on IBRS in kernel
        echo 2 > /proc/sys/kernel/ibrs_enabled will turn on IBRS in both userspace and kernel
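The three run-time values for ibrs_enabled can be modelled as a small enum with a parser. This is a userspace illustration with hypothetical names, not the sysctl handler from the patch.

```c
#include <stdlib.h>

/* The three values documented above for ibrs_enabled. */
enum ibrs_mode {
	IBRS_OFF             = 0, /* IBRS turned off */
	IBRS_KERNEL          = 1, /* IBRS on in kernel */
	IBRS_KERNEL_AND_USER = 2, /* IBRS on in both userspace and kernel */
};

/*
 * Parse the text written to the knob (for example "1\n").
 * Returns 0 and stores the mode on success, -1 on invalid input.
 */
int parse_ibrs_enabled(const char *text, enum ibrs_mode *out)
{
	char *end;
	long value = strtol(text, &end, 10);

	if (end == text)
		return -1; /* no digits at all */
	if (*end != '\0' && *end != '\n')
		return -1; /* trailing junk after the number */
	if (value < 0 || value > 2)
		return -1; /* outside the documented range */

	*out = (enum ibrs_mode)value;
	return 0;
}
```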

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: This completes the scaffolding work done in the earlier
patch which had the same title]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: clear registers on VM exit

Clear registers on VM exit to prevent speculative use of them.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: Set IBPB when switching VM

Set IBPB (Indirect branch prediction barrier) when switching VM.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

*INCOMPLETE* x86/syscall: Clear unused extra registers on syscall entrance

To prevent the unused registers %r12-%r15, %rbp and %rbx from
being used speculatively, we clear them upon syscall entrance
for code hygiene.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Backport: We don't have the ORC stack which means our calling.h
has the CTF code. And that has RESTORE_EXTRA_ARGS and ZERO_EXTRA_ARGS
so there was no need to port that in. See
commit 76f5df43cab5e765c0bd42289103e8f625813ae1
x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack
which added them.

The ZERO_EXTRA_REGS (aka CLEAR_EXTRA_REGS) is not part of it.
It ends up crashing the user-space. Not sure why not.

Which means this patch is pretty much useless - we don't clear
any of the %r12-%r15, nor %rbp, nor %rbx at all.

In other words, we now just save more registers on the stack
and restore them.

But somewhere we depend on these and need to fix that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/entry: Stuff RSB for entry to kernel for non-SMEP platform

Stuff RSB to prevent RSB underflow on non-SMEP platform.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Only set IBPB when the new thread cannot ptrace current thread

To reduce overhead of setting IBPB, we only do that when
the new thread cannot ptrace the current one. If the new
thread has ptrace capability on current thread, it is safe.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: Need more #include's than the original]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Set IBPB upon context switch

Set IBPB on a context switch that changes the page table.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport needs an asm/microcode.h to include the native_wrmsrl]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup

Clear IBRS when cpu is offlined and set it when bringing it back online.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/idle: Disable IBRS entering idle and enable it on wakeup

Clear IBRS on idle entry and set it on idle exit into kernel on mwait.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: We don't have b466bdb614823
"x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer"
hence the change to delay_mwaitx is not needed]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: save IBRS MSR value in paranoid_entry

If the NMI runs while entering the kernel, between SWAPGS and IBRS_ENABLE,
everything is fine: paranoid_entry would have unconditionally set
IBRS bit 0, and on exiting the NMI it would have cleared bit 0 as
if it were returning to userland. IBRS_ENABLE would then have enabled
bit 0 again.

If the NMI instead runs while exiting the kernel, between IBRS_DISABLE and
SWAPGS, the NMI would have turned on IBRS bit 0 and then left it
enabled when exiting the NMI. IBRS bit 0 would then stay
enabled in userland until the next kernel entry.

That is a minor inefficiency only, but we can eliminate it by saving
the MSR when entering the NMI in save_paranoid and restoring it when
exiting the NMI.
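
A minimal user-space model of the save/restore idea described above. The
variable model_spec_ctrl and the helpers nmi_enter()/nmi_exit() are
hypothetical stand-ins, not the kernel's assembly macros; the sketch only
shows that restoring the saved value leaves the interrupted context's
IBRS bit unchanged.

```c
#include <stdint.h>

#define SPEC_CTRL_IBRS 0x1u        /* bit 0 of IA32_SPEC_CTRL */

/* Stand-in for the real MSR: a plain variable in this model. */
static uint64_t model_spec_ctrl;

/* On NMI entry: remember the interrupted value, then force IBRS on. */
static uint64_t nmi_enter(void)
{
    uint64_t saved = model_spec_ctrl;

    model_spec_ctrl = saved | SPEC_CTRL_IBRS;
    return saved;
}

/* On NMI exit: put back exactly what the interrupted context had. */
static void nmi_exit(uint64_t saved)
{
    model_spec_ctrl = saved;
}
```

Whether the NMI interrupts the entry path (IBRS still clear) or the exit
path (IBRS already cleared), the value seen after nmi_exit() is the one
that was present before nmi_enter().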

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

*Scaffolding* x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

1. At boot time
noibrs kernel boot parameter will disable IBRS usage
noibpb kernel boot parameter will disable IBPB usage

Otherwise if the above parameters are not specified, the system
will enable ibrs and ibpb usage if the cpu supports it.
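
The boot-time rule above can be sketched in plain C as follows. This is an
illustration only: struct spec_ctrl_cfg, cmdline_has() and
spec_ctrl_from_cmdline() are hypothetical names, and the real kernel parses
boot parameters through its own infrastructure rather than by scanning the
string like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the two mitigation switches. */
struct spec_ctrl_cfg {
    bool use_ibrs;
    bool use_ibpb;
};

/* True if `opt` (non-empty) appears as a whole space-separated word. */
static bool cmdline_has(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');

        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}

/* Enable each feature only if the CPU supports it and it was not disabled. */
static struct spec_ctrl_cfg spec_ctrl_from_cmdline(const char *cmdline,
                                                   bool cpu_has_spec_ctrl)
{
    struct spec_ctrl_cfg cfg;

    cfg.use_ibrs = cpu_has_spec_ctrl && !cmdline_has(cmdline, "noibrs");
    cfg.use_ibpb = cpu_has_spec_ctrl && !cmdline_has(cmdline, "noibpb");
    return cfg;
}
```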

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[*Scaffolding*: This backport lacks a lot. It is only put in so that the
later patches compile _and_ can be tested to run. It is meant to be removed
once the full set of patches is all good. Aka scaffolding.]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/enter: Use IBRS on syscall and interrupts

Set IBRS upon kernel entrance via syscall and interrupts. Clear it
upon exit.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: I had to add 'asm/spec_ctrl.h' in the assembler files.
Also we should not put ENABLE_IBRS on irq_entries_start]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Add macro that does not save rax, rcx, rdx on stack to disable IBRS

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/enter: MACROS to set/clear IBRS and set IBPB

Set up macros to control IBRS and IBPB.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: In UEK4 it is 'cpufeature.h', not 'cpufeatures.h']

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/feature: Report presence of IBPB and IBRS control

Report presence of IBPB and IBRS.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Add STIBP feature enumeration

Enumerate single thread indirect branch predictors (STIBP) feature. It
provides means to prevent indirect branch predictions from being
controlled by sibling HW thread.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpufeature: Add X86_FEATURE_IA32_ARCH_CAPS and X86_FEATURE_IBRS_ATT

Enumerate future CPUs that implement IBRS all the time in their architecture.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/feature: Enable the x86 feature to control

cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
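
The enumeration above can be written down as constants plus a bit test. A
minimal sketch: the macro and function names are chosen for illustration
and are not necessarily the ones used in the patch, and the caller is
assumed to have already executed CPUID leaf 7 (subleaf 0) and to pass in
the EDX result.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL    0x48u        /* MSR address from the text */
#define MSR_IA32_PRED_CMD     0x49u        /* MSR address from the text */
#define SPEC_CTRL_IBRS        (1u << 0)    /* IA32_SPEC_CTRL bit 0 */
#define PRED_CMD_IBPB         (1u << 0)    /* IA32_PRED_CMD bit 0 */
#define CPUID7_EDX_SPEC_CTRL  (1u << 26)   /* CPUID leaf 7, EDX bit 26 */

/* True if the EDX value from CPUID leaf 7 advertises IBRS/IBPB support. */
static bool cpu_has_spec_ctrl(uint32_t cpuid7_edx)
{
    return (cpuid7_edx & CPUID7_EDX_SPEC_CTRL) != 0;
}
```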

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

dccp: CVE-2017-8824: use-after-free in DCCP code

Whenever the sock object is in DCCP_CLOSED state,
dccp_disconnect() must free dccps_hc_tx_ccid and
dccps_hc_rx_ccid and set them to NULL.
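
The free-and-clear pattern described above, shown as a user-space sketch.
The struct and function names are hypothetical and free() stands in for the
kernel's CCID teardown helper; the point is only that clearing the pointers
keeps a later disconnect or reuse from touching freed memory.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the CCID objects and the DCCP socket. */
struct ccid_model {
    int id;
};

struct sock_model {
    struct ccid_model *hc_rx_ccid;
    struct ccid_model *hc_tx_ccid;
};

/* Release both CCIDs and clear the pointers so they cannot be reused. */
static void model_disconnect(struct sock_model *sk)
{
    free(sk->hc_rx_ccid);
    sk->hc_rx_ccid = NULL;
    free(sk->hc_tx_ccid);
    sk->hc_tx_ccid = NULL;
}
```

Calling model_disconnect() a second time is harmless in this model because
free(NULL) is a no-op.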

Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 69c64866ce072dea1d1e59a0d61e0f66c0dffb76)

Orabug: 27290292
CVE: CVE-2017-8824

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

negotiate_mq should happen in all cases of a new VBD being discovered by
xen-blkfront, whether called through _probe() or a hot-attached new VBD
from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
configured without multi-queue.

Orabug: 27180421

Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Patrick Colp <patrick.colp@oracle.com>

e1000: avoid null pointer dereference on invalid stat type

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.
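
A simplified sketch of the fixed loop shape, assuming every statistic is a
64-bit counter. The enum, struct and function names are illustrative and do
not match the driver's; the default branch skips the entry instead of
reading through a pointer that was never set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum stat_type { NETDEV_STAT, DRIVER_STAT };

struct stat_desc {
    int type;        /* one of enum stat_type, or something invalid */
    size_t offset;   /* byte offset of the counter in its base struct */
};

/* Copy each valid counter into data[]; leave data[i] untouched otherwise. */
static void collect_stats(const struct stat_desc *descs, size_t n,
                          const char *netdev_base, const char *driver_base,
                          uint64_t *data)
{
    for (size_t i = 0; i < n; i++) {
        const char *p;
        uint64_t v;

        switch (descs[i].type) {
        case NETDEV_STAT:
            p = netdev_base + descs[i].offset;
            break;
        case DRIVER_STAT:
            p = driver_base + descs[i].offset;
            break;
        default:
            continue;    /* invalid type: never read through p */
        }
        memcpy(&v, p, sizeof(v));
        data[i] = v;
    }
}
```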

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 5983587c8c5ef00d6886477544ad67d495bc5479)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: fix race condition between e1000_down() and e1000_watchdog

This patch fixes a race condition that can result in the interface being
up and carrier on, but with transmits disabled in the hardware.
The bug may show up when the interface is repeatedly toggled
IFF_DOWN+IFF_UP, which allows e1000_watchdog() to interleave with
e1000_down().

    CPU x                           CPU y
    --------------------------------------------------------------------
    e1000_down():
        netif_carrier_off()
                                    e1000_watchdog():
                                        if (carrier == off) {
                                            netif_carrier_on();
                                            enable_hw_transmit();
                                        }
        disable_hw_transmit();
                                    e1000_watchdog():
                                        /* carrier on, do nothing */
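
The interleaving in the diagram can be replayed deterministically with a
small model. All names below are hypothetical; the sketch reproduces only
the ordering of the four steps, not the driver code or its locking.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of state involved in the race. */
struct nic_model {
    bool carrier_on;
    bool tx_enabled;
};

/* First half of e1000_down() in the diagram. */
static void down_carrier_off(struct nic_model *n)
{
    n->carrier_on = false;
}

/* Second half of e1000_down() in the diagram. */
static void down_disable_tx(struct nic_model *n)
{
    n->tx_enabled = false;
}

/* Watchdog: only acts when it sees the carrier off. */
static void watchdog(struct nic_model *n)
{
    if (!n->carrier_on) {
        n->carrier_on = true;
        n->tx_enabled = true;
    }
}

/* Replay the steps in the order shown in the diagram. */
static struct nic_model replay_race(void)
{
    struct nic_model n = { true, true };

    down_carrier_off(&n);   /* CPU x */
    watchdog(&n);           /* CPU y: sees carrier off, re-enables both */
    down_disable_tx(&n);    /* CPU x */
    watchdog(&n);           /* CPU y: carrier on, does nothing */
    return n;
}
```

After the replay the model ends with the carrier on and transmits
disabled, which is the stuck state described above.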

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 44c445c3d1b4eacff23141fa7977c3b2ec3a45c9)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Be drop monitor friendly

e1000e_put_txbuf() can be called from the normal reclamation path as well as
on a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() are processing TX timestamped SKBs and those should
not be accounted as drops either.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 377b62736c01f14309141c69caa6d84363c12e12)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: apply burst mode settings only on default

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.
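
One way to picture the rule is with a sentinel for "not set at module
load". This is a sketch under assumptions: PARAM_UNSET and effective_rdtr()
are hypothetical, and the driver distinguishes explicit values through its
own option handling rather than a sentinel. The values 0 and 32 are the
RDTR default and burst default named above.

```c
#include <stdbool.h>

#define DEFAULT_RDTR  0      /* RDTR default from the text */
#define BURST_RDTR    32     /* burst-mode default from the text */
#define PARAM_UNSET   (-1)   /* hypothetical "not passed at load" marker */

/* Pick the RDTR value: an explicit parameter always wins over defaults. */
static int effective_rdtr(int rx_int_delay_param, bool has_dma_burst)
{
    if (rx_int_delay_param != PARAM_UNSET)
        return rx_int_delay_param;
    return has_dma_burst ? BURST_RDTR : DEFAULT_RDTR;
}
```

With this shape, loading the module with RxIntDelay=0 on a burst-capable
device keeps the delay at 0 instead of replacing it with 32.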

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 48072ae1ec7a1c778771cad8c1b8dd803c4992ab)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix buffer overrun while the I219 is processing DMA transactions

Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into an unrecovered Tx hang under very stressful UDP traffic
and multiple reconnections of the Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please refer to the I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit b10effb92e272051dd1ec0d7be56bf9ca85ab927)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Avoid receiver overrun interrupt bursts

When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.

This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.

Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59bc33
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 4aea7a5c5e940c1723add439f4088844cd26196d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Separate signaling for link check/link up

Lennart reported the following race condition:

\ e1000_watchdog_task
    \ e1000e_has_link
        \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
            /* link is up */
            mac->get_link_status = false;

                            /* interrupt */
                            \ e1000_msix_other
                                hw->mac.get_link_status = true;

        link_active = !hw->mac.get_link_status
        /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().
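
A toy model of the approach, with the return convention (1 for link up, 0
for link down) assumed for illustration. struct mac_model and the functions
below are hypothetical stand-ins; the simulated interrupt is passed in as a
callback so that it lands between the check and the use of its result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the MAC state used in the trace above. */
struct mac_model {
    bool get_link_status;   /* "link needs (re)checking" request flag */
    bool phy_link_up;       /* what the PHY would report */
};

/* Assumed convention: returns 1 if the link is up, 0 if it is down. */
static int check_for_link(struct mac_model *mac)
{
    if (!mac->get_link_status)
        return 1;           /* no check pending: link already known up */
    if (!mac->phy_link_up)
        return 0;
    mac->get_link_status = false;
    return 1;
}

/* Simulated Other interrupt: asks for a new link check. */
static void link_irq(struct mac_model *mac)
{
    mac->get_link_status = true;
}

/* Decide from the return value, not from the flag the interrupt may set. */
static bool has_link(struct mac_model *mac, void (*irq)(struct mac_model *))
{
    int ret = check_for_link(mac);

    if (irq != NULL)
        irq(mac);           /* interrupt arrives after the check */
    return ret > 0;
}
```

Even though the interrupt sets get_link_status again, has_link() still
reports the link as up, and the pending re-check request is preserved for
the next watchdog run.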

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix return value test

All the helpers return -E1000_ERR_PHY.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit d3509f8bc7b0560044c15f0e3ecfde1d9af757a6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix wrong comment related to link detection

Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays TRUE until then.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 65a29da1f5fd20fdebef3b959bef9b3660807b20)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix error path in link detection

In case of error from e1e_rphy(), the loop will exit early and "success"
will be set to true erroneously.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit c4c40e51f9c32c6dd8adf606624c930a1c4d9bbb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

drivers: net: e1000e: use setup_timer() helper.

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 27069012
(cherry picked from commit 4a9c07ed71c2b8d755ee585264f80dd2d82a8066)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Initial Support for IceLake

i219 (8) and i219 (9) are the next LOM generations that will be available
on the next Intel Client platform (IceLake).
This patch provides the initial support for these devices.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 48f76b68f9fca4e1d5bbb1755d14e8e8e09bdd5b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: add check on e1e_wphy() return value

Check return value from call to e1e_wphy(). This value is being
checked during previous calls to function e1e_wphy() and it seems
a check was missing here.

Addresses-Coverity-ID: 1226905
Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit d75372a2daf5dc48207ee9e5592917e893cddb87)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

An error during suspend (e1000e_pm_suspend),

[  429.994338] ACPI : EC: event blocked
[  429.994633] e1000e: EEE TX LPI TIMER: 00000011
[  430.955451] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x30 [e1000e] returns -2
[  430.955454] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
[  430.955458] PM: Device 0000:00:19.0 failed to suspend async: error -2
[  430.955581] PM: Some devices failed to suspend, or early wake event detected
[  430.957709] ACPI : EC: event unblocked

leads to complete failure:

[  432.585002] ------------[ cut here ]------------
[  432.585013] WARNING: CPU: 3 PID: 8372 at kernel/irq/manage.c:1478 __free_irq+0x9f/0x280
[  432.585015] Trying to free already-free IRQ 20
[  432.585016] Modules linked in: cdc_ncm usbnet x86_pkg_temp_thermal intel_powerclamp coretemp mii crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep lpc_ich snd_hda_core snd_pcm mei_me mei sdhci_pci sdhci i915 mmc_core e1000e ptp pps_core prime_numbers
[  432.585042] CPU: 3 PID: 8372 Comm: kworker/u16:40 Tainted: G     U          4.10.0-rc8-CI-Patchwork_3870+ #1
[  432.585044] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
[  432.585050] Workqueue: events_unbound async_run_entry_fn
[  432.585051] Call Trace:
[  432.585058]  dump_stack+0x67/0x92
[  432.585062]  __warn+0xc6/0xe0
[  432.585065]  warn_slowpath_fmt+0x4a/0x50
[  432.585070]  ? _raw_spin_lock_irqsave+0x49/0x60
[  432.585072]  __free_irq+0x9f/0x280
[  432.585075]  free_irq+0x34/0x80
[  432.585089]  e1000_free_irq+0x65/0x70 [e1000e]
[  432.585098]  e1000e_pm_freeze+0x7a/0xb0 [e1000e]
[  432.585106]  e1000e_pm_suspend+0x21/0x30 [e1000e]
[  432.585113]  pci_pm_suspend+0x71/0x140
[  432.585118]  dpm_run_callback+0x6f/0x330
[  432.585122]  ? pci_pm_freeze+0xe0/0xe0
[  432.585125]  __device_suspend+0xea/0x330
[  432.585128]  async_suspend+0x1a/0x90
[  432.585132]  async_run_entry_fn+0x34/0x160
[  432.585137]  process_one_work+0x1f4/0x6d0
[  432.585140]  ? process_one_work+0x16e/0x6d0
[  432.585143]  worker_thread+0x49/0x4a0
[  432.585145]  kthread+0x107/0x140
[  432.585148]  ? process_one_work+0x6d0/0x6d0
[  432.585150]  ? kthread_create_on_node+0x40/0x40
[  432.585154]  ret_from_fork+0x2e/0x40
[  432.585156] ---[ end trace 6712df7f8c4b9124 ]---

The unwind failure stems from commit 2800209994f8 ("e1000e: Refactor PM
flows"), but it may be a later patch that introduced the non-recoverable
behaviour.

Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 833521ebc65b1c3092e5c0d8a97092f98eec595d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

qla2xxx: Fix system crash in qlt_plogi_ack_unref

Orabug: 27235104

Fix system crash due to NULL pointer access.

qlt_plogi_ack_t and fc_port structures were not properly
bound before calling qlt_plogi_ack_unref().

RIP: 0010:qlt_plogi_ack_unref+0xa1/0x150 [qla2xxx]
Call Trace:
qla24xx_create_new_sess+0xb1/0x320 [qla2xxx]
qla2x00_do_work+0x123/0x260 [qla2xxx]
qla2x00_iocb_work_fn+0x30/0x40 [qla2xxx]
process_one_work+0x1f3/0x530
worker_thread+0x4e/0x480
kthread+0x10c/0x140

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 19759033e0d0beed70421ab9258f5ede79e070ae ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.

Orabug: 27235104

This fixes the spinlock recursion issue seen while unloading the driver.

14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802
15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4
16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471
17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63
18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx]
19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8
20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47
21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit bf07ef86e882013522876f7c834c8eea085f35b4 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Defer processing of GS IOCB calls

Orabug: 27235104

This patch defers processing of GS IOCB calls from interrupt
context to avoid hardware spinlock recursion.

The following stack trace is seen:

? mod_timer+0x193/0x330
? ql_dbg+0xa7/0xf0 [qla2xxx]
_raw_spin_lock_irqsave+0x31/0x40
qla2x00_start_sp+0x3b/0x250 [qla2xxx]
qla24xx_async_gnl+0x1d3/0x240 [qla2xxx]
qla24xx_fcport_handle_login+0x285/0x290 [qla2xxx]
? vprintk_func+0x20/0x50

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 5d3300a9b8b122b4743aed5a178bf12c87e2b8c9 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>