Todd Vierling [Fri, 9 Feb 2018 21:58:34 +0000 (16:58 -0500)]
kernel: on OL6 only, simulate the gcc 4.4 kABI for __stack_chk_fail()
The __visible linkage changes st_value calculation for genksyms on newer
compilers such as OL7's 4.8 and SCL's 4.9. However, this symbol is in
the kABI whitelist and needs to retain its old st_value, even though the
guts of the compiled function have not changed.
Add a special directive, CONFIG_SIMULATE_GCC44_KABI, which will mask the
__visible qualifier at genksyms time; and use it only on OL6 builds.
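A minimal sketch of what such masking could look like (hypothetical; genksyms defines __GENKSYMS__ while it parses the source, and __visible comes from compiler-gcc.h):

#if defined(__GENKSYMS__) && defined(CONFIG_SIMULATE_GCC44_KABI)
/* Pretend to be gcc 4.4, which had no __visible, so the CRC is unchanged. */
#undef __visible
#define __visible
#endif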
Orabug: 27509351 Signed-off-by: Todd Vierling <todd.vierling@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Todd Vierling [Fri, 9 Feb 2018 20:34:50 +0000 (15:34 -0500)]
uek-rpm: configs: Don't set HAVE_FENTRY on OL6 builds.
CONFIG_HAVE_FENTRY turns on a *conditional* addition of compile time
flags in the top level Makefile (-mfentry -DCC_USES_FENTRY). The OL6
stock gcc-4.4 code doesn't support this, but gcc 4.8+ does. This
changes the way function trace preambles work on the newer compilers,
which breaks kABI.
To do this effectively, the option needs to be removed from
arch/x86/Kconfig ("select HAVE_FENTRY if X86_64") so that it can be
set optionally in the RPM build configs. The OL7 configs properly
contain CONFIG_HAVE_FENTRY=y, so those will continue to build the same.
Orabug: 27509351 Signed-off-by: Todd Vierling <todd.vierling@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
KarimAllah Ahmed [Thu, 1 Feb 2018 21:59:45 +0000 (22:59 +0100)]
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.
To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.
No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.
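The lazy scheme described above boils down to roughly the following (a sketch using the names from the upstream patch; exact placement inside vmx_vcpu_run() may differ in this backport):

/* Before VM entry: load the guest value only if it ever went non-zero. */
if (vmx->spec_ctrl)
        wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);

/* After VM exit: if the write intercept was lifted, read the guest's value
 * back and restore the host value (0) so it cannot leak to another vCPU. */
if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
        vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
if (vmx->spec_ctrl)
        wrmsrl(MSR_IA32_SPEC_CTRL, 0);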
[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
(cherry picked from commit d28b387fb74da95d69d2615732f50cceb38e9a4d)
Orabug: 27525575 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport:
There is a lot that this patch does not pick up - but the most important
thing we need to pick up is the wrmsr(0x48, 0) when retpoline is used.
That is, we cannot leave MSR 0x48 hanging around with the guest value.
The reason is that on a particular CPU we may schedule a different guest
vCPU, and the check on whether to write MSR 0x48 is 'if (vmx->spec_ctrl)'
(the vmx is tied to a specific vCPU). Which means we may not write the
proper guest vCPU MSR value in, and leave the stale one in the guest!]
Konrad Rzeszutek Wilk [Fri, 9 Feb 2018 22:19:01 +0000 (17:19 -0500)]
x86/spectre_v2: Disable IBRS if spectre_v2=off
We found some interesting behaviors when booting on IBRS enabled
hardware where 'spectre_v2=off' did not disable IBRS completely.
That is, during bootup we received some interrupts, which
meant that we set MSR 0x48 to 1, and then when we got
to the code that handles 'spectre_v2=off' we would just
call 'set_ibrs_disabled', which would clear the in_use parameter.
But that would not clear the wedged MSR 0x48, which stayed set to 1.
Which means we would continue on with both retpoline and IBRS on!
The fix is rather simple - clear the MSR as well.
Note that if 'noibrs' was used - then we would have disabled the
IBRS _before_ we got the interrupt, as 'noibrs' is an early parameter.
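A sketch of the fix (the wrmsrl() is the essential part; the UEK helper
names and the feature guard around it are assumptions):

/* spectre_v2=off: disable the bookkeeping *and* un-wedge the MSR itself. */
set_ibrs_disabled();
if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))   /* IBRS microcode present */
        wrmsrl(MSR_IA32_SPEC_CTRL, 0);     /* may need to run on each CPU */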
OraBug: 27525754 Reported-by: Yao-Min Chen <yaomin.chen@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Joao Martins [Fri, 2 Feb 2018 17:42:33 +0000 (17:42 +0000)]
xenbus: track caller request id
Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent
xenstore accesses") optimized xenbus concurrent accesses but in doing so
broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
charge of xenbus message exchange with the correct header and body. Now,
after the mentioned commit the replies received by the application will no
longer have the header req_id echoed back as it was on the request (see
the specification below for reference), because that particular field is
being overwritten by the kernel.
struct xsd_sockmsg
{
uint32_t type; /* XS_??? */
uint32_t req_id;/* Request identifier, echoed in daemon's response. */
uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
uint32_t len; /* Length of data following this. */
/* Generally followed by nul-terminated string(s). */
};
Before, there was only one request at a time, so req_id could simply be
forwarded back and forth. To allow simultaneous requests we need a
different req_id for each message, so the kernel keeps a monotonically
increasing counter for this field, which is written on every request
irrespective of the userspace value.
Simply forwarding the userspace req_id again is not a solution, because
we would open the possibility of a userspace-generated req_id colliding
with kernel ones. So this patch instead takes another route, which is to
artificially keep the user req_id while keeping the xenbus logic as is. We
do that by saving the original req_id before xs_send(), using the private
kernel counter as req_id, and then, once the reply comes back and is
validated, restoring the original req_id.
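In code form, the save/restore described above is roughly (field name
caller_req_id per the upstream patch):

req->caller_req_id = req->msg.req_id;    /* remember userspace's req_id */
req->msg.req_id = xs_request_enter(req); /* kernel's monotonic counter  */
/* ... xs_send() and wait for the (validated) reply ... */
msg->req_id = req->caller_req_id;        /* echo the original id back   */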
Darren Kenny [Fri, 9 Feb 2018 14:17:04 +0000 (14:17 +0000)]
x86/spectre_v2: Remove 0xc2 from spectre_bad_microcodes
According to the latest microcode update from Intel (on Feb 8, 2018), on
Skylake we should be using microcode revision 0xC2, so we need
to remove that from the blacklist now.
Signed-off-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tim Chen [Thu, 8 Feb 2018 21:52:42 +0000 (16:52 -0500)]
x86/speculation: Use Indirect Branch Prediction Barrier in context switch
This patch is a subset of the changes in the upstream commit 18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7. Since we don't have 'ctx_id' in
mm_context_t in UEK4, we can't check whether the context ID of the new
task is the same as that of the previous one. In this patch, we flush indirect
branches when switching into a process that marked itself non-dumpable.
This protects high value processes like gpg better, without having too high
performance overhead.
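The heuristic amounts to something like this in the context-switch path
(a sketch; UEK4 additionally gates the barrier on its ibpb_inuse
machinery rather than the upstream feature bit):

if (tsk && tsk->mm &&
    get_dumpable(tsk->mm) != SUID_DUMP_USER)
        indirect_branch_prediction_barrier();  /* IBPB before running tsk */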
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ak@linux.intel.com Cc: karahmed@amazon.de Cc: arjan@linux.intel.com Cc: torvalds@linux-foundation.org Cc: linux@dominikbrodowski.net Cc: peterz@infradead.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: pbonzini@redhat.com Cc: gregkh@linux-foundation.org Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
Orabug: 27524608 Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 5 Feb 2018 19:34:55 +0000 (14:34 -0500)]
x86/spectre_v2: Add spectre_v2_heuristics=
As we already have one heuristic in the tree:
x86/spectre: Favor IBRS on Skylake over retpoline
we may want a knob to turn it off altogether. The (admittedly hard to
remember) spectre_v2_heuristics=off or
spectre_v2_heuristics=skylake=off
will take care of disabling that optimization, as sketched below.
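A sketch of the parsing (the heuristic flag name is hypothetical):

static int __init spectre_v2_heuristics_setup(char *str)
{
        /* "off" disables all heuristics; "skylake=off" just the Skylake one. */
        if (!strcmp(str, "off") || !strcmp(str, "skylake=off"))
                skylake_favor_ibrs = false;     /* hypothetical flag */
        return 1;
}
__setup("spectre_v2_heuristics=", spectre_v2_heuristics_setup);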
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Darren Kenny [Fri, 2 Feb 2018 19:12:20 +0000 (19:12 +0000)]
x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
Fixes: 117cc7a908c83 ("x86/retpoline: Fill return stack buffer on vmexit") Signed-off-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180202191220.blvgkgutojecxr3b@starbug-vm.ie.oracle.com
(cherry picked from commit af189c95a371b59f493dbe0f50c0a09724868881)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
David Woodhouse [Fri, 2 Feb 2018 18:59:51 +0000 (13:59 -0500)]
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs",
"ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them
as the user-visible bits.
When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB
capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP
bit is set, set the AMD STIBP that's used for the generic hardware
capability.
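The mapping is a few lines in CPU setup, per the upstream commit (the UEK
backport's bit names may differ slightly):

static void init_speculation_control(struct cpuinfo_x86 *c)
{
        /* Intel's single SPEC_CTRL bit implies both IBRS and IBPB; mirror
         * it into the AMD-style bits that show up in /proc/cpuinfo. */
        if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
                set_cpu_cap(c, X86_FEATURE_IBRS);
                set_cpu_cap(c, X86_FEATURE_IBPB);
        }
        if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
                set_cpu_cap(c, X86_FEATURE_STIBP);
}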
Hide the rest from /proc/cpuinfo by putting "" in the comments. Including
RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are
patches to make the sysfs vulnerabilities information non-readable by
non-root, and the same should apply to all information about which
mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo.
The feature bit for whether IBPB is actually used, which is needed for
ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/intel.c
[Backport: The majority of the changes are to make 'retpoline' and friends
not show up. Also fix it so that instead of 'spec_ctrl' we have 'ibrs'.
The rest we had already]
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/cpu/bugs.c
[The original version of this patch doesn't set X86_FEATURE_IBPB, so do
it ourselves. Given X86_FEATURE_SPEC_CTRL (i.e. CPUID.07.[EDX.26])[*],
always set X86_FEATURE_IBPB in scattered.c. This omission did not
impact the actual IBPB functionality as the code uses 'ibpb_inuse': the
only thing missing was the 'ibpb' string in /proc/cpuinfo.
Since we already have code to enable IBPB (e.g. switch_mm_irqs_off),
there is no point in backporting indirect_branch_prediction_barrier from
this patch.]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
David Woodhouse [Thu, 25 Jan 2018 16:14:14 +0000 (16:14 +0000)]
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
This doesn't refuse to load the affected microcodes; it just refuses to
use the Spectre v2 mitigation features if they're detected, by clearing
the appropriate feature bits.
The AMD CPUID bits are handled here too, because hypervisors *may* have
been exposing those bits even on Intel chips, for fine-grained control
of what's available.
It is non-trivial to use x86_match_cpu() for this table because that
doesn't handle steppings. And the approach taken in commit bd9240a18
almost made me lose my lunch.
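The table is therefore keyed on model, stepping and microcode revision by
hand, roughly as below (shape per the upstream patch; the actual entries
are omitted here, and x86_mask is the stepping field in kernels of this
vintage):

struct sku_microcode {
        u8  model;
        u8  stepping;
        u32 microcode;
};
static const struct sku_microcode spectre_bad_microcodes[] = {
        /* { INTEL_FAM6_..., stepping, bad microcode revision }, ... */
};

static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
                if (c->x86_model == spectre_bad_microcodes[i].model &&
                    c->x86_mask == spectre_bad_microcodes[i].stepping &&
                    c->microcode == spectre_bad_microcodes[i].microcode)
                        return true;
        }
        return false;
}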
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
arch/x86/kernel/cpu/intel.c
[Our resolution to 5d10cbc91d9e ("x86/cpufeatures: Add AMD feature bits for
Speculation Control") is different, so we still clear the right bits if
there are (or would be) AMD machines with bad microcode]
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
David Woodhouse [Thu, 25 Jan 2018 16:14:13 +0000 (16:14 +0000)]
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
Also, for CPUs which don't speculate at all, don't report that they're
vulnerable to the Spectre variants either.
Leave the cpu_no_meltdown[] match table with just X86_VENDOR_AMD in it
for now, even though that could be done with a simple comparison, on the
assumption that we'll have more to add.
Based on suggestions from Dave Hansen and Alan Cox.
[Backport:
s/X86_FEATURE_ARCH_CAPABILITIES/X86_FEATURE_IA32_ARCH_CAPS/
Fix compiler warnings by removing __init/__initdata from
cpu_no_speculation, cpu_no_meltdown, and cpu_vulnerable_to_meltdown
] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
David Woodhouse [Thu, 25 Jan 2018 16:14:11 +0000 (16:14 +0000)]
x86/cpufeatures: Add AMD feature bits for Speculation Control
AMD exposes the PRED_CMD/SPEC_CTRL MSRs slightly differently to Intel.
See http://lkml.kernel.org/r/2b3e25cc-286d-8bd0-aeaf-9ac4aae39de8@amd.com
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: ak@linux.intel.com Cc: ashok.raj@intel.com Cc: dave.hansen@intel.com Cc: karahmed@amazon.de Cc: arjan@linux.intel.com Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: pbonzini@redhat.com Cc: tim.c.chen@linux.intel.com Cc: gregkh@linux-foundation.org Link: https://lkml.kernel.org/r/1516896855-7642-4-git-send-email-dwmw@amazon.co.uk
Conflicts:
arch/x86/include/asm/cpufeatures.h
[Upstream has NCAPINTS raised so there is another array to stuff the
0x80000008 cpuid leaf bits into. But we don't have that, so we just toggle
the global ones]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Konrad Rzeszutek Wilk [Fri, 2 Feb 2018 03:56:00 +0000 (22:56 -0500)]
x86/spectre_v2: Add VMEXIT_FILL_RSB instead of RETPOLINE
The backport of "x86/retpoline: Fill return stack buffer on vmexit"
made the full stuffing of RSB only enabled if the kernel had
selected X86_FEATURE_RETPOLINE.
But if we are using IBRS we still want the full RSB stuffing
as it was prior to the backport.
Since we have both retpoline and ibrs wanting it we introduce
a new feature to enable the common mitigation that both of them
need.
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 21:20:54 +0000 (16:20 -0500)]
x86/spectre: If IBRS is enabled disable "Filling RSB on context switch"
As that is only needed if the machine is running IBRS and !SMEP.
The comment above the conditional says it all:
Skylake era CPUs have a separate issue with *underflow* of the
RSB, when they will predict 'ret' targets from the generic BTB.
The proper mitigation for this is IBRS. If IBRS is not supported
or deactivated in favour of retpolines the RSB fill on context
.. and if we have IBRS then we should ignore this conditional.
Note that the (!SMEP) check and the use of STUFF_RSB are already
done in:
x86/spectre_v2: Figure out if STUFF_RSB macro needs to be used.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 20:47:19 +0000 (15:47 -0500)]
KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
This is an adaptation of what is being posted upstream.
The issue here is that guest kernels such as Windows
need IBRS for their spectre_v2 mitigation.
And since our kernel can do either IBRS or retpoline,
we need to be aware of both.
(We are ignoring for right now the situation in which
the microcode is not loaded and we want to emulate IBRS.)
In short:
1) If the CPU has the microcode and the host uses IBRS, we need
to save the guest MSR during VMEXIT and write our IBRS value.
Also, before VMENTER we need to write the guest MSR value.
2) If the host kernel uses retpoline, we need to WRMSR the
guest MSR value on VMENTER, but we optimize by only
doing it if it is a non-zero value.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 16:45:41 +0000 (11:45 -0500)]
x86/spectre: Update sysctl values if toggled only by set_{ibrs,ibpb}_disabled
As otherwise we will report the wrong values in
/sys/kernel/debug/x86/{ibrs,ibpb}_enabled at first
bootup - that is, if you have 'noibrs' on the command line.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Andi Kleen [Thu, 4 Jan 2018 14:37:08 +0000 (14:37 +0000)]
retpoline/module: Taint kernel for missing retpoline in module
There's a risk that a kernel that has full retpoline mitigations
becomes vulnerable when a module gets loaded that hasn't been
compiled with the right compiler or the right option.
We cannot fix it, but should at least warn the user when that
happens.
Add a flag to each module indicating whether it has been compiled with
RETPOLINE. When a module hasn't been compiled with a retpoline-aware
compiler, print a warning and set a taint flag.
For modules this is checked at build time; however, it cannot check
assembler or other non-compiled objects used in the module link.
For lack of a better letter, it uses taint option 'Z'.
We only set the taint flag for incorrectly compiled modules
now, not for the main kernel, which already has other
report mechanisms.
Also make sure to report vulnerable for spectre if such a module
has been loaded.
v2: Change warning message
v3: Port to latest tree Cc: jeyu@kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
(cherry picked from commit abc01f3c4bbd927f5e47cc6dff99a76a393d1bbe)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
Documentation/oops-tracing.txt
(dmj: patch had Documentation/admin-guide/tainted-kernels.rst)
arch/x86/kernel/cpu/bugs_64.c
(dmj: patch had arch/x86/kernel/cpu/bugs.c)
include/linux/kernel.h
kernel/module.c
kernel/panic.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.
This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.
Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.
On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.
[ tglx: Added missing vendor check and slightly massaged comments and
changelog ]
[js] backport to 4.4 -- __switch_to_asm does not exist there, we
have to patch the switch_to macros for both x86_32 and x86_64.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 18bb117d1b7690181346e6365c6237b6ceaac4c4)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeature.h
(dmj: bumped all uek4-specific microcode features up one)
arch/x86/include/asm/switch_to.h
arch/x86/kernel/cpu/bugs_64.c
(dmj:
- patch had arch/x86/kernel/cpu/bugs.c
- preserved X86_FEATURE_RSB_CTXSW check in case of IBRS) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Since indirect jump instructions will be replaced by jumps
to __x86_indirect_thunk_*, those jmp instructions must be
treated as indirect jumps. Since optprobe prohibits optimizing
probes in functions which use an indirect jump, it also needs
to find the functions which jump to __x86_indirect_thunk_*
and disable optimization for them.
Add a check that the jump target address is between the
__indirect_thunk_start/end when optimizing kprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/151629212062.10241.6991266100233002273.stgit@devbox Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6cb73eb8045157ea280f0a047777ea1f56547375)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Mark __x86_indirect_thunk_* functions as blacklist for kprobes
because those functions can be called from anywhere in the kernel
including blacklist functions of kprobes.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/151629209111.10241.5444852823378068683.stgit@devbox Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9b8bd0d3586842716f5673e7a0922a9c9e1fcbfd)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Introduce start/end markers of __x86_indirect_thunk_* functions.
To make it easy, consolidate the .text.__x86.indirect_thunk.* sections
into one .text.__x86.indirect_thunk section, put it at the
end of the kernel text section, and add __indirect_thunk_start/end
so that other subsystems (e.g. kprobes) can identify it.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/151629206178.10241.6828804696410044771.stgit@devbox Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 799dc737680a8074a0c7c2d3426b85f4c439377f)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Allow architectures to create an asm/asm-prototypes.h file that
provides C prototypes for exported asm functions, which enables
proper CRC versions to be generated for them.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michal Marek <mmarek@suse.com>
[jkosina@suse.cz: folded cc6acc11cad1 fixup in as well ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ff535919c1366b0218113f95c447d92c3aac0c1f)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
The PAUSE instruction is currently used in the retpoline and RSB filling
macros as a speculation trap. The use of PAUSE was originally suggested
because it showed a very, very small difference in the amount of
cycles/time used to execute the retpoline as compared to LFENCE. On AMD,
the PAUSE instruction is not a serializing instruction, so the pause/jmp
loop will use excess power as it is speculated over waiting for return
to mispredict to the correct target.
The RSB filling macro is applicable to AMD, and, if software is unable to
verify that LFENCE is serializing on AMD (possible when running under a
hypervisor), the generic retpoline support will be used and, so, is also
applicable to AMD. Keep the current usage of PAUSE for Intel, but add an
LFENCE instruction to the speculation trap for AMD.
The same sequence has been adopted by GCC for the GCC generated retpolines.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Kees Cook <keescook@google.com> Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fba063e6dfb413e06b9daa5d45b164761172f5ed)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus' rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support") Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 451725c3e785dfc3ede6c65184b96c213181995a)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.
[ak: numbers again for the RSB stuffing labels]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit eebc3f8adee0a6f43a4789ef0bf5c5b35de8cfe4)
Removed now-unused stuff_RSB function from uek4's 60f856955c1b
("x86/kvm: Pad RSB on VM transition").
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Convert indirect jumps in core 32/64bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 028083cb02db69237e73950576bc81ac579693dc)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
arch/x86/kernel/entry_32.S
(dmj:
- patch had arch/x86/entry/entry_32.S
- extra retpolines needed for sys_call_table)
arch/x86/kernel/entry_64.S
(dmj: patch had arch/x86/entry/entry_64.S) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 15:40:43 +0000 (10:40 -0500)]
x86/spectre_v2: Figure out if STUFF_RSB macro needs to be used.
If IBRS is on, STUFF_RSB only needs to be enabled if the CPU
does not have SMEP enabled.
We change the macro to depend on a synthetic one called
X86_FEATURE_STUFF_RSB - which will only be turned on
if 'spectre_v2=ibrs' is set and the BIOS does not have SMEP
enabled.
(And naturally also if the kernel is compiled without retpoline
and the CPU does not have SMEP).
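Roughly, at boot (the synthetic bit is the one named above; the helper
names around it are assumptions):

/* Only stuff the RSB when IBRS is in play and SMEP cannot protect us. */
if ((ibrs_selected || !IS_ENABLED(CONFIG_RETPOLINE)) &&
    !boot_cpu_has(X86_FEATURE_SMEP))
        setup_force_cpu_cap(X86_FEATURE_STUFF_RSB);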
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 15:13:37 +0000 (10:13 -0500)]
x86/spectre_v2: Figure out when to use IBRS.
Which is if:
a) on the boot command line you have 'spectre_v2=ibrs' _and_ the
microcode that exposes this is loaded, and nobody used 'noibrs'.
Also, if you have 'spectre_v2=ibrs noibrs' we end up falling
back to 'lfence'. Unless somebody did 'spectre_v2=ibrs noibrs nolfence',
in which case we go back to automatic discovery.
b) the kernel was compiled without retpoline and the microcode is
available.
c) the kernel was compiled without retpoline and there is no microcode
with IBRS; then we pick 'lfence' on system calls.
A rough sketch of this decision order follows.
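All names below are hypothetical; this just condenses cases a) through c):

if (ibrs_requested && ibrs_microcode_present && !noibrs)
        mode = SPECTRE_V2_IBRS;                    /* case a) */
else if (!IS_ENABLED(CONFIG_RETPOLINE) && ibrs_microcode_present)
        mode = SPECTRE_V2_IBRS;                    /* case b) */
else if (!IS_ENABLED(CONFIG_RETPOLINE))
        mode = SPECTRE_V2_LFENCE;                  /* case c) */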
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Konrad Rzeszutek Wilk [Thu, 1 Feb 2018 14:45:27 +0000 (09:45 -0500)]
x86/spectre: Add IBRS option.
The spectre_v2_mitigation enum already has an IBRS option; let's make
the override possible. But don't select it by default if
the kernel has been compiled with retpoline.
If it has not (no compiler support), then fall back to IBRS.
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9f789bc5711bcacb5df003594b992f0c1cc19df4)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs_64.c
(dmj:
- patch used arch/x86/kernel/cpu/bugs.c
- UEK4-specific IBPB and IBRS code merged with retpoline
cmdline parsing
- disabled IBRS in the case of full retpoline mitigation)
(konrad:
- disabled lfence on IBRS paths in case of full retpoline)
arch/x86/kernel/cpu/common.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
[ 4.4 backport: removed objtool annotation since there is no objtool ] Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c5e10905263dbe9fbc621d1889b85e9c867da25)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Conflicts:
arch/x86/kernel/cpu/common.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
The macro MODULE is not a config option, it is a per-file build
option. So, config_enabled(MODULE) is not sensible. (There is
another case in include/linux/export.h, where config_enabled() is
used against a non-config option.)
This commit renames some macros in include/linux/kconfig.h for the
use for non-config macros and replaces config_enabled(MODULE) with
__is_defined(MODULE).
I am keeping config_enabled() because it is still referenced from
some places, but I expect it would be deprecated in the future.
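The trick behind __is_defined() is the classic placeholder expansion
(sketch; the helper names in the actual commit differ slightly):

#define __ARG_PLACEHOLDER_1 0,
#define __is_defined(x)         ___is_defined(x)
#define ___is_defined(val)      ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(junk)    _____is_defined(junk 1, 0)
#define _____is_defined(__ignored, val, ...) val

/* -DMODULE defines MODULE as 1, so __ARG_PLACEHOLDER_##val becomes "0,"
 * and the second argument selected is 1; when MODULE is undefined, the
 * paste yields a single unknown token and the result is 0. */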
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27477743
CVE: CVE-2017-5715
(cherry picked from commit 675901851fd2a00c7a70bcf327303efba3b81e2f) Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Add asm-usable variants of EXPORT_SYMBOL/EXPORT_SYMBOL_GPL. This
commit just adds the default implementation; most of the architectures
can simply add export.h to asm/Kbuild and start using <asm/export.h>
from assembler. The rest need to have their <asm/export.h> define
several macros and then explicitly include <asm-generic/export.h>.
One area where the things might diverge from default is the alignment;
normally it's 8 bytes on 64bit targets and 4 on 32bit ones, both for
unsigned long and for struct kernel_symbol. Unfortunately, amd64 and
m68k are unusual - m68k aligns to 2 bytes (for both) and amd64 aligns
struct kernel_symbol to 16 bytes. For those we'll need asm/export.h to
override the constants used by generic version - KSYM_ALIGN and KCRC_ALIGN
for kernel_symbol and unsigned long resp. And no, __alignof__ would not
do the trick - on amd64 __alignof__ of struct kernel_symbol is 8, not 16.
More serious source of unpleasantness is treatment of function
descriptors on architectures that have those. Things like ppc64,
parisc, ia64, etc. need more than the address of the first insn to
call an arbitrary function. As the result, their representation of
pointers to functions is not the typical "address of the entry point" -
it's an address of a small static structure containing all the required
information (including the entry point, of course). Sadly, the asm-side
conventions differ in what the function name refers to - entry point or
the function descriptor. On ppc64 we do the latter;
bar: .quad foo
is what void (*bar)(void) = foo; turns into and the rare places where
we need to explicitly work with the label of entry point are dealt with
as DOTSYM(foo). For our purposes it's ideal - generic macros are usable.
However, parisc would have foo and P%foo used for label of entry point
and address of the function descriptor and
bar: .long P%foo
would be used instead. ia64 is similar to parisc in that respect,
except that there it's @fptr(foo) rather than P%foo. Such architectures
need to define KSYM_FUNC that would turn a function name into whatever
is needed to refer to function descriptor.
What's more, on such architectures we need to know whether we are exporting
a function or an object - in assembler we have to tell that explicitly, to
decide whether we want EXPORT_SYMBOL(foo) to produce e.g.
__ksymtab_foo: .quad foo
or
__ksymtab_foo: .quad @fptr(foo)
For that reason we introduce EXPORT_DATA_SYMBOL{,_GPL}(), to be used for
exports of data objects. On normal architectures it's the same thing
as EXPORT_SYMBOL{,_GPL}(), but on parisc-like ones they differ and the
right one needs to be used. Most of the exports are functions, so we
keep EXPORT_SYMBOL for those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27477743
CVE: CVE-2017-5715
(cherry picked from commit a88693d006986b8fc9ae5036852ac0156106de2a) Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
modversion support for symbols exported from asm files. Architectures
must include C-style declarations for those symbols in asm/asm-prototypes.h
in order for them to be versioned.
Add these declarations for x86, and an architecture-independent file that
can be used for common symbols.
With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we
produce a scary warning on x86, this commit fixes that.
Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27477743
CVE: CVE-2017-5715
(cherry picked from commit b76ac90af34dc08f057da15aa34584a0a3d381b5) Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Currently we use the current_stack_pointer() function to get the value
of the stack pointer register. Since commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
... we have a stack register variable declared. It can be used instead of
the current_stack_pointer() function, which allows us to optimize away some
excessive "mov %rsp, %<dst>" instructions.
Where an ALTERNATIVE is used in the middle of an inline asm block, this
would otherwise lead to the following instruction being appended directly
to the trailing ".popsection", and a failed compile.
Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: ak@linux.intel.com Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 999d4f1961fa002bda138ddfe9119965421f85da)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
The alternatives code checks only whether the first byte is a NOP, but
with NOPs in front of the payload and actual instructions after it, this
breaks the "optimized" test.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
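The fix is the scan itself, per the upstream commit:

static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
{
        int i;

        /* Previously only instr[0] was tested; now refuse to optimize
         * unless every byte of the candidate region is a 0x90 NOP. */
        for (i = 0; i < a->padlen; i++) {
                if (instr[i] != 0x90)
                        return;
        }
        /* ... proceed with the NOP optimization as before ... */
}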
Reported-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e997d991ab2b1dc9f9cdad999a891626c2aecf21)
Orabug: 27477743
CVE: CVE-2017-5715 Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
We are checking for gaps to the previous bio_vec, which can
only detect back merge gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for a gap to prev in a back merge
attempt and for a gap to next in a front merge attempt.
Signed-off-by: Jens Axboe <axboe@fb.com>
[sagig: Minor rename change] Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
(cherry picked from commit 5e7c4274a70aa2d6f485996d0ca1dad52d0039ca)
For drivers that don't support gaps in the SG lists handed to
them we must bounce (copy the user buffers) and pass a bio that
does not include gaps. This doesn't matter for any current user,
but will help to allow iser which can't handle gaps to use the
block virtual boundary instead of using driver-local bounce
buffering when handling SG_IO commands.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 46348456c1791053dcbe5a9e21825b10a3c8a8fb)
Keith Busch [Thu, 1 Feb 2018 19:37:06 +0000 (11:37 -0800)]
block: Replace SG_GAPS with new queue limits mask
The SG_GAPS queue flag caused checks for bio vector alignment against
PAGE_SIZE, but the device may have different constraints. This patch
adds a queue limit so a driver with such constraints can set it to allow
requests that would otherwise have been unnecessarily split. The new gaps check
takes the request_queue as a parameter to simplify the logic around
takes the request_queue as a parameter to simplify the logic around
invoking this function.
This new limit makes the queue flag redundant, so removing it and
all usage. Device-mappers will inherit the correct settings through
blk_stack_limits().
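The new check reduces to a mask test against the queue's virtual boundary,
per the upstream patch:

static inline bool bvec_gap_to_prev(struct request_queue *q,
                                    struct bio_vec *bprv, unsigned int offset)
{
        if (!queue_virt_boundary(q))
                return false;   /* no constraint set: nothing is a gap */
        return offset ||
                ((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q));
}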
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 03100aada96f0645bbcb89aea24c01f02d0ef1fa)
In UEK4, the decision was made to switch completely to
queue_virt_boundary, discarding the use of the SG_GAPS flag. Hence,
this partial porting of the new queue limit mask is reverted. Also,
this partial patch is broken for stacked devices.
The following soft lockup was caught. This is a deadlock caused by
recursive locking.
Process kworker/u40:1:28016 was holding spin lock "mbx->queue_lock" in
qlcnic_83xx_mailbox_worker(), while a softirq came in and ask the same spin
lock in qlcnic_83xx_enqueue_mbx_cmd(). This lock should be hold by disable
bh..
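The fix, sketched on the worker side (matching the cherry-picked commit):

/* Take the queue lock with bottom halves disabled so the softirq
 * enqueue path cannot recurse on the same lock on this CPU. */
spin_lock_bh(&mbx->queue_lock);         /* was: spin_lock() */
/* ... dequeue/manipulate the mailbox command queue ... */
spin_unlock_bh(&mbx->queue_lock);       /* was: spin_unlock() */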
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 233ac3891607f501f08879134d623b303838f478) Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Ankur Arora [Mon, 5 Feb 2018 03:35:07 +0000 (22:35 -0500)]
x86/entry: RESTORE_IBRS needs to be done under kernel CR3
RESTORE_IBRS_CLOBBER executes after we have already switched
to the USER_CR3. This blows up because RESTORE_IBRS_CLOBBER
looks at a kernel variable (use_ibrs).
This fixes a corner case that is caused by a race of dio write vs dio
read/write.
Here is how the race could happen.
Suppose that no extent map has been loaded into memory yet.
There is a file extent [0, 32K), two jobs are running concurrently
against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
read from [0, 4K) or [4K, 8K).
t1 goes ahead of t2 and splits em [0, 32K) into em [0K, 8K) and [8K, 32K).
When add_extent_mapping() gets -EEXIST for adding em [0, 32k),
search_extent_mapping() would return [0, 8k) as the existing em, even
though start == existing->start, em is [0, 32k) and
extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k.
It then goes through merge_extent_mapping(), which tries to add a [8k, 8k)
em (with length 0), and btrfs_get_extent() ends up returning -EEXIST,
and dio read/write will get -EEXIST, which confuses applications.
Here I also enumerated all possible situations,
1) start < existing->start
After going through the above case by case, it turns out that if start is
within the existing em (front inclusive), then the existing em should be
returned; otherwise, we try our best to merge the candidate em with
sibling ems to form a larger em.
Reported-by: David Vallender <david.vallender@landmark.co.uk> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com>
em->block_len could be checked when deciding if two ems are mergeable.
merge_extent_mapping() has only added the front pad if the front part
of the em gets truncated, but it's possible that the end part gets
truncated.
For both compressed extents and inline extents, em->block_len is not
adjusted accordingly, while for regular extents em->block_len always
equals em->len; hence this sets em->block_len to em->len.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com>
My QEMU VM was seeing inexplicable I/O errors that I tracked down to
errors coming from the qcow2 virtual drive in the host system. The qcow2
file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
Every once in a while, pread() or pwrite() would return EEXIST, which
makes no sense. This turned out to be a bug in btrfs_get_extent().
Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
two threads race on adding the same extent map to an inode's extent map
tree. However, if the added em is merged with an adjacent em in the
extent tree, then we'll end up with an existing extent that is not
identical to but instead encompasses the extent we tried to add. When we
call merge_extent_mapping() to find the nonoverlapping part of the new
em, the arithmetic overflows because there is no such thing. We then end
up trying to add a bogus em to the em_tree, which results in an EEXIST
that can bubble all the way up to userspace.
Fix it by extending the identical extent map special case.
Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 8e2bd3b7fac91b79a6115fd1511ca20b2a09696d) Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com>
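As a sketch of the extended special case (field names follow the btrfs
extent map code; the exact condition in the patch may differ), an
existing em that starts at the same offset and covers at least as much
as the new em is treated as identical:

	if (existing->start == em->start &&
	    extent_map_end(existing) >= extent_map_end(em) &&
	    em->block_start == existing->block_start) {
		/*
		 * The existing (merged) em encompasses the one we
		 * tried to add; there is no nonoverlapping remainder,
		 * so reuse it instead of letting the arithmetic in
		 * merge_extent_mapping() overflow.
		 */
		free_extent_map(em);
		em = existing;
		ret = 0;
	}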
When dealing with inline extents, btrfs_get_extent will incorrectly try
to insert a duplicate extent_map. The dup hits -EEXIST from
add_extent_map, but then we try to merge with the existing one and end
up trying to insert a zero length extent_map.
This actually works most of the time, except when there are extent maps
past the end of the inline extent. rocksdb will trigger this sometimes
because it preallocates an extent and then truncates down.
You'll get EIOs trying to read the inline data after this because
add_extent_map is returning EEXIST.
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 8dff9c85341032767d7b519217a79ea04cd676b0) Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: John Haxby <john.haxby@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
K. Y. Srinivasan [Fri, 23 Dec 2016 00:54:03 +0000 (16:54 -0800)]
Drivers: hv: util: Backup: Fix a rescind processing issue
VSS may use a char device to support the communication between
the user level daemon and the driver. When the VSS channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new VSS offer from the host. Implement this logic.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit d77044d142e960f7b5f814a91ecb8bcf86aa552c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
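A minimal sketch of the ordering this enforces, assuming a completion
signaled from the char device release path (names are illustrative):

#include <linux/completion.h>

static DECLARE_COMPLETION(release_event);

/* rescind path: destroy the char device, then wait until its last
 * reference is gone before a new VSS offer may be processed */
static void hv_vss_deinit(void)
{
	hvutil_transport_destroy(hvt);
	wait_for_completion(&release_event);
}

/* char device release callback: cleanup is now really finished */
static void vss_release(void)
{
	complete(&release_event);
}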
Alex Ng [Sun, 6 Nov 2016 21:14:11 +0000 (13:14 -0800)]
Drivers: hv: vss: Operation timeouts should match host expectation
Increase the timeout of backup operations. When the system is under I/O load,
it needs more time to freeze. These timeout values should also match the
host timeout values more closely.
Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit b357fd3908c1191f2f56e38aa77f2aecdae18bc8) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Alex Ng [Sun, 6 Nov 2016 21:14:10 +0000 (13:14 -0800)]
Drivers: hv: vss: Improve log messages.
Adding log messages to help troubleshoot error cases and transaction
handling.
Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit 23d2cc0c29eb0e7c6fe4cac88098306c31c40208) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Alex Ng [Fri, 2 Sep 2016 12:58:25 +0000 (05:58 -0700)]
Drivers: hv: utils: Check VSS daemon is listening before a hot backup
The Hyper-V host sends a VSS_OP_HOT_BACKUP request to check if the guest
is ready for a live backup/snapshot. The driver should respond to the
check only if the daemon is running and listening to requests. This
allows the host to fall back to standard snapshots in case the VSS
daemon is not running.
Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit db886e4d24c2b3d334be2cc1bd1bd05d547eb4c4) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Alex Ng [Fri, 2 Sep 2016 12:58:24 +0000 (05:58 -0700)]
Drivers: hv: utils: Continue to poll VSS channel after handling requests.
Multiple VSS_OP_HOT_BACKUP requests may arrive in quick succession, even
though the host only signals once. The driver was handling the first
request while ignoring the others in the ring buffer. We should poll the
VSS channel after handling a request to continue processing other requests.
Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit 497af84b81b98b27e9ba7aebb8a373412e328497) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
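Sketched out (the wrapper mirrors what the description implies; the
surrounding handler code is elided), the driver re-polls the channel
once a request has been handled:

static void vss_poll_wrapper(void *channel)
{
	/* drain any requests still queued in the ring buffer */
	hv_vss_onchannelcallback(channel);
}

	/* after completing one VSS_OP_HOT_BACKUP request: */
	hv_poll_channel(vss_transaction.recv_channel, vss_poll_wrapper);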
Vitaly Kuznetsov [Fri, 10 Jun 2016 00:08:57 +0000 (17:08 -0700)]
Drivers: hv: utils: fix a race on userspace daemons registration
Background: the userspace daemon registration protocol for Hyper-V
utility drivers has two steps:
1) the daemon writes its own version to the kernel
2) the kernel reads it and replies with the module version
At this point we consider the handshake procedure complete and we do
hv_poll_channel(), transitioning the utility device to the HVUTIL_READY
state. From this point on we're ready to handle messages from the kernel.
When hvutil_transport is in HVUTIL_TRANSPORT_CHARDEV mode we have a
single buffer for the outgoing message. hvutil_transport_send() writes to
this buffer, and until the buffer is cleared by hvt_op_read(), all
subsequent calls return -EFAULT. The host<->guest protocol guarantees
there is no more than one request at a time, and we will not get a new
request until we reply to the previous one, so this single message buffer
is enough.
Now to the race. When we finish the negotiation procedure and send the
kernel module version to userspace with hvutil_transport_send(), it goes
into the above-mentioned buffer; if the daemon is slow to read it, a
request arriving from the host collides with it: we won't be able to put
anything into the buffer, so the request will be lost. To solve the issue
we need to know when the negotiation is really done (when the version
message has been read by the daemon) and transition to the HVUTIL_READY
state only after that happens. Implement a callback on read to support
this.
Old-style netlink communication is not affected by the change; we don't
really know when those messages are delivered, but we don't have a single
message buffer there.
Reported-by: Barry Davis <barry_davis@stormagic.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit e0fa3e5e7df61eb2c339c9f0067c202c0cdeec2c) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
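A sketch of the callback-on-read idea, assuming hvutil_transport_send()
gained an on-read callback argument as the description implies:

static void vss_on_msg_read(void)
{
	/*
	 * The daemon has consumed the version reply; only now is the
	 * handshake really complete, so only now may host requests
	 * claim the single outgoing-message buffer.
	 */
	vss_transaction.state = HVUTIL_READY;
}

	/* negotiation reply: defer the state change to the callback */
	hvutil_transport_send(hvt, &msg, sizeof(msg), vss_on_msg_read);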
Olaf Hering [Tue, 15 Dec 2015 00:01:42 +0000 (16:01 -0800)]
Drivers: hv: vss: run only on supported host versions
The Backup integration service on WS2012 apparently has trouble
negotiating with a guest which does not support the provided util version.
Currently the VSS driver supports only version 5/0. A WS2012 host offers
only versions 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns
an empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
continues to send ICMSGTYPE_NEGOTIATE messages. The result is weird
errors during boot and general misbehaviour.
Check the Windows version to work around the host bug, skip hv_vss_init
on WS2012 and older.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 27426063
(cherry picked from commit ed9ba608e4851144af8c7061cbb19f751c73e998) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
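The workaround amounts to a version gate at init time; a sketch (the
constant and message approximate the actual patch):

int hv_vss_init(struct hv_util_service *srv)
{
	/* WS2012 R2 is the oldest host speaking util version 5/0 */
	if (vmbus_proto_version < VERSION_WIN8_1) {
		pr_warn("VSS: not supported on this host version\n");
		return -ENOTSUPP;
	}
	/* ... normal initialization ... */
	return 0;
}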
There is still a security hole left in the mem.c driver -- when
securelevel is set, a userland application can access PCI configuration
space via this driver.
--
Attempting a write via the mmap() API using acc_test:
[root@ban92uut054 ~]# ./acc_test mmap 0x846000c4=0x1
Using mmap() API for access
mmap write wrote 0x1
if (strcmp("rw", argv[1]) == 0) {
access_type = 1;
printf("Using pread()/pwrite() API for access\n");
}
else if (strcmp("mmap", argv[1]) == 0) {
access_type = 2;
printf("Using mmap() API for access\n");
}
else {
printf("Illegal access type: must be rw or mmap\n");
return -1;
}
addr = strtoul(argv[2], &tmp, 16);
if ((tmp && (*tmp != '=')) &&
((*tmp != '\0') || (errno == EINVAL) ||
(addr == ULONG_MAX && errno == ERANGE))) {
fprintf(stderr, "Invalid address specified; must be hex based\n");
if (errno) perror("error : ");
exit(1);
}
else if (tmp && (*tmp == '=')) { // write case
tmp++;
val = strtoul(tmp, NULL, 16);
operation = 1;
}
This patch fixes the hole: with securelevel set, one could still write to
/dev/mem via the mmap() API. The fix is to disallow opening /dev/mem or
/dev/kmem at all, checking access at open time rather than calling
get_securelevel() at the various read/write locations.
Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com> Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Reviewed-by: Eric Snowberg <eric.snowberg@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle>
(cherry picked from commit 8ca81822fffb2536aa1076fb17d1ea1413df3e7f)
Add this commit back to UEK4 now that the regressions have been fixed
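A minimal sketch of the open-time check, assuming the get_securelevel()
helper referenced above:

static int open_port(struct inode *inode, struct file *filp)
{
	/*
	 * Refuse /dev/mem and /dev/kmem outright once securelevel is
	 * raised, instead of sprinkling get_securelevel() through
	 * every read/write/mmap path.
	 */
	if (get_securelevel() > 0)
		return -EPERM;
	return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
}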
Ka-Cheong Poon [Mon, 18 Dec 2017 15:45:18 +0000 (07:45 -0800)]
rds: Calling getsockname() on unbound socket generates seg fault
If a socket is not yet bound, calling getsockname() should return an
unspecified address with family AF_UNSPEC as an RDS socket can be
bound to either an IPv4 or an IPv6 address. Hence returning either
family is incorrect. Currently, the returned address is set to an
unspecified address with family AF_INET6. If the passed in buffer is
smaller than sizeof(struct sockaddr_in6), the app may get a
segmentation fault error. This is similar to passing a buffer smaller
than sizeof(struct sockaddr_in) before the IPv6 changes.
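A sketch of the described behavior (the bound-address test is an
illustrative helper): an unbound socket reports AF_UNSPEC and only
sizeof(sa_family_t) bytes, so a sockaddr_in-sized buffer is never
overrun:

static int rds_getname(struct socket *sock, struct sockaddr *uaddr,
		       int *uaddr_len, int peer)
{
	struct rds_sock *rs = rds_sk_to_rs(sock->sk);

	if (!rs_socket_is_bound(rs)) {	/* illustrative helper */
		memset(uaddr, 0, sizeof(*uaddr));
		uaddr->sa_family = AF_UNSPEC;
		*uaddr_len = sizeof(uaddr->sa_family);
		return 0;
	}
	/* ... otherwise fill in the bound IPv4/IPv6 address ... */
	return 0;
}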
Zhu Yanjun [Fri, 1 Dec 2017 02:13:20 +0000 (21:13 -0500)]
net/mlx4_core: allow QPs with enable_smi_admin enabled
In commit 64453f042519 ("net/mlx4_core: Disallow creation of RAW QPs
on a VF"), some QPs are disallowed. But when enable_smi_admin is
enabled, such QPs should be allowed through.
Håkon Bugge [Fri, 8 Dec 2017 15:55:22 +0000 (16:55 +0100)]
net/rds: Fix incorrect error handling
Commit 5f58d7e81c2f ("net/rds: reduce memory footprint during
ib_post_recv in IB transport") removes the order-2 allocations used to
receive fragments and uses zero-order allocations instead. However,
said commit has incorrect error handling.
Konrad Rzeszutek Wilk [Sat, 13 Jan 2018 03:32:23 +0000 (22:32 -0500)]
x86: Move STUFF_RSB in to the idt macro
instead of it sitting in paranoid_entry or error_entry.
The idea behind STUFF_RSB is that it must be done _before_
any calls are made. That means we really want it in the idt
macro that is used for exceptions - such as device_not_available,
which currently looks like so:
[Ignore the callq *0x40..; that gets patched to a 'cld']
<device_not_available>:
nop
nop
nop
callq *0x40d0b7(%rip) # ffffffff81b55330 <pv_irq_ops+0x30> <= patched to cld
pushq $0xffffffffffffffff
sub $0x78,%rsp
callq ffffffff81748ea0 <error_entry> <=== call!
mov %rsp,%rdi
xor %esi,%esi
callq ffffffff81018830 <do_device_not_available>
test %rax,%rax
jne ffffffff81747f10 <dtrace_error_exit>
jmpq ffffffff817490a0 <error_exit>
nopl 0x0(%rax)
By stuffing the RSB before the call to error_entry (or
paranoid_entry) we remove the chance of this becoming an attack vector.
While at it, remove the useless comment - we don't encode any frames
in UEK4.
OraBug: 27417150 Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>