www.infradead.org Git - users/jedix/linux-maple.git/log

x86/cpu/AMD: Make the LFENCE instruction serialized

In order to reduce the impact of using MFENCE, make the execution of the
LFENCE instruction serialized. This is done by setting bit 1 of MSR
0xc0011029 (DE_CFG).

Some families that support LFENCE do not have this MSR. For these
families, the LFENCE instruction is already serialized.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Orabug: 27340445
CVE: CVE-2017-5753

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
patch refers to arch/x86/include/asm/msr-index.h
code base has arch/x86/include/uapi/asm/msr-index.h

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kABI: Make the boot_cpu_data look normal.

It is statically allocated and we only grow it - so having an
GENKSYMS around it is fine. This fixes
aff7641cb9f37c7aa6897a7b51faa6e20b08013f
"x86/cpu/AMD: Add speculative control support for AMD" breaking the kABI

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kernel.spec: Require the new microcode_ctl.

Which provides the new CPUID and MSRs to combat
CVE-2017-5715

Orabug: 27344012
CVE: CVE-2017-5715

Suggested-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/microcode/AMD: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Disable if running as Xen PV guest.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Set IBPB when running a different VCPU

Picking up a change from:
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 30 Nov 2017 15:00:12 +0100
[RHEL7.5 PATCH 07/35] kvm: vmx: Set IBPB when running a different
VCPU

Ensure an IBPB (Indirect branch prediction barrier) before every VCPU
switch.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Clear the host registers after setbe

The original patch cleared the host registers before setbe doing XOR,
and it set a false flag as VM enry failure.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Use the ibpb_inuse variable.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

KVM: x86: add SPEC_CTRL to MSR and CPUID lists

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kvm: vmx: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD

[RHEL7.5 PATCH 08/35] kvm: vmx: add MSR_IA32_SPEC_CTRL and
MSR_IA32_PRED_CMD

Allow load/store of MSR_IA32_SPEC_CTRL, restore guest IBRS on VM entry
and set it to 1 on VM exit.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

Use the "ibrs_inuse" variable.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

kvm: svm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/svm: Set IBPB when running a different VCPU

[RHEL7.5 PATCH 09/35] x86/svm: Set IBPB when running a different VCPU

Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
going to run a VCPU different from what was previously run. Nested
virtualization uses the same VMCB for the second level guest, but the
L1 hypervisor should be using IBRS to protect itself.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: Pad RSB on VM transition

This is a patch from:

From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 30 Nov 2017 15:00:10 +0100
Subject: [RHEL7.5 PATCH 05/35] x86/kvm: Pad RSB on VM transition

Add code to pad the local CPU's RSB entries to protect
from previous less privilege mode.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpu/AMD: Add speculative control support for AMD

Add speculative control support for AMD processors. For AMD, speculative
control is indicated as follows:

  CPUID EAX=0x00000007, ECX=0x00 return EDX[26] indicates support for
  both IBRS and IBPB.

  CPUID EAX=0x80000008, ECX=0x00 return EBX[12] indicates support for
  just IBPB.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.inte.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: We don't have 39c06df4dc10a "x86/cpufeature: Cleanup get_cpu_cap()"
which adds a nice enum and we neither do we have 2167ceabf3416
"x86/cpu: Add CLZERO detection". As such we just a partial backport
of the last one and only look for one specific bit (12).]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/microcode: Recheck IBRS and IBPB feature on microcode reload

On new microcode write, check whether IBRS and IBPB features
are present by rescanning scattered CPU features.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Move IBRS/IBPB feature detection to scattered.c

Move IBRS/IBPB to scattered features for easier feature rescan.
This help to rescan feature on microcode reload later.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

There are 2 ways to control IBPB and IBRS

1. At boot time
        noibrs kernel boot parameter will disable IBRS usage
        noibpb kernel boot parameter will disable IBPB usage
Otherwise if the above parameters are not specified, the system
will enable ibrs and ibpb usage if the cpu supports it.

2. At run time
        echo 0 > /proc/sys/kernel/ibrs_enabled will turn off IBRS
        echo 1 > /proc/sys/kernel/ibrs_enabled will turn on IBRS in kernel
        echo 2 > /proc/sys/kernel/ibrs_enabled will turn on IBRS in both userspace and kernel

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: This completes the scaffolding work done in the earlier
patch which had the same title]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: clear registers on VM exit

Clear registers on VM exit to prevent speculative use of them.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/kvm: Set IBPB when switching VM

Set IBPB (Indirect branch prediction barrier) when switching VM.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

*INCOMPLETE* x86/syscall: Clear unused extra registers on syscall entrance

To prevent the unused registers %r12-%r15, %rbp and %rbx from
being used speculatively, we clear them upon syscall entrance
for code hygiene.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Backport: We don't have the ORC stack which means our calling.h
has the CTF code. And that has RESTORE_EXTRA_ARGS and ZERO_EXTRA_ARGS
so there was no need to port that in. See
commit 76f5df43cab5e765c0bd42289103e8f625813ae1
x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack
which added them.

The ZERO_EXTRA_REGS (aka CLEAR_EXTRA_REGS) is not part of it.
It ends up crashing the user-space. Not sure why not.

Which means this patch is pretty much useless - we don't clear
any of the %r12-%r15, nor %rbp, nor %rbx at all.

In other words we just save now more registers on the %esp
and restore them.

But somewhere we depend on these and need to fix that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/entry: Stuff RSB for entry to kernel for non-SMEP platform

Stuff RSB to prevent RSB underflow on non-SMEP platform.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Only set IBPB when the new thread cannot ptrace current thread

To reduce overhead of setting IBPB, we only do that when
the new thread cannot ptrace the current one. If the new
thread has ptrace capability on current thread, it is safe.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: Need more #include's than the original]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/mm: Set IBPB upon context switch

Set IBPB on context switch with changing of page table.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport needs an asm/microcode.h to include the native_wrmsrl]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup

Clear IBRS when cpu is offlined and set it when bringing it back online.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/idle: Disable IBRS entering idle and enable it on wakeup

Clear IBRS on idle entry and set it on idle exit into kernel on mwait.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: We don't have b466bdb614823
"x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer"
hence the change to delay_mwaitx is not needed]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/spec_ctrl: save IBRS MSR value in paranoid_entry

If the NMI runs while entering kernel between SWAPGS and IBRS_ENABLE
everything is fine, paranoid_entry would have unconditionally set
IBRS bit 0 and when exiting the NMI it would have cleared bit 0 like
if it was returning to userland. IBRS_ENABLE would have then enabled
bit 0 again.

If NMI instead runs when exiting kernel between IBRS_DISABLE and
SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
left enabled when exiting the NMI. IBRS bit 0 would then be left
enabled in userland until the next enter kernel.

That is a minor inefficiency only, but we can eliminate it by saving
the MSR when entering the NMI in save_paranoid and restoring it when
exiting the NMI.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

*Scaffolding* x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

1. At boot time
noibrs kernel boot parameter will disable IBRS usage
noibpb kernel boot parameter will disable IBPB usage

Otherwise if the above parameters are not specified, the system
will enable ibrs and ibpb usage if the cpu supports it.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[*Scaffolding*:This backport lacks a lot. It is only put it on so that the
later patches compiled _and_ can be tested to run. It is meant to be removed
once the full set of patches are all good. Aka scaffolding.]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/enter: Use IBRS on syscall and interrupts

Set IBRS upon kernel entrance via syscall and interrupts. Clear it
upon exit.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: I had to add 'asm/spec_ctrl.h' in the assembler files]
Also we should not put ENABLE_IBRS on irq_entries_start]

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Add macro that does not save rax, rcx, rdx on stack to disable IBRS

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/enter: MACROS to set/clear IBRS and set IBP

Setup macros to control IBRS and IBPB

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Backport: In UEK4 it is 'cpufeature.h', not 'cpufeatures.h']

Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/feature: Report presence of IBPB and IBRS control

Report presence of IBPB and IBRS.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86: Add STIBP feature enumeration

Enumerate single thread indirect branch predictors (STIBP) feature. It
provides means to prevent indirect branch predictions from being
controlled by sibling HW thread.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/cpufeature: Add X86_FEATURE_IA32_ARCH_CAPS and X86_FEATURE_IBRS_ATT

Enumerate future CPU that implements IBRS all the time in its architecture.

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

x86/feature: Enable the x86 feature to control

cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)

Orabug: 27344012
CVE: CVE-2017-5715

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

dccp: CVE-2017-8824: use-after-free in DCCP code

Whenever the sock object is in DCCP_CLOSED state,
dccp_disconnect() must free dccps_hc_tx_ccid and
dccps_hc_rx_ccid and set to NULL.

Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 69c64866ce072dea1d1e59a0d61e0f66c0dffb76)

Orabug: 27290292
CVE: CVE-2017-8824

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

negotiate_mq should happen in all cases of a new VBD being discovered by
xen-blkfront, whether called through _probe() or a hot-attached new VBD
from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
configured without multi-queue.

Orabug: 27180421

Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Patrick Colp <patrick.colp@oracle.com>

e1000: avoid null pointer dereference on invalid stat type

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 5983587c8c5ef00d6886477544ad67d495bc5479)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000: fix race condition between e1000_down() and e1000_watchdog

This patch fixes a race condition that can result into the interface being
up and carrier on, but with transmits disabled in the hardware.
The bug may show up by repeatedly IFF_DOWN+IFF_UP the interface, which
allows e1000_watchdog() interleave with e1000_down().

    CPU x                           CPU y
    --------------------------------------------------------------------
    e1000_down():
        netif_carrier_off()
                                    e1000_watchdog():
                                        if (carrier == off) {
                                            netif_carrier_on();
                                            enable_hw_transmit();
                                        }
        disable_hw_transmit();
                                    e1000_watchdog():
                                        /* carrier on, do nothing */

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 44c445c3d1b4eacff23141fa7977c3b2ec3a45c9)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Be drop monitor friendly

e1000e_put_txbuf() can be called from normal reclamation path as well as
when a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() are processing TX timestamped SKBs and those should
not be accounted as drops either.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 377b62736c01f14309141c69caa6d84363c12e12)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: apply burst mode settings only on default

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 48072ae1ec7a1c778771cad8c1b8dd803c4992ab)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: fix buffer overrun while the I219 is processing DMA transactions

Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressfully UDP traffic
and multiple reconnection of Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit b10effb92e272051dd1ec0d7be56bf9ca85ab927)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Avoid receiver overrun interrupt bursts

When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.

This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.

Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59bc33
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 4aea7a5c5e940c1723add439f4088844cd26196d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Separate signaling for link check/link up

Lennart reported the following race condition:

\ e1000_watchdog_task
    \ e1000e_has_link
        \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
            /* link is up */
            mac->get_link_status = false;

                            /* interrupt */
                            \ e1000_msix_other
                                hw->mac.get_link_status = true;

        link_active = !hw->mac.get_link_status
        /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix return value test

All the helpers return -E1000_ERR_PHY.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit d3509f8bc7b0560044c15f0e3ecfde1d9af757a6)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix wrong comment related to link detection

Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays TRUE until then.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 65a29da1f5fd20fdebef3b959bef9b3660807b20)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Fix error path in link detection

In case of error from e1e_rphy(), the loop will exit early and "success"
will be set to true erroneously.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit c4c40e51f9c32c6dd8adf606624c930a1c4d9bbb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

drivers: net: e1000e: use setup_timer() helper.

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 27069012
(cherry picked from commit 4a9c07ed71c2b8d755ee585264f80dd2d82a8066)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Initial Support for IceLake

i219 (8) and i219 (9) are the next LOM generations that will be available
on the next Intel Client platform (IceLake).
This patch provides the initial support for these devices

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 48f76b68f9fca4e1d5bbb1755d14e8e8e09bdd5b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: add check on e1e_wphy() return value

Check return value from call to e1e_wphy(). This value is being
checked during previous calls to function e1e_wphy() and it seems
a check was missing here.

Addresses-Coverity-ID: 1226905
Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit d75372a2daf5dc48207ee9e5592917e893cddb87)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

An error during suspend (e100e_pm_suspend),

[  429.994338] ACPI : EC: event blocked
[  429.994633] e1000e: EEE TX LPI TIMER: 00000011
[  430.955451] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x30 [e1000e] returns -2
[  430.955454] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
[  430.955458] PM: Device 0000:00:19.0 failed to suspend async: error -2
[  430.955581] PM: Some devices failed to suspend, or early wake event detected
[  430.957709] ACPI : EC: event unblocked

lead to complete failure:

[  432.585002] ------------[ cut here ]------------
[  432.585013] WARNING: CPU: 3 PID: 8372 at kernel/irq/manage.c:1478 __free_irq+0x9f/0x280
[  432.585015] Trying to free already-free IRQ 20
[  432.585016] Modules linked in: cdc_ncm usbnet x86_pkg_temp_thermal intel_powerclamp coretemp mii crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep lpc_ich snd_hda_core snd_pcm mei_me mei sdhci_pci sdhci i915 mmc_core e1000e ptp pps_core prime_numbers
[  432.585042] CPU: 3 PID: 8372 Comm: kworker/u16:40 Tainted: G     U          4.10.0-rc8-CI-Patchwork_3870+ #1
[  432.585044] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
[  432.585050] Workqueue: events_unbound async_run_entry_fn
[  432.585051] Call Trace:
[  432.585058]  dump_stack+0x67/0x92
[  432.585062]  __warn+0xc6/0xe0
[  432.585065]  warn_slowpath_fmt+0x4a/0x50
[  432.585070]  ? _raw_spin_lock_irqsave+0x49/0x60
[  432.585072]  __free_irq+0x9f/0x280
[  432.585075]  free_irq+0x34/0x80
[  432.585089]  e1000_free_irq+0x65/0x70 [e1000e]
[  432.585098]  e1000e_pm_freeze+0x7a/0xb0 [e1000e]
[  432.585106]  e1000e_pm_suspend+0x21/0x30 [e1000e]
[  432.585113]  pci_pm_suspend+0x71/0x140
[  432.585118]  dpm_run_callback+0x6f/0x330
[  432.585122]  ? pci_pm_freeze+0xe0/0xe0
[  432.585125]  __device_suspend+0xea/0x330
[  432.585128]  async_suspend+0x1a/0x90
[  432.585132]  async_run_entry_fn+0x34/0x160
[  432.585137]  process_one_work+0x1f4/0x6d0
[  432.585140]  ? process_one_work+0x16e/0x6d0
[  432.585143]  worker_thread+0x49/0x4a0
[  432.585145]  kthread+0x107/0x140
[  432.585148]  ? process_one_work+0x6d0/0x6d0
[  432.585150]  ? kthread_create_on_node+0x40/0x40
[  432.585154]  ret_from_fork+0x2e/0x40
[  432.585156] ---[ end trace 6712df7f8c4b9124 ]---

The unwind failures stems from commit 2800209994f8 ("e1000e: Refactor PM
flows"), but it may be a later patch that introduced the non-recoverable
behaviour.

Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 27069012
(cherry picked from commit 833521ebc65b1c3092e5c0d8a97092f98eec595d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

qla2xxx: Fix system crash in qlt_plogi_ack_unref

Orabug: 27235104

Fix system crash due to NULL pointer access.

qlt_plogi_ack_t and fc_port structures were not properly
bound before calling qlt_plogi_ack_unref().

RIP: 0010:qlt_plogi_ack_unref+0xa1/0x150 [qla2xxx]
Call Trace:
qla24xx_create_new_sess+0xb1/0x320 [qla2xxx]
qla2x00_do_work+0x123/0x260 [qla2xxx]
qla2x00_iocb_work_fn+0x30/0x40 [qla2xxx]
process_one_work+0x1f3/0x530
worker_thread+0x4e/0x480
kthread+0x10c/0x140

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 19759033e0d0beed70421ab9258f5ede79e070ae ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Remove aborting ELS IOCB call issued as part of timeout.

Orabug: 27235104

This fix the spinlock recursion issue seen while unloading the driver.

14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802
15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4
16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471
17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63
18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx]
19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8
20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47
21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit bf07ef86e882013522876f7c834c8eea085f35b4 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Defer processing of GS IOCB calls

Orabug: 27235104

This patch defers processing of GS IOCB calls from interrupt
context to avoid hardware spinlock recursion.

Following stack trace is seen

? mod_timer+0x193/0x330
? ql_dbg+0xa7/0xf0 [qla2xxx]
_raw_spin_lock_irqsave+0x31/0x40
qla2x00_start_sp+0x3b/0x250 [qla2xxx]
qla24xx_async_gnl+0x1d3/0x240 [qla2xxx]
qla24xx_fcport_handle_login+0x285/0x290 [qla2xxx]
? vprintk_func+0x20/0x50

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 5d3300a9b8b122b4743aed5a178bf12c87e2b8c9 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Clear loop id after delete

Orabug: 27235104

clear loop id after delete to prevent session invalidation
of stale session.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit ba743f9148e951abe1c94f89c174ec8e44fb145b ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix scan state field for fcport

Orabug: 27235104

Add correct value of scan_state field indicating state
of the FC port

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 76f9a2dd4c60183879a1898bcd56a1dbab19a85d ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport

Orabug: 27235104

Current code manually allocate an fcport structure that
is not properly initialize. Replace kzalloc with
qla2x00_alloc_fcport, so that all fields are initialized.

Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 063b36d6b0ad74c748d536f5cb47bac2f850a0fa ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix abort command deadlock due to spinlock

Orabug: 27235104

Original code acquires hardware_lock to add Abort IOCB
onto driver request queue for processing. However,
abort_command() will also acquire hardware lock to look up
sp pointer before issuing abort IOCB command resulting
into a deadlock. This patch safely removes the possible
deadlock scenario by removing extra spinlock.

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit b0dcce746b32ac573343ad39cb3dc485030de95e ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix PRLI state check

Orabug: 27235104

Get Port Database MBX cmd is to validate current Login state upon
PRLI completion. Current code looks at the last login state for
re-validation which was incorrect. This patch removed incorrect
state check.

Fixes: 15f30a5752287 ("qla2xxx: Use IOCB interface to submit non-critical MBX.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 23c645595dab7b414f23639d0a428a07515807df ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Clear send ELS LOGO flag after target re-login

Orabug: 27235104

This patch fixes clearing out els_send_logo flag at the
time of session deletion.

Fixes: 3515832cc614 ("scsi: qla2xxx: Reset the logo flag, after target re-login.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix Relogin being triggered too fast

Orabug: 27235104

Current driver design schedules relogin process via DPC thread
every 1 second. In a large fabric, this DPC thread tries to
schedule too many jobs and might get overloaded. As a result of
this processing of DPC thread, it can schedule relogin earlier
than 1 second.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 4005a995668b8fd58f4cf1460dd4cf63efa18363 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Relogin to target port on a cable swap

Orabug: 27235104

If user swaps one target port for another target port for same
switch port, the new target port is not being recognized by the
driver. Current code assumes that old Target port has recovered
from link down. The fix will ask switch what is the WWPN of a
specific NportID (GPNID) rather than assuming it's the same Target
port which has came back.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>'
[ Upstream commit 5ef696aa9f3ccf999552d924c4e21a348f2bbea9 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Recheck session state after RSCN.

Orabug: 27235104

when RSCN is delivered for specific remote port,
the switch say the remote port is still up and
current state of the remote port/session is good,
don't trust the state. Instead use ADISC to very
the session is still valid or not.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
[ Upstream commit a07fc0a42e9ae76e93235f59b089986dc1b751c8 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix login state machine stuck at GPDB

Orabug: 27235104

This patch returns discovery state machine back to
Login Complete.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 414d9ff3f8039f85d23f619dcbbd1ba2628a1a67 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Serialize GPNID for multiple RSCN

Orabug: 27235104

GPNID is triggered by RSCN. For multiple RSCNs of the same
affected NPORT ID, serialize the GPNID to prevent confusion.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
[ Upstream commit 2d73ac6102d943c4be4945735a338005359c6abc ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: fix stale memory access.

Orabug: 27235104

Name pointer describing each command is assigned with stack frame's memory.
The stack frame could eventually be re-use, where name pointer
access can get get garbage. To fix the problem, use designated
static memory for name pointer.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Retry switch command on time out

Orabug: 27235104

Retry GID_PN & GPN_ID switch commands for time out case.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 25ad76b703d9ad536f3411b15b1070aeb059ab55 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix system crash for Notify ack timeout handling

Orabug: 27235104

Fix NULL pointer crash due to missing timeout handling callback
for Notify Ack IOCB.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[ Upstream commit 2e01d0ba868ec1d4d55ddcba519339e072b0bf4d ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qla2xxx: Fix re-login for Nport Handle in use

Orabug: 27235104

When NPort Handle is in use, driver needs to mark the handle
as used and pick another. Instead, the code clears the handle
and re-pick the same handle.

Fixes: 726b85487067d ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
[ Upstream commit a084fd68e1d26174c4cc1a13fbb0112f468ff7f4 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qla2xxx: Cleanup debug message IDs

Orabug: 27235104

Assign unique id to all traces and logs for debug purpose.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ Upstream commit 83548fe2fcbb78a233e8156feff4e167f1d0831e ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qla2xxx: Fix name server relogin

Orabug: 27235104

Name server login is normally handle by FW. In some rare case where one
of the switches is being updated, name server login could get
affected. Trigger relogin to name server when driver detects this
condition.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ Upstream commit b98ae0d748dbc80016c5cc2e926f33648d83353d ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qla2xxx: Fix path recovery

Orabug: 27235104

If the port is moved/changed, current code would trigger
a deletion. If the port is already deleted, then do relogin.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ Upstream commit 66ee0fece9a0b91f59a7eb8dd39808a0ef5db02b ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qla2xxx: Fix an integer overflow in sysfs code

Orabug: 27235104

The value of "size" comes from the user.  When we add "start + size" it
could lead to an integer overflow bug.

It means we vmalloc() a lot more memory than we had intended.  I believe
that on 64 bit systems vmalloc() can succeed even if we ask it to
allocate huge 4GB buffers.  So we would get memory corruption and likely
a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().

Only root can trigger this bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061
Cc: <stable@vger.kernel.org>
Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.")
Reported-by: shqking <shqking@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ Upstream commit e6f77540c067b48dee10f1e33678415bfcc89017 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: qla2xxx: don't disable a not previously enabled PCI device

Orabug: 27235104

commit ddff7ed45edce4a4c92949d3c61cd25d229c4a14 upstream.

When pci_enable_device() or pci_enable_device_mem() fail in
qla2x00_probe_one() we bail out but do a call to
pci_disable_device(). This causes the dev_WARN_ON() in
pci_disable_device() to trigger, as the device wasn't enabled
previously.

So instead of taking the 'probe_out' error path we can directly return
*iff* one of the pci_enable_device() calls fails.

Additionally rename the 'probe_out' goto label's name to the more
descriptive 'disable_device'.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ddff7ed45edce4a4c92949d3c61cd25d229c4a14 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ALSA: pcm: prevent UAF in snd_pcm_info

When the device descriptor is closed, the `substream->runtime` pointer
is freed. But another thread may be in the ioctl handler, case
SNDRV_CTL_IOCTL_PCM_INFO. This case calls snd_pcm_info_user() which
calls snd_pcm_info() which accesses the now freed `substream->runtime`.

Note: this fixes CVE-2017-0861

Signed-off-by: Robb Glasser <rglasser@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 362bca57f5d78220f8b5907b875961af9436e229)

Orabug: 27344839
CVE: CVE-2017-0861

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

kernel-uek.spec: update linux-firmware and linux-nano-firmware dependency

Orabug: 27185358

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>

x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels

Declaration of memcpy() is hidden under #ifndef CONFIG_KMEMCHECK.
In asm/efi.h under #ifdef CONFIG_KASAN we #undef memcpy(), due to
which the following happens:

  In file included from arch/x86/kernel/setup.c:96:0:
  ./arch/x86/include/asm/desc.h: In function â\80\98native_write_idt_entryâ\80\99:
  ./arch/x86/include/asm/desc.h:122:2: error: implicit declaration of function â\80\98memcpyâ\80\99 [-Werror=implicit-function-declaration]   memcpy(&idt[entry], gate, sizeof(*gate));
    ^
    cc1: some warnings being treated as errors
    make[2]: *** [arch/x86/kernel/setup.o] Error 1

We will get rid of that #undef in asm/efi.h eventually.
But in the meanwhile move memcpy() declaration out of #ifdefs
to fix the build.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444994933-28328-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a75ca545e8d57473da47ece828ad98a10727ec6f)

Orabug: 27132235

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Acked-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>

x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels

With KMEMCHECK=y, KASAN=n we get this build failure:

  arch/x86/platform/efi/efi.c:673:3:    error: implicit declaration of function â\80\98memcpyâ\80\99 [-Werror=implicit-function-declaration]
  arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function â\80\98memcpyâ\80\99 [-Werror=implicit-function-declaration]
  arch/x86/include/asm/desc.h:121:2:    error: implicit declaration of function â\80\98memcpyâ\80\99 [-Werror=implicit-function-declaration]

Don't #undef memcpy if KASAN=n.

Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
Link: http://lkml.kernel.org/r/1443544814-20122-1-git-send-email-ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4ac86a6dcec1c3878de9747bf5a2aa4455be69e3)

Orabug: 27132235

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Acked-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>

x86, efi, kasan: #undef memset/memcpy/memmove per arch

In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove
with not-instrumented analogues __memset/__memcpy/__memove.

However, on x86 the EFI stub is not linked with the kernel. It uses
not-instrumented mem*() functions from arch/x86/boot/compressed/string.c

So we don't replace them with __mem*() variants in EFI stub.

On ARM64 the EFI stub is linked with the kernel, so we should replace
mem*() functions with __mem*(), because the EFI stub runs before KASAN
sets up early shadow.

So let's move these #undef mem* into arch's asm/efi.h which is also
included by the EFI stub.

Also, this will fix the warning in 32-bit build reported by kbuild test
robot:

efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy'

[akpm@linux-foundation.org: use 80 cols in comment]
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 769a8089c1fd2fe94c13e66fe6e03d7820953ee3)

Orabug: 27132235

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Acked-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>

Revert "Makefile: Build with -Werror=date-time if the compiler supports it"

This reverts commit fe7c36c7bde1
("Makefile: Build with -Werror=date-time if the compiler supports it")

Revert this change because UEK 4.1 still uses __DATE__ and __TIME__ macro
and building UEK with GCC 4.9 or newer should not cause compilation errors.

Orabug: 27132235

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Acked-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>

x86/efi: Initialize and display UEFI secure boot state a bit later during init

Otherwise Xen dom0 does not display "Secure boot enabled" message if it runs
on secure boot enabled platform. This happens because boot_params.secure_boot
is initialized too late. However, despite lack of message all features depending
on UEFI secure boot are enabled properly.

Orabug: 27258204

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

bnxt_en: Fix possible corrupted NVRAM parameters from firmware response.

In bnxt_find_nvram_item(), it is copying firmware response data after
releasing the mutex. This can cause the firmware response data
to be corrupted if the next firmware response overwrites the response
buffer. The rare problem shows up when running ethtool -i repeatedly.

Fix it by calling the new variant _hwrm_send_message_silent() that requires
the caller to take the mutex and to release it after the response data has
been copied.

Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package version reporting via Ethtool GDRVINFO")
Reported-by: Sarveswara Rao Mygapula <sarveswararao.mygapula@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cc72f3b1feb4fd38d33ab7a013d5ab95041cb8ba)

Orabug: 27285190

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

dtrace: do not use copy_from_user when accessing kernel stack

The implementation of sdt_getarg() for x86_64 uses a copy_from_user
variant while reading from kernel stack which is obviously wrong.
This commit corrects that.

Orabug: 25949088
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>

dtrace: fix arg5 and up retrieval for FBT entry probes on x86

When tracing function entry using FBT entry probes, access to all the
function arguments should be supported.  The existing code supported
access up to the 5th argument, but would yield incorrect results for
any argument beyond that.  On some kernels it could result in a crash
when the 6th (or higher) argument was being retrieved.

The reason for the problem lies in the fact that the first 5 arguments
could be read directly from the register set, whereas the 6th argument
and beyond needs to be retrieved from the stack.  The generic code
implementing retrieval of arguments turns out to be incorrect for FBT
entry probes.

Thi commit introduces a FBT-specific function to access function
arguments.

Orabug: 25949088
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>

x86/espfix: Init espfix on the boot CPU side

As we alloc pages with GFP_KERNEL in init_espfix_ap() which is
called before we enable local irqs, so the lockdep sub-system
would (correctly) trigger a warning about the potentially
blocking API.

So we allocate them on the boot CPU side when the secondary CPU is
brought up by the boot CPU, and hand them over to the secondary
CPU.

And we use alloc_pages_node() with the secondary CPU's node, to
make sure the espfix stack is NUMA-local to the CPU that is
going to use it.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Orabug: 26523661
(cherry picked from commit 20d5e4a9cd453991e2590a4c25230a99b42dee0c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

x86/espfix: Add 'cpu' parameter to init_espfix_ap()

Add a CPU index parameter to init_espfix_ap(), so that the
parameter could be propagated to the function for espfix
page allocation.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Orabug: 26523661
(cherry picked from commit 1db875631f8d5bbf497f67e47f254eece888d51d)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Tim Tianyang Chen <tianyang.chen@oracle.com>

xen: Make PV Dom0 Linux kernel NUMA aware

Issues Xen hypercall subop XENMEM_get_vnumainfo and sets the
NUMA topology, otherwise sets dummy NUMA node and prevents
numa_init from calling other numa initializators.
Enables vNUMA for dom0 if numa kernel boot option does not
disable it.
It also requires Xen to have patches that support Dom0 NUMA
and xen boot option dom0_vcpus_pin=numa.

Dom0 NUMA topology with this patch applied and Xen booted with
"dom0_mem=max:6144M dom0_vcpus_pin=numa dom0_max_vcpus=20":

[root@localhost ~]# numactl --ha
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 2966 MB
node 0 free: 2175 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 2880 MB
node 1 free: 2143 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

And lstopo output:

Machine (5847MB)
  NUMANode L#0 (P#0 2967MB)
    Socket L#0
      L3 L#0 (25MB) + L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L3 L#1 (25MB) + L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L3 L#2 (25MB) + L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L3 L#3 (25MB) + L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L3 L#4 (25MB) + L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
      L3 L#5 (25MB) + L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
      L3 L#6 (25MB) + L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
      L3 L#7 (25MB) + L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      L3 L#8 (25MB) + L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
      L3 L#9 (25MB) + L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
    HostBridge L#0
      PCIBridge
        PCI 1000:005d
          Block L#0 "sda"
          Block L#1 "sdb"
      PCIBridge
        PCI 8086:1528
          Net L#2 "eth0"
        PCI 8086:1528
          Net L#3 "eth1"
      PCIBridge
        PCI 102b:0522
      PCI 8086:8d02
  NUMANode L#1 (P#1 2880MB)
    Socket L#1
      L3 L#10 (25MB) + L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
      L3 L#11 (25MB) + L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
      L3 L#12 (25MB) + L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
      L3 L#13 (25MB) + L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
      L3 L#14 (25MB) + L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
      L3 L#15 (25MB) + L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
      L3 L#16 (25MB) + L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
      L3 L#17 (25MB) + L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
      L3 L#18 (25MB) + L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
      L3 L#19 (25MB) + L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
    HostBridge L#4
      PCIBridge
        PCI 8086:1528
          Net L#4 "eth2"
        PCI 8086:1528
          Net L#5 "eth3"
      PCIBridge
        PCI 8086:2701

OraBug: 26189217

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()

Orabug: 27255674

ext4_find_unwritten_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.

When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size ext4 on x86_64 host.

  # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
       -c "seek -d 0" /mnt/ext4/testfile
  wrote 1024/1024 bytes at offset 2048
  1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
  Whence  Result
  DATA    EOF

Data at offset 2k was missed, and lseek(2) returned ENXIO.

This is unconvered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 624327f8794704c5066b11a52f9da6a09dce7f9a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>

DTrace: IO wait probes b_flags can contain incorrect operation

In the synchronous IO path, the xfs buffer flag value can change,
causing the IO provider io:::wait-start and io:::wait-done to report
an incorrect operation through the bufinfo_t b_flags field.

Orabug: 27193447

Signed-off-by: Nicolas Droux <nicolas.droux@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk

When guest passes KVM it's pvclock-page GPA via WRMSR to
MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM don't initialize
pvclock-page to some start-values. It just requests a clock-update which
will happen before entering to guest.

The clock-update logic will call kvm_setup_pvclock_page() to update the
pvclock-page with info. However, kvm_setup_pvclock_page() *wrongly*
assumes that the version-field is initialized to an even number. This is
wrong because at first-time write, field could be any-value.

Fix simply makes sure that if first-time version-field is odd, increment
it once more to make it even and only then start standard logic.
This follows same logic as done in other pvclock shared-pages (See
kvm_write_wall_clock() and record_steal_time()).

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 51c4b8bba674cfd2260d173602c4dac08e4c3a99)

Orabug: 27146591

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

KVM: x86: always fill in vcpu->arch.hv_clock

We will use it in the next patches for KVM_GET_CLOCK and as a basis for the
contents of the Hyper-V TSC page. Get the values from the Linux
timekeeper even if kvmclock is not enabled.

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0d6dd2ff8206dc1da3428d5b1611f6304d481dab)

Orabug: 27146591

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2

vmx_check_nested_events() should return -EBUSY only in case there is a
pending L1 event which requires a VMExit from L2 to L1 but such a
VMExit is currently blocked. Such VMExits are blocked either
because nested_run_pending=1 or an event was reinjected to L2.
vmx_check_nested_events() should return 0 in case there are no
pending L1 events which requires a VMExit from L2 to L1 or if
a VMExit from L2 to L1 was done internally.

However, upstream commit which introduced blocking in case an event was
reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix pending events
injection")) contains a bug: It returns -EBUSY even if there are no
pending L1 events which requires VMExit from L2 to L1.

This commit fix this issue.

Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 917dc6068bc12a2dafffcf0e9d405ddb1b8780cb)
Orabug: 27200329
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Acked-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

KVM: VMX: use kvm_event_needs_reinjection

Use kvm_event_needs_reinjection() encapsulation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 274bba52a01d6de01f03cfb1b80af2d35772e62e)
Orabug: 27200329
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Acked-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

KVM: nVMX: Fix pending events injection

L2 fails to boot on a non-APICv box dues to 'commit 0ad3bed6c5ec
("kvm: nVMX: move nested events check to kvm_vcpu_running")'

KVM internal error. Suberror: 3
extra data[0]: 800000ef
extra data[1]: 1
RAX=0000000000000000 RBX=ffffffff81f36140 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000000 RBP=ffff88007c92fe90 RSP=ffff88007c92fe90
R8 =ffff88007fccdca0 R9 =0000000000000000 R10=00000000fffedb3d R11=0000000000000000
R12=0000000000000003 R13=0000000000000000 R14=0000000000000000 R15=ffff88007c92c000
RIP=ffffffff810645e6 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff88007fcc0000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 ffff88007fcd4200 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff88007fcc9000 0000007f
IDT= ffffffffff578000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=0000000001e0a000 CR4=003406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01

We should try to reinject previous events if any before trying to inject
new event if pending. If vmexit is triggered by L2 guest and L0 interested
in, we should reinject IDT-vectoring info to L2 through vmcs02 if any,
otherwise, we can consider new IRQs/NMIs which can be injected and call
nested events callback to switch from L2 to L1 if needed and inject the
proper vmexit events. However, 'commit 0ad3bed6c5ec ("kvm: nVMX: move
nested events check to kvm_vcpu_running")' results in the handle events
order reversely on non-APICv box. This patch fixes it by bailing out for
pending events and not consider new events in this scenario.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit acc9ab60132739e0c5ab40644444cd7b22d12886)
Orabug: 27200329
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Acked-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/time: do not decrease steal time after live migration on xen

After guest live migration on xen, steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease because steal returned by
xen_steal_lock() might be less than this_rq()->prev_steal_time which is
derived from previous return value of xen_steal_clock().

For instance, steal time of each vcpu is 335 before live migration.

cpu  198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0

After live migration, steal time is reduced to 312.

cpu  200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0

Since runstate times are cumulative and cleared during xen live migration
by xen hypervisor, the idea of this patch is to accumulate runstate times
to global percpu variables before live migration suspend. Once guest VM is
resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new
runstate times and previously accumulated times stored in global percpu
variables.

Comment above HYPERVISOR_suspend() has been removed as it is inaccurate:
the call can return an error code (e.g., possibly -EPERM in the future).

Similar and more severe issue would impact prior linux 4.8-4.10 as
discussed by Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
which would overflow steal time and lead to 100% st usage in top command
for linux 4.8-4.10. A backport of this patch would fix that issue.

[boris: added linux/slab.h to driver/xen/time.c, slightly reformatted
        commit message]

References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Orabug: 27181243

(cherry picked from commit 5e25f5db6abb96ca8ee2aaedcb863daa6dfcc07a)
The original purpose of this patch is to not decrease steal time after live
migration on xen. We backport this patch to avoid xen steal clock overflow
issue which would lead to 100% st usage in top command as mentioned in
above commit message. UEK3 or UEK2 would not hit this issue.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

bnx2x: fix slowpath null crash

Backport: upstream 442866ff9743d51957685cecaa722a7fd47b02e2

When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task,
bnx2x_nic_unload and bnx2x_nic_load are executed to shutdown and open
NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem fails to allocate
dma memory. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't
allocate memory" pops out. The variable slowpath is set to NULL.
When shutdown the NIC, the function bnx2x_nic_unload is called. In
the function bnx2x_nic_unload, the following functions are executed.
bnx2x_chip_cleanup
    bnx2x_set_storm_rx_mode
        bnx2x_set_q_rx_mode
            bnx2x_set_q_rx_mode
                bnx2x_config_rx_mode
                    bnx2x_set_rx_mode_e2
In the function bnx2x_set_rx_mode_e2, the variable slowpath is operated.
Then the crash occurs.
To fix this crash, the variable slowpath is checked. And in the function
bnx2x_sp_rtnl_task, after dma memory allocation fails, another shutdown
and open NIC is executed.

Orabug: 27041078

[Correct the typos in the short log and comments]

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

Replace max_t() with sub_positive() in dequeue_entity_load_avg()

Orabug: 27026563

This patch replaces max_t() with sub_positive() to prevent intermediate
values getting stored in cfs_rq->runnable_load_avg/runnable_load_sum.
This patch has been partially cherry picked from commit c7b50216818e
("sched/fair: Remove se->load.weight from se->avg.load_sum")

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>

sched/fair: Fix cfs_rq avg tracking underflow

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

> the code generated from update_cfs_rq_load_avg():
>
> if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> sa->load_avg = max_t(long, sa->load_avg - r, 0);
> sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> removed_load = 1;
> }
>
> turns into:
>
> ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
> ffffffff8108706b:       48 85 c0                test   %rax,%rax
> ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
> ffffffff81087070:       4c 89 f8                mov    %r15,%rax
> ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
> ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
> ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
> ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
> ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
> ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
>
> Which you'll note ends up with sa->load_avg -= r in memory at
> ffffffff8108707a.

So I _should_ have looked at other unserialized users of ->load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8974189222159154c55f24ddad33e3613960521a)

Conflicts:

kernel/sched/fair.c

    Orabug: 27026563

Signed-off-by: Gayatri Vasudevan <gayatri.vasudevan@oracle.com>
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>

rds: System panic if RDS netfilter is enabled and RDS/TCP is used

If a transport does not support netfilter, its inc_to_skb is NULL.
The code needs to make sure that it is not NULL before calling it.

Orabug: 26950401

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>