www.infradead.org Git - users/jedix/linux-maple.git/log

Revert "KVM: nVMX: Eliminate vmcs02 pool"

This reverts commit 22d8dc569898dec08b3c1fdc8a5b8b0e48ab8986.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: VMX: introduce alloc_loaded_vmcs"

This reverts commit 8dd66ca98dfc03f7921ce5bf5926ce2d95507d84.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: VMX: make MSR bitmaps per-VCPU"

This reverts commit b3bd557498bff191933d0481b609fdff0caa9ead.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: x86: pass host_initiated to functions that read MSRs"

This reverts commit fe09396e31ddc452c8ef2e0a5474c28d5a501e4e.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/x86: Add IBPB support"

This reverts commit e5455ef7dbbab7ee5bc901ffdc7666e61fc41e11.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL - reloaded"

This reverts commit eaaa119c0f931596b7da2bb70e33d8cd1fcea576.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL"

This reverts commit 53da198cf7a918baa7ac267af1d89b71bc117ce9.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "KVM: SVM: Add MSR-based feature support for serializing LFENCE"

This reverts commit 351ff7a273f566e1efed1be14f01fc9510ad9002.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD"

This reverts commit a1a02e26cd913a7cfb5086b3f3cb1561c04447b6.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "x86/bugs: Add AMD's SPEC_CTRL MSR usage"

This reverts commit 6ed384dcb12d6b7f2b8a448810e56206b9db4fb7.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR"

This reverts commit ceea9d71d84a402818f9b070d1e932a05fa46c8d.

Revert due to performance regression.

Orabug: 29542029

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs_64.c

arch: x86: remove unsued SET_IBPB from spec_ctrl.h

Remove unsued SET_IBPB macro from spec_ctrl.h

Orabug: 29336760

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86: cpu: microcode: fix late loading SpectreV2 bugs eval

On microcode reloading we have to update the status of SpectreV2 mitigations if
they were not present at init time: we run the logic of selecting the default
mitigation in auto mode.

It was not possible to use the same functions as most of the logic is using
boot_command_line which is in init data and dropped after booting. Also we had
to drop some of the __init clauses on some functions we use in order to not
duplicate them.

This patch is not addressing alternative instructions related to SpectreV2. The
only one that we found so far is STUFF_RSB macro, and it will be addressed
later.

Orabug: 29336760

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86: cpu: microcode: fix late loading SSBD and L1TF bugs eval

On microcode reloading we have to update the status of SpectreV2 mitigations if
they were not present at init time: we have opted for the default mitigation:
seccomp or prctl (so per process).

For L1TF we do not have to do anything as host mitigation does not depend on
any CPU bits and from hypervisor perspective we just call vmentry_l1d_flush_set
to re-assess the mitigation. vmentry_l1d_flush_ops is exposed through this structure:
static const struct kernel_param_ops vmentry_l1d_flush_ops = {
>-------.set = vmentry_l1d_flush_set,
>-------.get = vmentry_l1d_flush_get,
};

And can be set/get using this sysfs entry:
/sys/module/kvm_intel/parameters/vmentry_l1d_flush

It was not possible to use the same functions as most of the logic is using
boot_command_line which is in init data and dropped after booting.

Orabug: 29336760

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86: cpu: microcode: Re-evaluate bugs in a CPU after microcode loading

After doing a new microcode load, we have to re-evaluate bugs in the CPU in
order to see if the microcode brought changes. We clear caps for
X86_BUG_SPEC_STORE_BYPASS, X86_BUG_CPU_MELTDOWN and X86_BUG_L1TF as these can
be modified by new microcode loading.

Orabug: 29336760

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86: cpu: microcode: update flags for all cpus

After a new microcode is loaded, we need to update cpu features and flags for
all CPUs not just for CPU 0. As microcode_late_eval_cpuid is setting some of
the boot_cpu_data we are doing CPU 0 last.

Orabug: 29336760

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/apic: Make arch_setup_hwirq NUMA node aware

In a xen VM with vNUMA enabled, irq affinity for a device on node 1
may become stuck on CPU 0. /proc/irq/nnn/smp_affinity_list may show
affinity for all the CPUs on node 1, but this is wrong. All interrupts
are on the first CPU of node 0 which is usually CPU 0.

The problem is caused when __assign_irq_vector() is called by
arch_setup_hwirq() with a mask of all online CPUs, and then called later
with a mask including only the node 1 CPUs. The first call assigns affinity
to CPU 0, and the second tries to move affinity to the first online node 1
CPU. In the reported case this is always CPU 2. For some reason, the
CPU 0 affinity is never cleaned up, and all interrupts remain with CPU 0.
Since an incomplete move appears to be in progress, all attempts to
reassign affinity for the irq fail. Because of a quirk in how affinity is
displayed in /proc/irq/nnn/smp_affinity_list, changes may appear to work
temporarily.

It was not reproducible on baremetal on the machine I had available for
testing, but it is possible that it was observed on other machines. It
does not appear in UEK5. The APIC and IRQ code is very different in UEK5,
and the code changed here doesn't exist there. It is unknown whether KVM
guests might see the same problem with UEK4.

Making arch_setup_hwirq() NUMA sensitive eliminates the problem by
using the correct cpumask for the node for the initial assignment. The
second assignment becomes a noop. After initialization is complete,
affinity can be moved to any CPU on any node and back without a problem.

Orabug: 29292411
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: scsi_transport_iscsi: modify detected conn err to KERN_ERR

Fix for Orabug 29469714 is insufficient.  The goal of fix for
Orabug 29469714 is to print "detected conn error" message
produced by scsi_transport_iscsi module onto console which is
particularly useful when iscsi boot device is used and there
are connection issues.  The fix for Orabug 29469714 falls short
because currently, the default console log level in OL is 4
which means that only log levels less than 4 (KERN_WARNING) will
be printed to console.  This commit modifies the corresponding
printk log level to KERN_ERR (log level 3).

Orabug: 29487790
Signed-off-by: Fred Herard <fred.herard@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen/blkfront: avoid NULL blkfront_info dereference on device removal

If a block device is hot-added when we are out of grants,
gnttab_grant_foreign_access fails with -ENOSPC (log message "28
granting access to ring page") in this code path:

  talk_to_blkback ->
setup_blkring ->
xenbus_grant_ring ->
gnttab_grant_foreign_access

and the failing path in talk_to_blkback sets the driver_data to NULL:

destroy_blkring:
        blkif_free(info, 0);

        mutex_lock(&blkfront_mutex);
        free_info(info);
        mutex_unlock(&blkfront_mutex);

        dev_set_drvdata(&dev->dev, NULL);

This results in a NULL pointer BUG when blkfront_remove and blkif_free
try to access the failing device's NULL struct blkfront_info.

Cc: stable@vger.kernel.org # 4.5 and later
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Orabug: 29469740

(cherry picked from commit f92898e7f32e3533bfd95be174044bc349d416ca)

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Fix race conditions in .ndo_get_stats64().

.ndo_get_stats64() may not be protected by RTNL and can race with
.ndo_stop() or other ethtool operations that can free the statistics
memory. Fix it by setting a new flag BNXT_STATE_READ_STATS and then
proceeding to read statistics memory only if the state is OPEN. The
close path that frees the memory clears the OPEN state and then waits
for the BNXT_STATE_READ_STATS to clear before proceeding to free the
statistics memory.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9b76ebd49f97458857568918c305a17fa7c6567)
Signed-off-by: aru kolappan <aru.kolappan@oracle.com>
Orabug: 29129625

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: aru kolappan <aru.kolappan@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: always verify the magic number in xattr blocks

commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream.

If there an inode points to a block which is also some other type of
metadata block (such as a block allocation bitmap), the
buffer_verified flag can be set when it was validated as that other
metadata block type; however, it would make a really terrible external
attribute block. The reason why we use the verified flag is to avoid
constantly reverifying the block. However, it doesn't take much
overhead to make sure the magic number of the xattr block is correct,
and this will avoid potential crashes.

This addresses CVE-2018-10879.

https://bugzilla.kernel.org/show_bug.cgi?id=200001

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3150e8913b957d71398511a0580606e181153d10)

Orabug: 29437127
CVE: CVE-2018-10879

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/xattr.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: add corruption check in ext4_xattr_set_entry()

commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream.

In theory this should have been caught earlier when the xattr list was
verified, but in case it got missed, it's simple enough to add check
to make sure we don't overrun the xattr buffer.

This addresses CVE-2018-10879.

https://bugzilla.kernel.org/show_bug.cgi?id=200001

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0dc148230f3827f49b1e2a72d14cbad0c5c63e86)

Orabug: 29437127
CVE: CVE-2018-10879

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/xattr.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: add netif_is_lag_port helper

Orabug: 29495360

Some code does not mind if a device is bond slave or team port and treats
them the same, as generic LAG ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ba1414f310c83bf425fe26fa2cd5f1befcd6dc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: add netif_is_lag_master helper

Orabug: 29495360

Some code does not mind if the master is bond or team and treats them
the same, as generic LAG.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7be61833042e7757745345eedc7b0efee240c189)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: add netif_is_team_port helper

Orabug: 29495360

Similar to other helpers, caller can use this to find out if device is
team port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7f019ee6d117de5007d0b10e7960696bbf111eb)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: add netif_is_team_master helper

Orabug: 29495360

Similar to other helpers, caller can use this to find out if device is
team master.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c981e4213e9d2d4ec79501bd607722ec712742a2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/team/team.c
include/linux/netdevice.h

scsi: scsi_transport_iscsi: redirect conn error to console

This commit changes "detected conn error" printk
log level from KERN_INFO to KERN_WARNING. This change
is made with the assumption that KERN_WARNING messages
are configured to be redirected to console. It is
particularly useful to have detected connection errors
redirected to console when using iscsi boot device as it
may give clues as to why the system appears to be hung.

Orabug: 29469714

Signed-off-by: Fred Herard <fred.herard@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Revert x86/apic/x2apic: set affinity of a single interrupt to one cpu

The commit 092aa78c11f0
("x86/apic/x2apic: set affinity of a single interrupt to one cpu")
was causing performance regression on block storage server on X5.

On OCI X5 server, they were not binding irqs to CPUs 1:1,
irq to cpu affinity was set to multiple cpus
(/proc/$irq/smp_affinity: 00,003ffff0,0003ffff, cpu0-17 and 36-53).
This is not the default behavior of bnxt_en. From bnxt_en,
driver, when NIC link is up, it sets irq affinity, OCI assumed that
most of the bnxt_en interrupts will go to cpu3.

After the patch "x86/apic/x2apic:
set affinity of a single interrupt to one cpu",
if we set irq to cpu 1:1, it works fine, but if we set irq affinity
to multiple cpus, it only sets irq_cfg->domain/cpumask to the first
online cpu which is on the cpu affinity list. With the current setting
which caused the perf issue, although /proc/$irq/smp_affinity is set
to multiple cpus, irq_cfg->domain cpumask only has cpu 0, this lead all
ens4f0-TxRx interrupts to route to cpu0, also iscsi target application
was being run on CPU0 during the testing which led to the performace issue.
The issue is no longer seen after the patch was reverted.

Orabug: 29449976
Signed-off-by: Mridula Shastry <mridula.c.shastry@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

jiffies: use jiffies64_to_nsecs() to fix 100% steal usage for xen vcpu hotplug

[ Not relevant upstream, therefore no upstream commit. ]

To fix, use jiffies64_to_nsecs() directly instead of deriving the result
according to jiffies_to_usecs().

As the return type of jiffies_to_usecs() is 'unsigned int', when the return
value is more than the size of 'unsigned int', the leading 32 bits would be
discarded.

Suppose USEC_PER_SEC=1000000L and HZ=1000, below are the expected and
actual incorrect result of jiffies_to_usecs(0x7770ef70):

- expected  : jiffies_to_usecs(0x7770ef70) = 0x000001d291274d80
- incorrect : jiffies_to_usecs(0x7770ef70) = 0x0000000091274d80

The leading 0x000001d200000000 is discarded.

After xen vcpu hotplug and when the new vcpu steal clock is calculated for
the first time, the result of this_rq()->prev_steal_time in
steal_account_process_tick() would be far smaller than the expected
value, due to that jiffies_to_usecs() discards the leading 32 bits.

As a result, the diff between current steal and this_rq()->prev_steal_time
is always very large. Steal usage would become 100% when the initial steal
clock obtained from xen hypervisor is very large during xen vcpu hotplug,
that is, when the guest is already up for a long time.

The bug can be detected by doing the following:

* Boot xen guest with vcpus=2 and maxvcpus=4
* Leave the guest running for a month so that the initial steal clock for
  the new vcpu would be very large
* Hotplug 2 extra vcpus
* The steal time of new vcpus in /proc/stat would increase abnormally and
  sometimes steal usage in top can become 100%

This was incidentally fixed in the patch set starting by
commit 93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY")
and ended with
commit b672592f0221 ("sched/cputime: Remove generic asm headers").

Orabug: 28806208

Link: https://lkml.org/lkml/2019/2/28/1373
Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

time: Introduce jiffies64_to_nsecs()

This will be needed for the cputime_t to nsec conversion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 07e5f5e353aaa61696c8353d87050994a0c4648a)

Orabug: 28806208

This backport makes jiffies64_to_nsecs() available for the next patch.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net_failover: delay taking over primary device to accommodate udevd renaming

There is an inherent race with udev rename in userspace due to the exposure
of two lower slave devices while kernel attempts to manage the creation
for failover bonding itself automatically. The existing userspace naming
logic in udevd was not specifically written for this in-kernel
automagic.

The clean fix for the problem is either to update the udevd to not try
rename the 3-netdev (ideally rename the device in a coordinated manner),
or to fix the kernel to hide the 2 lower devices which does not have to
be shown to userspace unless needed (1-netdev model).

However, our pursuance of 1-netdev model had not been acknowledged by
upstream, and there's no motivation in the systemd/udevd community at
this point to refactor the rename logic and make it work well with
3-netdev.

Hyper-V's netvsc mitigated this by postponing the VF's dev_open() to
allow a userspace thread to rename the device within a 100ms worth of
window. For the interim, we follow the same as done by netvsc to avoid
the renaming failure, until we move to the point where a clean solution
is available in upstream.

OraBug: 29281273

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Vijay Balakrishna <vijay.balakrishna@oracle.com>
Tested-by: Vijay Balakrishna <vijay.balakrishna@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

fs/dcache.c: add cond_resched() in shrink_dentry_list()

Orabug: 29412146

As previously reported (https://patchwork.kernel.org/patch/8642031/)
it's possible to call shrink_dentry_list with a large number of dentries
(> 10000).  This, in turn, could trigger the softlockup detector and
possibly trigger a panic.  In addition to the unmount path being
vulnerable to this scenario, at SuSE we've observed similar situation
happening during process exit on processes that touch a lot of dentries.
Here is an excerpt from a crash dump.  The number after the colon are
the number of dentries on the list passed to shrink_dentry_list:

PID 99760: 10722
PID 107530: 215
PID 108809: 24134
PID 108877: 21331
PID 141708: 16487

So we want to kill between 15k-25k dentries without yielding.

And one possible call stack looks like:

4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8
5 [ffff8839ece41db0] evict at ffffffff811c3026
6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258
7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593
8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830
9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61
10 [ffff8839ece41ec0] release_task at ffffffff81059ebd
11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce
12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53
13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909

While some of the callers of shrink_dentry_list do use cond_resched,
this is not sufficient to prevent softlockups.  So just move
cond_resched into shrink_dentry_list from its callers.

David said: I've found hundreds of occurrences of warnings that we emit
when need_resched stays set for a prolonged period of time with the
stack trace that is included in the change log.

Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 32785c0539b7e96f77a14a4f4ab225712665a5a4)
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Conflicts:
fs/dcache.c
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

NFS: commit direct writes even if they fail partially

If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.

We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion(). The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit 1b8d97b0a837beaf48a8449955b52c650a7114b4)

Orabug: 28212440
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: update correct congestion map for loopback transport

Loopback transport delivers data directly to the destination
socket since destination is always local to the machine.
The connection data structure passed to an internal
initializer API rds_inc_init() is from send side (unlike the
usual call to APIs from receive path when receive side
connection data structure is passed).

This inconsistency causes an update of the incorrect
congestion map when marking destination port congested when
loopback transport is used (which is when one or both end(s)
of a RDS connection has an IP loopback address).

The fix it to ensure correct map is updated, that of the
destination IP regardless of delivery coming from send side
of loopback transport or receive side of other transports.

Orabug: 29175685

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: only look at the bg_flags field if it is valid

The bg_flags field in the block group descripts is only valid if the
uninit_bg or metadata_csum feature is enabled. We were not
consistently looking at this field; fix this.

Also block group #0 must never have uninitialized allocation bitmaps,
or need to be zeroed, since that's where the root inode, and other
special inodes are set up. Check for these conditions and mark the
file system as corrupted if they are detected.

This addresses CVE-2018-10876.

https://bugzilla.kernel.org/show_bug.cgi?id=199403

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 8844618d8aa7a9973e7b527d038a2a589665002c)

Orabug: 29316684
CVE: CVE-2018-10876.

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/balloc.c
fs/ext4/ialloc.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

uek-rpm: Add kernel-uek version to kernel-ueknano provides

Orabug: 29357643

kernel-ueknano package provides kernel-uek with no version in it. So it
can match any kernel-uek version installed. This commit adds the version
that the rpm is built for in kernel-uek provides also.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: Set sk_prot_creator when cloning sockets to the right proto

sk->sk_prot and sk->sk_prot_creator can differ when the app uses
IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
Which is why sk_prot_creator is there to make sure that sk_prot_free()
does the kmem_cache_free() on the right kmem_cache slab.

Now, if such a socket gets transformed back to a listening socket (using
connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
sk_clone_lock() when a new connection comes in. But sk_prot_creator will
still point to the IPv6 kmem_cache (as everything got copied in
sk_clone_lock()). When freeing, we will thus put this
memory back into the IPv6 kmem_cache although it was allocated in the
IPv4 cache. I have seen memory corruption happening because of this.

With slub-debugging and MEMCG_KMEM enabled this gives the warning
"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"

A C-program to trigger this:

void main(void)
{
        int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
        int new_fd, newest_fd, client_fd;
        struct sockaddr_in6 bind_addr;
        struct sockaddr_in bind_addr4, client_addr1, client_addr2;
        struct sockaddr unsp;
        int val;

        memset(&bind_addr, 0, sizeof(bind_addr));
        bind_addr.sin6_family = AF_INET6;
        bind_addr.sin6_port = ntohs(42424);

        memset(&client_addr1, 0, sizeof(client_addr1));
        client_addr1.sin_family = AF_INET;
        client_addr1.sin_port = ntohs(42424);
        client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");

        memset(&client_addr2, 0, sizeof(client_addr2));
        client_addr2.sin_family = AF_INET;
        client_addr2.sin_port = ntohs(42421);
        client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");

        memset(&unsp, 0, sizeof(unsp));
        unsp.sa_family = AF_UNSPEC;

        bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));

        listen(fd, 5);

        client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
        new_fd = accept(fd, NULL, NULL);
        close(fd);

        val = AF_INET;
        setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));

        connect(new_fd, &unsp, sizeof(unsp));

        memset(&bind_addr4, 0, sizeof(bind_addr4));
        bind_addr4.sin_family = AF_INET;
        bind_addr4.sin_port = ntohs(42421);
        bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));

        listen(new_fd, 5);

        client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));

        newest_fd = accept(new_fd, NULL, NULL);
        close(new_fd);

        close(client_fd);
        close(new_fd);
}

As far as I can see, this bug has been there since the beginning of the
git-days.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9d538fa60bad4f7b23193c89e843797a1cf71ef3)

Orabug: 29422739
CVE: CVE-2018-9568

Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: always check block group bounds in ext4_init_block_bitmap()

commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream.

Regardless of whether the flex_bg feature is set, we should always
check to make sure the bits we are setting in the block bitmap are
within the block group bounds.

https://bugzilla.kernel.org/show_bug.cgi?id=199865

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ac48bb9bc0a32f5a4432be1645b57607f8c46aa7)

Orabug: 29428607
CVE: CVE-2018-10878

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: make sure bitmaps and the inode table don't overlap with bg descriptors

commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream.

It's really bad when the allocation bitmaps and the inode table
overlap with the block group descriptors, since it causes random
corruption of the bg descriptors. So we really want to head those off
at the pass.

https://bugzilla.kernel.org/show_bug.cgi?id=199865

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ac93c718365ac6ea9d7631641c8dec867d623491)

Orabug: 29428607
CVE: CVE-2018-10878

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

Add an sb_rdonly() function to query the MS_RDONLY flag on sb->s_flags
preparatory to providing an SB_RDONLY flag.

Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from commit 94e92e7ac90d06e1e839e112d3ae80b2457dbdd7)

Orabug: 29428607
CVE: CVE-2018-10878

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

iscsi: Capture iscsi debug messages using tracepoints

This commit enhances iscsi initiator modules to capture iscsi debug messages
using linux kernel tracepoint facility:

https://www.kernel.org/doc/Documentation/trace/tracepoints.txt

The following tracepoint events have been created under the iscsi tracepoint
event group:

iscsi_dbg_conn - to capture connection debug messages (libiscsi module)
iscsi_dbg_session - to capture session debug messages (libiscsi module)
iscsi_dbg_eh - to capture error handling debug messages (libiscsi module)
iscsi_dbg_tcp - to capture iscsi tcp debug messages (libiscsi_tcp module)
iscsi_dbg_sw_tcp - to capture iscsi sw tcp debug messages (iscsi_tcp module)
iscsi_dbg_trans_session - to cpature iscsi trasnsport sess debug messages
(scsi_transport_iscsi module)
iscsi_dbg_trans_conn - to capture iscsi tansport conn debug messages
(scsi_transport_iscsi module)

Orabug: 29429855

Signed-off-by: Fred Herard <fred.herard@oracle.com>
Reviewed-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

KEYS: add missing permission check for request_key() destination

When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring.  This should require Write permission to the keyring.  However,
there is actually no permission check.

This can be abused to add keys to any keyring to which only Search
permission is granted.  This is because Search permission allows joining
the keyring.  keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.

Both negatively and positively instantiated keys can be added using this
method.  Adding negative keys is trivial.  Adding a positive key is a
bit trickier.  It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().

Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.

We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key().  Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.

We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring.  (I don't know of any users
who actually do that, though...)

Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key")
Cc: <stable@vger.kernel.org> # v2.6.13+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from commit 4dca6ea1d9432052afb06baf2e3ae78188a4410b)

Orabug: 29304551
CVE: CVE-2017-17807

Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
security/keys/request_key.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

KEYS: Don't permit request_key() to construct a new keyring

If request_key() is used to find a keyring, only do the search part - don't
do the construction part if the keyring was not found by the search. We
don't really want keyrings in the negative instantiated state since the
rejected/negative instantiation error value in the payload is unioned with
keyring metadata.

Now the kernel gives an error:

request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)

Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from commit 911b79cde95c7da0ec02f48105358a36636b7a71)

Orabug: 29304551
CVE: CVE-2017-17807

Reviewed-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mlx4_ib: Distribute completion vectors when zero is supplied

MAD packet sending/receiving is not properly virtualized in
CX-3. Hence, these are proxied through the PF driver. The proxying
uses UD QPs. The associated CQs are created with completion vector
zero, in anticipation that zero will return the least-used vector, as
per commit 6ba1eb776461 ("IB/mlx4: Scatter CQs to different EQs").

However, this does not happen, and we see that only the first EQ is
used for these proxy QPs.

This leads to great imbalance in CPU processing, in particular during
fail-over and fail-back, when a large number of RDMA CM
requests/responses are proxied through the PF driver.

Orabug: 29318191

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Fix TX timeout during netpoll.

Orabug: 29357977

The current netpoll implementation in the bnxt_en driver has problems
that may miss TX completion events. bnxt_poll_work() in effect is
only handling at most 1 TX packet before exiting. In addition,
there may be in flight TX completions that ->poll() may miss even
after we fix bnxt_poll_work() to handle all visible TX completions.
netpoll may not call ->poll() again and HW may not generate IRQ
because the driver does not ARM the IRQ when the budget (0 for netpoll)
is reached.

We fix it by handling all TX completions and to always ARM the IRQ
when we exit ->poll() with 0 budget.

Also, the logic to ACK the completion ring in case it is almost filled
with TX completions need to be adjusted to take care of the 0 budget
case, as discussed with Eric Dumazet <edumazet@google.com>

Reported-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 73f21c653f930f438d53eed29b5e4c65c8a0f906)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Fix for system hang if request_irq fails

Orabug: 29357977

Fix bug in the error code path when bnxt_request_irq() returns failure.
bnxt_disable_napi() should not be called in this error path because
NAPI has not been enabled yet.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c58387ab1614f6d7fb9e244f214b61e7631421fc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Fix firmware message delay loop regression.

Orabug: 29357977

A recent change to reduce delay granularity waiting for firmware
reponse has caused a regression.  With a tighter delay loop,
the driver may see the beginning part of the response faster.
The original 5 usec delay to wait for the rest of the message
is not long enough and some messages are detected as invalid.

Increase the maximum wait time from 5 usec to 20 usec.  Also, fix
the debug message that shows the total delay time for the response
when the message times out.  With the new logic, the delay time
is not fixed per iteration of the loop, so we define a macro to
show the total delay time.

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cc559c1ac250a6025bd4a9528e424b8da250655b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: reduce timeout on initial HWRM calls

Orabug: 29357977

Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected. More detailed analysis indicated that the
default 25us delay was higher than necessary. Reducing the time spend in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9751e8e714872aa650b030e52a9fafbb694a3714)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c

bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().

Orabug: 29357977

When open fails during ethtool -L ring change, for example, the driver
may crash at bnxt_free_irq() because bp->bnapi is NULL.

If we fail to allocate all the new rings, bnxt_open_nic() will free
all the memory including bp->bnapi. Subsequent call to bnxt_close_nic()
will try to dereference bp->bnapi in bnxt_free_irq().

Fix it by checking for !bp->bnapi in bnxt_free_irq().

Fixes: e5811b8c09df ("bnxt_en: Add IRQ remapping logic.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb98526bf9b985866d648dbb9c983ba9eb59daba)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().

Orabug: 29357977

During initialization, if we encounter errors, there is a code path that
calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a
warning in firmware logs.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c4fe80b32c685bdc02b280814d0cfe80d441c72)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Do not modify max IRQ count after RDMA driver requests/frees IRQs.

Orabug: 29357977

Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
freed by the RDMA driver is flawed.  The max IRQ count is checked when
re-initializing the IRQ vectors and this can happen multiple times
during ifup or ethtool -L.  If the max IRQ is reduced and the RDMA
driver is operational, we may not initailize IRQs correctly.  This
problem shows up on VFs with very small number of MSIX.

There is no other logic that relies on the IRQ count excluding the ones
used by RDMA.  So we fix it by just removing the call to subtract or
add the IRQs used by RDMA.

Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 30f529473ec962102e8bcd33a6a04f1e1b490ae2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c

mm: cleancache: fix corruption on missed inode invalidation

commit 6ff38bd40230af35e446239396e5fc8ebd6a5248 upstream.

If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:

__delete_from_page_cache
  (no shadow case)
  unaccount_page_cache_page
    cleancache_put_page
  page_cache_delete
    mapping->nrpages -= nr
    (nrpages becomes 0)

We don't clean the cleancache for an inode after final file truncation
(removal).

truncate_inode_pages_final
  check (nrpages || nrexceptional) is false
    no truncate_inode_pages
      no cleancache_invalidate_inode(mapping)

These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.

Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.

[akpm@linux-foundation.org: add comment, per Jan]
Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com
Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29364670
CVE: CVE-2018-16862

Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

l2tp: fix reading optional fields of L2TPv3

Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.

It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.

Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.

Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.

Changes in v2:
1. Only fix L2TPv3 to make code simple.
   To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
   It's complicated to do so.
2. Reloading pointers after pskb_may_pull

Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4522a70db7aa5e77526a4079628578599821b193)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/l2tp/l2tp_core.c
net/l2tp/l2tp_ip.c
net/l2tp/l2tp_ip6.c
Commit 2b139e6b1ec8 ("l2tp: remove ->recv_payload_hook") is not in UEK5.

l2tp_core.h:
    s/l2tp_get_l2specific_len(session)/session->l2specific_len/ due to
    62e7b6a57c7b ("l2tp: remove l2specific_len dependency in l2tp_core")
    is not in UEK4.

Orabug: 29368048

Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/packet: fix a race in packet_bind() and packet_notifier()

[ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ]

syzbot reported crashes [1] and provided a C repro easing bug hunting.

When/if packet_do_bind() calls __unregister_prot_hook() and releases
po->bind_lock, another thread can run packet_notifier() and process an
NETDEV_UP event.

This calls register_prot_hook() and hooks again the socket right before
first thread is able to grab again po->bind_lock.

Fixes this issue by temporarily setting po->num to 0, as suggested by
David Miller.

[1]
dev_remove_pack: ffff8801bf16fa80 not found
------------[ cut here ]------------
kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
device syz0 entered promiscuous mode
CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cc57a500 task.stack: ffff8801cc588000
RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
device syz0 entered promiscuous mode
RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
tun_detach drivers/net/tun.c:670 [inline]
tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
__fput+0x333/0x7f0 fs/file_table.c:210
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x199/0x270 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x9bb/0x1ae0 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:968
SYSC_exit_group kernel/exit.c:979 [inline]
SyS_exit_group+0x1d/0x20 kernel/exit.c:977
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x44ad19

Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
(cherry picked from commit 72b01bee76591d3741a87a689bf210d729d530f8)

Orabug: 29385593
CVE: CVE-2018-18559

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

ext4: verify the depth of extent tree in ext4_find_extent()

commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream.

If there is a corupted file system where the claimed depth of the
extent tree is -1, this can cause a massive buffer overrun leading to
sadness.

This addresses CVE-2018-10877.

https://bugzilla.kernel.org/show_bug.cgi?id=199417

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d69a9df614fc68741efcb0fcc020f05caa99d668)

Orabug: 29396712
CVE:CVE-2018-10877

Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

blk-mq: Do not invoke .queue_rq() for a stopped queue

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.

Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 28766011

commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
  - There are so many commits between the most recent uek4 block commit and
    this upstream commit, and blk_mq_make_request() in upstream commit is
    different with uek4. blk_mq_direct_issue_request() is not available.
  - The 3rd argument of blk_mq_insert_request() is set to false in this
    backport because there is no need to run the queue again when it is
    already stopped.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

uek-rpm: use multi-threaded xz compression for rpms

By default kernel rpms use xz compression for all rpms files and
gzip compression for src.rpms. This commit changes compression type to xz
for all rpms produced, and enables multi-threaded compression for rpms. It
allows to use as many threads as there are CPU cores available.

Orabug: 29323635

Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

uek-rpm: optimize find-requires usage

There are no deps found for debuginfo-common, doc and headers subpackage, so
Autoreq: no is now set for them, also switch from /usr/lib/rpm/redhat/find-requires
usage to /usr/lib/rpm/rpmdeps --requires, since latter is faster and less buggy.
Both changes noticeably boost kernel rpm build time.

Orabug: 29323635

Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

find-debuginfo.sh: backport parallel files procession

Use bundled find-debuginfo.sh instead of copying one during the build

rpm upstream commit 038bfe01796f751001e02de41c5d8678f511f366

find-debuginfo.sh: Split directory traversal and debuginfo extraction

This siplifies the handling of hardlinks a bit and allows a later patch
to parallelize the debuginfo extraction.

Signed-off-by: Michal Marek <mmarek@suse.com>
rpm upstream commit 1b338aa84d4c67fefa957352a028eaca1a45d1f6

find-debuginfo.sh: Process files in parallel

Add a -j <n> option, which, when used, will spawn <n> processes to do the
debuginfo extraction in parallel. A pipe is used to dispatch the files among
the processes.

Signed-off-by: Michal Marek <mmarek@suse.com>
Orabug: 29323635

Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

KVM: SVM: Add MSR-based feature support for serializing LFENCE

In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked.  This patch will add support to allow a guest to
properly make this determination.

Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features.  If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see.  Support is also added to write (hypervisor only)
and read the MSR value for the guest.  A write by the guest will result in
a #GP.  A read by the guest will return the value as set by the host.  In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit d1d93fa90f1afa926cb060b7f78ab01a65705b4d)

Orabug: 29335274

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kvm/svm.c
arch/x86/kvm/x86.c
Contextual

Signed-off-by: Brian Maly <brian.maly@oracle.com>

Enable RANDOMIZE_BASE

UEK4 needs at least some degree of KASLR; to that end enable
RANDOMIZE_BASE and set RANDOMIZE_BASE_MAX_OFFSET to its default value
(0x40000000).

Orabug: 29305587

Signed-off-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

slub: make ->cpu_partial unsigned

/*
* cpu_partial determined the maximum number of objects
* kept in the per cpu partial lists of a processor.
*/

Can't be negative.

Orabug: 28620592

We can't reproduce the issue, this patch is expected to help in theory.

Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

dtrace: support kernels built with RANDOMIZE_BASE

SDT probe addresses were being generated as absolute addresses which
breaks when the kernel may get relocated to a place other than the
default load address.

The solution is to generate the probe locations as an offset relative to
the _stext symbol in the .tmp_sdtinfo.S source file (generated at build
time), so that the actual addresses are processed as relocations when the
kernel boots.

This fix also optimizes the SDT info data (function and probe names) by
using de-duplication since especially with perf probes) many of those
strings are non-unique.

Orabug: 29204005
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Tested-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR

On AMD, the presence of the MSR_SPEC_CTRL feature does not imply that the
SSBD mitigation support should use the SPEC_CTRL MSR. Other features could
have caused the MSR_SPEC_CTRL feature to be set, while a different SSBD
mitigation option is in place.

Update the SSBD support to check for the actual SSBD features that will
use the SPEC_CTRL MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6ac2f49edb1e ("x86/bugs: Add AMD's SPEC_CTRL MSR usage")
Link: http://lkml.kernel.org/r/20180702213602.29202.33151.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 612bc3b3d4be749f73a513a17d9b3ee1330d3487)

Orabug: 28870524
CVE: CVE-2018-3639

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
Different filename (bugs_64.c) and different context

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/bugs: Add AMD's SPEC_CTRL MSR usage

The AMD document outlining the SSBD handling
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
mentions that if CPUID 8000_0008.EBX[24] is set we should be using
the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f)
for speculative store bypass disable.

This in effect means we should clear the X86_FEATURE_VIRT_SSBD
flag so that we would prefer the SPEC_CTRL MSR.

See the document titled:
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf

A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199889

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: kvm@vger.kernel.org
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: andrew.cooper3@citrix.com
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
(cherry picked from commit 6ac2f49edb1ef5446089c7c660017732886d62d6)

Orabug: 28870524
CVE: CVE-2018-3639

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
arch/x86/kvm/cpuid.c
arch/x86/kvm/svm.c
The conflicts were due to different filenames (cpufeature.h, bugs_64.c) and different context.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD

The commit 52817587e706 ('x86/cpufeatures: Disentangle SSBD enumeration') from
upstream disentangles SSBD enumeration. We did not backport that commit because
we did not have what to disentangle on UEK4. Our cpufeature was already
synthetic.

That commit also renames X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD. We
need this rename in order to not have conflicting cpu features while
backporting commit 6ac2f49edb1e ('x86/bugs: Add AMD's SPEC_CTRL MSR usage')
from upstream which introduces SPEC_CTRL MSR, which will be the prefered
method.

Orabug: 28870524
CVE: CVE-2018-3639

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Make file credentials available to the seqfile interfaces

A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.

The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.

So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones.  Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.

It turns out that seq_file _does_ save away the user_ns information of
the file, though.  Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92)

Orabug: 29114879
CVE: CVE-2018-17972

Conflict:  Refactored include/linux/seq_file.h to include __GENKSYM__
and UEK_KABI_REPLACE() to pass check_kabi test.

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

proc: restrict kernel stack dumps to root

Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU.  That means that
the stack can change under the stack walker.  The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers.  This can cause exposure of
kernel stack contents to userspace.

Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents.  See the added comment for a longer
rationale.

There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails.  Therefore, I believe
that this change is unlikely to break things.  In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.

Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f50 ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7)

Orabug: 29114879
CVE: CVE-2018-17972

Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/speculation: Clean up retpoline code in bugs.c

Now that the minimal retpoline modes are removed, also remove
unnecessary checks to simplify retpoline code.

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE

Commit

4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")

replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
remaining pieces.

[ bp: Massage commit message. ]

Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-kbuild@vger.kernel.org
Cc: srinivas.eeda@oracle.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
(cherry picked from commit e4f358916d528d479c3c12bd2fd03f2d5a576380)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
include/linux/compiler-gcc.h
include/linux/module.h
UEK4 either implements the changes in different files
or it does not have the patches that introduce the
lines changed by this cherry-picked commit.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/build: Fix compiler support check for CONFIG_RETPOLINE

It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.

Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.

The easiest fix is to move this check to the "archprepare" like
this commit did:

829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com
Link: https://lkml.org/lkml/2018/12/4/206
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 25896d073d8a0403b07e6dec56f58e6c33678207)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Makefile
The archprepare rule is different in UEK and upstream makefiles

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/retpoline: Remove minimal retpoline support

Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
reason to keep the minimal retpoline support around which only provided
basic protection in the assembly files.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@default
(cherry picked from commit ef014aae8f1cd2793e4e014bbb102bed53f852b7)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support

Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on the compiler capability.

Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
support it. Emit an error message in that case:

"arch/x86/Makefile:226: *** You are building kernel with non-retpoline
compiler, please update your compiler.. Stop."

[dwmw: Fail the build with non-retpoline compiler]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@default
(cherry picked from commit 4cd24de3a0980bf3100c9dcb08ef65ca7c31af48)

Orabug: 29211617

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Kconfig
arch/x86/include/asm/nospec-branch.h
Minor differences between UEK and upstream.
arch/x86/Makefile
Need to add line defining RETPOLINE_CFLAGS.
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c
scripts/Makefile.build
Commit e699314 (objtool: Add retpoline validation) has not
been ported to UEK4, nothing to change.

Signed-off-by: Brian Maly <brian.maly@oracle.com>

nl80211: check for the required netlink attributes presence

nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.

This fixes CVE-2017-12153.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046
Fixes: e5497d766ad ("cfg80211/nl80211: support GTK rekey offload")
Cc: <stable@vger.kernel.org> # v3.1-rc1
Reported-by: bo Zhang <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit e785fa0a164aa11001cba931367c7f94ffaff888)

Orabug: 29245533
CVE: CVE-2017-12153

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: lpfc: Fix PT2PT PRLI reject (reapply patch)

[backport of 114e80db15039e248eb4e458559cef57737930a8]
From: rkennedy <dick.kennedy@avagotech.com>

Orabug: 29281346

lpfc cannot establish connection with targets that send PRLI in P2P
configurations.

If lpfc rejects a PRLI that is sent from a target the target will not
resend and will reject the PRLI send from the initiator.

[tv: original mistakenly applied in reverse, because change was already
present in the code at that point; this reapplies forwards]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Fred Herard <fred.herard@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

rds: congestion updates can be missed when kernel low on memory

The congestion updates are allocated under GFP_NOWAIT and can
fail under temporary memory pressure. These are not retried and
the update here retries them until sent.

On receiving congestion updates, corrupt packet check failures
are not logged as warnings.

Orabug: 28425811

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: ib: Fix endless RNR Retries caused by memory allocation failures

Temporary memory allocation failures may cause an RDS connection
to be stuck in an endless RNR (receiver Not Ready). Right around the
time the RDS connection becomes stuck, it reports these recv buffer
allocation failures:

rcuos/10: page allocation failure: order:2, mode:0x2
Call Trace:
<IRQ> [<ffffffff81698cc0>] dump_stack+0x63/0x83
[<ffffffff8118e59a>] warn_alloc_failed+0xea/0x140
[<ffffffff810b93fa>] ? select_idle_sibling+0x2a/0x120
[<ffffffff81191e09>] __alloc_pages_slowpath+0x409/0x760
[<ffffffff81192411>] __alloc_pages_nodemask+0x2b1/0x2d0
[<ffffffff810bae62>] ? check_preempt_wakeup+0x112/0x230
[<ffffffff811dc3af>] alloc_pages_current+0xaf/0x170
[<ffffffffa12e2090>] rds_page_remainder_alloc+0x60/0x2a4
[<ffffffffa0b1c0ac>] rds_ib_refill_one_frag+0x13c/0x200 [rds_rdma]
[<ffffffffa12994cd>] rds_ib_recv_refill_one+0x8d/0x220
[<ffffffffa0b1dfbf>] rds_ib_recv_refill+0x11f/0x340 [rds_rdma]
[<ffffffffa129989e>] rds_ib_recv_cqe_handler+0x23e/0x290
[<ffffffffa0b19326>] poll_cq+0x66/0xe0 [rds_rdma]
[<ffffffffa0b1945d>] rds_ib_rx+0xbd/0x210 [rds_rdma]
[<ffffffffa0b1964a>] rds_ib_tasklet_fn_recv+0x3a/0x50 [rds_rdma]
[<ffffffff81088361>] tasklet_action+0xb1/0xc0
[<ffffffff8108871a>] __do_softirq+0x10a/0x350
[<ffffffff8169f53c>] do_softirq_own_stack+0x1c/0x30
<EOI> [<ffffffff81088445>] do_softirq+0x55/0x60
[<ffffffff81088528>] __local_bh_enable_ip+0x88/0x90
[<ffffffff810e86d1>] rcu_nocb_kthread+0xf1/0x180
[<ffffffff810e85e0>] ? print_cpu_stall+0x170/0x170
[<ffffffff810e85e0>] ? print_cpu_stall+0x170/0x170
[<ffffffff810a465e>] kthread+0xce/0xf0
[<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8169dda2>] ret_from_fork+0x42/0x70
[<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70

We re-schedule recv buffer refiller on satisfying these conditions:

if (rds_conn_up(conn) &&
   (must_wake || (can_wait && ring_low)
              || rds_ib_ring_empty(&ic->i_recv_ring))) {
   queue_delayed_work(conn->c_wq, &conn->c_recv_w, 1);
}

This currently doesn't take into account memory allocation failures.

A bit later the memory pressure clears away.
But RDS does not refill receive buffers for that connection any more.
This is because the receiver is only woken up on the last packet of a
multi-packet message. But the last packet is never received, because the
recv queue becomes empty and we end up in the endless RNR Retry situation.

Orabug: 28127993

Consultation with: Haakon Bugge

Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net: rds: fix excess initialization of the recv SGEs

In rds_ib_recv_init_ring(), an excess array element is incorrectly
initialized. This is not an OOB situation, as the sge array is
initialized to eight entries. With a fragment size of a maximum of 16KiB
and a page size of minimum 4KiB, then num_send_sge can at most become
five.

Orabug: 29004503

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xhci: fix usb2 resume timing and races.

According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.

On device initiated resume, a port status interrupt with the port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.

On host initiated resume, current code sets the port to resume state,
sleep 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.

There are a few issues with this approach

1. A host initiated resume will also generate a resume event. The event
   handler will find the port in resume state, believe it's a device
   initiated resume, and act accordingly.

2. A port status request might cut the resume signalling short if a
   get_port_status request is handled during the host resume signalling.
   The port will be found in resume state. The timestamp is not set leading
   to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
   get_port_status will proceed with moving the port to U0.

3. If an error, or anything else happens to the port during device
   initiated resume signalling it will leave all the device resume
   parameters hanging uncleared, preventing further suspend, returning
   -EBUSY, and cause the pm thread to busyloop trying to enter suspend.

Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if the resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if port is not in U0 or Resume state

This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS

Cc: stable <stable@vger.kernel.org>
Reported-by: Daniel J Blueman <daniel@quora.org>
Tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29028940

(cherry picked from commit f69115fdbc1ac0718e7d19ad3caa3da2ecfe1c96)
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xhci: Fix a race in usb2 LPM resume, blocking U3 for usb2 devices

Clear device initiated resume variables once device is fully up and running
in U0 state.

Resume needs to be signaled for 20ms for usb2 devices before they can be
moved to U0 state.

An interrupt is triggered if a device initiates resume. As we handle the
event in interrupt context we can not sleep for 20ms, so we instead set
a resume flag, a timestamp, and start the roothub polling.

The roothub code will later move the port to U0 when it finds a port in
resume state with the resume flag set, and timestamp passed by 20ms.

A host initiated resume is however not done in interrupt context, and
host initiated resume code will directly signal resume, wait 20ms and then
move the port to U0.

These two codepaths can race, if we are in the middle of a host initated
resume, while sleeping for 20ms, we may handle a port event and find the
port in resume state. The port event handling code will assume the resume
was device initiated and set the resume flag and timestamp.

Root hub code will however not catch the port in resume state again as the
host initated resume code has already moved the port to U0.
The resume flag and timestamp will remain set for this port preventing port
from suspending again (LPM setting port to U3)

Fix this for now by always clearing the device initated resume parameters
once port is in U0

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29028940

(cherry picked from commit dad67d5f3d0efe01d38c6cebcb6698280e51927b)
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/usb/host/xhci-hub.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger an harmless false positive WARN_ON. Check the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.

Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397

commit 01e881f5a1fca4677e82733061868c6d6ea05ca7 upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/userfaultfd.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas

After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration. This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397

commit 29ec90660d68bbdd69507c1c8b4e33aa299278b1 upstream

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/userfaultfd.c
mm/userfaultfd.c

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

x86/apic/x2apic: set affinity of a single interrupt to one cpu

Customer want to offline the cpus to 2 per node. And finally, a
lpfc HBA cannot work any more due to no available
irq vectors.
[   51.031812] IRQ 284 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031817] IRQ 285 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031822] IRQ 286 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.
[   51.031827] IRQ 287 set affinity failed because there are no available vectors.  The device assigned to this IRQ is unstable.

It was due to cluster_vector_allocation_domain which want to set
interrupt affinity of a single interrupt to multiple CPUs and need
a same irq vector to be available on multiple cpus. This is difficult
for customer's case where there are a lot of HBAs on node 0 and only
2 or 4 cpus online there.

And actually, this feature has been discarded by the upstream.
https://lkml.org/lkml/2017/9/13/576
We close this feature by just set one cpu in retmask in
cluster_vector_allocation_domain.

Customer that encountered this issue used RHCK, since UEK4 also
has the same code, post a same patch for UEK4

Orabug: 29196396

Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang.oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen/blkback: rework validate_io_op()

Rework many if statements in validate_io_op() into a switch statement.

Orabug: 29199843

Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen/blkback: optimize validate_io_op() to filter BLKIF_OP_RESERVED_1 operation

Instead of hardcoding operation = 4, BLKIF_OP_RESERVED_1 = 4 is defined in
the header file.

Orabug: 29199843

Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen/blkback: do not BUG() for invalid blkif_request from frontend

Upstream commit 0e367ae46503 ("xen/blkback: correctly respond to unknown,
non-native requests") fixed a bug to correctly respond to unknown,
non-native requests, e.g., BLKIF_OP_RESERVED_1 or BLKIF_OP_PACKET for
64-bit SLES 11 guests when using a 32-bit backend.

Although such fix is already in uek4, it is broken by commit f0af2f840606
("xen-blkback: move indirect req allocation out-of-line") that introduced
the BUG() again.

This patch removes the BUG() to avoid panic backend by invalid
blkif_request from frontend.

Orabug: 29199843

Fixes: f0af2f840606 ("xen-blkback: move indirect req allocation out-of-line")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds: WARNING: at net/rds/recv.c:222 rds_recv_hs_exthdrs+0xf8/0x1e0

The stack trace looks as follows:

WARNING: at net/rds/recv.c:222 rds_recv_hs_exthdrs+0xf8/0x1e0 [rds]()
Call Trace:
dump_stack+0x63/0x81
warn_slowpath_common+0x8a/0xc0
warn_slowpath_null+0x1a/0x20
rds_recv_hs_exthdrs+0xf8/0x1e0 [rds]
rds_recv_local.isra.7+0x396/0x440 [rds]
rds_recv_incoming+0x2d8/0x3c0 [rds]
rds_ib_recv_cqe_handler+0x44f/0x6d0 [rds_rdma]
poll_rcq+0x7a/0xa0 [rds_rdma]
rds_ib_rx+0xa4/0x220 [rds_rdma]
rds_ib_tasklet_fn_recv+0x30/0x40 [rds_rdma]
...

commit 041dc3e4d3
("Backport multipath RDS from upstream to UEK4") treats an
incoming rds ping or rds pong differently if the local (in case of pong) or
sender's port (in case of ping) is 1 (RDS_FLAG_PROBE_PORT).

There is nothing stopping rds-ping from picking this port for it's local side
since it does wildcard socket bind.

The fix is to check for t_mp_capable transport.

Orabug: 29201779

Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

xen-netback: wake up xenvif_dealloc_kthread when it should stop

The feature 'staging grant' changed the behaviour of
xenvif_zerocopy_callback() that queue->dealloc_prod may not increase during
the do-while loop because of 'staging grant'. As a result,
xenvif_skb_zerocopy_complete() would not wake up xenvif_dealloc_kthread
because (prod == queue->dealloc_prod).

This makes trouble when the xenvif_dealloc_kthread is requested to stop by
xenvif_disconnect(). When xenvif_dealloc_kthread is stopped while
inflight_packets is not 0, xenvif_dealloc_kthread would not exit until
inflight_packets becomes 0.

However, because of 'staging grant', xenvif_skb_zerocopy_complete() would
not wake up xenvif_dealloc_kthread() although inflight_packets is
decremented and already becomes 0. As a result, xenvif_dealloc_kthread will
never wakes up.

xenvif_skb_zerocopy_complete() should wake up xenvif_dealloc_kthread when
the latter is in the progress to stop.

Orabug: 29217927

Fixes: fdbb2e3659b3 ("xen-netback: use gref mappings for Tx requests")
Reported-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: remove nonblocking mode from xfs_vm_writepage"

This reverts commit 6e2de7d4578d4f6ae76979286de5c5ee8e91754a.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: remove xfs_cancel_ioend"

This reverts commit c680537035066177fa845053354974f7245c02d8.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: Introduce writeback context for writepages"

This reverts commit b104054b547e9034c5c7bf763d08e9803b5b58ed.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: xfs_cluster_write is redundant"

This reverts commit e58eae1b82358f6df9a88b1312cac667b3d968db.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: factor mapping out of xfs_do_writepage"

This reverts commit 40a82631dc131f1b7f61b2a2fe6351c382aaf04f.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Revert "xfs: don't chain ioends during writepage submission"

This reverts commit 34457adcadaf557febddb9f715368bbd5c3fd239.

These commits are very possibly to cause SIGBUS issue. (We can't verify
that in customer's environment). Revert them.

Orabug: 29279692

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mstflint: Fix coding style issues - left with LINUX_VERSION_CODE

Description:
Issue: 1471556

Orabug: 28878697

(cherry picked from commit 30e70911bcc22ac77b13d537225d7499261caac8)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mstflint: Fix coding-style issues

Description:
Issue: 1471556

Orabug: 28878697

(cherry picked from commit d514e6f02dcd8436e864e8113fe010898be56d10)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

mstflint: Fix errors found with checkpatch script

Description:
Issue: 1471556

Title: Fix compilation isuue

Description:
Issue: N/A

Orabug: 28878697

(cherry picked from commit 8154be122d0f841208b787b728085c565710e0f7
and dfec3c77f977344d234c93704e59a5ca12832ab1)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Squashed two commits since the 1st commit has a compilation
issue which is fixed by the 2nd commit.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Added support for 5th Gen devices in Secure Boot module and mtcr

Signed-off-by: Adham Masarwah <adham@mellanox.com>
Orabug: 28878697

(cherry picked from commit 4cbcf2923e05d74694fa2a5355960ca979ee8a97)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

Fix typos in mst_kernel

Signed-off-by: Adham Masarwah <adham@mellanox.com>
Orabug: 28878697

(cherry picked from commit 5fd539b720c95b557f55aa6465fc220415d3dca4)
cherry-pick-repo=github.com/Mellanox/mstflint.git

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

bnxt_en: Report PCIe link properties with pcie_print_link_status()

Orabug: 28942099

Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - PCIe: Speed %s Width x%d
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[backport of upstream commit af125b754e2f09e6061e65db8f4eda0f7730011d]

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
[backport of UEK5 commit 48b32a7f2b4dddafbf42cde882c3c84c556fb477]
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.c
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.h

Signed-off-by: Brian Maly <brian.maly@oracle.com>