Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs_64.c
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mihai Carabas [Mon, 18 Mar 2019 14:07:28 +0000 (16:07 +0200)]
x86: cpu: microcode: fix late loading SpectreV2 bugs eval
On microcode reloading we have to update the status of SpectreV2 mitigations if
they were not present at init time: we run the logic of selecting the default
mitigation in auto mode.
It was not possible to use the same functions as most of the logic is using
boot_command_line which is in init data and dropped after booting. Also we had
to drop some of the __init clauses on some functions we use in order to not
duplicate them.
This patch is not addressing alternative instructions related to SpectreV2. The
only one that we found so far is STUFF_RSB macro, and it will be addressed
later.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mihai Carabas [Mon, 11 Feb 2019 13:04:07 +0000 (15:04 +0200)]
x86: cpu: microcode: fix late loading SSBD and L1TF bugs eval
On microcode reloading we have to update the status of SpectreV2 mitigations if
they were not present at init time: we have opted for the default mitigation:
seccomp or prctl (so per process).
For L1TF we do not have to do anything as host mitigation does not depend on
any CPU bits and from hypervisor perspective we just call vmentry_l1d_flush_set
to re-assess the mitigation. vmentry_l1d_flush_ops is exposed through this structure:
static const struct kernel_param_ops vmentry_l1d_flush_ops = {
>-------.set = vmentry_l1d_flush_set,
>-------.get = vmentry_l1d_flush_get,
};
And can be set/get using this sysfs entry:
/sys/module/kvm_intel/parameters/vmentry_l1d_flush
It was not possible to use the same functions as most of the logic is using
boot_command_line which is in init data and dropped after booting.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mihai Carabas [Mon, 18 Mar 2019 13:56:16 +0000 (15:56 +0200)]
x86: cpu: microcode: Re-evaluate bugs in a CPU after microcode loading
After doing a new microcode load, we have to re-evaluate bugs in the CPU in
order to see if the microcode brought changes. We clear caps for
X86_BUG_SPEC_STORE_BYPASS, X86_BUG_CPU_MELTDOWN and X86_BUG_L1TF as these can
be modified by new microcode loading.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mihai Carabas [Tue, 12 Mar 2019 07:33:44 +0000 (09:33 +0200)]
x86: cpu: microcode: update flags for all cpus
After a new microcode is loaded, we need to update cpu features and flags for
all CPUs not just for CPU 0. As microcode_late_eval_cpuid is setting some of
the boot_cpu_data we are doing CPU 0 last.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Henry Willard [Thu, 14 Mar 2019 18:01:01 +0000 (11:01 -0700)]
x86/apic: Make arch_setup_hwirq NUMA node aware
In a xen VM with vNUMA enabled, irq affinity for a device on node 1
may become stuck on CPU 0. /proc/irq/nnn/smp_affinity_list may show
affinity for all the CPUs on node 1, but this is wrong. All interrupts
are on the first CPU of node 0 which is usually CPU 0.
The problem is caused when __assign_irq_vector() is called by
arch_setup_hwirq() with a mask of all online CPUs, and then called later
with a mask including only the node 1 CPUs. The first call assigns affinity
to CPU 0, and the second tries to move affinity to the first online node 1
CPU. In the reported case this is always CPU 2. For some reason, the
CPU 0 affinity is never cleaned up, and all interrupts remain with CPU 0.
Since an incomplete move appears to be in progress, all attempts to
reassign affinity for the irq fail. Because of a quirk in how affinity is
displayed in /proc/irq/nnn/smp_affinity_list, changes may appear to work
temporarily.
It was not reproducible on baremetal on the machine I had available for
testing, but it is possible that it was observed on other machines. It
does not appear in UEK5. The APIC and IRQ code is very different in UEK5,
and the code changed here doesn't exist there. It is unknown whether KVM
guests might see the same problem with UEK4.
Making arch_setup_hwirq() NUMA sensitive eliminates the problem by
using the correct cpumask for the node for the initial assignment. The
second assignment becomes a noop. After initialization is complete,
affinity can be moved to any CPU on any node and back without a problem.
Orabug: 29292411 Signed-off-by: Henry Willard <henry.willard@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Fred Herard [Thu, 14 Mar 2019 15:50:16 +0000 (11:50 -0400)]
scsi: scsi_transport_iscsi: modify detected conn err to KERN_ERR
Fix for Orabug 29469714 is insufficient. The goal of fix for
Orabug 29469714 is to print "detected conn error" message
produced by scsi_transport_iscsi module onto console which is
particularly useful when iscsi boot device is used and there
are connection issues. The fix for Orabug 29469714 falls short
because currently, the default console log level in OL is 4
which means that only log levels less than 4 (KERN_WARNING) will
be printed to console. This commit modifies the corresponding
printk log level to KERN_ERR (log level 3).
Orabug: 29487790 Signed-off-by: Fred Herard <fred.herard@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Vasilis Liaskovitis [Tue, 12 Mar 2019 06:15:10 +0000 (14:15 +0800)]
xen/blkfront: avoid NULL blkfront_info dereference on device removal
If a block device is hot-added when we are out of grants,
gnttab_grant_foreign_access fails with -ENOSPC (log message "28
granting access to ring page") in this code path:
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Michael Chan [Tue, 11 Jul 2017 17:05:34 +0000 (13:05 -0400)]
bnxt_en: Fix race conditions in .ndo_get_stats64().
.ndo_get_stats64() may not be protected by RTNL and can race with
.ndo_stop() or other ethtool operations that can free the statistics
memory. Fix it by setting a new flag BNXT_STATE_READ_STATS and then
proceeding to read statistics memory only if the state is OPEN. The
close path that frees the memory clears the OPEN state and then waits
for the BNXT_STATE_READ_STATS to clear before proceeding to free the
statistics memory.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9b76ebd49f97458857568918c305a17fa7c6567) Signed-off-by: aru kolappan <aru.kolappan@oracle.com>
Orabug: 29129625
Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: aru kolappan <aru.kolappan@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
If there an inode points to a block which is also some other type of
metadata block (such as a block allocation bitmap), the
buffer_verified flag can be set when it was validated as that other
metadata block type; however, it would make a really terrible external
attribute block. The reason why we use the verified flag is to avoid
constantly reverifying the block. However, it doesn't take much
overhead to make sure the magic number of the xattr block is correct,
and this will avoid potential crashes.
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/xattr.c
In theory this should have been caught earlier when the xattr list was
verified, but in case it got missed, it's simple enough to add check
to make sure we don't overrun the xattr buffer.
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/xattr.c
Some code does not mind if a device is bond slave or team port and treats
them the same, as generic LAG ports.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ba1414f310c83bf425fe26fa2cd5f1befcd6dc) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Some code does not mind if the master is bond or team and treats them
the same, as generic LAG.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7be61833042e7757745345eedc7b0efee240c189) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Similar to other helpers, caller can use this to find out if device is
team port.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7f019ee6d117de5007d0b10e7960696bbf111eb) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Similar to other helpers, caller can use this to find out if device is
team master.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c981e4213e9d2d4ec79501bd607722ec712742a2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/team/team.c
include/linux/netdevice.h
Fred Herard [Mon, 11 Mar 2019 22:41:43 +0000 (18:41 -0400)]
scsi: scsi_transport_iscsi: redirect conn error to console
This commit changes "detected conn error" printk
log level from KERN_INFO to KERN_WARNING. This change
is made with the assumption that KERN_WARNING messages
are configured to be redirected to console. It is
particularly useful to have detected connection errors
redirected to console when using iscsi boot device as it
may give clues as to why the system appears to be hung.
Signed-off-by: Fred Herard <fred.herard@oracle.com> Reviewed-by: Allen Pais <allen.pais@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Mridula Shastry [Wed, 6 Mar 2019 21:29:55 +0000 (13:29 -0800)]
Revert x86/apic/x2apic: set affinity of a single interrupt to one cpu
The commit 092aa78c11f0
("x86/apic/x2apic: set affinity of a single interrupt to one cpu")
was causing performance regression on block storage server on X5.
On OCI X5 server, they were not binding irqs to CPUs 1:1,
irq to cpu affinity was set to multiple cpus
(/proc/$irq/smp_affinity: 00,003ffff0,0003ffff, cpu0-17 and 36-53).
This is not the default behavior of bnxt_en. From bnxt_en,
driver, when NIC link is up, it sets irq affinity, OCI assumed that
most of the bnxt_en interrupts will go to cpu3.
After the patch "x86/apic/x2apic:
set affinity of a single interrupt to one cpu",
if we set irq to cpu 1:1, it works fine, but if we set irq affinity
to multiple cpus, it only sets irq_cfg->domain/cpumask to the first
online cpu which is on the cpu affinity list. With the current setting
which caused the perf issue, although /proc/$irq/smp_affinity is set
to multiple cpus, irq_cfg->domain cpumask only has cpu 0, this lead all
ens4f0-TxRx interrupts to route to cpu0, also iscsi target application
was being run on CPU0 during the testing which led to the performace issue.
The issue is no longer seen after the patch was reverted.
Orabug: 29449976 Signed-off-by: Mridula Shastry <mridula.c.shastry@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Dongli Zhang [Thu, 7 Mar 2019 03:19:32 +0000 (11:19 +0800)]
jiffies: use jiffies64_to_nsecs() to fix 100% steal usage for xen vcpu hotplug
[ Not relevant upstream, therefore no upstream commit. ]
To fix, use jiffies64_to_nsecs() directly instead of deriving the result
according to jiffies_to_usecs().
As the return type of jiffies_to_usecs() is 'unsigned int', when the return
value is more than the size of 'unsigned int', the leading 32 bits would be
discarded.
Suppose USEC_PER_SEC=1000000L and HZ=1000, below are the expected and
actual incorrect result of jiffies_to_usecs(0x7770ef70):
After xen vcpu hotplug and when the new vcpu steal clock is calculated for
the first time, the result of this_rq()->prev_steal_time in
steal_account_process_tick() would be far smaller than the expected
value, due to that jiffies_to_usecs() discards the leading 32 bits.
As a result, the diff between current steal and this_rq()->prev_steal_time
is always very large. Steal usage would become 100% when the initial steal
clock obtained from xen hypervisor is very large during xen vcpu hotplug,
that is, when the guest is already up for a long time.
The bug can be detected by doing the following:
* Boot xen guest with vcpus=2 and maxvcpus=4
* Leave the guest running for a month so that the initial steal clock for
the new vcpu would be very large
* Hotplug 2 extra vcpus
* The steal time of new vcpus in /proc/stat would increase abnormally and
sometimes steal usage in top can become 100%
This was incidentally fixed in the patch set starting by
commit 93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY")
and ended with
commit b672592f0221 ("sched/cputime: Remove generic asm headers").
Link: https://lkml.org/lkml/2019/2/28/1373 Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Si-Wei Liu [Thu, 17 Jan 2019 23:44:48 +0000 (18:44 -0500)]
net_failover: delay taking over primary device to accommodate udevd renaming
There is an inherent race with udev rename in userspace due to the exposure
of two lower slave devices while kernel attempts to manage the creation
for failover bonding itself automatically. The existing userspace naming
logic in udevd was not specifically written for this in-kernel
automagic.
The clean fix for the problem is either to update the udevd to not try
rename the 3-netdev (ideally rename the device in a coordinated manner),
or to fix the kernel to hide the 2 lower devices which does not have to
be shown to userspace unless needed (1-netdev model).
However, our pursuance of 1-netdev model had not been acknowledged by
upstream, and there's no motivation in the systemd/udevd community at
this point to refactor the rename logic and make it work well with
3-netdev.
Hyper-V's netvsc mitigated this by postponing the VF's dev_open() to
allow a userspace thread to rename the device within a 100ms worth of
window. For the interim, we follow the same as done by netvsc to avoid
the renaming failure, until we move to the point where a clean solution
is available in upstream.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: John Haxby <john.haxby@oracle.com> Reviewed-by: Vijay Balakrishna <vijay.balakrishna@oracle.com> Tested-by: Vijay Balakrishna <vijay.balakrishna@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
As previously reported (https://patchwork.kernel.org/patch/8642031/)
it's possible to call shrink_dentry_list with a large number of dentries
(> 10000). This, in turn, could trigger the softlockup detector and
possibly trigger a panic. In addition to the unmount path being
vulnerable to this scenario, at SuSE we've observed similar situation
happening during process exit on processes that touch a lot of dentries.
Here is an excerpt from a crash dump. The number after the colon are
the number of dentries on the list passed to shrink_dentry_list:
While some of the callers of shrink_dentry_list do use cond_resched,
this is not sufficient to prevent softlockups. So just move
cond_resched into shrink_dentry_list from its callers.
David said: I've found hundreds of occurrences of warnings that we emit
when need_resched stays set for a prolonged period of time with the
stack trace that is included in the change log.
Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 32785c0539b7e96f77a14a4f4ab225712665a5a4) Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Conflicts:
fs/dcache.c Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
J. Bruce Fields [Tue, 16 Jan 2018 15:08:00 +0000 (10:08 -0500)]
NFS: commit direct writes even if they fail partially
If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.
We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion(). The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit 1b8d97b0a837beaf48a8449955b52c650a7114b4)
Orabug: 28212440 Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mukesh Kacker [Tue, 5 Feb 2019 00:21:28 +0000 (16:21 -0800)]
rds: update correct congestion map for loopback transport
Loopback transport delivers data directly to the destination
socket since destination is always local to the machine.
The connection data structure passed to an internal
initializer API rds_inc_init() is from send side (unlike the
usual call to APIs from receive path when receive side
connection data structure is passed).
This inconsistency causes an update of the incorrect
congestion map when marking destination port congested when
loopback transport is used (which is when one or both end(s)
of a RDS connection has an IP loopback address).
The fix it to ensure correct map is updated, that of the
destination IP regardless of delivery coming from send side
of loopback transport or receive side of other transports.
Theodore Ts'o [Thu, 14 Jun 2018 04:58:00 +0000 (00:58 -0400)]
ext4: only look at the bg_flags field if it is valid
The bg_flags field in the block group descripts is only valid if the
uninit_bg or metadata_csum feature is enabled. We were not
consistently looking at this field; fix this.
Also block group #0 must never have uninitialized allocation bitmaps,
or need to be zeroed, since that's where the root inode, and other
special inodes are set up. Check for these conditions and mark the
file system as corrupted if they are detected.
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
fs/ext4/balloc.c
fs/ext4/ialloc.c
kernel-ueknano package provides kernel-uek with no version in it. So it
can match any kernel-uek version installed. This commit adds the version
that the rpm is built for in kernel-uek provides also.
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Christoph Paasch [Wed, 27 Sep 2017 00:38:50 +0000 (17:38 -0700)]
net: Set sk_prot_creator when cloning sockets to the right proto
sk->sk_prot and sk->sk_prot_creator can differ when the app uses
IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
Which is why sk_prot_creator is there to make sure that sk_prot_free()
does the kmem_cache_free() on the right kmem_cache slab.
Now, if such a socket gets transformed back to a listening socket (using
connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
sk_clone_lock() when a new connection comes in. But sk_prot_creator will
still point to the IPv6 kmem_cache (as everything got copied in
sk_clone_lock()). When freeing, we will thus put this
memory back into the IPv6 kmem_cache although it was allocated in the
IPv4 cache. I have seen memory corruption happening because of this.
With slub-debugging and MEMCG_KMEM enabled this gives the warning
"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
A C-program to trigger this:
void main(void)
{
int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
int new_fd, newest_fd, client_fd;
struct sockaddr_in6 bind_addr;
struct sockaddr_in bind_addr4, client_addr1, client_addr2;
struct sockaddr unsp;
int val;
As far as I can see, this bug has been there since the beginning of the
git-days.
Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9d538fa60bad4f7b23193c89e843797a1cf71ef3)
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Regardless of whether the flex_bg feature is set, we should always
check to make sure the bits we are setting in the block bitmap are
within the block group bounds.
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
It's really bad when the allocation bitmaps and the inode table
overlap with the block group descriptors, since it causes random
corruption of the bg descriptors. So we really want to head those off
at the pass.
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Fred Herard <fred.herard@oracle.com> Reviewed-by: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
KEYS: add missing permission check for request_key() destination
When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring. This should require Write permission to the keyring. However,
there is actually no permission check.
This can be abused to add keys to any keyring to which only Search
permission is granted. This is because Search permission allows joining
the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.
Both negatively and positively instantiated keys can be added using this
method. Adding negative keys is trivial. Adding a positive key is a
bit trickier. It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().
Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.
We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key(). Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.
We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring. (I don't know of any users
who actually do that, though...)
Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key") Cc: <stable@vger.kernel.org> # v2.6.13+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from commit 4dca6ea1d9432052afb06baf2e3ae78188a4410b)
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
security/keys/request_key.c
David Howells [Mon, 19 Oct 2015 10:20:28 +0000 (11:20 +0100)]
KEYS: Don't permit request_key() to construct a new keyring
If request_key() is used to find a keyring, only do the search part - don't
do the construction part if the keyring was not found by the search. We
don't really want keyrings in the negative instantiated state since the
rejected/negative instantiation error value in the payload is unioned with
keyring metadata.
Now the kernel gives an error:
request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)
Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Håkon Bugge [Wed, 6 Feb 2019 18:28:52 +0000 (19:28 +0100)]
mlx4_ib: Distribute completion vectors when zero is supplied
MAD packet sending/receiving is not properly virtualized in
CX-3. Hence, these are proxied through the PF driver. The proxying
uses UD QPs. The associated CQs are created with completion vector
zero, in anticipation that zero will return the least-used vector, as
per commit 6ba1eb776461 ("IB/mlx4: Scatter CQs to different EQs").
However, this does not happen, and we see that only the first EQ is
used for these proxy QPs.
This leads to great imbalance in CPU processing, in particular during
fail-over and fail-back, when a large number of RDMA CM
requests/responses are proxied through the PF driver.
The current netpoll implementation in the bnxt_en driver has problems
that may miss TX completion events. bnxt_poll_work() in effect is
only handling at most 1 TX packet before exiting. In addition,
there may be in flight TX completions that ->poll() may miss even
after we fix bnxt_poll_work() to handle all visible TX completions.
netpoll may not call ->poll() again and HW may not generate IRQ
because the driver does not ARM the IRQ when the budget (0 for netpoll)
is reached.
We fix it by handling all TX completions and to always ARM the IRQ
when we exit ->poll() with 0 budget.
Also, the logic to ACK the completion ring in case it is almost filled
with TX completions need to be adjusted to take care of the 0 budget
case, as discussed with Eric Dumazet <edumazet@google.com>
Reported-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 73f21c653f930f438d53eed29b5e4c65c8a0f906) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Fix bug in the error code path when bnxt_request_irq() returns failure.
bnxt_disable_napi() should not be called in this error path because
NAPI has not been enabled yet.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c58387ab1614f6d7fb9e244f214b61e7631421fc) Signed-off-by: Brian Maly <brian.maly@oracle.com>
A recent change to reduce delay granularity waiting for firmware
reponse has caused a regression. With a tighter delay loop,
the driver may see the beginning part of the response faster.
The original 5 usec delay to wait for the rest of the message
is not long enough and some messages are detected as invalid.
Increase the maximum wait time from 5 usec to 20 usec. Also, fix
the debug message that shows the total delay time for the response
when the message times out. With the new logic, the delay time
is not fixed per iteration of the loop, so we define a macro to
show the total delay time.
Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cc559c1ac250a6025bd4a9528e424b8da250655b) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected. More detailed analysis indicated that the
default 25us delay was higher than necessary. Reducing the time spend in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9751e8e714872aa650b030e52a9fafbb694a3714) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
When open fails during ethtool -L ring change, for example, the driver
may crash at bnxt_free_irq() because bp->bnapi is NULL.
If we fail to allocate all the new rings, bnxt_open_nic() will free
all the memory including bp->bnapi. Subsequent call to bnxt_close_nic()
will try to dereference bp->bnapi in bnxt_free_irq().
Fix it by checking for !bp->bnapi in bnxt_free_irq().
Fixes: e5811b8c09df ("bnxt_en: Add IRQ remapping logic.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb98526bf9b985866d648dbb9c983ba9eb59daba) Signed-off-by: Brian Maly <brian.maly@oracle.com>
During initialization, if we encounter errors, there is a code path that
calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a
warning in firmware logs.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c4fe80b32c685bdc02b280814d0cfe80d441c72) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
freed by the RDMA driver is flawed. The max IRQ count is checked when
re-initializing the IRQ vectors and this can happen multiple times
during ifup or ethtool -L. If the max IRQ is reduced and the RDMA
driver is operational, we may not initailize IRQs correctly. This
problem shows up on VFs with very small number of MSIX.
There is no other logic that relies on the IRQ count excluding the ones
used by RDMA. So we fix it by just removing the call to subtract or
add the IRQs used by RDMA.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 30f529473ec962102e8bcd33a6a04f1e1b490ae2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:
__delete_from_page_cache
(no shadow case)
unaccount_page_cache_page
cleancache_put_page
page_cache_delete
mapping->nrpages -= nr
(nrpages becomes 0)
We don't clean the cleancache for an inode after final file truncation
(removal).
truncate_inode_pages_final
check (nrpages || nrexceptional) is false
no truncate_inode_pages
no cleancache_invalidate_inode(mapping)
These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.
Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.
[akpm@linux-foundation.org: add comment, per Jan] Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orabug: 29364670
CVE: CVE-2018-16862
Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Jacob Wen [Wed, 30 Jan 2019 06:55:14 +0000 (14:55 +0800)]
l2tp: fix reading optional fields of L2TPv3
Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.
It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.
Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.
Changes in v2:
1. Only fix L2TPv3 to make code simple.
To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
It's complicated to do so.
2. Reloading pointers after pskb_may_pull
Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support") Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4522a70db7aa5e77526a4079628578599821b193) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/l2tp/l2tp_core.c
net/l2tp/l2tp_ip.c
net/l2tp/l2tp_ip6.c
Commit 2b139e6b1ec8 ("l2tp: remove ->recv_payload_hook") is not in UEK5.
l2tp_core.h:
s/l2tp_get_l2specific_len(session)/session->l2specific_len/ due to 62e7b6a57c7b ("l2tp: remove l2specific_len dependency in l2tp_core")
is not in UEK4.
syzbot reported crashes [1] and provided a C repro easing bug hunting.
When/if packet_do_bind() calls __unregister_prot_hook() and releases
po->bind_lock, another thread can run packet_notifier() and process an
NETDEV_UP event.
This calls register_prot_hook() and hooks again the socket right before
first thread is able to grab again po->bind_lock.
Fixes this issue by temporarily setting po->num to 0, as suggested by
David Miller.
Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
(cherry picked from commit 72b01bee76591d3741a87a689bf210d729d530f8)
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: John Donnelly <John.P.Donnelly@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Bart Van Assche [Mon, 18 Feb 2019 02:55:59 +0000 (10:55 +0800)]
blk-mq: Do not invoke .queue_rq() for a stopped queue
The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 28766011
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
- There are so many commits between the most recent uek4 block commit and
this upstream commit, and blk_mq_make_request() in upstream commit is
different with uek4. blk_mq_direct_issue_request() is not available.
- The 3rd argument of blk_mq_insert_request() is set to false in this
backport because there is no need to run the queue again when it is
already stopped.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alexander Burmashev [Thu, 7 Feb 2019 15:49:31 +0000 (07:49 -0800)]
uek-rpm: use multi-threaded xz compression for rpms
By default kernel rpms use xz compression for all rpms files and
gzip compression for src.rpms. This commit changes compression type to xz
for all rpms produced, and enables multi-threaded compression for rpms. It
allows to use as many threads as there are CPU cores available.
Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Alexander Burmashev [Thu, 7 Feb 2019 15:45:10 +0000 (07:45 -0800)]
uek-rpm: optimize find-requires usage
There are no deps found for debuginfo-common, doc and headers subpackage, so
Autoreq: no is now set for them, also switch from /usr/lib/rpm/redhat/find-requires
usage to /usr/lib/rpm/rpmdeps --requires, since latter is faster and less buggy.
Both changes noticeably boost kernel rpm build time.
Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a -j <n> option, which, when used, will spawn <n> processes to do the
debuginfo extraction in parallel. A pipe is used to dispatch the files among
the processes.
Signed-off-by: Michal Marek <mmarek@suse.com>
Orabug: 29323635
Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Reviewed-by: Todd Vierling <todd.vierling@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Tom Lendacky [Fri, 23 Feb 2018 23:18:20 +0000 (00:18 +0100)]
KVM: SVM: Add MSR-based feature support for serializing LFENCE
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked. This patch will add support to allow a guest to
properly make this determination.
Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features. If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see. Support is also added to write (hypervisor only)
and read the MSR value for the guest. A write by the guest will result in
a #GP. A read by the guest will return the value as set by the host. In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit d1d93fa90f1afa926cb060b7f78ab01a65705b4d)
Signed-off-by: John Haxby <john.haxby@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Kris Van Hees [Sun, 13 Jan 2019 03:10:01 +0000 (22:10 -0500)]
dtrace: support kernels built with RANDOMIZE_BASE
SDT probe addresses were being generated as absolute addresses which
breaks when the kernel may get relocated to a place other than the
default load address.
The solution is to generate the probe locations as an offset relative to
the _stext symbol in the .tmp_sdtinfo.S source file (generated at build
time), so that the actual addresses are processed as relocations when the
kernel boots.
This fix also optimizes the SDT info data (function and probe names) by
using de-duplication since especially with perf probes) many of those
strings are non-unique.
Orabug: 29204005 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Nick Alcock <nick.alcock@oracle.com> Tested-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Tom Lendacky [Mon, 2 Jul 2018 21:36:02 +0000 (16:36 -0500)]
x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
On AMD, the presence of the MSR_SPEC_CTRL feature does not imply that the
SSBD mitigation support should use the SPEC_CTRL MSR. Other features could
have caused the MSR_SPEC_CTRL feature to be set, while a different SSBD
mitigation option is in place.
Update the SSBD support to check for the actual SSBD features that will
use the SPEC_CTRL MSR.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6ac2f49edb1e ("x86/bugs: Add AMD's SPEC_CTRL MSR usage") Link: http://lkml.kernel.org/r/20180702213602.29202.33151.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 612bc3b3d4be749f73a513a17d9b3ee1330d3487)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
Different filename (bugs_64.c) and different context
Konrad Rzeszutek Wilk [Fri, 1 Jun 2018 14:59:20 +0000 (10:59 -0400)]
x86/bugs: Add AMD's SPEC_CTRL MSR usage
The AMD document outlining the SSBD handling
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
mentions that if CPUID 8000_0008.EBX[24] is set we should be using
the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f)
for speculative store bypass disable.
This in effect means we should clear the X86_FEATURE_VIRT_SSBD
flag so that we would prefer the SPEC_CTRL MSR.
See the document titled:
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199889
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel <joro@8bytes.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
(cherry picked from commit 6ac2f49edb1ef5446089c7c660017732886d62d6)
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
arch/x86/kvm/cpuid.c
arch/x86/kvm/svm.c
The conflicts were due to different filenames (cpufeature.h, bugs_64.c) and different context.
Mihai Carabas [Wed, 7 Nov 2018 07:00:03 +0000 (09:00 +0200)]
x86/cpufeatures: rename X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD
The commit 52817587e706 ('x86/cpufeatures: Disentangle SSBD enumeration') from
upstream disentangles SSBD enumeration. We did not backport that commit because
we did not have what to disentangle on UEK4. Our cpufeature was already
synthetic.
That commit also renames X86_FEATURE_AMD_SSBD to X86_FEATURE_LS_CFG_SSBD. We
need this rename in order to not have conflicting cpu features while
backporting commit 6ac2f49edb1e ('x86/bugs: Add AMD's SPEC_CTRL MSR usage')
from upstream which introduces SPEC_CTRL MSR, which will be the prefered
method.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Make file credentials available to the seqfile interfaces
A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.
The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.
So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones. Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.
It turns out that seq_file _does_ save away the user_ns information of
the file, though. Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.
Conflict: Refactored include/linux/seq_file.h to include __GENKSYM__
and UEK_KABI_REPLACE() to pass check_kabi test.
Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Jann Horn [Fri, 5 Oct 2018 22:51:58 +0000 (15:51 -0700)]
proc: restrict kernel stack dumps to root
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.
Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.
There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.
Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7)
Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
include/linux/compiler-gcc.h
include/linux/module.h
UEK4 either implements the changes in different files
or it does not have the patches that introduce the
lines changed by this cherry-picked commit.
Masahiro Yamada [Wed, 5 Dec 2018 06:27:19 +0000 (15:27 +0900)]
x86/build: Fix compiler support check for CONFIG_RETPOLINE
It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.
Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.
The easiest fix is to move this check to the "archprepare" like
this commit did:
829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com Link: https://lkml.org/lkml/2018/12/4/206 Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 25896d073d8a0403b07e6dec56f58e6c33678207)
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Makefile
The archprepare rule is different in UEK and upstream makefiles
Zhenzhong Duan [Fri, 2 Nov 2018 08:45:41 +0000 (01:45 -0700)]
x86/retpoline: Remove minimal retpoline support
Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
reason to keep the minimal retpoline support around which only provided
basic protection in the assembly files.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <srinivas.eeda@oracle.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@default
(cherry picked from commit ef014aae8f1cd2793e4e014bbb102bed53f852b7)
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
arch/x86/Kconfig
arch/x86/include/asm/nospec-branch.h
Minor differences between UEK and upstream.
arch/x86/Makefile
Need to add line defining RETPOLINE_CFLAGS.
arch/x86/kernel/cpu/bugs.c
UEK4 has the corresponding code in bugs_64.c
scripts/Makefile.build
Commit e699314 (objtool: Add retpoline validation) has not
been ported to UEK4, nothing to change.
nl80211: check for the required netlink attributes presence
nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.
This fixes CVE-2017-12153.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046 Fixes: e5497d766ad ("cfg80211/nl80211: support GTK rekey offload") Cc: <stable@vger.kernel.org> # v3.1-rc1 Reported-by: bo Zhang <zhangbo5891001@gmail.com> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit e785fa0a164aa11001cba931367c7f94ffaff888)
Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
lpfc cannot establish connection with targets that send PRLI in P2P
configurations.
If lpfc rejects a PRLI that is sent from a target the target will not
resend and will reject the PRLI send from the initiator.
[tv: original mistakenly applied in reverse, because change was already
present in the code at that point; this reapplies forwards]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Todd Vierling <todd.vierling@oracle.com> Reviewed-by: Fred Herard <fred.herard@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mukesh Kacker [Wed, 1 Aug 2018 18:37:01 +0000 (11:37 -0700)]
rds: congestion updates can be missed when kernel low on memory
The congestion updates are allocated under GFP_NOWAIT and can
fail under temporary memory pressure. These are not retried and
the update here retries them until sent.
On receiving congestion updates, corrupt packet check failures
are not logged as warnings.
Temporary memory allocation failures may cause an RDS connection
to be stuck in an endless RNR (receiver Not Ready). Right around the
time the RDS connection becomes stuck, it reports these recv buffer
allocation failures:
This currently doesn't take into account memory allocation failures.
A bit later the memory pressure clears away.
But RDS does not refill receive buffers for that connection any more.
This is because the receiver is only woken up on the last packet of a
multi-packet message. But the last packet is never received, because the
recv queue becomes empty and we end up in the endless RNR Retry situation.
Zhu Yanjun [Fri, 25 Jan 2019 02:14:52 +0000 (21:14 -0500)]
net: rds: fix excess initialization of the recv SGEs
In rds_ib_recv_init_ring(), an excess array element is incorrectly
initialized. This is not an OOB situation, as the sge array is
initialized to eight entries. With a fragment size of a maximum of 16KiB
and a page size of minimum 4KiB, then num_send_sge can at most become
five.
Mathias Nyman [Fri, 11 Dec 2015 12:38:06 +0000 (14:38 +0200)]
xhci: fix usb2 resume timing and races.
According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.
On device initiated resume, a port status interrupt with the port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.
On host initiated resume, current code sets the port to resume state,
sleep 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.
There are a few issues with this approach
1. A host initiated resume will also generate a resume event. The event
handler will find the port in resume state, believe it's a device
initiated resume, and act accordingly.
2. A port status request might cut the resume signalling short if a
get_port_status request is handled during the host resume signalling.
The port will be found in resume state. The timestamp is not set leading
to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
get_port_status will proceed with moving the port to U0.
3. If an error, or anything else happens to the port during device
initiated resume signalling it will leave all the device resume
parameters hanging uncleared, preventing further suspend, returning
-EBUSY, and cause the pm thread to busyloop trying to enter suspend.
Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if the resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if port is not in U0 or Resume state
This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS
(cherry picked from commit f69115fdbc1ac0718e7d19ad3caa3da2ecfe1c96) Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Mathias Nyman [Wed, 18 Nov 2015 08:48:22 +0000 (10:48 +0200)]
xhci: Fix a race in usb2 LPM resume, blocking U3 for usb2 devices
Clear device initiated resume variables once device is fully up and running
in U0 state.
Resume needs to be signaled for 20ms for usb2 devices before they can be
moved to U0 state.
An interrupt is triggered if a device initiates resume. As we handle the
event in interrupt context we can not sleep for 20ms, so we instead set
a resume flag, a timestamp, and start the roothub polling.
The roothub code will later move the port to U0 when it finds a port in
resume state with the resume flag set, and timestamp passed by 20ms.
A host initiated resume is however not done in interrupt context, and
host initiated resume code will directly signal resume, wait 20ms and then
move the port to U0.
These two codepaths can race, if we are in the middle of a host initated
resume, while sleeping for 20ms, we may handle a port event and find the
port in resume state. The port event handling code will assume the resume
was device initiated and set the resume flag and timestamp.
Root hub code will however not catch the port in resume state again as the
host initated resume code has already moved the port to U0.
The resume flag and timestamp will remain set for this port preventing port
from suspending again (LPM setting port to U3)
Fix this for now by always clearing the device initated resume parameters
once port is in U0
(cherry picked from commit dad67d5f3d0efe01d38c6cebcb6698280e51927b) Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/usb/host/xhci-hub.c
Andrea Arcangeli [Fri, 14 Dec 2018 22:17:17 +0000 (14:17 -0800)]
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger an harmless false positive WARN_ON. Check the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com Cc: <stable@vger.kernel.org> Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397
Andrea Arcangeli [Fri, 30 Nov 2018 22:09:32 +0000 (14:09 -0800)]
userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration. This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.
The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.
Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com Fixes: ff62a3421044 ("hugetlb: implement memfd sealing") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Hugh Dickins <hughd@google.com> Reported-by: Jann Horn <jannh@google.com> Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: <stable@vger.kernel.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 29163750
CVE: CVE-2018-18397
Jianchao Wang [Wed, 9 Jan 2019 08:20:24 +0000 (03:20 -0500)]
x86/apic/x2apic: set affinity of a single interrupt to one cpu
Customer want to offline the cpus to 2 per node. And finally, a
lpfc HBA cannot work any more due to no available
irq vectors.
[ 51.031812] IRQ 284 set affinity failed because there are no available vectors. The device assigned to this IRQ is unstable.
[ 51.031817] IRQ 285 set affinity failed because there are no available vectors. The device assigned to this IRQ is unstable.
[ 51.031822] IRQ 286 set affinity failed because there are no available vectors. The device assigned to this IRQ is unstable.
[ 51.031827] IRQ 287 set affinity failed because there are no available vectors. The device assigned to this IRQ is unstable.
It was due to cluster_vector_allocation_domain which want to set
interrupt affinity of a single interrupt to multiple CPUs and need
a same irq vector to be available on multiple cpus. This is difficult
for customer's case where there are a lot of HBAs on node 0 and only
2 or 4 cpus online there.
And actually, this feature has been discarded by the upstream.
https://lkml.org/lkml/2017/9/13/576
We close this feature by just set one cpu in retmask in
cluster_vector_allocation_domain.
Customer that encountered this issue used RHCK, since UEK4 also
has the same code, post a same patch for UEK4
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Jianchao Wang <jianchao.w.wang.oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Suggested-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Dongli Zhang [Wed, 23 Jan 2019 07:47:33 +0000 (15:47 +0800)]
xen/blkback: do not BUG() for invalid blkif_request from frontend
Upstream commit 0e367ae46503 ("xen/blkback: correctly respond to unknown,
non-native requests") fixed a bug to correctly respond to unknown,
non-native requests, e.g., BLKIF_OP_RESERVED_1 or BLKIF_OP_PACKET for
64-bit SLES 11 guests when using a 32-bit backend.
Although such fix is already in uek4, it is broken by commit f0af2f840606
("xen-blkback: move indirect req allocation out-of-line") that introduced
the BUG() again.
This patch removes the BUG() to avoid panic backend by invalid
blkif_request from frontend.
commit 041dc3e4d3
("Backport multipath RDS from upstream to UEK4") treats an
incoming rds ping or rds pong differently if the local (in case of pong) or
sender's port (in case of ping) is 1 (RDS_FLAG_PROBE_PORT).
There is nothing stopping rds-ping from picking this port for it's local side
since it does wildcard socket bind.
Dongli Zhang [Mon, 21 Jan 2019 01:56:55 +0000 (09:56 +0800)]
xen-netback: wake up xenvif_dealloc_kthread when it should stop
The feature 'staging grant' changed the behaviour of
xenvif_zerocopy_callback() that queue->dealloc_prod may not increase during
the do-while loop because of 'staging grant'. As a result,
xenvif_skb_zerocopy_complete() would not wake up xenvif_dealloc_kthread
because (prod == queue->dealloc_prod).
This makes trouble when the xenvif_dealloc_kthread is requested to stop by
xenvif_disconnect(). When xenvif_dealloc_kthread is stopped while
inflight_packets is not 0, xenvif_dealloc_kthread would not exit until
inflight_packets becomes 0.
However, because of 'staging grant', xenvif_skb_zerocopy_complete() would
not wake up xenvif_dealloc_kthread() although inflight_packets is
decremented and already becomes 0. As a result, xenvif_dealloc_kthread will
never wakes up.
xenvif_skb_zerocopy_complete() should wake up xenvif_dealloc_kthread when
the latter is in the progress to stop.
Fixes: fdbb2e3659b3 ("xen-netback: use gref mappings for Tx requests") Reported-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Squashed two commits since the 1st commit has a compilation
issue which is fixed by the 2nd commit.
Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
Files are relocated from 'kernel' directory to
'drivers/net/ethernet/mellanox/mstflint_access'
Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.
pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link. For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.
Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself. This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.
Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
[backport of UEK5 commit 48b32a7f2b4dddafbf42cde882c3c84c556fb477] Signed-off-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.c
drivers/net/ethernet/broadcom/bnxt/bnxt_compat.h