Santosh Shilimkar [Fri, 14 Aug 2015 15:18:28 +0000 (08:18 -0700)]
Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
dtrace: ensure SDT module probes work with NORX
dtrace: prevent the stack protector from breaking syscall tracing.
kallsyms: make it possible to disable /proc/kallmodsyms
Kris Van Hees [Thu, 13 Aug 2015 16:23:54 +0000 (12:23 -0400)]
dtrace: ensure SDT module probes work with NORX
When the CONFIG_DEBUG_SET_MODULE_RONX option is enabled in the kernel,
the executable code of the module is marked read-only prior to calling
the module notifier. This causes the patching of NOPs in the probe
locations to fail with a system crash.
Because patching of SDT probe locations should happen upon module load
for DTrace-enabled kernels regardless of whether or not DTrace will be
used, it is both cleaner and more correct to make a direct call to the
code that handles the patching rather than using the notifier mechanism.
The new call takes place prior to marking the pages read-only, avoiding
the problem altogether.
Orabug: 21630297 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Santosh Shilimkar [Thu, 13 Aug 2015 21:28:49 +0000 (14:28 -0700)]
Merge branch 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek: (31 commits)
xen-netfront: Remove the meaningless code
net/xen-netfront: Correct printf format in xennet_get_responses
xen-netfront: Use setup_timer
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
xen/events/fifo: Handle linked events when closing a port
xen: release lock occasionally during ballooning
xen/gntdevt: Fix race condition in gntdev_release()
block/xen-blkback: s/nr_pages/nr_segs/
block/xen-blkfront: Remove invalid comment
arm/xen: Drop duplicate define mfn_to_virt
xen/grant-table: Remove unused macro SPP
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
xen: Include xen/page.h rather than asm/xen/page.h
kconfig: add xenconfig defconfig helper
kconfig: clarify kvmconfig is for kvm
xen/pcifront: Remove usage of struct timeval
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
hvc_xen: avoid uninitialized variable warning
...
Santosh Shilimkar [Thu, 13 Aug 2015 21:28:40 +0000 (14:28 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
mlx4_vnic: Skip fip discover restart if pkey index not changed
rds: rds_ib_device.refcount overflow
rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
rds_rdma: allocate FMR according to max_item_soft
rds_rdma: do not dealloc fmrs in the pool under use
rds: set fmr pool dirty_count correctly
Santosh Shilimkar [Thu, 13 Aug 2015 20:14:34 +0000 (13:14 -0700)]
Merge branch 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen
* 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public: (31 commits)
xen-netfront: Remove the meaningless code
net/xen-netfront: Correct printf format in xennet_get_responses
xen-netfront: Use setup_timer
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
xen/events/fifo: Handle linked events when closing a port
xen: release lock occasionally during ballooning
xen/gntdevt: Fix race condition in gntdev_release()
block/xen-blkback: s/nr_pages/nr_segs/
block/xen-blkfront: Remove invalid comment
arm/xen: Drop duplicate define mfn_to_virt
xen/grant-table: Remove unused macro SPP
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
xen: Include xen/page.h rather than asm/xen/page.h
kconfig: add xenconfig defconfig helper
kconfig: clarify kvmconfig is for kvm
xen/pcifront: Remove usage of struct timeval
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
hvc_xen: avoid uninitialized variable warning
...
Santosh Shilimkar [Thu, 13 Aug 2015 20:12:52 +0000 (13:12 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
mlx4_vnic: Skip fip discover restart if pkey index not changed
rds: rds_ib_device.refcount overflow
rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
rds_rdma: allocate FMR according to max_item_soft
rds_rdma: do not dealloc fmrs in the pool under use
rds: set fmr pool dirty_count correctly
Nick Alcock [Thu, 13 Aug 2015 15:47:50 +0000 (16:47 +0100)]
dtrace: prevent the stack protector from breaking syscall tracing.
The systrace_syscall() function is unusual in that it requires %rax to be
conserved in the function prologue (until the volatile asm which collects the
syscall number from it and sticks it in a local variable). GCC doesn't know
about this, and recent GCC has started smashing it with the stack protector
prologue. Fix this by turning off stack protection in this one function (which
does not benefit from it anyway -- it contains only two assignments, neither of
which can overrun -- and is a notable hot spot).
Also declare it asmlinkage, like every other syscall already is: it is called
from asm, just like them.
Orabug: 21630345 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Yuval Shaia [Sun, 1 Feb 2015 00:53:26 +0000 (16:53 -0800)]
mlx4_vnic: Skip fip discover restart if pkey index not changed
Driver receives MAD on any change made to partition table.
This fix aim to cover the case where driver shouldn't restart net interface
when receiving PKEY_CHANGE event but pkey index was not changed.
Mukesh Kacker [Wed, 12 Aug 2015 22:45:14 +0000 (15:45 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.rds-p2:
rds: rds_ib_device.refcount overflow
rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
rds_rdma: allocate FMR according to max_item_soft
rds_rdma: do not dealloc fmrs in the pool under use
rds: set fmr pool dirty_count correctly
Wengang Wang [Tue, 4 Aug 2015 05:39:51 +0000 (13:39 +0800)]
rds: rds_ib_device.refcount overflow
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There is a missing dropping of refcount on rds_ib_device.refcount
in case rds_ib_alloc_fmr() failed(mr pool running out). This lead to
the refcount overflow.
A BUG_ON on line 117(see following) is triggered.
From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544
and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. So rds_ib_alloc_fmr() is
very likely to return ERR_PTR(-EAGAIN).
Wengang Wang [Tue, 4 Aug 2015 08:00:55 +0000 (16:00 +0800)]
rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
In rds_sendmsg(), in the case RDS_CMSG_RDMA_MAP is requested and
rds_cmsg_send() is called, a "struct rds_mr" needs to be created.
For creating the "struct rds_mr", the connection needs to be
established(ready) for rds_ib_transport. Otherwise, __rds_rdma_map()
would fail because it can't find the right rds_ib_device (which is
associated with the ip address matching rds_sock's bound ip address).
The ip address is set at the completion of the rdma connection.
But actually in code, the connecting is triggered after the call of call
rds_cmsg_send() so rds_cmsg_send() would fail with -NODEV.
The fix is to move the trigger of connection before calling
rds_cmsg_send() and return -EAGAIN in case connection is not
ready yet when we are calling rds_cmsg_send().
Wengang Wang [Tue, 4 Aug 2015 08:27:08 +0000 (16:27 +0800)]
rds_rdma: allocate FMR according to max_item_soft
When max_item on the pool is very large(say 170000), large number of
alloc_fmr request would fail with -EGAIN. This can greatly hurt
performance.
Actually, whether it is going to allocate or wait for pool flush
should be according to "max_item_soft" rather than "max_item" because
we are resizing the pool by changing "max_item_soft" not max_item.
Also, when successfully allocated a lower layer fmr, we should increase
"max_item_soft" incrementally, not set it immediately to "max_item".
Wengang Wang [Tue, 4 Aug 2015 08:16:01 +0000 (16:16 +0800)]
rds: set fmr pool dirty_count correctly
In rds_ib_flush_mr_pool() the setting of dirty_count is wrong in case
"free_all" is true -- "clean" ones also counted as dirty. dirty_count
being negative is seen in vmcore.
Konrad Rzeszutek Wilk [Wed, 12 Aug 2015 19:20:26 +0000 (15:20 -0400)]
Merge branch 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public into uek-4.1/xen
* 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public: (31 commits)
xen-netfront: Remove the meaningless code
net/xen-netfront: Correct printf format in xennet_get_responses
xen-netfront: Use setup_timer
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
xen/events/fifo: Handle linked events when closing a port
xen: release lock occasionally during ballooning
xen/gntdevt: Fix race condition in gntdev_release()
block/xen-blkback: s/nr_pages/nr_segs/
block/xen-blkfront: Remove invalid comment
arm/xen: Drop duplicate define mfn_to_virt
xen/grant-table: Remove unused macro SPP
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
xen: Include xen/page.h rather than asm/xen/page.h
kconfig: add xenconfig defconfig helper
kconfig: clarify kvmconfig is for kvm
xen/pcifront: Remove usage of struct timeval
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
hvc_xen: avoid uninitialized variable warning
...
Backports from Linux 4.1
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Carol Soto [Tue, 2 Jun 2015 21:07:24 +0000 (16:07 -0500)]
net/mlx4_core: need to call close fw if alloc icm is called twice
If mlx4_enable_sriov is called by adapter without this
feature MLX4_DEV_CAP_FLAG2_SYS_EQS then during this path the function alloc
icm is called twice without freeing the structures from the first time.
Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ed3d2276ef72be23c6367358d80004130d8c797d)
Li, Liang Z [Fri, 26 Jun 2015 23:17:26 +0000 (07:17 +0800)]
xen-netfront: Remove the meaningless code
The function netif_set_real_num_tx_queues() will return -EINVAL if
the second parameter < 1, so call this function with the second
parameter set to 0 is meaningless.
Signed-off-by: Liang Li <liang.z.li@intel.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905726c1c5a3ca620ba7d73c78eddfb91de5ce28) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Julien Grall [Tue, 16 Jun 2015 19:10:46 +0000 (20:10 +0100)]
net/xen-netfront: Correct printf format in xennet_get_responses
rx->status is an int16_t, print it using %d rather than %u in order to
have a meaningful value when the field is negative.
Also use %u rather than %x for rx->offset.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c10127d91bdc80e02938085d03c232ae3118ad5) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 493be55ac3d81f9c32832237288eb397a9993d5d) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Julien Grall [Mon, 10 Aug 2015 18:10:38 +0000 (19:10 +0100)]
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages() in xenbus_unmap_ring_vfree_hvm(), leaking a
page for every shared ring.
Only with backends running in HVM domains were affected.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Cc: <stable@vger.kernel.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c22fe519e7e2b94ad173e0ea3b89c1a7d8be8d00) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
This was causing a WARNING whenever a PIRQ was closed since
shutdown_pirq() is called with irqs disabled.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org>
(cherry picked from commit ad6cd7bafcd2c812ba4200d5938e07304f1e2fcd) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jason A. Donenfeld [Mon, 10 Aug 2015 13:40:27 +0000 (15:40 +0200)]
x86/xen: build "Xen PV" APIC driver for domU as well
It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()
Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fc5fee86bdd3d720e2d1d324e4fae0c35845fa63) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Ross Lagerwall [Fri, 31 Jul 2015 13:30:42 +0000 (14:30 +0100)]
xen/events/fifo: Handle linked events when closing a port
An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue. If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.
When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.
If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.
This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.
Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().
Instead of open coding it, just use cond_resched().
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 929423fa83e5b75e94101b280738b9a5a376a0e1) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Marek Marczykowski-GĂ³recki [Fri, 26 Jun 2015 01:28:24 +0000 (03:28 +0200)]
xen/gntdevt: Fix race condition in gntdev_release()
While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.
Signed-off-by: Marek Marczykowski-GĂ³recki <marmarek@invisiblethingslab.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Julien Grall [Wed, 17 Jun 2015 14:28:07 +0000 (15:28 +0100)]
block/xen-blkfront: Remove invalid comment
Since commit b764915 "xen-blkfront: use a different scatterlist for each
request", biovec has been replaced by scatterlist when copying back the
data during a completion request.
Julien Grall [Wed, 17 Jun 2015 14:28:04 +0000 (15:28 +0100)]
xen/grant-table: Remove unused macro SPP
SPP was used by the grant table v2 code which has been removed in
commit 438b33c7145ca8a5131a30c36d8f59bce119a19a "xen/grant-table:
remove support for V2 tables".
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 548f7c94759ac58d4744ef2663e2a66a106e21c5) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Julien Grall [Wed, 17 Jun 2015 14:28:03 +0000 (15:28 +0100)]
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
virt_to_mfn should take a void* rather an unsigned long. While it
doesn't really matter now, it would throw a compiler warning later when
virt_to_mfn will enforce the type.
At the same time, avoid to compute new virtual address every time in the
loop and directly increment the parameter as we don't use it later.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c9fd55eb6625ead6a1207e7da38026ff47c5198b) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Luis R. Rodriguez [Wed, 20 May 2015 18:53:39 +0000 (11:53 -0700)]
kconfig: add xenconfig defconfig helper
This lets you build a kernel which can support xen dom0
or xen guests on i386, x86-64 and arm64 by just using:
make xenconfig
You can start from an allnoconfig and then switch to xenconfig.
This also splits out the options which are available currently
to be built with x86 and 'make ARCH=arm64' under a shared config.
Technically xen supports a dom0 kernel and also a guest
kernel configuration but upon review with the xen team
since we don't have many dom0 options its best to just
combine these two into one.
A few generic notes: we enable both of these:
CONFIG_INET=y
CONFIG_BINFMT_ELF=y
although technically not required given you likely will
end up with a pretty useless system otherwise.
A few architectural differences worth noting:
$ make allnoconfig; make xenconfig > /dev/null ; \
grep XEN .config > 64-bit-config
$ make ARCH=i386 allnoconfig; make ARCH=i386 xenconfig > /dev/null; \
grep XEN .config > 32-bit-config
$ make ARCH=arm64 allnoconfig; make ARCH=arm64 xenconfig > /dev/null; \
grep XEN .config > arm64-config
Since the options are already split up with a generic config and
architecture specific configs you anything on the x86 configs
are known to only work right now on x86. For instance arm64 doesn't
support MEMORY_HOTPLUG yet as such although we try to enabe it
generically arm64 doesn't have it yet, so we leave the xen
specific kconfig option XEN_BALLOON_MEMORY_HOTPLUG on x86's config
file to set expecations correctly.
Then on x86 we have differences between i386 and x86-64. The difference
between 64-bit-config and 32-bit-config is you don't get XEN_MCE_LOG as
this is only supported on 64-bit. You also do not get on i386
XEN_BALLOON_MEMORY_HOTPLUG, there does not seem to be any technical
reasons to not allow this but I gave up after a few attempts.
Cc: Josh Triplett <josh@joshtriplett.org> Cc: Borislav Petkov <bp@suse.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: penberg@kernel.org Cc: levinsasha928@gmail.com Cc: mtosatti@redhat.com Cc: fengguang.wu@intel.com Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Acked-by: Michal Marek <mmarek@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c6685055a285de53f18fbf6611687291b57ccd6) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Tina Ruchandani [Tue, 19 May 2015 06:08:09 +0000 (11:38 +0530)]
xen/pcifront: Remove usage of struct timeval
struct timeval uses a 32-bit field for representing seconds, which
will overflow in the year 2038 and beyond. Replace struct timeval with
64-bit ktime_t which is 2038 safe. This is part of a larger effort to
remove instances of 32-bit timekeeping variables (timeval, time_t and
timespec) from the kernel.
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e1d5bbcdc7ca08d8731f5d780f0de342a768d96a) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jan Beulich [Thu, 28 May 2015 12:04:33 +0000 (13:04 +0100)]
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 01b720f3295b6c1b2dcfc1affd0fedc6f5d28c1e) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jan Beulich [Thu, 28 May 2015 08:28:22 +0000 (09:28 +0100)]
hvc_xen: avoid uninitialized variable warning
Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c4ace5daf4ff726402b13f1ababf2ad0e0ceec65) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jan Beulich [Thu, 28 May 2015 08:26:37 +0000 (09:26 +0100)]
xenbus: avoid uninitialized variable warning
Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 76ea3cb428c5dc4fd5a415a7651026caf198fb3d) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Ard Biesheuvel [Wed, 6 May 2015 14:14:22 +0000 (14:14 +0000)]
xen/arm: allow console=hvc0 to be omitted for guests
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This patch registers hvc0 as the preferred console if no console
has been specified explicitly on the kernel command line.
The purpose is to allow platform agnostic kernels and boot images
(such as distro installers) to boot in a Xen/ARM domU without the
need to modify the command line by hand.
Stefano Stabellini [Wed, 6 May 2015 14:13:31 +0000 (14:13 +0000)]
arm,arm64/xen: move Xen initialization earlier
Currently, Xen is initialized/discovered in an initcall. This doesn't
allow us to support earlyprintk or choosing the preferred console when
running on Xen.
The current function xen_guest_init is now split in 2 parts:
- xen_early_init: Check if there is a Xen node in the device tree
and setup domain type
- xen_guest_init: Retrieve the information from the device node and
initialize Xen (grant table, shared page...)
The former is called in setup_arch, while the latter is an initcall.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 5882bfef6327093bff63569be19795170ff71e5f) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Bob Liu [Wed, 22 Jul 2015 06:40:09 +0000 (14:40 +0800)]
xen-blkfront: don't add indirect pages to list when !feature_persistent
We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.
When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.
There is a bug when migrate from !feature-persistent host to feature-persistent
host, because domU still thinks new host/backend doesn't support persistent.
Dmesg like:
backed has not unmapped grant: 839
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 839
The fix is to recheck feature-persistent of new backend in blkif_recover().
See: https://lkml.org/lkml/2015/5/25/469
As Roger suggested, we can split the part of blkfront_connect that checks for
optional features, like persistent grants, indirect descriptors and
flush/barrier features to a separate function and call it from both
blkfront_connect and blkif_recover
Bob Liu [Fri, 19 Jun 2015 04:23:00 +0000 (00:23 -0400)]
drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising
Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b
"drivers: xen-blkback: delay pending_req allocation to connect_ring"
exposed an problem that Xen blkfront has. There is a race
with XenStored and the drivers such that we can see two:
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.
state changes to XenbusStateInitWait ('2'). The end result is that
blkback_changed() receives two notify and calls twice setup_blkring().
While the backend driver may only get the first setup_blkring() which is
wrong and reads out-dated (or reads them as they are being updated
with new ring-ref values).
The end result is that the ring ends up being incorrectly set.
The other drivers in the tree have such checks already in.
Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a9b54bb95176cd27f952cd9647849022c4c998d6) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Bob Liu [Wed, 3 Jun 2015 05:40:03 +0000 (13:40 +0800)]
xen/block: add multi-page ring support
Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one pages as the request ring between blkfront
and backend.
As a result, the performance can get improved significantly.
We got some impressive improvements on our highend iscsi storage cluster
backend. If using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and above doubled for the latency testing.
The reason was the limit on outstanding requests is 32 if use only one-page
ring, but in our case the iscsi lun was spread across about 100 physical
drives, 32 was really not enough to keep them busy.
Changes in v2:
- Rebased to 4.0-rc6.
- Document on how multi-page ring feature working to linux io/blkif.h.
Changes in v3:
- Remove changes to linux io/blkif.h and follow the protocol defined
in io/blkif.h of XEN tree.
- Rebased to 4.1-rc3
Changes in v4:
- Turn to use 'ring-page-order' and 'max-ring-page-order'.
- A few comments from Roger.
Changes in v5:
- Clarify with 4k granularity to comment
- Address more comments from Roger
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 86839c56dee28c315a4c19b7bfee450ccd84cd25) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Bob Liu [Wed, 3 Jun 2015 05:40:02 +0000 (13:40 +0800)]
driver: xen-blkfront: move talk_to_blkback to a more suitable place
The major responsibility of talk_to_blkback() is allocate and initialize
the request ring and write the ring info to xenstore.
But this work should be done after backend entered 'XenbusStateInitWait' as
defined in the protocol file.
See xen/include/public/io/blkif.h in XEN git tree:
Front Back
================================= =====================================
XenbusStateInitialising XenbusStateInitialising
o Query virtual device o Query backend device identification
properties. data.
o Setup OS device instance. o Open and validate backend device.
o Publish backend features and
transport parameters.
|
|
V
XenbusStateInitWait
o Query backend features and
transport parameters.
o Allocate and initialize the
request ring.
There is no problem with this yet, but it is an violation of the design and
furthermore it would not allow frontend/backend to negotiate 'multi-page'
and 'multi-queue' features.
Changes in v2:
- Re-write the commit message to be more clear.
Bob Liu [Wed, 3 Jun 2015 05:40:01 +0000 (13:40 +0800)]
drivers: xen-blkback: delay pending_req allocation to connect_ring
This is a pre-patch for multi-page ring feature.
In connect_ring, we can know exactly how many pages are used for the shared
ring, delay pending_req allocation here so that we won't waste too much memory.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b) Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Santosh Shilimkar [Tue, 11 Aug 2015 16:57:56 +0000 (09:57 -0700)]
Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: build: Update the base release to 5 with stable v4.1.45
uek-rpm: onfig: enable some secure boot features
Santosh Shilimkar [Mon, 10 Aug 2015 20:52:59 +0000 (13:52 -0700)]
Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek: (124 commits)
Linux 4.1.5
perf symbols: Store if there is a filter in place
xfs: remote attributes need to be considered data
xfs: remote attribute headers contain an invalid LSN
drm/nouveau/drm/nv04-nv40/instmem: protect access to priv->heap by mutex
drm/nouveau: hold mutex when calling nouveau_abi16_fini()
drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
drm/nouveau/fbcon/nv11-: correctly account for ring space usage
qla2xxx: kill sessions/log out initiator on RSCN and port down events
qla2xxx: fix command initialization in target mode.
qla2xxx: Remove msleep in qlt_send_term_exchange
qla2xxx: release request queue reservation.
qla2xxx: Fix hardware lock/unlock issue causing kernel panic.
intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
iscsi-target: Fix iser explicit logout TX kthread leak
iscsi-target: Fix iscsit_start_kthreads failure OOPs
iscsi-target: Fix use-after-free during TPG session shutdown
IB/ipoib: Fix CONFIG_INFINIBAND_IPOIB_CM
NFS: Fix a memory leak in nfs_do_recoalesce
NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
...
Santosh Shilimkar [Mon, 10 Aug 2015 20:46:40 +0000 (13:46 -0700)]
Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
efi: Disable secure boot if shim is in insecure mode
hibernate: Disable in a signed modules environment
efi: Add EFI_SECURE_BOOT bit
Add option to automatically set securelevel when in Secure Boot mode
asus-wmi: Restrict debugfs interface when securelevel is set
x86: Restrict MSR access when securelevel is set
uswsusp: Disable when securelevel is set
kexec: Disable at runtime if securelevel has been set.
acpi: Ignore acpi_rsdp kernel parameter when securelevel is set
acpi: Limit access to custom_method if securelevel is set
Restrict /dev/mem and /dev/kmem when securelevel is set.
x86: Lock down IO port access when securelevel is enabled
PCI: Lock down BAR access when securelevel is enabled
Enforce module signatures when securelevel is greater than 0
Add BSD-style securelevel support
MODSIGN: Support not importing certs from db
MODSIGN: Import certificates from UEFI Secure Boot
MODSIGN: Add module certificate blacklist keyring
Add an EFI signature blob parser and key loader.
Add EFI signature data types
When setting yup the symbols library we setup several filter lists,
for dsos, comms, symbols, etc, and there is code that, if there are
filters, do certain operations, like recalculate the number of non
filtered histogram entries in the top/report TUI.
But they were considering just the "Zoom" filters, when they need to
take into account as well the above mentioned filters (perf top --comms,
--dsos, etc).
So store in symbol_conf.has_filter true if any of those filters is in
place.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-f5edfmhq69vfvs1kmikq1wep@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andre Tomt <lkml@tomt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't log remote attribute contents, and instead write them
synchronously before we commit the block allocation and attribute
tree update transaction. As a result we are writing to the allocated
space before the allcoation has been made permanent.
As a result, we cannot consider this allocation to be a metadata
allocation. Metadata allocation can take blocks from the free list
and so reuse them before the transaction that freed the block is
committed to disk. This behaviour is perfectly fine for journalled
metadata changes as log recovery will ensure the free operation is
replayed before the overwrite, but for remote attribute writes this
is not the case.
Hence we have to consider the remote attribute blocks to contain
data and allocate accordingly. We do this by dropping the
XFS_BMAPI_METADATA flag from the block allocation. This means the
allocation will not use blocks that are on the busy list without
first ensuring that the freeing transaction has been committed to
disk and the blocks removed from the busy list. This ensures we will
never overwrite a freed block without first ensuring that it is
really free.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On examination of the log via xfs_logprint, none of the symlink
buffers in the log had a bad magic number, nor were any other types
of buffer log format headers mis-identified as symlink buffers.
Tracing was used to find the buffer the kernel was tripping over,
and xfs_db identified it's contents as:
This is a remote attribute buffer, which are notable in that they
are not logged but are instead written synchronously by the remote
attribute code so that they exist on disk before the attribute
transactions are committed to the journal.
The above remote attribute block has an invalid LSN in it - cycle
0xd002000, block 0 - which means when log recovery comes along to
determine if the transaction that writes to the underlying block
should be replayed, it sees a block that has a future LSN and so
does not replay the buffer data in the transaction. Instead, it
validates the buffer magic number and attaches the buffer verifier
to it. It is this buffer magic number check that is failing in the
above assert, indicating that we skipped replay due to the LSN of
the underlying buffer.
The problem here is that the remote attribute buffers cannot have a
valid LSN placed into them, because the transaction that contains
the attribute tree pointer changes and the block allocation that the
attribute data is being written to hasn't yet been committed. Hence
the LSN field in the attribute block is completely unwritten,
thereby leaving the underlying contents of the block in the LSN
field. It could have any value, and hence a future overwrite of the
block by log recovery may or may not work correctly.
Fix this by always writing an invalid LSN to the remote attribute
block, as any buffer in log recovery that needs to write over the
remote attribute should occur. We are protected from having old data
written over the attribute by the fact that freeing the block before
the remote attribute is written will result in the buffer being
marked stale in the log and so all changes prior to the buffer stale
transaction will be cancelled by log recovery.
Hence it is safe to ignore the LSN in the case or synchronously
written, unlogged metadata such as remote attribute blocks, and to
ensure we do that correctly, we need to write an invalid LSN to all
remote attribute blocks to trigger immediate recovery of metadata
that is written over the top.
As a further protection for filesystems that may already have remote
attribute blocks with bad LSNs on disk, change the log recovery code
to always trigger immediate recovery of metadata over remote
attribute blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The RING_SPACE macro accounts how much space is used up so it's
important to ask it for the right amount. Incorrect accounting of this
can cause page faults down the line as writes are attempted outside of
the ring.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To fix some issues talking to ESX, this patch modifies the qla2xxx driver
so that it never logs into remote ports. This has the side effect of
getting rid of the "rports" entirely, which means we never log out of
initiators and never tear down sessions when an initiator goes away.
This is mostly OK, except that we can run into trouble if we have
initiator A assigned FC address X:Y:Z by the fabric talking to us, and
then initiator A goes away. Some time (could be a long time) later,
initiator B comes along and also gets FC address X:Y:Z (which is
available again, because initiator A is gone). If initiator B starts
talking to us, then we'll still have the session for initiator A, and
since we look up incoming IO based on the FC address X:Y:Z, initiator B
will end up using ACLs for initiator A.
Fix this by:
1. Handling RSCN events somewhat differently; instead of completely
skipping the processing of fcports, we look through the list, and if
an fcport disappears, we tell the target code the tear down the
session and tell the HBA FW to release the N_Port handle.
2. Handling "port down" events by flushing all of our sessions. The
firmware was already releasing the N_Port handle but we want the
target code to drop all the sessions too.
Request IOCB queue element(s) is reserved during
good path IO. Under error condition such as unable
to allocate IOCB handle condition, the IOCB count
that was reserved is not released.
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scaling for Knights Landing is same as the default scaling (100000).
When Knigts Landing support was added to the pstate driver, this
parameter was omitted resulting in a kernel panic during boot.
This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where an explicit iser-target logout would result
in ->tx_thread_active being incorrectly cleared by the logout post
handler, and subsequent TX kthread leak:
iscsi-target: Convert iscsi_thread_set usage to kthread.h
To address this bug, change iscsit_logout_post_handler_closesession()
and iscsit_logout_post_handler_samecid() to only cmpxchg() on
->tx_thread_active for traditional iscsi/tcp connections.
This is required because iscsi/tcp connections are invoking logout
post handler logic directly from TX kthread context, while iser
connections are invoking logout post handler logic from a seperate
workqueue context.
This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where a iscsit_start_kthreads() failure triggers
a NULL pointer dereference OOPs:
iscsi-target: Convert iscsi_thread_set usage to kthread.h
To address this bug, move iscsit_start_kthreads() immediately
preceeding the transmit of last login response, before signaling
a successful transition into full-feature-phase within existing
iscsi_target_do_tx_login_io() logic.
This ensures that no target-side resource allocation failures can
occur after the final login response has been successfully sent.
Also, it adds a iscsi_conn->rx_login_comp to allow the RX thread
to sleep to prevent other socket related failures until the final
iscsi_post_login_handler() call is able to complete.
This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.
The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.
To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.
As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.
If the above is turned off then ipoib_cm_dev_init unconditionally
returns ENOSYS, and the newly added error handling in
0b3957 prevents ipoib from coming up at all:
kernel: mlx4_0: ipoib_transport_dev_init failed
kernel: mlx4_0: failed to initialize port 1 (ret = -12)
Fixes: 0b39578bcde4 (IB/ipoib: Use dedicated workqueues per interface) Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().
Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.
The CQM task events are not safe to be called from within interrupt
context because they require performing an IPI to read the counter value
on all sockets. And performing IPIs from within IRQ context is a
"no-no".
Make do with the last read counter value currently event in
event->count when we're invoked in this context.
Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c86c7ca7606
Author: Namhyung Kim <namhyung@kernel.org>
Date: Mon Mar 17 18:18:54 2014 -0300
perf report: Merge al->filtered with hist_entry->filtered
We stopped dropping samples for things filtered via the --comms, --dsos,
--symbols, etc, i.e. things marked as filtered in the symbol resolution
routines (thread__find_addr_map(), perf_event__preprocess_sample(),
etc).
perf top/tui: Update nr_entries properly after a filter is applied
We don't take into account entries that were filtered in
perf_event__preprocess_sample() and friends, which leads to
inconsistency in the browser seek routines, that expects the number of
hist_entry->filtered entries to match what it thinks is the number of
unfiltered, browsable entries.
So, for instance, when we do:
perf top --symbols ___non_existent_symbol___
the hist_browser__nr_entries() routine thinks there are no filters in
place, uses the hists->nr_entries but all entries are filtered, leading
to a segfault.
Tested with:
perf top --symbols malloc,free --percentage=relative
Freezing, by pressing 'f', at any time and doing the math on the
percentages ends up with 100%, ditto for:
perf top --dsos libpthread-2.20.so,libxul.so --percentage=relative
Both were segfaulting, all fixed now.
More work needed to do away with checking if filters are in place, we
should just use the nr_non_filtered_samples counter, no need to
conditionally use it or hists.nr_filter, as what the browser does is
just show unfiltered stuff. An audit of how it is being accounted is
needed, this is the minimal fix.
Reported-by: Michael Petlan <mpetlan@redhat.com> Fixes: 268397cb2a47 ("perf top/tui: Update nr_entries properly after a filter is applied") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-6w01d5q97qk0d64kuojme5in@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is reasonable to set default timeout of request as 30 seconds instead of
30000 ticks, which may be 300 seconds if HZ is 100, for example, some arm64
based systems may choose 100 HZ.
Signed-off-by: Ming Lei <ming.lei@canonical.com> Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()" Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When handling signalling char, claim the termios write lock before
signalling waiting readers and writers to prevent further i/o
before flushing the echo and output buffers. This prevents a
userspace signal handler which may output from racing the terminal
flush.
Reference: Bugzilla #99351 ("Output truncated in ssh session after...") Fixes: commit d2b6f44779d3 ("n_tty: Fix signal handling flushes") Reported-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
ARCompact/ARCv2 ISA provide that any instructions which deals with
bitpos/count operand ASL, LSL, BSET, BCLR, BMSK .... will only consider
lower 5 bits. i.e. auto-clamp the pos to 0-31.
ARC Linux bitops exploited this fact by NOT explicitly masking out upper
bits for @nr operand in general, saving a bunch of AND/BMSK instructions
in generated code around bitops.
While this micro-optimization has worked well over years it is NOT safe
as shifting a number with a value, greater than native size is
"undefined" per "C" spec.
So as it turns outm EZChip ran into this eventually, in their massive
muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
from 63 to 0 and gcc was weirdly optimizing away the first iteration
(so it was really adhering to standard by implementing undefined behaviour
vs. removing all the iterations which were phony i.e. (1 << [63..32])
| for i = 63 to 0
| X = ( 1 << i )
| if X == 0
| continue
So fix the code to do the explicit masking at the expense of generating
additional instructions. Fortunately, this can be mitigated to a large
extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold
masking into shift operation itself. It is currently not enabled in ARC
gcc backend, but could be done after a bit of testing.
Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:
This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.
Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At boot, the UTF-16 UEFI vendor string is copied from the system
table into a char array with a size of 100 bytes. However, this
size of 100 bytes is also used for memremapping() the source,
which may not be sufficient if the vendor string exceeds 50
UTF-16 characters, and the placement of the vendor string inside
a 4 KB page happens to leave the end unmapped.
So use the correct '100 * sizeof(efi_char16_t)' for the size of
the mapping.
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
flush_tlb_info->flush_start/end are both normal virtual
addresses. When calculating 'nr_pages' (only used for the
tracepoint), I neglected to put parenthesis in.
Thanks to David Koufaty for pointing this out.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chrdev_open() increases reference counter on cdev->owner. Instead of
assigning the owner to mei subsystem, the owner has to be set to the
underlaying HW module (mei_me or mei_txe), so once the device is opened
the HW module cannot be unloaded.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Port link change with port in resume state should not be
reported to usbcore, as this is an internal state to be
handled by xhci driver. Reporting PLC to usbcore may
cause usbcore clearing PLC first and port change event irq
won't be generated.
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the link is just waken, it's in Resume state, and driver sets PLS to
U0. This refers to Phase 1. Phase 2 refers to when the link has completed
the transition from Resume state to U0.
With the fix of xhci: report U3 when link is in resume state, it also
exposes an issue that usb3 roothub and controller can suspend right
after phase 1, and this causes a hard hang in controller.
To fix the issue, we need to prevent usb3 bus suspend if any port is
resuming in phase 1.
[merge separate USB2 and USB3 port resume checking to one -Mathias] Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xhci_hub_report_usb3_link_state() returns pls as U0 when the link
is in resume state, and this causes usb core to think the link is in
U0 while actually it's in resume state. When usb core transfers
control request on the link, it fails with TRB error as the link
is not ready for transfer.
To fix the issue, report U3 when the link is in resume state, thus
usb core knows the link it's not ready for transfer.
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.
This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.
The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.