www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
OVMAPI: port ovmapi.ko to UEK4 from UEK3

OVMAPI: port ovmapi.ko to UEK4 from UEK3

Orabug: 20426111

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
  dtrace: ensure SDT module probes work with NORX
  dtrace: prevent the stack protector from breaking syscall tracing.
  kallsyms: make it possible to disable /proc/kallmodsyms

dtrace: ensure SDT module probes work with NORX

When the CONFIG_DEBUG_SET_MODULE_RONX option is enabled in the kernel,
the executable code of the module is marked read-only prior to calling
the module notifier. This causes the patching of NOPs in the probe
locations to fail with a system crash.

Because patching of SDT probe locations should happen upon module load
for DTrace-enabled kernels regardless of whether or not DTrace will be
used, it is both cleaner and more correct to make a direct call to the
code that handles the patching rather than using the notifier mechanism.
The new call takes place prior to marking the pages read-only, avoiding
the problem altogether.

Orabug: 21630297
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Nick Alcock <nick.alcock@oracle.com>

Merge branch 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/xen' of git://ca-git.us.oracle.com/linux-uek: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
  mlx4_vnic: Skip fip discover restart if pkey index not changed
  rds: rds_ib_device.refcount overflow
  rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
  rds_rdma: allocate FMR according to max_item_soft
  rds_rdma: do not dealloc fmrs in the pool under use
  rds: set fmr pool dirty_count correctly

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
intel_pstate: append more Oracle OEM table id to vendor bypass list

Merge branch 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public into topic/uek-4.1/xen

* 'uek-4.1/xen' of git://ca-git.us.oracle.com/linux-konrad-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
  mlx4_vnic: Skip fip discover restart if pkey index not changed
  rds: rds_ib_device.refcount overflow
  rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
  rds_rdma: allocate FMR according to max_item_soft
  rds_rdma: do not dealloc fmrs in the pool under use
  rds: set fmr pool dirty_count correctly

dtrace: prevent the stack protector from breaking syscall tracing.

The systrace_syscall() function is unusual in that it requires %rax to be
conserved in the function prologue (until the volatile asm which collects the
syscall number from it and sticks it in a local variable). GCC doesn't know
about this, and recent GCC has started smashing it with the stack protector
prologue. Fix this by turning off stack protection in this one function (which
does not benefit from it anyway -- it contains only two assignments, neither of
which can overrun -- and is a notable hot spot).

Also declare it asmlinkage, like every other syscall already is: it is called
from asm, just like them.

Orabug: 21630345
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Acked-by: Kris Van Hees <kris.van.hees@oracle.com>

intel_pstate: append more Oracle OEM table id to vendor bypass list

Orabug: 21447179

Append more Oracle X86 servers that have their own power management,

SUN FIRE X4275 M3
SUN FIRE X4170 M3
and
SUN FIRE X6-2

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
mlx4_vnic: Skip fip discover restart if pkey index not changed

mlx4_vnic: Skip fip discover restart if pkey index not changed

Driver receives MAD on any change made to partition table.
This fix aim to cover the case where driver shouldn't restart net interface
when receiving PKEY_CHANGE event but pkey index was not changed.

Ported from UEK3 commit: 7f56a9b2252e732b0d9e2621303b0c2b781bddfc

Orabug: 21446728

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>

Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  rds: rds_ib_device.refcount overflow
  rds_rdma: rds_sendmsg should return EAGAIN if connection not setup
  rds_rdma: allocate FMR according to max_item_soft
  rds_rdma: do not dealloc fmrs in the pool under use
  rds: set fmr pool dirty_count correctly

rds: rds_ib_device.refcount overflow

Fixes:
  3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

There is a missing dropping of refcount on rds_ib_device.refcount
in case rds_ib_alloc_fmr() failed(mr pool running out). This lead to
the refcount overflow.

A BUG_ON on line 117(see following) is triggered.
From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544
and rds_ibdev->refcount is -2147475448.

That is the evidence the mr pool is used up. So rds_ib_alloc_fmr() is
very likely to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

The fix is to drop refcount when rds_ib_alloc_fmr() failed.

upstream commit: 4fabb59449aa44a585b3603ffdadd4c5f4d0c033

Orabug: 21534438

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Acked-by: Wei Xu <wei.xu.xu@oracle.com>
Acked-by: Zheng Li <zheng.x.li@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

rds_rdma: rds_sendmsg should return EAGAIN if connection not setup

In rds_sendmsg(), in the case RDS_CMSG_RDMA_MAP is requested and
rds_cmsg_send() is called, a "struct rds_mr" needs to be created.
For creating the "struct rds_mr", the connection needs to be
established(ready) for rds_ib_transport. Otherwise, __rds_rdma_map()
would fail because it can't find the right rds_ib_device (which is
associated with the ip address matching rds_sock's bound ip address).
The ip address is set at the completion of the rdma connection.

But actually in code, the connecting is triggered after the call of call
rds_cmsg_send() so rds_cmsg_send() would fail with -NODEV.

The fix is to move the trigger of connection before calling
rds_cmsg_send() and return -EAGAIN in case connection is not
ready yet when we are calling rds_cmsg_send().

Orabug: 21551474

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>

rds_rdma: allocate FMR according to max_item_soft

When max_item on the pool is very large(say 170000), large number of
alloc_fmr request would fail with -EGAIN. This can greatly hurt
performance.

Actually, whether it is going to allocate or wait for pool flush
should be according to "max_item_soft" rather than "max_item" because
we are resizing the pool by changing "max_item_soft" not max_item.

Also, when successfully allocated a lower layer fmr, we should increase
"max_item_soft" incrementally, not set it immediately to "max_item".

Orabug: 21551548

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>

rds_rdma: do not dealloc fmrs in the pool under use

In rds_ib_alloc_fmr, when it needs flush pools, it de-allocates FMRs.
That is time-consuming and is not meaningful in functionality perspective.

Fix is to not de-allocate the ones in the pool which is under use.

Orabug: 21551548

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>

rds: set fmr pool dirty_count correctly

In rds_ib_flush_mr_pool() the setting of dirty_count is wrong in case
"free_all" is true -- "clean" ones also counted as dirty. dirty_count
being negative is seen in vmcore.

Orabug: 21551548

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>

Merge branch 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public into uek-4.1/xen

* 'backports-for-konrad' of git://ca-git.us.oracle.com/linux-eufimtse-public: (31 commits)
  xen-netfront: Remove the meaningless code
  net/xen-netfront: Correct printf format in xennet_get_responses
  xen-netfront: Use setup_timer
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  ...

Backports from Linux 4.1

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek:
net/mlx4_core: need to call close fw if alloc icm is called twice

net/mlx4_core: need to call close fw if alloc icm is called twice

If mlx4_enable_sriov is called by adapter without this
feature MLX4_DEV_CAP_FLAG2_SYS_EQS then during this path the function alloc
icm is called twice without freeing the structures from the first time.

Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ed3d2276ef72be23c6367358d80004130d8c797d)

Orabug: 21606315

Acked-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>

xen-netfront: Remove the meaningless code

The function netif_set_real_num_tx_queues() will return -EINVAL if
the second parameter < 1, so call this function with the second
parameter set to 0 is meaningless.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905726c1c5a3ca620ba7d73c78eddfb91de5ce28)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

net/xen-netfront: Correct printf format in xennet_get_responses

rx->status is an int16_t, print it using %d rather than %u in order to
have a meaningful value when the field is negative.

Also use %u rather than %x for rx->offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c10127d91bdc80e02938085d03c232ae3118ad5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-netfront: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e, func, da;
@@

-init_timer (&e);
+setup_timer (&e, func, da);
-e.data = da;
-e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 493be55ac3d81f9c32832237288eb397a9993d5d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/xenbus: Don't leak memory when unmapping the ring on HVM backend

The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages() in xenbus_unmap_ring_vfree_hvm(), leaking a
page for every shared ring.

Only with backends running in HVM domains were affected.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c22fe519e7e2b94ad173e0ea3b89c1a7d8be8d00)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Revert "xen/events/fifo: Handle linked events when closing a port"

This reverts commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8.

This was causing a WARNING whenever a PIRQ was closed since
shutdown_pirq() is called with irqs disabled.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
(cherry picked from commit ad6cd7bafcd2c812ba4200d5938e07304f1e2fcd)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

x86/xen: build "Xen PV" APIC driver for domU as well

It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
#7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8  0xffffffffc00013ea in ?? ()
#9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fc5fee86bdd3d720e2d1d324e4fae0c35845fa63)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/events/fifo: Handle linked events when closing a port

An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue. If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.

When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.

If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.

This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen: release lock occasionally during ballooning

When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 929423fa83e5b75e94101b280738b9a5a376a0e1)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/gntdevt: Fix race condition in gntdev_release()

While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

block/xen-blkback: s/nr_pages/nr_segs/

Make the code less confusing to read now that Linux may not have the
same page size as Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6684fa1cdb1ebe804e9707f389255d461b2e95b0)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

block/xen-blkfront: Remove invalid comment

Since commit b764915 "xen-blkfront: use a different scatterlist for each
request", biovec has been replaced by scatterlist when copying back the
data during a completion request.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit ee4b7179fc9bf19fd22f9d17757a86582c40229e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm/xen: Drop duplicate define mfn_to_virt

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 90d73c4f43630ca45398cee7f32f0fe51271b2ce)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/grant-table: Remove unused macro SPP

SPP was used by the grant table v2 code which has been removed in
commit 438b33c7145ca8a5131a30c36d8f59bce119a19a "xen/grant-table:
remove support for V2 tables".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 548f7c94759ac58d4744ef2663e2a66a106e21c5)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring

virt_to_mfn should take a void* rather an unsigned long. While it
doesn't really matter now, it would throw a compiler warning later when
virt_to_mfn will enforce the type.

At the same time, avoid to compute new virtual address every time in the
loop and directly increment the parameter as we don't use it later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c9fd55eb6625ead6a1207e7da38026ff47c5198b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen: Include xen/page.h rather than asm/xen/page.h

Using xen/page.h will be necessary later for using common xen page
helpers.

As xen/page.h already include asm/xen/page.h, always use the later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit a9fd60e2683fb80f5b26a7d686aebe3327a63e70)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

kconfig: add xenconfig defconfig helper

This lets you build a kernel which can support xen dom0
or xen guests on i386, x86-64 and arm64 by just using:

make xenconfig

You can start from an allnoconfig and then switch to xenconfig.
This also splits out the options which are available currently
to be built with x86 and 'make ARCH=arm64' under a shared config.

Technically xen supports a dom0 kernel and also a guest
kernel configuration but upon review with the xen team
since we don't have many dom0 options its best to just
combine these two into one.

A few generic notes: we enable both of these:

CONFIG_INET=y
CONFIG_BINFMT_ELF=y

although technically not required given you likely will
end up with a pretty useless system otherwise.

A few architectural differences worth noting:

$ make allnoconfig; make xenconfig > /dev/null ; \
grep XEN .config > 64-bit-config
$ make ARCH=i386 allnoconfig; make ARCH=i386 xenconfig > /dev/null; \
grep XEN .config > 32-bit-config
$ make ARCH=arm64 allnoconfig; make ARCH=arm64 xenconfig > /dev/null; \
grep XEN .config > arm64-config

Since the options are already split up with a generic config and
architecture specific configs you anything on the x86 configs
are known to only work right now on x86. For instance arm64 doesn't
support MEMORY_HOTPLUG yet as such although we try to enabe it
generically arm64 doesn't have it yet, so we leave the xen
specific kconfig option XEN_BALLOON_MEMORY_HOTPLUG on x86's config
file to set expecations correctly.

Then on x86 we have differences between i386 and x86-64. The difference
between 64-bit-config and 32-bit-config is you don't get XEN_MCE_LOG as
this is only supported on 64-bit. You also do not get on i386
XEN_BALLOON_MEMORY_HOTPLUG, there does not seem to be any technical
reasons to not allow this but I gave up after a few attempts.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 6c6685055a285de53f18fbf6611687291b57ccd6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

kconfig: clarify kvmconfig is for kvm

We'll be adding options for xen as well.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 9bcd776d299e8adb8ed37be0453472aa3cc2b7db)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/pcifront: Remove usage of struct timeval

struct timeval uses a 32-bit field for representing seconds, which
will overflow in the year 2038 and beyond. Replace struct timeval with
64-bit ktime_t which is 2038 safe. This is part of a larger effort to
remove instances of 32-bit timekeeping variables (timeval, time_t and
timespec) from the kernel.

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit e1d5bbcdc7ca08d8731f5d780f0de342a768d96a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 01b720f3295b6c1b2dcfc1affd0fedc6f5d28c1e)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

hvc_xen: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit c4ace5daf4ff726402b13f1ababf2ad0e0ceec65)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xenbus: avoid uninitialized variable warning

Older compilers don't recognize that "v" can't be used uninitialized;
other code using hvm_get_parameter() zeros the value too, so follow
suit here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
(cherry picked from commit 76ea3cb428c5dc4fd5a415a7651026caf198fb3d)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/arm: allow console=hvc0 to be omitted for guests

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This patch registers hvc0 as the preferred console if no console
has been specified explicitly on the kernel command line.

The purpose is to allow platform agnostic kernels and boot images
(such as distro installers) to boot in a Xen/ARM domU without the
need to modify the command line by hand.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f1dddd118c555508ce383b7262f4e6440927bdf4)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm,arm64/xen: move Xen initialization earlier

Currently, Xen is initialized/discovered in an initcall. This doesn't
allow us to support earlyprintk or choosing the preferred console when
running on Xen.

The current function xen_guest_init is now split in 2 parts:
    - xen_early_init: Check if there is a Xen node in the device tree
    and setup domain type
    - xen_guest_init: Retrieve the information from the device node and
    initialize Xen (grant table, shared page...)

The former is called in setup_arch, while the latter is an initcall.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 5882bfef6327093bff63569be19795170ff71e5f)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

arm/xen: Correctly check if the event channel interrupt is present

The function irq_of_parse_and_map returns 0 when the IRQ is not found.

Futhermore, move the check before notifying the user that we are running on
Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 81e863c3a28e69fd60411bde9f779b0f8ad0212a)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()

The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge
work haven't finished.

There is a work_pending() before this BUG_ON, but it doesn't account if the work
is still currently running.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 53bc7dc004fecf39e0ba70f2f8d120a1444315d3)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkfront: don't add indirect pages to list when !feature_persistent

We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.

When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen-blkfront: introduce blkfront_gather_backend_features()

There is a bug when migrate from !feature-persistent host to feature-persistent
host, because domU still thinks new host/backend doesn't support persistent.
Dmesg like:
backed has not unmapped grant: 839
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 773
backed has not unmapped grant: 839

The fix is to recheck feature-persistent of new backend in blkif_recover().
See: https://lkml.org/lkml/2015/5/25/469

As Roger suggested, we can split the part of blkfront_connect that checks for
optional features, like persistent grants, indirect descriptors and
flush/barrier features to a separate function and call it from both
blkfront_connect and blkif_recover

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d50babbe300eedf33ea5b00a12c5df3a05bd96c7)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising

Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b
"drivers: xen-blkback: delay pending_req allocation to connect_ring"
exposed an problem that Xen blkfront has. There is a race
with XenStored and the drivers such that we can see two:

vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.

state changes to XenbusStateInitWait ('2'). The end result is that
blkback_changed() receives two notify and calls twice setup_blkring().

While the backend driver may only get the first setup_blkring() which is
wrong and reads out-dated (or reads them as they are being updated
with new ring-ref values).

The end result is that the ring ends up being incorrectly set.

The other drivers in the tree have such checks already in.

Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit a9b54bb95176cd27f952cd9647849022c4c998d6)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

xen/block: add multi-page ring support

Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one pages as the request ring between blkfront
and backend.
As a result, the performance can get improved significantly.

We got some impressive improvements on our highend iscsi storage cluster
backend. If using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and above doubled for the latency testing.

The reason was the limit on outstanding requests is 32 if use only one-page
ring, but in our case the iscsi lun was spread across about 100 physical
drives, 32 was really not enough to keep them busy.

Changes in v2:
- Rebased to 4.0-rc6.
- Document on how multi-page ring feature working to linux io/blkif.h.

Changes in v3:
- Remove changes to linux io/blkif.h and follow the protocol defined
in io/blkif.h of XEN tree.
- Rebased to 4.1-rc3

Changes in v4:
- Turn to use 'ring-page-order' and 'max-ring-page-order'.
- A few comments from Roger.

Changes in v5:
- Clarify with 4k granularity to comment
- Address more comments from Roger

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 86839c56dee28c315a4c19b7bfee450ccd84cd25)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

driver: xen-blkfront: move talk_to_blkback to a more suitable place

The major responsibility of talk_to_blkback() is allocate and initialize
the request ring and write the ring info to xenstore.
But this work should be done after backend entered 'XenbusStateInitWait' as
defined in the protocol file.
See xen/include/public/io/blkif.h in XEN git tree:
Front                                Back
=================================    =====================================
XenbusStateInitialising              XenbusStateInitialising
o Query virtual device               o Query backend device identification
   properties.                          data.
o Setup OS device instance.          o Open and validate backend device.
                                      o Publish backend features and
                                        transport parameters.
                                                     |
                                                     |
                                                     V
                                     XenbusStateInitWait

o Query backend features and
  transport parameters.
o Allocate and initialize the
  request ring.

There is no problem with this yet, but it is an violation of the design and
furthermore it would not allow frontend/backend to negotiate 'multi-page'
and 'multi-queue' features.

Changes in v2:
- Re-write the commit message to be more clear.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 8ab0144a466320cc37c52e7866b5103c5bbd4e90)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

drivers: xen-blkback: delay pending_req allocation to connect_ring

This is a pre-patch for multi-page ring feature.
In connect_ring, we can know exactly how many pages are used for the shared
ring, delay pending_req allocation here so that we won't waste too much memory.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b)
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
uek-rpm: build: Update the base release to 5 with stable v4.1.45
uek-rpm: onfig: enable some secure boot features

uek-rpm: build: Update the base release to 5 with stable v4.1.45

Stable v4.1.5 is available so lets get that in. Update the
spec file accordingly.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek: (124 commits)
  Linux 4.1.5
  perf symbols: Store if there is a filter in place
  xfs: remote attributes need to be considered data
  xfs: remote attribute headers contain an invalid LSN
  drm/nouveau/drm/nv04-nv40/instmem: protect access to priv->heap by mutex
  drm/nouveau: hold mutex when calling nouveau_abi16_fini()
  drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
  drm/nouveau/fbcon/nv11-: correctly account for ring space usage
  qla2xxx: kill sessions/log out initiator on RSCN and port down events
  qla2xxx: fix command initialization in target mode.
  qla2xxx: Remove msleep in qlt_send_term_exchange
  qla2xxx: release request queue reservation.
  qla2xxx: Fix hardware lock/unlock issue causing kernel panic.
  intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
  iscsi-target: Fix iser explicit logout TX kthread leak
  iscsi-target: Fix iscsit_start_kthreads failure OOPs
  iscsi-target: Fix use-after-free during TPG session shutdown
  IB/ipoib: Fix CONFIG_INFINIBAND_IPOIB_CM
  NFS: Fix a memory leak in nfs_do_recoalesce
  NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
  ...

Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
  efi: Disable secure boot if shim is in insecure mode
  hibernate: Disable in a signed modules environment
  efi: Add EFI_SECURE_BOOT bit
  Add option to automatically set securelevel when in Secure Boot mode
  asus-wmi: Restrict debugfs interface when securelevel is set
  x86: Restrict MSR access when securelevel is set
  uswsusp: Disable when securelevel is set
  kexec: Disable at runtime if securelevel has been set.
  acpi: Ignore acpi_rsdp kernel parameter when securelevel is set
  acpi: Limit access to custom_method if securelevel is set
  Restrict /dev/mem and /dev/kmem when securelevel is set.
  x86: Lock down IO port access when securelevel is enabled
  PCI: Lock down BAR access when securelevel is enabled
  Enforce module signatures when securelevel is greater than 0
  Add BSD-style securelevel support
  MODSIGN: Support not importing certs from db
  MODSIGN: Import certificates from UEFI Secure Boot
  MODSIGN: Add module certificate blacklist keyring
  Add an EFI signature blob parser and key loader.
  Add EFI signature data types

Merge tag 'v4.1.5' of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into topic/uek-4.1/stable-cherry-picks

Linux 4.1.5

perf symbols: Store if there is a filter in place

commit 0bc2f2f7d080561cc484d2d0a162a9396bed3383 upstream.

When setting yup the symbols library we setup several filter lists,
for dsos, comms, symbols, etc, and there is code that, if there are
filters, do certain operations, like recalculate the number of non
filtered histogram entries in the top/report TUI.

But they were considering just the "Zoom" filters, when they need to
take into account as well the above mentioned filters (perf top --comms,
--dsos, etc).

So store in symbol_conf.has_filter true if any of those filters is in
place.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-f5edfmhq69vfvs1kmikq1wep@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andre Tomt <lkml@tomt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: remote attributes need to be considered data

commit df150ed102baa0e78c06e08e975dfb47147dd677 upstream.

We don't log remote attribute contents, and instead write them
synchronously before we commit the block allocation and attribute
tree update transaction. As a result we are writing to the allocated
space before the allcoation has been made permanent.

As a result, we cannot consider this allocation to be a metadata
allocation. Metadata allocation can take blocks from the free list
and so reuse them before the transaction that freed the block is
committed to disk. This behaviour is perfectly fine for journalled
metadata changes as log recovery will ensure the free operation is
replayed before the overwrite, but for remote attribute writes this
is not the case.

Hence we have to consider the remote attribute blocks to contain
data and allocate accordingly. We do this by dropping the
XFS_BMAPI_METADATA flag from the block allocation. This means the
allocation will not use blocks that are on the busy list without
first ensuring that the freeing transaction has been committed to
disk and the blocks removed from the busy list. This ensures we will
never overwrite a freed block without first ensuring that it is
really free.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: remote attribute headers contain an invalid LSN

commit e3c32ee9e3e747fec01eb38e6610a9157d44c3ea upstream.

In recent testing, a system that crashed failed log recovery on
restart with a bad symlink buffer magic number:

XFS (vda): Starting recovery (logdev: internal)
XFS (vda): Bad symlink block magic!
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060

On examination of the log via xfs_logprint, none of the symlink
buffers in the log had a bad magic number, nor were any other types
of buffer log format headers mis-identified as symlink buffers.
Tracing was used to find the buffer the kernel was tripping over,
and xfs_db identified it's contents as:

000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
.....

This is a remote attribute buffer, which are notable in that they
are not logged but are instead written synchronously by the remote
attribute code so that they exist on disk before the attribute
transactions are committed to the journal.

The above remote attribute block has an invalid LSN in it - cycle
0xd002000, block 0 - which means when log recovery comes along to
determine if the transaction that writes to the underlying block
should be replayed, it sees a block that has a future LSN and so
does not replay the buffer data in the transaction. Instead, it
validates the buffer magic number and attaches the buffer verifier
to it. It is this buffer magic number check that is failing in the
above assert, indicating that we skipped replay due to the LSN of
the underlying buffer.

The problem here is that the remote attribute buffers cannot have a
valid LSN placed into them, because the transaction that contains
the attribute tree pointer changes and the block allocation that the
attribute data is being written to hasn't yet been committed. Hence
the LSN field in the attribute block is completely unwritten,
thereby leaving the underlying contents of the block in the LSN
field. It could have any value, and hence a future overwrite of the
block by log recovery may or may not work correctly.

Fix this by always writing an invalid LSN to the remote attribute
block, as any buffer in log recovery that needs to write over the
remote attribute should occur. We are protected from having old data
written over the attribute by the fact that freeing the block before
the remote attribute is written will result in the buffer being
marked stale in the log and so all changes prior to the buffer stale
transaction will be cancelled by log recovery.

Hence it is safe to ignore the LSN in the case or synchronously
written, unlogged metadata such as remote attribute blocks, and to
ensure we do that correctly, we need to write an invalid LSN to all
remote attribute blocks to trigger immediate recovery of metadata
that is written over the top.

As a further protection for filesystems that may already have remote
attribute blocks with bad LSNs on disk, change the log recovery code
to always trigger immediate recovery of metadata over remote
attribute blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/drm/nv04-nv40/instmem: protect access to priv->heap by mutex

commit 7512223b1ece29a5968ed8b67ccb891d21b7834b upstream.

This fixes the list_del corruption reported
at <https://bugzilla.redhat.com/1205985>.

Signed-off-by: Kamil Dudka <kdudka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau: hold mutex when calling nouveau_abi16_fini()

commit ac8c79304280da6ef05c348a9da03ab04898b994 upstream.

This was the only access to cli->abi16 without holding the mutex.

Signed-off-by: Kamil Dudka <kdudka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads

commit 697bb728d9e2367020aa0c5af7363809d7658e43 upstream.

Userspace has started doing this, which upsets the display class hw
error checking in various unpleasant ways.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/fbcon/nv11-: correctly account for ring space usage

commit d108142c0840ce389cd9898aa76943b3fb430b83 upstream.

The RING_SPACE macro accounts how much space is used up so it's
important to ask it for the right amount. Incorrect accounting of this
can cause page faults down the line as writes are attempted outside of
the ring.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

qla2xxx: kill sessions/log out initiator on RSCN and port down events

commit b2032fd567326ad0b2d443bb6d96d2580ec670a5 upstream.

To fix some issues talking to ESX, this patch modifies the qla2xxx driver
so that it never logs into remote ports.  This has the side effect of
getting rid of the "rports" entirely, which means we never log out of
initiators and never tear down sessions when an initiator goes away.

This is mostly OK, except that we can run into trouble if we have
initiator A assigned FC address X:Y:Z by the fabric talking to us, and
then initiator A goes away.  Some time (could be a long time) later,
initiator B comes along and also gets FC address X:Y:Z (which is
available again, because initiator A is gone).  If initiator B starts
talking to us, then we'll still have the session for initiator A, and
since we look up incoming IO based on the FC address X:Y:Z, initiator B
will end up using ACLs for initiator A.

Fix this by:

1. Handling RSCN events somewhat differently; instead of completely
    skipping the processing of fcports, we look through the list, and if
    an fcport disappears, we tell the target code the tear down the
    session and tell the HBA FW to release the N_Port handle.

2. Handling "port down" events by flushing all of our sessions.  The
    firmware was already releasing the N_Port handle but we want the
    target code to drop all the sessions too.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Alexei Potashnik <alexei@purestorage.com>
Acked-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

qla2xxx: fix command initialization in target mode.

commit 9fce12540cb9f91e7f1f539a80b70f0b388bdae0 upstream.

Signed-off-by: Kanoj Sarcar <kanoj.sarcar@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

qla2xxx: Remove msleep in qlt_send_term_exchange

commit 6bc85dd595a5438b50ec085668e53ef26058bb90 upstream.

Remove unnecessary msleep from qlt_send_term_exchange as it
adds latency of 250 msec while sending terminate exchange to
an aborted task.

Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

qla2xxx: release request queue reservation.

commit 810e30bc4658e9c069577bde52394a5af872803c upstream.

Request IOCB queue element(s) is reserved during
good path IO. Under error condition such as unable
to allocate IOCB handle condition, the IOCB count
that was reserved is not released.

Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

qla2xxx: Fix hardware lock/unlock issue causing kernel panic.

commit ba9f6f64a0ff6b7ecaed72144c179061f8eca378 upstream.

[ Upstream commit ef86cb2059a14b4024c7320999ee58e938873032 ]

This patch fixes a kernel panic for qla2xxx Target core
Module driver introduced by a fix in the qla2xxx initiator code.

Commit ef86cb2 ("qla2xxx: Mark port lost when we receive an RSCN for it.")
introduced the regression for qla2xxx Target driver.

Stack trace will have following signature

--- <NMI exception stack> ---
[ffff88081faa3cc8] _raw_spin_lock_irqsave at ffffffff815b1f03
[ffff88081faa3cd0] qlt_fc_port_deleted at ffffffffa096ccd0 [qla2xxx]
[ffff88081faa3d20] qla2x00_schedule_rport_del at ffffffffa0913831[qla2xxx]
[ffff88081faa3d50] qla2x00_mark_device_lost at ffffffffa09159c5[qla2xxx]
[ffff88081faa3db0] qla2x00_async_event at ffffffffa0938d59 [qla2xxx]
[ffff88081faa3e30] qla24xx_msix_default at ffffffffa093a326 [qla2xxx]
[ffff88081faa3e90] handle_irq_event_percpu at ffffffff810a7b8d
[ffff88081faa3ee0] handle_irq_event at ffffffff810a7d32
[ffff88081faa3f10] handle_edge_irq at ffffffff810ab6b9
[ffff88081faa3f30] handle_irq at ffffffff8100619c
[ffff88081faa3f70] do_IRQ at ffffffff815b4b1c
--- <IRQ stack> ---

Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

intel_pstate: Add get_scaling cpu_defaults param to Knights Landing

commit 69cefc273f942bd7bb347a02e8b5b738d5f6e6f3 upstream.

Scaling for Knights Landing is same as the default scaling (100000).
When Knigts Landing support was added to the pstate driver, this
parameter was omitted resulting in a kernel panic during boot.

Fixes: b34ef932d79a (intel_pstate: Knights Landing support)
Reported-by: Yasuaki Ishimatsu <yishimat@redhat.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iscsi-target: Fix iser explicit logout TX kthread leak

commit 007d038bdf95ccfe2491d0078be54040d110fd06 upstream.

This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where an explicit iser-target logout would result
in ->tx_thread_active being incorrectly cleared by the logout post
handler, and subsequent TX kthread leak:

    commit 88dcd2dab5c23b1c9cfc396246d8f476c872f0ca
    Author: Nicholas Bellinger <nab@linux-iscsi.org>
    Date:   Thu Feb 26 22:19:15 2015 -0800

        iscsi-target: Convert iscsi_thread_set usage to kthread.h

To address this bug, change iscsit_logout_post_handler_closesession()
and iscsit_logout_post_handler_samecid() to only cmpxchg() on
->tx_thread_active for traditional iscsi/tcp connections.

This is required because iscsi/tcp connections are invoking logout
post handler logic directly from TX kthread context, while iser
connections are invoking logout post handler logic from a seperate
workqueue context.

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iscsi-target: Fix iscsit_start_kthreads failure OOPs

commit e54198657b65625085834847ab6271087323ffea upstream.

This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where a iscsit_start_kthreads() failure triggers
a NULL pointer dereference OOPs:

    commit 88dcd2dab5c23b1c9cfc396246d8f476c872f0ca
    Author: Nicholas Bellinger <nab@linux-iscsi.org>
    Date:   Thu Feb 26 22:19:15 2015 -0800

        iscsi-target: Convert iscsi_thread_set usage to kthread.h

To address this bug, move iscsit_start_kthreads() immediately
preceeding the transmit of last login response, before signaling
a successful transition into full-feature-phase within existing
iscsi_target_do_tx_login_io() logic.

This ensures that no target-side resource allocation failures can
occur after the final login response has been successfully sent.

Also, it adds a iscsi_conn->rx_login_comp to allow the RX thread
to sleep to prevent other socket related failures until the final
iscsi_post_login_handler() call is able to complete.

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iscsi-target: Fix use-after-free during TPG session shutdown

commit 417c20a9bdd1e876384127cf096d8ae8b559066c upstream.

This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.

The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.

To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.

As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.

Reported-by: Sunilkumar Nadumuttlu <sjn@datera.io>
Cc: Sunilkumar Nadumuttlu <sjn@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/ipoib: Fix CONFIG_INFINIBAND_IPOIB_CM

commit efc1eedbf63a194b3b576fc25776f3f1fa55a4d4 upstream.

If the above is turned off then ipoib_cm_dev_init unconditionally
returns ENOSYS, and the newly added error handling in
0b3957 prevents ipoib from coming up at all:

kernel: mlx4_0: ipoib_transport_dev_init failed
kernel: mlx4_0: failed to initialize port 1 (ret = -12)

Fixes: 0b39578bcde4 (IB/ipoib: Use dedicated workqueues per interface)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Fix a memory leak in nfs_do_recoalesce

commit 03d5eb65b53889fe98a5ecddfe205c16e3093190 upstream.

If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().

Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked

commit 3c38cbe2ade88240fabb585b408f779ad3b9a31b upstream.

Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
instead of the correct open stateid.

Fixes: f95549cf24660 ("NFSv4: More CLOSE/OPEN races")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

avr32: handle NULL as a valid clock object

commit 5c02a4206538da12c040b51778d310df84c6bf6c upstream.

Since NULL is used as valid clock object on optional clocks we have to handle
this case in avr32 implementation as well.

Fixes: e1824dfe0d8e (net: macb: Adjust tx_clk when link speed changes)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Don't revalidate the mapping if both size and change attr are up to date

commit 85a23cee3f2c928475f31777ead5a71340a12fc3 upstream.

If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.

Fixes: f2467b6f64da ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (nct7904) Rename pwm attributes to match hwmon ABI

commit 0d6aaffc3a6db642e0a165ba4d17d6d7bbaf5201 upstream.

pwm attributes have well defined names, which should be used.

Cc: Vadim V. Vlasov <vvlasov@dev.rtsoft.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (nct7802) Fix integer overflow seen when writing voltage limits

commit 9200bc4c28cd8992eb5379345abd6b4f0c93df16 upstream.

Writing a large value into a voltage limit attribute can result
in an overflow due to an auto-conversion from unsigned long to
unsigned int.

Cc: Constantine Shulyupin <const@MakeLinux.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vhost: actually track log eventfd file

commit 7932c0bd7740f4cd2aa168d3ce0199e7af7d72d5 upstream.

While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel/cqm: Return cached counter value from IRQ context

commit 2c534c0da0a68418693e10ce1c4146e085f39518 upstream.

Peter reported the following potential crash which I was able to
reproduce with his test program,

[  148.765788] ------------[ cut here ]------------
[  148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
[  148.765797] Modules linked in:
[  148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
[  148.765803]  ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
[  148.765805]  0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
[  148.765807]  ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
[  148.765809] Call Trace:
[  148.765810]  <NMI>  [<ffffffff818bdfd5>] dump_stack+0x45/0x57
[  148.765818]  [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
[  148.765822]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765824]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765825]  [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
[  148.765827]  [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
[  148.765829]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765831]  [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
[  148.765832]  [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
[  148.765836]  [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
[  148.765839]  [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
[  148.765840]  [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
[  148.765841]  [<ffffffff811d3497>] perf_event_output+0x47/0x60
[  148.765843]  [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
[  148.765844]  [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
[  148.765847]  [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
[  148.765849]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765853]  [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
[  148.765854]  [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
[  148.765859]  [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
[  148.765863]  [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
[  148.765865]  [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
[  148.765869]  [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
[  148.765872]  [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
[  148.765875]  [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
[  148.765877]  [<ffffffff81063ed9>] nmi_handle+0x79/0x100
[  148.765879]  [<ffffffff81064422>] default_do_nmi+0x42/0x100
[  148.765880]  [<ffffffff81064563>] do_nmi+0x83/0xb0
[  148.765884]  [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
[  148.765886]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765888]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765890]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765891]  <<EOE>>  [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
[  148.765898]  [<ffffffff818c1671>] __schedule+0x341/0x920
[  148.765899]  [<ffffffff818c1c87>] schedule+0x37/0x80
[  148.765903]  [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
[  148.765905]  [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
[  148.765907]  [<ffffffff818c666c>] retint_careful+0x14/0x32
[  148.765908] ---[ end trace e33ff2be78e14901 ]---

The CQM task events are not safe to be called from within interrupt
context because they require performing an IPI to read the counter value
on all sockets. And performing IPIs from within IRQ context is a
"no-no".

Make do with the last read counter value currently event in
event->count when we're invoked in this context.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf hists browser: Take the --comm, --dsos, etc filters into account

commit 9c0fa8dd3d58de8b688fda758eea1719949c7f0a upstream.

At some point:

  commit 2c86c7ca7606
  Author: Namhyung Kim <namhyung@kernel.org>
  Date:   Mon Mar 17 18:18:54 2014 -0300

    perf report: Merge al->filtered with hist_entry->filtered

We stopped dropping samples for things filtered via the --comms, --dsos,
--symbols, etc, i.e. things marked as filtered in the symbol resolution
routines (thread__find_addr_map(), perf_event__preprocess_sample(),
etc).

But then, in:

  commit 268397cb2a47
  Author: Namhyung Kim <namhyung@kernel.org>
  Date:   Tue Apr 22 14:49:31 2014 +0900

    perf top/tui: Update nr_entries properly after a filter is applied

We don't take into account entries that were filtered in
perf_event__preprocess_sample() and friends, which leads to
inconsistency in the browser seek routines, that expects the number of
hist_entry->filtered entries to match what it thinks is the number of
unfiltered, browsable entries.

So, for instance, when we do:

  perf top --symbols ___non_existent_symbol___

the hist_browser__nr_entries() routine thinks there are no filters in
place, uses the hists->nr_entries but all entries are filtered, leading
to a segfault.

Tested with:

   perf top --symbols malloc,free --percentage=relative

Freezing, by pressing 'f', at any time and doing the math on the
percentages ends up with 100%, ditto for:

   perf top --dsos libpthread-2.20.so,libxul.so --percentage=relative

Both were segfaulting, all fixed now.

More work needed to do away with checking if filters are in place, we
should just use the nr_non_filtered_samples counter, no need to
conditionally use it or hists.nr_filter, as what the browser does is
just show unfiltered stuff. An audit of how it is being accounted is
needed, this is the minimal fix.

Reported-by: Michael Petlan <mpetlan@redhat.com>
Fixes: 268397cb2a47 ("perf top/tui: Update nr_entries properly after a filter is applied")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6w01d5q97qk0d64kuojme5in@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-mq: set default timeout as 30 seconds

commit e56f698bd0720e17f10f39e8b0b5b446ad0ab22c upstream.

It is reasonable to set default timeout of request as 30 seconds instead of
30000 ticks, which may be 300 seconds if HZ is 100, for example, some arm64
based systems may choose 100 HZ.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()"
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

n_tty: signal and flush atomically

commit 3b19e032295647b7be2aa3be62510db4aaeda759 upstream.

When handling signalling char, claim the termios write lock before
signalling waiting readers and writers to prevent further i/o
before flushing the echo and output buffers. This prevents a
userspace signal handler which may output from racing the terminal
flush.

Reference: Bugzilla #99351 ("Output truncated in ssh session after...")
Fixes: commit d2b6f44779d3 ("n_tty: Fix signal handling flushes")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rds: rds_ib_device.refcount overflow

commit 4fabb59449aa44a585b3603ffdadd4c5f4d0c033 upstream.

Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARC: Make ARC bitops "safer" (add anti-optimization)

commit 80f420842ff42ad61f84584716d74ef635f13892 upstream.

ARCompact/ARCv2 ISA provide that any instructions which deals with
bitpos/count operand ASL, LSL, BSET, BCLR, BMSK .... will only consider
lower 5 bits. i.e. auto-clamp the pos to 0-31.

ARC Linux bitops exploited this fact by NOT explicitly masking out upper
bits for @nr operand in general, saving a bunch of AND/BMSK instructions
in generated code around bitops.

While this micro-optimization has worked well over years it is NOT safe
as shifting a number with a value, greater than native size is
"undefined" per "C" spec.

So as it turns outm EZChip ran into this eventually, in their massive
muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
from 63 to 0 and gcc was weirdly optimizing away the first iteration
(so it was really adhering to standard by implementing undefined behaviour
vs. removing all the iterations which were phony i.e. (1 << [63..32])

| for i = 63 to 0
|    X = ( 1 << i )
|    if X == 0
|       continue

So fix the code to do the explicit masking at the expense of generating
additional instructions. Fortunately, this can be mitigated to a large
extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold
masking into shift operation itself. It is currently not enabled in ARC
gcc backend, but could be done after a bit of testing.

Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")

Reported-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARC: Reduce bitops lines of code using macros

commit 04e2eee4b02edcafce96c9c37b31b1a3318291a4 upstream.

No semantical changes !

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/efi: Use all 64 bit of efi_memmap in setup_e820()

commit 7cc03e48965453b5df1cce5062c826189b04b960 upstream.

The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.

While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.

It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.

The issue is triggered on Parallles virtual machine and
fixed with this patch.

Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

efi: Check for NULL efi kernel parameters

commit 9115c7589b11349a1c3099758b4bded579ff69e0 upstream.

Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:

PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000]  ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000]  0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000]  0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000]  [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000]  [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000]  [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000]  [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000]  [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000]  [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000]  [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000]  [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000]  [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000]  [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000]  [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000]  [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc

This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.

Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64/efi: map the entire UEFI vendor string before reading it

commit f91b1feada0b6f0a4d33648155b3ded2c4e0707e upstream.

At boot, the UTF-16 UEFI vendor string is copied from the system
table into a char array with a size of 100 bytes. However, this
size of 100 bytes is also used for memremapping() the source,
which may not be sufficient if the vendor string exceeds 50
UTF-16 characters, and the placement of the vendor string inside
a 4 KB page happens to leave the end unmapped.

So use the correct '100 * sizeof(efi_char16_t)' for the size of
the mapping.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: f84d02755f5a ("arm64: add EFI runtime services")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

efi: Handle memory error structures produced based on old versions of standard

commit 4c62360d7562a20c996836d163259c87d9378120 upstream.

The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.

Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm: Add parenthesis for TLB tracepoint size calculation

commit bbc03778b9954a2ec93baed63718e4df0192f130 upstream.

flush_tlb_info->flush_start/end are both normal virtual
addresses. When calculating 'nr_pages' (only used for the
tracepoint), I neglected to put parenthesis in.

Thanks to David Koufaty for pointing this out.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mei: prevent unloading mei hw modules while the device is opened.

commit 154322f47376fed6ab1e4b350aa45fffa15a61aa upstream.

chrdev_open() increases reference counter on cdev->owner. Instead of
assigning the owner to mei subsystem, the owner has to be set to the
underlaying HW module (mei_me or mei_txe), so once the device is opened
the HW module cannot be unloaded.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: do not report PLC when link is in internal resume state

commit aca3a0489ac019b58cf32794d5362bb284cb9b94 upstream.

Port link change with port in resume state should not be
reported to usbcore, as this is an internal state to be
handled by xhci driver. Reporting PLC to usbcore may
cause usbcore clearing PLC first and port change event irq
won't be generated.

Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: prevent bus_suspend if SS port resuming in phase 1

commit fac4271d1126c45ceaceb7f4a336317b771eb121 upstream.

When the link is just waken, it's in Resume state, and driver sets PLS to
U0. This refers to Phase 1. Phase 2 refers to when the link has completed
the transition from Resume state to U0.

With the fix of xhci: report U3 when link is in resume state, it also
exposes an issue that usb3 roothub and controller can suspend right
after phase 1, and this causes a hard hang in controller.

To fix the issue, we need to prevent usb3 bus suspend if any port is
resuming in phase 1.

[merge separate USB2 and USB3 port resume checking to one -Mathias]
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: report U3 when link is in resume state

commit 243292a2ad3dc365849b820a64868927168894ac upstream.

xhci_hub_report_usb3_link_state() returns pls as U0 when the link
is in resume state, and this causes usb core to think the link is in
U0 while actually it's in resume state. When usb core transfers
control request on the link, it fails with TRB error as the link
is not ready for transfer.

To fix the issue, report U3 when the link is in resume state, thus
usb core knows the link it's not ready for transfer.

Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: Calculate old endpoints correctly on device reset

commit 326124a027abc9a7f43f72dc94f6f0f7a55b02b3 upstream.

When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.

This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.

The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.

Signed-off-by: Brian Campbell <bacam@z273.org.uk>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>