Linus Torvalds [Thu, 13 Oct 2016 20:07:36 +0000 (13:07 -0700)]
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" case, rather than playing racy games with FOLL_WRITE
(which is very fundamental), and then use the pte dirty flag to validate
that the FOLL_COW flag is still valid.
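The heart of the fix is a small helper in mm/gup.c that only honours
FOLL_FORCE on a read-only pte once the COW has actually happened and left
the pte dirty (shown as in the upstream commit; backports may differ
slightly):

/*
 * FOLL_FORCE can write to even unwritable pte's, but only
 * after we've gone through a COW cycle and they are dirty.
 */
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
        return pte_write(pte) ||
                ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}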
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619)
Orabug: 24926639
Conflicts:
include/linux/mm.h
mm/gup.c
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
The current NVME driver allocates an I/O queue per CPU and allocates an
IRQ per queue for the devices; because of this design, a large number of
IRQs will be allotted to the I/O queues on a large NUMA/SMP system with
multiple NVME devices installed.
This causes CPU hotplug operations to fail on the above-mentioned
systems: CPU cores can no longer be hotplugged after a
certain number of them are offlined, because the remaining online CPU cores
do not have enough IRQ vectors to accept the large number of migrated IRQs.
This patch fixes it by providing a way to reduce the NVME queue IRQs to
an acceptable number.
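A minimal sketch of the general approach, assuming the cap is exposed as a
module parameter; the parameter name and helper below are illustrative, not
necessarily what the patch uses:

/* Illustrative only: cap I/O queues (and hence IRQs) per device. */
static unsigned int max_io_queues; /* 0 = default: one queue per CPU */
module_param(max_io_queues, uint, 0444);
MODULE_PARM_DESC(max_io_queues, "Upper bound on NVME I/O queues per device");

static unsigned int nvme_nr_io_queues(void)
{
        unsigned int nr = num_online_cpus();

        if (max_io_queues && max_io_queues < nr)
                nr = max_io_queues;
        return nr;
}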
Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Jens Axboe [Wed, 24 Aug 2016 21:38:01 +0000 (15:38 -0600)]
blk-mq: improve warning for running a queue on the wrong CPU
__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online. If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.
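The change boils down to qualifying the existing WARN_ON() with an online
check on the batch CPU, roughly as in the upstream commit:

        /* Warn only when the batch CPU for this hctx is still online;
         * otherwise the work was legitimately punted to another CPU. */
        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
                cpu_online(hctx->next_cpu));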
There are several race conditions while freezing the queue.
When unfreezing the queue, there is a small window between decrementing
q->mq_freeze_depth to zero and the percpu_ref_reinit() call on
q->mq_usage_counter. If another context calls blk_mq_freeze_queue_start()
in that window, q->mq_freeze_depth is increased from zero to one and
percpu_ref_kill() is called on q->mq_usage_counter, which has already been
killed. The percpu refcount should be re-initialized before being killed again.
Also, there is a race condition while switching to percpu mode.
percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
executed at the same time as the following scenario is possible:
1. q->mq_usage_counter is initialized in atomic mode.
(atomic counter: 1)
2. After the disk registration, a process like systemd-udev starts
accessing the disk, and successfully increases the refcount
by percpu_ref_tryget_live() in blk_mq_queue_enter().
(atomic counter: 2)
3. In the final stage of initialization, q->mq_usage_counter is being
switched to percpu mode by percpu_ref_switch_to_percpu() in
blk_mq_finish_init(). But if CONFIG_PREEMPT_VOLUNTARY is enabled,
the process is rescheduled in the middle of switching when calling
wait_event() in __percpu_ref_switch_to_percpu().
(atomic counter: 2)
4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
request queue. q->mq_usage_counter is decreased and marked as
DEAD. Wait until all requests have finished.
(atomic counter: 1)
5. The process rescheduled in the step 3. is resumed and finishes
all remaining work in __percpu_ref_switch_to_percpu().
A bias value is added to atomic counter of q->mq_usage_counter.
(atomic counter: PERCPU_COUNT_BIAS + 1)
6. A request issued in the step 2. is finished and q->mq_usage_counter
is decreased by blk_mq_queue_exit(). q->mq_usage_counter is DEAD,
so atomic counter is decreased and no release handler is called.
(atomic counter: PERCPU_COUNT_BIAS)
7. CPU hotplug handling in the step 4. will wait forever as
q->mq_usage_counter will never be zero.
Also, percpu_ref_reinit() and percpu_ref_kill() must not be executed
at the same time, because both functions could call
__percpu_ref_switch_to_percpu(), which adds the bias value and
initializes the percpu counter.
Fix those races by serializing with per-queue mutex.
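A sketch of that serialization, assuming a per-queue mutex named
mq_freeze_lock as in the patch description (the exact code in the patch may
differ):

void blk_mq_freeze_queue_start(struct request_queue *q)
{
        mutex_lock(&q->mq_freeze_lock);
        if (++q->mq_freeze_depth == 1)
                percpu_ref_kill(&q->mq_usage_counter);
        mutex_unlock(&q->mq_freeze_lock);
}

void blk_mq_unfreeze_queue(struct request_queue *q)
{
        mutex_lock(&q->mq_freeze_lock);
        WARN_ON_ONCE(q->mq_freeze_depth <= 0);
        if (--q->mq_freeze_depth == 0) {
                percpu_ref_reinit(&q->mq_usage_counter);
                wake_up_all(&q->mq_freeze_wq);
        }
        mutex_unlock(&q->mq_freeze_lock);
}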
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
(cherry picked from https://patchwork.kernel.org/patch/7269471/)
Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Ashok Vairavan [Mon, 10 Oct 2016 20:31:35 +0000 (13:31 -0700)]
nvme: Remove RCU namespace protection
We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.
To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.
List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes are required.
List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.
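The resulting locking discipline, as a sketch (the mutex name follows the
upstream commit; only non-blocking work may run under it):

mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
        /* non-blocking work only: anything that can issue or wait on
         * IO must run on an unlocked, externally stabilized list */
}
mutex_unlock(&ctrl->namespaces_mutex);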
Reported-by: Ming Lin <mlin at kernel.org>
[fixes 0bf77e9 nvme: switch to RCU freeing the namespace]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 32f0c4afb4363e31dad49202f1554ba591d649f2)
Ashok Vairavan [Mon, 10 Oct 2016 19:13:01 +0000 (12:13 -0700)]
NVMe: Implement namespace list scanning
The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparsely
populated namespaces.
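In outline, one scan round asks for the next batch of up to 1024 active
NSIDs via Identify with CNS 0x02 (a hedged sketch: the Identify structure's
field widths vary across kernel versions, and the buffer name here is
illustrative):

struct nvme_command c = { };

c.identify.opcode = nvme_admin_identify;
c.identify.nsid = cpu_to_le32(prev_nsid); /* return active NSIDs > this */
c.identify.cns = 0x02;                    /* NVMe 1.1: active namespace ID list */
/* ns_list: page-sized buffer receiving up to 1024 __le32 NSIDs */
error = nvme_submit_sync_cmd(ctrl->admin_q, &c, ns_list, 0x1000);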
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation. To be relaxed
later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)
dtrace: ensure new SDT info generation works on sparc64
Due to addresses on sparc64 being in the lower range of the memory map, the
calculated addresses for SDT probe points were represented with fewer hex
characters than their function base address counterparts (obtained from
objdump output). This messed up the sorting based on address values, and
resulted in all probe points being associated with the last function in the
list.
This commit ensures that all addresses are 16 hex characters long.
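The effect of the normalization, illustrated in C (the actual change lives
in the SDT info generation tooling): zero-pad every address to 16 hex
digits so that lexicographic and numeric ordering agree.

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
        uint64_t probe_addr = 0x4321a8;   /* low sparc64 address: 6 digits */
        uint64_t func_addr  = 0x101c2f40; /* function base: 8 digits */

        /* "%x" would sort "4321a8" after "101c2f40" lexicographically,
         * even though it is numerically smaller -- the bug described. */
        printf("%016" PRIx64 "\n", probe_addr); /* 00000000004321a8 */
        printf("%016" PRIx64 "\n", func_addr);  /* 00000000101c2f40 */
        return 0;
}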
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow if we
receive a packet composed entirely of VLAN headers.
This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.
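The counter itself is tiny; the upstream change is essentially the helper
below, with every gro_receive call routed through a wrapper that aborts GRO
for the skb once the limit trips (sketch; see the upstream commit for the
full wrapper macros):

#define GRO_RECURSION_LIMIT 15

static inline int gro_recursion_inc_test(struct sk_buff *skb)
{
        return ++NAPI_GRO_CB(skb)->recursion_counter == GRO_RECURSION_LIMIT;
}

/* in each call_gro_receive()-style wrapper: */
if (unlikely(gro_recursion_inc_test(skb))) {
        NAPI_GRO_CB(skb)->flush |= 1;   /* abort GRO for this skb */
        return NULL;
}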
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
(cherry picked from commit e71f3b1fca2ae5d0ae9d9c1a02a93d52beaae322)
Root cause of this issue is the same as the one fixed by the last patch,
but this time the credits for the allocator inode and group descriptor may
not be consumed before the transaction is extended.
Every time, ocfs2_extend_trans() included a credit for the truncate log
inode, but as that inode had already been attached to the running jbd2
transaction the first time, it will not consume that credit until
jbd2_journal_restart(). Since the total credits to extend always included
the un-consumed ones, more and more un-consumed credits accumulate, until
at last jbd2_journal_restart() fails because the credit count exceeds half
of the maximum transaction credits.
The following error was caught when unlinking a large file with many extents.
Now function ocfs2_replay_truncate_records() first modifies tl_used,
then calls ocfs2_extend_trans() to extend the transaction for the gd and
alloc inodes used for freeing clusters. jbd2_journal_restart() may be
called, and it may then happen that tl_used in the truncate log has been
decreased but the clusters are not freed, which means these clusters are
lost. So we should avoid extending transactions in these two operations.
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Acked-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 102c2595aa193f598c0f4b1bf2037d168c80e551)
Johannes Thumshirn [Tue, 5 Apr 2016 09:50:45 +0000 (11:50 +0200)]
Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
Now that we've done a more comprehensive fix with the intermediate
target state we can remove the previous hack introduced with commit 90a88d6ef88e ("scsi: fix soft lockup in scsi_remove_target() on module
removal").
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 305c2e71b3d733ec065cb716c76af7d554bd5571)
ashok.vairavan [Sat, 8 Oct 2016 01:47:01 +0000 (21:47 -0400)]
scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
Add intermediate STARGET_REMOVE state to scsi_target_state to avoid
running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE
state is only valid in the path from scsi_remove_target() to
scsi_target_destroy() indicating this target is going to be removed.
This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI]
scsi_remove_target: fix softlockup regression on hot remove") and 40998193560d ("scsi: restart list search after unlock in
scsi_remove_target") in a more comprehensive way.
[mkp: Included James' fix for scsi_target_destroy()]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Cc: stable@vger.kernel.org
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 24844559
Mainline commit: f05795d3d771f30a7bdc3a138bf714b06d42aa95
Conflicts:
The following changes are done to fix kabi breakage.
What appears to be happening is that something has pinned the target
so it can't go into STARGET_DEL via final release and the loop in
scsi_remove_target spins endlessly until that happens.
The fix for this soft lockup is to not keep looping over a device that
we've called remove on but which hasn't gone into DEL state. This
patch will retain a simplistic memory of the last target and not keep
looping over it.
Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
(cherry picked from commit 90a88d6ef88edcfc4f644dddc7eef4ea41bccf8b)
Christoph Hellwig [Mon, 19 Oct 2015 14:35:46 +0000 (16:35 +0200)]
scsi: restart list search after unlock in scsi_remove_target
When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us. Without this
we can get stuck in an endless loop. This bug was introduced by commit
bc3f02a795d3 ("[SCSI] scsi_remove_target: fix softlockup regression on
hot remove").
Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
prior history down.
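The resulting pattern in scsi_remove_target() restarts the walk from
scratch every time the lock is dropped, roughly as in the upstream commit:

restart:
        spin_lock_irqsave(shost->host_lock, flags);
        list_for_each_entry(starget, &shost->__targets, siblings) {
                if (starget->state == STARGET_DEL)
                        continue;
                if (starget->dev.parent == dev || &starget->dev == dev) {
                        kref_get(&starget->reap_ref);
                        spin_unlock_irqrestore(shost->host_lock, flags);
                        __scsi_remove_target(starget);
                        scsi_target_reap(starget);
                        goto restart;   /* list may have changed meanwhile */
                }
        }
        spin_unlock_irqrestore(shost->host_lock, flags);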
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: bc3f02a795d3b4faa99d37390174be2a75d091bd
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit 40998193560dab6c3ce8d25f4fa58a23e252ef38)
RDS: add reconnect retry scheme for stalled connections
RDS IB connections get stalled at times, and the connections are left to
take their sweet time to reconnect. On the passive side, we wait 15 seconds
for such stalled connections, which is too slow relative to application
IO timeouts. IB connections are established in milliseconds, so we had
better drop these stuck connections early and retry.
The retry timeout is kept tunable via the reconnect_retry_ms sysctl. The
upper bound for retries is tunable via rds_sysctl_reconnect_max_retries.
A lower-IP and exponential back-off scheme was added to save on SM
queries because of races, but it doesn't do what it's intended to.
The exponential back-off scheme does a good job of backing off
for races. The code just falls back to the original scheme.
RDS: IB: suppress log prints for FLUSH_ERR/RETRY_EXC
Flush errors are normal while draining the QP, and retry exceeded
errors are normal for RC connections with finite transport retry.
No need to flood the log file. RDS in-memory trace already logs it.
Santosh Shilimkar [Sat, 27 Aug 2016 02:32:57 +0000 (19:32 -0700)]
RDS: use c_wq for all activities on a connection
RDS connection work, send work, recv work etc events have been
serialised by use of a single-threaded work queue. For loopback
connections, we created a separate thread, but only connection
work was moved onto it. This under-utilises the thread and creates
unnecessary contention between send/recv work and connection work
for loopback connections.
We move the remaining loopback work onto rds_local_wq as well, which
guarantees serialisation and delinks the loopback and non-loopback
work.
Santosh Shilimkar [Fri, 5 Aug 2016 22:02:55 +0000 (15:02 -0700)]
RDS: make the rds_{local_}wq part of rds_connection
Instead of sprinkling if/else for loopback all over the place,
let's just add c_wq as part of rds_connection. This will prevent
missed cases like commit edca33be359c ("RDS: move more queing for
loopback connections to separate queue"), commit 8502173071b6
("rds: schedule local connection activity in proper workqueue"),
or any future changes.
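A sketch of the resulting structure, assuming c_wq is pointed at either the
regular or the loopback workqueue when the connection is created; queueing
sites then become uniform:

struct rds_connection {
        /* ... */
        struct workqueue_struct *c_wq;  /* rds_wq, or rds_local_wq for loopback */
};

/* callers no longer need to test for loopback: */
queue_delayed_work(conn->c_wq, &conn->c_send_w, 0);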
Santosh Shilimkar [Fri, 5 Aug 2016 21:22:53 +0000 (14:22 -0700)]
RDS: make rds_conn_drop() take reason argument
This removes the need to modify the conn all over the place
and moves it inside rds_conn_drop(). There is actually almost
no need to carry the 'c_drop_source' information as part of
rds_connection, but since the shutdown thread wants to log this
info for debug, the field is left as-is for now.
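A minimal sketch of the reworked entry point, assuming the reason is simply
recorded in c_drop_source for the shutdown thread's debug logging:

void rds_conn_drop(struct rds_connection *conn, int reason)
{
        conn->c_drop_source = reason;   /* kept only for shutdown-thread debug */
        atomic_set(&conn->c_state, RDS_CONN_ERROR);
        queue_work(conn->c_wq, &conn->c_down_w);
}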
Santosh Shilimkar [Thu, 26 May 2016 23:22:29 +0000 (16:22 -0700)]
RDS: IB: use address change event for failover/failback
The RDS active bonding code had a few fundamental bugs which
have since been addressed, so the workaround(s) added earlier can be
removed. These workarounds were creating problems and races in the
connection management code, leading to occasional connection stalls.
Removing them makes the code behavior predictable and clean.
A few notable fixes to mention here:
- Taking care of all layers of events for ports before
marking them up/down.
- ARP cache related fixes.
- Local loopback connection hang fix
The patch almost makes the RDS active bonding failover/failback path
behave as intended from the beginning:
on failover/failback, the address change events need to
be sent, which then triggers all the necessary events to
re-establish the connection(s).
Santosh Shilimkar [Tue, 10 May 2016 05:51:11 +0000 (22:51 -0700)]
RDS: IB: drop workaround for loopback connection hangs
There is no need to modify the ARP cache directly under the
assumption that it helps to speed up the failover/failback for
loopback connections. The ARP cache is properly updated by core
IPv4 code, and one of the issues with the RDS active bonding code
and ARP has been addressed as part of
'commit 42a7becc725f ("RDS: IB: Make use of ARPOP_REQUEST instead
of ARPOP_REPLY in bonding code")'. Remove the workaround added as
part of bug 16979994 with patch "RDS: Local address resolution may
be delayed after IP has moved".
This patch validates the num_values parameter from userland during the
HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report ID was
set to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values
parameter, leading to a heap overflow.
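The added validation is essentially a bounds check before the multi-usage
copy (a sketch following the upstream commit; HID_MAX_MULTI_USAGES is the
existing array bound):

if (cmd == HIDIOCGUSAGES || cmd == HIDIOCSUSAGES) {
        if (uref_multi->num_values > HID_MAX_MULTI_USAGES ||
            uref->usage_index + uref_multi->num_values >
            field->report_count)
                goto inval;
}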
Cc: stable@vger.kernel.org Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 93a2001bdfd5376c3dc2158653034c20392d15c5) Signed-off-by: Brian Maly <brian.maly@oracle.com>
1) Enable the EoIB QP property for uVNICs created on a Titan
card, using the is_eoib variable.
2) Use the qkey provided by OFOS for the EoIB QP; this QKEY comes
as part of the VNIC Install message.
3) If the Titan card has TSO enabled, then enable TSO using the
NETIF_F_TSO flag and notify the upstream network stack about it.
4) If the Titan card has the checksum offload feature, then notify the
network stack about it and set the IB_SEND_IP_CSUM bit on each WR.
5) Added some debug flags.
6) Print the Qkey information and EoIB information in proc stats.
In case of heartbeat loss due to Saturn or multicast issues, the
uVNIC driver needs to send XVE_NOTIFY_HBEAT_LOST as part of an
Operational Request to OFOS. This will enable OFOS to perform the
appropriate actions.
Konrad Rzeszutek Wilk [Mon, 3 Oct 2016 13:34:21 +0000 (09:34 -0400)]
Merge branch 'uek4/4.7-xen-backport' into topic/uek-4.1/xen
* uek4/4.7-xen-backport:
xenbus: simplify xenbus_dev_request_and_reply()
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus: don't BUG() on user mode induced condition
xen-pciback: return proper values during BAR sizing
x86/xen: avoid m2p lookup when setting early page table entries
xen/pciback: Fix conf_space read/write overlap check.
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen/balloon: Fix declared-but-not-defined warning
xen-blkfront: fix resume issues after a migration
xen-blkfront: don't call talk_to_blkback when already connected to blkback
xen: use same main loop for counting and remapping pages
Xen: don't warn about 2-byte wchar_t in efi
xen/gntdev: reduce copy batch size to 16
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OraBug: 23585393
Hans Westgaard Ry [Fri, 30 Sep 2016 08:36:49 +0000 (10:36 +0200)]
sif: Retest last allocated entry with roundrobin allocation
The current codebase tests whether the next element is unused (freed) when
allocating round-robin. In certain cases where an element is allocated and
then deallocated immediately, it is convenient to reuse that element.
This commit introduces a new state variable 'in_error'
in the CQ state. When a CQ error event is detected,
cq->in_error is set.
This state is then checked before
* posting a req_notify_cq
* invoking any of the completion generating
workarounds.
Also, qp->last_set_state is set to ERR if a fatal QP event
is detected, to reduce further posting on that QP.
This commit reduces the risk of getting into a privileged
QP error scenario for some common cases of misbehaved
applications, and also enables the driver to terminate more
quickly if a priv.QP error has occurred.
CQ fullness is calculated via the last posted sequence number (last_seq)
minus the last completion sequence number (head_seq). Both last_seq
and head_seq are defined as u16. However, in the calculation that
verifies the CQ is not full, a wrong cast is performed.
This causes the CQ-full check to yield a false negative in the wrapped case.
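The wrap case is easy to demonstrate in isolation: when the subtraction is
promoted to int, the "used entries" count goes negative instead of wrapping
modulo 2^16.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint16_t last_seq = 10;     /* last posted, wrapped past 65535 */
        uint16_t head_seq = 65530;  /* last completed */

        int wrong = last_seq - head_seq;  /* -65520: full-check is fooled */
        uint16_t right = (uint16_t)(last_seq - head_seq); /* 16 in flight */

        printf("wrong=%d right=%u\n", wrong, right);
        return 0;
}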
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com>
When a QP is created, the HW state is zeroed, then certain fields are
initialized. After modify_qp(), other fields are set. When a QP is
handed over to HW, HW will potentially modify parts of the QP
state. The QP will eventually be transitioned into the RESET state.
From RESET, it is legitimate to resurrect the QP.
It is imperative that the QP state is equal to the state when it was
created once it has transitioned to RESET, in case it is resurrected.
For any state to RESET transition, the IBTA specification states: "QP
attributes are reset to the same values after the QP was created."
This commit re-factors create_qp() and reset_qp() so the driver
adheres to the specification.
Due to two HW bugs, SIF needs to add additional CQEs apart from
the requested N CQEs. The first issue is that the CQ cannot be full
when it is being invalidated. Hence, we need 1 extra entry.
The second issue is that HW might generate duplicate completions.
Thus, SIF needs 768 extra entries to cater for these duplicate
completions and 1 additional entry for the fence completion.
The SIF driver then rounds up to the nearest 2^n. The number of CQEs
available to the ULP/user will be the above 2^n - (768 + 1 + 1).
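The sizing arithmetic, as a sketch (the helper and macro names are
illustrative; roundup_pow_of_two() is the stock kernel helper):

#define SIF_CQE_SLACK (768 + 1 + 1)     /* duplicates + invalidate + fence */

static unsigned int sif_cq_hw_entries(unsigned int requested)
{
        return roundup_pow_of_two(requested + SIF_CQE_SLACK);
}
/* entries usable by the ULP/user: sif_cq_hw_entries(n) - SIF_CQE_SLACK */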
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
sif: qp: Clear the QP state cq_int_err bit upon reset
During reset of a QP, the cq_int_err bit was not reset.
This would cause a subsequent transition to RTR to fail
and hardware to set the QP back in error again.
sif: qp: Collapsed two log statements + removed incorrect port number print
With debug level bit 0x2000 (SIF_QP) set, the driver logs QP creation
info. Two log statements logging the same information are
collapsed. Also, incorrect logging of port number is removed for all
QPs except QP0/1, which are the only QPs that have a valid port number
at their creation time.
sif: Avoid using SIFMT_2M for allocation of any tables in no_huge_page mode
The feature mask no_huge_pages, enabled for Xen due to
DMA address alignment issues with huge pages, did not apply
to allocation of CQs, RQs, and SQs, only to the tableworks.
This causes allocation of queues of these types
larger than 4M in total size to fail on Xen PV domains
such as dom0.
Mark the duplicate completion status as PSIF_WC_STATUS_DUPL_COMPL_ERR if
the additional/duplicate completions are detected by walk_and_update_cqes.
Then translate_wr_id can identify the duplicate completion during
sif_poll_cq.
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com>
The sif_fixup_cqes function, which updates the wr_id from the SQ handle, is
moved to reset_qp to cover the scenario where an IB user reuses a QP after
performing ib_modify_qp(RESET). This patch also handles a scenario in
sif_fixup_cqes where a QP has been reset multiple times but the IB user has
not polled the associated CQ completely.
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com>
Francisco Triviño [Tue, 13 Sep 2016 15:09:40 +0000 (17:09 +0200)]
sif: eq: Implement threaded interrupt handler
The handler function (sif_intr) for a single interrupt mostly
processes completion notification events (CNEs) as long as there
are events in the queue. Sometimes the CNEs are received at a higher
rate than the handler is able to process them; it then keeps
processing events indefinitely until the queue runs full, which
leads to a fatal error, or until the watchdog triggers a kernel panic,
as shown in orabug 24657844.
This commit replaces request_irq by request_threaded_irq, which
allows the driver to specify a threaded handler (sif_intr_worker)
in addition. The original handler function (sif_intr) is called
in hard interrupt context and can return IRQ_HANDLED if the timeout
SIF_IRQ_HANDLER_TIMEOUT is not exceeded or IRQ_WAKE_THREAD otherwise.
The flag IRQF_ONESHOT is used to ensure that the interrupt is
disabled when IRQ_WAKE_THREAD is returned.
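A sketch of the split, using the names from the description (the
event-dispatch helper is hypothetical):

static irqreturn_t sif_intr(int irq, void *d)   /* hard-irq context */
{
        unsigned long deadline = jiffies + SIF_IRQ_HANDLER_TIMEOUT;

        while (sif_dispatch_one_cne(d)) {       /* hypothetical helper */
                if (time_after(jiffies, deadline))
                        return IRQ_WAKE_THREAD; /* hand off to the thread */
        }
        return IRQ_HANDLED;
}

/* IRQF_ONESHOT keeps the line masked until sif_intr_worker finishes. */
err = request_threaded_irq(irq, sif_intr, sif_intr_worker,
                           IRQF_ONESHOT, name, eq);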
The feature check_all_eqs_on_intr is no longer needed. This commit
removes this driver feature and hence simplifies the interrupt
handler implementation.
In ocfs2_lock_refcount_tree, if ocfs2_read_refcount_block() returns error,
we do ocfs2_refcount_tree_put twice (once in ocfs2_unlock_refcount_tree
and once outside it), thereby reducing the refcount of the refcount tree
twice, but we don't delete the tree in this case. This will make the refcnt
of the tree = 0 and the ocfs2_refcount_tree_put will eventually call
ocfs2_mark_lockres_freeing, setting OCFS2_LOCK_FREEING for the
refcount_tree->rf_lockres.
The error returned by ocfs2_read_refcount_block is propagated all the way
back and for next iteration of write, ocfs2_lock_refcount_tree gets the
same tree back from ocfs2_get_refcount_tree because we haven't deleted the
tree. Now we have the same tree, but OCFS2_LOCK_FREEING is set for
rf_lockres and eventually, when _ocfs2_lock_refcount_tree is called in
this iteration, BUG_ON( __ocfs2_cluster_lock:1395 ERROR: Cluster lock
called on freeing lockres T00000000000000000386019775b08d! flags 0x81) is
triggered.
Call stack:
(loop16,11155,0):ocfs2_lock_refcount_tree:482 ERROR: status = -5
(loop16,11155,0):ocfs2_refcount_cow_hunk:3497 ERROR: status = -5
(loop16,11155,0):ocfs2_refcount_cow:3560 ERROR: status = -5
(loop16,11155,0):ocfs2_prepare_inode_for_refcount:2111 ERROR: status = -5
(loop16,11155,0):ocfs2_prepare_inode_for_write:2190 ERROR: status = -5
(loop16,11155,0):ocfs2_file_write_iter:2331 ERROR: status = -5
(loop16,11155,0):__ocfs2_cluster_lock:1395 ERROR: bug expression:
lockres->l_flags & OCFS2_LOCK_FREEING
(loop16,11155,0):__ocfs2_cluster_lock:1395 ERROR: Cluster lock called on
freeing lockres T00000000000000000386019775b08d! flags 0x81
------------[ cut here ]------------
kernel BUG at fs/ocfs2/dlmglue.c:1395!
Sumit Saxena [Thu, 10 Mar 2016 10:14:37 +0000 (02:14 -0800)]
megaraid_sas: Don't issue kill adapter for MFI controllers in case of PD list DCMD failure
There are a few MFI adapters which do not support MR_DCMD_PD_LIST_QUERY, so
if an MFI adapter fails this DCMD it should not be considered FATAL and the
driver should not issue a kill adapter. Instead, the per-controller instance
variable pd_list_not_supported is set so that the same variable can be used
inside the slave_alloc and slave_configure functions to allow firmware scan.
Killing the adapter because of a DCMD failure when the DCMD is not supported
causes the driver's probe to fail. This issue was introduced by
commit 6d40afbc7d13 ("megaraid_sas: MFI IO timeout handling").
Killing the adapter in case of this DCMD failure should be limited to Fusion
adapters only. The per-controller instance variable allow_fw_scan is
removed, as pd_list_not_supported better reflects the purpose.
Fixes: 6d40afbc7d13359b30a5cd783e3db6ebefa5f40a
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3084558658c2f9a48d7c460d57aeb30964c06b7e)
The dummy ruleset I used to test the original validation change was broken;
most rules were unreachable and were not tested by mark_source_chains().
In some cases rulesets that used to load in a few seconds now require
several minutes.
sample ruleset that shows the behaviour:
echo "*filter"
for i in $(seq 0 100000);do
printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 100000);do
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT
[ pipe result into iptables-restore ]
This ruleset will be about 74mbyte in size, with ~500k searches
through all 500k[1] rule entries. iptables-restore will take forever
(gave up after 10 minutes)
Instead of always searching the entire blob for a match, fill an
array with the start offsets of every single ipt_entry struct,
then do a binary search to check if the jump target is present or not.
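In outline: record each rule's start offset during the single walk over the
blob, then binary-search that sorted array for every jump target. A
self-contained sketch of the idea (not the literal kernel code):

#include <stdbool.h>
#include <stdlib.h>

static int cmp_offset(const void *a, const void *b)
{
        unsigned int x = *(const unsigned int *)a;
        unsigned int y = *(const unsigned int *)b;

        return (x > y) - (x < y);
}

/* offsets[] holds the start offset of every ipt_entry, ascending */
static bool jump_target_valid(const unsigned int *offsets, size_t count,
                              unsigned int target)
{
        return bsearch(&target, offsets, count, sizeof(*offsets),
                       cmp_offset) != NULL;
}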
After this change ruleset restore times get again close to what one
gets when reverting 36472341017529e (~3 seconds on my workstation).
[1] every user-defined rule gets an implicit RETURN, so we get
300k jumps + 100k userchains + 100k returns -> 500k rule entries
Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps")
Reported-by: Jeff Wu <wujiafu@gmail.com>
Tested-by: Jeff Wu <wujiafu@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 2686f12b26e217befd88357cf84e78d0ab3c86a1) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Make sure the table names via getsockopt GET_ENTRIES is nul-terminated
in ebtables and all the x_tables variants and their respective compat
code. Uncovered by KASAN.
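One way to guarantee the terminator, shown purely as an illustration (the
identifiers here are generic, not the literal patch):

/* illustration: nul-terminate a fixed-size table name before copyout */
char name[XT_TABLE_MAXNAMELEN];

strncpy(name, t->name, sizeof(name));
name[sizeof(name) - 1] = '\0';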
Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit b301f2538759933cf9ff1f7c4f968da72e3f0757) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This looks like refactoring, but it's also a bug fix.
The problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While it's possible to also add such checks to the compat path, it's more
copy & paste work; for instance, we cannot reuse the check_underflow() helper
as e->target_offset differs in the compat case.
The other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5- first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process; the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name.
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
The upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit af815d264b7ed1cdceb0e1fdf4fa7d3bd5fe9a99) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Quoting John Stultz:
In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
noticed I was having some trouble with networking, and realized that
/proc/net/ip_tables_names was suddenly empty.
Digging through the registration process, it seems we're catching on the:
Validate that all matches (if any) add up to the beginning of
the target and that each match covers at least the base structure size.
The compat path should be able to safely re-use the function
as the structures only differ in alignment; added a
BUILD_BUG_ON just in case we have an arch that adds padding as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit a605e7476c66a13312189026e6977bad6ed3050d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 451e4403bc4abc51539376d4314baa739ab9e996) Signed-off-by: Brian Maly <brian.maly@oracle.com>
We have targets and standard targets -- the latter carries a verdict.
The ip/ip6tables validation functions will access t->verdict for the
standard targets to fetch the jump offset or verdict for chainloop
detection, but this happens before the targets get checked/validated.
Thus we also need to check for verdict presence here, else t->verdict
can point right after a blob.
Spotted with UBSAN while testing malformed blobs.
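The added check boils down to requiring room for a full standard target
before t->verdict is dereferenced (a sketch; the upstream helper also
covers the compat layout):

/* XT_STANDARD_TARGET is the builtin target with the empty name */
if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
    target_offset + sizeof(struct xt_standard_target) > next_offset)
        return -EINVAL; /* t->verdict would point past the blob */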
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 73bfda1c492bef7038a87adfa887b7e6b7cd6679) Signed-off-by: Brian Maly <brian.maly@oracle.com>
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit acbcf85306bd563910a2afe07f07d30381b031b0) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Once we add more sanity testing to xt_check_entry_offsets it
becomes relevant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short, just fold its contents into the callers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 801cd32774d12dccfcfc0c22b0b26d84ed995c6f) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit a471ac817cf0e0d6e87779ca1fee216ba849e613) Signed-off-by: Brian Maly <brian.maly@oracle.com>
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so it's supposed to return
an absolute verdict of either ACCEPT or DROP.
However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.
An unconditional rule, however, must also not be using any matches
(no -m args).
The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for the presence of matches, and thus
proceeded to the next (non-existent) rule.
Unify this so that all the callers have the same idea of an
'unconditional rule'.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 850c377e0e2d76723884d610ff40827d26aa21eb) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/netfilter/arp_tables.c
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Base chains enforce an absolute verdict.
User-defined chains are supposed to end with an unconditional return;
xtables userspace adds them automatically.
But if such a return is missing, we will move to a non-existent next rule.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit cf756388f8f34e02a338356b3685c46938139871) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Ben Hawkes says:
integer overflow in xt_alloc_table_info, which on 32-bit systems can
lead to small structure allocation and a copy_from_user based heap
corruption.
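The fix is a classic post-addition wrap check in xt_alloc_table_info(),
roughly as upstream:

struct xt_table_info *xt_alloc_table_info(unsigned int size)
{
        struct xt_table_info *info = NULL;
        size_t sz = sizeof(*info) + size;

        if (sz < sizeof(*info)) /* the addition wrapped on 32-bit */
                return NULL;
        /* ... proceed with the allocation ... */
}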
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Otherwise this function may read data beyond the ruleset blob.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 6e94e0cfb0887e4013b3b930fa6ab1fe6bb6ba91) Signed-off-by: Brian Maly <brian.maly@oracle.com>
The root cause is that CPU_UP_PREPARE is completely the wrong notifier
action from which to access cpu_data(), because smp_store_cpu_info()
won't have been executed by the target CPU at that point, which in turn
means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
filled out.
Instead let's invoke our handler from CPU_STARTING and rename it
appropriately.
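In the hotplug notifier this is just a matter of which action the (renamed)
handler hangs off; a sketch:

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
        /* runs on the incoming CPU after smp_store_cpu_info(), so
         * cpu_data(cpu) is fully populated by now */
        intel_cqm_cpu_starting(cpu);
        break;
/* ... */
}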
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Orabug: 24745516
(cherry picked from commit d7a702f0b1033cf402fef65bd6395072738f0844) Acked-by: Chuck Anderson <chuck.anderson@oracle.com>