Kris Van Hees [Fri, 16 Dec 2016 18:04:52 +0000 (13:04 -0500)]
dtrace: ensure that our die notifier gets executed amongst the first
The die notifier is crucial for implementing safe memory access in
DTrace. To avoid potential interference from other handlers, we set
the priority of our handler to the highest level.
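The registration looks roughly like this (a sketch; the handler body is
elided, and the .priority field is the point of this change):

    #include <linux/kdebug.h>
    #include <linux/notifier.h>

    static int dtrace_die_handler(struct notifier_block *nb,
                                  unsigned long val, void *data)
    {
            /* recover from the faulting probe-context memory access */
            return NOTIFY_DONE;
    }

    static struct notifier_block dtrace_die_nb = {
            .notifier_call = dtrace_die_handler,
            .priority      = INT_MAX,  /* run before other die handlers */
    };

    /* at initialization time: */
    register_die_notifier(&dtrace_die_nb);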
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: David Mc Lean <david.mclean@oracle.com>
Nick Alcock [Mon, 28 Nov 2016 14:53:15 +0000 (14:53 +0000)]
dtrace: allow invop handler to specify number of insns to skip
Rather than unconditionally skipping one instruction, repurpose the
unused return value of the invop handler to let it specify the number
of instructions to skip. (In the common case, where the handler always
knows how many instructions to skip, this is more efficient.)
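In other words, a handler may now look like this sketch (the exact
signature is internal to DTrace; the return-value convention is the
point):

    static int dtrace_invop_handler(struct pt_regs *regs)
    {
            /* ... fire the probe ... */
            return 1;   /* tell the trap path to skip one instruction */
    }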
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Nick Alcock [Wed, 23 Nov 2016 17:50:09 +0000 (17:50 +0000)]
dtrace: is-enabled probes for SDT
"Is-enabled probes" are a conditional, long supported in userspace
probing, which lets you avoid doing expensive data-collection operations
needed only by DTrace probes unless those probes are active.
e.g. (an example using the core DTRACE_PROBE / DTRACE_IS_ENABLED macros,
rather than the DTRACE_providername macros used in practice, because
no such macros have been added to the kernel yet):
    if (DTRACE_IS_ENABLED(__io_wait__start)) {
            /* stuff done only when io:::wait-start is enabled */
    }
As with normal SDT probes, the DTRACE_IS_ENABLED() macro compiles to a
stub function call (named like __dtrace_isenabled_*()) which is replaced
at bootup/module load time with an architecture-dependent instruction
sequence analogous to a function that always returns false, though no
function call is generated. At probe enabling time, this is replaced
with a trap into dtrace just like normal dtrace probes, incurring a
performance hit, but only when the probe is active.
The probe name used in the various ELF sections that track SDT
probes begins with a ? character to help the module distinguish
is-enabled probes from normal probes: this is internal to the DTrace
implementation and is otherwise invisible.
(Thanks to Kris Van Hees for initial work on this.)
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 25143173
Nick Alcock [Mon, 31 Oct 2016 13:12:56 +0000 (13:12 +0000)]
dtrace: check for errors when getting a new fd
get_unused_fd_flags() can legitimately fail, e.g. when the fd table is
full. We need to diagnose that rather than trying to fd_install() the
resulting negative number.
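The resulting idiom is the usual kernel one (sketch, assuming a
struct file *file is already in hand):

    int fd;

    fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0)
            return fd;          /* e.g. -EMFILE: fd table full */
    fd_install(fd, file);       /* only valid for a non-negative fd */
    return fd;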
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Orabug: 24977175
Nick Alcock [Mon, 31 Oct 2016 10:44:26 +0000 (10:44 +0000)]
dtrace: take mmap_sem in PTRACE_GETMAPFD
Without this, we may oops if the process exec()s and discards its
address space after we find_vma().
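That is, the VMA lookup must be bracketed by the semaphore (sketch,
assuming the usual mm/addr locals):

    struct vm_area_struct *vma;

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    if (vma) {
            /* use vma: the address space cannot be discarded
               while we hold mmap_sem for reading */
    }
    up_read(&mm->mmap_sem);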
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Orabug: 24977175
Junxiao Bi [Tue, 1 Nov 2016 06:42:20 +0000 (14:42 +0800)]
ocfs2: fix not enough credit panic
The following panic was caught when running the ocfs2 disconfig single
test (block size 512 and cluster size 8192): ocfs2_journal_dirty()
returned -ENOSPC, which means the credits were used up. The total
credits should include 3 times "num_dx_leaves" from
ocfs2_dx_dir_rebalance(), because 2 are consumed in
ocfs2_dx_dir_transfer_leaf() and 1 is consumed in
ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
ocfs2_dx_dir_format_cluster(). But only 2 are included in
ocfs2_dx_dir_rebalance_credits(); fix it.
Eric Ren [Fri, 30 Sep 2016 22:11:32 +0000 (15:11 -0700)]
ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 process repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.
In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
disk space for this write; ocfs2_try_to_free_truncate_log() will be
called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again.
But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
will deadlock when trying to grab the target page again.
Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED along with a locked
target page.
These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().
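The shape of the fix (a sketch; the names target_page and wc stand in
for the real write-context fields):

    /*
     * ocfs2_free_write_ctxt() will not unlock the mmapped target
     * page, so unlock it here before freeing the context, or the
     * retry (or __do_page_mkwrite()) deadlocks trying to lock it.
     */
    if (target_page)
            unlock_page(target_page);
    ocfs2_free_write_ctxt(wc);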
Jan Kara helped me clear up the JBD2 part and suggested hints for the
root cause.
Changes since v1:
1. Also take the ENOMEM error case into consideration.
Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: He Gang <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c33f0785bf292cf1d15f4fbe42869c63e205b21c)
Joseph Qi [Mon, 19 Sep 2016 21:43:55 +0000 (14:43 -0700)]
ocfs2/dlm: fix race between convert and migration
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks whether the lockres master has changed in order to identify
whether the new master has finished recovery. This introduces a race:
right after the old master umounts (which means the master will
change), a new convert request comes in.
In this case, it will reset the lockres state to DLM_RECOVERING and
then retry the convert, which then fails with lockres->l_action set to
OCFS2_AST_INVALID. This causes an inconsistent lock level between
ocfs2 and dlm, and finally a BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e6f0c6e6170fec175fe676495f29029aecdf486c)
jiangyiwen [Fri, 25 Mar 2016 21:21:35 +0000 (14:21 -0700)]
ocfs2: solve a problem of crossing the boundary in updating backups
In update_backups() there exists a boundary-crossing problem:
assume a LUN is resized to 1TB with a cluster size of 32KB. It then
holds 1TB / 32KB = 33554432 clusters, numbered 0 through 33554431.
But update_backups() backs up the super block at the 1TB location,
which is the 33554432nd cluster (index 33554432), one past the end,
so the boundary is crossed.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 584dca3440732afa84fbca07567bb66e1453936a)
jiangyiwen [Tue, 15 Mar 2016 21:53:01 +0000 (14:53 -0700)]
ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump()
Commit a75e9ccabd92 ("ocfs2: use spinlock irqsave for downconvert lock")
missed one spot in ocfs2_osb_dump(), so a deadlock scenario still
exists.
Joseph Qi [Thu, 14 Jan 2016 23:17:44 +0000 (15:17 -0800)]
ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del
In ocfs2_orphan_del, currently it finds and deletes the entry first,
and then accesses the orphan dir dinode. This is a problem if
ocfs2_journal_access_di fails: the entry will have been removed from
the orphan dir, but the inode has not actually been deleted. In other
words, the file is missing but not actually deleted. So we should
access the orphan dinode first, as unlink and rename do.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 074a6c655f6da12cb1123c8a84bfd8d781138800)
xuejiufei [Thu, 14 Jan 2016 23:17:41 +0000 (15:17 -0800)]
ocfs2/dlm: do not insert a new mle when another process is already migrating
When two processes are migrating the same lockres,
dlm_add_migration_mle() returns -EEXIST but still inserts a new mle
into the hash list. dlm_migrate_lockres() will then detach the old mle
and free the new one, which is still in the hash list, corrupting the
list.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 32e493265b2be96404aaa478fb2913be29b06887)
jiangyiwen [Thu, 14 Jan 2016 23:17:33 +0000 (15:17 -0800)]
ocfs2: fix slot overwritten if storage link down during mount
The following case leads to a slot being overwritten.

N1                                      N2
mount ocfs2 volume; find and
allocate slot 0, set
osb->slot_num to 0, and begin
to write the slot info to disk
                                        mount ocfs2 volume; wait for
                                        the super lock
the block write fails because
the storage link is down;
unlock the super lock
                                        get the super lock, also
                                        allocate slot 0, then unlock
                                        the super lock
mount fails and then dismounts;
since osb->slot_num is 0, it
tries to put the now-invalid
slot to disk, and this will
succeed if the storage link
restores
                                        N2's slot info is now
                                        overwritten

Once another node, say N3, mounts, it will find and allocate slot 0
again, which will lead to a mount hang because the journal has already
been locked by N2. So when writing the slot info fails, invalidate the
slot in advance to avoid overwriting it.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1247017f43a93eae3d64b7c25f3637dc545f5a47)
Xue jiufei [Thu, 14 Jan 2016 23:17:29 +0000 (15:17 -0800)]
ocfs2/dlm: return appropriate value when dlm_grab() returns NULL
dlm_grab() may return NULL when the node is doing unmount. When doing
code review, we found that some dlm handlers may return error to caller
when dlm_grab() returns NULL and make caller BUG or other problems.
Here is an example:
Node 1                                  Node 2
receives a migration message
from node 3, and sends a
migrate request to the others
                                        starts unmounting
                                        receives the migrate request
                                        from node 1 and calls
                                        dlm_migrate_request_handler()
                                        the unmount thread unregisters
                                        the domain handlers and removes
                                        the dlm_context from dlm_domains
                                        dlm_migrate_request_handler()
                                        returns -EINVAL to node 1
exits migration, neither clearing
the migration state nor sending an
assert master message to node 3,
which causes node 3 to hang.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c372f2193a2e73d5936bf37259ae63ca388b4cbc)
jiangyiwen [Thu, 14 Jan 2016 23:17:23 +0000 (15:17 -0800)]
ocfs2/dlm: wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker
Commit f3f854648de6 ("ocfs2_dlm: Ensure correct ordering of set/clear
refmap bit on lockres") still leaves a race which can't ensure the
ordering is exactly correct. The sequence of events across Node1,
Node2 and Node3 is:

1. Node1 umounts and migrates the lockres to Node2.
2. The migration finishes; a migrate request is sent to Node3.
3. Node3 receives the migrate request, creates a migration_mle, and
   responds to Node2.
4. Node2 sets DLM_LOCK_RES_SETREF_INPROG and sends an assert master
   message to Node3.
5. Node3 deletes the migration_mle in assert_master_handler, then
   umounts without responding; its dlm_thread purges this lockres and
   sends a drop deref message to Node2.
6. Node2 finds the DLM_LOCK_RES_SETREF_INPROG flag set and dispatches
   dlm_deref_lockres_worker to clear the refmap; but
   dlm_deref_lockres_worker waits for DLM_LOCK_RES_SETREF_INPROG to be
   cleared only if the node is in the refmap, so the worker completes
   "successfully".
7. Node3 purges the lockres, sends the assert master response to
   Node1, and finishes its umount.
8. Node3 is then set in the refmap and will never be cleared, which
   leads to a hung umount.

So wait until DLM_LOCK_RES_SETREF_INPROG is cleared in
dlm_deref_lockres_worker.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b5560143385e18b4109ad6951c7719705e3dd995)
Xue jiufei [Thu, 14 Jan 2016 23:17:18 +0000 (15:17 -0800)]
ocfs2/dlm: fix a race between purge and migration
We found a race between purge and migration when doing code review.
Node A puts a lockres on the purge list before receiving the migrate
message from node B, which is the master. Node A then calls
dlm_mig_lockres_handler to handle this message.

dlm_mig_lockres_handler
    dlm_lookup_lockres
    >>>>>> race window: dlm_run_purge_list may run and send a
           deref message to the master, then wait for the response
    spin_lock(&res->spinlock);
    res->state |= DLM_LOCK_RES_MIGRATING;
    spin_unlock(&res->spinlock);
dlm_mig_lockres_handler returns

>>>>>> dlm_thread receives the response from the master for the deref
       message and triggers the BUG because the lockres has the state
       DLM_LOCK_RES_MIGRATING, with the following message:
xuejiufei [Tue, 29 Dec 2015 22:54:29 +0000 (14:54 -0800)]
ocfs2/dlm: clear migration_pending when migration target goes down
We have found a BUG on res->migration_pending when migrating lock
resources. The situation is as follows:

dlm_mark_lockres_migrating
    res->migration_pending = 1;
    __dlm_lockres_reserve_ast
    dlm_lockres_release_ast returns with res->migration_pending
    still set, because other threads reserve asts
    wait until dlm_migration_can_proceed returns 1
    >>>>>> o2hb finds that the target goes down and removes the
           target from domain_map
    dlm_migration_can_proceed returns 1
    dlm_mark_lockres_migrating returns -ESHUTDOWN with
    res->migration_pending still set.

When dlm_mark_lockres_migrating() is reentered, it triggers the BUG_ON
on res->migration_pending. So clear migration_pending when the target
goes down.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cc28d6d80f6ab494b10f0e2ec949eacd610f66e3)
Joseph Qi [Tue, 29 Dec 2015 22:54:06 +0000 (14:54 -0800)]
ocfs2: fix BUG when calculate new backup super
When resizing, it first extends the last gd. Once it needs to back up
the super in that gd, it calculates the new backup super and updates
the corresponding value.
But it currently doesn't consider the case where the backup super has
already been done. In that case, it still sets the bit in the gd
bitmap and then decreases bg_free_bits_count, which leads to a
corrupted gd and triggers the BUG in ocfs2_block_group_set_bits:
alex chen [Fri, 6 Nov 2015 02:44:10 +0000 (18:44 -0800)]
ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error
In ocfs2_mknod_locked, if __ocfs2_mknod_locked returns an error, we
should reclaim the inode successfully claimed above; otherwise, the
inode can never be reused. The case is described below:

ocfs2_mknod
    ocfs2_mknod_locked
        ocfs2_claim_new_inode
            Successfully claims the inode
        __ocfs2_mknod_locked
            ocfs2_journal_access_di
                Fails because of -ENOMEM or other reasons; the inode
                lockres has not been initialized yet.

    iput(inode)
        ocfs2_evict_inode
            ocfs2_delete_inode
                ocfs2_inode_lock
                    ocfs2_inode_lock_full_nested
                        __ocfs2_cluster_lock
                            Returns -EINVAL because the inode
                            lockres has not been initialized.
                So the following operations are not performed:
                    ocfs2_wipe_inode
                        ocfs2_remove_inode
                            ocfs2_free_dinode
                                ocfs2_free_suballoc_bits
Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b1529a41f777a48f95d4af29668b70ffe3360e1b)
Joseph Qi [Fri, 6 Nov 2015 02:44:07 +0000 (18:44 -0800)]
ocfs2: fix race between mount and delete node/cluster
There is a race between mount and delete node/cluster which can send
o2hb_thread into a malfunctioning dead loop:

o2hb_thread
{
    o2nm_depend_this_node();
    <<<<<< race window: the node may have already been deleted, and
           the o2hb thread then enters the loop and malfunctions
           because no configured nodes are found.
    while (!kthread_should_stop() &&
           !reg->hr_unclean_stop && !reg->hr_aborted_start) {
    }
}

So checking the return value of o2nm_depend_this_node() is needed. If
the node has been deleted, do not enter the loop and let the mount
fail.
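I.e., roughly (a sketch; the real patch's error path differs in
detail):

    ret = o2nm_depend_this_node();
    if (ret) {
            /* the node was already deleted: do not enter the
               heartbeat loop, let the mount fail instead */
            goto out;
    }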
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0986fe9b50f425ec81f25a1a85aaf3574b31d801)
Joseph Qi [Thu, 22 Oct 2015 20:32:29 +0000 (13:32 -0700)]
ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put
dlm_lockres_put will call dlm_lockres_release if it drops the last
reference, which in turn may call dlm_print_one_lock_resource and
take the lockres spinlock.
So unlock the lockres spinlock before dlm_lockres_put to avoid the
deadlock.
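I.e. (sketch):

    spin_unlock(&res->spinlock);
    dlm_lockres_put(res);   /* the release path may retake
                               res->spinlock */

rather than calling dlm_lockres_put() with the spinlock still held.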
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b67de018b37a97548645a879c627d4188518e907)
ocfs2: avoid access invalid address when read o2dlm debug messages
The following case leads to a lockres being freed while it is still in
use:

cat /sys/kernel/debug/o2dlm/locking_state   dlm_thread
lockres_seq_start
    -> lock dlm->track_lock
    -> get resA
                                            resA->refs decreases to 0;
                                            dlm_lockres_release is
                                            called, and waits for
                                            "cat" to unlock.
Although resA->refs is already 0,
increase resA->refs, and then unlock
                                            lock dlm->track_lock
                                                -> list_del_init()
                                                -> unlock
                                                -> free resA

In such a race, an invalid address access may occur. So we should
remove res->tracking from the list before resA->refs decreases to 0.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f57a22ddecd6f26040a67e2c12880f98f88b6e00)
When running dirop_fileop_racer we found a case where an inode cannot
be removed.
Two nodes, say Node A and Node B, mount the same ocfs2 volume. Create
two dirs /race/1/ and /race/2/ in the filesystem.

Node A                                  Node B
                                        rm -r /race/2/
mv /race/1/ /race/2/
                                        call ocfs2_unlink(); get
                                        the EX mode of /race/2/
wait for B to unlock /race/2/
                                        decrease i_nlink of /race/2/
                                        to 0, and add the inode of
                                        /race/2/ to the orphan dir;
                                        unlock /race/2/
got EX mode of /race/2/; because
/race/1/ is a dir, inc i_nlink
of /race/2/ and update it on
disk; unlock /race/2/

Because i_nlink of /race/2/ is not zero, this inode will always remain
in the orphan dir.

This patch fixes the case by testing whether i_nlink of the new dir is
zero.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 928dda1f9433f024ac48c3d97ae683bf83dd0e42)
The trusted extended attributes are only visible to processes that
have the CAP_SYS_ADMIN capability, but the check is missing in the
ocfs2 xattr_handler trusted list. The check is important because
trusted xattrs will be used to implement mechanisms in userspace to
which ordinary processes should not have access.
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Taesoo kim <taesoo@gatech.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0f5e7b41f91814447defc34e915fc5d6e52266d9)
Xue jiufei [Wed, 24 Jun 2015 23:55:20 +0000 (16:55 -0700)]
ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()
ocfs2_abort_trigger() uses bh->b_assoc_map to get the sb. But nothing
in ocfs2 sets bh->b_assoc_map, so calling this function triggers a
NULL pointer dereference. We can get the sb from bh->b_bdev->bd_super
instead of b_assoc_map.
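The substitution is roughly (sketch; the b_assoc_map expression is
reconstructed for illustration, not quoted from the old code):

    /* before: oopses, since ocfs2 never sets bh->b_assoc_map */
    struct super_block *sb = bh->b_assoc_map->host->i_sb;

    /* after: the underlying block device knows its superblock */
    struct super_block *sb = bh->b_bdev->bd_super;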
[akpm@linux-foundation.org: update comment, per Joseph] Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 74e364ad1b13fd518a0bd4e5aec56d5e8706152f)
Roman Kagan [Wed, 18 May 2016 14:48:20 +0000 (17:48 +0300)]
kvm:vmx: more complete state update on APICv on/off
The function to update APICv on/off state (in particular, to deactivate
it when enabling Hyper-V SynIC) is incomplete: it doesn't adjust
APICv-related fields among secondary processor-based VM-execution
controls. As a result, Windows 2012 guests get stuck when SynIC-based
auto-EOI interrupt intersected with e.g. an IPI in the guest.
In addition, the MSR intercept bitmap isn't updated every time
"virtualize x2APIC mode" is toggled. This path can only be triggered
by a malicious guest, because Windows doesn't use x2APIC but rather
its own synthetic APIC access MSRs; however a guest running in a
SynIC-enabled VM could switch to x2APIC and thus obtain direct access
to host APIC MSRs (CVE-2016-4440).
The patch fixes those omissions.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reported-by: Steve Rutherford <srutherford@google.com> Reported-by: Yang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Orabug: 23347009
CVE: CVE-2016-4440 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Nick Alcock [Fri, 16 Sep 2016 09:17:18 +0000 (10:17 +0100)]
dtrace: eliminate need for arg counting in sdt macros
The existing SDT probe macros are of the form

    DTRACE_PROBEn(probe, ...)

where there must be as many args in ... as the 'n'.
This is implemented via terribly repetitive code in include/linux/sdt.h,
made much worse by the recent typed sdt additions. Reduce
repetitiveness, and drop the 'n' variants of the macros.
All the complexity is hidden away in include/linux/sdt_internal.h.
It's derived from the tricks the perf probes were already using, but
improved to handle the zero-argument case and armoured against namespace
pollution with copious __ing. Even when DTrace is turned off, we use a
do-nothing form of the same macro to verify that the number of arguments
in the probe is valid.
There is only one public macro now, DTRACE_PROBE(), and the usual thin
wrappers around it, which have also gone non-numbered (DTRACE_PROC,
DTRACE_IO, etc): all uses have been adjusted.
These macros use a family of semi-public macros, described below.
__DTRACE_DOUBLE_APPLY(type_macro, arg_macro, ...): This takes a type
macro and an arg macro plus a set of pairs of arguments, and calls the
type macro on the first element of each pair, the arg macro on the
second, and so on: if called with (type, arg, a, b, c, d) it will
expand to something like

    type(a), arg(b), type(c), arg(d)

__DTRACE_DOUBLE_APPLY_NOCOMMA(type_macro, arg_macro, ...): As above,
but omits all commas between arguments: the example above will yield

    type(a) arg(b) type(c) arg(d)

(suitable for string concatenation).
__DTRACE_TYPE_APPLY(type_macro, arg_macro, ...): as above, but only
calls the type_macro:

    type(a), type(c)

__DTRACE_ARG_APPLY(type_macro, arg_macro, ...): as above, but only
calls the arg_macro:

    arg(b), arg(d)
__DTRACE_TYPE_APPLY_DEFAULT(type_macro, arg_macro, def, ...): as above,
but takes an extra parameter to be used as a default when there are no
args. Calling it with (type, arg, void, a, b, c, d) will yield

    type(a), arg(b), type(c), arg(d)

but calling it with (type, arg, void) will yield

    void
All the macros above take any number of pairs of arguments up to eight:
providing any other number, including odd numbers, is a compilation
error.
__DTRACE_APPLY(macro, ...): This is pre-existing, and calls the macro
repeatedly with the specified args, comma-separated:

    m(a), m(b), m(c), ...
__DTRACE_APPLY_DEFAULT(macro, def, ...): Like
__DTRACE_TYPE_APPLY_DEFAULT, this takes a default for use when there
are no args.
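The argument counting underneath relies on the usual variadic
counting trick. A standalone sketch (hypothetical names, much
simplified from what sdt_internal.h actually does; it relies on the
GNU ", ##__VA_ARGS__" extension, as kernel code may):

    /* Count the arguments given, from 0 up to 4 (the real code
       handles up to 8 pairs). */
    #define COUNT(...)  COUNT_(, ##__VA_ARGS__, 4, 3, 2, 1, 0)
    #define COUNT_(z, a, b, c, d, n, ...)  n

    /* Apply macro m to each argument, comma-separated. */
    #define APPLY(m, ...)       APPLY_N(COUNT(__VA_ARGS__), m, ##__VA_ARGS__)
    #define APPLY_N(n, m, ...)  APPLY_N_(n, m, ##__VA_ARGS__)
    #define APPLY_N_(n, m, ...) APPLY_##n(m, ##__VA_ARGS__)
    #define APPLY_0(m)
    #define APPLY_1(m, a)       m(a)
    #define APPLY_2(m, a, ...)  m(a), APPLY_1(m, __VA_ARGS__)
    #define APPLY_3(m, a, ...)  m(a), APPLY_2(m, __VA_ARGS__)
    #define APPLY_4(m, a, ...)  m(a), APPLY_3(m, __VA_ARGS__)

    /* APPLY(f, x, y) expands to: f(x), f(y)
       APPLY(f) expands to nothing, without a compile error. */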
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24678897
Nick Alcock [Wed, 14 Sep 2016 09:39:33 +0000 (10:39 +0100)]
dtrace: augment SDT probes with type information
With all the other work done, we can finally move the SDT probe
type information out of the static sdt_args array in the SDT kernel
module and put them in the probe definitions they relate to.
Most of it is already there -- all probes already carry type
information. All that is missing is translations, so we add those.
The syntax for these is

    type: translation

or

    type : (translation, ...)
i.e. a source type, followed by a colon, followed by zero or more
translated types (if there is more than one, they must be bracketed),
with optional spaces if you like. The resulting probe will have as many
arguments as there are translations. Obviously, if DTrace userspace is
to do anything with this knowledge it will need a translator from the
specified source type to each of the specified translated types: DTrace
picks these up from kernel-versioned subdirectories of
/usr/lib64/dtrace/ (e.g. /usr/lib64/dtrace/4.1).
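For example (a hypothetical declaration, not one from the tree), a
probe defined as

    DTRACE_PROBE(io__wait__start,
                 struct buffer_head * : (bufinfo_t *, devinfo_t *, fileinfo_t *),
                 bh);

would give the resulting probe three arguments, one per translated
type, all derived from the single source argument bh, provided the
matching translators are installed as described above.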
If a probe appears more than once (say, in different functions), each
instance must presently have the same types. Changing translations
between instances of a single probe will be diagnosed if CONFIG_DT_DEBUG
is enabled.
Typos in the types are only diagnosed at runtime: a dtrace -vln of your
new probe will tell you if all is well.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24661801
Nick Alcock [Wed, 14 Sep 2016 01:16:46 +0000 (02:16 +0100)]
dtrace: import the sdt type information into per-sdt_probedesc state
Now that we have type and probe names in ELF sections, we have to do
something with them (they're initdata, so in the kernel, if not in
modules, they will be thrown away after initialization).
We marshal each arg string in turn, without parsing it, into a new field
in the sdt_probedesc_t, the per-probe structure used to communicate
information about SDT probes to the SDT kernel module. Before now, this
has been a simple task at runtime: the array is constructed by
dtrace_sdt.sh and just needs to be reformulated (for the core kernel) or
dropped unchanged into place (for modules). But this extra field is
derived from C macro expansion and cannot be determined by
dtrace_sdt.sh: so it leaves it 0 and we fill it in at boot / module load
time.
The fillout process is a little baroque to avoid slowing down the boot
too much, since we have to associate one array with another and want
to avoid linear scans. We build a hashtable mapping from probe name
to args string (i.e. from _dtrace_sdt_names to _dtrace_sdt_args entry),
in the process verifying that if the probe appears multiple times,
all its arg strings are the same. We can then easily use this hash
table to point the extra argument types field in the sdt_probedesc_t
array at the args string too.
One inefficiency remains: there can be *lots* of duplicate args strings,
and currently we make no attempt to deduplicate them. (We discard the
core kernel's array of probe names, but the array of arg strings
obviously must be preserved. We could deduplicate it, but do not.)
This is probably wasting no more than a few kilobytes at present, so
a deduplicator is not worth writing, but as the number of probes
(particularly perf probes) increases, one may become worth writing.
The new sdpd_args field into which the type info string is dropped is
protected from the ABI checker: the sdt_probedesc_t is internal to the
kernel and the DTrace module, and the usual ABI guarantees do not apply
to it.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24661801
Nick Alcock [Wed, 14 Sep 2016 01:10:06 +0000 (02:10 +0100)]
dtrace: record SDT and perf probe types in a new ELF section
Before now, the types specified in the SDT probe macros in
include/linux/sdt.h were simply thrown away: DTrace was never
informed of them, and instead got type information for SDT probes
from a hardwired list in the out-of-tree sdt module.
As a start to fixing this, export the SDT type information into
a new ELF section named _dtrace_sdt_args. Both native SDT probes
and perf probes translated into SDT probes use the same section:
its format is a sequence of probe type strings, with individual
arguments separated by commas, each type string separated from the
next by a NUL:
    type string, comma-separated\0
    ...
(This is the same format used by perf probe prototype declarations,
and is what you'd get if you removed the argument names from a DTrace
SDT probe specification.)
The only difference between SDT and perf probe type strings is that the
type string for perf probes is also used to define a tracing function,
so it includes argument names. The SDT format has various other
extensions which do not affect this commit so will be described later.
The type strings in the section are in no particular order (it depends on the
order in the source code) and may contain duplicates, so to associate
each string with a probe, there is another section, _dtrace_sdt_names,
which is a simple null-terminated array (string table) of names, associated
1:1 with the type strings in the _dtrace_sdt_types section.
At this point in the patch series, nothing uses the newly-added sections.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Orabug: 24661801
dtrace: ensure new SDT info generation works on sparc64
Due to addresses on sparc64 being in the lower range of the memory
map, the calculated addresses for SDT probe points were represented
with fewer hex characters than their function base address
counterparts (obtained from objdump output). This messed up the
sorting based on address values, and resulted in all probe points
being associated with the last function in the list.
This commit ensures that all addresses are 16 hex characters long.
Kris Van Hees [Tue, 16 Aug 2016 15:52:07 +0000 (11:52 -0400)]
dtrace: rework kernel sdtinfo generation to be more accurate
With changes in the vmlinux.lds.S linker script for x86 in the 4.3 kernel
series, the existing mechanism for determining the addresses of SDT probe
points in the kernel proper was failing. The assumption that various
flavours of text sections were being concatenated into a single .text
section no longer holds.
The new implementation uses a relocation-retaining linking step in the
vmlinux linking process, which enables us to gain access to the final
addresses of symbols in the .text section, and also makes it very easy
to determine the addresses of SDT probes (.text base address + relocation
offset). We re-use the .tmp_vmlinux1 output file for the output of this
linking step because it is performed after that output file is no longer
needed in the kernel linking process.
Due to the relocation-retaining linking that occurs, the assertion test
for kernel image size had to be modified, because the _end symbol value
ends up being a relative offset rather than an absolute address.
Nick Alcock [Thu, 14 Jul 2016 17:05:24 +0000 (18:05 +0100)]
ctf: fix CONFIG_CTF && !CONFIG_DTRACE and CONFIG_DT_DISABLE_CTF
These combinations of config options, one of which should enable
CTF without enabling DTrace and the other of which should enable
DTrace without enabling CTF, are both broken.
Fix them by making the CTF <-> DT_DISABLE_CTF relationship bidirectional,
by making dwarf2ctf compilation depend properly on CONFIG_CTF rather than
CONFIG_DTRACE, and by fixing the newly-introduced order-only dependency
of CTF on the sdtinfo stubs files so that it is only enforced when
both CTF and DTrace are enabled.
Orabug: 23859082 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Reviewed-By: Todd Vierling <todd.vierling@oracle.com>
At LSF we decided that if we truncate up from isize we shouldn't trim
fallocated blocks that were fallocated with KEEP_SIZE and are past the
new i_size. This patch fixes ext4 to do this.
[ Completely reworked patch so that i_disksize would actually get set
when truncating up. Also reworked the code for handling truncate so
that it's easier to handle. -- tytso ]
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
(cherry picked from commit 3da40c7b089810ac9cf2bb1e59633f619f3a7312) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Conflicts:
fs/ext4/inode.c
Kris Van Hees [Sat, 21 May 2016 13:43:55 +0000 (06:43 -0700)]
dtrace: ensure pdata is large enough
With the sudden (large) increase of SDT probe points, the previously
statically defined default size for the kernel pdata block was not
sufficient. The code has been modified to calculate the required size
for the pdata block at runtime.
This primarily affects sparc where the pdata also covers the memory
block used for SDT trampolines. It is now dependent on the actual
number of SDT probes in the kernel.
This does not affect modules because they were already allocating the
needed memory block at load time based on the actual number of probes.
Orabug: 23004534 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
For every trace point defined, also define an SDT probe. This allows
DTrace to expose probe points maintained by upstream.
[nca: fixed TODOs: added DTRACE_UINTPTR_CAST_EACH so that tracepoints
that pass structures by value will still compile: we jam the
structure into a uintptr if it fits, and otherwise pass its
address in.
This is only partial, so far: there is no CTF type info for any
of these new probes, but this is a relatively minor issue that can
be fixed later.]
Orabug: 23004534 Signed-off-by: Timothy J Fontaine <tj.fontaine@oracle.com> Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Kris Van Hees [Sat, 18 Jun 2016 14:19:03 +0000 (10:19 -0400)]
dtrace: add support for probes in sections other than .text
This commit adds support for SDT probes in sections other than the
regular .text section, both for the kernel and kernel modules.
The core of the problem is that probe locations (for modules and
for the kernel proper) were stored as an offset relative to the start
of the code section they occur in. Processing them at load time (or
boot time, for the kernel) resulted in incorrect addresses for any
probes in sections other than .text because the offset was getting
applied to the wrong base address.
I.e. a probe at offset 0x1d0 in section .text.unlikely would be
resolved as

    BASE(.text) + 0x1d0

rather than

    BASE(.text.unlikely) + 0x1d0
The end result was that (using x86_64 as example) we would be writing
a 5-byte NOP sequence in a location that was really not expecting that,
with very odd and entirely unpredictable results.
Solving this required two distinct mechanisms because modules are
linked in a way where the distinct code sections are retained. Only
at module load time are they loaded consecutively into a single block
of memory that starts at module_core; worse yet, probes can be
duplicated by the compiler into multiple such sections. This means
that a simple offset will not do; the only thing we can rely on is the
function name (since symbol names are guaranteed unique, and the
compiler synthesizes new ones not valid in C when it clones and
specializes functions), and its start address.
The question is how to identify the functions. We can't use the symbol
table index, because the symbol table is rewritten by strip(1) as part
of RPM generation: and we can't directly refer to the names given to
cloned functions because they are not valid in C: even if valid, they
might well be local to the translation unit in any case. So we run the
.o file through a new tool, scripts/kmodsdt, which identifies all
functions containing DTrace probes by hunting for relocations to the
__dtrace_probe_* probe calls and, if the containing functions are local,
introduces an alias to each named __dta_<function>_<symindex> (the
function name and index are necessary because these functions are local,
so there could be several of them with the same name in different
translation units: the symindex makes the aliases unique). The function
name is rewritten to ensure it is valid in C by translating . into _.
The address in the sdtinfo then references each function symbol (or
alias) and simply adds the offset from that function symbol to the probe
to it in perfectly normal C code, and the linker then deals with all the
problems involving keeping the addresses valid across stripping, module
loading, etc.
When the module is loaded, the kernel resolves all relocations. This
means that the sdtinfo data that is compiled into the module will have
the exact address of all probe locations by the time we need it.
The kernel is a different story altogether...
The kernel performs a final link into an executable object, merging
all non-init code sections into a single .text. This means that for
the kernel, we need to calculate the absolute address of each probe
location based on the knowledge that section merging occurs.
We make use of the multi-step linking process that the kernel uses to
ensure stability of the kallsyms data. Three steps take place in the
sdtinfo generation process (though they will appear as a single step,
because they are a single pipeline):
- a list of code sections is extracted from vmlinux.o, and for each
section we take note of the function with the lowest offset in
the section
- temporary kernel image .tmp_vmlinux1 is used to determine the
absolute load address of each merged section by searching for the
first function of each respective section, and subtracting its
offset from its address.
- using the list of absolute base addresses for each section, the
list of probes (associated with the function they occur in, and
the section the function belongs in) gets its absolute addresses
for the probe locations by adding the probe offset (relative to
its encapsulating section) to the section base address.
(Note: if two distinct sections in the kernel happen to have functions
at the lowest offset with identical names, this process cannot
distinguish between those two sections. That is a limitation
of the heuristic but is extremely unlikely. If it were to
occur, a fix would require using a more complex comparison to
determine whether a function in the final kernel image is the
start of one section or another.)
Orabug: 23344927 Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Acked-by: Nick Alcock <nick.alcock@oracle.com>
Nick Alcock [Fri, 1 Jul 2016 14:38:01 +0000 (15:38 +0100)]
dtrace, ctf: build sdtstubs and CTF after sdtinfo; sdtinfo follows modpost
In the following commit, sdtinfo generation will begin to
rewrite the .o file. When this is happening, we cannot
have sdtstubs generation, dwarf2ctf, or modpost run in parallel
with it, since they all read the .o file (in modpost's case, for
symbol versions; in sdtstubs' case, for the list of symbols;
in dwarf2ctf's case, for the debugging info). So have both of
the latter depend on the sdtinfo.c files that the sdtinfo pass
generates. We force sdtinfo generation to run after modpost
by depending on the .mod.c files modpost generates.
For the CTF side of things, we use an order-only prerequisite
because sdtinfo's changes do not change anything which might
require dwarf2ctf to rerun, and so as not to disturb $^, which
we use to build the list of files dwarf2ctf should run over.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Inodes for delayed iput allocate a trivial helper structure, let's place
the list hook directly into the inode and save a kmalloc (killing a
__GFP_NOFAIL as a bonus) at the cost of increasing size of btrfs_inode.
The inode can be put onto the delayed_iputs list more than once, and
we have to keep the count. This means we can't use list_splice to
process a bunch of inodes, because we'd lose track of the count if the
inode is put onto the delayed iputs list again while it's being
processed.
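Conceptually the change to the inode is (a sketch of the new fields,
not the full patch):

    struct btrfs_inode {
            /* ... existing fields ... */

            /* hook onto fs_info's delayed_iputs list, replacing the
               kmalloc'd helper structure */
            struct list_head delayed_iput;
            /* the inode may be queued more than once: keep count */
            int delayed_iput_count;
    };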
Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 8089fe62c6603860f6796ca80519b92391292f21) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Conflicts:
fs/btrfs/inode.c
Yuval Shaia and team discovered that the aforementioned commit
resulted in a substantial (900-to-1) performance loss on a TCP
bandwidth test over IPoIB. Upstream is aware of the problem
but there is currently no solution in hand, so just revert the
offending commit.
Note that the original commit purports to fix a kernel crash,
but so far we haven't seen the crash.
Signed-off-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Tested-by: Rose Wang <rose.wang@oracle.com>
Nick Alcock [Thu, 19 May 2016 20:45:56 +0000 (21:45 +0100)]
dtrace: support SDT in single-file modules
A limitation of long standing in the SDT machinery is that it didn't
support modules that were implemented entirely in one file. This is
because the SDT stubs generation machinery (which emits zero-byte
assembler stubs for all DTrace probe points) was implemented at the
point at which multi-file modules are linked into a single file prior to
modpost. Since single-file modules *have* no such stage, they never got
their stubs, and probes never fired. This becomes essential to fix so
that perf_events probes will work, because some of those are in
single-file modules.
We fix it by tearing out the sdtstub code from its prior home in
Makefile.build, and moving it into the late modpost stage, at the same
point at which sdtinfo files are generated. We use the same prerequisites
trick as is used for CTF files to link the sdtstub files into the
resulting modules; because this all happens at the very last possible
moment, it shares the same code path for single- and multifile modules,
and is a good bit simpler to boot (no need to manipulate the link
process or transform filenames intricately during stub generation).
There is one additional wrinkle: because the dtrace probes are now
always undefined until the last minute, we must adjust modpost to
disregard all undefined symbols that start "__dtrace_probe_".
Orabug: 23316392 Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Acked-by: Kris Van Hees <kris.van.hees@oracle.com>
Backport of 81ad4276b505e987dd8ebbdf63605f92cd172b52 failed to adjust
for intervening ->get_trip_temp() argument type change, thus causing
stack protector to panic.
drivers/thermal/thermal_core.c: In function ‘thermal_zone_device_register’:
drivers/thermal/thermal_core.c:1569:41: warning: passing argument 3 of
‘tz->ops->get_trip_temp’ from incompatible pointer type [-Wincompatible-pointer-types]
if (tz->ops->get_trip_temp(tz, count, &trip_temp))
^
drivers/thermal/thermal_core.c:1569:41: note: expected ‘long unsigned int *’
but argument is of type ‘int *’
CC: <stable@vger.kernel.org> #3.18,#4.1 Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 5640c4c37eee293451388cd5ee74dfed3a30f32d)
Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start
is normally set at ACK processing time, not at send time.
Doing a proper fix would need to add an additional state variable,
and does not seem worth the trouble, given that the CUBIC bug had been
there forever before Jana noticed it.
Let's simply not set epoch_start in the future, otherwise
bictcp_update() could overflow and CUBIC would again
grow cwnd too fast.
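A sketch of that clamping in bictcp_cwnd_event() (simplified; see the
cherry-picked commit for the exact code):

    u32 now = tcp_time_stamp;
    s32 delta = now - tcp_sk(sk)->lsndtime;  /* time spent idle */

    if (ca->epoch_start && delta > 0) {
            ca->epoch_start += delta;
            if (after(ca->epoch_start, now))
                    ca->epoch_start = now;   /* never in the future */
    }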
This was detected thanks to a packetdrill test Neal wrote that was flaky
before applying this fix.
Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Jana Iyengar <jri@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit af05df01c7a9f4b3cf3dc29365f4c7c4e2cd53ff)
The problem seems to be that if a new device is detected
while we have already removed the shared HCD, then many of the
xhci operations (e.g. xhci_alloc_dev(), xhci_setup_device())
hang as command never completes.
I don't think XHCI can operate without the shared HCD as we've
already called xhci_halt() in xhci_only_stop_hcd() when shared HCD
goes away. We need to prevent new commands from being queued
not only when HCD is dying but also when HCD is halted.
The following lockup was detected while testing the otg state
machine.
[ 178.199951] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 178.205799] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[ 178.214458] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f04c hci version 0x100 quirks 0x00010010
[ 178.223619] xhci-hcd xhci-hcd.0.auto: irq 400, io mem 0x48890000
[ 178.230677] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 178.237796] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 178.245358] usb usb1: Product: xHCI Host Controller
[ 178.250483] usb usb1: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[ 178.257783] usb usb1: SerialNumber: xhci-hcd.0.auto
[ 178.267014] hub 1-0:1.0: USB hub found
[ 178.272108] hub 1-0:1.0: 1 port detected
[ 178.278371] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 178.284171] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[ 178.294038] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[ 178.301183] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 178.308776] usb usb2: Product: xHCI Host Controller
[ 178.313902] usb usb2: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[ 178.321222] usb usb2: SerialNumber: xhci-hcd.0.auto
[ 178.329061] hub 2-0:1.0: USB hub found
[ 178.333126] hub 2-0:1.0: 1 port detected
[ 178.567585] dwc3 48890000.usb: usb_otg_start_host 0
[ 178.572707] xhci-hcd xhci-hcd.0.auto: remove, state 4
[ 178.578064] usb usb2: USB disconnect, device number 1
[ 178.586565] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
[ 178.592585] xhci-hcd xhci-hcd.0.auto: remove, state 1
[ 178.597924] usb usb1: USB disconnect, device number 1
[ 178.603248] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[ 190.597337] INFO: task kworker/u4:0:6 blocked for more than 10 seconds.
[ 190.604273] Not tainted 4.0.0-rc1-00024-g6111320 #1058
[ 190.610228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 190.618443] kworker/u4:0 D c05c0ac0 0 6 2 0x00000000
[ 190.625120] Workqueue: usb_otg usb_otg_work
[ 190.629533] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[ 190.636915] [<c05c10ac>] (schedule) from [<c05c1318>] (schedule_preempt_disabled+0xc/0x10)
[ 190.645591] [<c05c1318>] (schedule_preempt_disabled) from [<c05c23d0>] (mutex_lock_nested+0x1ac/0x3fc)
[ 190.655353] [<c05c23d0>] (mutex_lock_nested) from [<c046cf8c>] (usb_disconnect+0x3c/0x208)
[ 190.664043] [<c046cf8c>] (usb_disconnect) from [<c0470cf0>] (_usb_remove_hcd+0x98/0x1d8)
[ 190.672535] [<c0470cf0>] (_usb_remove_hcd) from [<c0485da8>] (usb_otg_start_host+0x50/0xf4)
[ 190.681299] [<c0485da8>] (usb_otg_start_host) from [<c04849a4>] (otg_set_protocol+0x5c/0xd0)
[ 190.690153] [<c04849a4>] (otg_set_protocol) from [<c0484b88>] (otg_set_state+0x170/0xbfc)
[ 190.698735] [<c0484b88>] (otg_set_state) from [<c0485740>] (otg_statemachine+0x12c/0x470)
[ 190.707326] [<c0485740>] (otg_statemachine) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[ 190.716162] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[ 190.724742] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[ 190.732328] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[ 190.739898] 5 locks held by kworker/u4:0/6:
[ 190.744274] #0: ("%s""usb_otg"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.752799] #1: ((&otgd->work)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.761326] #2: (&otgd->fsm.lock){+.+.+.}, at: [<c048562c>] otg_statemachine+0x18/0x470
[ 190.769934] #3: (usb_bus_list_lock){+.+.+.}, at: [<c0470ce8>] _usb_remove_hcd+0x90/0x1d8
[ 190.778635] #4: (&dev->mutex){......}, at: [<c046cf8c>] usb_disconnect+0x3c/0x208
[ 190.786700] INFO: task kworker/1:0:14 blocked for more than 10 seconds.
[ 190.793633] Not tainted 4.0.0-rc1-00024-g6111320 #1058
[ 190.799567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 190.807783] kworker/1:0 D c05c0ac0 0 14 2 0x00000000
[ 190.814457] Workqueue: usb_hub_wq hub_event
[ 190.818866] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[ 190.826252] [<c05c10ac>] (schedule) from [<c05c4e40>] (schedule_timeout+0x13c/0x1ec)
[ 190.834377] [<c05c4e40>] (schedule_timeout) from [<c05c19f0>] (wait_for_common+0xbc/0x150)
[ 190.843062] [<c05c19f0>] (wait_for_common) from [<bf068a3c>] (xhci_setup_device+0x164/0x5cc [xhci_hcd])
[ 190.852986] [<bf068a3c>] (xhci_setup_device [xhci_hcd]) from [<c046b7f4>] (hub_port_init+0x3f4/0xb10)
[ 190.862667] [<c046b7f4>] (hub_port_init) from [<c046eb64>] (hub_event+0x704/0x1018)
[ 190.870704] [<c046eb64>] (hub_event) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[ 190.878919] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[ 190.887503] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[ 190.895076] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[ 190.902650] 5 locks held by kworker/1:0/14:
[ 190.907023] #0: ("usb_hub_wq"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.915454] #1: ((&hub->events)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[ 190.924070] #2: (&dev->mutex){......}, at: [<c046e490>] hub_event+0x30/0x1018
[ 190.931768] #3: (&port_dev->status_lock){+.+.+.}, at: [<c046eb50>] hub_event+0x6f0/0x1018
[ 190.940558] #4: (&bus->usb_address0_mutex){+.+.+.}, at: [<c046b458>] hub_port_init+0x58/0xb10
Starting with 4.1 the tracing subsystem has its own filesystem
which is automounted in the tracing subdirectory of debugfs.
Prior to this debugfs could be bind mounted in a cloned mount
namespace, but if tracefs has been mounted under debugfs this
now fails because there is a locked child mount. This creates
a regression for container software which bind mounts debugfs
to satisfy the assumption of some userspace software.
In other pseudo filesystems such as proc and sysfs we're already
creating mountpoints like this in such a way that no dirents can
be created in the directories, allowing them to be exceptions to
some MNT_LOCKED tests. In fact we already do this for the tracefs
mountpoint in sysfs.
Do the same in debugfs_create_automount(), since the intention
here is clearly to create a mountpoint. This fixes the regression,
as locked child mounts on permanently empty directories do not
cause a bind mount to fail.
If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.
Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:
The next time the fs is mounted, log tree replay happens and
the directory "y" does not exist nor do the files "foo" and
"bar" exist anywhere (neither in "y" nor in "x", nor the root
nor anywhere).
The next time the fs is mounted, log tree replay happens and the
file "bar" does not exist anymore. A file with the name "foo"
exists and it matches the second file we created.
Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:
The next time the fs is mounted the log replay procedure fails because
it attempts to delete the snapshot entry (which has dir item key type
of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
resulting in the following error that causes mount to fail:
Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).
Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:
* fstests: generic test for fsync after renaming directory
https://patchwork.kernel.org/patch/8694281/
* fstests: generic test for fsync after renaming file
https://patchwork.kernel.org/patch/8694301/
* fstests: add btrfs test for fsync after snapshot deletion
https://patchwork.kernel.org/patch/8670671/
Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 033ad030df0ea932a21499582fea59e1df95769b)
When we have the no_holes feature enabled, if we truncate a file to a
smaller size, truncate it again to a size greater than or equal to
its original size, and then fsync it, the log tree will not have any
information about the hole covering the range [truncate_1_offset,
new_file_size[.
Which means if the fsync log is replayed, the file will remain with the
state it had before both truncate operations.
Without the no_holes feature this does not happen, since when the inode
is logged (full sync flag is set) it will find in the fs/subvol tree a
leaf with a generation matching the current transaction id that has an
explicit extent item representing the hole.
Fix this by adding an explicit extent item representing a hole between
the last extent and the inode's i_size if we are doing a full sync.
The issue is easy to reproduce with the following test case for fstests:
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_dm_flakey
# This test was motivated by an issue found in btrfs when the btrfs
# no-holes feature is enabled (introduced in kernel 3.14). So enable
# the feature if the fs being tested is btrfs.
if [ $FSTYP == "btrfs" ]; then
_require_btrfs_fs_feature "no_holes"
_require_btrfs_mkfs_feature "no-holes"
MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
fi
# Now truncate our file foo to a smaller size (64Kb) and then truncate
# it to the size it had before the shrinking truncate (125Kb). Then
# fsync our file. If a power failure happens after the fsync, we expect
# our file to have a size of 125Kb, with the first 64Kb of data having
# the value 0xaa and the second 61Kb of data having the value 0x00.
$XFS_IO_PROG -c "truncate 64K" \
-c "truncate 125K" \
-c "fsync" \
$SCRATCH_MNT/foo
# Do something similar to our file bar, but the first truncation sets
# the file size to 0 and the second truncation expands the size to the
# double of what it was initially.
$XFS_IO_PROG -c "truncate 0" \
-c "truncate 253K" \
-c "fsync" \
$SCRATCH_MNT/bar
# Allow writes again, mount to trigger log replay and validate file
# contents.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# We expect foo to have a size of 125Kb, the first 64Kb of data all
# having the value 0xaa and the remaining 61Kb to be a hole (all bytes
# with value 0x00).
echo "File foo content after log replay:"
od -t x1 $SCRATCH_MNT/foo
# We expect bar to have a size of 253Kb and no extents (any byte read
# from bar has the value 0x00).
echo "File bar content after log replay:"
od -t x1 $SCRATCH_MNT/bar
status=0
exit
The expected file contents in the golden output are:
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0372000
File bar content after log replay:
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0772000
Without this fix, their contents are:
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
File bar content after log replay:
0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
*
0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0772000
A test case submission for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 091537b5b3102af4765256d2b527a2a1de4fb184)
After commit 4f764e515361 ("Btrfs: remove deleted xattrs on fsync log
replay"), we can end up in a situation where during log replay we end up
deleting xattrs that were never deleted when their file was last fsynced.
This happens in the fast fsync path (flag BTRFS_INODE_NEEDS_FULL_SYNC is
not set in the inode) if the inode has the flag BTRFS_INODE_COPY_EVERYTHING
set, the xattr was added in a past transaction and the leaf where the
xattr is located was not updated (COWed or created) in the current
transaction. In this scenario the xattr item never ends up in the log
tree, so at log replay time the replay code deletes the xattr from the
fs/subvol tree, thinking the xattr was deleted prior to the last fsync.
Fix this by always logging all xattrs, which is the simplest and most
reliable way to detect deleted xattrs and replay the deletes at log replay
time.
This issue is reproducible with the following test case for fstests:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
here=`pwd`
tmp=/tmp/$$
status=1 # failure is the default!
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
. ./common/attr
# real QA test starts here
# We create a lot of xattrs for a single file. Only btrfs and xfs are currently
# able to store such a large amount of xattrs per file; other filesystems such
# as ext3/4 and f2fs, for example, fail with ENOSPC even if we attempt to add
# fewer than 1000 xattrs with very small values.
_supported_fs btrfs xfs
_supported_os Linux
_need_to_be_root
_require_scratch
_require_dm_flakey
_require_attrs
_require_metadata_journaling $SCRATCH_DEV
# Create the test file with some initial data and make sure everything is
# durably persisted.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" $SCRATCH_MNT/foo | _filter_xfs_io
sync
# Add many small xattrs to our file.
# We create such a large amount because it's needed to trigger the issue found
# in btrfs - we need to have an amount that causes the fs to have at least 3
# btree leafs with xattrs stored in them, and it must work on any leaf size
# (maximum leaf/node size is 64Kb).
num_xattrs=2000
for ((i = 1; i <= $num_xattrs; i++)); do
name="user.attr_$(printf "%04d" $i)"
$SETFATTR_PROG -n $name -v "val_$(printf "%04d" $i)" $SCRATCH_MNT/foo
done
# Sync the filesystem to force a commit of the current btrfs transaction, this
# is a necessary condition to trigger the bug on btrfs.
sync
# Now update our file's data and fsync the file.
# After a successful fsync, if the fsync log/journal is replayed we expect to
# see all the xattrs we added before with the same values (and the updated file
# data of course). Btrfs used to delete some of these xattrs when it replayed
# its fsync log/journal.
$XFS_IO_PROG -c "pwrite -S 0xbb 8K 16K" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again and mount. This makes the fs replay its fsync log.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
echo "File content after crash and log replay:"
od -t x1 $SCRATCH_MNT/foo
echo "File xattrs after crash and log replay:"
for ((i = 1; i <= $num_xattrs; i++)); do
name="user.attr_$(printf "%04d" $i)"
echo -n "$name="
$GETFATTR_PROG --absolute-names -n $name --only-values $SCRATCH_MNT/foo
echo
done
status=0
exit
The golden output expects all xattrs to be available, and with the correct
values, after the fsync log is replayed.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit db4043dffa5b9f77160dfba088bbbbaa64dd9ed1)
In assoc_array_insert_into_terminal_node(), we call the
compare_object() method on all non-empty slots, even when they're
not leaves, passing a pointer to an unexpected structure to
compare_object(). Currently it causes an out-of-bounds read access
in keyring_compare_object detected by KASan (see below). The issue
is easily reproduced with the keyutils testsuite.
Only call compare_object() when the slot is a leaf.
KASan warning:
==================================================================
BUG: KASAN: slab-out-of-bounds in keyring_compare_object+0x213/0x240 at addr ffff880060a6f838
Read of size 8 by task keyctl/1655
=============================================================================
BUG kmalloc-192 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
The original hand-implemented hash-table in mac80211 couldn't result
in insertion errors, and while converting to rhashtable I evidently
forgot to check the errors.
This only surfaced now because Ben is adding many identical keys and
that resulted in hidden insertion errors.
Cc: stable@vger.kernel.org Fixes: 7bedd0cfad4e1 ("mac80211: use rhashtable for station table") Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d2bccdc948e21a48d593181c05667d454f423fcd)
After unplugging a DP MST display from the system, we have to go through
and destroy all of the DRM connectors associated with it since none of
them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
doesn't do a good enough job of ensuring that no modesets can be done
with the connectors throughout the destruction process. As it is
right now, intel_dp_destroy_mst_connector() works like this:
* Take all modeset locks
* Clear the configuration of the crtc on the connector, if there is one
* Drop all modeset locks, this is required because of circular
dependency issues that arise with trying to remove the connector from
sysfs with modeset locks held
* Unregister the connector
* Take all modeset locks, again
* Do the rest of the required cleaning for destroying the connector
* Finally drop all modeset locks for good
This only works sometimes. During the destruction process, it's very
possible that a userspace application will attempt to do a modeset
using the connector. When we drop the modeset locks, an ioctl handler
such as drm_mode_setcrtc has the opportunity to take all of the modeset
locks from us. When this happens, one thing leads to another and
eventually we end up committing a mode with the non-existent connector:
[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
And in some cases, such as with the T460s using an MST dock, this
results in breaking modesetting and/or panicking the system.
To work around this, we now unregister the connector at the very
beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
locks, and then hold them until we finish the rest of the function.
This patch fixes an issue where usbhsg_queue_done() may cause a kernel
panic when the dma callback is running and usb_ep_disable() is called
by an interrupt handler. (Especially, we can reproduce this issue using
g_audio with the usb-dmac driver.)
An example flow:
usbhsf_dma_complete (on tasklet)
--> usbhsf_pkt_handler (on tasklet)
--> usbhsg_queue_done (on tasklet)
*** interrupt happened and usb_ep_disable() is called ***
--> usbhsg_queue_pop (on tasklet)
Then, an oops happens.
With the internal Quota feature, mke2fs creates empty quota inodes and
quota usage tracking is enabled as soon as the file system is mounted.
Since quotacheck is no longer preallocating all of the blocks in the
quota inode that are likely needed to be written to, we are now seeing
a lockdep false positive caused by needing to allocate a quota block
from inside ext4_map_blocks(), while holding i_data_sem for a data
inode. This results in this complaint:
When an unexpected situation happened (e.g. a tx/rx irq happened while
the DMAC was in use), usbhsf_pkt_handler() could cause a NULL
pointer dereference like the following:
The usbhid driver has inconsistently duplicated code in its post-reset,
resume, and reset-resume pathways.
* reset-resume doesn't check HID_STARTED before trying to
restart the I/O queues.
* resume fails to clear the HID_SUSPENDED flag if HID_STARTED
isn't set.
* resume calls usbhid_restart_queues() with usbhid->lock held
and the others call it without holding the lock.
The first item in particular causes a problem following a reset-resume
if the driver hasn't started up its I/O. URB submission fails because
usbhid->urbin is NULL, and this triggers an unending reset-retry loop.
This patch fixes the problem by creating a new subroutine,
hid_restart_io(), to carry out all the common activities. It also
adds some checks that were missing in the original code:
* After a reset, there's no need to clear any halted endpoints.
* After a resume, if a reset is pending there's no need to
restart any I/O until the reset is finished.
* After a resume, if the interrupt-IN endpoint is halted there's
no need to submit the input URB until the halt has been
cleared.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Daniel Fraga <fragabr@gmail.com> Tested-by: Daniel Fraga <fragabr@gmail.com> CC: <stable@vger.kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 8db1fb6a6e0fdc6f6fbfabfc4c9c5a183a7527ac)
Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:
> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal: 204800 kB
> CmaFree: 195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal: 204800 kB
> CmaFree: 6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal: 16342016 kB
> MemFree: 22367268 kB
> MemAvailable: 22370528 kB
Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:
> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> a partially isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.
This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
but since it doesn't hold the zone->lock between pageblocks, a race
window does exist.
It's also likely that unexpected merging can occur between
MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in
__free_one_page() since commit 3c605096d315 ("mm/page_alloc: restrict
max order of merging on isolated pageblock"). However, we only check
the migratetype of the pageblock where buddy merging has been initiated,
not the migratetype of the buddy pageblock (or group of pageblocks)
which can be MIGRATE_ISOLATE.
Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath
and bloat-o-meter has shown significant code bloat (the function is
inline).
This patch reduces the bloat at some expense of more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded
to pageblock_order and without any migratetype checks. The checks are
placed outside, bumping the max_order if merging is allowed, and
returning to the while-loop with a statement which can't possibly be
considered harmful.
This fixes the accounting bug and also removes the arguably weird state
in the original commit 3c605096d315 where buddies could be left
unmerged.
Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Link: https://lkml.org/lkml/2016/3/2/280 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Debugged-by: Laura Abbott <labbott@redhat.com> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit da2041b6c4bf60904a7732eec0b20c89f54d9693)
__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised. Parallel initialisation
of struct pages defers initialisation and __free_pages_bootmem can be
called for struct pages that cannot yet map struct page to PFN. This
patch passes PFN to __free_pages_bootmem with no other functional change.
Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Nate Zimmer <nzimmer@sgi.com> Tested-by: Waiman Long <waiman.long@hp.com> Tested-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit bbd0b13f917860a686ccaf40cda003332a640a70)
When the master handles a convert request, it queues the ast first and
then returns the status. It may happen that the ast is sent before the
request status, because the above two messages are sent by two threads.
And right after the ast is sent, if the master goes down, it may trigger
the BUG in dlm_move_lockres_to_recovery_list on the requesting node,
because the ast handler moves the lock to the grant list without clearing
lock->convert_pending. So remove the BUG_ON statement and check whether
the ast has been processed in dlmconvert_remote.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 474b8c6d329cfc40680ef308878d22f5a1a3b02b)
There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list which can leave a lock with
OCFS2_LOCK_BUSY set on the grant list, and the system hangs.
status = dlm_send_remote_convert_request();
>>>>>> race window: the master has queued the ast and returned DLM_NORMAL,
and then went down before sending the ast.
This node detects that the master is down and calls
dlm_move_lockres_to_recovery_list, which reverts the
lock to the grant list.
Then OCFS2_LOCK_BUSY won't be cleared, as the new master won't
send the ast any more, because it thinks the lock has already been
granted.
In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING
bit set (res is still recovering) or the res master changed (the new
master has finished recovery); if so, reset the status to DLM_RECOVERING,
then it will retry the convert.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit d5865dc7deb118b1db82dcaf4d249668bc81a311)
The ati_remote2 driver expects at least two interfaces with one
endpoint each. If given a malicious descriptor that specifies one
interface or no endpoints, it will crash in the probe function.
Ensure there are at least two interfaces and one endpoint for each
interface before using them.
The full disclosure: http://seclists.org/bugtraq/2016/Mar/90
Fix deadlocking during concurrent receive and transmit operations on SMP
platforms caused by the use of an incorrect lock: on transmit, the
'tx_lock' spinlock should be used instead of 'lock', which is used for
the receive operation.
This fix is applicable to kernel versions starting from v2.15.
Signed-off-by: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e025091aa3a343703375c66e0bd02675b251411a)
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
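For reference, a configuration satisfying the first two conditions would
look like this (the pattern value is illustrative only):
sysctl -w fs.suid_dumpable=2
sysctl -w kernel.core_pattern='/var/crash/core.%e.%p'   # starts with "/", not "|/"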
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 2e840836fdf9ba0767134dcc5103dd25e442023b)
Moving the initialization earlier is needed in 4.6 because
kvm_arch_init_vm is now using mmu_lock, causing lockdep to
complain:
[ 284.440294] INFO: trying to register non-static key.
[ 284.445259] the code is fine but needs lockdep annotation.
[ 284.450736] turning off the locking correctness validator.
...
[ 284.528318] [<ffffffff810aecc3>] lock_acquire+0xd3/0x240
[ 284.533733] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.541467] [<ffffffff81715581>] _raw_spin_lock+0x41/0x80
[ 284.546960] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.554707] [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.562281] [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm]
[ 284.568381] [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm]
[ 284.574740] [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm]
However, it also helps fixing a preexisting problem, which is why this
patch is also good for stable kernels: kvm_create_vm was incrementing
current->mm->mm_count but not decrementing it at the out_err label (in
case kvm_init_mmu_notifier failed). The new initialization order makes
it possible to add the required mmdrop without adding a new error label.
Cc: stable@vger.kernel.org Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 01954dfd48d936199204ae0860897b8aed18c1e2)
This patch fixes an active I/O shutdown bug for fabric
drivers using target_wait_for_sess_cmds(), where se_cmd
descriptor shutdown would result in hung tasks waiting
indefinitely for se_cmd->cmd_wait_comp to complete().
To address this bug, drop the incorrect list_del_init()
usage in target_wait_for_sess_cmds() and always complete()
during se_cmd target_release_cmd_kref() put, in order to
let caller invoke the final fabric release callback
into se_cmd->se_tfo->release_cmd() code.
__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.
Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.
If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.
Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.
The non-serializing __clear_bit() was getting "lost":
80543b8e: ld_s r2,[r13,0]   <--- (A) Finds PG_locked is set
80543b90: or   r3,r2,1      <--- (B) other core unlocks right here
80543b94: st_s r3,[r13,0]   <--- (C) sets PG_locked (overwrites unlock)
(busybox's cat uses sendfile(2), unlike the coreutils version)
This is because tracing_splice_read_pipe() can call splice_to_pipe()
with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and
we fill the page pointers and the other fields of the pipe_buffers with
garbage.
All other callers of splice_to_pipe() avoid calling it when nr_pages ==
0, and we could make tracing_splice_read_pipe() do that too, but it
seems reasonable to have splice_to_pipe() handle this condition
gracefully.
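For illustration, a reproducer along these lines should trigger the
underflow on affected kernels (a sketch assuming tracing is exposed
through debugfs at the usual path):
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null   # busybox cat uses sendfile(2)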
Cc: stable@vger.kernel.org Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit ff6c4184945bb5ec843cde5c3936b93823cf5641)
If tracing contains data and the trace_pipe file is read with sendfile(),
then it can trigger a NULL pointer dereference and various BUG_ON within the
VM code.
There's a patch to fix this in the splice_to_pipe() code, but it's also a
good idea to not let that happen from trace_pipe either.
The uas driver can never queue more than MAX_CMNDS (- 1) tags, and tags
are shared between luns, so there is no need to claim that we can_queue
some random large number.
Not claiming that we can_queue 65536 commands, fixes the uas driver
failing to initialize while allocating the tag map with a "Page allocation
failure (order 7)" error on systems which have been running for a while
and thus have fragmented memory.
Cc: stable@vger.kernel.org Reported-and-tested-by: Yves-Alexis Perez <corsac@corsac.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0b914b30c534502d1d6a68a5859ee65c3d7d8aa5)
An attack has become available which pretends to be a quirky
device, circumventing normal sanity checks, and crashes the kernel
by presenting an insufficient number of interfaces. This patch adds
a check to the code path for quirky devices.
Signed-off-by: Oliver Neukum <ONeukum@suse.com> CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit a635bc779e7b7748c9b0b773eaf08a7f2184ec50)
Attacks that trick drivers into passing a NULL pointer
to usb_driver_claim_interface() using forged descriptors are
known. This thwarts them by sanity checking.
Signed-off-by: Oliver Neukum <ONeukum@suse.com> CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit febfaffe4aa9b0600028f46f50156e3354e34208)
The iowarrior driver expects at least one valid endpoint. If given
malicious descriptors that specify 0 for the number of endpoints,
it will crash in the probe function. Ensure there is at least
one endpoint on the interface before using it.
The full report of this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/87
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.
Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU subsystem to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.
Suggested-by: Andi Kleen <ak@linux.intel.com> Fixes: 4787c368a9bc "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()" Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit e69b0c2d1686a3344265e26a2ea6197e856f0459)
When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high. This can cause
groups to stay in excess of their memory.high indefinitely.
To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.
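For example, on a cgroup v2 hierarchy (the group path is illustrative),
lowering memory.high now starts reclaim right away instead of waiting for
the next charge:
echo 512M > /sys/fs/cgroup/mygroup/memory.high   # shrink the limit below current usage
cat /sys/fs/cgroup/mygroup/memory.current        # usage is now reclaimed toward 512M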
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 62b3bd2a4f98852cefd0df17333362638246aa8f)
Using an at91sam9g20ek development board with DTS configuration may trigger
a kernel panic because of a NULL pointer dereference exception, while
configuring DMA. Let's fix this by adding a check for pdata before
dereferencing it.
Signed-off-by: Brent Taylor <motobud@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 0db0888f18f37d9ee10e99a7f3e90861b4d525a2)
Commit ecb89f2f5f3e7 ("mmc: atmel-mci: remove compat for non DT board
when requesting dma chan") broke dma on AVR32 and any other boards not
using DT. This restores a fallback mechanism for such cases.
nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.
A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.
Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 03d44e3d9dd7744fe97ca472fce1725b7179fa2f)
Add some sanity check codes before actually accessing the endpoint via
get_endpoint() in order to avoid the invalid access through a
malformed USB descriptor. Mostly just checking bNumEndpoints, but in
one place (snd_microii_spdif_default_get()), the validity of iface and
altsetting index is checked as well.
create_fixed_stream_quirk() may cause a NULL-pointer dereference by
accessing the non-existing endpoint when a USB device with a malformed
USB descriptor is used.
This patch avoids it simply by adding a sanity check of bNumEndpoints
before the accesses.
Even though hid_hw_* checks that the passed-in data_len is less than
HID_MAX_BUFFER_SIZE, it is not enough, as i2c-hid does not necessarily
allocate buffers of HID_MAX_BUFFER_SIZE but rather checks all device
reports and selects the largest size. In-kernel users normally just send
as much data as the report needs, so there is no problem, but hidraw
users can do whatever they please:
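For instance (an illustration, not the original report), a hidraw user
can hand the driver a buffer far larger than any report the device
declared:
dd if=/dev/zero bs=8k count=1 of=/dev/hidraw0   # write an 8 KiB "report" to the device node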
Inside multipath_make_request(), multipath maps the incoming
bio into low level device's bio, but it is totally wrong to
copy the bio into mapped bio via '*mapped_bio = *bio'. For
example, .__bi_remaining is kept in the copy, especially if
the incoming bio is chained to via bio splitting, so .bi_end_io
can't be called for the mapped bio at all in the completion path
in this kind of situation.
This patch fixes the issue by using clone style.
Cc: stable@vger.kernel.org (v3.14+) Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
(cherry picked from commit 79deb9b280a508c77f6cc233e083544712bc2458)
The powermate driver expects at least one valid USB endpoint in its
probe function. If given malicious descriptors that specify 0 for
the number of endpoints, it will crash. Validate the number of
endpoints on the interface before using them.
The full report for this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/85