www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/ocfs2' of git://ca-git.us.oracle.com/linux-uek:
NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
  nfs: take extra reference to fl->fl_file when running a LOCKU operation
  mm: madvise allow remove operation for hugetlbfs
  mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA
  hugetlbfs: add hugetlbfs_fallocate()
  hugetlbfs: New huge_add_to_page_cache helper routine
  mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate
  mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch
  mm/hugetlb.c: make vma_has_reserves() return bool
  hugetlbfs: truncate_hugepages() takes a range of pages
  hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete
  mm/hugetlb: expose hugetlb fault mutex for use by fallocate
  mm/hugetlb: add region_del() to delete a specific range of entries
  mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix
  mm/hugetlb: add cache of descriptors to resv_map for region_add
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  mm/hugetlb: compute/return the number of regions added by region_add()
  mm/hugetlb: document the reserve map/region tracking routines

nfs: take extra reference to fl->fl_file when running a LOCKU operation

Jean reported another crash, similar to the one fixed by feaff8e5b2cf:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.
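
The rule above can be illustrated with a small userspace model. This is
only a sketch, not the kernel change: struct file_model, file_get(),
file_put() and the locku_*() helpers are hypothetical stand-ins for
struct file, taking and dropping a file reference, and the setup,
completion and release steps of an asynchronous LOCKU request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: a refcount and a "torn down" flag. */
struct file_model {
    int refcount;
    bool released;
};

void file_get(struct file_model *f) { f->refcount++; }

void file_put(struct file_model *f)
{
    if (--f->refcount == 0)
        f->released = true;   /* final put: the object is torn down */
}

/* Hypothetical stand-in for the per-request LOCKU data. */
struct locku_model { struct file_model *file; };

/* Setup pins the file for the lifetime of the asynchronous request. */
void locku_start(struct locku_model *op, struct file_model *f)
{
    file_get(f);
    op->file = f;
}

/* The completion callback still uses op->file; report whether it is valid. */
bool locku_done(struct locku_model *op)
{
    return !op->file->released;
}

/* Release drops the extra reference once the request is finished. */
void locku_release(struct locku_model *op)
{
    file_put(op->file);
    op->file = NULL;
}
```

In this model, a caller that drops its own reference while the request is
in flight no longer triggers the final put; that only happens in
locku_release().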

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Orabug: 21687670
(cherry picked from mainline commit db2efec0caba4f81a22d95a34da640b86c313c8e)
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>

NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

Orabug: 20933419

NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing
the hang is the global bit map inode lock.  Node 1 is master, has
the lock granted in PR mode; Node 2 is in the converting list (PR ->
EX). There are no holders of the lock on the master node so it should
downconvert to NL and grant EX to node 2 but that does not happen.
BLOCKED + QUEUED in lock res are set and it is on osb blocked list.
Threads are waiting in __ocfs2_cluster_lock on BLOCKED.  One thread wants
EX, rest want PR. So it is as though the downconvert thread needs to be
kicked to complete the conv.

The hang is caused by an EX req coming into  __ocfs2_cluster_lock on
the heels of a PR req after it sets BUSY (drops l_lock, releasing EX
thread), forcing the incoming EX to wait on BUSY without doing anything.
PR has called ocfs2_dlm_lock, which  sets the node 1 lock from NL ->
PR, queues ast.

At this time, upconvert (PR -> EX) arrives from node 2, finds conflict with
node 1 lock in PR, so the lock res is put on dlm thread's dirty list.

After ret from ocfs2_dlm_lock, PR thread now waits behind EX on BUSY till
awoken by ast.

Now it is dlm_thread that serially runs dlm_shuffle_lists, ast, bast,
in that order.  dlm_shuffle_lists queues a bast on behalf of node 2
(which will be run by dlm_thread right after the ast).  ast does its
part, sets UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
dlm_thread runs bast. It sets BLOCKED and kicks dc thread.  dc thread
runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING is set, skips doing
anything and requeues.

Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
of PR, it wakes up first, finds BLOCKED set and skips doing anything
but clearing UPCONVERT_FINISHING (which was actually "meant" for the
PR thread), and this time waits on BLOCKED.  Next, the PR thread comes
out of wait but since UPCONVERT_FINISHING is not set, it skips updating
the l_ro_holders and goes straight to wait on BLOCKED. So there, we
have a hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock
res in osb blocked list. Only when dc thread is awoken, it will run
ocfs2_unblock_lock and things will unhang.

One way to fix this is to wake the dc thread on the flag after clearing
UPCONVERT_FINISHING.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm: madvise allow remove operation for hugetlbfs

Orabug: 21652814

Now that we have hole punching support for hugetlbfs, we can
also support the MADV_REMOVE interface to it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8f787c8989ce599cbf0feb10ecea912d07111439)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mmotm: build fix hugetlbfs fallocate if not CONFIG_NUMA

Orabug: 21652814

Commit 56bb4d795 introduced a build error if CONFIG_NUMA is not
defined. When fallocate preallocation allocates pages, it will
use the defined numa policy. However, if numa is not defined
there is no such policy and no code should reference numa policy.
Create wrappers to isolate policy manipulation code that are a
NOOP in the non-NUMA case.
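
The wrapper pattern described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: struct vma_model and
the *_model() function names are hypothetical, and the "policy" is reduced
to an integer.

```c
/* Build with -DCONFIG_NUMA to select the NUMA variant of the wrappers. */
struct vma_model { int policy; /* 0 means "no policy attached" */ };

#ifdef CONFIG_NUMA
void set_vma_policy_model(struct vma_model *vma, int policy)
{
    vma->policy = policy;
}

void drop_vma_policy_model(struct vma_model *vma)
{
    vma->policy = 0;
}
#else
/* Non-NUMA build: same signatures, empty bodies, so callers need no #ifdef. */
void set_vma_policy_model(struct vma_model *vma, int policy)
{
    (void)vma;
    (void)policy;
}

void drop_vma_policy_model(struct vma_model *vma)
{
    (void)vma;
}
#endif

/* The caller is identical in both configurations. */
int alloc_with_policy_model(struct vma_model *vma, int policy)
{
    int seen;

    set_vma_policy_model(vma, policy);
    seen = vma->policy;          /* an allocation would consult the policy here */
    drop_vma_policy_model(vma);
    return seen;
}
```

Keeping the #ifdef inside the wrappers means the allocation path itself
never references policy code directly.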

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0ce9057732d1dd94ef2bd32c8acb68ae68b08a2f)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: add hugetlbfs_fallocate()

Orabug: 21652814

This is based on the shmem version, but it has diverged quite a bit.  We
have no swap to worry about, nor the new file sealing.  Add
synchronization via the fault mutex table to coordinate page faults,
fallocate allocation and fallocate hole punch.

What this allows us to do is move physical memory in and out of a
hugetlbfs file without having it mapped.  This also gives us the ability
to support MADV_REMOVE since it is currently implemented using
fallocate().  MADV_REMOVE lets madvise() remove pages from the middle of a
hugetlbfs file, which wasn't possible before.

hugetlbfs fallocate only operates on whole huge pages.

Based on code by Dave Hansen.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 79925b07fd22d7c4e4e77cdc26edb26dc4ff2701)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: New huge_add_to_page_cache helper routine

Orabug: 21652814

Currently, there is only a single place where hugetlbfs pages are added to
the page cache. The new fallocate code will be adding a second one, so break
the functionality out into its own helper.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4b153581930e8c61250078efcdcce3e19bc2a45b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

Orabug: 21652814

Areas hole punched by fallocate will not have entries in the
region/reserve map. However, shared mappings with min_size subpool
reservations may still have reserved pages. alloc_huge_page needs to
handle this special case and do the proper accounting.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b21aa74b8077ce5b9c5fea566fe37af866934746)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch

Orabug: 21652814

In vma_has_reserves(), the current assumption is that reserves are always
present for shared mappings.  However, this will not be the case with
fallocate hole punch.  When punching a hole, the present page will be
deleted as well as the region/reserve map entry (and hence any
reservation).  vma_has_reserves is passed "chg" which indicates whether or
not a region/reserve map is present.  Use this to determine if reserves
are actually present or were removed via hole punch.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 320fb799e7cedc1edb96fd69c686547c731d5fc8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb.c: make vma_has_reserves() return bool

Orabug: 21652814

This makes vma_has_reserves() return bool due to this particular function
only returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f7bba6ea9b80a19c4849b4d870fb2f2bd492713b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: truncate_hugepages() takes a range of pages

Orabug: 21652814

Modify truncate_hugepages() to take a range of pages (start, end) instead
of simply start.  If an end value of LLONG_MAX is passed, the current
"truncate" functionality is maintained.  Existing callers are modified to
pass LLONG_MAX as end of range.  By keying off end == LLONG_MAX, the
routine behaves differently for truncate and hole punch.  Page removal is
now synchronized with page allocation via faults by using the fault mutex
table.  The hole punch case can experience the rare region_del error and
must handle accordingly.

Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the
case where region_del returns an error.

Since the routine handles more than just the truncate case, it is renamed
to remove_inode_hugepages().  To be consistent, the routine
truncate_huge_page() is renamed remove_huge_page().

Downstream of remove_inode_hugepages(), the routine
hugetlb_unreserve_pages() is also modified to take a range of pages.
hugetlb_unreserve_pages is modified to detect an error from region_del and
pass it back to the caller.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6a57804ccdfb77b8f333b736a3ee7cb1bf8732e1)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete

Orabug: 21652814

fallocate hole punch will want to unmap a specific range of pages. Modify
the existing hugetlb_vmtruncate_list() routine to take a start/end range.
If end is 0, this indicates all pages after start should be unmapped.
This is the same as the existing truncate functionality. Modify existing
callers to add 0 as end of range.

Since the routine will be used in hole punch as well as truncate
operations, it is more appropriately renamed to hugetlb_vmdelete_list().
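
The "end of 0 means everything after start" convention can be sketched in
userspace as below. This is an assumption-laden model, not the kernel
routine: struct vma_range and vmdelete_span() are hypothetical names, and
offsets are plain page indexes.

```c
/* Page-offset interval [pgoff, pgoff + pages) covered by one mapping. */
struct vma_range { unsigned long pgoff, pages; };

/*
 * Compute which pages of the mapping fall in the delete range [start, end).
 * end == 0 is the truncate convention: everything from start onward.
 * Returns the number of pages to unmap and stores the first one in *first.
 * *first is left untouched when nothing overlaps.
 */
unsigned long vmdelete_span(const struct vma_range *v, unsigned long start,
                            unsigned long end, unsigned long *first)
{
    unsigned long v_end = v->pgoff + v->pages;
    unsigned long lo = start > v->pgoff ? start : v->pgoff;
    unsigned long hi = (end == 0 || end > v_end) ? v_end : end;

    if (lo >= hi)
        return 0;
    *first = lo;
    return hi - lo;
}
```

A truncate-style caller passes 0 as end, while a hole punch passes the
real end of the hole.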

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit dea23e2a9e811e5fba895a134f701455908aa0d3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: expose hugetlb fault mutex for use by fallocate

Orabug: 21652814

hugetlb page faults are currently synchronized by the table of mutexes
(htlb_fault_mutex_table).  fallocate code will need to synchronize with
the page fault code when it allocates or deletes pages.  Expose interfaces
so that fallocate operations can be synchronized with page faults.  Minor
name changes to be more consistent with other global hugetlb symbols.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fec73245c33b067c60f520908a93c971003664c8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add region_del() to delete a specific range of entries

Orabug: 21652814

fallocate hole punch will want to remove a specific range of pages.  The
existing region_truncate() routine deletes all region/reserve map entries
after a specified offset.  region_del() will provide this same
functionality if the end of region is specified as LONG_MAX.  Hence,
region_del() can replace region_truncate().

Unlike region_truncate(), region_del() can return an error in the rare
case where it can not allocate memory for a region descriptor.  This ONLY
happens in the case where an existing region must be split.  Current
callers passing LONG_MAX as end of range will never experience this error
and do not need to deal with error handling.  Future callers of
region_del() (such as fallocate hole punch) will need to handle this
error.
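
The split case can be illustrated with a simplified userspace model. This
is a sketch under stated assumptions, not the kernel implementation: the
map is a sorted array rather than a list, MAX_REGIONS is a hypothetical
fixed pool standing in for memory allocation, and region_del_model() is a
made-up name.

```c
#include <errno.h>
#include <limits.h>

#define MAX_REGIONS 4   /* hypothetical pool; a full pool models ENOMEM */

struct region { long from, to; };            /* half-open [from, to) */
struct resv_model { struct region r[MAX_REGIONS]; int n; };

/*
 * Delete [f, t) from the sorted map. Returns the number of pages removed,
 * or -ENOMEM when one region must be split in two and no descriptor is
 * available. With t == LONG_MAX no split is ever needed.
 */
long region_del_model(struct resv_model *m, long f, long t)
{
    long del = 0;
    int i = 0;

    while (i < m->n) {
        struct region *rg = &m->r[i];

        if (rg->to <= f) { i++; continue; }      /* entirely before range */
        if (rg->from >= t) break;                /* entirely after range */

        if (f > rg->from && t < rg->to) {        /* hole in the middle: split */
            if (m->n == MAX_REGIONS)
                return -ENOMEM;
            for (int j = m->n; j > i + 1; j--)
                m->r[j] = m->r[j - 1];
            m->r[i + 1] = (struct region){ t, rg->to };
            rg->to = f;
            m->n++;
            return del + (t - f);
        }
        if (f <= rg->from && t >= rg->to) {      /* fully covered: drop it */
            del += rg->to - rg->from;
            for (int j = i; j < m->n - 1; j++)
                m->r[j] = m->r[j + 1];
            m->n--;
            continue;
        }
        if (f <= rg->from) {                     /* trim the front */
            del += t - rg->from;
            rg->from = t;
        } else {                                 /* trim the tail */
            del += rg->to - f;
            rg->to = f;
        }
        i++;
    }
    return del;
}
```

Only the middle-of-region case needs a new descriptor, which is why a
truncate-style call with LONG_MAX cannot fail in this model.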

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c2cfad5701106f8ddb0607b9e09d524ef55ef0ec)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm-hugetlb-add-cache-of-descriptors-to-resv_map-for-region_add-fix

Orabug: 21652814

fix typo in comment, use more cols

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 203a5abc2eb0bcd7f8e5a3742467e845de368df8)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: add cache of descriptors to resv_map for region_add

Orabug: 21652814

hugetlbfs is used today by applications that want a high degree of control
over huge page usage.  Often, large hugetlbfs files are used to map a
large number of huge pages into the application processes.  The applications
know when page ranges within these large files will no longer be used, and
ideally would like to release them back to the subpool or global pools for
other uses.  The fallocate() system call provides an interface for
preallocation and hole punching within files.  This patch set adds
fallocate functionality to hugetlbfs.

fallocate hole punch will want to remove a specific range of pages.  When
pages are removed, their associated entries in the region/reserve map will
also be removed.  This will break an assumption in the
region_chg/region_add calling sequence.  If a new region descriptor must
be allocated, it is done as part of the region_chg processing.  In this
way, region_add can not fail because it does not need to attempt an
allocation.

To prepare for fallocate hole punch, create a "cache" of descriptors that
can be used by region_add if necessary.  region_chg will ensure there are
sufficient entries in the cache.  It will be necessary to track the number
of in progress add operations to know a sufficient number of descriptors
reside in the cache.  A new routine region_abort is added to adjust this
in progress count when add operations are aborted.  vma_abort_reservation
is also added for callers creating reservations with
vma_needs_reservation/vma_commit_reservation.
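
The bookkeeping described above can be sketched with two counters. This is
a simplified model, not the kernel code: struct resv_cache_model and the
chg_model(), add_model() and abort_model() names are hypothetical, and the
alloc_ok flag simulates whether a descriptor allocation would succeed.

```c
#include <stdbool.h>

struct resv_cache_model {
    int cached;            /* spare descriptors held in the cache */
    int adds_in_progress;  /* chg calls not yet followed by add or abort */
};

/*
 * chg side: announce an upcoming add and top up the cache so that every
 * in-progress add could take one descriptor. Returns false (and backs out)
 * if a needed allocation fails.
 */
bool chg_model(struct resv_cache_model *m, bool alloc_ok)
{
    m->adds_in_progress++;
    while (m->cached < m->adds_in_progress) {
        if (!alloc_ok) {
            m->adds_in_progress--;
            return false;
        }
        m->cached++;
    }
    return true;
}

/* add side: never allocates; takes from the cache if a descriptor is needed. */
void add_model(struct resv_cache_model *m, bool need_descriptor)
{
    if (need_descriptor)
        m->cached--;
    m->adds_in_progress--;
}

/* abort side: the add will not happen, so only the in-progress count drops. */
void abort_model(struct resv_cache_model *m)
{
    m->adds_in_progress--;
}
```

The invariant is that after a successful chg_model() the cache holds at
least as many descriptors as there are adds in progress, so add_model()
has no failure path.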

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 27af163113310a86b6d19bb5693c1a08eb89b0f7)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

Orabug: 21652814

alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the
number of pages which will be added to the reserve map.  Subpool and
global reserve counts are adjusted based on the output of region_chg.
Before the pages are actually added to the reserve map, these routines
could race and add fewer pages than expected.  If this happens, the
subpool and global reserve counts are not correct.

Compare the number of pages actually added (region_add) to those expected
to be added (region_chg).  If fewer pages are actually added, this indicates
a race, so adjust the counters accordingly.
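
The comparison can be sketched with a bitmap standing in for the reserve
map. This is an illustrative model only: struct resv_bitmap, chg_pages(),
add_pages() and commit_and_fix() are hypothetical names, and a single
counter stands in for the subpool and global reserve counts.

```c
#define MAP_PAGES 64   /* hypothetical size of the modelled file, in pages */

struct resv_bitmap { unsigned char present[MAP_PAGES]; };

/* Count pages in [f, t) not yet in the map (what an add is expected to add). */
long chg_pages(const struct resv_bitmap *m, long f, long t)
{
    long n = 0;

    for (long i = f; i < t; i++)
        n += !m->present[i];
    return n;
}

/* Add [f, t) to the map and report how many pages were actually new. */
long add_pages(struct resv_bitmap *m, long f, long t)
{
    long n = 0;

    for (long i = f; i < t; i++) {
        n += !m->present[i];
        m->present[i] = 1;
    }
    return n;
}

/*
 * Commit step: if fewer pages were added than expected, another task got
 * there first, so give back the over-counted reservations.
 */
long commit_and_fix(struct resv_bitmap *m, long f, long t,
                    long expected, long *resv_count)
{
    long added = add_pages(m, f, t);

    if (added < expected)
        *resv_count -= expected - added;
    return added;
}
```

If a second task adds pages 2..4 between the first task's chg_pages() and
commit_and_fix() calls for pages 0..7, the first task adds only five new
pages and the counter is corrected by three.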

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33039678c8da8133e30ea3250d10ae14701dae2b)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: compute/return the number of regions added by region_add()

Orabug: 21652814

Modify region_add() to keep track of regions(pages) added to the reserve
map and return this value. The return value can be compared to the return
value of region_chg() to determine if the map was modified between calls.

Make vma_commit_reservation() also pass along the return value of
region_add(). In the normal case, we want vma_commit_reservation to
return the same value as the preceding call to vma_needs_reservation.
Create a common __vma_reservation_common routine to help keep the special
case return values in sync.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cf3ad20bfeadda693e408d85684790714fc29b08)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mm/hugetlb: document the reserve map/region tracking routines

Orabug: 21652814

While working on hugetlbfs fallocate support, I noticed the following race
in the existing code.  It is unlikely that this race is hit very often in
the current code.  However, if more functionality to add and remove pages
to hugetlbfs mappings (such as fallocate) is added the likelihood of
hitting this race will increase.

alloc_huge_page and hugetlb_reserve_pages use information from the reserve
map to determine if there are enough available huge pages to complete the
operation, as well as adjust global reserve and subpool usage counts.  The
order of operations is as follows:

- call region_chg() to determine the expected change based on reserve map
- determine if enough resources are available for this operation
- adjust global counts based on the expected change
- call region_add() to update the reserve map

The issue is that reserve map could change between the call to region_chg
and region_add.  In this case, the counters which were adjusted based on
the output of region_chg will not be correct.

In order to hit this race today, there must be an existing shared hugetlb
mmap created with the MAP_NORESERVE flag.  A page fault to allocate a huge
page via this mapping must occur at the same time another task is mapping
the same region without the MAP_NORESERVE flag.

The patch set does not prevent the race from happening.  Rather, it adds
simple functionality to detect when the race has occurred.  If a race is
detected, then the incorrect counts are adjusted.

Review comments pointed out the need for documentation of the existing
region/reserve map routines.  This patch set also adds documentation in
this area.

This patch (of 3):

This is a documentation only patch and does not modify any code.
Descriptions of the routines used for reserve map/region tracking are
added.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1dd308a7b49d4bdbc17bfa570675ecc8cf7bedb3)
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into uek/uek-4.1

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
  rds: print vendor error on error induced disconnect/re-connect
  rds: re-entry of rds_ib_xmit/rds_iw_xmit
  net/mlx4_vnic: Initialize new fields of mlx4_ib_qp

Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
rds: print vendor error on error induced disconnect/re-connect
rds: re-entry of rds_ib_xmit/rds_iw_xmit

rds: print vendor error on error induced disconnect/re-connect

An enhancement to log vendor error when rds connections
are disconnected on error and reconnects attempted.

Orabug: 21527137

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

rds: re-entry of rds_ib_xmit/rds_iw_xmit

The BUG_ON at line 452/453 is triggered in function rds_send_xmit.

441                         while (ret) {
442                                 tmp = min_t(int, ret, sg->length -
443                                                       conn->c_xmit_data_off);
444                                 conn->c_xmit_data_off += tmp;
445                                 ret -= tmp;
446                                 if (conn->c_xmit_data_off == sg->length) {
447                                         conn->c_xmit_data_off = 0;
448                                         sg++;
449                                         conn->c_xmit_sg++;
450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
452                                         BUG_ON(ret != 0 &&
453                                                conn->c_xmit_sg == rm->data.op_nents);
454                                 }
455                         }

It is complaining that the total sent length is bigger than we want to send.

rds_ib_xmit() returns the wrong value on the second and later entries for
the same rds_message.

The sg and off passed by rds_send_xmit() to rds_ib_xmit() are based on
scatterlist.offset/length, but the rds_ib_xmit() action is based on
scatterlist.dma_address/dma_length. When dma_length is larger than length
there is a problem: for the 2nd and later entries of rds_ib_xmit() for the
same rds_message, at least one of the following two is wrong:

1) the scatterlist to start with, the chosen one can be far beyond the correct
   one.
2) the offset to start with within the scatterlist.

The fix is to add op_dmasg and op_dmaoff fields to the rm_data_op structure,
indicating the scatterlist and the offset within it to start with for
rds_ib_xmit(), respectively. The op_dmasg and op_dmaoff fields are
initialized to zero when doing dma mapping for the first time of the message
and are changed when filling send slots.
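
The idea of tracking the resume position in DMA terms can be sketched as
below. This is a userspace model, not the RDS code: struct sg_model,
struct data_op_model and dma_cursor_advance() are hypothetical, and only
the op_dmasg and op_dmaoff names come from the description above.

```c
/* Hypothetical scatterlist entry: CPU-side length and mapped DMA length. */
struct sg_model { unsigned int length; unsigned int dma_length; };

struct data_op_model {
    unsigned int op_dmasg;   /* index of the DMA entry to resume from */
    unsigned int op_dmaoff;  /* byte offset within that entry */
};

/* Advance the DMA cursor by `bytes`, walking dma_length rather than length. */
void dma_cursor_advance(struct data_op_model *op, const struct sg_model *sg,
                        unsigned int nents, unsigned int bytes)
{
    while (bytes && op->op_dmasg < nents) {
        unsigned int left = sg[op->op_dmasg].dma_length - op->op_dmaoff;
        unsigned int step = bytes < left ? bytes : left;

        op->op_dmaoff += step;
        bytes -= step;
        if (op->op_dmaoff == sg[op->op_dmasg].dma_length) {
            op->op_dmasg++;      /* entry consumed: move to the next one */
            op->op_dmaoff = 0;
        }
    }
}
```

With one mapped entry whose dma_length (8192) exceeds its length (4096),
sending 6000 bytes leaves the cursor at entry 0, offset 6000. A cursor
derived from length would instead have moved on to the next entry.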

The same applies to rds_iw_xmit() too.

upstream commit: d655a9fbc8a51ac8d92db7ff5a599aab17dce3ca

Orabug: 21324078

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  uek-rpm: configs: sync up config with v4.1.6 stable tag
  uek-rpm: build: Update the base release to 6 with stable v4.1.6
  uek-rpm: configs: enable SNIC driver in kernel configs

Merge branch 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/secureboot' of git://ca-git.us.oracle.com/linux-uek:
selinux: enable setting security context in cgroup

Merge branch 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/upstream-cherry-picks' of git://ca-git.us.oracle.com/linux-uek:
CVE-2015-666: Revert "sched/x86_64: Don't save flags on context switch"

Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek:
snic: driver for Cisco SCSI HBA

Merge branch 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/stable-cherry-picks' of git://ca-git.us.oracle.com/linux-uek: (85 commits)
  Linux 4.1.6
  nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
  nfsd: refactor nfs4_preprocess_stateid_op
  kvm: x86: fix kvm_apic_has_events to check for NULL pointer
  signal: fix information leak in copy_siginfo_from_user32
  signal: fix information leak in copy_siginfo_to_user
  signalfd: fix information leak in signalfd_copyinfo
  mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
  thermal: exynos: Disable the regulator on probe failure
  Input: alps - only Dell laptops have separate button bits for v2 dualpoint sticks
  mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict
  USB: qcserial: Add support for Dell Wireless 5809e 4G Modem
  USB: qcserial/option: make AT URCs work for Sierra Wireless MC7305/MC7355
  usb: gadget: f_uac2: fix calculation of uac2->p_interval
  staging: lustre: Include unaligned.h instead of access_ok.h
  staging: vt6655: vnt_bss_info_changed check conf->beacon_rate is not NULL
  dm: fix dm_merge_bvec regression on 32 bit systems
  md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
  PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
  nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem
  ...

selinux: enable setting security context in cgroup

Orabug: 21295765

cgroup uses kernfs, which has a 'security.*' setxattr handler. But setxattr
with the 'security.selinux' name returns EOPNOTSUPP, i.e. SBLABEL_MNT is
not set on the cgroup filesystem.

Fix it by adding 'cgroup' type to genfs special handling list.
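
A rough sketch of what such a special handling list looks like, assuming a
plain string table. The function name and the other entries are
illustrative only and may not match the real SELinux code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; the real kernel list and its entries may differ. */
static const char *const demo_genfs_special[] = {
	"sysfs", "pstore", "debugfs", "rootfs",
	"cgroup",	/* the kind of entry this fix describes adding */
};

/* Return true if the filesystem type is on the special handling list. */
static bool demo_in_genfs_special_list(const char *fstype)
{
	size_t n = sizeof(demo_genfs_special) / sizeof(demo_genfs_special[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(fstype, demo_genfs_special[i]) == 0)
			return true;
	return false;
}
```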

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
net/mlx4_vnic: Initialize new fields of mlx4_ib_qp

net/mlx4_vnic: Initialize new fields of mlx4_ib_qp

Initialize the three new mlx4_ib_qp fields: qps_list, cq_recv_list
and cq_send_list.
Without initializing these new fields, the kernel crashed in
destroy_qp_common when trying to remove them from the list.
The functions get_cqs, mlx4_ib_lock_cqs and mlx4_ib_unlock_cqs were moved
to mlx4_ib.h as inline functions so they can also be called from mlx4_vnic.
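
For illustration, a minimal user-space model of a circular doubly linked
list, assuming the usual kernel list semantics. Deleting a node
dereferences its next and prev pointers, so a node that was never
initialized cannot be removed safely. The demo_* names are hypothetical.

```c
/* Minimal model of a circular doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An initialized node points at itself, i.e. it is an empty list. */
static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n right after head. */
static void demo_list_add(struct demo_list_head *n, struct demo_list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink e; this dereferences e->next and e->prev, so both must be valid. */
static void demo_list_del(struct demo_list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

static int demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}
```

With the fields initialized up front, removing a node that was never added
to any list is harmless, which is what the destroy path relies on.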

Orabug: 21530835

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>

uek-rpm: configs: sync up config with v4.1.6 stable tag

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: build: Update the base release to 6 with stable v4.1.6

Stable v4.1.6 is available, so let's get that in. Update the
spec file accordingly.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge tag 'v4.1.6' of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into topic/uek-4.1/stable-cherry-picks

* tag 'v4.1.6' of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (85 commits)
  Linux 4.1.6
  nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
  nfsd: refactor nfs4_preprocess_stateid_op
  kvm: x86: fix kvm_apic_has_events to check for NULL pointer
  signal: fix information leak in copy_siginfo_from_user32
  signal: fix information leak in copy_siginfo_to_user
  signalfd: fix information leak in signalfd_copyinfo
  mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
  thermal: exynos: Disable the regulator on probe failure
  Input: alps - only Dell laptops have separate button bits for v2 dualpoint sticks
  mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict
  USB: qcserial: Add support for Dell Wireless 5809e 4G Modem
  USB: qcserial/option: make AT URCs work for Sierra Wireless MC7305/MC7355
  usb: gadget: f_uac2: fix calculation of uac2->p_interval
  staging: lustre: Include unaligned.h instead of access_ok.h
  staging: vt6655: vnt_bss_info_changed check conf->beacon_rate is not NULL
  dm: fix dm_merge_bvec regression on 32 bit systems
  md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
  PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
  nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem
  ...

uek-rpm: configs: enable SNIC driver in kernel configs

Enable SNIC in OL6/OL7 kernel configs.

Orabug: 21674432

Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

snic: driver for Cisco SCSI HBA

Orabug: 21674432

Cisco has developed a new PCI HBA interface called sNIC, which stands for
SCSI NIC. This is a new storage feature supported on specialized network
adapter. The new PCI function provides a uniform host interface and abstracts
backend storage.

[jejb: fix up checkpatch errors]
Signed-off-by: Narsimhulu Musini <nmusini@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
(cherry picked from commit c8806b6c9e824f47726f2a9b7fbbe7ebf19306fa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

CVE-2015-666: Revert "sched/x86_64: Don't save flags on context switch"

This reverts commit 2c7577a7583747c9b71f26dced7f696b739da745.

CVE Request: Linux x86_64 NT flag issue

When I fixed Linux's NT flag handling, I added an optimization to
Linux 3.19 and up.  A malicious 32-bit program might be able to leak
NT into an unrelated task.  On a CONFIG_PREEMPT=y kernel, this is a
straightforward DoS.  On a CONFIG_PREEMPT=n kernel, it's probably
still exploitable for DoS with some more care.

I believe that this could be used for privilege escalation, too, but
it won't be easy.

The fix is just to revert the optimization:

Orabug: 21689349
CVE:  CVE-2015-666

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  modsign: Add key for module signing
  uek-rpm: extrakeys.pub is not needed for the build
  uek-rpm: build: Fix the new-kernel-pkg path for ol7

modsign: Add key for module signing

Orabug: 21659739

Signed-off-by: Alexey Petrenko <alexey.petrenko@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: extrakeys.pub is not needed for the build

Orabug: 21249387

Signed-off-by: Alexey Petrenko <alexey.petrenko@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: build: Fix the new-kernel-pkg path for ol7

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/dtrace' of git://ca-git.us.oracle.com/linux-uek:
dtrace: only call dtrace functions when CONFIG_DTRACE is set

dtrace: only call dtrace functions when CONFIG_DTRACE is set

The call to dtrace_sdt_register_module() in complete_formation()
should only be done when CONFIG_DTRACE is set.
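
A sketch of the guard, assuming the call is simply wrapped in a
preprocessor conditional. The demo_* functions are hypothetical stand-ins
and not the kernel's declarations.

```c
/* Counts registrations, for illustration only. */
static int demo_sdt_registered;

#ifdef CONFIG_DTRACE
/* Only exists when the option is set, like the real registration hook. */
static void demo_sdt_register_module(void)
{
	demo_sdt_registered++;
}
#endif

/* Stand-in for the module formation step that makes the call. */
static int demo_complete_formation(void)
{
#ifdef CONFIG_DTRACE
	demo_sdt_register_module();	/* compiled in only with CONFIG_DTRACE */
#endif
	return 0;
}
```

Built without -DCONFIG_DTRACE, the call and the function it refers to both
disappear, so no undefined symbol is referenced.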

Orabug: 21647525

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Merge branch 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/rpm-build' of git://ca-git.us.oracle.com/linux-uek:
  uek-rpm: config: sync up the configs with 4.1.5 stable
  uek-rpm: config: Enable OVM API
  uek-rpm: config: enable some DRM options

Merge branch 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1

* 'topic/uek-4.1/uek-carry' of git://ca-git.us.oracle.com/linux-uek:
OVMAPI: port ovmapi.ko to UEK4 from UEK3

uek-rpm: config: sync up the configs with 4.1.5 stable

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: config: Enable OVM API

Orabug: 20426111

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

uek-rpm: config: enable some DRM options

Xen qemu-dm emulates cirrus VGA by default and supports other VGA types.
Compile them as modules to be consistent with ol7.

Orabug: 21615719

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

OVMAPI: port ovmapi.ko to UEK4 from UEK3

Orabug: 20426111

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Linux 4.1.6

nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

commit 8fcd461db7c09337b6d2e22d25eb411123f379e3 upstream.

Currently, preprocess_stateid_op calls nfs4_check_olstateid which
verifies that the open stateid corresponds to the current filehandle in the
call by calling nfs4_check_fh.

If the stateid is a NFS4_DELEG_STID however, then no such check is done.
This could cause incorrect enforcement of permissions, because the
nfsd_permission() call in nfs4_check_file uses the current
filehandle, but any subsequent IO operation will use the file descriptor
in the stateid.

Move the call to nfs4_check_fh into nfs4_check_file instead so that it
can be done for all stateid types.
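
A simplified model of the change, assuming hypothetical types: the
filehandle comparison sits at the top of the common check, so it runs
whatever the stateid type is. None of the demo_* names are nfsd symbols.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stateid kinds, loosely modelled on the ones named above. */
enum demo_stid_type { DEMO_OPEN_STID, DEMO_LOCK_STID, DEMO_DELEG_STID };

struct demo_fh {
	unsigned char data[8];
};

struct demo_stateid {
	enum demo_stid_type type;
	struct demo_fh fh;	/* handle the stateid was issued for */
};

static bool demo_fh_match(const struct demo_fh *a, const struct demo_fh *b)
{
	return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}

/* Returns 0 on success, -1 as a stand-in for a bad-stateid error. */
static int demo_check_file(const struct demo_stateid *s,
			   const struct demo_fh *current_fh)
{
	/* The handle check happens first and does not look at s->type. */
	if (!demo_fh_match(&s->fh, current_fh))
		return -1;

	/* ... permission checks against current_fh would follow here ... */
	return 0;
}
```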

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
[bfields: moved fh check to avoid NULL deref in special stateid case]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: refactor nfs4_preprocess_stateid_op

commit a0649b2d3fffb1cde8745568c767f3a55a3462bc upstream.

Split out two self contained helpers to make the function more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kvm: x86: fix kvm_apic_has_events to check for NULL pointer

commit ce40cd3fc7fa40a6119e5fe6c0f2bc0eb4541009 upstream.

Malicious (or egregiously buggy) userspace can trigger it, but it
should never happen in normal operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wang Kai <morgan.wang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

signal: fix information leak in copy_siginfo_from_user32

commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.

This function can leak kernel stack data when the user siginfo_t has a
positive si_code value.  The top 16 bits of si_code describe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.

copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control over the top 16 bits
of si_code.

This fixes the following information leaks:
x86:   8 bytes leaked when sending a signal from a 32-bit process to
       itself. This leak grows to 16 bytes if the process uses x32.
       (si_code = __SI_CHLD)
x86:   100 bytes leaked when sending a signal from a 32-bit process to
       a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
       64-bit process. (si_code = any)

parisc and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process.  These bugs are also fixed for consistency.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

signal: fix information leak in copy_siginfo_to_user

commit 26135022f85105ad725cda103fa069e29e83bd16 upstream.

This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.
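
To illustrate why si_code alone is ambiguous, here is a small sketch. The
numeric codes are the Linux values as far as I know (ILL_ILLTRP and
BUS_MCEERR_AR are both 4); they are renamed DEMO_* to avoid clashing with
libc macros, and the helper names are hypothetical.

```c
#include <signal.h>
#include <stdbool.h>

/* si_code values, assumed to match Linux; note the shared value 4. */
enum {
	DEMO_ILL_ILLTRP    = 4,	/* used with SIGILL */
	DEMO_BUS_MCEERR_AR = 4,	/* used with SIGBUS */
	DEMO_BUS_MCEERR_AO = 5,	/* used with SIGBUS */
};

/* Insufficient check: looks at si_code only. */
static bool demo_has_addr_lsb_code_only(int si_code)
{
	return si_code == DEMO_BUS_MCEERR_AR || si_code == DEMO_BUS_MCEERR_AO;
}

/* Check in the spirit of the fix: si_signo must match as well. */
static bool demo_has_addr_lsb(int si_signo, int si_code)
{
	return si_signo == SIGBUS && demo_has_addr_lsb_code_only(si_code);
}
```

The code-only check would treat a SIGILL with code 4 as carrying
si_addr_lsb, a field that was never filled in for that signal.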

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

signalfd: fix information leak in signalfd_copyinfo

commit 3ead7c52bdb0ab44f4bb1feed505a8323cc12ba7 upstream.

This function may copy the si_addr_lsb field to user mode when it hasn't
been initialized, which can leak kernel stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations

commit ecf5fc6e9654cd7a268c782a523f072b2f1959f9 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
#10 shrink_zone at ffffffff811360c3
#11 shrink_zones at ffffffff81136eff
#12 do_try_to_free_pages at ffffffff8113712f
#13 try_to_free_mem_cgroup_pages at ffffffff811372be
#14 try_charge at ffffffff81189423
#15 mem_cgroup_try_charge at ffffffff8118c6f5
#16 __add_to_page_cache_locked at ffffffff8112137d
#17 add_to_page_cache_lru at ffffffff81121618
#18 pagecache_get_page at ffffffff8112170b
#19 grow_dev_page at ffffffff811c8297
#20 __getblk_slow at ffffffff811c91d6
#21 __getblk_gfp at ffffffff811c92c1
#22 ext4_ext_grow_indepth at ffffffff8124565c
#23 ext4_ext_create_new_leaf at ffffffff81246ca8
#24 ext4_ext_insert_extent at ffffffff81246f09
#25 ext4_ext_map_blocks at ffffffff8124a848
#26 ext4_map_blocks at ffffffff8121a5b7
#27 mpage_map_one_extent at ffffffff8121b1fa
#28 mpage_map_and_submit_extent at ffffffff8121f07b
#29 ext4_writepages at ffffffff8121f6d5
#30 do_writepages at ffffffff8112c490
#31 __filemap_fdatawrite_range at ffffffff81120199
#32 filemap_flush at ffffffff8112041c
#33 ext4_alloc_da_blocks at ffffffff81219da1
#34 ext4_rename at ffffffff81229b91
#35 ext4_rename2 at ffffffff81229e32
#36 vfs_rename at ffffffff811a08a5
#37 SYSC_renameat2 at ffffffff811a3ffc
#38 sys_renameat2 at ffffffff811a408e
#39 sys_rename at ffffffff8119e51e
#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384e9da8 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f44fcb0 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.
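
A simplified sketch of the decision, assuming made-up flag bits: the old
test lets any allocation that may do IO wait on writeback, the new one
requires that the allocation may enter the filesystem. The real
may_enter_fs computation has more cases than shown here.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's gfp values. */
#define DEMO_GFP_IO	0x1u
#define DEMO_GFP_FS	0x2u

/* Models a GFP_NOFS style request: IO allowed, filesystem entry not. */
#define DEMO_GFP_NOFS	(DEMO_GFP_IO)
/* Models an unrestricted request. */
#define DEMO_GFP_ANY	(DEMO_GFP_IO | DEMO_GFP_FS)

/* Old behaviour: waiting on writeback is allowed whenever IO is allowed. */
static bool demo_may_wait_old(unsigned int gfp)
{
	return (gfp & DEMO_GFP_IO) != 0;
}

/* New behaviour (simplified): wait only if the filesystem may be entered. */
static bool demo_may_wait_new(unsigned int gfp)
{
	return (gfp & DEMO_GFP_FS) != 0;
}
```

With the new test a NOFS request skips the wait, so it can no longer block
on a page whose writeback depends on the caller itself making progress.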

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal: exynos: Disable the regulator on probe failure

commit 5f09a5cbd14ae16e93866040fa44d930ff885650 upstream.

During probe the regulator (if present) was enabled but not disabled in
case of failure. So an unsuccessful probe led to the regulator being
enabled even though it was not needed, because the device was not
enabled.

Additionally, each deferred probe led to an increase of the regulator
enable count, so it would not be effectively disabled during removal of
the device.
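
The pattern can be sketched as follows, assuming a toy regulator with a
plain enable counter. The demo_* names and the deferral value are
hypothetical stand-ins.

```c
/* Toy regulator: only tracks how many times it has been enabled. */
struct demo_regulator {
	int enable_count;
};

#define DEMO_EPROBE_DEFER 517	/* stand-in for a "try again later" error */

static int demo_regulator_enable(struct demo_regulator *r)
{
	r->enable_count++;
	return 0;
}

static void demo_regulator_disable(struct demo_regulator *r)
{
	r->enable_count--;
}

/* later_step_result stands in for whatever the rest of probe returns. */
static int demo_probe(struct demo_regulator *r, int later_step_result)
{
	int ret = demo_regulator_enable(r);

	if (ret)
		return ret;

	ret = later_step_result;
	if (ret)
		goto err_disable;

	return 0;

err_disable:
	/* Undo the enable so failed or deferred probes leave no reference. */
	demo_regulator_disable(r);
	return ret;
}
```

However many times probe is deferred, the count returns to zero each time,
and one successful probe leaves exactly one enable to undo on removal.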

Test HW: Exynos4412 - Trats2 board

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 498d22f616f6 ("thermal: exynos: Support for TMU regulator defined at device tree")
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: alps - only Dell laptops have separate button bits for v2 dualpoint sticks

commit 073e570d7c2caae9910a993d56f340be4548a4a8 upstream.

It turns out that only Dell laptops have the separate button bits for
v2 dualpoint sticks and that commit 92bac83dd79e ("Input: alps - non
interleaved V2 dualpoint has separate stick button bits") causes
regressions on Toshiba laptops.

This commit adds a check for Dell laptops to the code for handling these
extra button bits, fixing this regression.

This patch has been tested on a Dell Latitude D620 to make sure that it
does not reintroduce the original problem.

Reported-and-tested-by: Douglas Christman <douglaschristman@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict

commit 5f867db63473f32cce1b868e281ebd42a41f8fad upstream.

Commit 66507c7bc8895f0da6b ("mtd: nand: Add support to use nand_base
poi databuf as bounce buffer") added a flag NAND_USE_BOUNCE_BUFFER
using the same bit value as the existing NAND_BUSWIDTH_AUTO.
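
As a sketch of how such a clash can be caught at build time, assuming
illustrative bit values (the real definitions may differ): every option
flag must own a distinct bit, otherwise setting one silently sets the
other.

```c
/* Illustrative values; the point is only that the bits are distinct. */
#define DEMO_NAND_BUSWIDTH_AUTO		0x00080000u
#define DEMO_NAND_USE_BOUNCE_BUFFER	0x00100000u

/* Fails the build if the two options ever share a bit again. */
_Static_assert((DEMO_NAND_BUSWIDTH_AUTO & DEMO_NAND_USE_BOUNCE_BUFFER) == 0,
	       "option flags must not share a bit");

/* Return nonzero if `flag` is set in `options`. */
static int demo_has_flag(unsigned int options, unsigned int flag)
{
	return (options & flag) != 0;
}
```

Had both macros been 0x00080000, a chip asking only for the bounce buffer
would also have appeared to request bus width auto-detection.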

Cc: Kamal Dasu <kdasu.kdev@gmail.com>
Fixes: 66507c7bc8895f0da6b ("mtd: nand: Add support to use nand_base
poi databuf as bounce buffer")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: qcserial: Add support for Dell Wireless 5809e 4G Modem

commit 6da3700c98cdc8360f55c5510915efae1d66deea upstream.

Added the USB IDs 0x413c:0x81b1 for the "Dell Wireless 5809e Gobi(TM) 4G
LTE Mobile Broadband Card", a Dell-branded Sierra Wireless EM7305 LTE
card in M.2 form factor, used e.g. in Dell's Latitude E7540 Notebook
series.

"lsusb -v" output for this device:

Bus 002 Device 003: ID 413c:81b1 Dell Computer Corp.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x413c Dell Computer Corp.
  idProduct          0x81b1
  bcdDevice            0.06
  iManufacturer           1 Sierra Wireless, Incorporated
  iProduct                2 Dell Wireless 5809e Gobi™ 4G LTE Mobile Broadband Card
  iSerial                 3
  bNumConfigurations      2
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          204
    bNumInterfaces          4
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower              500mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0
      bInterfaceProtocol      0
      iInterface              0
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000c  1x 12 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        3
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0
      bInterfaceProtocol      0
      iInterface              0
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000c  1x 12 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        8
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x87  EP 7 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x86  EP 6 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x04  EP 4 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
        ** UNRECOGNIZED:  2c ff 42 49 53 54 00 01 07 f5 40 f6 00 00 00 00 01 f7 c4 09 02 f8 c4 09 03 f9 88 13 04 fa 10 27 05 fb 10 27 06 fc c4 09 07 fd c4 09
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           95
    bNumInterfaces          2
    bConfigurationValue     2
    iConfiguration          0
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower              500mA
    Interface Association:
      bLength                 8
      bDescriptorType        11
      bFirstInterface        12
      bInterfaceCount         2
      bFunctionClass          2 Communications
      bFunctionSubClass      14
      bFunctionProtocol       0
      iFunction               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber       12
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         2 Communications
      bInterfaceSubClass     14
      bInterfaceProtocol      0
      iInterface              0
      CDC Header:
        bcdCDC               1.10
      CDC Union:
        bMasterInterface        12
        bSlaveInterface         13
      CDC MBIM:
        bcdMBIMVersion       1.00
        wMaxControlMessage   4096
        bNumberFilters       32
        bMaxFilterSize       128
        wMaxSegmentSize      1500
        bmNetworkCapabilities 0x20
          8-byte ntb input size
      CDC MBIM Extended:
        bcdMBIMExtendedVersion           1.00
        bMaxOutstandingCommandMessages     64
        wMTU                             1500
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               9
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber       13
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0
      bInterfaceProtocol      2
      iInterface              0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber       13
      bAlternateSetting       1
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0
      bInterfaceProtocol      2
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      2
Device Status:     0x0000
  (Bus Powered)

Signed-off-by: Pieter Hollants <pieter@hollants.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: qcserial/option: make AT URCs work for Sierra Wireless MC7305/MC7355

commit 653cdc13a340ad1cef29f1bab0d05d0771fa1d57 upstream.

Tests with a Sierra Wireless MC7355 have shown that 1199:9041 devices
also require the option_send_setup() code to be used on the USB
interface for the AT port to make unsolicited response codes work
correctly. Move these devices from the qcserial driver to the option
driver, as was done for the 1199:68c0 devices in commit
d80c0d14183516f184a5ac88e11008ee4c7d2a2e ("USB: qcserial/option: make
AT URCs work for Sierra Wireless MC73xx").

Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: f_uac2: fix calculation of uac2->p_interval

commit c41b7767673cb76adeb2b5fde220209f717ea13c upstream.

The p_interval should be smaller if the 'bInterval' in the descriptor
is larger; e.g., if 'bInterval' is 5 for HS, the p_interval should be
8000 / 16 = 500.
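
The arithmetic above can be sketched in plain C. This is an illustration
only, not the driver's code: the helper name and the constants are
hypothetical, and it assumes that the endpoint period is 2^(bInterval - 1)
service intervals, with 8000 microframes per second at high speed and 1000
frames per second at full speed.

```c
#include <assert.h>

/* Assumed service intervals per second: microframes (HS) and frames (FS). */
enum { HS_FACTOR = 8000, FS_FACTOR = 1000 };

/*
 * Hypothetical helper: the endpoint is serviced once every
 * 2^(bInterval - 1) intervals, so a larger bInterval must give a
 * smaller result. The division is the point of the fix described above.
 */
static unsigned int p_interval(unsigned int factor, unsigned int bInterval)
{
	assert(bInterval >= 1 && bInterval <= 16); /* keep the shift in range */
	return factor / (1u << (bInterval - 1));
}
```

With these assumptions, p_interval(HS_FACTOR, 5) evaluates to 8000 / 16 = 500,
matching the example in the text.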

This fixes commit 9bb87f168931 ("usb: gadget: f_uac2: send
reasonably sized packets").

Fixes: 9bb87f168931 ("usb: gadget: f_uac2: send reasonably sized packets")
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: Include unaligned.h instead of access_ok.h

commit fb1de5a4c825a389f054cc3803e06116d2fbdc7e upstream.

Including access_ok.h causes the ia64:allmodconfig build (and maybe others)
to fail with

include/linux/unaligned/le_struct.h:6:19: error:
redefinition of 'get_unaligned_le16'
include/linux/unaligned/access_ok.h:7:19: note:
previous definition of 'get_unaligned_le16' was here
include/linux/unaligned/le_struct.h:26:20: error:
redefinition of 'put_unaligned_le32'
include/linux/unaligned/access_ok.h:42:20: note:
previous definition of 'put_unaligned_le32' was here
include/linux/unaligned/le_struct.h:31:20: error:
redefinition of 'put_unaligned_le64'
include/linux/unaligned/access_ok.h:47:20: note:
previous definition of 'put_unaligned_le64' was here

Include unaligned.h instead and leave it up to the architecture to decide
how to implement unaligned accesses.

Fixes: 8c4f136497315 ("Staging: lustre: Use put_unaligned_le64")
Cc: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: vt6655: vnt_bss_info_changed check conf->beacon_rate is not NULL

commit 1f17124006b65482d9084c01e252b59dbca8db8f upstream.

conf->beacon_rate can be NULL on association, so check that
conf->beacon_rate is not NULL before using it.

BSS_CHANGED_BEACON_INFO needs to be flagged in changed as the beacon_rate
will appear later.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm: fix dm_merge_bvec regression on 32 bit systems

commit bd4aaf8f9b85d6b2df3231fd62b219ebb75d3568 upstream.

A DM regression on 32 bit systems was reported against v4.2-rc3 here:
https://lkml.org/lkml/2015/7/29/401

Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in
dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code
clarity in dm_merge_bvec"). This combined revert is done to eliminate
the possibility of a partial revert in stable@ kernels.

In hindsight the correct fix, at the time 1c220c69 was applied to fix
the regression that 148e51ba introduced, should've been to simply revert
148e51ba.

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Adam Williamson <awilliam@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

commit 423f04d63cf421ea436bcc5be02543d549ce4b28 upstream.

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degraded count.
raid1_spare_active() updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error().
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
  Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
  last working device'.")
as it addresses the same issue.  It fixes the same bug and should go
to -stable for the same reasons.

Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition

commit c9ddbac9c89110f77cb0fa07e634aaf1194899aa upstream.

09a2c73ddfc7 ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition")
removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was
unused in the kernel. But that breaks user programs that were using it
(QEMU in particular).

Restore the PCI_MSIX_FLAGS_BIRMASK definition.

[bhelgaas: changelog]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem

commit c2227a39a078473115910512aa0f8d53bd915e60 upstream.

On an absent filesystem (one served by another server), we need to be
able to handle requests for certain attributes (like fs_locations, so
the client can find out which server does have the filesystem), but
others we can't.

We forgot to take that into account when adding another attribute
bitmask word for the SECURITY_LABEL attribute.

As a result, an export entry with the "refer" option can result in:

[   88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249!
[   88.414828] invalid opcode: 0000 [#1] SMP
[   88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd]
[   88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 #1
[   88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[   88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000
[   88.419729] RIP: 0010:[<ffffffffa04b3c10>]  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[   88.420376] RSP: 0000:ffff8800785db998  EFLAGS: 00010206
[   88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980
[   88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000
[   88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000
[   88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a
[   88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980
[   88.424295] FS:  0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
[   88.424944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0
[   88.426285] Stack:
[   88.426921]  ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0
[   88.427585]  ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0
[   88.428228]  ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980
[   88.428877] Call Trace:
[   88.429527]  [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd]
[   88.430168]  [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0
[   88.430807]  [<ffffffff8123833e>] ? d_lookup+0x2e/0x60
[   88.431449]  [<ffffffff81236133>] ? dput+0x33/0x230
[   88.432097]  [<ffffffff8123f214>] ? mntput+0x24/0x40
[   88.432719]  [<ffffffff812272b2>] ? path_put+0x22/0x30
[   88.433340]  [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd]
[   88.433954]  [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd]
[   88.434601]  [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd]
[   88.435172]  [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd]
[   88.435710]  [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[   88.436447]  [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd]
[   88.437011]  [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd]
[   88.437566]  [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd]
[   88.438157]  [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd]
[   88.438680]  [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc]
[   88.439192]  [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc]
[   88.439694]  [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd]
[   88.440194]  [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd]
[   88.440697]  [<ffffffff810bb728>] kthread+0xd8/0xf0
[   88.441260]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[   88.441762]  [<ffffffff81789e58>] ret_from_fork+0x58/0x90
[   88.442322]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[   88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe
[   88.444052] RIP  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[   88.444658]  RSP <ffff8800785db998>
[   88.445232] ---[ end trace 6cb9d0487d94a29f ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: fix shift left overflow

commit 32e5a2a2be6b085febaac36efff495ad65a55e6c upstream.

When using a large volume, for example a 9T volume with 2T already used,
frequent creation of small files with O_DIRECT when the IO is not
cluster aligned may clear sectors in the wrong place. This will cause
filesystem corruption.

This is because p_cpos is a u32. When calculating the corresponding
sector it should be converted to u64 first, otherwise it may overflow.
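
The overflow can be reproduced outside the kernel with a minimal sketch.
The function names below are hypothetical and the cluster geometry is an
assumption chosen for illustration (4 KiB clusters and 512-byte sectors,
i.e. a shift of 3); the actual ocfs2 code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy pattern: the shift is evaluated in 32 bits, so the high bits
 * are lost before the result is widened to 64 bits.
 */
static uint64_t clusters_to_sectors_narrow(uint32_t cpos, unsigned int shift)
{
	assert(shift < 32);
	return cpos << shift;
}

/* Fixed pattern: widen to 64 bits first, then shift. */
static uint64_t clusters_to_sectors_wide(uint32_t cpos, unsigned int shift)
{
	assert(shift < 32);
	return (uint64_t)cpos << shift;
}
```

Under the assumed geometry, cluster 0x20000000 lies 2 TiB into the volume.
The narrow variant maps it to sector 0, while the wide variant yields
sector 0x100000000, which is why data could be cleared in the wrong place.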

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()

commit 209f7512d007980fd111a74a064d70a3656079cf upstream.

The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:

ocfs2dc first saves osb->blocked_lock_count to the local variable
`processed', and then processes the dentry lockres. During the dentry
put, it calls iput and then deletes the rw, inode and open lockres from
the blocked list in ocfs2_mark_lockres_freeing. As a result, the
variable `processed' no longer reflects the number of blocked lockres to
be processed, which triggers the BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipc: modify message queue accounting to not take kernel data structures into account

commit de54b9ac253787c366bbfb28d901a31954eb3511 upstream.

A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages, in commit
d6629859b36d ("ipc/mqueue: improve performance of send/recv").

That patch introducing the improved kernel handling of message queues
(using btrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue.  Before, this field
reflected the size of the user-data in the queue.  Since then, it also takes
kernel data structures into account.  For example, if 13 bytes of user
data are in the queue, on my machine the file reports a size of 61
bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115).  Commenting on lkml, Michael
Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

This patch removes the accounting of kernel data structures in the
queue.  Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above).  Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation).  Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.

Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: John Duffy <jb_duffy@btinternet.com>
Cc: Arto Bendiken <arto@bendiken.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (dell-smm) Blacklist Dell Studio XPS 8100

commit a4b45b25f18d1e798965efec429ba5fc01b9f0b6 upstream.

The CPU fan speed goes up and down on the Dell Studio XPS 8100 for
unknown reasons. Without further debugging on the affected
machine, it is not possible to find the problem.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Jan C Peters <jcpeters89@gmail.com>
[groeck: cleaned up description, comments]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (nct7904) Export I2C module alias information

commit 1252be9ce0ab4f622b8692b648894d09c0df71ce upstream.

The I2C core always reports the MODALIAS uevent as "i2c:<client name>"
regardless of whether the driver was matched using the I2C id_table or the
of_match_table. So the driver needs to export the I2C table, and this
must be built into the module, or udev won't have the necessary information
to autoload the correct module when the device is added.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: fireworks/firewire-lib: add support for recent firmware quirk

commit 18f5ed365d3f188a91149d528c853000330a4a58 upstream.

Fireworks uses TSB43CB43 (IceLynx-Micro) as its IEC 61883-1/6 interface.
This chip includes an ARM7 core, and loads and runs a program. The firmware
is stored in on-board memory and loaded from it at every power-on.

Echo Audio ships several firmware versions for each model. Each firmware
has its own quirks, and a quirk changes the sequence of packets.

As far as I have investigated, AudioFire2/AudioFire4/AudioFirePre8 have a
quirk of transferring a first packet with 0x02 in its dbc field. This causes
the ALSA Fireworks driver to detect a discontinuity. In this case, firmware
versions 5.7.0, 5.7.3 and 5.8.0 are used.

Payload  CIP      CIP
quadlets header1  header2
02       00050002 90ffffff <-
42       0005000a 90013000
42       00050012 90014400
42       0005001a 90015800
02       0005001a 90ffffff
42       00050022 90019000
42       0005002a 9001a400
42       00050032 9001b800
02       00050032 90ffffff
42       0005003a 9001d000
42       00050042 9001e400
42       0005004a 9001f800
02       0005004a 90ffffff
(AudioFire2 with firmware version 5.7.)

$ dmesg
snd-fireworks fw1.0: Detect discontinuity of CIP: 00 02

These models, AudioFire8 (since Jul 2009) and the Gibson Robot Interface
Pack series use the same ARM binary as their firmware. Thus, this
quirk may be observed among them.

This commit adds a new member to the AMDTP structure. This member represents
the value of the dbc field in a first AMDTP packet. Drivers can set it to
a preferred value according to the model's quirk.

Tested-by: Johannes Oertei <johannes.oertel@uni-due.de>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - one Dell machine needs the headphone white noise fixup

commit 73851b36fe73819f8c201971e913324d4846a7ea upstream.

The fixup ALC292_FIXUP_DISABLE_AAMIX can fix the white noise of
the headphone on this Dell machine.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - fix cs4210_spdif_automute()

commit 44008f0896ae205b02b0882dbf807f0de149efc4 upstream.

Smatch complains that we have nested checks for "spdif_present". It
turns out the current behavior isn't correct; we should remove the first
check and keep the second.

Fixes: 1077a024812d ('ALSA: hda - Use generic parser for Cirrus codec driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc

commit 9a258afa928b45e6dd2efcac46ccf7eea705d35a upstream.

For hwmods without sysc, _init_mpu_rt_base(oh) won't be called and so
_find_mpu_rt_port(oh) will return NULL thus preventing ready state check
on those modules after the module is enabled.

This can potentially cause a bus access error if the module is accessed
before the module is ready.

Fix this by unconditionally calling _init_mpu_rt_base() during hwmod
_init(). Do ioremap only if we need SYSC access.

Even though the _wait_target_ready() check doesn't really need the MPU RT
port but just the PRCM registers, we still mandate that the hwmod must have
an MPU RT port if the ready state check needs to be done. Otherwise it would
mean that the module is not accessible by the MPU, so there is no point in
waiting for the target to be ready.

e.g. this fixes the below DCAN bus access error on AM437x-gp-evm.

[   16.672978] ------------[ cut here ]------------
[   16.677885] WARNING: CPU: 0 PID: 1580 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x234/0x35c()
[   16.687946] 44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_PER_0 (Read): Data Access in User mode during Functional access
[   16.700654] Modules linked in: xhci_hcd btwilink ti_vpfe dwc3 videobuf2_core ov2659 bluetooth v4l2_common videodev ti_am335x_adc kfifo_buf industrialio c_can_platform videobuf2_dma_contig media snd_soc_tlv320aic3x pixcir_i2c_ts c_can dc
[   16.731144] CPU: 0 PID: 1580 Comm: rpc.statd Not tainted 3.14.26-02561-gf733aa036398 #180
[   16.739747] Backtrace:
[   16.742336] [<c0011108>] (dump_backtrace) from [<c00112a4>] (show_stack+0x18/0x1c)
[   16.750285]  r6:00000093 r5:00000009 r4:eab5b8a8 r3:00000000
[   16.756252] [<c001128c>] (show_stack) from [<c05a4418>] (dump_stack+0x20/0x28)
[   16.763870] [<c05a43f8>] (dump_stack) from [<c0037120>] (warn_slowpath_common+0x6c/0x8c)
[   16.772408] [<c00370b4>] (warn_slowpath_common) from [<c00371e4>] (warn_slowpath_fmt+0x38/0x40)
[   16.781550]  r8:c05d1f90 r7:c0730844 r6:c0730448 r5:80080003 r4:ed0cd210
[   16.788626] [<c00371b0>] (warn_slowpath_fmt) from [<c027fa94>] (l3_interrupt_handler+0x234/0x35c)
[   16.797968]  r3:ed0cd480 r2:c0730508
[   16.801747] [<c027f860>] (l3_interrupt_handler) from [<c0063758>] (handle_irq_event_percpu+0x54/0x1bc)
[   16.811533]  r10:ed005600 r9:c084855b r8:0000002a r7:00000000 r6:00000000 r5:0000002a
[   16.819780]  r4:ed0e6d80
[   16.822453] [<c0063704>] (handle_irq_event_percpu) from [<c00638f0>] (handle_irq_event+0x30/0x40)
[   16.831789]  r10:eb2b6938 r9:eb2b6960 r8:bf011420 r7:fa240100 r6:00000000 r5:0000002a
[   16.840052]  r4:ed005600
[   16.842744] [<c00638c0>] (handle_irq_event) from [<c00661d8>] (handle_fasteoi_irq+0x74/0x128)
[   16.851702]  r4:ed005600 r3:00000000
[   16.855479] [<c0066164>] (handle_fasteoi_irq) from [<c0063068>] (generic_handle_irq+0x28/0x38)
[   16.864523]  r4:0000002a r3:c0066164
[   16.868294] [<c0063040>] (generic_handle_irq) from [<c000ef60>] (handle_IRQ+0x38/0x8c)
[   16.876612]  r4:c081c640 r3:00000202
[   16.880380] [<c000ef28>] (handle_IRQ) from [<c00084f0>] (gic_handle_irq+0x30/0x5c)
[   16.888328]  r6:eab5ba38 r5:c0804460 r4:fa24010c r3:00000100
[   16.894303] [<c00084c0>] (gic_handle_irq) from [<c05a8d80>] (__irq_svc+0x40/0x50)
[   16.902193] Exception stack(0xeab5ba38 to 0xeab5ba80)
[   16.907499] ba20:                                                       00000000 00000006
[   16.916108] ba40: fa1d0000 fa1d0008 ed3d3000 eab5bab4 ed3d3460 c0842af4 bf011420 eb2b6960
[   16.924716] ba60: eb2b6938 eab5ba8c eab5ba90 eab5ba80 bf035220 bf07702c 600f0013 ffffffff
[   16.933317]  r7:eab5ba6c r6:ffffffff r5:600f0013 r4:bf07702c
[   16.939317] [<bf077000>] (c_can_plat_read_reg_aligned_to_16bit [c_can_platform]) from [<bf035220>] (c_can_get_berr_counter+0x38/0x64 [c_can])
[   16.952696] [<bf0351e8>] (c_can_get_berr_counter [c_can]) from [<bf010294>] (can_fill_info+0x124/0x15c [can_dev])
[   16.963480]  r5:ec8c9740 r4:ed3d3000
[   16.967253] [<bf010170>] (can_fill_info [can_dev]) from [<c0502fa8>] (rtnl_fill_ifinfo+0x58c/0x8fc)
[   16.976749]  r6:ec8c9740 r5:ed3d3000 r4:eb2b6780
[   16.981613] [<c0502a1c>] (rtnl_fill_ifinfo) from [<c0503408>] (rtnl_dump_ifinfo+0xf0/0x1dc)
[   16.990401]  r10:ec8c9740 r9:00000000 r8:00000000 r7:00000000 r6:ebd4d1b4 r5:ed3d3000
[   16.998671]  r4:00000000
[   17.001342] [<c0503318>] (rtnl_dump_ifinfo) from [<c050e6e4>] (netlink_dump+0xa8/0x1e0)
[   17.009772]  r10:00000000 r9:00000000 r8:c0503318 r7:ebf3e6c0 r6:ebd4d1b4 r5:ec8c9740
[   17.018050]  r4:ebd4d000
[   17.020714] [<c050e63c>] (netlink_dump) from [<c050ec10>] (__netlink_dump_start+0x104/0x154)
[   17.029591]  r6:eab5bd34 r5:ec8c9980 r4:ebd4d000
[   17.034454] [<c050eb0c>] (__netlink_dump_start) from [<c0505604>] (rtnetlink_rcv_msg+0x110/0x1f4)
[   17.043778]  r7:00000000 r6:ec8c9980 r5:00000f40 r4:ebf3e6c0
[   17.049743] [<c05054f4>] (rtnetlink_rcv_msg) from [<c05108e8>] (netlink_rcv_skb+0xb4/0xc8)
[   17.058449]  r8:eab5bdac r7:ec8c9980 r6:c05054f4 r5:ec8c9980 r4:ebf3e6c0
[   17.065534] [<c0510834>] (netlink_rcv_skb) from [<c0504134>] (rtnetlink_rcv+0x24/0x2c)
[   17.073854]  r6:ebd4d000 r5:00000014 r4:ec8c9980 r3:c0504110
[   17.079846] [<c0504110>] (rtnetlink_rcv) from [<c05102ac>] (netlink_unicast+0x180/0x1ec)
[   17.088363]  r4:ed0c6800 r3:c0504110
[   17.092113] [<c051012c>] (netlink_unicast) from [<c0510670>] (netlink_sendmsg+0x2ac/0x380)
[   17.100813]  r10:00000000 r8:00000008 r7:ec8c9980 r6:ebd4d000 r5:eab5be70 r4:eab5bee4
[   17.109083] [<c05103c4>] (netlink_sendmsg) from [<c04dfdb4>] (sock_sendmsg+0x90/0xb0)
[   17.117305]  r10:00000000 r9:eab5a000 r8:becdda3c r7:0000000c r6:ea978400 r5:eab5be70
[   17.125563]  r4:c05103c4
[   17.128225] [<c04dfd24>] (sock_sendmsg) from [<c04e1c28>] (SyS_sendto+0xb8/0xdc)
[   17.136001]  r6:becdda5c r5:00000014 r4:ecd37040
[   17.140876] [<c04e1b70>] (SyS_sendto) from [<c000e680>] (ret_fast_syscall+0x0/0x30)
[   17.148923]  r10:00000000 r8:c000e804 r7:00000122 r6:becdda5c r5:0000000c r4:becdda5c
[   17.157169] ---[ end trace 2b71e15b38f58bad ]---

Fixes: 6423d6df1440 ("ARM: OMAP2+: hwmod: check for module address space during init")
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: i.MX35: Fix can support.

commit e053f96b1a00022b4e2c7ceb7ac0229646626507 upstream.

Since commit 3d42a379b6fa5b46058e3302b1802b29f64865bb
("can: flexcan: add 2nd clock to support imx53 and newer")
the can driver requires dt nodes to have a second clock.
Add them to imx35 to fix probing the flexcan driver on the
respective platforms.

Signed-off-by: Denis Carikli <denis@eukrea.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rbd: fix copyup completion race

commit 2761713d35e370fd640b5781109f753066b746c4 upstream.

For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback().  The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.

rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run.  Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such a prematurely
marked done request:

<obj_request-1/2 reply>
handle_reply()
  rbd_osd_req_callback()
    rbd_osd_trivial_callback()
    rbd_obj_request_complete()
    rbd_img_obj_copyup_callback()
    rbd_img_obj_callback()
                                    <obj_request-2/2 reply>
                                    handle_reply()
                                      rbd_osd_req_callback()
                                        rbd_osd_trivial_callback()
      for_each_obj_request(obj_request->img_request) {
        rbd_img_obj_end_request(obj_request-1/2)
        rbd_img_obj_end_request(obj_request-2/2) <--
      }

Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xferred is 0.  We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on

    rbd_assert(more ^ (which == img_request->obj_request_count));

with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
had a chance to set ->xferred in rbd_img_obj_copyup_callback() yet.
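
The failing condition can be checked in isolation. The sketch below is a
standalone illustration with a hypothetical function name; it only
evaluates the expression quoted above and does not model the rbd request
state machine.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Evaluates the expression from the rbd_assert() quoted above.
 * It is true when exactly one of the two holds: the block layer still
 * expects more data, or the last object request has been ended.
 */
static bool completion_state_consistent(bool more, unsigned int which,
					unsigned int obj_request_count)
{
	assert(obj_request_count > 0 && which <= obj_request_count);
	return more ^ (which == obj_request_count);
}
```

In the race described above both sides are 1, so the expression is 0 and
the assertion fires: completion_state_consistent(true, 2, 2) is false,
whereas the normal final completion, (false, 2, 2), is true.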

To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request).  So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().

Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer

commit f898c522f0e9ac9f3177d0762b76e2ab2d2cf9c0 upstream.

This patch removes a bogus BUG_ON in the ablkcipher path that
triggers when the destination buffer is different from the source
buffer and is scattered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: qat - Fix invalid synchronization between register/unregister sym algs

commit 6f043b50da8e03bdcc5703fd37ea45bc6892432f upstream.

The synchronization method, which used atomics, was bogus.
Use proper synchronization with a mutex.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwrng: core - correct error check of kthread_run call

commit 17fb874dee093139923af8ed36061faa92cc8e79 upstream.

The kthread_run() function can return two different error values
but the hwrng core only checks for -ENOMEM. If the other error
value -EINTR is returned it is assigned to hwrng_fill and later
used on a kthread_stop() call which naturally crashes.
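
The difference between the two checks can be shown with a userspace
imitation of the kernel's error-pointer convention. Everything below is a
sketch under assumptions: err_ptr(), is_err() and ptr_err() are local
stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(), the other names are
hypothetical, and the broader check is presumed to be the kind of test the
fix relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Local stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static long ptr_err(const void *p)
{
	assert(is_err(p)); /* only meaningful for encoded errors */
	return (long)(intptr_t)p;
}

/* Stand-in for a successfully created thread. */
static int valid_task;

/* Narrow check: recognises -ENOMEM only, so -EINTR slips through. */
static int failed_enomem_only(const void *task)
{
	return task == err_ptr(-ENOMEM);
}

/* Broad check: recognises any encoded error value. */
static int failed_any_error(const void *task)
{
	return is_err(task);
}
```

With the narrow check, err_ptr(-EINTR) is treated as a valid thread pointer
and would later be handed to the stop call; the broad check rejects it
while still accepting &valid_task.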

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen/gntdevt: Fix race condition in gntdev_release()

commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 upstream.

While gntdev_release() is running, the MMU notifier is still registered
and can traverse the priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: dapm: Don't add prefix to widget stream name

commit a798c24a69b64f09e2d323ac8155a36373e5d5fd upstream.

Commit fdb6eb0a1287 ("ASoC: dapm: Modify widget stream name according to
prefix") fixed the case where a DAPM route between a DAI widget and a
DAC/ADC/AIF widget with a matching stream name was not created when the
DAPM context was using a prefix.

Unfortunately the patch introduced a few issues of its own, such as leaking
the dynamically allocated stream name memory and not checking whether the
allocation succeeded in the first place.

It is also incomplete in that it still does not handle the case where the
stream name of the widget is a substring of the stream name of the DAI,
which is explicitly allowed and works fine if no DAPM prefix is used.

Revert the commit and take a slightly different approach to solving the
issue. Instead of comparing the widget's stream name to the name of the DAI
widget, compare it to the stream name of the DAI widget. The stream name of
the DAI widget is identical to the name of the DAI widget except that it
won't have the DAPM prefix added. So this approach behaves identically
regardless of whether the DAPM context uses a prefix or not.

We don't have to worry about potentially matching with a widget with the
same stream name, but from a different DAPM context with a different
prefix, since the code already makes sure that both the DAI widget and the
matched widget are from the same DAPM context.

Fixes: fdb6eb0a1287 ("ASoC: dapm: Modify widget stream name according to prefix")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: dapm: Lock during userspace access

commit e50b1e06b79e9d51efbff9627b4dd407184ef43f upstream.

The DAPM lock must be held when accessing the DAPM graph status through
sysfs or debugfs, otherwise concurrent changes to the graph can result in
undefined behaviour.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: pcm1681: Fix setting de-emphasis sampling rate selection

commit fa8173a3ef0570affde7da352de202190b3786c2 upstream.

The de-emphasis sampling rate selection is controlled by BIT[3:4] of the
PCM1681_DEEMPH_CONTROL register. Apply the proper left shift when setting it.
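
As a rough illustration of the shift described above, the sketch below
places a 2-bit selector into bits 3 and 4 of an 8-bit register value. The
macro and function names are hypothetical and are not taken from the
pcm1681 driver; only the bit positions come from the text.

```c
#include <stdint.h>

/* Assumed layout: a 2-bit rate selector occupying bits 3 and 4. */
#define DEEMPH_RATE_SHIFT 3
#define DEEMPH_RATE_MASK  (0x3u << DEEMPH_RATE_SHIFT)

/*
 * Return 'reg' with the rate selector replaced by 'sel'.
 * Without the left shift, 'sel' would land in bits 0 and 1 instead.
 */
static uint8_t deemph_set_rate(uint8_t reg, uint8_t sel)
{
	uint32_t field = ((uint32_t)sel << DEEMPH_RATE_SHIFT) & DEEMPH_RATE_MASK;

	reg &= (uint8_t)~DEEMPH_RATE_MASK;   /* clear bits 3 and 4 */
	reg |= (uint8_t)field;               /* insert the shifted selector */
	return reg;
}
```

With this layout, selector value 2 written into an all-zero register
yields 0x10, and all other bits of the register are left unchanged.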

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Marek Belisko <marek.belisko@streamunlimited.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt

commit a6c2a32ac83567f15e9af3dcbc73148ce68b2ced upstream.

The regmap_write in ssm4567_set_dai_fmt accidentally clears the
TDM_BCLKS field which was set earlier by ssm4567_set_tdm_slot.

This patch fixes it by using regmap_update_bits with the proper mask.
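
The difference between a plain register write and a masked update can be
sketched in plain C as follows. The field masks are hypothetical and do not
reflect the real SSM4567 register layout; update_bits() only models the
read-modify-write that a masked update performs.

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define FMT_MASK        0x0Fu   /* bits owned by the DAI format setting */
#define TDM_BCLKS_MASK  0x30u   /* bits owned by the TDM slot setting   */

/* Plain write: the whole register is replaced by 'val'. */
static uint32_t plain_write(uint32_t old, uint32_t val)
{
	(void)old;
	return val;
}

/* Masked update: only the bits selected by 'mask' are changed. */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

Starting from a register value of 0x20 (a TDM_BCLKS bit set), a plain write
of format value 0x05 loses the TDM field, while a masked update with
FMT_MASK keeps it.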

Signed-off-by: Ben Zhang <benzh@chromium.org>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: Intel: Get correct usage_count value to load firmware

commit 412efa73dcd3bd03c1838c91e094533a95529039 upstream.

The usage_count variable was read before it was set to the correct
value, which caused the firmware load to fail. As a result, IPC
messages sent to the firmware timed out, causing a delay of about
1 second while playing audio from the internal speakers.

With this patch, usage_count is read after the call to
pm_runtime_get_sync, which increments usage_count, so the firmware
load succeeds and all the IPC messages are processed correctly.

Signed-off-by: Shilpa Sreeramalu <shilpa.sreeramalu@intel.com>
Signed-off-by: Fang, Yang A <yang.a.fang@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: keystone: fix dt bindings to use post div register for mainpll

commit c1bfa985ded82cacdfc6403e78f329c44e35534a upstream.

All of the keystone devices have a separate register to hold the post
divider value for the main PLL clock. Currently the fixed-postdiv
value used for k2hk/l/e SoCs works by sheer luck, as u-boot happens to
use a value of 2 for this. Now that we have fixed this in the PLL
clock driver, change the dt bindings to match.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: keystone: add support for post divider register for main pll

commit 02fdfd708fd252a778709beb6c65d5e7360341ac upstream.

The main PLL has its post divider bits in a separate register in the
PLL controller. Use the value from this register instead of the fixed
divider when available.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sparc64: Fix userspace FPU register corruptions.

[ Upstream commit 44922150d87cef616fd183220d43d8fde4d41390 ]

If we have a series of events from userspace, with %fprs=FPRS_FEF,
as follows:

ETRAP
ETRAP
VIS_ENTRY(fprs=0x4)
VIS_EXIT
RTRAP (kernel FPU restore with fpu_saved=0x4)
RTRAP

We will not restore the user registers that were clobbered by the
FPU-using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU-using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations. Perhaps the fundamental error is having trap entry
and exit allocate FPU save slots and restore register state. Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: nx - Fix reentrancy bugs

commit 030f4e968741d65aea9cd5f7814d1164967801ef upstream.

This patch fixes a host of reentrancy bugs in the nx driver.  The
following algorithms are affected:

* CCM
* GCM
* CTR
* XCBC
* SHA256
* SHA512

The crypto API allows a single transform to be used by multiple
threads simultaneously.  For example, IPsec will use a single tfm
to process packets for a given SA.  As packets may arrive on
multiple CPUs that tfm must be reentrant.

The nx driver does try to deal with this by using a spin lock.
Unfortunately only the basic AES/CBC/ECB algorithms do this in
the correct way.

The symptom of these bugs may range from the generation of incorrect
output to memory corruption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: nx - Fixing SHA update bug

commit 10d87b730e1d9f1442cae6487bb3aef8632bed23 upstream.

The bug occurs when a data size smaller than the SHA block size is
passed. Since the first attempt is saved in the buffer, the second
attempt goes through two steps to calculate op.inlen and op.outlen.
The issue resides in this step: wrong values of op.inlen and op.outlen
were being calculated.

This patch fixes this by eliminating nx_sha_build_sg_list, which is
useless in the context of the SHA algorithms. Instead we call
nx_build_sg_list directly and pass a previously calculated max_sg_len
to it.

Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: nx - Fixing NX data alignment with nx_sg list

commit c3365ce130e50176533debe1cabebcdb8e61156c upstream.

In NX we must always pass an nx_sg_list whose size is a multiple of 16
to the coprocessor. The trim function handles this, ensuring that all
nx_sg_lists are a multiple of 16 in size, but the data was not being
considered when the crop was done. This caused a mismatch between the
size of the list and the data, corrupting csbcpb fields and returning
a -23 H_ST_PARM error or an invalid operation.

This patch fixes this by recalculating how much data should be put back
in the to_process variable, which ensures that the size of the sg_list
matches the size of the data.

Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()

commit 1c8a38b1268aebc1a903b21b11575077e02d2cf7 upstream.

This patch adds the missing update of the transfer data width in
at_xdmac_prep_slave_sg().

Indeed, for each item in the scatter-gather list, we check whether the
transfer length is aligned with the data width provided by
dmaengine_slave_config(). If so, we directly use this data width for the
current part of the transfer we are preparing. Otherwise, the data width
is reduced to 8 bits (1 byte). Of course, the actual number of register
accesses must also be updated to match the new data width.

So one chunk was missing in the original patch (see Fixes tag below): the
number of register accesses was correctly set to (len >> fixed_dwidth) in
mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg
may change for each part of the scatter-gather transfer this also explains
why the original patch used the Descriptor View 2 instead of the
Descriptor View 1.

Let's take the example of a DMA transfer to write 8bit data into an Atmel
USART with FIFOs. When FIFOs are enabled in the USART, its Transmit
Holding Register (THR) works in multidata mode, that is to say that up to
four 8bit data items can be written into the THR in a single 32bit access,
and it is still possible to write only one data item with an 8bit access.
To take advantage of this new feature, the DMA driver was modified to allow
multiple dwidths when doing slave transfers.
For instance, when the total length is 22 bytes, the USART driver splits
the transfer into 2 parts:

First part: 20 bytes transferred through 5 32bit writes into THR
Second part: 2 bytes transferred through 2 8bit writes into THR

For the second part, the data width was first set to 4_BYTES by the USART
driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg()
reduces this data width to 1_BYTE because the 2 byte length is not aligned
with the original 4_BYTES data width. Since the data width is modified,
the actual number of writes into THR must be set accordingly.
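
The width selection and access count described above can be sketched as a
small standalone function. This is a simplified model rather than the
driver code: struct chunk_plan and plan_chunk() are hypothetical names, and
widths are expressed as log2 of the byte count (0 for 1 byte, 2 for
4 bytes), following the (len >> fixed_dwidth) expression in the text.

```c
#include <stdint.h>

/* Result of planning one scatter-gather item (hypothetical type). */
struct chunk_plan {
	unsigned int dwidth_log2;   /* data width actually used, as log2(bytes) */
	uint32_t accesses;          /* number of register accesses needed */
};

/*
 * Keep the configured data width when 'len' is aligned to it,
 * otherwise fall back to 1-byte accesses. The access count is then
 * derived from the width that was actually chosen.
 */
static struct chunk_plan plan_chunk(uint32_t len, unsigned int cfg_dwidth_log2)
{
	struct chunk_plan p;
	uint32_t align_mask = (1u << cfg_dwidth_log2) - 1;

	p.dwidth_log2 = (len & align_mask) ? 0 : cfg_dwidth_log2;
	p.accesses = len >> p.dwidth_log2;
	return p;
}
```

For the 22-byte example, plan_chunk(20, 2) keeps the 4-byte width and needs
5 accesses, while plan_chunk(2, 2) falls back to 1-byte width and needs
2 accesses. Both the width and the count have to be programmed for each
part of the transfer.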

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes: 6d3a7d9e3ada ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers")
Cc: stable@vger.kernel.org #4.0 and later
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection

commit 810bc075f78ff2c221536eb3008eac6a492dba2d upstream.

We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that
the RSP check is intended to detect.  IRET will clear DF
atomically.

( Note: other than paravirt, there's little need for all this
  complexity. We could check RIP instead of RSP. )
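
The decision this change arrives at can be modelled in C roughly as below.
The real check is written in assembly in the NMI entry path; the function
name, its parameters and the half-open stack range are assumptions made
for illustration. The only fixed fact used is that DF is bit 10 of
RFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_DF (1ull << 10)   /* direction flag, bit 10 of RFLAGS */

/*
 * Treat an NMI as nested only if the interrupted RSP lies on the NMI
 * stack AND the interrupted flags have DF set. RSP alone is not
 * trusted, since userspace can choose it before issuing SYSCALL.
 * 'stack_lo' and 'stack_hi' are hypothetical bounds of the NMI stack.
 */
static bool nmi_looks_nested(uint64_t rsp, uint64_t saved_flags,
			     uint64_t stack_lo, uint64_t stack_hi)
{
	bool on_nmi_stack = rsp >= stack_lo && rsp < stack_hi;

	return on_nmi_stack && (saved_flags & X86_EFLAGS_DF) != 0;
}
```

In this model an RSP value on the NMI stack with DF clear is no longer
classified as a nested NMI, which is the case a malicious program could
previously provoke.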

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/nmi/64: Reorder nested NMI checks

commit a27507ca2d796cfa8d907de31ad730359c8a6d06 upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>