Chuck Anderson [Sun, 27 Nov 2016 22:06:13 +0000 (14:06 -0800)]
Merge branch topic/uek-4.1/sparc of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/sparc: (21 commits)
SPARC64: PORT LDMVSW DRIVER TO UEK4
SPARC64: Fix bad FP register calculation
SPARC64: Respect no-fault ASI for floating exceptions
sparc64: Fixes NUMA node cpulist sysfs file in single NUMA node case.
sparc64: Cleans up PRIQ error and debugging messages.
sparc: Remove console spam during kdump
sparc64: kdump: set crashing_cpu for panic
sparc: kexec: Don't mess with the tl register
sparc64: VDS should try indefinitely to allocate IO pages
sparc64: Use block layer BIO-based interface for VDC IO requests
sparc64: Enable virtual disk protocol out of order execution
ipmi: Fix NULL pointer access and double free panic.
ipmi: Update ipmi driver as per new vldc interface
ipmi: Fix ipmi driver for ilom reset scenario
sparc64: vcc fixes
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
sparc64: Initialize xl_hugepage_shift to 0
sparc64:mm/hugetlb: Set correct huge_pte_count index for 8M hugepages
sparc64: Fix accounting issues used to size TSBs
sparc64: Fix irq stack bootmem allocation.
...
Chuck Anderson [Sun, 27 Nov 2016 22:04:53 +0000 (14:04 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
mm/hugetlb: hugetlb_no_page: rate-limit warning message
net/vxlan: Fix kernel unaligned access in __vxlan_find_mac
kexec: align crash_notes allocation to make it be inside one physical page
iommu-common: Fix error code used in iommu_tbl_range_{alloc,free}().
The warning message "killed due to inadequate hugepage pool" simply
indicates that SIGBUS was sent, not that the process was forcibly killed.
If the process has a signal handler installed that does not fix the
problem, this message can rapidly spam the kernel log.
On my amd64 dev machine that does not have hugepages configured, I can
reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e.,
4 megabytes of huge pages) and running something that sets a signal
handler and forks, like
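For illustration, a reproducer of that shape might look like the sketch below. This is an assumption-laden example rather than the program from the upstream commit; it assumes 2MB huge pages (so vm.nr_hugepages=2 gives the 4MB pool mentioned above) and will spin generating SIGBUS, and hence the warning, until killed:

    #include <signal.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LEN (8UL << 20)                 /* more than the 4MB pool */

    static void handler(int sig) { }        /* swallow SIGBUS so the faulting access retries */

    int main(void)
    {
            char *p;

            signal(SIGBUS, handler);
            p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_NORESERVE,
                     -1, 0);
            if (p == MAP_FAILED)
                    return 1;
            if (fork() == 0) {
                    memset(p, 'c', LEN);    /* child faults past the pool: SIGBUS, warning, retry */
                    _exit(0);
            }
            memset(p, 'p', LEN);            /* parent does the same */
            return 0;
    }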
Backport of upstream commit 7177a3b037c7 ("net/vxlan: Fix kernel
unaligned access in __vxlan_find_mac")
__vxlan_find_mac invokes ether_addr_equal on the eth_addr field,
which triggers unaligned access messages, so rearrange vxlan_fdb
to avoid this in the most non-intrusive way.
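The rearrangement is essentially of the shape sketched below (field set taken from the upstream driver; the exact layout in this tree may differ): moving eth_addr ahead of the u16/u8 tail fields leaves it at an aligned offset, so the 16-bit loads done by ether_addr_equal() are aligned.

    struct vxlan_fdb {
            struct hlist_node hlist;        /* linked list of entries */
            struct rcu_head   rcu;
            unsigned long     updated;      /* jiffies */
            unsigned long     used;
            struct list_head  remotes;
            u8                eth_addr[ETH_ALEN];  /* moved up: now starts on an aligned offset */
            u16               state;        /* see ndm_state */
            u8                flags;        /* see ndm_flags */
    };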
Reviewed-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aa51c7b55e350240c2ed5a8a217688d5bfd13424) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 45cbd27626d17d485ca0718b28add12513690921) Signed-off-by: Allen Pais <allen.pais@oracle.com>
David S. Miller [Wed, 4 Nov 2015 19:30:57 +0000 (11:30 -0800)]
iommu-common: Fix error code used in iommu_tbl_range_{alloc,free}().
The value returned from iommu_tbl_range_alloc() (and the one passed
in as a fourth argument to iommu_tbl_range_free) is not a DMA address,
it is rather an index into the IOMMU page table.
Therefore using DMA_ERROR_CODE is not appropriate.
Use a more type matching error code define, IOMMU_ERROR_CODE, and
update all users of this interface.
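After the change, callers test against the table-index sentinel rather than DMA_ERROR_CODE; a rough sketch (variable names here are illustrative):

    entry = iommu_tbl_range_alloc(dev, iommu, npages, NULL, mask, order);
    if (unlikely(entry == IOMMU_ERROR_CODE))   /* a table-index error, not a DMA address */
            goto range_alloc_fail;

    /* and on the free side, the sentinel is what gets passed as the 4th argument */
    iommu_tbl_range_free(iommu, dma_addr, npages, IOMMU_ERROR_CODE);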
Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Allen Pais <allen.pais@oracle.com>
ixgbevf: Change the relaxed order settings in VF driver for sparc
We noticed performance issues with the VF interface on sparc compared
to the PF. Setting the RX control to IXGBE_DCA_RXCTRL_DATA_WRO_EN brings
it on par with the PF. This also matches the default sparc settings in
the PF driver.
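Roughly, the change amounts to a fragment like the one below in the VF RX ring setup path. The IXGBE_VFDCA_RXCTRL register macro and the hook point are assumptions here; only the IXGBE_DCA_RXCTRL_DATA_WRO_EN bit comes from the description above.

    #ifdef CONFIG_SPARC
            /* sketch: enable relaxed ordering for RX data write-back, matching the PF */
            u32 rxctrl = IXGBE_READ_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx));

            rxctrl |= IXGBE_DCA_RXCTRL_DATA_WRO_EN;
            IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx), rxctrl);
    #endif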
Chuck Anderson [Sun, 27 Nov 2016 00:51:51 +0000 (16:51 -0800)]
Merge branch topic/uek-4.1/dtrace of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/dtrace:
dtrace: eliminate need for arg counting in sdt macros
dtrace: augment SDT probes with type information
dtrace: import the sdt type information into per-sdt_probedesc state
dtrace: record SDT and perf probe types in a new ELF section
Chuck Anderson [Sun, 27 Nov 2016 00:50:36 +0000 (16:50 -0800)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/ofed:
mlx4: avoid multiple free on id_map_ent
xsigo: supported SGE's for LSO QP
xsigo: Hardening driver in handling remote QP failures
IB/cm: avoid query device in CM REQ/REP
IB/cm: return original rnr value when RNR WA for PSIF
IB/cm: MBIT needs to be used in network order
IB/core: Issue DREQ when receiving REQ/REP for stale QP
sif: cq: cleanup cqe once a kernel qp is destroyed/reset
sif: cq: sif_poll_cq might not drain cq completely
sif: rq: do not flush rq if it is an srq
sif: cq: use refcnt to disable/enable cq polling
sif: pt: Add support for single thread modified page tables
sif: pqp: Implement handling of PQPs in error.
sif: eps*: initialize each struct in array
sif: query_device: Return correct #SGEs for EoIB
sif: LSO not supported for EoIB queuepairs
sif: pqp: Make setup/teardown function ref sif_pqp_info directly
sif: Move the rest of the pqp setup and teardown to sif_pqp
sif: Move sif_dfs_register beyond base init
sif: Refactor PQP state out of sif_dev.
Chuck Anderson [Sun, 27 Nov 2016 00:49:48 +0000 (16:49 -0800)]
Merge branch 'topic/uek-4.1/drivers' of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/drivers:
NVMe: reduce queue depth as workaround for Samsung EPIC SQ errata
bonding: "primary_reselect" with "failure" is not working properly
IB/core: uverbs: Do not wait for uverbs dev closure during forced system shutdown.
Chuck Anderson [Sun, 27 Nov 2016 00:48:32 +0000 (16:48 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
Bluetooth: Fix potential NULL dereference in RFCOMM bind callback
aacraid: Check size values after double-fetch from user
mm: migrate dirty page without clear_page_dirty_for_io etc
xen-netfront: cast grant table reference first to type int
xen-netfront: do not cast grant table reference to signed short
Chuck Anderson [Sun, 27 Nov 2016 00:47:09 +0000 (16:47 -0800)]
Merge branch topic/uek-4.1/stable-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/stable-cherry-picks: (21 commits)
ocfs2: fix not enough credit panic
ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
ocfs2/dlm: fix race between convert and migration
ocfs2: solve a problem of crossing the boundary in updating backups
ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump()
ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del
ocfs2/dlm: do not insert a new mle when another process is already migrating
ocfs2: fix slot overwritten if storage link down during mount
ocfs2/dlm: return appropriate value when dlm_grab() returns NULL
ocfs2/dlm: wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker
ocfs2/dlm: fix a race between purge and migration
ocfs2/dlm: clear migration_pending when migration target goes down
ocfs2: fix BUG when calculate new backup super
ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error
ocfs2: fix race between mount and delete node/cluster
ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put
ocfs2: avoid access invalid address when read o2dlm debug messages
ocfs2: fix a tiny case that inode can not removed
ocfs2: trusted xattr missing CAP_SYS_ADMIN check
ocfs2: set filesytem read-only when ocfs2_delete_entry failed.
...
Double free is found on id_map_ent.
Existing code makes another queue_delayed_work regardless if the worker was
successfully canceled or not. In case it's not, means the worker routine already
started to run, and then the later queued work will do the same work against
the same id_map_ent stucture. The worker routine actually does cleanup/free
on the structure, so the 2nd run of it is dengerous.
Fix is that we check if we successfully canceled previously queued work, if
that's not canceled, we don't queue the work again on the same structure.
Junxiao Bi [Tue, 1 Nov 2016 06:42:20 +0000 (14:42 +0800)]
ocfs2: fix not enough credit panic
The following panic was caught when running the ocfs2 disconfig single
test (block size 512 and cluster size 8192). ocfs2_journal_dirty()
returned -ENOSPC, which means the credits were used up. The total credit
should include 3 times "num_dx_leaves" from ocfs2_dx_dir_rebalance(),
because 2 times will be consumed in ocfs2_dx_dir_transfer_leaf() and
1 time will be consumed in ocfs2_dx_dir_new_cluster()->
__ocfs2_dx_dir_new_cluster()->ocfs2_dx_dir_format_cluster(). But only
two times are included in ocfs2_dx_dir_rebalance_credits(), so fix it.
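Expressed as a fragment of the credit calculation (the surrounding function body is assumed; only the counts come from the text above):

    /* ocfs2_dx_dir_rebalance_credits(): reserve num_dx_leaves three times,
     * not two: 2 * num_dx_leaves for ocfs2_dx_dir_transfer_leaf() plus
     * 1 * num_dx_leaves for ocfs2_dx_dir_format_cluster().
     */
    credits += 3 * num_dx_leaves;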
Eric Ren [Fri, 30 Sep 2016 22:11:32 +0000 (15:11 -0700)]
ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it;
there are 2 processes repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
1), while the other is doing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.
In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
disk space for this write; ocfs2_try_to_free_truncate_log() will be
called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again.
But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
will deadlock when trying to grab the target page again.
Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
locked target page.
These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().
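A hedged fragment of what that manual unlock looks like (the write-context field name used here is an assumption):

    /* sketch: ocfs2_free_write_ctxt() does not unlock the target page, so
     * drop the page lock first, both before retrying on -ENOSPC and before
     * returning a non-VM_FAULT_LOCKED error to the page_mkwrite path.
     */
    if (wc->w_target_page)
            unlock_page(wc->w_target_page);
    ocfs2_free_write_ctxt(wc);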
Jan Kara helped me clear up the JBD2 part and suggested the hint for the
root cause.
Changes since v1:
1. Also put ENOMEM error case into consideration.
Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: He Gang <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c33f0785bf292cf1d15f4fbe42869c63e205b21c)
Joseph Qi [Mon, 19 Sep 2016 21:43:55 +0000 (14:43 -0700)]
ocfs2/dlm: fix race between convert and migration
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks whether the lockres master has changed to identify whether the new
master has finished recovery or not. This introduces a race: right after
the old master does umount (meaning the master will change), a new convert
request comes in.
In this case, it will reset the lockres state to DLM_RECOVERING and then
retry the convert, which then fails with lockres->l_action set to
OCFS2_AST_INVALID. This causes an inconsistent lock level between ocfs2
and dlm, and finally a BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e6f0c6e6170fec175fe676495f29029aecdf486c)
jiangyiwen [Fri, 25 Mar 2016 21:21:35 +0000 (14:21 -0700)]
ocfs2: solve a problem of crossing the boundary in updating backups
In update_backups() there exists a boundary-crossing problem, as
follows:
assume the lun will be resized to 1TB (cluster_size is 32kb); it will then
contain clusters 0~33554431. In update_backups(), the super block is backed
up at the 1TB location, which is the 33554432th cluster, so the boundary
gets crossed.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 584dca3440732afa84fbca07567bb66e1453936a)
jiangyiwen [Tue, 15 Mar 2016 21:53:01 +0000 (14:53 -0700)]
ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump()
Commit a75e9ccabd92 ("ocfs2: use spinlock irqsave for downconvert lock")
missed an unmodified place in ocfs2_osb_dump(), so a deadlock scenario
still exists.
Joseph Qi [Thu, 14 Jan 2016 23:17:44 +0000 (15:17 -0800)]
ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del
In ocfs2_orphan_del, currently it finds and deletes the entry first, and
then accesses the orphan dir dinode. This is a problem once
ocfs2_journal_access_di fails: the entry will have been removed from the
orphan dir, but the inode has not actually been deleted. In other words,
the file is missing but not actually deleted. So we should access the
orphan dinode first, as unlink and rename do.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 074a6c655f6da12cb1123c8a84bfd8d781138800)
xuejiufei [Thu, 14 Jan 2016 23:17:41 +0000 (15:17 -0800)]
ocfs2/dlm: do not insert a new mle when another process is already migrating
When two processes are migrating the same lockres,
dlm_add_migration_mle() returns -EEXIST but still inserts a new mle into
the hash list. dlm_migrate_lockres() will then detach the old mle and free
the new one which is already in the hash list, corrupting the list.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 32e493265b2be96404aaa478fb2913be29b06887)
jiangyiwen [Thu, 14 Jan 2016 23:17:33 +0000 (15:17 -0800)]
ocfs2: fix slot overwritten if storage link down during mount
The following case will lead to a slot being overwritten.
N1: mount ocfs2 volume, find and allocate slot 0, then set
    osb->slot_num to 0, begin to write slot info to disk
N2: mount ocfs2 volume, wait for super lock
N1: write block fails because of storage link down, unlock
    super lock
N2: got super lock and also allocate slot 0,
    then unlock super lock
N1: mount fails and then dismounts; since osb->slot_num is 0,
    it tries to put the invalid slot to disk, and this will
    succeed if the storage link restores.
N2's slot info is now overwritten.
Once another node, say N3, mounts, it will find and allocate slot 0 again,
which will lead to a mount hang because the journal has already been
locked by N2. So when writing slot info fails, invalidate the slot in
advance to avoid overwriting it.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1247017f43a93eae3d64b7c25f3637dc545f5a47)
Xue jiufei [Thu, 14 Jan 2016 23:17:29 +0000 (15:17 -0800)]
ocfs2/dlm: return appropriate value when dlm_grab() returns NULL
dlm_grab() may return NULL when the node is unmounting. During code
review, we found that some dlm handlers may return an error to the caller
when dlm_grab() returns NULL, making the caller BUG or hit other problems.
Here is an example:
Node 1: receives migration message from node 3, and sends
        a migrate request to the other nodes
Node 2: starts unmounting
Node 2: receives the migrate request from node 1 and calls
        dlm_migrate_request_handler()
Node 2: unmount thread unregisters domain handlers and removes
        dlm_context from dlm_domains
Node 2: dlm_migrate_request_handler() returns -EINVAL to node 1
Node 1: exits migration without clearing the migration state or sending
        an assert master message to node 3, which causes node 3 to hang.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c372f2193a2e73d5936bf37259ae63ca388b4cbc)
jiangyiwen [Thu, 14 Jan 2016 23:17:23 +0000 (15:17 -0800)]
ocfs2/dlm: wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker
Commit f3f854648de6 ("ocfs2_dlm: Ensure correct ordering of set/clear
refmap bit on lockres") still has a race which can't ensure the
ordering is exactly correct.
Node1 Node2 Node3
umount, migrate
lockres to Node2
migrate finished,
send migrate request
to Node3
received migrate request,
create a migration_mle,
respond to Node2.
set DLM_LOCK_RES_SETREF_INPROG
and send assert master to
Node3
delete migration_mle in
assert_master_handler,
Node3 umount without response
dlm_thread purge
this lockres, send drop
deref message to Node2
found the flag of
DLM_LOCK_RES_SETREF_INPROG
is set, dispatch
dlm_deref_lockres_worker to
clear refmap, but in function of
dlm_deref_lockres_worker,
only if node in refmap it wait
DLM_LOCK_RES_SETREF_INPROG
to be cleared. So worker is
done successfully
purge lockres, send
assert master response
to Node1, and finish umount
set Node3 in refmap, and it
won't be cleared forever, thus
lead to umount hung
so wait until DLM_LOCK_RES_SETREF_INPROG is cleared in
dlm_deref_lockres_worker.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b5560143385e18b4109ad6951c7719705e3dd995)
Xue jiufei [Thu, 14 Jan 2016 23:17:18 +0000 (15:17 -0800)]
ocfs2/dlm: fix a race between purge and migration
We found a race between purge and migration when doing code review.
Node A puts a lockres on the purge list before receiving the migrate
message from node B, which is the master. Node A then calls
dlm_mig_lockres_handler to handle this message.
dlm_mig_lockres_handler
dlm_lookup_lockres
>>>>>> race window: dlm_run_purge_list may run and send a
deref message to the master, then wait for the response
spin_lock(&res->spinlock);
res->state |= DLM_LOCK_RES_MIGRATING;
spin_unlock(&res->spinlock);
dlm_mig_lockres_handler returns
>>>>>> dlm_thread receives the response from master for the deref
message and triggers the BUG because the lockres has the state
DLM_LOCK_RES_MIGRATING with the following message:
xuejiufei [Tue, 29 Dec 2015 22:54:29 +0000 (14:54 -0800)]
ocfs2/dlm: clear migration_pending when migration target goes down
We have found a BUG on res->migration_pending when migrating lock
resources. The situation is as follows.
dlm_mark_lockres_migrating
    res->migration_pending = 1;
    __dlm_lockres_reserve_ast
    dlm_lockres_release_ast returns with res->migration_pending
    still set because other threads reserve asts
    wait until dlm_migration_can_proceed returns 1
    >>>>>>> o2hb finds that the target has gone down and removes
            the target from the domain_map
    dlm_migration_can_proceed returns 1
    dlm_mark_lockres_migrating returns -ESHUTDOWN with
    res->migration_pending still set.
When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending. So clear migration_pending when target is
down.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cc28d6d80f6ab494b10f0e2ec949eacd610f66e3)
Joseph Qi [Tue, 29 Dec 2015 22:54:06 +0000 (14:54 -0800)]
ocfs2: fix BUG when calculate new backup super
When resizing, ocfs2 first extends the last gd. Once it should back up
the super block in that gd, it calculates the new backup super and updates
the corresponding value.
But it currently doesn't consider the situation where the backup super has
already been done. In that case, it still sets the bit in the gd bitmap
and then decreases bg_free_bits_count, which leads to a corrupted gd and
triggers the BUG in ocfs2_block_group_set_bits:
alex chen [Fri, 6 Nov 2015 02:44:10 +0000 (18:44 -0800)]
ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error
In ocfs2_mknod_locked, if __ocfs2_mknod_locked returns an error, we
should reclaim the inode successfully claimed above; otherwise, the
inode will never be reused. The case is described below:
ocfs2_mknod
  ocfs2_mknod_locked
    ocfs2_claim_new_inode
      Successfully claim the inode
    __ocfs2_mknod_locked
      ocfs2_journal_access_di
        Failed because of -ENOMEM or other reasons; the inode
        lockres has not been initialized yet.

  iput(inode)
    ocfs2_evict_inode
      ocfs2_delete_inode
        ocfs2_inode_lock
          ocfs2_inode_lock_full_nested
            __ocfs2_cluster_lock
              Return -EINVAL because the inode
              lockres has not been initialized.

  So the following operations are not performed:
    ocfs2_wipe_inode
      ocfs2_remove_inode
        ocfs2_free_dinode
          ocfs2_free_suballoc_bits
Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b1529a41f777a48f95d4af29668b70ffe3360e1b)
Joseph Qi [Fri, 6 Nov 2015 02:44:07 +0000 (18:44 -0800)]
ocfs2: fix race between mount and delete node/cluster
There is a race between mount and delete node/cluster which can drive
o2hb_thread into a malfunctioning dead loop.
o2hb_thread
{
    o2nm_depend_this_node();
    <<<<<< race window: the node may have already been deleted, and then
           we enter the loop; the o2hb thread will malfunction because
           no configured nodes are found.
    while (!kthread_should_stop() &&
           !reg->hr_unclean_stop && !reg->hr_aborted_start) {
    }
}
So checking the return value of o2nm_depend_this_node() is needed. If the
node has been deleted, do not enter the loop and let the mount fail.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0986fe9b50f425ec81f25a1a85aaf3574b31d801)
Joseph Qi [Thu, 22 Oct 2015 20:32:29 +0000 (13:32 -0700)]
ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put
dlm_lockres_put will call dlm_lockres_release if it is the last
reference, and then it may call dlm_print_one_lock_resource and
take lockres spinlock.
So unlock lockres spinlock before dlm_lockres_put to avoid deadlock.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b67de018b37a97548645a879c627d4188518e907)
ocfs2: avoid access invalid address when read o2dlm debug messages
The following case will lead to a lockres is freed but is still in use.
"cat" process: cat /sys/kernel/debug/o2dlm/locking_state
               lockres_seq_start
                 -> lock dlm->track_lock
                 -> get resA
dlm_thread:    resA->refs decrease to 0,
               call dlm_lockres_release,
               and wait for "cat" unlock.
"cat" process: although resA->refs is already set to 0,
               increase resA->refs, and then unlock
dlm_thread:    lock dlm->track_lock
                 -> list_del_init()
                 -> unlock
                 -> free resA
In such a race, an invalid address access may occur. So we should delete
res->tracking from the list before resA->refs decreases to 0.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f57a22ddecd6f26040a67e2c12880f98f88b6e00)
When running dirop_fileop_racer we found a case where an inode cannot
be removed.
Two nodes, say Node A and Node B, mount the same ocfs2 volume. Create
two dirs /race/1/ and /race/2/ in the filesystem.
Node A Node B
rm -r /race/2/
mv /race/1/ /race/2/
call ocfs2_unlink(), get
the EX mode of /race/2/
wait for B unlock /race/2/
decrease i_nlink of /race/2/ to 0,
and add inode of /race/2/ into
orphan dir, unlock /race/2/
got EX mode of /race/2/. because
/race/1/ is dir, so inc i_nlink
of /race/2/ and update into disk,
unlock /race/2/
because i_nlink of /race/2/
is not zero, this inode will
always remain in orphan dir
This patch fixes this case by testing whether i_nlink of the new dir is zero.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 928dda1f9433f024ac48c3d97ae683bf83dd0e42)
The trusted extended attributes are only visible to processes which
have the CAP_SYS_ADMIN capability, but the check is missing in the ocfs2
xattr_handler trusted list. The check is important because this will be
used for implementing mechanisms in userspace to which other ordinary
processes should not have access.
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Taesoo kim <taesoo@gatech.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0f5e7b41f91814447defc34e915fc5d6e52266d9)
Xue jiufei [Wed, 24 Jun 2015 23:55:20 +0000 (16:55 -0700)]
ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()
ocfs2_abort_trigger() uses bh->b_assoc_map to get the sb. But there's no
function that sets bh->b_assoc_map in ocfs2, so calling this function will
trigger a NULL pointer dereference. We can get the sb from
bh->b_bdev->bd_super instead of b_assoc_map.
[akpm@linux-foundation.org: update comment, per Joseph] Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 74e364ad1b13fd518a0bd4e5aec56d5e8706152f)
Handle scenarios where PSIF generates batched
transmit completions.
In case of a Remote QP disconnect, follow this sequence:
-> If there are any pending transmit completions,
   explicitly transition the QP state to ERROR.
-> Wait for a maximum of 10 seconds for all pending
   completions
   (10 seconds derived from retrycount * local ack timeout)
Destroy the QP.
Wait for another 10 seconds (max) if completions are
not returned by hardware.
Synchronized calls to poll_tx.
Added more efficient handling of the TX queue full condition.
Handle the scenario where uVNIC removal can come in batches.
In aacraid's ioctl_send_fib() we do two fetches from userspace: one to
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different from the first fetch,
we may encounter an out-of-bounds access in aac_fib_send(). We also
check the sender size to ensure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)' Cc: stable@vger.kernel.org Signed-off-by: Dave Carroll <david.carroll@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fa00c437eef8dc2e7b25f8cd868cfa405fcc2bb3) Signed-off-by: Dan Duval <dan.duval@oracle.com>
Oracle discovered that the NVMe driver gets SQ completion errors,
eventually leading to the device being reset, taken out of the PCI bus
tree, or kernel panics, when using the default SQ size of 1024 entries
(64KB) with Samsung EPIC NVMe SSDs.
PCIe analyzer tracing by Oracle and Samsung revealed an errata in Samsung's
firmware for EPIC SSDs where these invalid completion entries can occur
when the queues straddle an 8MB DMA address boundary.
This patch works around the errata by detecting these specific devices and
limiting their descriptor queue depth to 64. This is only for the Samsung
NVMe controllers used in Oracle X-series servers.
There was no noticeable performance impact of reducing queue depths to 64
for these Samsung drives, Oracle X6-2 server, and Oracle VM Server 3.4.2.
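The shape of such a quirk is roughly the fragment below; the helper used to match the affected controllers is an illustrative assumption, and only the depth of 64 comes from the description above.

    /* sketch: cap the SQ depth so a queue can never straddle an 8MB
     * DMA address boundary on the affected Samsung controllers */
    if (pdev->vendor == PCI_VENDOR_ID_SAMSUNG && is_affected_epic_ssd(pdev))
            dev->q_depth = min_t(u32, dev->q_depth, 64);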
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com> Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Waiman Long [Tue, 6 Sep 2016 17:22:10 +0000 (13:22 -0400)]
x86/hpet: Reduce HPET counter read contention
On a large system with many CPUs, using HPET as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
1) There is a single HPET counter shared by all the CPUs.
2) HPET counter reading is a very slow operation.
Using HPET as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance. Sometimes
the performance slowdown can be so severe that the system may crash
because of an NMI watchdog soft lockup, for example.
During the TSC clock calibration process, the default clock source
will be set temporarily to HPET. For systems with many CPUs, it is
possible that NMI watchdog soft lockup may occur occasionally during
that short time period where HPET clocking is active as is shown in
the kernel log below:
This patch addresses the above issues by reducing HPET read contention
using the fact that if more than one CPUs are trying to access HPET at
the same time, it will be more efficient when only one CPU in the group
reads the HPET counter and shares it with the rest of the group instead
of each group member trying to read the HPET counter individually.
This is done by using a combination quadword that contains a 32-bit
stored HPET value and a 32-bit spinlock. The CPU that gets the lock
will be responsible for reading the HPET counter and storing it in
the quadword. The others will monitor the change in HPET value and
lock status and grab the latest stored HPET value accordingly. This
change is only enabled on 64-bit SMP configuration.
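The mechanism is roughly the sketch below, simplified from the upstream change (details such as the NMI fallback path are omitted):

    union hpet_lock {
            struct {
                    arch_spinlock_t lock;   /* 32-bit spinlock */
                    u32 value;              /* last HPET counter value read */
            };
            u64 lockval;                    /* both halves read/written as one quadword */
    };

    static union hpet_lock hpet __cacheline_aligned = {
            { .lock = __ARCH_SPIN_LOCK_UNLOCKED, },
    };

    static u64 read_hpet(struct clocksource *cs)
    {
            unsigned long flags;
            union hpet_lock old, new;

            old.lockval = READ_ONCE(hpet.lockval);

            if (!arch_spin_is_locked(&old.lock)) {
                    local_irq_save(flags);
                    if (arch_spin_trylock(&hpet.lock)) {
                            /* this CPU reads the counter on behalf of the group */
                            new.value = hpet_readl(HPET_COUNTER);
                            WRITE_ONCE(hpet.value, new.value);
                            arch_spin_unlock(&hpet.lock);
                            local_irq_restore(flags);
                            return (u64)new.value;
                    }
                    local_irq_restore(flags);
            }

            /* contended: wait for the lock holder to publish a newer value */
            do {
                    cpu_relax();
                    new.lockval = READ_ONCE(hpet.lockval);
            } while ((new.value == old.value) && arch_spin_is_locked(&new.lock));

            return (u64)new.value;
    }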
On a 4-socket Haswell-EX box with 144 threads (HT on), running the
AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000)
with and without the patch has the following performance numbers
(with HPET or TSC as clock source):
The perf profile showed a reduction of the %CPU time consumed by
read_hpet from 11.19% without patch to 1.24% with patch.
[ tglx: It's really sad that we need to have such hacks just to deal with
the fact that cpu vendors have not managed to fix the TSC wreckage
within 15+ years. Were They Forgetting? ]
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Randy Wright <rwright@hpe.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1473182530-29175-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit f99fd22e4d4bc84880a8a3117311bbf0e3a6a9dc)
When "primary_reselect" is set to "failure", primary interface should
not become active until current active slave is down. But if we set first
member of bond device as a "primary" interface and "primary_reselect"
is set to "failure" then whenever primary interface's link gets back(up)
it becomes active slave even if current active slave is still up.
With this patch, "bond_find_best_slave" will not traverse members if
primary interface is not candidate for failover/reselection and current
active slave is still up.
Reviewed-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.
No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.
It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).
Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same. Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).
But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 42cb14b110a5698ccf26ce59c4441722605a3743) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
mm/migrate.c
Dongli Zhang [Thu, 17 Nov 2016 05:55:27 +0000 (13:55 +0800)]
xen-netfront: cast grant table reference first to type int
IS_ERR_VALUE() in commit 87557efc27f6a50140fb20df06a917f368ce3c66
("xen-netfront: do not cast grant table reference to signed short") would
not return true for an error code unless we first cast ref to type int.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oracle-Bug: 25138361
upstream commit: 269ebce4531b8edc4224259a02143181a1c1d77c Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed by: Jack F. Vogel <jack.vogel@oracle.com> Acked-by: Joe Jin <joe.jin@oracle.com>
Dongli Zhang [Thu, 17 Nov 2016 05:54:19 +0000 (13:54 +0800)]
xen-netfront: do not cast grant table reference to signed short
While grant reference is of type uint32_t, xen-netfront erroneously casts
it to signed short in BUG_ON().
This would lead to the xen domU panic during boot-up or migration when it
is attached with lots of paravirtual devices.
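Taken together with the previous commit, the check being fixed ends up of roughly this shape (a sketch based on the two descriptions, not the literal diff):

    /* before: a 32-bit grant ref truncated to 16 bits can look negative,
     * falsely triggering the BUG_ON once many PV devices are attached */
    BUG_ON((signed short)ref < 0);

    /* after: only treat genuine error values as fatal; cast to int first
     * so IS_ERR_VALUE() sees a sign-extended error code */
    BUG_ON(IS_ERR_VALUE((unsigned long)(int)ref));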
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oracle-Bug: 25138362
upstream commit: 87557efc27f6a50140fb20df06a917f368ce3c66 Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed by: Jack F. Vogel <jack.vogel@oracle.com> Acked-by: Joe Jin <joe.jin@oracle.com>
Rama Nichanamatlu [Thu, 10 Nov 2016 08:30:07 +0000 (00:30 -0800)]
IB/core: uverbs: Do not wait for uverbs dev closure during forced system shutdown.
Waiting for uverbs dev file closure in the system forced shutdown [e.g
reboot -f] path locks up the reboot process forever preventing system
shutdown as application closure of this device file is very unlikely at
this time. This closure happens in orderly shutdown [e.g reboot] as
processes are sent SIGTERM.
Linn Crosetto [Wed, 16 Nov 2016 20:33:52 +0000 (12:33 -0800)]
acpi: Disable ACPI table override if securelevel is set
From the kernel documentation (initrd_table_override.txt):
If the ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible
to override nearly any ACPI table provided by the BIOS with an
instrumented, modified one.
When securelevel is set, the kernel should disallow any unauthenticated
changes to kernel space. ACPI tables contain code invoked by the kernel, so
do not allow ACPI tables to be overridden if securelevel is set.
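A hedged sketch of the shape of such a check in the table-override path; the get_securelevel() helper comes from the securelevel patch set and is an assumption here, not something verified against this tree:

    void __init acpi_initrd_override(void *data, size_t size)
    {
            /* refuse unauthenticated ACPI table overrides when securelevel is set */
            if (get_securelevel() > 0) {
                    pr_notice("ACPI: securelevel enabled, ignoring table override\n");
                    return;
            }

            /* ... normal initrd table override processing ... */
    }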
Signed-off-by: Linn Crosetto <linn@hpe.com>
Orabug: 25058372
CVE: CVE-2016-3699 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Reviewed-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Francisco Triviño [Mon, 14 Nov 2016 16:46:14 +0000 (08:46 -0800)]
IB/cm: avoid query device in CM REQ/REP
The query device needed in CM REQ/REP is a bit expensive since it
involves a MAD query, and the result is not saved in the cache.
When the driver that holds the local device is different from sif,
there is no need to go through the query device. If the sif driver is
identified, then we still need to go through the query device in order
to get the specific vendor id. The latter is to make sure the software
workaround is applied only to the PSIF revisions that are affected.
This patch filters those cases and avoids unnecessary MAD queries
when the driver is different from sif.
Francisco Triviño [Mon, 14 Nov 2016 16:46:14 +0000 (08:46 -0800)]
IB/cm: return original rnr value when RNR WA for PSIF
With this patch, the original min_rnr_value set by the user is saved
in case it is later queried. The ib_qp flag has been re-purposed to
store the value as well. This patch makes the RNR WA implementation
completely transparent to the user.
Hans Westgaard Ry [Tue, 4 Oct 2016 12:09:17 +0000 (14:09 +0200)]
IB/core: Issue DREQ when receiving REQ/REP for stale QP
from "InfiBand Architecture Specifications Volume 1":
A QP is said to have a stale connection when only one side has
connection information. A stale connection may result if the remote CM
had dropped the connection and sent a DREQ but the DREQ was never
received by the local CM. Alternatively the remote CM may have lost
all record of past connections because its node crashed and rebooted,
while the local CM did not become aware of the remote node's reboot
and therefore did not clean up stale connections.
and:
A local CM may receive a REQ/REP for a stale connection. It shall
abort the connection, issuing REJ to the REQ/REP. It shall then issue
DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
This patch solves a problem with reuse of QPNs. The current codebase,
that is IPoIB, relies on a reap mechanism to do cleanup of the structures
in the CM. A problem with this is the time constants governing this
mechanism; they are up to 768 seconds, and the interface may look
unresponsive in that period. Issuing a DREQ (and receiving a DREP)
does the necessary cleanup and the interface comes up.
Wei Lin Guay [Mon, 31 Oct 2016 18:06:28 +0000 (19:06 +0100)]
sif: cq: cleanup cqe once a kernel qp is destroyed/reset
To ease the sqflush/rqflush workaround, sifdrv removes
all the associated cqes once a kernel qp is destroyed/reset.
Even though IB specification 10.2.4.4 mentions that
"Destroying a QP does not guarantee that CQEs of that
QP are deallocated from the CQ upon destruction.",
it also states that "Even if the CQEs are already on
the CQ, it might not be possible to retrieve them".
Thus, the IB spec indicates that this is a vendor-specific
implementation detail and the ULP should not assume that the
cqes are in the cq once a qp is destroyed/reset.
Wei Lin Guay [Thu, 3 Nov 2016 09:46:22 +0000 (10:46 +0100)]
sif: cq: sif_poll_cq might not drain cq completely
In a scenario where a duplicate completion is detected, sif_poll_cq
skips the remaining cqes in the cq. This bug was introduced in
commit "sif: sqflush: Handle duplicate completions in poll_cq".
Wei Lin Guay [Mon, 31 Oct 2016 20:25:26 +0000 (21:25 +0100)]
sif: rq: do not flush rq if it is an srq
sifdrv needs to flush a regular rq (non-srq) once it
detects that a qp is transitioned into ERR state.
Nevertheless, if a qp is created with srq and
with no event handler, the srq might be accidentally
flushed once the qp is transitioned into ERR sate.
Wei Lin Guay [Mon, 31 Oct 2016 19:38:25 +0000 (20:38 +0100)]
sif: cq: use refcnt to disable/enable cq polling
Due to a hardware bug, sifdrv must clean up the cq
before it is being polled by the user. Thus, sifdrv
uses CQ_POLLING_NOT_ALLOWED bitmask to disable/enable
cq polling.
Nevertheless, the bit mask operation is not sufficient in
a shared cq scenario (many qps to one cq). The cq cleanup
is performed by each qp and might be performed concurrently.
As a result, cq polling might be enabled before all qps
have cleaned up the cq.
Thus, this patch uses refcnt to disable/enable cq polling.
The CQ_POLLING_NOT_ALLOWED bitmask is kept in order to have
backward compatibility in the user library.
Knut Omang [Fri, 14 Oct 2016 11:27:26 +0000 (13:27 +0200)]
sif: pt: Add support for single thread modified page tables
Modifications to the page tables in sif_pt is protected by
a lock to allow multiple threads to add and subtract regions
to/from the page table in parallel. This functionality is
currently only needed/used by the special sq_cmpl page table
handling. In the future we might however need this also for
other cases, for instance to optimize further on page table
memory usage.
The kernel documentation for InfiniBand midlayer locking
requires that map_phys_fmr be callable from any context.
This prevents us from blocking on a lock, something that happens
if there is contention for the lock (e.g. more than one thread
involved in modifying the page table).
Implement another flag, thread_safe, in a pt that determines
whether a page table is going to need to be modified from multiple
threads simultaneously. For now, keep a BUG_ON if the code
is accessed in parallel for memory types
that should never see parallel access.
Knut Omang [Wed, 26 Oct 2016 09:44:00 +0000 (11:44 +0200)]
sif: pqp: Implement handling of PQPs in error.
The assumption is that any such situation that can arise in
production is due to an application that causes its CQ to go
to error and where the PQP subsequently tries to post a CQ
operation that affects the CQ that is in error. In these cases,
the PQP itself goes to error and an event is generated.
This commit refactors the modify_qp logic slightly, as well as
implementing a modification cycle to bring a privileged QP
back up again. It also adds a new pqp debugfs file and some statistics
to help monitoring the new PQP specific state as well.
The resurrect operation is queued on the sif workqueue
by the new handle_pqp_event function, which is now properly
wired up to accept all PQP events. When a PQP is detected as
being in error, its last_set_state is updated, and in addition
the write_only flag is set, which causes new send reqs not to
touch any collect buffer as part of the operation.
This flag was introduced to allow the resurrect to set the PQP in
RTS again while still not triggering any sends.
This way the implementation allows clients to continue to
post requests to the PQP while it is in error or in transition
back to RTS again by just accepting these requests into the PQP
send queue without any writes to the collect buffer.
When in the INIT state, the resurrect worker updates
the SQ pointers to skip the request that triggered
the PQP error.
Once back in RTS, the resurrect worker can take the single SQ lock
which serializes posts, check the size of the send queue
and if >= 0, trigger the send queue scheduler to start processing these.
Once the QP is in SQS mode, or just idle if the queue was empty,
it is safe for ordering purposes to let normal posting with
collect buffer writes commence.
Hakon Bugge [Tue, 1 Nov 2016 13:16:30 +0000 (14:16 +0100)]
sif: query_device: Return correct #SGEs for EoIB
ULPs that use LSO, can only create QPs with #SGE entries being one
less than what is supported in HW. This because PSIF uses one entry
for the LSO stencil. The driver attempts to detect the ULP and from
that derive if LSO will be used.
This commit adds support for XVE (Xsigo Virtual Ethernet), and a
query_device() from the XVE driver (not the connected mode part), will
return one less #SGEs.
Further, since the XVE ULP is detected, we amend the sysfs listing of
QPs with EoIB (for XVE datagram mode QPs) and EoIB_CM (for XVE
connected mode QPs).
Knut Omang [Sun, 23 Oct 2016 16:22:35 +0000 (18:22 +0200)]
sif: pqp: Make setup/teardown function ref sif_pqp_info directly
Take advantage of the new, cleaner and more separate PQP data structures:
Simplify/abstract pqp setup/teardown by pointing into sdev
for a direct ref to the sif_pqp_info data structure.
Knut Omang [Sun, 23 Oct 2016 15:35:14 +0000 (17:35 +0200)]
sif: Move the rest of the pqp setup and teardown to sif_pqp
This finishes the restructure started in the previous commit,
to consolidate PQP handling logic to make it simpler
to extend without spreading the complexity to other parts
of the code.
Knut Omang [Fri, 28 Oct 2016 03:29:28 +0000 (05:29 +0200)]
sif: Move sif_dfs_register beyond base init
The debugfs setup must be initialized prior to PQP operation,
but must also be deinitialized before base table takedown,
otherwise we are exposed to faults due to a race condition between
a user accessing debugfs tables and driver unload.
This commit moves the dfs init/deinit from sif_probe to
sif_hw_init to achieve this order.
Knut Omang [Sun, 23 Oct 2016 07:31:26 +0000 (09:31 +0200)]
sif: Refactor PQP state out of sif_dev.
Prepare handling of PQPs for the additional complexity
of resurrect upon errors and query for undetected
error conditions by moving some of the state
now directly in sif_dev into a new sif_pqp_info
struct.
Already complex logic for PQP handling is going to
increase in complexity. We need to consolidate
to be able to keep clean and easily maintainable
interfaces.
Chuck Anderson [Thu, 10 Nov 2016 14:27:04 +0000 (06:27 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
Jeff Mahoney [Tue, 5 Jul 2016 21:32:30 +0000 (17:32 -0400)]
ecryptfs: don't allow mmap when the lower fs doesn't support it
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
Chuck Anderson [Wed, 9 Nov 2016 22:19:53 +0000 (14:19 -0800)]
Merge branch topic/uek-4.1/upstream-cherry-picks of git://ca-git.us.oracle.com/linux-uek into uek/uek-4.1
* topic/uek-4.1/upstream-cherry-picks:
percpu: fix synchronization between synchronous map extension and chunk destruction
percpu: fix synchronization between chunk->map_extend_work and chunk destruction
ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
ALSA: timer: Fix leak in events via snd_timer_user_ccallback
ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
Tejun Heo [Wed, 25 May 2016 15:48:25 +0000 (11:48 -0400)]
percpu: fix synchronization between synchronous map extension and chunk destruction
For non-atomic allocations, pcpu_alloc() can try to extend the area
map synchronously after dropping pcpu_lock; however, the extension
wasn't synchronized against chunk destruction and the chunk might get
freed while extension is in progress.
This patch fixes the bug by putting most of non-atomic allocations
under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
is responsible for async chunk management including destruction.
Tejun Heo [Wed, 25 May 2016 15:48:25 +0000 (11:48 -0400)]
percpu: fix synchronization between chunk->map_extend_work and chunk destruction
Atomic allocations can trigger async map extensions which is serviced
by chunk->map_extend_work. pcpu_balance_work which is responsible for
destroying idle chunks wasn't synchronizing properly against
chunk->map_extend_work and may end up freeing the chunk while the work
item is still in flight.
This patch fixes the bug by rolling async map extension operations
into pcpu_balance_work.
Kangjie Lu [Tue, 3 May 2016 20:44:32 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
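The usual fix for this class of infoleak is to zero the whole stack object before filling it in, roughly:

    struct snd_timer_tread r1;

    memset(&r1, 0, sizeof(r1));     /* clear the padding before it can reach userspace */
    r1.event = SNDRV_TIMER_EVENT_RESOLUTION;
    r1.tstamp = *tstamp;
    r1.val = resolution;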
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059885
CVE: CVE-2016-4578
Mainline v4.7 commit e4ec8cc8039a7063e24204299b462bd1383184a5 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Kangjie Lu [Tue, 3 May 2016 20:44:20 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in events via snd_timer_user_ccallback
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059885
CVE: CVE-2016-4578
Mainline v4.7 commit 9a47e9cff994f37f7f0dbd9ae23740d0f64f9fe6 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Kangjie Lu [Tue, 3 May 2016 20:44:07 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
The stack object “tread” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Orabug: 25059408
CVE: CVE-2016-4569
Mainline v4.7 commit cec8f96e49d9be372fdb0c3836dcf31ec71e457e Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Allen Pais [Mon, 30 May 2016 07:42:15 +0000 (13:12 +0530)]
SPARC64: PORT LDMVSW DRIVER TO UEK4
Port of the new ldmvsw (Ldoms Virtual Switch) driver to UEK4.
This code has already been submitted and accepted
into the mainline Linux kernel.
The ldmvsw is very similar in function to the existing sunvnet driver. The
sunvnet driver is therefore split to put the code common to both drivers
into the kernel for use by both drivers when loaded (see sunvnet_common.c/h).
Signed-off-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 361afffe35368dc23d2c9df6d7797ccf9af8fe57)
Rob Gardner [Sun, 1 Nov 2015 23:51:34 +0000 (16:51 -0700)]
SPARC64: Fix bad FP register calculation
An additional problem was found in handle_ldf_stq
after adding the fix for the SIGFPE on no-fault
load. The calculation for freg is incorrect when
a single precision load is being handled. This
causes %f1 to be seen as %f32 etc, and the incorrect
register ends up being overwritten. This code
sequence demonstrates the problem:
ldd [%g1], %f32 ! g1 = valid address
lda [%i3] ASI_PNF, %f1 ! i3 = invalid address
std %f32, [%g1] ! %f32 is mangled
This is corrected by basing the freg calculation on
the load size.
Rob Gardner [Fri, 30 Oct 2015 19:18:00 +0000 (13:18 -0600)]
SPARC64: Respect no-fault ASI for floating exceptions
Floating point load instructions using ASI_PNF or other
no-fault ASIs should never cause a SIGFPE. A store-quad
instruction should naturally fault if a non-quad register
is given, but this constraint should not apply to loads,
which may be single precision, double, or quad, and the
only constraint should be that the target register type
be appropriate for the precision of the load. A bug in
handle_ldf_stq() unnecessarily restricts no-fault loads
to quad registers, and causes a floating point exception
if one is not given. This restriction is removed.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
The sysfs file /sys/devices/system/node/node0/cpulist is incorrect in the
single node case on sun4v machines, as the machine description record in
this case does not contain any NUMA information. A default list from 0 to
NR_CPUS was used previously. This file is read by utilities such as
'numactl --hardware' and lscpu to show the CPU-to-node assignment.
In order to fix this issue, the numa_cpumask_lookup_table is cleared at
bootup. Whenever an extra cpu is brought up via __cpu_up, the corresponding
cpu mask is set in the numa_cpumask_lookup_table.
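The second part of the fix amounts to something like the fragment below in the cpu bring-up path (a sketch; the exact hook point is an assumption):

    /* record the freshly onlined cpu in its node's cpumask so
     * /sys/devices/system/node/nodeN/cpulist stays accurate */
    cpumask_set_cpu(cpu, &numa_cpumask_lookup_table[cpu_to_node(cpu)]);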
chris hyser [Fri, 23 Sep 2016 16:27:07 +0000 (09:27 -0700)]
sparc64: Cleans up PRIQ error and debugging messages.
Given that the lowest level arch dependent interrupt routines cannot actually
propagate any error back to the calling driver in the case of irq
request/enable/disable and setting affinity, PRIQ error messages need to
communicate failures in a more traceable way. The original error messages which
were more for internal debugging than regular usage have also been improved as
well as made controllable via a command line parameter priq=dbg.
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
(cherry picked from commit 89c31d4dd664cd2edc1f6d14aa62c75acfb0d172) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Kleikamp [Fri, 17 Jun 2016 14:51:04 +0000 (09:51 -0500)]
sparc: Remove console spam during kdump
Before executing the crash kernel, the panicking kernel cleans up the
irq state of the machine. This code contains a warning when cleaning up
unbound MSIs. Repeating this warning for each one floods the console and
can cause a waiting thread to time out before the other cpus have
completed.
This patch removes the warning and increases the time allowed for all
the cpus to complete the machine_capture_other_strands() function.
Dave Kleikamp [Mon, 27 Jun 2016 16:30:17 +0000 (11:30 -0500)]
sparc64: kdump: set crashing_cpu for panic
crashing_cpu was only being set in die_if_kernel() but not when a
crash dump is initiated from panic(). Move the initialization to
machine_crash_shutdown().
Also call bust_spinlocks() from die_if_kernel() to get rid of a warning
in smp_call_function_many(). It's already called in the panic path.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 1b588be700fac73edd07c015ff53aecba5d92bec) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Dave Kleikamp [Thu, 23 Jun 2016 19:04:30 +0000 (14:04 -0500)]
sparc: kexec: Don't mess with the tl register
I meddled with things I didn't fully understand while implementing commit
b43bc8f0 - "sparc64: add missing code for crash_setup_regs()".
I had changed the tl register in order to read tstate, tpc, etc. without
really knowing what I was doing. This can be a disaster if the crashing
thread takes another interrupt. Currently, the crash utility doesn't
even use those values; they are found on the stack instead.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 80eb7e28d3c719bbe3af56de5a5a8c68b764dbb9) Signed-off-by: Allen Pais <allen.pais@oracle.com>
In the case of an ldom/hardware not supporting ldc, the ipmi_si module
will set the smi interface pointer to NULL after ldc channel
detection failure. However, the ipmi_si module will then crash during
unload due to the absence of a NULL check.
Add the smi interface NULL check and assign the workqueue to
NULL during cleanup to avoid a double free panic.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: David Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit f2546771efb0c6402a5ea65dac9c5dbce18150e6) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Currently, the ipmi driver fakes itself as a userland process
to access the ipmi vldc channel.
This patch uses the new, cleaner vldc kernel interface that was added
for the ipmi driver.
Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com>
(cherry picked from commit 7a0d1deac3289130680a5ab1626c609b76c9f053)
The IPMI driver will have a stale vldc file pointer if the ILOM resets,
so the IPMI driver fails to work after the reset is complete.
The IPMI driver needs to close that file pointer and open another one
after the ILOM reset is complete.
This is achieved by trying to open the vldc file every 15 seconds
in a process context. As vldc or ldc cannot detect an ILOM reset,
this is the best possible approach for the problem.
This is based on Rob's patch for the mc reset fix. Signed-off-by: Atish Patra <atish.patra@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
(cherry picked from commit cf5139791a8241fcab1f59c1da0a9058def661f2) Signed-off-by: Allen Pais <allen.pais@oracle.com>
Vijay Kumar [Sun, 2 Oct 2016 21:40:18 +0000 (15:40 -0600)]
sparc64:mm/hugetlb: Set correct huge_pte_count index for 8M hugepages
Both set_huge_pte_at(...) and huge_ptep_get_and_clear(...)
call real_hugepage_size_to_pte_count_idx(hugepage_size) when adjusting
huge_pte_count. For 8MB/4MB the huge_pte_count index computed is 1(one).
This is incorrect because this index is for xl_hugepages. So the tsb
grow code in the mm fault path does not grow the tsb for 8MB/4MB
hugepages.
Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Orabug: 24490586
(cherry picked from commit c928d6fccaa59bd4b6cffc904144fa67a4726ff6) Signed-off-by: Allen Pais <allen.pais@oracle.com>
As pages are allocated by a task, counters in the mm and mm_context
structures are used to track these allocations. These counters are
then used to size the task's TSBs. This patch addresses issues where
counts are not maintained properly, and TSBs of the incorrect size
are created for the task.
- hugetlb pages are not included in a task's RSS calculations. However,
the routine do_sparc64_fault() calculates the size of base TSB block
by subtracting total size of hugetlb pages from RSS. Since hugetlb
size is likely larger than RSS, a negative value is passed as an
unsigned value to the routine which allocates the TSB block. The
'negative unsigned' value appears as a really big value and results in
a maximum sized base TSB being allocated. This is the case for almost
all tasks using hugetlb pages.
THP pages are also counted in huge_pte_count[MM_PTES_HUGE]. And
unlike hugetlb pages, THP pages are included in a task's RSS.
Therefore, hugetlb and THP pages cannot both be accounted for in
huge_pte_count[MM_PTES_HUGE].
Add a new counter thp_pte_count for THP pages, and use this value for
adjusting RSS to size the base TSB.
- In order to save memory, THP makes use of a huge zero page. This huge
zero page does not count against a task's RSS, but it does consume TSB
entries. Therefore, count huge zero page entries in
huge_pte_count[MM_PTES_HUGE].
- Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP. However, 'sanity
checks' are added in case it is ever called for other purposes.
- huge_pte_count[MM_PTES_HUGE] tracks the number of HPAGE_SIZE (8M) pages
used by the task. This value is used to size the TSB for HPAGE_SIZE
pages. However, for each HPAGE_SIZE (8M) there are two REAL_HPAGE_SIZE
(4M) pages. The TSB contains an entry for each REAL_HPAGE_SIZE page.
Therefore, the number of REAL_HPAGE_SIZE pages used by the task should
be used to size the MM_PTES_HUGE TSB. A new compile time constant
REAL_HPAGE_PER_HPAGE is used to multiply huge_pte_count[MM_PTES_HUGE]
before sizing the TSB.
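The last point boils down to a scaling like the fragment below when growing the huge-page TSB (a sketch; the surrounding fault-path code is abbreviated):

    /* each 8M HPAGE is mapped by two 4M REAL_HPAGE TSB entries,
     * so scale the count before sizing the MM_TSB_HUGE tsb */
    tsb_grow(mm, MM_TSB_HUGE,
             mm->context.huge_pte_count[MM_PTES_HUGE] * REAL_HPAGE_PER_HPAGE);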
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Tested-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
(cherry picked from commit 417fc85e759b6d4c4602fbdbdd5375ec5ddf2cb0) Signed-off-by: Allen Pais <allen.pais@oracle.com>