www.infradead.org Git - users/jedix/linux-maple.git/log

ocfs2: fix deadlock caused by recursive locking in xattr

Orabug: 26427132

Another deadlock path caused by recursive locking is reported.  This
kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points").  Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this.  On node1, cp a large file from
home directory to ocfs2 mountpoint.  While on node2, run
setfacl/getfacl.  Both nodes will hang up there.  The backtraces:

On node1:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_write_begin+0x43/0x1a0 [ocfs2]
  generic_perform_write+0xa9/0x180
  __generic_file_write_iter+0x1aa/0x1d0
  ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
  __vfs_write+0xc3/0x130
  vfs_write+0xb1/0x1a0
  SyS_write+0x46/0xa0

On node2:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
  ocfs2_set_acl+0x22d/0x260 [ocfs2]
  ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
  set_posix_acl+0x75/0xb0
  posix_acl_xattr_set+0x49/0xa0
  __vfs_setxattr+0x69/0x80
  __vfs_setxattr_noperm+0x72/0x1a0
  vfs_setxattr+0xa7/0xb0
  setxattr+0x12d/0x190
  path_setxattr+0x9f/0xb0
  SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 8818efaaacb78c60a9d90c5705b6c99b75d7d442)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2: fix deadlock issue when taking inode lock at vfs entry points

Orabug: 26427132

Conflicts:
    fs/ocfs2/file.c

Commit 3acdc8b3862a results in a deadlock, as the author realized shortly
after the patch was merged.  The discussion happened here

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

The reason why taking cluster inode lock at vfs entry points opens up a
self deadlock window, is explained in the previous patch of this series.

So far, we have seen two different code paths that have this issue.

1. do_sys_open
     may_open
       inode_permission
        ocfs2_permission
         ocfs2_inode_lock() <=== take PR
          generic_permission
           get_acl
            ocfs2_iop_get_acl
             ocfs2_inode_lock() <=== take PR

2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== remote PR request
        ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for these
funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Link: http://lkml.kernel.org/r/20170117100948.11657-3-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit b891fa5024a95c77e0d6fd6655cb74af6fb77f46)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

Orabug: 26427132

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a precess
already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code.  For instance:

  const struct inode_operations ocfs2_file_iops = {
      .permission     = ocfs2_permission,
      .get_acl        = ocfs2_iop_get_acl,
      .set_acl        = ocfs2_iop_set_acl,
  };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

  do_sys_open
   may_open
    inode_permission
     ocfs2_permission
      ocfs2_inode_lock() <=== first time
       generic_permission
        get_acl
         ocfs2_iop_get_acl
   ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two of
ocfs2_inode_lock().  Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  Another hand, the recursive cluster lock
(the second one) will be blocked in in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But, the downconvert never complete, why? because
there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.

1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
   of the processes' pid who has taken the cluster lock of this lock
   resource;

2. introduce a new flag for ocfs2_inode_lock_full:
   OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
   us if we've got cluster lock.

3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
   got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs
routines can call into each other.

The performance penalty of processing the holder list should only be
seen at a few cases where the tracking logic is used, such as get/set
acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real
world, as far as I can see, including permission check,
(get|set)_(acl|attr), and the gfs2 code also do so.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 439a36b8ef38657f765b80b775e2885338d72451)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

Revert "add OCFS2_LOCK_RECURSIVE arg_flags to ocfs2_cluster_lock() to prevent hang"

Orabug: 26427132

This reverts commit 387775a70d2e5d89d1a81d78d66655337a5c2765.

Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

xen/blkfront: always allocate grants first from per-queue persistent grants

This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.

As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26351401

upstream commit: bd912ef3e46b6edb51bb8af4b73fd2be7817e305
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>

rds: Make sure updates to cp_send_gen can be observed

cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.

This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.

Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

Clause 5.1.2.4/25 in the C standard is also relevant.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream e623a48ee433985f6ca0fb238f0002cc2eccdf53)

Orabug: 26519030

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration

Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.

The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).

Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.

Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.

Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 8dcbec6d20eb881ba368d0aebc3a8a678aebb1da)

Orabug: 25727872
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

xen: do not re-use pirq number cached in pci device msi msg data

Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")

That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq.  At the time, that was the correct behavior.  However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")

Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.

This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs.  So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.

This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
(cherry picked from commit c74fd80f2f41d05f350bb478151021f88551afe8)

Orabug: 26547167

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Re-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

MacSec: fix backporting error in patches for CVE-2017-7477

Orabug: 26443893

- macsec: dynamically allocate space for sglist (Jason A. Donenfeld)
[Orabug: 26368162]  {CVE-2017-7477}
- macsec: avoid heap overflow in skb_to_sgvec (Jason A. Donenfeld)  [Orabug:
26368162]  {CVE-2017-7477}

The backporting of above patches introduded a heap overrun error shown
as bug 26443893.

------------[ cut here ]------------
WARNING: CPU: 28 PID: 0 at kernel/time/timer.c:1177
call_timer_fn+0x142/0x150()
timer: mld_ifc_timer_expire+0x0/0x2d0 preempt leak: 00000100 -> 00000101
Modules linked in: gcm macsec fuse btrfs xor raid6_pq vfat msdos fat ext4
jbd2 ext2 mbcache2 ip6table_filter ip6_tables
BUG: workqueue leaked lock or atomic: kworker/15:2/0x00000001/689
     last function: addrconf_dad_work
CPU: 15 PID: 689 Comm: kworker/15:2 Not tainted 4.1.12-103.2.6.el7uek.x86_64
Hardware name: Oracle Corporation SUN SERVER X4-2       /ASSY,MOTHERBOARD,1U
, BIOS 25010603 01/16/2014
Workqueue: ipv6_addrconf addrconf_dad_work

Call Trace:
[<ffffffff81735938>] dump_stack+0x63/0x81
[<ffffffff810a0fd8>] process_one_work+0x3a8/0x460
[<ffffffff810a1582>] worker_thread+0x112/0x520
[<ffffffff810a1470>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff810a7348>] kthread+0xd8/0xf0
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff8173d9a2>] ret_from_fork+0x42/0x70
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
BUG: scheduling while atomic: kworker/15:2/689/0x00000001

1. newly introduced variable "num_frags" not used in 'sg_ad', assumes
'MAX_SKB_FRAGS + 1'

2. Initialization of sglist assumes 'MAX_SKB_FRAGS + 1' length, though it was
changed to the number of scatterlist elements being returned from
"skb_cow_data()"

3. It seems that "sg_init_table(sg, MAX_SKB_FRAGS + 1);" is redundant, it was
already done a few lines before.

This patch may solve the above issues.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

ovl: move super block magic number to magic.h

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 257f871993474e2bde6c497b54022c362cf398e1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: use a minimal buffer in ovl_copy_xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: allow zero size xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success. This is clearly not the desired
behavior.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: default permissions

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Add mount option "default_permissions" to alter the way permissions are
calculated.

Without this option and prior to this patch permissions were calculated by
underlying lower or upper filesystem.

With this option the permissions are calculated by overlayfs based on the
file owner, group and mode bits.

This has significance for example when a read-only exported NFS filesystem
is used as a lower layer. In this case the underlying NFS filesystem will
reply with EROFS, in which case all we know is that the filesystem is
read-only. But that's not what we are interested in, we are interested in
whether the access would be allowed if the filesystem wasn't read-only; the
server doesn't tell us that, and would need updating at various levels,
which doesn't seem practicable.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 8d3095f4ad47ac409440a0ba1c80e13519ff867d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek-rpm: Add missing .ko files to ueknano modules list

Orabug: 26521422

When installing kernel-ueknano rpm package, warnings are displayed due to missing
symbols. This commit adds modules with needed symbols to /lib/modules/ directory.

Fixes: 74d5ebd39bfa ("uek-rpm: Share specfile for both kernel-ueknano and kernel-uek")
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ping: implement proper locking

We got a report of yet another bug in ping

http://www.openwall.com/lists/oss-security/2017/03/24/6

->disconnect() is not called with socket lock held.

Fix this by acquiring ping rwlock earlier.

Thanks to Daniel, Alexander and Andrey for letting us know this problem.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 43a6684519ab0a6c52024b5e25322476cabad893)

Orabug: 25883225
CEV: CVE-2017-2671

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and never stops thread of remaining queue and processs them. When
removing vbd device with lots of disk I/O load, some queues with inflight
I/O still have blkback thread running even though the corresponding vbd
device or guest is gone.
And this could cause some problems, for example, if the backend device type
is file, some loop devices and blkback thread always lingers there forever
after guest is destroyed, and this causes failure of umounting repositories
unless rebooting the dom0. So stop all threads properly and return -EBUSY
if any queue has inflight I/O.

OraBug: 26539922

Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>

uek-rpm: Share specfile for both kernel-ueknano and kernel-uek

Orabug: 26521422

The kernel-ueknano is added as sub package in kernel-uek spec file to
create both the binary rpms from the same source. The list of modules to
be added to kernel-ueknano is passed as input to the spec file.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed By: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-By: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>

PCI: Workaround wrong flags completions for IDT switch

The IDT switch incorrectly flags an ACS source violation on a read config
request to an end point device on the completion (IDT 89H32H8G3-YC,
errata #36) even though the PCI Express spec states that completions are
never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).

The suggested workaround by IDT is to issue a configuration write to the
downstream device before issuing the first config read. This allows the
downstream device to capture its bus number, thus avoiding the ACS
violation on the completion.

The patch does the following -

1. Disable ACS source violation if enabled
2. Wait for config space access to become available by reading vendor id
3. Do a config write to the end point (errata workaround)
4. Enable ACS source validation (if it was enabled to begin with)

-v2: move workaround to pci_bus_read_dev_vendor_id() from
pci_bus_check_dev()
      and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
-v3: add bus->self check for root bus and virtual bus for sriov vfs.
-v4: only do workaround for IDT switches
-v5: tweak pci_std_enable_acs_sv to deal with unimplemented SV and
clarify return value

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
--

  drivers/pci/pci.c   | 37 +++++++++++++++++++++++++++++++++++++
  drivers/pci/pci.h   |  1 +
  drivers/pci/probe.c | 38 ++++++++++++++++++++++++++++++++++++--
  3 files changed, 74 insertions(+), 2 deletions(-)

Orabug: 26243152

Link: https://patchwork.kernel.org/patch/9828571/
Fixed wrapped lines in the original patch to fit the lines in 80 columns.
Changed variable types from integer to bool to keep consistent with function
return type.

Signed-off-by: Shan Hai <shan.hai@oracle.com>

Revert "SUNRPC: Refactor svc_set_num_threads()"

This reverts commit 0bc9402329824ae06d8a26a73b60d21cdac3e6f2.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

Revert "NFSv4: Fix callback server shutdown"

This reverts commit 13757272bec28f1b8be59f56292e8f17076923b3.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6727ad9e206cc08b80d8000a4d67f8417e53539d)

Orabug: 25925689

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
arch/arm/kernel/vmlinux-xip.lds.S
arch/h8300/kernel/vmlinux.lds.S
arch/x86/kernel/process.c
drivers/acpi/processor_idle.c
kernel/sched/idle.c
lib/nmi_backtrace.c

netfilter: nf_tables: fix oob access

BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 3e38df136e453aa69eb4472108ebce2fb00b1ba6)

Orabug: 25960439,26492640,26492632

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

scsi: libiscsi: use kvzalloc for iscsi_pool_init

iscsiadm session login can fail with the following error:

iscsiadm: Could not login to [iface: default, target: iqn.1986-03.com...
iscsiadm: initiator reported error (9 - internal error)

When /etc/iscsi/iscsid.conf sets node.session.cmds_max = 4096, it
results in 64K-sized kmallocs per session. A system under fragmented
slab pressure may not have any 64K objects available and fail iscsiadm
session login. Even though memory objects of a smaller size are
available, the large order allocation ends up failing.

The kernel prints a warning and does dump_stack, like below:

iscsid: page allocation failure: order:4, mode:0xc0d0
CPU: 0 PID: 2456 Comm: iscsid Not tainted 4.1.12-61.1.28.el6uek.x86_64 #2
Call Trace:
[<ffffffff816c6e40>] dump_stack+0x63/0x83
[<ffffffff8118e58a>] warn_alloc_failed+0xea/0x140
[<ffffffff81191df9>] __alloc_pages_slowpath+0x409/0x760
[<ffffffff81192401>] __alloc_pages_nodemask+0x2b1/0x2d0
[<ffffffffa048f6c0>] ? dev_attr_host_ipaddress+0x20/0xffffffffffffc722
[<ffffffff811dc38f>] alloc_pages_current+0xaf/0x170
[<ffffffff81192581>] alloc_kmem_pages+0x31/0xd0
[<ffffffffa048f600>] ? iscsi_transport_group+0x20/0xffffffffffffc7e2
[<ffffffff811ad738>] kmalloc_order+0x18/0x50
[<ffffffff811ad7a4>] kmalloc_order_trace+0x34/0xe0
[<ffffffff8146ee30>] ? transport_remove_classdev+0x70/0x70
[<ffffffff811e843d>] __kmalloc+0x27d/0x2a0
[<ffffffff810c8cbd>] ? complete_all+0x4d/0x60
[<ffffffffa04af299>] iscsi_pool_init+0x69/0x160 [libiscsi]
[<ffffffff81465d90>] ? device_initialize+0xb0/0xd0
[<ffffffffa04af510>] iscsi_session_setup+0x180/0x2f4 [libiscsi]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa04c531f>] iscsi_sw_tcp_session_create+0xcf/0x150 [iscsi_tcp]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048a633>] iscsi_if_create_session+0x33/0xd0
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048abd8>] iscsi_if_recv_msg+0x508/0x8c0 [scsi_transport_iscsi]
[<ffffffff811922eb>] ? __alloc_pages_nodemask+0x19b/0x2d0
[<ffffffff811e6d69>] ? __kmalloc_node_track_caller+0x209/0x2c0
[<ffffffffa048b00c>] iscsi_if_rx+0x7c/0x200 [scsi_transport_iscsi]
[<ffffffff81623dc6>] netlink_unicast+0x126/0x1c0
[<ffffffff8162468c>] netlink_sendmsg+0x36c/0x400
[<ffffffff815d2fed>] sock_sendmsg+0x4d/0x60
[<ffffffff815d596a>] ___sys_sendmsg+0x30a/0x330
[<ffffffff811bc72c>] ? handle_pte_fault+0x20c/0x230
[<ffffffff811bc90c>] ? __handle_mm_fault+0x1bc/0x330
[<ffffffff811bcb32>] ? handle_mm_fault+0xb2/0x1a0
[<ffffffff815d5b99>] __sys_sendmsg+0x49/0x90
[<ffffffff815d5bf9>] SyS_sendmsg+0x19/0x20
[<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71

Use kvzalloc for iscsi_pool in iscsi_pool_init.

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Tested-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Joseph Slember <joe.slember@oracle.com>
Reviewed-by: Lance Hartmann <lance.hartmann@oracle.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26473178
(cherry picked from commit bfcc62ed7066268349e8e7955925bdaf4be0eec0)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

mm: introduce kv[mz]alloc helpers

Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers, I could find, have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures.  This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 26473178
(cherry picked from commit a7c3e901a46ff54c016d040847eda598a9e3e653)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:

arch/x86/kvm/lapic.c
arch/x86/kvm/page_track.c
arch/x86/kvm/x86.c
fs/f2fs/f2fs.h
fs/f2fs/file.c
fs/f2fs/node.c
fs/f2fs/segment.c
fs/seq_file.c
security/apparmor/apparmorfs.c
security/apparmor/include/lib.h
security/apparmor/lib.c
security/apparmor/policy_unpack.c
virt/kvm/kvm_main.c

sg: Fix double-free when drives detach during SG_IO

In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().

Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.

This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:

  ------------[ cut here ]------------
  kernel BUG at block/blk-core.c:1420!
  Call Trace:
  [<ffffffff81281eab>] blk_put_request+0x5b/0x80
  [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
  [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
  [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
  [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
  [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
  [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
  [<ffffffff81258967>] ? file_has_perm+0x97/0xb0
  [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
  [<ffffffff81602afb>] tracesys+0xdd/0xe2
    RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0

The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.

Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.

KASAN was extremely helpful in finding the root cause of this bug.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26492266
(cherry picked from commit f3951a3709ff50990bf3e188c27d346792103432)
Signed-off-by: John Sobecki <john.sobecki@oracle.com>

scsi: smartpqi: mark PM functions as __maybe_unused

Orabug: 26191021, 26447813

The newly added suspend/resume support causes harmless warnings when
CONFIG_PM is disabled:

smartpqi/smartpqi_init.c:5147:12: error: 'pqi_ctrl_wait_for_pending_io' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2019:13: error: 'pqi_wait_until_lun_reset_finished' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2013:13: error: 'pqi_wait_until_scan_finished' defined but not used [-Werror=unused-function]

We can avoid the warnings by removing the #ifdef around the handlers and
instead marking them as __maybe_unused, which will let gcc drop the
unused code silently.

Fixes: f44d210312a6 ("scsi: smartpqi: add suspend and resume support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5c146686e32085e76ad9e2957f3dee9b28fe4f22)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: bump driver version

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2d154f5ff338137a69f2f2a313520b6da2e1eb16)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: remove writeq/readq function definitions

Orabug: 26191021, 26447813

Instead of rewriting write/readq, use existing functions

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add module parameters

Orabug: 26191021, 26447813

Add module parameters to disable heartbeat support and to disable
shutting down the controller when a controller is taken offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5a259e32ba32c380537f3d186a311e528b9f9c94)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup list initialization

Orabug: 26191021, 26447813

Better initialization of linked list heads.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8a994a04fc3a8edbcc0ba1d17219b6d8f4c38009)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add raid level show

Orabug: 26191021, 26447813

Display the RAID level via sysfs

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a9f93392415eb0fc86c29f015822b36016278c72)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: make ioaccel references consistent

Orabug: 26191021, 26447813

- make all references to RAID bypass consistent throughout driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 588a63fea1c28009fe17f194941fb8d8b101b44e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance device add and remove messages

Orabug: 26191021, 26447813

Improved formatting of information displayed when devices
are added/removed from the system.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6de783f666291763bcc6c3975e146b9b698378b1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update timeout on admin commands

Orabug: 26191021, 26447813

Increase the timeout on admin commands from 3 seconds to 60
seconds and added a check for controller crash in the loop
where the driver polls for admin command completion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 13bede676b98d595a43d36a34e1835b686d0d140)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: map more raid errors to SCSI errors

Orabug: 26191021, 26447813

enhance mapping of RAID path errors to Linux SCSI host
error codes.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f5b63206255f68116c117565ab703c531c5ce400)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup controller branding

Orabug: 26191021, 26447813

- Improve controller branding support.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 37b36847a94669a898c9e3449d19d522d9c13979)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update rescan worker

Orabug: 26191021, 26447813

improve support for taking controller offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5f310425c8eabeeb303809898682e5b79c8a9c7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update device offline

Orabug: 26191021, 26447813

- Improve handling of offline devices.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 03b288cf3d92202b950245e931576bb573930c70)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct aio error path

Orabug: 26191021, 26447813

set the internal flag that causes I/O to be sent down the
RAID path when the AIO path is disabled

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 376fb880a4fbf6903918a88081b16c167819af3f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: add lockup action

Orabug: 26191021, 26447813

add support for actions to take when controller goes offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3c50976f33f30cf00baea9d518bd3e7ddd01ecc4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: remove qdepth calculations for logical volumes

Orabug: 26191021, 26447813

make the queue depth for LVs the same as the maximum
I/Os supported by the controller

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 94086f5be3f15fc8231e65975e4413c0df3e0203)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance kdump

Orabug: 26191021, 26447813

constrain resource usage during kdump to avoid kdump failures

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d727a776d72b26033161bc19441266749455115b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: change return value for LUN reset operations

Orabug: 26191021, 26447813

change return value for controller offline to be consistent
with the rest of the driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4e8415e3861e8b73a47c92e09e044b9dbc8ee37f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add ptraid support

Orabug: 26191021, 26447813

add support for PTRAID devices

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd10cf0be6057f680fab911d89761fd15d76b205)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update copyright

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b805dbfe2bce1ddf3209c29f1aa7d6b2064ab6c9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup messages

Orabug: 26191021, 26447813

- improve some error messages.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d87d5474e2080695ef0cc8c5e6c42a41d6ab961b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add new PCI device IDs

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7eddabff8acb0f4c25f992efe126cf6cccdd6e7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: minor driver cleanup

Orabug: 26191021, 26447813

- remove debug code that is no longer necessary.
- Some WARN_ON checks were removed because the driver continues
to function when the conditions are met.
- remove a MACRO that is no longer used.
- remove unnecessary multi-line statements.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cbe0c7b11dbfda368f27a6935a08ba91522edf1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct BMIC identify physical drive

Orabug: 26191021, 26447813

correct the BMIC Identify Physical Device structure
- missing 2 fields

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1be42f46ade32c668f11c0735af03ab2d479d206)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: eliminate redundant error messages

Orabug: 26191021, 26447813

eliminate redundant error message during initialization
if the controller has crashed.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8845fdfa92ab6eb24209f9929d6340c2f5d4a2de)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add pqi_wait_for_completion_io

Orabug: 26191021, 26447813

Add check for controller lockup during waits for synchronous
controller commands.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1f37e992ad8015ce33596466b0f36babb495148e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct bdma hw bug

Orabug: 26191021, 26447813

add workaround for BDMA hardware bug that can cause
hw to read up to 12 SGL elements (192 bytes) beyond the
last element in the list. This fix avoids IOMMU violations

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e1d213bdc3e359c6c5da8ebbc5b2e87b376e8777)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add heartbeat check

Orabug: 26191021, 26447813

check for controller lockups

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 98f876674a6fba3591c342dfbcfdbaa7ecf0a84e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add suspend and resume support

Orabug: 26191021, 26447813

add support for ACPI S3 (suspend) and S4 (hibernate)
system power states.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 061ef06a2d436cea85984cf0b51b452547a5496c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance resets

Orabug: 26191021, 26447813

- Block all I/O targeted at LUN reset device.
- Wait until all I/O targeted at LUN reset device has been
consumed by the controller.
- Issue LUN reset request.
- Wait until all outstanding I/Os and LUN reset completion
have been received by the host.
- Return to OS results of LUN reset request.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7561a7e4412e515100ac195303531fc2621ac2db)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add supporting events

Orabug: 26191021, 26447813

Only register for controller events that driver supports
cleanup event handling.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6a50d6ada03d8d9102a632d0e2db70cd9b6620f5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: ensure controller is in SIS mode at init

Orabug: 26191021, 26447813

put in SIS mode during initialization.
support kexec/kdump

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 162d7753fce9a00719c09dfebd9fee3855e27fbe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add in controller checkpoint for controller lockups.

Orabug: 26191021, 26447813

tell smartpqi controller to generate a checkpoint for rare lockup
conditions.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5b0fba0f408777113eff93bd18ab0b9f80760fb7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: set pci completion timeout

Orabug: 26191021, 26447813

add support for setting PCIe completion timeout.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a81ed5f338a843d8bfd199928142b196d71ae62c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct remove scsi devices

Orabug: 26191021, 26447813

correct a problem caused by holding a spinlock during device deletion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a37ef74517acf0d022ab4c8fa671c82c877eed7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: fix time handling

Orabug: 26191021, 26447813

When we have turned off RTC support, the smartpqi driver fails to build:

ERROR: "rtc_time64_to_tm" [drivers/scsi/smartpqi/smartpqi.ko] undefined!

This is easily avoided by using the generic 'struct tm' based helper rather
than the RTC specific one. While fixing this, I noticed that even though
the driver uses time64_t for storing seconds, it gets them from the
old 32-bit struct timeval. To address this, we can simplify the code
by calling ktime_get_real_seconds() directly.

Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ed10858eadd4988260c6bc7d75fc25176342b5a7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/sock: add WARN_ON(parent->sk) in sock_graft()

sock_graft() unilaterally sets up parent->sk based on the
assumption that the existing parent->sk is null. If this
condition is not true, then the existing parent->sk would
be leaked, so add a WARN_ON() to alert callers who may fall
in this category.

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: use sock_create_lite() to create the accept socket

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one()
1. it sets up a new_sock->sk that is wasteful, because this ->sk
is going to get replaced by inet_accept() in the subsequent ->accept()
2. The new_sock->sk is a leaked reference in sock_graft() which
expects to find a null parent->sk

Avoid these problems by calling sock_create_lite().

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: set linger to 1 when unloading a rds-tcp

If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: send handshake ping-probe from passive endpoint

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

xfs: skip dirty pages in ->releasepage()

Orabug: 26451790

XFS has had scattered reports of delalloc blocks present at
->releasepage() time. This results in a warning with a stack trace
similar to the following:

...
Call Trace:
  [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
  [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
  [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
  [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
  [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
  [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
  [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
  [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
  [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
  [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
  [<ffffffffa2168539>] kswapd+0x4f9/0x970
  [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
  [<ffffffffa20a0d99>] kthread+0xc9/0xe0
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
  [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100

This occurs because it is possible for shrink_active_list() to send
pages marked dirty to ->releasepage() when certain buffer_head threshold
conditions are met. shrink_active_list() doesn't check the page dirty
state apparently to handle an old ext3 corner case where in some cases
clean pages would not have the dirty bit cleared, thus it is up to the
filesystem to determine how to handle the page.

XFS currently handles the delalloc case properly, but this behavior
makes the warning spurious. Update the XFS ->releasepage() handler to
explicitly skip dirty pages. Retain the existing delalloc/unwritten
checks so we continue to warn if such buffers exist on clean pages when
they shouldn't.

Diagnosed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit 99579ccec4e271c3d4d4e7c946058766812afdab)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qede: Add support for ingress headroom

Orabug: 25933053, 26439680

Driver currently doesn't support any headroom; The only 'available'
space it has in the head of the buffer is due to the placement
offset.
In order to allow [later] support of XDP adjustment of headroom,
modify the the ingress flow to properly handle a scenario where
the packets would have such.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qede: Update receive statistic once per NAPI

Orabug: 25933053, 26439680

Currently, each time an ingress packet is passed to networking stack
the driver increments a per-queue SW statistic.
As we want to have additional fields in the first cache-line of the
Rx-queue struct, change flow so this statistic would be updated once per
NAPI run. We will later push the statistic to a different cache line.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Make OOO archipelagos into an array

Orabug: 25933053, 26439680

No need to maintain the various open archipelagos as a list -
The maximal number of them is known, and we can use the CID
as key for random-access into the array.

Signed-off-by: Michal Kalderon <Michal.Kalderon@caviumc.om>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Provide iSCSI statistics to management

Orabug: 25933053, 26439680

Management firmware can query for some basic iSCSI-related statistics.
Provide those just as we do for other protocols.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Inform qedi the number of possible CQs

Orabug: 25933053, 26439680

Now that management firmware is capable of telling us the number of CQs
available for a given PF, qed needs to communicate the number to qedi
so it would know have many to use.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Add missing stat for new isles

Orabug: 25933053, 26439680

Firmware provides a statistic for the number of out-of-order isles
it used - fill it in the iscsi-related statistics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Don't close the OUT_EN during init

Orabug: 25933053, 26439680

Before initializing the chip's engine, driver currently closes a set
of registers on the HW's ingress flow to prevent packets from slipping
in while they're not supposed to.

This configuration is insufficient, as there are some scenarios where
packets would still arrive even when said registers are set,
but the management firmware already closes other per-port registers
that do suffice, making this setting unnecessray.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Configure cacheline size in HW

Orabug: 25933053, 26439680

Default HW configuration is optimal for an architecture where cache
line size is 64B.

During chip initialization, properly initialize the cache line size
in HW to avoid possible redundant PCI transactions.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Don't use main-ptt in unrelated flows

Orabug: 25933053, 26439680

In order to access HW registers driver needs to acquire a PTT entry
[mapping between bar memory and internal chip address].
Since acquiring PTT entries could fail [at least in theory] as their
number is finite and other flows can hold them, we reserve special PTT
entries for 'important' enough flows - ones we want to guarantee that
would not be susceptible to such issues.

One such special entry is the 'main' PTT which is meant to be used in
flows such as chip initialization and de-initialization.
However, there are other flows that are also using that same entry
for their own purpose, and might run concurrently with the original
flows [notice that for most cases using the main-ptt by mistake, such
a race is still impossible, at least today].

This patch re-organizes the various functions that currently use the
main_ptt in one of two ways:

  - If a function shouldn't use the main_ptt it starts acquiring and
    releasing it's own PTT entry and use it instead. Notice if those
    functions previously couldn't fail, they now can [as acquisition
    might fail].

  - Change the prototypes so that the main_ptt would be received as
    a parameter [instead of explicitly accessing it].
    This prevents the future risk of adding codes that introduces new
    use-cases for flows using the main_ptt, ones that might be in race
    with the actual 'main' flows.

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Warn PTT usage by wrong hw-function

Orabug: 25933053, 26439680

PTT entries are per-hwfn; If some errneous flow is trying
to use a PTT belonging to a differnet hwfn warn user, as this
can break every register accessing flow later and is very hard
to root-cause.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct MSI-x for storage

Orabug: 25933053, 26439680

When qedr is enabled, qed would try dividing the msi-x vectors between
L2 and RoCE, starting with L2 and providing it with sufficient vectors
for its queues.

Problem is qed would also do that for storage partitions, and as those
don't need queues it would lead qed to award those partitions with 0
msi-x vectors, causing them to believe theye're using INTa and
preventing them from operating.

Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: fix missing break in OOO_LB_TC case

Orabug: 25933053, 26439680

There seems to be a missing break on the OOO_LB_TC case, pq_id
is being assigned and then re-assigned on the fall through default
case and that seems suspect.

Detected by CoverityScan, CID#1424402 ("Missing break in switch")

Fixes: b5a9ee7cf3be1 ("qed: Revise QM cofiguration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Add a missing error code

Orabug: 25933053, 26439680

We should be returning -ENOMEM if qed_mcp_cmd_add_elem() fails. The
current code returns success.

Fixes: 4ed1eea82a21 ("qed: Revise MFW command locking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: RoCE doesn't need to use SRC

Orabug: 25933053, 26439680

As RoCE doesn't need to use the SRC, allocating ILT memory
on behalf of RoCE is wasting available ILT lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct TM ILT lines in presence of VFs

Orabug: 25933053, 26439680

As of today there's no protocol supported that requires
support from the TM hardware block and enables SRIOV,
but we should still correct the calculation to reflect
the lines required for such future VFs instead of changing
the PF's own lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Fix TM block ILT allocation

Orabug: 25933053, 26439680

When configuring the HW timers block we should set the number of CIDs
up until the last CID that require timers, instead of only those CIDs
whose protocol needs timers support.

Today, the protocols that require HW timers' support have their CIDs
before any other protocol, but that would change in future [when we
add iWARP support].

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Revise QM cofiguration

Orabug: 25933053, 26439680

Refactor and clean up the queue manager initialization logic.
Also, this adds support for RoC low latency queues, which later
would be used for improving RoCE latency in high throughput scenarios.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Use BDQ resource for storage protocols

Orabug: 25933053, 26439680

Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.

As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a valure from that range instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Utilize resource-lock based scheme

Orabug: 25933053, 26439680

Management firmware is used as an arbiter between the various PFs
in matters of resources, but some of the resources that need to
be divided are dependent on the non-management firmware used,
so management firmware first needs to be told how many resources
there are before trying to divide them.

As part of the initialization sequence, driver would first inform
the management firmware of the available resources under
a dedicated resource lock, and afterwards request for various
resources which might be based on the previous set values.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Support management-based resource locking

Orabug: 25933053, 26439680

Global locking can't properly be used to synchronize between different
PFs in all scenarios, as those instances might reside in different
logical partitions [e.g., when a PF is assigned via PDA to some VM].

The management firmware provides a generic infrastructure for
device locks. For each 'resource', it's guaranteed it could be acquired
by at most a single PF at any given time [or by management firmware].

This patch adds the necessary logic in qed for utilizing said
infrastructure, implementing lock/unlock internal APIs.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Send pf-flr as part of initialization

Orabug: 25933053, 26439680

During HW initialization, driver would set various registers to their
needed values - but it assumes all registers start at their reset-value,
so there's no need to re-configure a register's default value.

This assumption might be incorrect, e.g., in case of preboot driver
running and initializing the driver prior to our driver.

To overcome this, we now ask management firmware to initiate a PF-flr
early during the initialization sequence. That would return everything
in the PF's scope back to default and prevent previous configurations
from still being applied.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Move to new load request scheme

Orabug: 25933053, 26439680

Management firmware is used as an arbiter between the various PFs
in regard to loading - it causes the various PFs to load/unload
sequentially and informs each of its appropriate rule in the init.

But the existing flow is too weak to handle some scenarios where
PFs aren't properly cleaned prior to loading.
The significant scenarios falling under this criteria:
a. Preboot drivers in some environment can't properly unload.
b. Unexpected driver replacement [kdump, PDA].

Modern management firmware supports a more intricate loading flow,
where the driver has the ability to overcome previous limitations.
This moves qed into using this newer scheme.

Notice new scheme is backward compatible, so new drivers would
still be able to load properly on top of older management firmwares
and vice versa.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: hw_init() to receive parameter-struct

Orabug: 25933053, 26439680

We'll soon need additional information, so start by changing
the infrastructure to receive the initializing variables
via a parameter struct.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct HW stop flow

Orabug: 25933053, 26439680

Management firmware is used as arbiter between different PFs
which are loading/unloading, but in order to use the synchronization
it offers the contending configurations need to be applied either
between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
management firmware commands.

Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
qed_hw_reset() which don't abide this requirement; Most of the closure
is doing outside the scope of the unload request.

This patch removes qed_hw_reset() and places the relevant stop
functionality underneath the management firmware protection.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Reserve VF feature before PF

Orabug: 25933053, 26439680

Align the driver feature distribution with the flow utilized
by the management firmware - first reserve L2 queues for
VFs and use all the remaining for the PF.

The current distribution might lead to PFs with an enormous
amount of queues, but at the same time leave us with insufficient
resources for starting all VFs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Don't waste SBs unused by RoCE

Orabug: 25933053, 26439680

When RoCE is enabled on a given L2 interface, the interrupt lines
are divided equally between L2 and RoCE -
But in case number of lines needed for RoCE is limited by number
of available CNQs, we can utilize the additional lines for L2.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct endian order of MAC passed to MFW

Orabug: 25933053, 26439680

The management firmware is running on a Big Endian processor,
and when running on LE platform HW is configured to swap access
to memory shared between management firmware and driver on
32-bit granulariy.

As a result, for matters of simplicity most of the APIs between
driver and management firmware are based on 32-bit variables.
MAC settings are one exception, as driver needs to fill a byte
array when indicating to management firmware that primary MAC
has changed.
Due to the swap, driver must make sure that the mac that was
provided in byte-order would be translated into native order,
otherwise after the swap the management firmware would read
it swapped.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Pass src/dst sizes when interacting with MFW

Orabug: 25933053, 26439680

The driver interaction with management firmware involves a union
of all the data-members relating to the commands the driver prepares.

Current interface assumes the caller always passes such a union -
but thats cumbersome as well as risky [chancing a stack corruption
in case caller accidentally passes a smaller member instead of union].

Change implementation so that caller could pass a pointer to any
of the members instead of the union.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Revise MFW command locking

Orabug: 25933053, 26439680

Interaction of driver -> management firmware is based
on a one-pending mailbox [per interface], and various
mailbox commands need to be synchronized.

Current scheme is messy, and there's a difficulty extending
it as it deals differently with various commands as well as
making assumption on the required behavior for load/unload
requests.

Drop the current scheme into a completion-list-based approach;
Each flow would try sending the command when possible,
allowing one flow to complete another flow's completion and
relieve the mailbox before sending its own command.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Always publish VF link from leading hwfn

Orabug: 25933053, 26439680

The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Raise verbosity of Malicious VF indications

Orabug: 25933053, 26439680

Malicious VF existance should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Make qed_iov_mark_vf_flr() return bool

Orabug: 25933053, 26439680

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Deprecate VF multiple queue-stop

Orabug: 25933053, 26439680

The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.

We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Uniform IOV queue validation

Orabug: 25933053, 26439680

PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.

Add auxillary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>