www.infradead.org Git - users/jedix/linux-maple.git/log

ovl: do not require mounter to have MAY_WRITE on lower

Now we have two levels of checks in ovl_permission(). overlay inode
is checked with the creds of task while underlying inode is checked
with the creds of mounter.

Looks like mounter does not have to have WRITE access to files on lower/.
So remove the MAY_WRITE from access mask for checks on underlying
lower inode.

This means task should still have the MAY_WRITE permission on lower
inode and mounter is not required to have MAY_WRITE.

It also solves the problem of read only NFS mounts being used as lower.
If __inode_permission(lower_inode, MAY_WRITE) is called on read only
NFS, it fails. By clearing MAY_WRITE, the check succeeds and the case of
read only NFS should work with overlay without having to specify any
special mount options (default permission).
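
The effect of dropping MAY_WRITE from the mask can be sketched in userspace
C. This is a simplified model, not the kernel code: struct model_inode,
model_inode_permission() and model_check_real() are hypothetical names
invented for illustration; only the MAY_* values are taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Access mask bits; the values match the kernel's MAY_* flags. */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/* Hypothetical stand-in for an underlying inode as seen by the mounter. */
struct model_inode {
    int  granted;      /* MAY_* bits the mounter is allowed */
    bool readonly_fs;  /* e.g. a read only NFS mount */
};

/* Toy permission check: a read only filesystem refuses MAY_WRITE outright. */
int model_inode_permission(const struct model_inode *inode, int mask)
{
    assert(inode != NULL);
    if ((mask & MAY_WRITE) && inode->readonly_fs)
        return -EROFS;
    return (mask & ~inode->granted) ? -EACCES : 0;
}

/* Check on the real inode with mounter creds; lower never needs MAY_WRITE. */
int model_check_real(const struct model_inode *real, bool is_upper, int mask)
{
    if (!is_upper)
        mask &= ~MAY_WRITE;
    return model_inode_permission(real, mask);
}
```

With a read only lower inode, model_check_real(&inode, false,
MAY_READ | MAY_WRITE) succeeds in this model, while the unmodified mask
would return -EROFS.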

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 754f8cb72b42a3a6100d2bbb1cb885361a7310dd)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: do operations on underlying file system in mounter's context

Given we are now doing checks both on overlay inode as well as underlying
inode, we should be able to do checks and operations on underlying file
system using mounter's context.

So modify all operations to do checks/operations on underlying dentry/inode
in the context of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 1175b6b8d96331676f1d436b089b965807f23b4a)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix uid/gid when creating over whiteout

Fix a regression when creating a file over a whiteout.  The new
file/directory needs to use the current fsuid/fsgid, not the ones from the
mounter's credentials.

The refcounting is a bit tricky: prepare_creds() sets an original refcount,
override_creds() gets one more, which revert_creds() drops.  So

  1) we need to explicitly put the mounter's credentials when overriding
     with the updated one

  2) we need to put the original ref to the updated creds (and this can
     safely be done before revert_creds(), since we'll still have the ref
     from override_creds()).
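
The two puts described above can be traced with a small userspace model of
the reference counting. Everything here is a hypothetical sketch: struct
model_cred and the model_*() helpers only imitate the counting behaviour
that the message attributes to prepare_creds(), override_creds() and
revert_creds(); they are not the kernel implementations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct cred. */
struct model_cred {
    int usage;   /* reference count */
    int fsuid;
};

static struct model_cred *current_cred_ptr;  /* creds the "task" is using */

struct model_cred *model_prepare_creds(const struct model_cred *old)
{
    struct model_cred *new = malloc(sizeof(*new));
    assert(new != NULL);
    *new = *old;
    new->usage = 1;              /* the original reference */
    return new;
}

void model_put_cred(struct model_cred *c)
{
    assert(c->usage > 0);
    if (--c->usage == 0)
        free(c);
}

/* Install new creds, taking one extra reference; return the previous ones. */
struct model_cred *model_override_creds(struct model_cred *new)
{
    struct model_cred *old = current_cred_ptr;
    new->usage++;
    current_cred_ptr = new;
    return old;
}

/* Restore the old creds and drop the reference the override took. */
void model_revert_creds(struct model_cred *old)
{
    struct model_cred *override = current_cred_ptr;
    current_cred_ptr = old;
    model_put_cred(override);
}

/* Returns the fsuid that would be used for the newly created file. */
int model_create_over_whiteout(struct model_cred *task,
                               struct model_cred *mounter)
{
    current_cred_ptr = task;
    struct model_cred *old = model_override_creds(mounter);
    struct model_cred *updated = model_prepare_creds(mounter);

    updated->fsuid = old->fsuid;                    /* caller's fsuid */
    model_put_cred(model_override_creds(updated));  /* 1) put mounter's creds */
    model_put_cred(updated);                        /* 2) put original ref */
    int used = current_cred_ptr->fsuid;
    model_revert_creds(old);                        /* drops the last ref */
    return used;
}
```

In this model the mounter's count returns to its starting value and the
updated creds are freed exactly once, by model_revert_creds().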

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Fixes: 3fe6e52f0626 ("ovl: override creds with the ones from the superblock mounter")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit d0e13f5bbe4be7c8f27736fc40503dcec04b7de0)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: modify ovl_permission() to do checks on two inodes

Right now ovl_permission() calls __inode_permission(realinode), to do
permission checks on real inode and no checks are done on overlay inode.

Modify it to do checks both on overlay inode as well as underlying inode.
Checks on overlay inode will be done with the creds of calling task while
checks on underlying inode will be done with the creds of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit c0ca3d70e8d3cf81e2255a217f7ca402f5ed0862)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: define ->get_acl() for overlay inodes

Now we are planning to do DAC permission checks on overlay inode
itself. And to make it work, we will need to make sure we can get acls from
underlying inode. So define ->get_acl() for overlay inodes and this in turn
calls into underlying filesystem to get acls, if any.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 39a25b2b37629f65e5a1eba1b353d0b47687c2ca)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: move some common code in a function

ovl_create_upper() and ovl_create_over_whiteout() seem to be sharing some
common code which can be moved into a separate function. No functionality
change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 72e48481815eeca72fc886b3be91301ad87d6aeb)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: store ovl_entry in inode->i_private for all inodes

Previously this was only done for directory inodes. Doing so for all
inodes makes for a nice cleanup in ovl_permission at zero cost.

Inodes are not shared for hard links on the overlay, so this works fine.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 58ed4e70f253d80ed72faba7873dc11603b398bc)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: use generic_delete_inode

No point in keeping overlay inodes around since they will never be reused.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit eead4f2dc4f851a3790c49850e96a1d155bf5451)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: check mounter creds on underlying lookup

The hash salting changes meant that we can no longer reuse the hash in the
overlay dentry to look up the underlying dentry.

Instead of lookup_hash(), use lookup_one_len_unlocked() and switch to
mounter's creds (like we do for all other operations later in the series).

Now the lookup_hash() export introduced in 4.6 by 3c9fe8cdff1b ("vfs: add
lookup_hash() helper") is unused and can possibly be removed; its
usefulness negated by the hash salting and the idea that mounter's creds
should be used on operations on underlying filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8387ff2577eb ("vfs: make the string hashes salt the hash")
Orabug: 26401569

(backport upstream commit c1b2cc1a765aff4df7b22abe6b66014236f73eba)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: ignore permissions on underlying lookup

Generally permission checking is not necessary when overlayfs looks up a
dentry on one of the underlying layers, since search permission on base
directory was already checked in ovl_permission().

More specifically using lookup_one_len() causes a problem when the lower
directory lacks search permission for a specific user while the upper
directory does have search permission. Since lookups are cached, this
causes inconsistency in behavior: success depends on who did the first
lookup.

So instead use lookup_hash() which doesn't do the permission check.

Reported-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 38b78a5f18584db6fa7441e0f4531b283b0e6725)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: override creds with the ones from the superblock mounter

In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

A simple reproducer:

$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash

Now as root in the user namespace:

# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
# rm -fR merged/*

This ends up failing with -EPERM after the files in dir have been
correctly deleted:

unlinkat(4, "2", 0)                     = 0
unlinkat(4, "1", 0)                     = 0
unlinkat(4, "3", 0)                     = 0
close(4)                                = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not permitted)

Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.

This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace.  The old cap_raise game is removed in favor of just overriding
the old cred struct.

This patch also drops from ovl_copy_up_one() the following two lines:

override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;

This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 3fe6e52f062643676eb4518d68cee3bc1272091b)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix dentry leak for default_permissions

When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad47 ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
Orabug: 26401569

(backport upstream commit a4859d75944a726533ab86d24bb5ffd1b2b7d6cc)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix open in stacked overlay

If two overlayfs filesystems are stacked on top of each other, then we need
recursion in ovl_d_select_inode().

I guess d_backing_inode() is supposed to do that. But currently it doesn't
and that functionality is open coded in vfs_open(). This is now copied
into ovl_d_select_inode() to fix this regression.

Reported-by: Alban Crequy <alban.crequy@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...")
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
Orabug: 26401569

(backport upstream commit 1c8a47df36d72ace8cf78eb6c228aa0f8027d3c2)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

nfsd: don't hold i_mutex over userspace upcalls

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

mutex_lock()
lookup_one_len()
mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.
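
The idea of a lookup variant that locks only on a cache miss can be sketched
as follows. This is a single-threaded userspace illustration under stated
assumptions: struct model_dir, its name cache and model_lookup_unlocked()
are hypothetical, and the lock-free cache probe stands in for the kernel's
cached lookup, which relies on the dcache's own synchronization.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical directory with a tiny name cache. */
struct model_dir {
    pthread_mutex_t lock;             /* plays the role of i_mutex */
    const char *cached[CACHE_SLOTS];  /* names already looked up */
    int ncached;
    int lock_count;                   /* how often the mutex was taken */
};

/* Return the cache slot of name, or -1 if it is not cached. */
int model_cache_find(const struct model_dir *d, const char *name)
{
    for (int i = 0; i < d->ncached; i++)
        if (strcmp(d->cached[i], name) == 0)
            return i;
    return -1;
}

/* Look up name, taking the mutex only when the cache misses. */
int model_lookup_unlocked(struct model_dir *d, const char *name)
{
    int idx = model_cache_find(d, name);
    if (idx >= 0)
        return idx;                   /* fast path: no mutex needed */

    pthread_mutex_lock(&d->lock);
    d->lock_count++;
    idx = model_cache_find(d, name);  /* recheck under the lock */
    if (idx < 0 && d->ncached < CACHE_SLOTS) {
        idx = d->ncached;
        d->cached[d->ncached++] = name;
    }
    pthread_mutex_unlock(&d->lock);
    return idx;
}
```

Repeated lookups of the same name leave lock_count unchanged in this model,
which is the efficiency argument made above.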

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Orabug: 26401569

(backport upstream commit bbddca8e8fac07ece3938e03526b5d00fa791a4c)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Revert "ixgbevf: get rid of custom busy polling code"

This reverts commit 1975e69c708706b84d9462ce7c0135d33310c28a. Performance regression,
because the net/core napi support is not present.

Orabug: 26494997
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

Revert "ixgbe: get rid of custom busy polling code"

This reverts commit 9244251e4f45dc9a61dd094a5d7ba23bb0285a86. The core napi support is
not in place, we need to keep the driver support or performance suffers.

Orabug: 26494997
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

ocfs2: fix deadlock caused by recursive locking in xattr

Orabug: 26427132

Another deadlock path caused by recursive locking is reported.  This
kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points").  Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this.  On node1, cp a large file from
home directory to ocfs2 mountpoint.  While on node2, run
setfacl/getfacl.  Both nodes will hang up there.  The backtraces:

On node1:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_write_begin+0x43/0x1a0 [ocfs2]
  generic_perform_write+0xa9/0x180
  __generic_file_write_iter+0x1aa/0x1d0
  ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
  __vfs_write+0xc3/0x130
  vfs_write+0xb1/0x1a0
  SyS_write+0x46/0xa0

On node2:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
  ocfs2_set_acl+0x22d/0x260 [ocfs2]
  ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
  set_posix_acl+0x75/0xb0
  posix_acl_xattr_set+0x49/0xa0
  __vfs_setxattr+0x69/0x80
  __vfs_setxattr_noperm+0x72/0x1a0
  vfs_setxattr+0xa7/0xb0
  setxattr+0x12d/0x190
  path_setxattr+0x9f/0xb0
  SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 8818efaaacb78c60a9d90c5705b6c99b75d7d442)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2: fix deadlock issue when taking inode lock at vfs entry points

Orabug: 26427132

Conflicts:
    fs/ocfs2/file.c

Commit 3acdc8b3862a results in a deadlock, as the author realized shortly
after the patch was merged.  The discussion happened here

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

The reason why taking cluster inode lock at vfs entry points opens up a
self deadlock window, is explained in the previous patch of this series.

So far, we have seen two different code paths that have this issue.

1. do_sys_open
     may_open
       inode_permission
        ocfs2_permission
         ocfs2_inode_lock() <=== take PR
          generic_permission
           get_acl
            ocfs2_iop_get_acl
             ocfs2_inode_lock() <=== take PR

2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== remote PR request
        ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for these
funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Link: http://lkml.kernel.org/r/20170117100948.11657-3-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit b891fa5024a95c77e0d6fd6655cb74af6fb77f46)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

Orabug: 26427132

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a process
already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code.  For instance:

  const struct inode_operations ocfs2_file_iops = {
      .permission     = ocfs2_permission,
      .get_acl        = ocfs2_iop_get_acl,
      .set_acl        = ocfs2_iop_set_acl,
  };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

  do_sys_open
   may_open
    inode_permission
     ocfs2_permission
      ocfs2_inode_lock() <=== first time
       generic_permission
        get_acl
         ocfs2_iop_get_acl
          ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
ocfs2_inode_lock() calls.  Briefly, the deadlock is formed as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  On the other hand, the recursive cluster lock
(the second one) will be blocked in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But the downconvert never completes.  Why?  Because
there is no chance for the first cluster lock on this node to be
unlocked: we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.

1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
   of the processes' pid who has taken the cluster lock of this lock
   resource;

2. introduce a new flag for ocfs2_inode_lock_full:
   OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
   us if we've got cluster lock.

3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
   got the cluster lock in the upper code path.
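
The holder tracking in points 1 and 3 can be sketched in userspace C. This
is a hypothetical model: struct model_lockres, the fixed-size holders array
and the model_*() functions (including their return convention) are
invented for illustration and do not reproduce the ocfs2 data structures
or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define MAX_HOLDERS 4

/* Hypothetical lock resource that remembers which pids hold it. */
struct model_lockres {
    pid_t holders[MAX_HOLDERS];  /* models l_holders */
    int   nholders;
    int   cluster_lock_calls;    /* how often the real lock was requested */
};

bool model_is_locked_by_me(const struct model_lockres *res, pid_t me)
{
    for (int i = 0; i < res->nholders; i++)
        if (res->holders[i] == me)
            return true;
    return false;
}

/* Returns true if this call took the lock, false if me already held it. */
bool model_inode_lock_tracker(struct model_lockres *res, pid_t me)
{
    if (model_is_locked_by_me(res, me))
        return false;            /* nested call: skip the cluster lock */
    assert(res->nholders < MAX_HOLDERS);
    res->cluster_lock_calls++;
    res->holders[res->nholders++] = me;
    return true;
}

/* Only the call that actually took the lock releases it. */
void model_inode_unlock_tracker(struct model_lockres *res, pid_t me,
                                bool took_lock)
{
    if (!took_lock)
        return;
    for (int i = 0; i < res->nholders; i++) {
        if (res->holders[i] == me) {
            res->holders[i] = res->holders[--res->nholders];
            return;
        }
    }
}
```

A permission check that nests a get_acl call would take the lock once in
the outer call; the inner call sees itself in the holder list and neither
locks nor unlocks.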

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue caused by the fact that vfs
routines can call into each other.

The performance penalty of processing the holder list should only be
seen at a few cases where the tracking logic is used, such as get/set
acl.

You may ask: what if we got a PR lock the first time, and want an EX lock
the second time?  Fortunately, as far as I can see, this case never happens
in the real world, including permission check and (get|set)_(acl|attr),
and the gfs2 code does the same.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 439a36b8ef38657f765b80b775e2885338d72451)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

Revert "add OCFS2_LOCK_RECURSIVE arg_flags to ocfs2_cluster_lock() to prevent hang"

Orabug: 26427132

This reverts commit 387775a70d2e5d89d1a81d78d66655337a5c2765.

Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

xen/blkfront: always allocate grants first from per-queue persistent grants

This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.

As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.
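
The allocation order can be sketched as a small userspace model. The names
struct model_ring, enum grant_source and model_get_grant() are hypothetical;
the sketch only shows "per-queue persistent pool first, global free list
second" and ignores everything else the driver does.

```c
#include <assert.h>
#include <stddef.h>

/* Where a grant came from in this model. */
enum grant_source { GRANT_PERSISTENT, GRANT_GLOBAL, GRANT_NONE };

/* Hypothetical per-queue state: number of free persistent grants. */
struct model_ring {
    int persistent_free;
};

/* Take a grant for ring, preferring its own persistent pool. */
enum grant_source model_get_grant(struct model_ring *ring, int *global_free)
{
    assert(ring != NULL && global_free != NULL);
    if (ring->persistent_free > 0) {
        ring->persistent_free--;
        return GRANT_PERSISTENT;
    }
    if (*global_free > 0) {
        (*global_free)--;
        return GRANT_GLOBAL;
    }
    return GRANT_NONE;
}
```

With an empty global list and two persistent grants reserved for the queue,
two requests still succeed in this model instead of stalling.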

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26351401

upstream commit: bd912ef3e46b6edb51bb8af4b73fd2be7817e305
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>

rds: Make sure updates to cp_send_gen can be observed

cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.

This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.

Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

Clause 5.1.2.4/25 in the C standard is also relevant.
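
A userspace approximation of the access pattern is sketched below. The
macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE()
and assume GCC or Clang (for __typeof__); struct model_conn_path and the
two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximations of the kernel macros (GCC/Clang assumed). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the structure holding cp_send_gen. */
struct model_conn_path {
    unsigned long cp_send_gen;
};

/* Increment the generation with one marked load and one marked store. */
unsigned long model_bump_send_gen(struct model_conn_path *cp)
{
    assert(cp != NULL);
    unsigned long gen = READ_ONCE(cp->cp_send_gen) + 1;
    WRITE_ONCE(cp->cp_send_gen, gen);
    return gen;
}

/* Outside the protected region: has another sender bumped the generation? */
int model_send_gen_changed(struct model_conn_path *cp, unsigned long my_gen)
{
    return READ_ONCE(cp->cp_send_gen) != my_gen;
}
```

The volatile casts force the compiler to emit exactly one load or store per
marked access; they do not make the increment atomic.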

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream e623a48ee433985f6ca0fb238f0002cc2eccdf53)

Orabug: 26519030

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration

Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.

The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).

Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.

Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.

Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 8dcbec6d20eb881ba368d0aebc3a8a678aebb1da)

Orabug: 25727872
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

xen: do not re-use pirq number cached in pci device msi msg data

Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")

That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq.  At the time, that was the correct behavior.  However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")

Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.

This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs.  So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.

This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
(cherry picked from commit c74fd80f2f41d05f350bb478151021f88551afe8)

Orabug: 26547167

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Re-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

MacSec: fix backporting error in patches for CVE-2017-7477

Orabug: 26443893

- macsec: dynamically allocate space for sglist (Jason A. Donenfeld)
[Orabug: 26368162]  {CVE-2017-7477}
- macsec: avoid heap overflow in skb_to_sgvec (Jason A. Donenfeld)  [Orabug:
26368162]  {CVE-2017-7477}

The backporting of the above patches introduced a heap overrun error shown
as bug 26443893.

------------[ cut here ]------------
WARNING: CPU: 28 PID: 0 at kernel/time/timer.c:1177
call_timer_fn+0x142/0x150()
timer: mld_ifc_timer_expire+0x0/0x2d0 preempt leak: 00000100 -> 00000101
Modules linked in: gcm macsec fuse btrfs xor raid6_pq vfat msdos fat ext4
jbd2 ext2 mbcache2 ip6table_filter ip6_tables
BUG: workqueue leaked lock or atomic: kworker/15:2/0x00000001/689
     last function: addrconf_dad_work
CPU: 15 PID: 689 Comm: kworker/15:2 Not tainted 4.1.12-103.2.6.el7uek.x86_64
Hardware name: Oracle Corporation SUN SERVER X4-2       /ASSY,MOTHERBOARD,1U
, BIOS 25010603 01/16/2014
Workqueue: ipv6_addrconf addrconf_dad_work

Call Trace:
[<ffffffff81735938>] dump_stack+0x63/0x81
[<ffffffff810a0fd8>] process_one_work+0x3a8/0x460
[<ffffffff810a1582>] worker_thread+0x112/0x520
[<ffffffff810a1470>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff810a7348>] kthread+0xd8/0xf0
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff8173d9a2>] ret_from_fork+0x42/0x70
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
BUG: scheduling while atomic: kworker/15:2/689/0x00000001

1. newly introduced variable "num_frags" not used in 'sg_ad', assumes
'MAX_SKB_FRAGS + 1'

2. Initialization of sglist assumes 'MAX_SKB_FRAGS + 1' length, though it was
changed to the number of scatterlist elements being returned from
"skb_cow_data()"

3. It seems that "sg_init_table(sg, MAX_SKB_FRAGS + 1);" is redundant, it was
already done a few lines before.

This patch may solve the above issues.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

ovl: move super block magic number to magic.h

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.
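
A userspace program can use the magic number roughly as follows. This is a
minimal Linux-only sketch: path_is_overlayfs() is a hypothetical helper,
and the fallback #define repeats the overlayfs value in case the installed
headers do not provide it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Fallback for headers that do not define the overlayfs magic number. */
#ifndef OVERLAYFS_SUPER_MAGIC
#define OVERLAYFS_SUPER_MAGIC 0x794c7630
#endif

/* Report whether path lives on overlayfs; false on any statfs() error. */
bool path_is_overlayfs(const char *path)
{
    struct statfs st;

    assert(path != NULL);
    if (statfs(path, &st) != 0)
        return false;
    return (unsigned long)st.f_type == OVERLAYFS_SUPER_MAGIC;
}
```

For example, path_is_overlayfs("merged") would be expected to return true
for a mounted overlay directory and false for a path that does not exist.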

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 257f871993474e2bde6c497b54022c362cf398e1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: use a minimal buffer in ovl_copy_xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.
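
The "allocate what is needed, grow on demand" pattern can be sketched in
userspace C. All names are hypothetical: model_getxattr() imitates a
getxattr-style call where size 0 queries the length, and struct grow_buf
with model_read_value() shows one buffer being reused across values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical getxattr-like source: size 0 only queries the length. */
ssize_t model_getxattr(const char *value, char *buf, size_t size)
{
    size_t len = strlen(value);

    if (size == 0)
        return (ssize_t)len;
    if (size < len)
        return -ERANGE;
    memcpy(buf, value, len);
    return (ssize_t)len;
}

/* A buffer that starts empty and only grows when a value needs it. */
struct grow_buf {
    char  *data;
    size_t size;
    int    reallocs;  /* counts how often the buffer had to grow */
};

/* Read one value into b, reallocating only if it does not fit. */
ssize_t model_read_value(struct grow_buf *b, const char *value)
{
    ssize_t need = model_getxattr(value, NULL, 0);

    assert(b != NULL);
    if (need <= 0)
        return need;              /* error, or an empty value */
    if ((size_t)need > b->size) {
        char *bigger = realloc(b->data, (size_t)need);
        if (bigger == NULL)
            return -ENOMEM;
        b->data = bigger;
        b->size = (size_t)need;
        b->reallocs++;
    }
    return model_getxattr(value, b->data, b->size);
}
```

Reading a short value after a long one reuses the existing buffer, so the
reallocation count only rises when a larger value appears.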

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: allow zero size xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success. This is clearly not the desired
behavior.
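
The difference between the old and the fixed behavior can be shown with a
toy loop. model_copy_xattrs() is a hypothetical function operating on a
list of value sizes; the zero_ends_loop flag selects the old behavior and
is an assumption made purely for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Walk a list of xattr value sizes and count how many would be copied.
 * A negative size is an error and is returned as-is.
 */
int model_copy_xattrs(const ssize_t *sizes, int count, bool zero_ends_loop)
{
    int copied = 0;

    assert(sizes != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        ssize_t size = sizes[i];

        if (size < 0)
            return (int)size;
        if (size == 0 && zero_ends_loop)
            break;          /* old behavior: stop and report success */
        copied++;           /* fixed behavior: empty values are copied too */
    }
    return copied;
}
```

For sizes {5, 0, 7} the old behavior copies one attribute and the fixed
behavior copies all three.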

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: default permissions

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Add mount option "default_permissions" to alter the way permissions are
calculated.

Without this option and prior to this patch permissions were calculated by
underlying lower or upper filesystem.

With this option the permissions are calculated by overlayfs based on the
file owner, group and mode bits.

This has significance for example when a read-only exported NFS filesystem
is used as a lower layer. In this case the underlying NFS filesystem will
reply with EROFS, in which case all we know is that the filesystem is
read-only. But that's not what we are interested in, we are interested in
whether the access would be allowed if the filesystem wasn't read-only; the
server doesn't tell us that, and would need updating at various levels,
which doesn't seem practicable.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 8d3095f4ad47ac409440a0ba1c80e13519ff867d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek-rpm: Add missing .ko files to ueknano modules list

Orabug: 26521422

When installing kernel-ueknano rpm package, warnings are displayed due to missing
symbols. This commit adds modules with needed symbols to /lib/modules/ directory.

Fixes: 74d5ebd39bfa ("uek-rpm: Share specfile for both kernel-ueknano and kernel-uek")
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ping: implement proper locking

We got a report of yet another bug in ping

http://www.openwall.com/lists/oss-security/2017/03/24/6

->disconnect() is not called with socket lock held.

Fix this by acquiring ping rwlock earlier.

Thanks to Daniel, Alexander and Andrey for letting us know this problem.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 43a6684519ab0a6c52024b5e25322476cabad893)

Orabug: 25883225
CVE: CVE-2017-2671

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

If there is inflight I/O in any queue other than the last, blkback returns
-EBUSY immediately and never stops the threads of the remaining queues or
processes them. When a vbd device is removed under heavy disk I/O load, some
queues with inflight I/O still have a blkback thread running even though the
corresponding vbd device or guest is gone.
This can cause problems. For example, if the backend device type is file,
some loop devices and blkback threads linger forever after the guest is
destroyed, and unmounting the repositories fails until dom0 is rebooted.
So stop all threads properly and return -EBUSY if any queue has inflight I/O.
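
The intended control flow can be sketched in userspace C as follows. This is
only an illustration under stated assumptions: ring_sketch and
disconnect_all_sketch() are hypothetical names, not the blkback code, and the
"thread" is modelled as a plain flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one blkback queue (not the kernel struct). */
struct ring_sketch {
    bool thread_running;   /* models the per-queue kernel thread */
    int  inflight;         /* models outstanding I/O on this queue */
};

/*
 * Stop the thread of every queue, even when an earlier queue is busy,
 * and report -EBUSY at the end if any queue still had inflight I/O.
 */
static int disconnect_all_sketch(struct ring_sketch *rings, size_t nr)
{
    bool busy = false;

    assert(rings != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        rings[i].thread_running = false;   /* always stop the thread */
        if (rings[i].inflight > 0) {
            busy = true;                   /* remember it, but keep going */
            continue;
        }
        /* per-queue teardown for idle queues would go here */
    }
    return busy ? -EBUSY : 0;
}
```

The point of the sketch is that the busy state is recorded rather than
returned from inside the loop, so no queue is skipped.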

Orabug: 26539922

Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>

uek-rpm: Share specfile for both kernel-ueknano and kernel-uek

Orabug: 26521422

kernel-ueknano is added as a subpackage in the kernel-uek spec file so that
both binary rpms are created from the same source. The list of modules to
be added to kernel-ueknano is passed as input to the spec file.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>

PCI: Workaround wrong flags completions for IDT switch

The IDT switch incorrectly flags an ACS source violation on a read config
request to an end point device on the completion (IDT 89H32H8G3-YC,
errata #36) even though the PCI Express spec states that completions are
never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).

The suggested workaround by IDT is to issue a configuration write to the
downstream device before issuing the first config read. This allows the
downstream device to capture its bus number, thus avoiding the ACS
violation on the completion.

The patch does the following -

1. Disable ACS source validation if enabled
2. Wait for config space access to become available by reading vendor id
3. Do a config write to the end point (errata workaround)
4. Enable ACS source validation (if it was enabled to begin with)
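
The four steps above can be sketched against a simulated port as follows.
This is a userspace model under stated assumptions, not the patch itself:
port_sketch, read_vendor_sketch() and probe_vendor_sketch() are hypothetical
names, and the config write is modelled as setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_READY_SKETCH 0xffffffffu  /* assumed "no response yet" value */

/* Hypothetical model of a downstream port and the endpoint behind it. */
struct port_sketch {
    bool     acs_sv_enabled;     /* ACS source validation on the port */
    int      polls_until_ready;  /* reads that fail before the device answers */
    bool     bus_captured;       /* endpoint has latched its bus number */
    uint32_t vendor_id;
};

static uint32_t read_vendor_sketch(struct port_sketch *p)
{
    if (p->polls_until_ready > 0) {
        p->polls_until_ready--;
        return NOT_READY_SKETCH;
    }
    return p->vendor_id;
}

static bool probe_vendor_sketch(struct port_sketch *p, int max_polls,
                                uint32_t *id)
{
    bool was_enabled = p->acs_sv_enabled;
    bool found = false;

    assert(id != NULL);
    p->acs_sv_enabled = false;               /* step 1: disable ACS SV */
    for (int i = 0; i < max_polls; i++) {    /* step 2: wait for config space */
        *id = read_vendor_sketch(p);
        if (*id != NOT_READY_SKETCH) {
            found = true;
            break;
        }
    }
    if (found)
        p->bus_captured = true;              /* step 3: config write */
    p->acs_sv_enabled = was_enabled;         /* step 4: restore ACS SV */
    return found;
}
```

Saving the original state first means step 4 only re-enables validation when
it was enabled to begin with.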

-v2: move workaround to pci_bus_read_dev_vendor_id() from pci_bus_check_dev()
     and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
-v3: add bus->self check for root bus and virtual bus for sriov vfs.
-v4: only do workaround for IDT switches
-v5: tweak pci_std_enable_acs_sv to deal with unimplemented SV and
     clarify return value

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
--

  drivers/pci/pci.c   | 37 +++++++++++++++++++++++++++++++++++++
  drivers/pci/pci.h   |  1 +
  drivers/pci/probe.c | 38 ++++++++++++++++++++++++++++++++++++--
  3 files changed, 74 insertions(+), 2 deletions(-)

Orabug: 26243152

Link: https://patchwork.kernel.org/patch/9828571/
Fixed wrapped lines in the original patch to fit the lines in 80 columns.
Changed variable types from integer to bool to keep consistent with function
return type.

Signed-off-by: Shan Hai <shan.hai@oracle.com>

Revert "SUNRPC: Refactor svc_set_num_threads()"

This reverts commit 0bc9402329824ae06d8a26a73b60d21cdac3e6f2.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

Revert "NFSv4: Fix callback server shutdown"

This reverts commit 13757272bec28f1b8be59f56292e8f17076923b3.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
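
The address check amounts to a half-open range test. A minimal sketch,
assuming the section bounds are available as two addresses
(text_section_sketch and pc_in_section_sketch() are hypothetical names; in a
kernel the bounds would come from linker-script symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds of the idle-code section. */
struct text_section_sketch {
    uintptr_t start;  /* first address in the section */
    uintptr_t end;    /* one past the last address */
};

/* True when the interrupted PC lies inside [start, end). */
static bool pc_in_section_sketch(const struct text_section_sketch *s,
                                 uintptr_t pc)
{
    assert(s != NULL && s->start <= s->end);
    return pc >= s->start && pc < s->end;
}
```

A caller would emit the one-line "skipped: idling" report when the check is
true and the full backtrace otherwise.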

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6727ad9e206cc08b80d8000a4d67f8417e53539d)

Orabug: 25925689

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
arch/arm/kernel/vmlinux-xip.lds.S
arch/h8300/kernel/vmlinux.lds.S
arch/x86/kernel/process.c
drivers/acpi/processor_idle.c
kernel/sched/idle.c
lib/nmi_backtrace.c

netfilter: nf_tables: fix oob access

BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed the last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes the != nft_expr_last() check.
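
The ordering of the two checks is the whole point, and can be sketched as
follows. This is a simplified model, assuming fixed-size expressions in an
array; rule_sketch, expr_sketch and destroy_rule_sketch() are hypothetical
names and do not reproduce the nf_tables data layout.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXPRS_SKETCH 4

/* Hypothetical expression: a NULL ops pointer marks an unused slot. */
struct expr_sketch {
    const void *ops;
};

struct rule_sketch {
    size_t n;                                   /* valid expressions */
    struct expr_sketch exprs[MAX_EXPRS_SKETCH];
};

/* Destroy expressions in order; returns how many were destroyed. */
static size_t destroy_rule_sketch(struct rule_sketch *r)
{
    struct expr_sketch *expr = r->exprs;
    struct expr_sketch *last = r->exprs + r->n;  /* one past the end */
    size_t destroyed = 0;

    assert(r->n <= MAX_EXPRS_SKETCH);
    /* Compare against 'last' first; only then is expr->ops safe to read. */
    while (expr != last && expr->ops != NULL) {
        expr->ops = NULL;
        destroyed++;
        expr++;                                  /* models nft_expr_next() */
    }
    return destroyed;
}
```

With the checks in the opposite order, the loop would read expr->ops from the
one-past-the-end slot after the last valid expression was destroyed.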

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 3e38df136e453aa69eb4472108ebce2fb00b1ba6)

Orabug: 25960439,26492640,26492632

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

scsi: libiscsi: use kvzalloc for iscsi_pool_init

iscsiadm session login can fail with the following error:

iscsiadm: Could not login to [iface: default, target: iqn.1986-03.com...
iscsiadm: initiator reported error (9 - internal error)

When /etc/iscsi/iscsid.conf sets node.session.cmds_max = 4096, it
results in 64K-sized kmallocs per session. A system under fragmented
slab pressure may not have any 64K objects available and fail iscsiadm
session login. Even though memory objects of a smaller size are
available, the large order allocation ends up failing.
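
The 64K figure and the order:4 in the trace below can be reproduced with
simple arithmetic. The sketch assumes 4 KiB pages, 8-byte pointers, and that
the pool holds two pointer arrays of cmds_max entries each; those inputs are
assumptions for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u  /* assumed 4 KiB pages (x86_64) */

/* Bytes needed for num_arrays pointer arrays of cmds_max entries each. */
static size_t pool_bytes_sketch(size_t cmds_max, size_t num_arrays,
                                size_t ptr_size)
{
    return cmds_max * num_arrays * ptr_size;
}

/* Smallest order such that (PAGE_SIZE << order) >= bytes. */
static unsigned int alloc_order_sketch(size_t bytes)
{
    unsigned int order = 0;

    assert(bytes > 0);
    while (((size_t)PAGE_SIZE_SKETCH << order) < bytes)
        order++;
    return order;
}
```

Under these assumptions, pool_bytes_sketch(4096, 2, 8) is 65536 bytes, and
alloc_order_sketch(65536) is 4, i.e. 16 physically contiguous pages.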

The kernel prints a warning and does dump_stack, like below:

iscsid: page allocation failure: order:4, mode:0xc0d0
CPU: 0 PID: 2456 Comm: iscsid Not tainted 4.1.12-61.1.28.el6uek.x86_64 #2
Call Trace:
[<ffffffff816c6e40>] dump_stack+0x63/0x83
[<ffffffff8118e58a>] warn_alloc_failed+0xea/0x140
[<ffffffff81191df9>] __alloc_pages_slowpath+0x409/0x760
[<ffffffff81192401>] __alloc_pages_nodemask+0x2b1/0x2d0
[<ffffffffa048f6c0>] ? dev_attr_host_ipaddress+0x20/0xffffffffffffc722
[<ffffffff811dc38f>] alloc_pages_current+0xaf/0x170
[<ffffffff81192581>] alloc_kmem_pages+0x31/0xd0
[<ffffffffa048f600>] ? iscsi_transport_group+0x20/0xffffffffffffc7e2
[<ffffffff811ad738>] kmalloc_order+0x18/0x50
[<ffffffff811ad7a4>] kmalloc_order_trace+0x34/0xe0
[<ffffffff8146ee30>] ? transport_remove_classdev+0x70/0x70
[<ffffffff811e843d>] __kmalloc+0x27d/0x2a0
[<ffffffff810c8cbd>] ? complete_all+0x4d/0x60
[<ffffffffa04af299>] iscsi_pool_init+0x69/0x160 [libiscsi]
[<ffffffff81465d90>] ? device_initialize+0xb0/0xd0
[<ffffffffa04af510>] iscsi_session_setup+0x180/0x2f4 [libiscsi]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa04c531f>] iscsi_sw_tcp_session_create+0xcf/0x150 [iscsi_tcp]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048a633>] iscsi_if_create_session+0x33/0xd0
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048abd8>] iscsi_if_recv_msg+0x508/0x8c0 [scsi_transport_iscsi]
[<ffffffff811922eb>] ? __alloc_pages_nodemask+0x19b/0x2d0
[<ffffffff811e6d69>] ? __kmalloc_node_track_caller+0x209/0x2c0
[<ffffffffa048b00c>] iscsi_if_rx+0x7c/0x200 [scsi_transport_iscsi]
[<ffffffff81623dc6>] netlink_unicast+0x126/0x1c0
[<ffffffff8162468c>] netlink_sendmsg+0x36c/0x400
[<ffffffff815d2fed>] sock_sendmsg+0x4d/0x60
[<ffffffff815d596a>] ___sys_sendmsg+0x30a/0x330
[<ffffffff811bc72c>] ? handle_pte_fault+0x20c/0x230
[<ffffffff811bc90c>] ? __handle_mm_fault+0x1bc/0x330
[<ffffffff811bcb32>] ? handle_mm_fault+0xb2/0x1a0
[<ffffffff815d5b99>] __sys_sendmsg+0x49/0x90
[<ffffffff815d5bf9>] SyS_sendmsg+0x19/0x20
[<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71

Use kvzalloc for iscsi_pool in iscsi_pool_init.

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Tested-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Joseph Slember <joe.slember@oracle.com>
Reviewed-by: Lance Hartmann <lance.hartmann@oracle.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26473178
(cherry picked from commit bfcc62ed7066268349e8e7955925bdaf4be0eec0)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

mm: introduce kv[mz]alloc helpers

Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As can be seen, implementing kvmalloc requires quite intimate
knowledge of the page allocator and the memory reclaim internals, which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers I could find have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack, which I have converted as well; Eric Dumazet
was not opposed [2] to converting them.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure not to create
large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
not to warn about allocation failures.  This also rules out the OOM
killer, as the vmalloc is a more appropriate fallback than a disruptive
user visible action.
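
The try-then-fall-back shape can be sketched in userspace C as follows. This
is a model only: calloc() stands in for both allocators, contig_limit is a
made-up knob that simulates fragmentation, and all names ending in _sketch
are hypothetical. It does not model gfp flags such as __GFP_NORETRY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH 4096u  /* assumed page size */

struct kv_buf_sketch {
    void *ptr;
    bool  from_fallback;  /* true when the fallback path was used */
};

/* Stand-in for kmalloc: fails when the request exceeds the largest
 * contiguous chunk the simulated system can currently provide. */
static void *contig_alloc_sketch(size_t size, size_t contig_limit)
{
    return size <= contig_limit ? calloc(1, size) : NULL;
}

static struct kv_buf_sketch kvzalloc_sketch(size_t size, size_t contig_limit)
{
    struct kv_buf_sketch b = { NULL, false };

    assert(size > 0);
    b.ptr = contig_alloc_sketch(size, contig_limit);
    /* Done on success; sub-page requests gain nothing from the fallback. */
    if (b.ptr != NULL || size <= PAGE_SIZE_SKETCH)
        return b;

    b.ptr = calloc(1, size);          /* stand-in for vmalloc */
    b.from_fallback = (b.ptr != NULL);
    return b;
}

static void kvfree_sketch(struct kv_buf_sketch *b)
{
    free(b->ptr);
    b->ptr = NULL;
}
```

A caller only sees one allocation call and one free call, which is what lets
the per-subsystem helpers be removed.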

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seem to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document the unsupported gfp mask for
__vmalloc{_node} because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible flags (those which
are not a superset) to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 26473178
(cherry picked from commit a7c3e901a46ff54c016d040847eda598a9e3e653)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:

arch/x86/kvm/lapic.c
arch/x86/kvm/page_track.c
arch/x86/kvm/x86.c
fs/f2fs/f2fs.h
fs/f2fs/file.c
fs/f2fs/node.c
fs/f2fs/segment.c
fs/seq_file.c
security/apparmor/apparmorfs.c
security/apparmor/include/lib.h
security/apparmor/lib.c
security/apparmor/policy_unpack.c
virt/kvm/kvm_main.c

sg: Fix double-free when drives detach during SG_IO

In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().

Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.

This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:

  ------------[ cut here ]------------
  kernel BUG at block/blk-core.c:1420!
  Call Trace:
  [<ffffffff81281eab>] blk_put_request+0x5b/0x80
  [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
  [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
  [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
  [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
  [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
  [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
  [<ffffffff81258967>] ? file_has_perm+0x97/0xb0
  [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
  [<ffffffff81602afb>] tracesys+0xdd/0xe2
    RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0

The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.
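
The ownership rule behind the fix can be sketched like this. It is a
userspace model with hypothetical names (request_sketch, sg_request_sketch,
common_write_sketch() and so on); free() stands in for releasing the block
request, and the separate ->cmd handling is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct request_sketch { int tag; };

struct sg_request_sketch {
    struct request_sketch *rq;   /* owned pointer, NULL once released */
};

static int put_count_sketch;     /* counts releases, for illustration */

static void put_request_sketch(struct request_sketch *rq)
{
    free(rq);
    put_count_sketch++;
}

/* Cleanup path: releases the request only if one is still attached. */
static void finish_rem_req_sketch(struct sg_request_sketch *srp)
{
    if (srp->rq != NULL) {
        put_request_sketch(srp->rq);
        srp->rq = NULL;
    }
}

static int common_write_sketch(struct sg_request_sketch *srp, bool detached)
{
    assert(srp != NULL);
    if (detached) {
        put_request_sketch(srp->rq);
        srp->rq = NULL;              /* the fix: drop the stale pointer */
        finish_rem_req_sketch(srp);  /* now a no-op for the request */
        return -ENODEV;
    }
    return 0;
}
```

Without the assignment to NULL, the cleanup path would release the same
request a second time.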

Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.

KASAN was extremely helpful in finding the root cause of this bug.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26492266
(cherry picked from commit f3951a3709ff50990bf3e188c27d346792103432)
Signed-off-by: John Sobecki <john.sobecki@oracle.com>

scsi: smartpqi: mark PM functions as __maybe_unused

Orabug: 26191021, 26447813

The newly added suspend/resume support causes harmless warnings when
CONFIG_PM is disabled:

smartpqi/smartpqi_init.c:5147:12: error: 'pqi_ctrl_wait_for_pending_io' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2019:13: error: 'pqi_wait_until_lun_reset_finished' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2013:13: error: 'pqi_wait_until_scan_finished' defined but not used [-Werror=unused-function]

We can avoid the warnings by removing the #ifdef around the handlers and
instead marking them as __maybe_unused, which will let gcc drop the
unused code silently.
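
A userspace illustration of the same idea, assuming GCC or Clang. The macro
MAYBE_UNUSED_SKETCH mirrors what the kernel's __maybe_unused expands to
(the unused attribute); PM_ENABLED_SKETCH and the function names are
hypothetical.

```c
#include <assert.h>

/* Userspace equivalent of __maybe_unused (GCC/Clang attribute). */
#define MAYBE_UNUSED_SKETCH __attribute__((unused))

/*
 * Always compiled, so it is always type-checked. When nothing references
 * it, the compiler drops it without an unused-function warning.
 */
static MAYBE_UNUSED_SKETCH int suspend_sketch(int state)
{
    assert(state >= 0);
    return state + 1;
}

#ifdef PM_ENABLED_SKETCH
/* Only the hook that references the handler stays conditional. */
static int (*const pm_suspend_hook_sketch)(int) = suspend_sketch;
#endif
```

Compared with wrapping the handler itself in #ifdef, the handler keeps being
built in every configuration, so it cannot silently rot.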

Fixes: f44d210312a6 ("scsi: smartpqi: add suspend and resume support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5c146686e32085e76ad9e2957f3dee9b28fe4f22)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: bump driver version

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2d154f5ff338137a69f2f2a313520b6da2e1eb16)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: remove writeq/readq function definitions

Orabug: 26191021, 26447813

Instead of rewriting writeq/readq, use the existing functions.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add module parameters

Orabug: 26191021, 26447813

Add module parameters to disable heartbeat support and to disable
shutting down the controller when a controller is taken offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5a259e32ba32c380537f3d186a311e528b9f9c94)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup list initialization

Orabug: 26191021, 26447813

Better initialization of linked list heads.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8a994a04fc3a8edbcc0ba1d17219b6d8f4c38009)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add raid level show

Orabug: 26191021, 26447813

Display the RAID level via sysfs

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a9f93392415eb0fc86c29f015822b36016278c72)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: make ioaccel references consistent

Orabug: 26191021, 26447813

- make all references to RAID bypass consistent throughout driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 588a63fea1c28009fe17f194941fb8d8b101b44e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance device add and remove messages

Orabug: 26191021, 26447813

Improved formatting of information displayed when devices
are added/removed from the system.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6de783f666291763bcc6c3975e146b9b698378b1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update timeout on admin commands

Orabug: 26191021, 26447813

Increase the timeout on admin commands from 3 seconds to 60
seconds and add a check for a controller crash in the loop
where the driver polls for admin command completion.
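
The polling loop with both exit conditions might look roughly like this.
It is a sketch under assumptions: ctrl_sketch and poll_admin_sketch() are
hypothetical, time is counted in poll iterations rather than seconds, and
the choice of -ENXIO and -ETIMEDOUT as error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller model. */
struct ctrl_sketch {
    int  completes_after;  /* polls remaining until the command completes */
    bool crashed;          /* models a controller/firmware crash */
};

/* Returns 0 on completion, -ENXIO on crash, -ETIMEDOUT otherwise. */
static int poll_admin_sketch(struct ctrl_sketch *c, int timeout_polls)
{
    assert(c != NULL && timeout_polls > 0);
    for (int i = 0; i < timeout_polls; i++) {
        if (c->completes_after == 0)
            return 0;          /* response arrived */
        if (c->crashed)
            return -ENXIO;     /* give up early, do not wait out the timeout */
        c->completes_after--;  /* models one poll interval passing */
    }
    return -ETIMEDOUT;
}
```

The crash check matters more once the timeout is long: a dead controller is
reported on the next poll instead of after the full wait.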

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 13bede676b98d595a43d36a34e1835b686d0d140)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: map more raid errors to SCSI errors

Orabug: 26191021, 26447813

enhance mapping of RAID path errors to Linux SCSI host
error codes.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f5b63206255f68116c117565ab703c531c5ce400)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup controller branding

Orabug: 26191021, 26447813

- Improve controller branding support.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 37b36847a94669a898c9e3449d19d522d9c13979)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update rescan worker

Orabug: 26191021, 26447813

improve support for taking controller offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5f310425c8eabeeb303809898682e5b79c8a9c7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update device offline

Orabug: 26191021, 26447813

- Improve handling of offline devices.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 03b288cf3d92202b950245e931576bb573930c70)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct aio error path

Orabug: 26191021, 26447813

set the internal flag that causes I/O to be sent down the
RAID path when the AIO path is disabled

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 376fb880a4fbf6903918a88081b16c167819af3f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: add lockup action

Orabug: 26191021, 26447813

add support for actions to take when controller goes offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3c50976f33f30cf00baea9d518bd3e7ddd01ecc4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: remove qdepth calculations for logical volumes

Orabug: 26191021, 26447813

make the queue depth for LVs the same as the maximum
I/Os supported by the controller

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 94086f5be3f15fc8231e65975e4413c0df3e0203)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance kdump

Orabug: 26191021, 26447813

constrain resource usage during kdump to avoid kdump failures

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d727a776d72b26033161bc19441266749455115b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: change return value for LUN reset operations

Orabug: 26191021, 26447813

change return value for controller offline to be consistent
with the rest of the driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4e8415e3861e8b73a47c92e09e044b9dbc8ee37f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add ptraid support

Orabug: 26191021, 26447813

add support for PTRAID devices

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd10cf0be6057f680fab911d89761fd15d76b205)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update copyright

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b805dbfe2bce1ddf3209c29f1aa7d6b2064ab6c9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup messages

Orabug: 26191021, 26447813

- improve some error messages.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d87d5474e2080695ef0cc8c5e6c42a41d6ab961b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add new PCI device IDs

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7eddabff8acb0f4c25f992efe126cf6cccdd6e7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: minor driver cleanup

Orabug: 26191021, 26447813

- remove debug code that is no longer necessary.
- Some WARN_ON checks were removed because the driver continues
to function when the conditions are met.
- remove a MACRO that is no longer used.
- remove unnecessary multi-line statements.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cbe0c7b11dbfda368f27a6935a08ba91522edf1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct BMIC identify physical drive

Orabug: 26191021, 26447813

correct the BMIC Identify Physical Device structure
- missing 2 fields

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1be42f46ade32c668f11c0735af03ab2d479d206)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: eliminate redundant error messages

Orabug: 26191021, 26447813

eliminate redundant error message during initialization
if the controller has crashed.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8845fdfa92ab6eb24209f9929d6340c2f5d4a2de)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add pqi_wait_for_completion_io

Orabug: 26191021, 26447813

Add check for controller lockup during waits for synchronous
controller commands.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1f37e992ad8015ce33596466b0f36babb495148e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct bdma hw bug

Orabug: 26191021, 26447813

add workaround for BDMA hardware bug that can cause
hw to read up to 12 SGL elements (192 bytes) beyond the
last element in the list. This fix avoids IOMMU violations

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e1d213bdc3e359c6c5da8ebbc5b2e87b376e8777)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add heartbeat check

Orabug: 26191021, 26447813

check for controller lockups

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 98f876674a6fba3591c342dfbcfdbaa7ecf0a84e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add suspend and resume support

Orabug: 26191021, 26447813

add support for ACPI S3 (suspend) and S4 (hibernate)
system power states.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 061ef06a2d436cea85984cf0b51b452547a5496c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance resets

Orabug: 26191021, 26447813

- Block all I/O targeted at LUN reset device.
- Wait until all I/O targeted at LUN reset device has been
consumed by the controller.
- Issue LUN reset request.
- Wait until all outstanding I/Os and LUN reset completion
have been received by the host.
- Return to OS results of LUN reset request.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7561a7e4412e515100ac195303531fc2621ac2db)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add supporting events

Orabug: 26191021, 26447813

Only register for controller events that the driver supports.
Clean up event handling.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6a50d6ada03d8d9102a632d0e2db70cd9b6620f5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: ensure controller is in SIS mode at init

Orabug: 26191021, 26447813

Put the controller in SIS mode during initialization.
Support kexec/kdump.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 162d7753fce9a00719c09dfebd9fee3855e27fbe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add in controller checkpoint for controller lockups.

Orabug: 26191021, 26447813

tell smartpqi controller to generate a checkpoint for rare lockup
conditions.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5b0fba0f408777113eff93bd18ab0b9f80760fb7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: set pci completion timeout

Orabug: 26191021, 26447813

add support for setting PCIe completion timeout.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a81ed5f338a843d8bfd199928142b196d71ae62c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct remove scsi devices

Orabug: 26191021, 26447813

Correct a problem caused by holding a spinlock during device deletion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a37ef74517acf0d022ab4c8fa671c82c877eed7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: fix time handling

Orabug: 26191021, 26447813

When we have turned off RTC support, the smartpqi driver fails to build:

ERROR: "rtc_time64_to_tm" [drivers/scsi/smartpqi/smartpqi.ko] undefined!

This is easily avoided by using the generic 'struct tm' based helper rather
than the RTC specific one. While fixing this, I noticed that even though
the driver uses time64_t for storing seconds, it gets them from the
old 32-bit struct timeval. To address this, we can simplify the code
by calling ktime_get_real_seconds() directly.
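
A minimal userspace sketch of the same idea, not the driver's code. It
assumes POSIX time() and gmtime_r() as stand-ins for the kernel's
ktime_get_real_seconds() and for the generic 'struct tm' helper (presumably
time64_to_tm(), which this message does not name). The function names
seconds_to_utc() and wall_clock_seconds() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Convert a 64-bit count of seconds since the epoch into broken-down UTC
 * time using the generic C library helper (no RTC-specific code involved).
 * Returns 0 on success and -1 if the value cannot be represented.
 */
static int seconds_to_utc(int64_t secs, struct tm *out)
{
	time_t t = (time_t)secs;

	assert(out != NULL);
	if ((int64_t)t != secs)	/* would truncate on a 32-bit time_t */
		return -1;
	return gmtime_r(&t, out) ? 0 : -1;
}

/* Read the wall clock directly as 64-bit seconds, without struct timeval. */
static int64_t wall_clock_seconds(void)
{
	return (int64_t)time(NULL);
}
```

Keeping the value in a 64-bit type from the point where it is read avoids
the round trip through a 32-bit seconds field.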

Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ed10858eadd4988260c6bc7d75fc25176342b5a7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/sock: add WARN_ON(parent->sk) in sock_graft()

sock_graft() unilaterally sets up parent->sk based on the
assumption that the existing parent->sk is null. If this
condition is not true, then the existing parent->sk would
be leaked, so add a WARN_ON() to alert callers who may fall
in this category.
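
The check can be modelled in userspace roughly as follows. This is a
sketch under stated assumptions: struct sock and struct socket are reduced
to the two pointers that matter here, graft() and graft_warnings are
hypothetical names, and a message on stderr stands in for WARN_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel's struct sock and struct socket. */
struct socket;
struct sock   { struct socket *owner; };
struct socket { struct sock *sk; };

static int graft_warnings;	/* counts how often the check fired */

/*
 * Model of the grafting step: attach sk to parent. If parent already holds
 * a sock, the old pointer is about to be overwritten (and thus leaked), so
 * emit a warning first, as WARN_ON(parent->sk) would.
 */
static void graft(struct sock *sk, struct socket *parent)
{
	assert(sk != NULL && parent != NULL);
	if (parent->sk != NULL) {
		graft_warnings++;
		fprintf(stderr, "warning: parent->sk already set, leaking it\n");
	}
	parent->sk = sk;
	sk->owner = parent;
}
```

The warning does not prevent the overwrite; it only makes the offending
caller visible.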

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: use sock_create_lite() to create the accept socket

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one():
1. It sets up a new_sock->sk that is wasteful, because this ->sk
   is going to get replaced by inet_accept() in the subsequent ->accept().
2. The new_sock->sk is a leaked reference in sock_graft(), which
   expects to find a null parent->sk.

Avoid these problems by calling sock_create_lite().

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: set linger to 1 when unloading a rds-tcp

If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.
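
For illustration, the userspace equivalent of such a setting is the
SO_LINGER socket option. The message does not spell out the option values,
so this sketch assumes that "linger to 1" means linger enabled with a zero
timeout; set_abortive_close() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Enable SO_LINGER with a zero timeout on fd. With this setting close()
 * on a TCP socket discards unsent data and resets the connection instead
 * of waiting for the data to drain.
 * Returns 0 on success, -1 on error (errno set by setsockopt()).
 */
static int set_abortive_close(int fd)
{
	struct linger lin = { .l_onoff = 1, .l_linger = 0 };

	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
}
```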

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: send handshake ping-probe from passive endpoint

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

xfs: skip dirty pages in ->releasepage()

Orabug: 26451790

XFS has had scattered reports of delalloc blocks present at
->releasepage() time. This results in a warning with a stack trace
similar to the following:

...
Call Trace:
  [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
  [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
  [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
  [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
  [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
  [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
  [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
  [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
  [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
  [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
  [<ffffffffa2168539>] kswapd+0x4f9/0x970
  [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
  [<ffffffffa20a0d99>] kthread+0xc9/0xe0
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
  [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100

This occurs because it is possible for shrink_active_list() to send
pages marked dirty to ->releasepage() when certain buffer_head threshold
conditions are met. shrink_active_list() doesn't check the page dirty
state apparently to handle an old ext3 corner case where in some cases
clean pages would not have the dirty bit cleared, thus it is up to the
filesystem to determine how to handle the page.

XFS currently handles the delalloc case properly, but this behavior
makes the warning spurious. Update the XFS ->releasepage() handler to
explicitly skip dirty pages. Retain the existing delalloc/unwritten
checks so we continue to warn if such buffers exist on clean pages when
they shouldn't.
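
The resulting decision order can be sketched in userspace as below. The
struct and function names are illustrative only and are not XFS data
structures; a message on stderr stands in for the kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified page state; the fields are illustrative, not XFS structures. */
struct page_state {
	bool dirty;	/* page still has data to write back */
	bool delalloc;	/* page has delayed-allocation buffers */
	bool unwritten;	/* page has unwritten-extent buffers */
};

static int release_warnings;

/*
 * Returns true if the page may be released, false otherwise.
 * Dirty pages are skipped silently; a clean page that still carries
 * delalloc or unwritten buffers is unexpected and triggers a warning.
 */
static bool release_page(const struct page_state *pg)
{
	assert(pg != NULL);
	if (pg->dirty)
		return false;	/* skip, no warning */
	if (pg->delalloc || pg->unwritten) {
		release_warnings++;
		fprintf(stderr, "warning: delalloc/unwritten on clean page\n");
		return false;
	}
	return true;
}
```

Checking the dirty flag first is what silences the spurious warning while
keeping the check meaningful for clean pages.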

Diagnosed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit 99579ccec4e271c3d4d4e7c946058766812afdab)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

qede: Add support for ingress headroom

Orabug: 25933053, 26439680

Driver currently doesn't support any headroom; the only 'available'
space it has in the head of the buffer is due to the placement
offset.
In order to allow [later] support of XDP adjustment of headroom,
modify the ingress flow to properly handle a scenario where
the packets would have such headroom.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qede: Update receive statistic once per NAPI

Orabug: 25933053, 26439680

Currently, each time an ingress packet is passed to networking stack
the driver increments a per-queue SW statistic.
As we want to have additional fields in the first cache-line of the
Rx-queue struct, change flow so this statistic would be updated once per
NAPI run. We will later push the statistic to a different cache line.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Make OOO archipelagos into an array

Orabug: 25933053, 26439680

No need to maintain the various open archipelagos as a list -
The maximal number of them is known, and we can use the CID
as key for random-access into the array.
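
A minimal sketch of such a lookup, under stated assumptions: the bound
MAX_ARCHIPELAGOS, the cid_base offset and all struct and function names
are hypothetical and not taken from the qed sources.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARCHIPELAGOS 8	/* hypothetical upper bound */

struct archipelago {
	int in_use;
};

struct ooo_info {
	unsigned int cid_base;	/* first CID covered by the table */
	struct archipelago slots[MAX_ARCHIPELAGOS];
};

/* O(1) lookup: the CID offset is the array index. NULL if out of range. */
static struct archipelago *seek_archipelago(struct ooo_info *info,
					    unsigned int cid)
{
	assert(info != NULL);
	if (cid < info->cid_base || cid - info->cid_base >= MAX_ARCHIPELAGOS)
		return NULL;
	return &info->slots[cid - info->cid_base];
}
```

Compared with walking a list, the lookup needs no traversal and no
per-entry allocation.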

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Provide iSCSI statistics to management

Orabug: 25933053, 26439680

Management firmware can query for some basic iSCSI-related statistics.
Provide those just as we do for other protocols.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Inform qedi the number of possible CQs

Orabug: 25933053, 26439680

Now that management firmware is capable of telling us the number of CQs
available for a given PF, qed needs to communicate the number to qedi
so it would know how many to use.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Add missing stat for new isles

Orabug: 25933053, 26439680

Firmware provides a statistic for the number of out-of-order isles
it used - fill it in the iscsi-related statistics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Don't close the OUT_EN during init

Orabug: 25933053, 26439680

Before initializing the chip's engine, driver currently closes a set
of registers on the HW's ingress flow to prevent packets from slipping
in while they're not supposed to.

This configuration is insufficient, as there are some scenarios where
packets would still arrive even when said registers are set,
but the management firmware already closes other per-port registers
that do suffice, making this setting unnecessary.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Configure cacheline size in HW

Orabug: 25933053, 26439680

Default HW configuration is optimal for an architecture where cache
line size is 64B.

During chip initialization, properly initialize the cache line size
in HW to avoid possible redundant PCI transactions.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Don't use main-ptt in unrelated flows

Orabug: 25933053, 26439680

In order to access HW registers driver needs to acquire a PTT entry
[mapping between bar memory and internal chip address].
Since acquiring PTT entries could fail [at least in theory] as their
number is finite and other flows can hold them, we reserve special PTT
entries for 'important' enough flows - ones we want to guarantee that
would not be susceptible to such issues.

One such special entry is the 'main' PTT which is meant to be used in
flows such as chip initialization and de-initialization.
However, there are other flows that are also using that same entry
for their own purpose, and might run concurrently with the original
flows [notice that for most cases using the main-ptt by mistake, such
a race is still impossible, at least today].

This patch re-organizes the various functions that currently use the
main_ptt in one of two ways:

  - If a function shouldn't use the main_ptt it starts acquiring and
    releasing its own PTT entry and uses it instead. Notice that if those
    functions previously couldn't fail, they now can [as acquisition
    might fail].

  - Change the prototypes so that the main_ptt would be received as
    a parameter [instead of explicitly accessing it].
    This prevents the future risk of adding code that introduces new
    use-cases for flows using the main_ptt, ones that might be in race
    with the actual 'main' flows.
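
The first of the two approaches can be sketched as follows. This is a
userspace model under stated assumptions: the pool size, the struct layout
and the names ptt_acquire(), ptt_release() and read_register() are
hypothetical, and the choice of -EAGAIN as the error code is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PTT_POOL_SIZE 2	/* hypothetical; the real pool size differs */

struct ptt { bool busy; };
struct ptt_pool { struct ptt entries[PTT_POOL_SIZE]; };

/* Returns a free entry, or NULL when every entry is held by another flow. */
static struct ptt *ptt_acquire(struct ptt_pool *pool)
{
	assert(pool != NULL);
	for (size_t i = 0; i < PTT_POOL_SIZE; i++) {
		if (!pool->entries[i].busy) {
			pool->entries[i].busy = true;
			return &pool->entries[i];
		}
	}
	return NULL;
}

static void ptt_release(struct ptt *p)
{
	assert(p != NULL && p->busy);
	p->busy = false;
}

/*
 * A flow that must not borrow the reserved main entry: it takes its own
 * entry for the duration of the access, so it can now fail.
 */
static int read_register(struct ptt_pool *pool, unsigned int *val)
{
	struct ptt *p = ptt_acquire(pool);

	if (p == NULL)
		return -EAGAIN;
	*val = 0;	/* placeholder for the actual register access */
	ptt_release(p);
	return 0;
}
```

Callers of such a function have to handle the new failure path, which is
the cost of no longer sharing the reserved entry.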

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Warn PTT usage by wrong hw-function

Orabug: 25933053, 26439680

PTT entries are per-hwfn. If some erroneous flow tries
to use a PTT belonging to a different hwfn, warn the user, as this
can break every register-accessing flow later and is very hard
to root-cause.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct MSI-x for storage

Orabug: 25933053, 26439680

When qedr is enabled, qed would try dividing the msi-x vectors between
L2 and RoCE, starting with L2 and providing it with sufficient vectors
for its queues.

Problem is qed would also do that for storage partitions, and as those
don't need queues it would lead qed to award those partitions with 0
msi-x vectors, causing them to believe they're using INTa and
preventing them from operating.

Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: fix missing break in OOO_LB_TC case

Orabug: 25933053, 26439680

There seems to be a missing break on the OOO_LB_TC case, pq_id
is being assigned and then re-assigned on the fall through default
case and that seems suspect.
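
The effect of such a fall-through can be reproduced with a small
standalone example. Only the OOO_LB_TC name comes from this message; the
LB_TC value, the queue ids and the function names are hypothetical.

```c
#include <assert.h>

enum tc { LB_TC, OOO_LB_TC };	/* LB_TC is a hypothetical sibling value */

#define OOO_PQ		7	/* illustrative queue ids */
#define DEFAULT_PQ	3

/* Buggy variant: OOO_LB_TC falls through, so its value is overwritten. */
static int pq_for_tc_buggy(enum tc tc)
{
	int pq_id;

	switch (tc) {
	case OOO_LB_TC:
		pq_id = OOO_PQ;
		/* fall through - this is the bug: no break here */
	default:
		pq_id = DEFAULT_PQ;
	}
	return pq_id;
}

/* Fixed variant: the break keeps the OOO_LB_TC assignment. */
static int pq_for_tc_fixed(enum tc tc)
{
	int pq_id;

	switch (tc) {
	case OOO_LB_TC:
		pq_id = OOO_PQ;
		break;
	default:
		pq_id = DEFAULT_PQ;
	}
	return pq_id;
}
```

In the buggy variant both cases return DEFAULT_PQ, which is why the first
assignment looks like dead code to a static analyzer.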

Detected by CoverityScan, CID#1424402 ("Missing break in switch")

Fixes: b5a9ee7cf3be1 ("qed: Revise QM cofiguration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Add a missing error code

Orabug: 25933053, 26439680

We should be returning -ENOMEM if qed_mcp_cmd_add_elem() fails. The
current code returns success.
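
A userspace sketch of the pattern, assuming a goto-based error path. The
helper cmd_add_elem() here is a hypothetical stand-in with a fail_alloc
switch for simulating allocation failure; it is not the signature of
qed_mcp_cmd_add_elem().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct cmd_elem { int seq; };

/* Hypothetical allocator; fail_alloc lets a caller simulate exhaustion. */
static struct cmd_elem *cmd_add_elem(int seq, int fail_alloc)
{
	struct cmd_elem *e = fail_alloc ? NULL : malloc(sizeof(*e));

	if (e != NULL)
		e->seq = seq;
	return e;
}

/* Returns 0 on success or -ENOMEM when the element cannot be allocated. */
static int send_cmd(int seq, int fail_alloc)
{
	int rc = 0;
	struct cmd_elem *e = cmd_add_elem(seq, fail_alloc);

	if (e == NULL) {
		rc = -ENOMEM;	/* without this line the caller would see 0 */
		goto out;
	}
	/* ... the command would be issued here ... */
	free(e);
out:
	return rc;
}
```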

Fixes: 4ed1eea82a21 ("qed: Revise MFW command locking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: RoCE doesn't need to use SRC

Orabug: 25933053, 26439680

As RoCE doesn't need to use the SRC, allocating ILT memory
on behalf of RoCE is wasting available ILT lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Correct TM ILT lines in presence of VFs

Orabug: 25933053, 26439680

As of today there's no protocol supported that requires
support from the TM hardware block and enables SRIOV,
but we should still correct the calculation to reflect
the lines required for such future VFs instead of changing
the PF's own lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Fix TM block ILT allocation

Orabug: 25933053, 26439680

When configuring the HW timers block we should set the number of CIDs
up until the last CID that requires timers, instead of only those CIDs
whose protocol needs timers support.

Today, the protocols that require HW timers' support have their CIDs
before any other protocol, but that would change in future [when we
add iWARP support].

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Revise QM cofiguration

Orabug: 25933053, 26439680

Refactor and clean up the queue manager initialization logic.
Also, this adds support for RoCE low latency queues, which later
would be used for improving RoCE latency in high throughput scenarios.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

qed: Use BDQ resource for storage protocols

Orabug: 25933053, 26439680

Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.

As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a value from that range instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>