www.infradead.org Git - users/jedix/linux-maple.git/log

ovl: rename is_merge to is_lowest

The 'is_merge' is an historical naming from when only a single lower layer
could exist. With the introduction of multiple lower layers the meaning of
this flag was changed to mean only the "lowest layer" (while all lower
layers were being merged).

So now 'is_merge' is inaccurate and hence renaming to 'is_lowest'

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 56656e960b555cb98bc414382566dcb59aae99a2)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: cleanup unused var in rename2

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 6986c012faa480fb0fda74eaae9abb22f7ad1004)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: Switch to generic_getxattr

Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Orabug: 26401569

(backport upstream commit 0eb45fc3bb7a2cf9c9c93d9e95986a841e5f4625)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: Switch to generic_removexattr

Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches from
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well. As far as permission
checking goes, the same rules should apply in either case.

While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.

Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Orabug: 26401569

(backport upstream commit 0e585ccc13b3edbb187fb4f1b7cc9397f17d64a9)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: Get rid of ovl_xattr_noacl_handlers array

Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do. Flag the code
that is now only used conditionally with __maybe_unused.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 0c97be22f928b85110504c4bbb8574facb4bd0c0)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: Fix OVL_XATTR_PREFIX

Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Orabug: 26401569

(backport upstream commit fe2b75952347762a21f67d9df1199137ae5988b2)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: xattr filter fix

a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers
overlaid, all of which (except the lowest one) honouring the
"trusted.overlay.opaque" xattr. So need to filter everything except the
bottom and the pure-upper layer.

b) we no longer can assume that inode is attached to dentry in
get/setxattr.

This patch unconditionally filters private xattrs to fix both of the above.
Performance impact for get/removexattrs is likely in the noise.

For listxattrs it might be measurable in pathological cases, but I very
much hope nobody cares. If they do, we'll fix it then.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: b96809173e94 ("security_d_instantiate(): move to the point prior to attaching dentry to inode")
Orabug: 26401569

(backport upstream commit b581755b1c565391c72d03b157ba2dd0b18e9d15)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix warnings caused by WRITE_ONCE

Orabug: 26401569

The commit 39b681f80(ovl: store real inode pointer in ->i_private)
adds a call to the WRITE_ONCE which generates "initialization makes pointer
from integer without a cast" warnings, fix it by coverting to the required type.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: simplify empty checking

The empty checking logic is duplicated in ovl_check_empty_and_clear() and
ovl_remove_and_whiteout(), except the condition for clearing whiteouts is
different:

ovl_check_empty_and_clear() checked for being upper

ovl_remove_and_whiteout() checked for merge OR lower

Move the intersection of those checks (upper AND merge) into
ovl_check_empty_and_clear() and simplify ovl_remove_and_whiteout().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 30c17ebfb2a11468fe825de19afa3934ee98bfd2)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

qstr: constify instances in overlayfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 29c42e80ba5b1e59a4f427b44e2bdebd347b9409)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: clear nlink on rmdir

To make delete notification work on fa/inotify.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit dbc816d05ddcfb189af8784d04fc84c812db3747)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: append MAY_READ when diluting write checks

Right now we remove MAY_WRITE/MAY_APPEND bits from mask if realfile is on
lower/. This is done as files on lower will never be written and will be
copied up. But to copy up a file, mounter should have MAY_READ permission
otherwise copy up will fail. So set MAY_READ in mask when MAY_WRITE is
reset.

Dan Walsh noticed this when he did access(lowerfile, W_OK) and it returned
True (context mounts) but when he tried to actually write to file, it
failed as mounter did not have permission on lower file.

[SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this
won't trigger a copy-up.

Reported-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 500cac3ccee65526d5075da3af2674101305bf8c)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: dilute permission checks on lower only if not special file

Right now if file is on lower/, we remove MAY_WRITE/MAY_APPEND bits from
mask as lower/ will never be written and file will be copied up. But this
is not true for special files. These files are not copied up and are opened
in place. So don't dilute the checks for these types of files.

Reported-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit e29841a0ab3d03e77313abd8fb4c16e80fb26e29)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix POSIX ACL setting

Setting POSIX ACL needs special handling:

1) Some permission checks are done by ->setxattr() which now uses mounter's
creds ("ovl: do operations on underlying file system in mounter's
context"). These permission checks need to be done with current cred as
well.

2) Setting ACL can fail for various reasons. We do not need to copy up in
these cases.

In the mean time switch to using generic_setxattr.

[Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
disabled, and I could not come up with an obvious way to do it.

This instead avoids the link error by defining two sets of ACL operations
and letting the compiler drop one of the two at compile time depending
on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
also leading to smaller code.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit d837a49bd57f1ec2f6411efa829fecc34002b110)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: share inode for hard link

Inode attributes are copied up to overlay inode (uid, gid, mode, atime,
mtime, ctime) so generic code using these fields works correcty. If a hard
link is created in overlayfs separate inodes are allocated for each link.
If chmod/chown/etc. is performed on one of the links then the inode
belonging to the other ones won't be updated.

This patch attempts to fix this by sharing inodes for hard links.

Use inode hash (with real inode pointer as a key) to make sure overlay
inodes are shared for hard links on upper. Hard links on lower are still
split (which is not user observable until the copy-up happens, see
Documentation/filesystems/overlayfs.txt under "Non-standard behavior").

The inode is only inserted in the hash if it is non-directoy and upper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 51f7e52dc943468c6929fa0a82d4afac3c8e9636)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: store real inode pointer in ->i_private

To get from overlay inode to real inode we currently use 'struct
ovl_entry', which has lifetime connected to overlay dentry. This is okay,
since each overlay dentry had a new overlay inode allocated.

Following patch will break that assumption, so need to leave out ovl_entry.
This patch stores the real inode directly in i_private, with the lowest bit
used to indicate whether the inode is upper or lower.

Lifetime rules remain, using ovl_inode_real() must only be done while
caller holds ref on overlay dentry (and hence on real dentry), or within
RCU protected regions.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 39b681f8026c170a73972517269efc830db0d7ce)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: permission: return ECHILD instead of ENOENT

The error is due to RCU and is temporary.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit a999d7e161a085e30181d0a88f049bd92112e172)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: update atime on upper

Fix atime update logic in overlayfs.

This patch adds an i_op->update_time() handler to overlayfs inodes.  This
forwards atime updates to the upper layer only.  No atime updates are done
on lower layers.

Remove implicit atime updates to underlying files and directories with
O_NOATIME.  Remove explicit atime update in ovl_readlink().

Clear atime related mnt flags from cloned upper mount.  This means atime
updates are controlled purely by overlayfs mount options.

Reported-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit d719e8f268fa4f9944b24b60814da9017dfb7787)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: convert inode_lock to mutex_lock

Orabug: 26401569

This patch is a fix to the conflict of the upstream commit bb0d2b8ad29
(ovl: fix sgid on directory), convert the lock type to match with the
lock usage of the current kernel.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix sgid on directory

When creating directory in workdir, the group/sgid inheritance from the
parent dir was omitted completely. Fix this by calling inode_init_owner()
on overlay inode and using the resulting uid/gid/mode to create the file.

Unfortunately the sgid bit can be stripped off due to umask, so need to
reset the mode in this case in workdir before moving the directory in
place.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit bb0d2b8ad29630b580ac903f989e704e23462357)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: simplify permission checking

The fact that we always do permission checking on the overlay inode and
clear MAY_WRITE for checking access to the lower inode allows cruft to be
removed from ovl_permission().

1) "default_permissions" option effectively did generic_permission() on the
overlay inode with i_mode, i_uid and i_gid updated from underlying
filesystem. This is what we do by default now. It did the update using
vfs_getattr() but that's only needed if the underlying filesystem can
change (which is not allowed). We may later introduce a "paranoia_mode"
that verifies that mode/uid/gid are not changed.

2) splitting out the IS_RDONLY() check from inode_permission() also becomes
unnecessary once we remove the MAY_WRITE from the lower inode check.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 9c630ebefeeee4363ffd29f2f9b18eddafc6479c)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: do not require mounter to have MAY_WRITE on lower

Now we have two levels of checks in ovl_permission(). overlay inode
is checked with the creds of task while underlying inode is checked
with the creds of mounter.

Looks like mounter does not have to have WRITE access to files on lower/.
So remove the MAY_WRITE from access mask for checks on underlying
lower inode.

This means task should still have the MAY_WRITE permission on lower
inode and mounter is not required to have MAY_WRITE.

It also solves the problem of read only NFS mounts being used as lower.
If __inode_permission(lower_inode, MAY_WRITE) is called on read only
NFS, it fails. By resetting MAY_WRITE, check succeeds and case of
read only NFS shold work with overlay without having to specify any
special mount options (default permission).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 754f8cb72b42a3a6100d2bbb1cb885361a7310dd)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: do operations on underlying file system in mounter's context

Given we are now doing checks both on overlay inode as well underlying
inode, we should be able to do checks and operations on underlying file
system using mounter's context.

So modify all operations to do checks/operations on underlying dentry/inode
in the context of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 1175b6b8d96331676f1d436b089b965807f23b4a)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix uid/gid when creating over whiteout

Fix a regression when creating a file over a whiteout.  The new
file/directory needs to use the current fsuid/fsgid, not the ones from the
mounter's credentials.

The refcounting is a bit tricky: prepare_creds() sets an original refcount,
override_creds() gets one more, which revert_cred() drops.  So

  1) we need to expicitly put the mounter's credentials when overriding
     with the updated one

  2) we need to put the original ref to the updated creds (and this can
     safely be done before revert_creds(), since we'll still have the ref
     from override_creds()).

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Fixes: 3fe6e52f0626 ("ovl: override creds with the ones from the superblock mounter")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit d0e13f5bbe4be7c8f27736fc40503dcec04b7de0)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: modify ovl_permission() to do checks on two inodes

Right now ovl_permission() calls __inode_permission(realinode), to do
permission checks on real inode and no checks are done on overlay inode.

Modify it to do checks both on overlay inode as well as underlying inode.
Checks on overlay inode will be done with the creds of calling task while
checks on underlying inode will be done with the creds of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit c0ca3d70e8d3cf81e2255a217f7ca402f5ed0862)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: define ->get_acl() for overlay inodes

Now we are planning to do DAC permission checks on overlay inode
itself. And to make it work, we will need to make sure we can get acls from
underlying inode. So define ->get_acl() for overlay inodes and this in turn
calls into underlying filesystem to get acls, if any.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 39a25b2b37629f65e5a1eba1b353d0b47687c2ca)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: move some common code in a function

ovl_create_upper() and ovl_create_over_whiteout() seem to be sharing some
common code which can be moved into a separate function. No functionality
change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 72e48481815eeca72fc886b3be91301ad87d6aeb)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: store ovl_entry in inode->i_private for all inodes

Previously this was only done for directory inodes. Doing so for all
inodes makes for a nice cleanup in ovl_permission at zero cost.

Inodes are not shared for hard links on the overlay, so this works fine.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 58ed4e70f253d80ed72faba7873dc11603b398bc)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: use generic_delete_inode

No point in keeping overlay inodes around since they will never be reused.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit eead4f2dc4f851a3790c49850e96a1d155bf5451)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: check mounter creds on underlying lookup

The hash salting changes meant that we can no longer reuse the hash in the
overlay dentry to look up the underlying dentry.

Instead of lookup_hash(), use lookup_one_len_unlocked() and swith to
mounter's creds (like we do for all other operations later in the series).

Now the lookup_hash() export introduced in 4.6 by 3c9fe8cdff1b ("vfs: add
lookup_hash() helper") is unused and can possibly be removed; its
usefulness negated by the hash salting and the idea that mounter's creds
should be used on operations on underlying filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8387ff2577eb ("vfs: make the string hashes salt the hash")
Orabug: 26401569

(backport upstream commit c1b2cc1a765aff4df7b22abe6b66014236f73eba)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: ignore permissions on underlying lookup

Generally permission checking is not necessary when overlayfs looks up a
dentry on one of the underlying layers, since search permission on base
directory was already checked in ovl_permission().

More specifically using lookup_one_len() causes a problem when the lower
directory lacks search permission for a specific user while the upper
directory does have search permission. Since lookups are cached, this
causes inconsistency in behavior: success depends on who did the first
lookup.

So instead use lookup_hash() which doesn't do the permission check.

Reported-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 38b78a5f18584db6fa7441e0f4531b283b0e6725)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: override creds with the ones from the superblock mounter

In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

A simple reproducer:

$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash

Now as root in the user namespace:

\# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
\# rm -fR merged/*

This ends up failing with -EPERM after the files in dir has been
correctly deleted:

unlinkat(4, "2", 0)                     = 0
unlinkat(4, "1", 0)                     = 0
unlinkat(4, "3", 0)                     = 0
close(4)                                = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
permitted)

Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.

This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace.  The old cap_raise game is removed in favor of just overriding
the old cred struct.

This patch also drops from ovl_copy_up_one() the following two lines:

override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;

This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 26401569

(backport upstream commit 3fe6e52f062643676eb4518d68cee3bc1272091b)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix dentry leak for default_permissions

When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad47 ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
Orabug: 26401569

(backport upstream commit a4859d75944a726533ab86d24bb5ffd1b2b7d6cc)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

ovl: fix open in stacked overlay

If two overlayfs filesystems are stacked on top of each other, then we need
recursion in ovl_d_select_inode().

I guess d_backing_inode() is supposed to do that. But currently it doesn't
and that functionality is open coded in vfs_open(). This is now copied
into ovl_d_select_inode() to fix this regression.

Reported-by: Alban Crequy <alban.crequy@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...")
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
Orabug: 26401569

(backport upstream commit 1c8a47df36d72ace8cf78eb6c228aa0f8027d3c2)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

nfsd: don't hold i_mutex over userspace upcalls

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

mutex_lock()
lookup_one_len()
mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Orabug: 26401569

(backport upstream commit bbddca8e8fac07ece3938e03526b5d00fa791a4c)

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Revert "ixgbevf: get rid of custom busy polling code"

This reverts commit 1975e69c708706b84d9462ce7c0135d33310c28a. Performance regression,
because the net/core napi support is not present.

Orabug: 26494997
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

Revert "ixgbe: get rid of custom busy polling code"

This reverts commit 9244251e4f45dc9a61dd094a5d7ba23bb0285a86. The core napi support is
not in place, we need to keep the driver support or performance suffers.

Orabug: 26494997
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

ocfs2: fix deadlock caused by recursive locking in xattr

Orabug: 26427132

Another deadlock path caused by recursive locking is reported.  This
kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points").  Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this.  On node1, cp a large file from
home directory to ocfs2 mountpoint.  While on node2, run
setfacl/getfacl.  Both nodes will hang up there.  The backtraces:

On node1:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_write_begin+0x43/0x1a0 [ocfs2]
  generic_perform_write+0xa9/0x180
  __generic_file_write_iter+0x1aa/0x1d0
  ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
  __vfs_write+0xc3/0x130
  vfs_write+0xb1/0x1a0
  SyS_write+0x46/0xa0

On node2:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
  ocfs2_set_acl+0x22d/0x260 [ocfs2]
  ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
  set_posix_acl+0x75/0xb0
  posix_acl_xattr_set+0x49/0xa0
  __vfs_setxattr+0x69/0x80
  __vfs_setxattr_noperm+0x72/0x1a0
  vfs_setxattr+0xa7/0xb0
  setxattr+0x12d/0x190
  path_setxattr+0x9f/0xb0
  SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 8818efaaacb78c60a9d90c5705b6c99b75d7d442)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2: fix deadlock issue when taking inode lock at vfs entry points

Orabug: 26427132

Conflicts:
    fs/ocfs2/file.c

Commit 3acdc8b3862a results in a deadlock, as the author realized shortly
after the patch was merged.  The discussion happened here

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

The reason why taking cluster inode lock at vfs entry points opens up a
self deadlock window, is explained in the previous patch of this series.

So far, we have seen two different code paths that have this issue.

1. do_sys_open
     may_open
       inode_permission
        ocfs2_permission
         ocfs2_inode_lock() <=== take PR
          generic_permission
           get_acl
            ocfs2_iop_get_acl
             ocfs2_inode_lock() <=== take PR

2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== remote PR request
        ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for these
funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Link: http://lkml.kernel.org/r/20170117100948.11657-3-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit b891fa5024a95c77e0d6fd6655cb74af6fb77f46)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

Orabug: 26427132

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a precess
already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code.  For instance:

  const struct inode_operations ocfs2_file_iops = {
      .permission     = ocfs2_permission,
      .get_acl        = ocfs2_iop_get_acl,
      .set_acl        = ocfs2_iop_set_acl,
  };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

  do_sys_open
   may_open
    inode_permission
     ocfs2_permission
      ocfs2_inode_lock() <=== first time
       generic_permission
        get_acl
         ocfs2_iop_get_acl
   ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two of
ocfs2_inode_lock().  Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  Another hand, the recursive cluster lock
(the second one) will be blocked in in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But, the downconvert never complete, why? because
there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.

1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
   of the processes' pid who has taken the cluster lock of this lock
   resource;

2. introduce a new flag for ocfs2_inode_lock_full:
   OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
   us if we've got cluster lock.

3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
   got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs
routines can call into each other.

The performance penalty of processing the holder list should only be
seen at a few cases where the tracking logic is used, such as get/set
acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real
world, as far as I can see, including permission check,
(get|set)_(acl|attr), and the gfs2 code also do so.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherrypicked from commit 439a36b8ef38657f765b80b775e2885338d72451)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

Revert "add OCFS2_LOCK_RECURSIVE arg_flags to ocfs2_cluster_lock() to prevent hang"

Orabug: 26427132

This reverts commit 387775a70d2e5d89d1a81d78d66655337a5c2765.

Signed-off-by: Ashish Samant <ashish.samant@oracle.com>

xen/blkfront: always allocate grants first from per-queue persistent grants

This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.

As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Orabug: 26351401

upstream commit: bd912ef3e46b6edb51bb8af4b73fd2be7817e305
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>

rds: Make sure updates to cp_send_gen can be observed

cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.

This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.

Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

Clause 5.1.2.4/25 in the C standard is also relevant.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream e623a48ee433985f6ca0fb238f0002cc2eccdf53)

Orabug: 26519030

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration

Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.

The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).

Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.

Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.

Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 8dcbec6d20eb881ba368d0aebc3a8a678aebb1da)

Orabug: 25727872
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>

xen: do not re-use pirq number cached in pci device msi msg data

Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")

That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq.  At the time, that was the correct behavior.  However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")

Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.

This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs.  So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.

This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
(cherry picked from commit c74fd80f2f41d05f350bb478151021f88551afe8)

Orabug: 26547167

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Re-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

MacSec: fix backporting error in patches for CVE-2017-7477

Orabug: 26443893

- macsec: dynamically allocate space for sglist (Jason A. Donenfeld)
[Orabug: 26368162]  {CVE-2017-7477}
- macsec: avoid heap overflow in skb_to_sgvec (Jason A. Donenfeld)  [Orabug:
26368162]  {CVE-2017-7477}

The backporting of above patches introduded a heap overrun error shown
as bug 26443893.

------------[ cut here ]------------
WARNING: CPU: 28 PID: 0 at kernel/time/timer.c:1177
call_timer_fn+0x142/0x150()
timer: mld_ifc_timer_expire+0x0/0x2d0 preempt leak: 00000100 -> 00000101
Modules linked in: gcm macsec fuse btrfs xor raid6_pq vfat msdos fat ext4
jbd2 ext2 mbcache2 ip6table_filter ip6_tables
BUG: workqueue leaked lock or atomic: kworker/15:2/0x00000001/689
     last function: addrconf_dad_work
CPU: 15 PID: 689 Comm: kworker/15:2 Not tainted 4.1.12-103.2.6.el7uek.x86_64
Hardware name: Oracle Corporation SUN SERVER X4-2       /ASSY,MOTHERBOARD,1U
, BIOS 25010603 01/16/2014
Workqueue: ipv6_addrconf addrconf_dad_work

Call Trace:
[<ffffffff81735938>] dump_stack+0x63/0x81
[<ffffffff810a0fd8>] process_one_work+0x3a8/0x460
[<ffffffff810a1582>] worker_thread+0x112/0x520
[<ffffffff810a1470>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff810a7348>] kthread+0xd8/0xf0
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff8173d9a2>] ret_from_fork+0x42/0x70
[<ffffffff810a7270>] ? kthread_create_on_node+0x1b0/0x1b0
BUG: scheduling while atomic: kworker/15:2/689/0x00000001

1. newly introduced variable "num_frags" not used in 'sg_ad', assumes
'MAX_SKB_FRAGS + 1'

2. Initialization of sglist assumes 'MAX_SKB_FRAGS + 1' length, though it was
changed to the number of scatterlist elements being returned from
"skb_cow_data()"

3. It seems that "sg_init_table(sg, MAX_SKB_FRAGS + 1);" is redundant, it was
already done a few lines before.

This patch may solve the above issues.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

ovl: move super block magic number to magic.h

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 257f871993474e2bde6c497b54022c362cf398e1)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: use a minimal buffer in ovl_copy_xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: allow zero size xattr

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success. This is clearly not the desired
behavior.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
(cherry picked from commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: default permissions

Orabug: 26546379, 26540706
CVE-2016-1575
CVE-2016-1576

Add mount option "default_permissions" to alter the way permissions are
calculated.

Without this option and prior to this patch permissions were calculated by
underlying lower or upper filesystem.

With this option the permissions are calculated by overlayfs based on the
file owner, group and mode bits.

This has significance for example when a read-only exported NFS filesystem
is used as a lower layer. In this case the underlying NFS filesystem will
reply with EROFS, in which case all we know is that the filesystem is
read-only. But that's not what we are interested in, we are interested in
whether the access would be allowed if the filesystem wasn't read-only; the
server doesn't tell us that, and would need updating at various levels,
which doesn't seem practicable.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
(cherry picked from commit 8d3095f4ad47ac409440a0ba1c80e13519ff867d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek-rpm: Add missing .ko files to ueknano modules list

Orabug: 26521422

When installing kernel-ueknano rpm package, warnings are displayed due to missing
symbols. This commit adds modules with needed symbols to /lib/modules/ directory.

Fixes: 74d5ebd39bfa ("uek-rpm: Share specfile for both kernel-ueknano and kernel-uek")
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>

ping: implement proper locking

We got a report of yet another bug in ping

http://www.openwall.com/lists/oss-security/2017/03/24/6

->disconnect() is not called with socket lock held.

Fix this by acquiring ping rwlock earlier.

Thanks to Daniel, Alexander and Andrey for letting us know this problem.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 43a6684519ab0a6c52024b5e25322476cabad893)

Orabug: 25883225
CEV: CVE-2017-2671

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and never stops thread of remaining queue and processs them. When
removing vbd device with lots of disk I/O load, some queues with inflight
I/O still have blkback thread running even though the corresponding vbd
device or guest is gone.
And this could cause some problems, for example, if the backend device type
is file, some loop devices and blkback thread always lingers there forever
after guest is destroyed, and this causes failure of umounting repositories
unless rebooting the dom0. So stop all threads properly and return -EBUSY
if any queue has inflight I/O.

OraBug: 26539922

Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>

uek-rpm: Share specfile for both kernel-ueknano and kernel-uek

Orabug: 26521422

The kernel-ueknano is added as sub package in kernel-uek spec file to
create both the binary rpms from the same source. The list of modules to
be added to kernel-ueknano is passed as input to the spec file.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed By: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-By: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>

PCI: Workaround wrong flags completions for IDT switch

The IDT switch incorrectly flags an ACS source violation on a read config
request to an end point device on the completion (IDT 89H32H8G3-YC,
errata #36) even though the PCI Express spec states that completions are
never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).

The suggested workaround by IDT is to issue a configuration write to the
downstream device before issuing the first config read. This allows the
downstream device to capture its bus number, thus avoiding the ACS
violation on the completion.

The patch does the following -

1. Disable ACS source violation if enabled
2. Wait for config space access to become available by reading vendor id
3. Do a config write to the end point (errata workaround)
4. Enable ACS source validation (if it was enabled to begin with)

-v2: move workaround to pci_bus_read_dev_vendor_id() from
pci_bus_check_dev()
      and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
-v3: add bus->self check for root bus and virtual bus for sriov vfs.
-v4: only do workaround for IDT switches
-v5: tweak pci_std_enable_acs_sv to deal with unimplemented SV and
clarify return value

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
--

  drivers/pci/pci.c   | 37 +++++++++++++++++++++++++++++++++++++
  drivers/pci/pci.h   |  1 +
  drivers/pci/probe.c | 38 ++++++++++++++++++++++++++++++++++++--
  3 files changed, 74 insertions(+), 2 deletions(-)

Orabug: 26243152

Link: https://patchwork.kernel.org/patch/9828571/
Fixed wrapped lines in the original patch to fit the lines in 80 columns.
Changed variable types from integer to bool to keep consistent with function
return type.

Signed-off-by: Shan Hai <shan.hai@oracle.com>

Revert "SUNRPC: Refactor svc_set_num_threads()"

This reverts commit 0bc9402329824ae06d8a26a73b60d21cdac3e6f2.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

Revert "NFSv4: Fix callback server shutdown"

This reverts commit 13757272bec28f1b8be59f56292e8f17076923b3.

Orabug: 26479081
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Brian Maly <brian.maly@oracle.com>

nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6727ad9e206cc08b80d8000a4d67f8417e53539d)

Orabug: 25925689

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
arch/arm/kernel/vmlinux-xip.lds.S
arch/h8300/kernel/vmlinux.lds.S
arch/x86/kernel/process.c
drivers/acpi/processor_idle.c
kernel/sched/idle.c
lib/nmi_backtrace.c

netfilter: nf_tables: fix oob access

BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 3e38df136e453aa69eb4472108ebce2fb00b1ba6)

Orabug: 25960439,26492640,26492632

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

scsi: libiscsi: use kvzalloc for iscsi_pool_init

iscsiadm session login can fail with the following error:

iscsiadm: Could not login to [iface: default, target: iqn.1986-03.com...
iscsiadm: initiator reported error (9 - internal error)

When /etc/iscsi/iscsid.conf sets node.session.cmds_max = 4096, it
results in 64K-sized kmallocs per session. A system under fragmented
slab pressure may not have any 64K objects available and fail iscsiadm
session login. Even though memory objects of a smaller size are
available, the large order allocation ends up failing.

The kernel prints a warning and does dump_stack, like below:

iscsid: page allocation failure: order:4, mode:0xc0d0
CPU: 0 PID: 2456 Comm: iscsid Not tainted 4.1.12-61.1.28.el6uek.x86_64 #2
Call Trace:
[<ffffffff816c6e40>] dump_stack+0x63/0x83
[<ffffffff8118e58a>] warn_alloc_failed+0xea/0x140
[<ffffffff81191df9>] __alloc_pages_slowpath+0x409/0x760
[<ffffffff81192401>] __alloc_pages_nodemask+0x2b1/0x2d0
[<ffffffffa048f6c0>] ? dev_attr_host_ipaddress+0x20/0xffffffffffffc722
[<ffffffff811dc38f>] alloc_pages_current+0xaf/0x170
[<ffffffff81192581>] alloc_kmem_pages+0x31/0xd0
[<ffffffffa048f600>] ? iscsi_transport_group+0x20/0xffffffffffffc7e2
[<ffffffff811ad738>] kmalloc_order+0x18/0x50
[<ffffffff811ad7a4>] kmalloc_order_trace+0x34/0xe0
[<ffffffff8146ee30>] ? transport_remove_classdev+0x70/0x70
[<ffffffff811e843d>] __kmalloc+0x27d/0x2a0
[<ffffffff810c8cbd>] ? complete_all+0x4d/0x60
[<ffffffffa04af299>] iscsi_pool_init+0x69/0x160 [libiscsi]
[<ffffffff81465d90>] ? device_initialize+0xb0/0xd0
[<ffffffffa04af510>] iscsi_session_setup+0x180/0x2f4 [libiscsi]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa04c531f>] iscsi_sw_tcp_session_create+0xcf/0x150 [iscsi_tcp]
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048a633>] iscsi_if_create_session+0x33/0xd0
[<ffffffffa04c5a60>] ? iscsi_max_lun+0x20/0xfffffffffffffa9e [iscsi_tcp]
[<ffffffffa048abd8>] iscsi_if_recv_msg+0x508/0x8c0 [scsi_transport_iscsi]
[<ffffffff811922eb>] ? __alloc_pages_nodemask+0x19b/0x2d0
[<ffffffff811e6d69>] ? __kmalloc_node_track_caller+0x209/0x2c0
[<ffffffffa048b00c>] iscsi_if_rx+0x7c/0x200 [scsi_transport_iscsi]
[<ffffffff81623dc6>] netlink_unicast+0x126/0x1c0
[<ffffffff8162468c>] netlink_sendmsg+0x36c/0x400
[<ffffffff815d2fed>] sock_sendmsg+0x4d/0x60
[<ffffffff815d596a>] ___sys_sendmsg+0x30a/0x330
[<ffffffff811bc72c>] ? handle_pte_fault+0x20c/0x230
[<ffffffff811bc90c>] ? __handle_mm_fault+0x1bc/0x330
[<ffffffff811bcb32>] ? handle_mm_fault+0xb2/0x1a0
[<ffffffff815d5b99>] __sys_sendmsg+0x49/0x90
[<ffffffff815d5bf9>] SyS_sendmsg+0x19/0x20
[<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71

Use kvzalloc for iscsi_pool in iscsi_pool_init.

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Tested-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Joseph Slember <joe.slember@oracle.com>
Reviewed-by: Lance Hartmann <lance.hartmann@oracle.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26473178
(cherry picked from commit bfcc62ed7066268349e8e7955925bdaf4be0eec0)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

mm: introduce kv[mz]alloc helpers

Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers, I could find, have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures.  This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 26473178
(cherry picked from commit a7c3e901a46ff54c016d040847eda598a9e3e653)
Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Conflicts:

arch/x86/kvm/lapic.c
arch/x86/kvm/page_track.c
arch/x86/kvm/x86.c
fs/f2fs/f2fs.h
fs/f2fs/file.c
fs/f2fs/node.c
fs/f2fs/segment.c
fs/seq_file.c
security/apparmor/apparmorfs.c
security/apparmor/include/lib.h
security/apparmor/lib.c
security/apparmor/policy_unpack.c
virt/kvm/kvm_main.c

sg: Fix double-free when drives detach during SG_IO

In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().

Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.

This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:

  ------------[ cut here ]------------
  kernel BUG at block/blk-core.c:1420!
  Call Trace:
  [<ffffffff81281eab>] blk_put_request+0x5b/0x80
  [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
  [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
  [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
  [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
  [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
  [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
  [<ffffffff81258967>] ? file_has_perm+0x97/0xb0
  [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
  [<ffffffff81602afb>] tracesys+0xdd/0xe2
    RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0

The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.

Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.

KASAN was extremely helpful in finding the root cause of this bug.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26492266
(cherry picked from commit f3951a3709ff50990bf3e188c27d346792103432)
Signed-off-by: John Sobecki <john.sobecki@oracle.com>

scsi: smartpqi: mark PM functions as __maybe_unused

Orabug: 26191021, 26447813

The newly added suspend/resume support causes harmless warnings when
CONFIG_PM is disabled:

smartpqi/smartpqi_init.c:5147:12: error: 'pqi_ctrl_wait_for_pending_io' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2019:13: error: 'pqi_wait_until_lun_reset_finished' defined but not used [-Werror=unused-function]
smartpqi/smartpqi_init.c:2013:13: error: 'pqi_wait_until_scan_finished' defined but not used [-Werror=unused-function]

We can avoid the warnings by removing the #ifdef around the handlers and
instead marking them as __maybe_unused, which will let gcc drop the
unused code silently.

Fixes: f44d210312a6 ("scsi: smartpqi: add suspend and resume support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5c146686e32085e76ad9e2957f3dee9b28fe4f22)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: bump driver version

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2d154f5ff338137a69f2f2a313520b6da2e1eb16)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: remove writeq/readq function definitions

Orabug: 26191021, 26447813

Instead of rewriting write/readq, use existing functions

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add module parameters

Orabug: 26191021, 26447813

Add module parameters to disable heartbeat support and to disable
shutting down the controller when a controller is taken offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5a259e32ba32c380537f3d186a311e528b9f9c94)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup list initialization

Orabug: 26191021, 26447813

Better initialization of linked list heads.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8a994a04fc3a8edbcc0ba1d17219b6d8f4c38009)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add raid level show

Orabug: 26191021, 26447813

Display the RAID level via sysfs

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a9f93392415eb0fc86c29f015822b36016278c72)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: make ioaccel references consistent

Orabug: 26191021, 26447813

- make all references to RAID bypass consistent throughout driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 588a63fea1c28009fe17f194941fb8d8b101b44e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance device add and remove messages

Orabug: 26191021, 26447813

Improved formatting of information displayed when devices
are added/removed from the system.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6de783f666291763bcc6c3975e146b9b698378b1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update timeout on admin commands

Orabug: 26191021, 26447813

Increase the timeout on admin commands from 3 seconds to 60
seconds and added a check for controller crash in the loop
where the driver polls for admin command completion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 13bede676b98d595a43d36a34e1835b686d0d140)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: map more raid errors to SCSI errors

Orabug: 26191021, 26447813

enhance mapping of RAID path errors to Linux SCSI host
error codes.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f5b63206255f68116c117565ab703c531c5ce400)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup controller branding

Orabug: 26191021, 26447813

- Improve controller branding support.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 37b36847a94669a898c9e3449d19d522d9c13979)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update rescan worker

Orabug: 26191021, 26447813

improve support for taking controller offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5f310425c8eabeeb303809898682e5b79c8a9c7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update device offline

Orabug: 26191021, 26447813

- Improve handling of offline devices.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 03b288cf3d92202b950245e931576bb573930c70)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct aio error path

Orabug: 26191021, 26447813

set the internal flag that causes I/O to be sent down the
RAID path when the AIO path is disabled

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 376fb880a4fbf6903918a88081b16c167819af3f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: add lockup action

Orabug: 26191021, 26447813

add support for actions to take when controller goes offline.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3c50976f33f30cf00baea9d518bd3e7ddd01ecc4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: remove qdepth calculations for logical volumes

Orabug: 26191021, 26447813

make the queue depth for LVs the same as the maximum
I/Os supported by the controller

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 94086f5be3f15fc8231e65975e4413c0df3e0203)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance kdump

Orabug: 26191021, 26447813

constrain resource usage during kdump to avoid kdump failures

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d727a776d72b26033161bc19441266749455115b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: change return value for LUN reset operations

Orabug: 26191021, 26447813

change return value for controller offline to be consistent
with the rest of the driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4e8415e3861e8b73a47c92e09e044b9dbc8ee37f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add ptraid support

Orabug: 26191021, 26447813

add support for PTRAID devices

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd10cf0be6057f680fab911d89761fd15d76b205)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: update copyright

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b805dbfe2bce1ddf3209c29f1aa7d6b2064ab6c9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: cleanup messages

Orabug: 26191021, 26447813

- improve some error messages.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d87d5474e2080695ef0cc8c5e6c42a41d6ab961b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add new PCI device IDs

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7eddabff8acb0f4c25f992efe126cf6cccdd6e7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: minor driver cleanup

Orabug: 26191021, 26447813

- remove debug code that is no longer necessary.
- Some WARN_ON checks were removed because the driver continues
to function when the conditions are met.
- remove a MACRO that is no longer used.
- remove unnecessary multi-line statements.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cbe0c7b11dbfda368f27a6935a08ba91522edf1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct BMIC identify physical drive

Orabug: 26191021, 26447813

correct the BMIC Identify Physical Device structure
- missing 2 fields

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1be42f46ade32c668f11c0735af03ab2d479d206)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: eliminate redundant error messages

Orabug: 26191021, 26447813

eliminate redundant error message during initialization
if the controller has crashed.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8845fdfa92ab6eb24209f9929d6340c2f5d4a2de)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add pqi_wait_for_completion_io

Orabug: 26191021, 26447813

Add check for controller lockup during waits for synchronous
controller commands.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1f37e992ad8015ce33596466b0f36babb495148e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct bdma hw bug

Orabug: 26191021, 26447813

add workaround for BDMA hardware bug that can cause
hw to read up to 12 SGL elements (192 bytes) beyond the
last element in the list. This fix avoids IOMMU violations

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e1d213bdc3e359c6c5da8ebbc5b2e87b376e8777)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add heartbeat check

Orabug: 26191021, 26447813

check for controller lockups

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 98f876674a6fba3591c342dfbcfdbaa7ecf0a84e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add suspend and resume support

Orabug: 26191021, 26447813

add support for ACPI S3 (suspend) and S4 (hibernate)
system power states.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 061ef06a2d436cea85984cf0b51b452547a5496c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

scsi: smartpqi: enhance resets

Orabug: 26191021, 26447813

- Block all I/O targeted at LUN reset device.
- Wait until all I/O targeted at LUN reset device has been
consumed by the controller.
- Issue LUN reset request.
- Wait until all outstanding I/Os and LUN reset completion
have been received by the host.
- Return to OS results of LUN reset request.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7561a7e4412e515100ac195303531fc2621ac2db)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add supporting events

Orabug: 26191021, 26447813

Only register for controller events that driver supports
cleanup event handling.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6a50d6ada03d8d9102a632d0e2db70cd9b6620f5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: ensure controller is in SIS mode at init

Orabug: 26191021, 26447813

put in SIS mode during initialization.
support kexec/kdump

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 162d7753fce9a00719c09dfebd9fee3855e27fbe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: add in controller checkpoint for controller lockups.

Orabug: 26191021, 26447813

tell smartpqi controller to generate a checkpoint for rare lockup
conditions.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5b0fba0f408777113eff93bd18ab0b9f80760fb7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: set pci completion timeout

Orabug: 26191021, 26447813

add support for setting PCIe completion timeout.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a81ed5f338a843d8bfd199928142b196d71ae62c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: correct remove scsi devices

Orabug: 26191021, 26447813

correct a problem caused by holding a spinlock during device deletion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a37ef74517acf0d022ab4c8fa671c82c877eed7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

scsi: smartpqi: fix time handling

Orabug: 26191021, 26447813

When we have turned off RTC support, the smartpqi driver fails to build:

ERROR: "rtc_time64_to_tm" [drivers/scsi/smartpqi/smartpqi.ko] undefined!

This is easily avoided by using the generic 'struct tm' based helper rather
than the RTC specific one. While fixing this, I noticed that even though
the driver uses time64_t for storing seconds, it gets them from the
old 32-bit struct timeval. To address this, we can simplify the code
by calling ktime_get_real_seconds() directly.

Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ed10858eadd4988260c6bc7d75fc25176342b5a7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/sock: add WARN_ON(parent->sk) in sock_graft()

sock_graft() unilaterally sets up parent->sk based on the
assumption that the existing parent->sk is null. If this
condition is not true, then the existing parent->sk would
be leaked, so add a WARN_ON() to alert callers who may fall
in this category.

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

rds: tcp: use sock_create_lite() to create the accept socket

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one()
1. it sets up a new_sock->sk that is wasteful, because this ->sk
is going to get replaced by inet_accept() in the subsequent ->accept()
2. The new_sock->sk is a leaked reference in sock_graft() which
expects to find a null parent->sk

Avoid these problems by calling sock_create_lite().

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>