www.infradead.org Git - users/jedix/linux-maple.git/log

scsi: mpt3sas: Fix possibility of using invalid Enclosure Handle for SAS device after host reset

Enclosure handles are not updated after host reset. As a result, driver
device structure is holding previously assigned enclosure handle which
is different from the enclosure handle populated in the corresponding
device page.

Modified the driver to update each device's enclosure handle after host
reset to the current value read from the corresponding device page.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit aba5a85c2fcf05a6922e28c7179526adad58f4b5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Display chassis slot information of the drive

Display chassis slot information along with other drive location
parameters such as slot number and connector name in the logs if
chassis slot validity bit is set in 'SAS Enclosure Page 0'.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit 7588895646b5a943d3310271885c5935123a455c)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Updated MPI headers to v2.00.48

Updated MPI headers to v2.00.48

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit 90e7a70199184ed5f3081981c7cffed771b84bb3)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive

Whenever an I/O for a RAID volume fails with IOCStatus
MPI2_IOCSTATUS_SCSI_IOC_TERMINATED and SCSIStatus equal to
(MPI2_SCSI_STATE_TERMINATED | MPI2_SCSI_STATE_NO_SCSI_STATUS) then
return the I/O to SCSI midlayer with "DID_RESET" (i.e. retry the IO
infinite times) set in the host byte.

Previously, the driver was completing the I/O with "DID_SOFT_ERROR"
which causes the I/O to be quickly retried. However, firmware needed
more time and hence I/Os were failing.
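
The completion decision described above can be sketched in userspace as follows. This is a minimal model, not the driver code: the function name and the is_raid_volume parameter are hypothetical, and the numeric constants are illustrative placeholders that only need to be distinct for the sketch to work.

```python
# Names mirror the commit message; the numeric values are placeholders
# chosen for this sketch and are not taken from the driver headers.
MPI2_IOCSTATUS_SCSI_IOC_TERMINATED = 0x004B
MPI2_SCSI_STATE_NO_SCSI_STATUS = 0x04
MPI2_SCSI_STATE_TERMINATED = 0x08
DID_RESET = 0x08
DID_SOFT_ERROR = 0x0B


def host_byte_for_terminated_io(ioc_status, scsi_state, is_raid_volume):
    """Return the host byte for an IOC_TERMINATED completion, else None."""
    if ioc_status != MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
        return None  # other IOC statuses are outside this sketch
    terminated_no_status = (MPI2_SCSI_STATE_TERMINATED |
                            MPI2_SCSI_STATE_NO_SCSI_STATUS)
    if is_raid_volume and scsi_state == terminated_no_status:
        # Give the firmware time: DID_RESET is retried without limit.
        return DID_RESET
    # Previous behaviour for every case: a quick, bounded retry.
    return DID_SOFT_ERROR
```

Only the exact state combination on a RAID volume takes the DID_RESET path; everything else keeps the earlier DID_SOFT_ERROR behaviour.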

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit 2ce9a3645299ba1752873d333d73f67620f4550b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Fix removal and addition of vSES device during host reset

For Dev Handles whose value is less than HBA's phys count number, driver
would return HBA's SAS address value. As a result, for a Virtual SES
device the driver was returning the HBA's SAS address. Updated the
driver to return Virtual SES' SAS address.

[mkp: clarified commit message]

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit 758f8139e9a779de76fed2b48dc492bfb6612684)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Reduce memory footprint in kdump kernel

To reduce the memory footprint of the driver in the kdump kernel, we
apply the following settings when reset_devices is set:

1. Use single MSI-x vector.
2. Disable RDPQ mode.
3. Set sg_table_size to 32 by default.
4. Set SCSI IO Queue depth to 200.
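
The four settings above can be modelled roughly as below. This is a sketch under stated assumptions: DriverSettings and apply_kdump_limits are hypothetical names, and the "normal" values used in the usage note are made up for illustration. The user override implied by "by default" for sg_table_size is not modelled.

```python
from dataclasses import dataclass


@dataclass
class DriverSettings:
    """Hypothetical container for the tunables named in the commit message."""
    msix_vectors: int
    rdpq_enabled: bool
    sg_table_size: int
    scsi_io_queue_depth: int


def apply_kdump_limits(settings, reset_devices):
    """Return reduced settings when reset_devices is set, else the input."""
    if not reset_devices:
        return settings
    return DriverSettings(
        msix_vectors=1,            # 1. single MSI-x vector
        rdpq_enabled=False,        # 2. RDPQ mode disabled
        sg_table_size=32,          # 3. smaller scatter-gather table
        scsi_io_queue_depth=200,   # 4. shallower SCSI IO queue
    )
```

For example, apply_kdump_limits(DriverSettings(8, True, 128, 9000), True) yields the reduced configuration, while passing False returns the settings unchanged.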

[mkp: fixed commit message]

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit 06f5f976a6ee0f8bbb0dd648415eeac0536fef97)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Fixed memory leaks in driver

While removing Expander devices, we are removing expander device entry
from the list before freeing its child devices. While freeing child
device we are finding its parent device node as NULL and therefore we
are not freeing the child device's allocated data structures. Updated
the driver to remove the expander device from the list only after
freeing all its child devices.
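
The ordering problem can be illustrated with a small model. All names here are hypothetical, and the list of expander names stands in for the driver's expander list; the only point is that the parent lookup fails once the expander has been unlinked.

```python
def free_child(child, expander_list):
    """Free a child's data only if its parent is still on the list."""
    if child["parent"] not in expander_list:
        return False  # parent lookup fails, so the child's data is leaked
    child["freed"] = True
    return True


def remove_expander_buggy(name, children, expander_list):
    """Old order: unlink the expander first, then try to free children."""
    expander_list.remove(name)
    return [free_child(c, expander_list) for c in children]


def remove_expander_fixed(name, children, expander_list):
    """New order: free all children, then unlink the expander."""
    results = [free_child(c, expander_list) for c in children]
    expander_list.remove(name)
    return results
```

With the buggy order every child reports False (leaked); with the fixed order every child is freed before the expander leaves the list.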

[mkp: clarified commit message]

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit bbe3def3a11dc1040d45469f5dd26032e9fd8c79)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi: mpt3sas: Processing of Cable Exception events

Earlier, the Active Cable Exception event with reason code "Cable
Degraded (0x02)" was added only for active cables. Now this event is
extended to passive cables too. Re-arranged the display message accordingly.

Also added the Cable Exception event for SAS3008 & SAS3108 HBAs
(i.e. MPI 2.5 spec supporting HBAs). Previously, this event was enabled
only for MPI 2.6 spec supporting HBA devices.

[mkp: typos]

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 26894858
(cherry picked from commit b99b199378afac7675876adc170d82d7a4442330)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

selinux: fix off-by-one in setprocattr

SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute.  However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb12e75d82258c2f2e7746d5952d3e321a ("proc_pid_attr_write():
switch to memdup_user()").  Fix the off-by-one error.
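
A userspace model of the check may help. It assumes, for illustration only, that the faulty test inspected the second byte of the buffer rather than the first; the function names are hypothetical, and Python's IndexError stands in for the kernel reading past the end of the allocation.

```python
def is_clear_request_fixed(value: bytes) -> bool:
    """Fixed check: empty write or leading NUL/newline clears the attribute."""
    return len(value) == 0 or value[0] in (0x00, 0x0A)


def is_clear_request_buggy(value: bytes) -> bool:
    """Off-by-one variant (assumed): looks at the second byte instead."""
    return len(value) == 0 or value[1] in (0x00, 0x0A)


def buggy_check_overruns(value: bytes) -> bool:
    """True if the buggy check would index past the end of the buffer."""
    try:
        is_clear_request_buggy(value)
    except IndexError:
        return True
    return False
```

A one-byte write such as a lone newline is the interesting case: the fixed check treats it as a clear request, while the off-by-one variant indexes beyond the buffer.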

Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate

Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.

There are no users of this facility to my knowledge; possibly we
should just get rid of it.

UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug.  This patch fixes CVE-2017-2618.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec6864e
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 0c461cb727d146c9ef2d3e86214f498b78b7d125)

Orabug: 25660054
CVE: CVE-2017-2618
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

sysctl: Drop reference added by grab_header in proc_sys_readdir

Fixes CVE-2016-9191: proc_sys_readdir doesn't drop the reference
added by grab_header when returning from the !dir_emit_dots path.
As a result, any path that calls unregister_sysctl_table can
wait forever.

The calltrace of CVE-2016-9191:

[ 5535.960522] Call Trace:
[ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657]  [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230

One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex.  The offlining
ends up trying to drain active usages of a sysctl table which apparently
is not happening."
The real reason is that proc_sys_readdir doesn't drop the reference added
by grab_header when returning from the !dir_emit_dots path. So this cpuset
offline path will wait here forever.
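
The leak can be modelled with a simple use counter. This is a sketch, not the procfs code: SysctlHeader and both readdir functions are hypothetical stand-ins, and can_unregister() models the condition that unregister_sysctl_table waits for.

```python
class SysctlHeader:
    """Hypothetical header with a use count, as taken by grab_header."""

    def __init__(self):
        self.used = 0

    def grab(self):
        self.used += 1

    def finish(self):
        self.used -= 1

    def can_unregister(self):
        # Unregistration has to wait until no user holds a reference.
        return self.used == 0


def readdir_buggy(head, emit_dots_ok):
    head.grab()
    if not emit_dots_ok:
        return 0  # early return: the reference is never dropped
    head.finish()
    return 0


def readdir_fixed(head, emit_dots_ok):
    head.grab()
    try:
        if not emit_dots_ok:
            return 0
        return 0  # directory entries would be emitted here
    finally:
        head.finish()  # dropped on every exit path
```

After one failed dir_emit_dots in the buggy variant the count never returns to zero, which matches the wait_for_completion seen in the call trace.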

See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

Fixes: f0c3b5093add ("[readdir] convert procfs")
Cc: stable@vger.kernel.org
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: Yang Shukui <yangshukui@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
(cherry picked from commit 93362fa47fe98b62e4a34ab408c4a418432e7939)

Orabug: 25062944
CVE: CVE-2016-9191
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>

storvsc: don't assume SG list is contiguous

Scatterlists are contiguous if they're limited to a page; but for large
I/Os, it's possible that the scatterlists span pages, in which case the
pages will not be physically contiguous - they will be chained together.
The MS patch (link below) fixes the wrong assumption in do_bounce_buffer()
that scatterlists are always contiguous, so it's a good fix to port, in
general.

Orabug: 26492697

MS patch:
https://github.com/LIS/lis-next/commit/a13bbc4ab81e459f635237a938f89737300ecfa1

Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Reviewed-by: Joe Slember <joe.slember@oracle.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

thp: run vma_adjust_trans_huge() outside i_mmap_rwsem

vma_adjust_trans_huge() splits pmd if it's crossing VMA boundary.
During split we munlock the huge page which requires rmap walk. rmap
wants to take the lock on its own.

Let's move vma_adjust_trans_huge() outside i_mmap_rwsem to fix this.

Link: http://lkml.kernel.org/r/1466021202-61880-19-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
mm/mmap.c

Orabug: 27026170

(cherry picked from commit 37f9f5595c26d3cb644ca2fab83dc4c4db119f9f)
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands

When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data.  This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request.  Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time.  This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done.  Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.
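
The early-completion condition can be sketched as two predicates. The function names and the boolean error parameter are hypothetical simplifications; the real code works on request structures rather than plain integers.

```python
def completes_early_old(request_bytes, good_bytes, error):
    """Old signal: done when all bytes of the request were completed."""
    return good_bytes == request_bytes


def completes_early_fixed(request_bytes, good_bytes, error):
    """Fixed signal: zero-length commands with errors are not 'done'."""
    if request_bytes == 0 and error:
        return False  # drop through to the retry code
    return good_bytes == request_bytes
```

A failed flush has request_bytes == 0 and good_bytes == 0, so the old predicate is always true for it, while the fixed one lets it reach the retry path.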

Cc: stable@vger.kernel.org
Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93)

Orabug: 26824565

Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

ovl: during copy up, switch to mounter's creds early

Now, we have the notion that copy up of a file is done with the creds
of mounter of overlay filesystem (as opposed to task). Right now before
we switch creds, we do some vfs_getattr() operations in the context of
task and that itself can fail. We should do that getattr() using the
creds of mounter instead.

So this patch switches to mounter's creds early during copy up process so
that even vfs_getattr() is done with mounter's creds.

Do not call revert_creds() unless we have already called
ovl_override_creds(). [Reported by Arnd Bergmann]

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 25684456

(backport upstream commit 8eac98b8beb4711c4ab61822cac077fd6660e820)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

ovl: lookup: do getxattr with mounter's permission

The getxattr() in ovl_is_opaquedir() was missed when converting all
operations on underlying fs to be done under mounter's permission.

This patch fixes this by moving the ovl_override_creds()/revert_creds() out
from ovl_lookup_real() to ovl_lookup().

Also convert to using vfs_getxattr() instead of directly calling
i_op->getxattr().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Orabug: 25684456

(backport upstream commit 2b6bc7f48d34a6043915beddbf53b981603737c8)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
conflict fix
fs/overlay/super.c

ovl: get rid of the dead code left from broken (and disabled) optimizations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Orabug: 25684456

(backport upstream commit 0f7ff2dabbc95ed7a8019d142274f0c7e083577d)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

selinux: Implement dentry_create_files_as() hook

Calculate what would be the label of newly created file and set that
secid in the passed creds.

Context of the task which is actually creating file is retrieved from
set of creds passed in. (old->security).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit a518b0a5b0d7f3397e065acb956bca9635aa892d)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Conflict fix:
security/selinux/hooks.c

security, overlayfs: Provide hook to correctly label newly created files

During a new file creation we need to make sure new file is created with the
right label. New file is created in upper/ so effectively file should get
label as if task had created file in upper/.

We switched to mounter's creds for actual file creation. Also if there is a
whiteout present, then file will be created in work/ dir first and then
renamed in upper. In none of the cases file will be labeled as we want it to
be.

This patch introduces a new hook dentry_create_files_as(), which determines
the label/context dentry will get if it had been created by task in upper
and modify passed set of creds appropriately. Caller makes use of these new
creds for file creation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: fix whitespace issues found with checkpatch.pl]
[PM: changes to use stat->mode in ovl_create_or_link()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit 2602625b7e46576b00db619ac788c508ba3bcb2c)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
conflict fix:
include/linux/security.h
security/capability.c

selinux: Pass security pointer to determine_inode_label()

Right now selinux_determine_inode_label() works on security pointer of
current task. Soon I need this to work on a security pointer retrieved
from a set of creds. So start passing in a pointer and caller can
decide where to fetch security pointer from.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit c957f6df52c509ccfbb96659fd1a0f7812de333f)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

selinux: Implementation for inode_copy_up_xattr() hook

When a file is copied up in overlay, we have already created file on
upper/ with right label and there is no need to copy up selinux
label/xattr from lower file to upper file. In fact in case of context
mount, we don't want to copy up label as newly created file got its label
from context= option.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit 19472b69d639d58415866bf127d5f9005038c105)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

security,overlayfs: Provide security hook for copy up of xattrs for overlay file

Provide a security hook which is called when xattrs of a file are being
copied up. This hook is called once for each xattr and LSM can return
0 if the security module wants the xattr to be copied up, 1 if the
security module wants the xattr to be discarded on the copy, -EOPNOTSUPP
if the security module does not handle/manage the xattr, or a -errno
upon an error.
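
One way a caller might act on these return values is sketched below. The loop, the function names and example_hook are hypothetical; in particular, treating -EOPNOTSUPP the same as 0 (copy the xattr) is an assumption of this sketch rather than something the commit message states.

```python
import errno


def copy_up_xattrs(names, hook):
    """Return (status, copied_names) after consulting the hook per xattr."""
    copied = []
    for name in names:
        rc = hook(name)
        if rc == 1:
            continue  # module asked for this xattr to be discarded
        if rc < 0 and rc != -errno.EOPNOTSUPP:
            return rc, copied  # hard error: stop the copy up
        # rc == 0, or the module does not manage this xattr (assumed: copy)
        copied.append(name)
    return 0, copied


def example_hook(name):
    """Hypothetical hook used only to exercise the loop above."""
    if name == "security.selinux":
        return 1
    if name.startswith("user."):
        return -errno.EOPNOTSUPP
    if name == "bad":
        return -errno.EIO
    return 0
```

The four outcomes map to three caller actions: copy, skip, or abort with the error.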

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup for checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit 121ab822ef21914adac2fa3730efeeb8fd762473)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
conflict fix
include/linux/security.h
security/capability.c
security/security.c

selinux: Implementation for inode_copy_up() hook

A file is being copied up for overlay file system. Prepare a new set of
creds and set create_sid appropriately so that new file is created with
appropriate label.

Overlay inode has right label for both context and non-context mount
cases. In case of non-context mount, overlay inode will have the label
of lower file and in case of context mount, overlay inode will have
the label from context= mount option.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit 56909eb3f559103196ecbf2c08c923e0804980fb)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

security, overlayfs: provide copy up security hook for unioned files

Provide a security hook to label new file correctly when a file is copied
up from lower layer to upper layer of a overlay/union mount.

This hook can prepare a new set of creds which are suitable for new file
creation during copy up. Caller will use new creds to create file and then
revert back to old creds and release new creds.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup to appease checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit d8ad8b49618410ddeafd78465b63a6cedd6c9484)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Conflict fix:
include/linux/security.h
security/security.c
security/capability.c

selinux: delay inode label lookup as long as possible

Since looking up an inode's label can result in revalidation, delay
the lookup as long as possible to limit the performance impact.

Signed-off-by: Paul Moore <paul@paul-moore.com>
Orabug: 25684456

(backport upstream commit 20cdef8d57591ec8674f65ccfe555aca5fd10b64)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
conflict fix
security/selinux/hooks.c

selinux: Add accessor functions for inode->i_security

Add functions dentry_security and inode_security for accessing
inode->i_security. These functions initially don't do much, but they
will later be used to revalidate the security labels when necessary.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Orabug: 25684456

(backport upstream commit 83da53c5a34564a0a63b26f84293c6e2a639e1e4)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
conflict fix
security/selinux/hooks.c

selinux: Create a common helper to determine an inode label [ver #3]

Create a common helper function to determine the label for a new inode.
This is then used by:

- may_create()
- selinux_dentry_init_security()
- selinux_inode_init_security()

This will change the behaviour of the functions slightly, bringing them
all into line.

Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Orabug: 25684456

(backport upstream commit c3c188b2c3ed29effe8693672ee1c84184103b4e)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>

rds: Proper init/exit declaration for module init/exit function

Changed all the module init and exit function declarations such that
they are placed in .init.text and .exit.text sections respectively.

Orabug: 27013833

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

rds: Remove .exit from struct rds_transport

The .exit function in struct rds_transport is removed as it
is never used.

Orabug: 27013833

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

ipv6: avoid overflow of offset in ip6_find_1stfragopt

Orabug: 26540159
CVE: CVE-2017-7542

In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
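
The wrap-around and the cap can be shown with plain integer arithmetic. This sketch assumes the old offset was effectively 16 bits wide; the helper names and the exact comparison used for the cap are illustrative choices, not quoted from the patch.

```python
import errno

IPV6_MAXPLEN = 65535


def advance_u16(offset, length):
    """Old behaviour (assumed): a 16-bit offset silently wraps around."""
    return (offset + length) & 0xFFFF


def advance_checked(offset, length):
    """New behaviour: a wider offset that is rejected once it hits the cap."""
    if offset + length >= IPV6_MAXPLEN:
        return -errno.EINVAL
    return offset + length
```

With a wrapping offset the walk over extension headers can jump back to an earlier position and never terminate; the checked variant returns an error instead.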

This problem has been here since before the beginning of git history.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6399f1fae4ec29fab5ec76070435555e256ca3a6)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

xfs: use dedicated log worker wq to avoid deadlock with cil wq

orabug: 26871913
backport: upstream 696a562072e3c14bcd13ae5acc19cdf27679e865 (no conflict)

The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on each other.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608df.

Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

udp: consistently apply ufo or fragmentation

Orabug: 26921303
CVE: CVE-2017-1000112

When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.

Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.

Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.

A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.

Found by syzkaller.

Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 85f1bd9a7b5a79d5baa8bf44af19658f7bf77bfa)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/ip_output.c
net/ipv6/ip6_output.c
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

nvme-pci: Remove nvme_setup_prps BUG_ON

This patch replaces the invalid nvme SGL kernel panic with a warning,
and returns an appropriate error. The warning will occur only on the
first occurrence, and sgl details will be printed to help debug how the
request was allowed to form.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 86eea2895d11dde9bf43fa2046331e84154e00f4)

Orabug: 26871819

Conflicts:
Added the macro BLK_STS to get status from nvme_setup_prps.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

block: Check for gaps on front and back merges

We are checking for gaps to the previous bio_vec, which can
only detect back merge gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for gap to prev in a back merge
attempt and check for a gap to next in a front merge attempt.
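
The two checks can be sketched with segments represented as (offset, length) tuples. The 4096-byte page size, the tuple representation and all function names are assumptions of this sketch; it only models the rule that the earlier segment must end on a page boundary and the later one must start at offset 0.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


def gap_between(prev_offset, prev_len, next_offset):
    """True if two adjacent segments would leave a gap between them."""
    return next_offset != 0 or (prev_offset + prev_len) % PAGE_SIZE != 0


def back_merge_has_gap(req_last, bio_first):
    """Back merge: the bio is appended, so compare against the request's tail."""
    return gap_between(req_last[0], req_last[1], bio_first[0])


def front_merge_has_gap(bio_last, req_first):
    """Front merge: the bio is prepended, so compare against the request's head."""
    return gap_between(bio_last[0], bio_last[1], req_first[0])
```

The same predicate is used in both directions; only which segment plays "previous" and which plays "next" changes with the merge type.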

Signed-off-by: Jens Axboe <axboe@fb.com>
[sagig: Minor rename change]
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
(cherry picked from commit 5e7c4274a70aa2d6f485996d0ca1dad52d0039ca)

Orabug: 26871819

Conflicts:
Replaced queue_virt_boundary with
test_bit(QUEUE_FLAG_SG_GAPS, &q->queue_flags)

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

block: Copy a user iovec if it includes gaps

For drivers that don't support gaps in the SG lists handed to
them we must bounce (copy the user buffers) and pass a bio that
does not include gaps. This doesn't matter for any current user,
but will help to allow iser which can't handle gaps to use the
block virtual boundary instead of using driver-local bounce
buffering when handling SG_IO commands.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 46348456c1791053dcbe5a9e21825b10a3c8a8fb)

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

blk: [Partial] Replace SG_GAPGS with new queue limits mask

Several fixes went in upstream on top of queue_virt_boundary()
to address the gaps issue. However, back-porting the queue_virt_boundary()
API disrupts iSER, storvsc and mpt3sas. Hence, a hybrid approach is
implemented in QU4 for the GAPS functionality. The NVMe driver supports both
queue_virt_boundary() and QUEUE_FLAG_SG_GAPS to facilitate a smooth
transition.

Orabug: 26871819

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

CVE-2016-10318 missing authorization check fscrypt_process_policy

Port to UEK4 of mainline commit id 163ae1c6ad62.

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.
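
The permission rule can be modelled in user-space C as follows. This is
a simplified sketch, not the kernel implementation: may_set_policy()
and its parameters are hypothetical stand-ins for the
inode_owner_or_capable() check.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical model of the check: setting a policy is allowed only if
 * the caller owns the inode or holds CAP_FOWNER. Read-only access alone
 * is no longer sufficient.
 */
static bool may_set_policy(uid_t caller, uid_t inode_owner,
			   bool has_cap_fowner)
{
	return caller == inode_owner || has_cap_fowner;
}
```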

Orabug: 25883175
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Acked-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

uek-rpm: Build kernel ueknano rpm for OL7

Orabug: 27002543

This commit enables building of kernel-ueknano rpm for OL7 also. These
changes are taken from the below commits that were done for OL6.

008dae863de6 uek-rpm: Clean up installed directories when uninstalling kernel-ueknano
799da1091458 uek-rpm: Add missing ko modules to nano rpm
069411cd55c2 uek-rpm: Fix package dependencies for kernel-ueknano
566c3b2e1946 uek-rpm: Add missing .ko files to ueknano modules list
74d5ebd39bfa uek-rpm: Share specfile for both kernel-ueknano and kernel-uek

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
Acked-by: Ethan Zhao <ethan.zhao@oracle.com>

nvme: honor RTD3 Entry Latency for shutdowns

If an NVMe controller reports RTD3 Entry Latency larger than
shutdown_timeout, up to a maximum of 60 seconds, use that value to set
the shutdown timer. Otherwise fall back to the module parameter which
defaults to 5 seconds.
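
A minimal sketch of the timeout selection described above, assuming the
RTD3 Entry Latency is reported in microseconds. The function name
pick_shutdown_timeout() and the constant name are illustrative and not
taken from the driver.

```c
#include <stdint.h>

#define MAX_SHUTDOWN_SECS 60u	/* upper limit stated in the commit message */

/*
 * rtd3e_us: RTD3 Entry Latency reported by the controller (assumed to
 *           be in microseconds).
 * module_timeout_secs: value of the shutdown_timeout module parameter.
 */
static unsigned int pick_shutdown_timeout(uint32_t rtd3e_us,
					  unsigned int module_timeout_secs)
{
	unsigned int secs = rtd3e_us / 1000000u;

	if (secs < module_timeout_secs)
		return module_timeout_secs;	/* fall back to the parameter */
	if (secs > MAX_SHUTDOWN_SECS)
		return MAX_SHUTDOWN_SECS;	/* never wait longer than the cap */
	return secs;
}
```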

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
[hch: removed do_div, made transition time local scope]
Signed-off-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 07fbd32a6b215d8b2fc01ccc89622207b9b782fd)

Orabug: 26999048

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

ocfs2: fix posix_acl_create deadlock

Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create.  The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs.  For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted.  The same deadlock happened in ocfs2_reflink.
The fix is to revert to using ocfs2_init_acl.

Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 26731834

(cherry picked from commit c25a1e0671fbca7b2c0d0757d533bd2650d6dc0c)

Conflicts:

fs/ocfs2/acl.h

Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly

ChunYu found a kernel crash by syzkaller:

[  651.617875] kasan: CONFIG_KASAN_INLINE enabled
[  651.618217] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  651.618731] general protection fault: 0000 [#1] SMP KASAN
[  651.621543] CPU: 1 PID: 9539 Comm: scsi Not tainted 4.11.0.cov #32
[  651.621938] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  651.622309] task: ffff880117780000 task.stack: ffff8800a3188000
[  651.622762] RIP: 0010:skb_release_data+0x26c/0x590
[...]
[  651.627260] Call Trace:
[  651.629156]  skb_release_all+0x4f/0x60
[  651.629450]  consume_skb+0x1a5/0x600
[  651.630705]  netlink_unicast+0x505/0x720
[  651.632345]  netlink_sendmsg+0xab2/0xe70
[  651.633704]  sock_sendmsg+0xcf/0x110
[  651.633942]  ___sys_sendmsg+0x833/0x980
[  651.637117]  __sys_sendmsg+0xf3/0x240
[  651.638820]  SyS_sendmsg+0x32/0x50
[  651.639048]  entry_SYSCALL_64_fastpath+0x1f/0xc2

It's caused by skb_shared_info at the end of sk_buff was overwritten by
ISCSI_KEVENT_IF_ERROR when parsing nlmsg info from skb in iscsi_if_rx.

During the loop if skb->len == nlh->nlmsg_len and both are sizeof(*nlh),
ev = nlmsg_data(nlh) will actually get skb_shinfo(SKB) instead and set a
new value to skb_shinfo(SKB)->nr_frags by ev->type.

This patch is to fix it by checking nlh->nlmsg_len properly there to
avoid over accessing sk_buff.
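
A simplified user-space sketch of the length validation idea. The
struct layouts below are stand-ins, not the real struct nlmsghdr or
struct iscsi_uevent, and msg_is_parsable() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the netlink header and the event payload. */
struct msg_hdr {
	uint32_t len;	/* total message length, header included */
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

struct event {
	uint32_t type;
	uint32_t payload[4];
};

/*
 * Only treat the bytes after the header as an event if the message
 * claims to be long enough to contain one and the buffer really holds
 * that many bytes.
 */
static bool msg_is_parsable(const struct msg_hdr *hdr, size_t buf_len)
{
	if (buf_len < sizeof(*hdr))
		return false;
	if (hdr->len < sizeof(*hdr) + sizeof(struct event))
		return false;	/* header-only message: no event to read */
	if (buf_len < hdr->len)
		return false;	/* message claims more than the buffer has */
	return true;
}
```

A message whose length equals the bare header size, which is the case
described above, is rejected before the event is touched.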

Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c88f0e6b06f4092995688211a631bb436125d77b)

Orabug: 26828494
CVE: CVE-2017-14489

Signed-off-by: Tim Tianyang Chen <tianyang.chen@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek/config: enable NVME SG_IO support by default

Orabug: 26993705

NVMe SG_IO support has been optional since the commit
"nvme: make SG_IO support optional". Enable it by default,
since user space tools such as sg3_utils depend on it.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

nvme: report the scsi TUR state correctly

Orabug: 26993705

The current nvme driver reports the SCSI TEST UNIT READY state
inverted because the condition check and the return value are
inconsistent. Fix it by making them consistent.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

vring: Use the DMA API on Xen

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 78fe39872378b0bef00a91181f1947acb8a08500)
Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

virtio_pci: Use the DMA API if enabled

This switches to vring_create_virtqueue, simplifying the driver and
adding DMA API support.

This fixes virtio-pci on platforms and busses that have IOMMUs. This
will break the experimental QEMU Q35 IOMMU support until QEMU is
fixed. In exchange, it fixes physical virtio hardware as well as
virtio-pci running under Xen.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 7a5589b240b405d55b2b395554082ec284f414bb)
Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

virtio_mmio: Use the DMA API if enabled

This switches to vring_create_virtqueue, simplifying the driver and
adding DMA API support.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b42111382f0e677e2e227c5c4894423cbdaed1f1)
Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

virtio: Add improved queue allocation API

This leaves vring_new_virtqueue alone for compatibility, but it
adds two new improved APIs:

vring_create_virtqueue: Creates a virtqueue backed by automatically
allocated coherent memory. (Some day this could be extended to
support non-coherent memory, too, if there ends up being a platform
on which it's worthwhile.)

__vring_new_virtqueue: Creates a virtqueue with a manually-specified
layout. This should allow mic_virtio to work much more cleanly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2a2d1382fe9dccfce6f9c60a9c9fd2f0fe5bcf2b)
Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

virtio_ring: Support DMA APIs

virtio_ring currently sends the device (usually a hypervisor)
physical addresses of its I/O buffers. This is okay when DMA
addresses and physical addresses are the same thing, but this isn't
always the case. For example, this never works on Xen guests, and
it is likely to fail if a physical "virtio" device ever ends up
behind an IOMMU or swiotlb.

The immediate use case for me is to enable virtio on Xen guests.
For that to work, we need vring to support DMA address translation
as well as a corresponding change to virtio_pci or to another
driver.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 780bc7903a32edb63be138487fd981694d993610)
Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

vring: Introduce vring_use_dma_api()

This is a kludge, but no one has come up with a better idea yet.
We'll introduce DMA API support guarded by vring_use_dma_api().
Eventually we may be able to return true on more and more systems,
and hopefully we can get rid of vring_use_dma_api() entirely some
day.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d26c96c8102549f91eb0bea6196d54711ab52176)

Orabug: 26388044
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

smartpqi: update driver version

Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

smartpqi: cleanup raid map warning message

Fix a small cosmetic bug in a very rarely encountered
error message that can occur when a LD is in the
process of being deleted.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

smartpqi: update controller ids

Update the driver’s PCI IDs

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: remove the smp_handler stub

The SAS transport class will do the right thing and not register the BSG
node if no smp_handler method is present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit eaa79a6cd733e1f978613a5fcf5f7c1cdb38eb2a)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: change driver version to 1.1.2-125

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b98117caa0e3d99e4aee1114bcb03ae9ad02bf22)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: add in new controller ids

Update the driver’s PCI IDs to match the latest Microsemi controllers

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 557900640b06752fc6a7f6ed545ad1f8e00face9)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: update kexec and power down support

Add PQI reset to driver shutdown callback to work around controller bug.

During 1.) an OS shutdown or 2.) a kexec outside of a kdump, the Linux
kernel will clear BME on our controller.

If BME is cleared during a controller/host PCIe transfer, the controller
will lock up.

So we perform a PQI reset in the driver's shutdown callback function to
eliminate the possibility of a controller/host PCIe transfer being
active when the kernel clears BME immediately after calling the driver's
shutdown callback.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b6d478119edeaca964b46796fd26893b81f8a561)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: cleanup doorbell register usage.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4f078e24080626764896055d857719cd886e6321)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: update pqi passthru ioctl

- make pass-thru requests bi-directional

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 41555d540f18f72e8a52d5c4bc14c36413d09916)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: enhance BMIC cache flush

- distinguish between shutdown and non-shutdown.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 58322fe0069a2ae2a19cf29023cc0b82c7245762)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: add pqi reset quiesce support

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 336b68193165b1215d21dd05619dc262340e404b)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

scsi: smartpqi: make pdev pointer names consistent

Make all variable names for pointers to struct pci_dev consistent
throughout the driver.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d91d7820d39629fc67cea5d6721eac8b180b0451)

Orabug: 26943380

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

be2net: fix TSO6/GSO issue causing TX-stall on Lancer/BEx

IPv6 TSO requests with extension hdrs are a problem to the
Lancer and BEx chips. Workaround is to disable TSO6 feature
for such packets.

Also, on Lancer chips, an MSS of less than 256 resulted in a TX stall.
Fix this by disabling GSO when the MSS is less than 256.
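
A rough sketch of the two workarounds as a feature filter. The flag
names, struct pkt and features_check() are hypothetical and only model
the behaviour described here for an affected chip; they are not the
be2net code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch. */
#define F_TSO		(1u << 0)
#define F_TSO6		(1u << 1)
#define F_GSO_ALL	(F_TSO | F_TSO6)

#define MIN_GSO_MSS	256u	/* threshold from the commit message */

struct pkt {
	bool is_gso;		/* packet requests segmentation offload */
	bool ipv6_ext_hdrs;	/* IPv6 packet carrying extension headers */
	bool lancer;		/* device is a Lancer chip */
	uint16_t mss;
};

/* Returns the offload features that are safe to use for this packet. */
static uint32_t features_check(const struct pkt *p, uint32_t features)
{
	if (!p->is_gso)
		return features;
	if (p->ipv6_ext_hdrs)
		features &= ~F_TSO6;		/* no TSO6 with extension hdrs */
	if (p->lancer && p->mss < MIN_GSO_MSS)
		features &= ~F_GSO_ALL;		/* small MSS: no GSO at all */
	return features;
}
```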

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 822f8565c93949fb2d31502d595c8bc45629c9b7)

Orabug: 26943365

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

ovl: fix get_acl() on tmpfs

tmpfs doesn't have ->get_acl() because it only uses cached acls.

This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer
of the overlay.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Cc: <stable@vger.kernel.org> # v4.8
Orabug: 26975443

(backport upstream commit b93d4a0eb308d4400b84c8b24c1b80e09a9497d0)

Signed-off-by: Shan Hai <shan.hai@oracle.com>

ixgbe: Initialize 64-bit stats seqcounts

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 4197aa7bb818 ("ixgbevf: provide 64 bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26785078
(cherry picked from commit 7c3a4626eb65e78ebe208f48ffa21a5002f7f38e)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Disable flow control for XFI

Flow control autonegotiation is not supported for XFI. Make sure that
ixgbe_device_supports_autoneg_fc() returns false and
hw->fc.disable_fc_autoneg is set to true to avoid running the fc_autoneg
function for that device.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 7adbccbbb5beabe14f3a02ee41abdaa1801395b8)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Do not support flow control autonegotiation for X553

Flow control autonegotiation is not supported for fiber on X553. Add
device ID checks in ixgbe_device_supports_autoneg_fc() to return the
appropriate value.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit ae84dbf7ff485b3b59740c6ea69df0613f6cd4f7)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Update NW_MNG_IF_SEL support for X553

The MAC register NW_MNG_IF_SEL fields have been redefined for
X553. These changes impact the iXFI driver code flow. Since iXFI is
only supported in X552, add MAC checks for iXFI flows.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 48301cf22fa7d70db3ae777e374edfd4119fc826)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Enable LASI interrupts for X552 devices

Enable LASI interrupts on X552 devices in order to receive notifications of
link configurations of the external PHY and support the configuration of
the internal iXFI link since iXFI does not support auto-negotiation. This
is not required for X553 devices; add a check to avoid enabling LASI
interrupts for X553 devices.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 72f740b1013783c81da928cfe2ac82dd767c74f0)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Ensure MAC filter was added before setting MACVLAN

This patch adds a check to ensure that adding the MAC filter was
successful before setting the MACVLAN. If it was unsuccessful, propagate
the error.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 0e1ff3061cb529a70f03f63988a48f9fda8ed419)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: pci_set_drvdata must be called before register_netdev

We call pci_set_drvdata immediately after calling register_netdev,
which leaves a window where tasks writing to the sriov_numvfs sysfs
attribute can sneak in and crash the kernel. register_netdev cleans
up after itself so placing pci_set_drvdata immediately before it
should preserve the intent of commit 0fb6a55cc31f ("ixgbe: fix crash
on rmmod after probe fail").

Fixes: 0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe fail")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit a09c0fc3f5d775231f1884e0e66c495065a461ee)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Resolve cppcheck format string warning

cppcheck warns that the format string is incorrect in the function
ixgbe_get_strings(). Since the value cannot be negative, change the
variable to unsigned which matches the format specifier.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 4ebdf8af3017ce242f37be2ae5e5f655dc9846ef)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: fix writes to PFQDE

ixgbe_write_qde() was ignoring the qde parameter which resulted
in PFQDE.HIDE_VLAN not being set for X550.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit d28b194955a9b6e6ccf4383f1baba78bb5a528db)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbevf: Bump version number

Update ixgbevf version number.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit adc2c83e2b317de39220e0004b6556b5ea2bf412)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: Bump version number

Update ixgbe version number.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 01ec5525fc2a0fcc8f4b796b9bb4ee1c6a5d9415)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: check for Tx timestamp timeouts during watchdog

The ixgbe driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.

It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred we would never clear the state
bit in this case.

Add an ixgbe_ptp_tx_hang() function similar to the already existing
ixgbe_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.

Note: there is no currently known way to cause this without hacking the
driver code to force it.
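
A minimal sketch of such a watchdog check, modelled outside the driver.
struct ptp_state, ptp_tx_hang_check() and the timeout value are
assumptions for illustration; the real function operates on the ixgbe
adapter structure.

```c
#include <stdbool.h>

#define TX_TS_TIMEOUT_TICKS 15000ul	/* hypothetical timeout, in ticks */

struct ptp_state {
	bool tx_in_progress;		/* models the state bit lock */
	unsigned long tx_start;		/* tick at which the request began */
	unsigned long tx_timeouts;	/* number of recovered hangs */
};

/* Called periodically from the watchdog with the current tick count. */
static void ptp_tx_hang_check(struct ptp_state *s, unsigned long now)
{
	if (!s->tx_in_progress)
		return;

	/* Unsigned subtraction stays correct across counter wrap-around. */
	if (now - s->tx_start > TX_TS_TIMEOUT_TICKS) {
		s->tx_in_progress = false;	/* allow new requests again */
		s->tx_timeouts++;
	}
}
```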

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 622a2ef538fb3ca8eccf49716aba8267d6e95a47)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: add statistic indicating number of skipped Tx timestamps

The ixgbe driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 4cc74c01ef8bb59fae98aeda359e8bcf6148943a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: avoid permanent lock of *_PTP_TX_IN_PROGRESS

The ixgbe driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.

The state bit lock is not properly cleaned up during
ixgbe_xmit_frame_ring() if the transmit fails such as due to DMA or TSO
failure. In some hardware this results in blocking timestamps until the
service task times out. In other hardware this results in a permanent
lock of the timestamp bit because we never receive an interrupt
indicating the timestamp occurred, since indeed the packet was never
transmitted.

Fix this by checking for DMA and TSO errors in ixgbe_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit 5fef124d9c75942dc5c2445a3faa8ad37cbf4c82)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

ixgbe: fix race condition with PTP_TX_IN_PROGRESS bits

Hardware related to the ixgbe driver is limited to handling a single Tx
timestamp request at a time. Thus, the driver ignores requests for Tx
timestamp while waiting for the current request to finish. It uses
a state bit lock which enforces that only one timestamp request is
honored at a time.

Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying applications
of a new Tx timestamp. Even a well behaved application sending only one
packet at a time and waiting for a response can wake up and send a new
packet before the bit lock is cleared. This results in needlessly
dropping some Tx timestamp requests.

We can fix this by unlocking the state bit as soon as we read the
Timestamp register, as this is the first point at which it is safe to
unlock.

To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.

This ensures that well behaved applications do not accidentally race
with the unlock bit. Obviously an application which sends multiple Tx
timestamp requests at once will still only timestamp one packet at
a time. Unfortunately there is nothing we can do about this.
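
The ordering can be sketched with C11 atomics as follows. This is a
user-space model under stated assumptions: struct adapter, the function
names and the app_wakeup() callback are hypothetical, and an
atomic_bool stands in for the driver's state bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct adapter {
	atomic_bool tx_in_progress;	/* models the state bit lock */
	void *tx_skb;			/* packet awaiting its timestamp */
};

/* Returns false if another request is outstanding (request skipped). */
static bool request_tx_timestamp(struct adapter *a, void *skb)
{
	/* atomic_exchange returns the old value: true means busy. */
	if (atomic_exchange(&a->tx_in_progress, true))
		return false;
	a->tx_skb = skb;
	return true;
}

static void complete_tx_timestamp(struct adapter *a,
				  void (*notify)(struct adapter *, void *))
{
	void *skb = a->tx_skb;		/* keep a local copy of the pointer */

	a->tx_skb = NULL;		/* clear the shared pointer first */
	atomic_store(&a->tx_in_progress, false);	/* then unlock */
	notify(a, skb);			/* only now wake the application */
}

/* Models an application that sends its next packet right on wake-up. */
static bool resend_ok;
static int next_pkt;

static void app_wakeup(struct adapter *a, void *skb)
{
	(void)skb;
	resend_ok = request_tx_timestamp(a, &next_pkt);
}
```

Because the lock is released before the notification, the request made
from app_wakeup() is accepted instead of being dropped.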

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785078
(cherry picked from commit aaebaf50b502648b1d4d8c93b4be133944c2bbd0)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>

net: better skb->sender_cpu and skb->napi_id cohabitation

Orabug: 26953388
Orabug: 26591689

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.
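
A small sketch of the range split. The NR_CPUS value and the helper
names are assumptions chosen for illustration; only the three ranges
come from the description above.

```c
#define NR_CPUS 64u	/* hypothetical build-time CPU limit */

enum id_kind { ID_UNSET, ID_SENDER_CPU, ID_NAPI };

/* Classify a value taken from the shared storage field. */
static enum id_kind classify_id(unsigned int v)
{
	if (v == 0)
		return ID_UNSET;	/* common "not initialized" value */
	if (v <= NR_CPUS)
		return ID_SENDER_CPU;	/* [1 .. NR_CPUS] */
	return ID_NAPI;			/* [NR_CPUS+1 .. ~0U] */
}

/* Store a zero-based CPU number so that it never collides with 0. */
static unsigned int encode_sender_cpu(unsigned int cpu)
{
	return cpu + 1;
}
```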

This will allow proper busy polling support over tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek-rpm: Clean up installed directories when uninstalling kernel-ueknano

Orabug: 26929773

When creating kernel-ueknano package, the files (modules) to be included in
it are supplied from a input file. This input file lists the modules with
install path. When installing the rpm, parent directories for these
files are created automatically. When uninstalling, the modules get removed
but not the parent directories. Because of this, /lib/modules/<kversion>/kernel
and its subdirectories are left intact even after the package uninstall.

This commit adds post uninstall scriptlet to remove the
"/lib/modules/<kversion>/" directory when the package is uninstalled.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

uek-rpm: Add missing ko modules to nano rpm

Orabug: 26929773

The commit adds target_core_user.ko and dtrace modules to kernel-ueknano rpm.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq

When introducing the functions to read the NVM through the AdminQ, we
did not correctly mark the wb_desc.

Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722", 2015-06-05)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 3c8f3e96af3a6799841761923d000566645f0942)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: avoid NVM acquire deadlock during NVM update

X722 devices use the AdminQ to access the NVM, and this requires taking
the AdminQ lock. Because of this, we lock the AdminQ during
i40e_read_nvm(), which is also called in places where the lock is
already held, such as the firmware update path which wants to lock once
and then unlock when finished after performing several tasks.

Although this should have only affected X722 devices, commit
96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices",
2016-12-02) added locking for all NVM reads, regardless of device
family.

This resulted in us accidentally causing NVM acquire timeouts on all
devices, causing failed firmware updates which left the eeprom in
a corrupt state.

Create unsafe non-locked variants of i40e_read_nvm_word and
i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer
respectively. These variants will not take the NVM lock and are expected
to only be called in places where the NVM lock is already held if
needed.

Since the only caller of i40e_read_nvm_buffer() was in such a path,
remove it entirely in favor of the unsafe version. If necessary we can
always add it back in the future.

Additionally, we now need to hold the NVM lock in i40e_validate_checksum
because the call to i40e_calc_nvm_checksum now assumes that the NVM lock
is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD
up a bit so that we do not need to acquire the NVM lock twice.
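
The locked/unlocked split described above can be sketched in user space
roughly as follows. This is only an illustration, not the driver code: the
struct, the pthread mutex, the word layout and all function names are
assumptions, and the "_unlocked" suffix stands in for the "__" prefix the
driver uses.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the device NVM and its lock. */
#define NVM_WORDS         8
#define NVM_CHECKSUM_WORD 7

struct nvm {
	pthread_mutex_t lock;
	uint16_t words[NVM_WORDS];
};

/* Unlocked variant: the caller must already hold nvm->lock. */
static uint16_t nvm_read_word_unlocked(const struct nvm *nvm,
				       unsigned int offset)
{
	return nvm->words[offset];
}

/* Locked variant: takes the lock around a single read. */
static uint16_t nvm_read_word(struct nvm *nvm, unsigned int offset)
{
	uint16_t val;

	pthread_mutex_lock(&nvm->lock);
	val = nvm_read_word_unlocked(nvm, offset);
	pthread_mutex_unlock(&nvm->lock);
	return val;
}

/* Assumes the lock is held, so it only uses the unlocked read. */
static uint16_t nvm_calc_checksum_unlocked(const struct nvm *nvm)
{
	uint16_t sum = 0;
	unsigned int i;

	for (i = 0; i < NVM_WORDS; i++)
		if (i != NVM_CHECKSUM_WORD)
			sum = (uint16_t)(sum + nvm_read_word_unlocked(nvm, i));
	return sum;
}

/* Takes the lock once for both the stored word and the calculation. */
static int nvm_validate_checksum(struct nvm *nvm)
{
	uint16_t stored, calc;

	pthread_mutex_lock(&nvm->lock);
	stored = nvm_read_word_unlocked(nvm, NVM_CHECKSUM_WORD);
	calc = nvm_calc_checksum_unlocked(nvm);
	pthread_mutex_unlock(&nvm->lock);

	return calc == stored ? 0 : -1;
}
```

Calling nvm_read_word() from inside nvm_validate_checksum() would try to
take the same non-recursive lock twice, which is the kind of self-deadlock
the unlocked variants are meant to avoid.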

This should resolve the failed firmware updates and also fix a potential race that
could have caused the driver to report an invalid NVM checksum upon
driver load.

Reported-by: Stefan Assmann <sassmann@kpanic.de>
Fixes: 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02)
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 09f79fd49d94cda5837e9bfd0cb222232b3b6d9f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate

The dynamic ITR algorithm depends on a calculation of usecs which
assumes that the interrupts have been firing constantly at the interrupt
throttle rate. This is not guaranteed because we could have a low packet
rate, or have been polling in software.

We'll estimate whether this is the case by using jiffies to determine if
too much time has elapsed. If the elapsed jiffies exceed the expected
interval, the calculation is guaranteed to be incorrect. If they are
smaller, we might have been polling some, but the difference shouldn't
affect the calculation too much.

This ensures that we don't get stuck in BULK latency during certain rare
situations where we receive bursts of packets that force us into NAPI
polling.
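
A minimal sketch of such an elapsed-time check, written for user space. The
tick length, the function name and the parameters are assumptions for
illustration; the kernel's jiffies resolution depends on CONFIG_HZ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick length in microseconds. */
#define USECS_PER_TICK 1000u

/*
 * Returns true when more time has passed since the last ITR update than
 * the interval the throughput calculation assumes, in which case the
 * sample should not be used to pick a new latency range.
 */
static bool itr_sample_is_stale(uint32_t now_ticks, uint32_t last_update_ticks,
				uint32_t expected_usecs)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	uint32_t elapsed_ticks = now_ticks - last_update_ticks;
	uint64_t elapsed_usecs = (uint64_t)elapsed_ticks * USECS_PER_TICK;

	return elapsed_usecs > expected_usecs;
}
```

A caller would skip the dynamic update, or fall back to a conservative
setting, whenever this returns true.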

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 742c9875759c1858c3312442a78a80f3e93d82c4)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e/i40evf: remove ULTRA latency mode

Since commit c56625d59726 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for, including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such
as TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow resulting
in using incorrect values.

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode which is the opposite of what we want.
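
To illustrate the kind of overflow meant here, the sketch below computes a
packets-per-second estimate in 32 bits and in 64 bits. The formula and the
function names are assumptions chosen for illustration and are not taken
from the driver.

```c
#include <stdint.h>

/* 32-bit arithmetic: the product wraps modulo 2^32 for packets above 4294. */
static uint32_t pps_32bit(uint32_t packets, uint32_t usecs)
{
	return (packets * 1000000u) / usecs;
}

/* Widening to 64 bits before multiplying avoids the wraparound. */
static uint64_t pps_64bit(uint32_t packets, uint32_t usecs)
{
	return ((uint64_t)packets * 1000000u) / usecs;
}
```

With 5000 packets in 100 usecs the 64-bit version yields 50000000, while
the 32-bit version wraps and yields 7050327, so a threshold comparison can
go the wrong way.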

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyways. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 0a2c7722be1705edca34458bd9de2f97188f9636)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: invert logic for checking incorrect cpu vs irq affinity

In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did
not have the correct CPU. This is important since it was possible for
the IRQ affinity to be changed while the CPU is pegged at 100%. This can
result in the polling routine being stuck on the wrong CPU until
traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.
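
The inverted flow can be sketched as below. This is a simplified user-space
model, not the driver code: a 64-bit bitmap stands in for the cpumask, and
the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum poll_result {
	POLL_CONTINUE,		/* work remains, stay in polling mode */
	POLL_EXIT_REARM,	/* all work done, normal exit */
	POLL_EXIT_WRONG_CPU	/* leave polling so the IRQ can move */
};

/* Hypothetical cpumask stand-in: bit N set means CPU N is allowed. */
static bool cpu_in_mask(uint64_t mask, unsigned int cpu)
{
	return cpu < 64 && (mask & (UINT64_C(1) << cpu)) != 0;
}

static enum poll_result poll_done(bool clean_complete, unsigned int this_cpu,
				  uint64_t affinity_mask)
{
	if (!clean_complete) {
		/* Exceptional case is handled right at the check. */
		if (!cpu_in_mask(affinity_mask, this_cpu))
			return POLL_EXIT_WRONG_CPU;

		return POLL_CONTINUE;
	}

	return POLL_EXIT_REARM;
}
```

With a mask that includes every CPU, the wrong-CPU branch is never taken,
which is why a default all-CPU mask is harmless in this flow.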

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to setup the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since it's unlikely to be the common case.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 6d9777298b54bf1212fcaa6ee6679a430ceca452)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: initialize our affinity_mask based on cpu_possible_mask

On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.

The root problem is a mismatched affinity mask value. So let's initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the latter might change
after we've made a copy.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 759dc4a7e605e0dc21708b0a6e0816ed0ac82641)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: move enabling icr0 into i40e_update_enable_itr

If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll() easier to
read.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 9254c0e34e4253c41fdcd4670b754506ce20d3eb)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: remove workaround for resetting XPS

Since commit 3ffa037d7f78 ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Additionally, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit ba4460d45a6ec04e29e55e6c97edc0e842c18999)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: Fix for unused value issue found by static analysis

This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution. The condition at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 19279235bea221798e3307a8bec2c02559cab0c5)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: 25G FEC status improvements

This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 68e49702a1216bbf098ebfff954eeb8f6fd96415)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: force VMDQ device name truncation

In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.

When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes (the 'v', the literal '%', the 'd' and a terminating '\0'), we
would overrun the available size unless snprintf truncated for us.

The snprintf call will of course truncate on the end, so let's instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.
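
One way to force that truncation is a precision on the %s conversion, as in
the sketch below. The helper name is hypothetical and the IFNAMSIZ
definition is a local stand-in for the kernel constant.

```c
#include <stdio.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* interface name buffer size, including the NUL */
#endif

/*
 * Builds a "<base>v%d" name template in dst, which must hold IFNAMSIZ
 * bytes. The base name is cut to IFNAMSIZ - 4 characters so that the
 * 'v', the literal "%d" and the terminating NUL always fit.
 */
static int make_vmdq_name_template(char *dst, const char *base)
{
	return snprintf(dst, IFNAMSIZ, "%.*sv%%d", IFNAMSIZ - 4, base);
}
```

Because the precision caps the copied name at 12 characters, the suffix is
always preserved and the compiler can see that the output fits the buffer.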

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 8c9eb350aa7b66ab06f3e378dab3c7875a0bf83a)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40evf: fix possible snprintf truncation of q_vector->name

The q_vector names are based on the interface name with a driver prefix,
the type of q_vector setup, and the queue number. We previously set the
size of this variable to IFNAMSIZ + 9, which is incorrect, because we
actually include a minimum of 14 characters extra beyond the interface
name size.

New versions of GCC since 7 include a new warning that detects this
possible truncation and complains. We can fix this by increasing the
size in case our interface name is too large to avoid truncation. We
don't need to go beyond 14 because the compiler is smart enough to
realize our queue numbers can never exceed one digit. We do go up to 15 here
because possible future changes may increase the number of queues beyond
one digit.

While we are here, also change some variables to be unsigned (since they
are never negative) and stop using an extra unnecessary %s format
specifier.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 696ac80aa11fb80e641068123412cd397b460a0b)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: Use correct flag to enable egress traffic for unicast promisc

Although we usually set true promiscuous mode for both multicast and
unicast at the same time, each can be set individually. Using the
allmulti flag, which applies only to all-multicast, might therefore cause
unwanted behavior when mirroring egress traffic for unicast promiscuous
mode in a VF.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit e53b382f3a207690fc0411a3b39fbd21d7470cfc)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: prevent snprintf format specifier truncation

Increase the size of the prefix buffer so that it can hold enough
characters for every possible input. Although 20 is enough for all
expected inputs, it is possible for the values to be larger than
expected, resulting in a possibly truncated string. Additionally, let's
use sizeof(prefix) in order to ensure we use the correct size if we need
to change the array length in the future.

New versions of GCC starting at 7 now include warnings to prevent
truncation unless you handle the return code. At most 27 bytes can be
written here, so let's just increase the buffer size even if for all
expected hw->bus.* values we only needed 20.
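
A small sketch of the sizeof-based pattern with a truncation check on the
return value. The format string, the helper name and the parameter types
are assumptions for illustration rather than the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats a bus/device/function prefix into buf. Returns the number of
 * characters written, or -1 if the output was truncated or an error
 * occurred.
 */
static int format_prefix(char *buf, size_t len, uint16_t bus, uint16_t dev,
			 uint16_t func)
{
	int n = snprintf(buf, len, "i40e %02x:%02x.%x: ",
			 (unsigned int)bus, (unsigned int)dev,
			 (unsigned int)func);

	if (n < 0 || (size_t)n >= len)
		return -1;

	return n;
}
```

A caller that declares char prefix[27] and passes sizeof(prefix) keeps the
length argument correct if the array size is changed later.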

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit b5d5504aa1e961fc1f87ee7b092bf5ce1a7bf0de)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: Store the requested FEC information

Store information about the FEC modes that were requested. It will be
used by the function that prints link status information, so there is no
need to call the admin queue there.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit ed601f660131be6bb9a8a109b0f2bf031786100f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: Update state variable for adminq subtask

During NVM update, the state machine gets into an unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This fix updates the state variables so that adminq_subtask will have
accurate information whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 167d52edc4991e81012ef571643d0307aa2bb916)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: synchronize nvmupdate command and adminq subtask

During NVM update, the state machine gets into an unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This issue existed before but surfaced after commit 373149fc99a0
("i40e: Decrease the scope of rtnl lock").

This fix adds locking around admin queue command and update of
state variables so that adminq_subtask will have accurate information
whenever it gets scheduled.
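
The idea can be sketched in user space with a pthread mutex. All names
below (struct adapter, nvmupd_issue, the state enum, the stubbed command)
are hypothetical stand-ins, not the driver's API.

```c
#include <pthread.h>

enum nvmupd_state { NVMUPD_IDLE, NVMUPD_WAIT_EVENT };

struct adapter {
	pthread_mutex_t lock;		/* guards the three fields below */
	enum nvmupd_state state;
	int wait_opcode;
	int events_matched;
};

/* Stand-in for sending an admin queue command; always succeeds here. */
static int send_adminq_cmd(int opcode)
{
	(void)opcode;
	return 0;
}

/* Command and state update happen under one lock hold. */
static int nvmupd_issue(struct adapter *a, int opcode)
{
	int err;

	pthread_mutex_lock(&a->lock);
	err = send_adminq_cmd(opcode);
	if (!err) {
		a->wait_opcode = opcode;
		a->state = NVMUPD_WAIT_EVENT;
	}
	pthread_mutex_unlock(&a->lock);

	return err;
}

/* The subtask takes the same lock, so it never sees a half-updated state. */
static void clean_adminq_subtask(struct adapter *a, int completed_opcode)
{
	pthread_mutex_lock(&a->lock);
	if (a->state == NVMUPD_WAIT_EVENT &&
	    a->wait_opcode == completed_opcode) {
		a->state = NVMUPD_IDLE;
		a->events_matched++;
	}
	pthread_mutex_unlock(&a->lock);
}
```

Without the lock, the subtask could run between send_adminq_cmd() and the
state assignment, see NVMUPD_IDLE, and drop the completion it was supposed
to match.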

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 2bf01935ec5362aee6ff9ffc2476043af321aa42)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>

i40e: prevent changing ITR if adaptive-rx/tx enabled

Currently the driver allows the user to change (or even disable)
interrupt moderation if adaptive-rx/tx is enabled when this should
not be the case.

Adaptive RX/TX will not respect the user's ITR settings so
allowing the user to change it is weird. This bug would also
allow the user to disable interrupt moderation with adaptive-rx/tx
enabled which doesn't make much sense either.

This patch makes it such that if adaptive-rx/tx is enabled, the user
cannot make any manual adjustments to interrupt moderation. It also
makes it so that if ITR is disabled but adaptive-rx/tx is then
enabled, ITR will be re-enabled.
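
The resulting policy might look roughly like the sketch below. The struct,
the function name and the default value are assumptions for illustration
and do not reflect the driver's ethtool handler.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical default used when adaptive mode re-enables ITR. */
#define DEFAULT_ITR_USECS 50u

struct itr_cfg {
	bool adaptive;		/* adaptive-rx/tx enabled */
	unsigned int usecs;	/* 0 means interrupt moderation disabled */
};

static int set_coalesce(struct itr_cfg *cfg, bool want_adaptive,
			unsigned int want_usecs)
{
	if (want_adaptive) {
		/* No manual adjustment while adaptive mode is requested. */
		if (want_usecs != cfg->usecs)
			return -EINVAL;

		cfg->adaptive = true;
		/* Adaptive mode needs ITR on, so re-enable it if it was off. */
		if (cfg->usecs == 0)
			cfg->usecs = DEFAULT_ITR_USECS;
		return 0;
	}

	cfg->adaptive = false;
	cfg->usecs = want_usecs;
	return 0;
}
```

A request that turns adaptive mode on and changes the interval at the same
time is rejected as a whole, leaving the previous configuration untouched.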

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26785018
(cherry picked from commit 06b2decd924891b6c7570a91f91e11a5a8fed421)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com>