David Jeffery [Fri, 28 Aug 2015 04:50:45 +0000 (14:50 +1000)]
xfs: return errors from partial I/O failures to files
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.
The issue can occur when there is more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error. If a successful bio completes after any failed bio, no
error is reported due to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.
xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear. ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.
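The rule described above (only a failed bio writes, and only the first error is kept) can be sketched in plain user-space C. This is a minimal model, not the kernel code: struct ioend_sketch and end_bio_sketch() are hypothetical stand-ins for struct xfs_ioend and xfs_end_bio(), and the boolean uptodate plays the role of BIO_UPTODATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct xfs_ioend. */
struct ioend_sketch {
	int io_error;	/* 0 at allocation; holds the first failure seen */
};

/*
 * Model of one bio completing against the ioend. `uptodate` stands in
 * for BIO_UPTODATE and `error` for the bio's completion status.
 */
void end_bio_sketch(struct ioend_sketch *ioend, bool uptodate, int error)
{
	assert(ioend != NULL);

	/* Only a failed bio may write, and only if no error is stored yet. */
	if (!ioend->io_error && !uptodate)
		ioend->io_error = error;
}
```

With this shape, a successful bio that completes after a failed one leaves io_error untouched, so the failure still reaches the caller.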
Cc: stable@vger.kernel.org Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Orabug : 22681464
Mainline v4.3 commit c9eb256eda4420c06bb10f5e8fbdbe1a34bc98e0 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don't strip leading zeros from the crypto key ID when using it to construct
the struct key description as the signature in kernels up to and including
4.2 matched this aspect of the key. This means that 1 in 256 keys won't
actually match if their key ID begins with 00.
The key ID is stored in the module signature as binary and so must be
converted to text in order to invoke request_key() - but it isn't stripped
at this point.
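As an illustration of the conversion described above, the following user-space sketch turns a binary key ID into hex text while keeping every leading 00 byte. The function name key_id_to_hex() and its signature are assumptions made for this example; it is not the kernel's helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Convert a binary key ID to lowercase hex, two digits per byte.
 * Leading zero bytes are kept, so {0x00, 0xab} becomes "00ab".
 * Returns the number of characters written, or -1 if `out` is too small.
 */
int key_id_to_hex(const unsigned char *id, size_t len,
		  char *out, size_t out_size)
{
	size_t i;

	assert(id != NULL && out != NULL);

	if (out_size < 2 * len + 1)
		return -1;

	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, 3, "%02x", id[i]);
	out[2 * len] = '\0';

	return (int)(2 * len);
}
```

Because each byte always produces exactly two digits, a key ID beginning with 00 yields a description string of the same length as any other key ID.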
Something like this is likely to be observed in dmesg when the key is loaded:
The 'Loaded' line should show an extra '00' on the front of the hex string.
This problem should not affect 4.3-rc1 and onwards because there the key
should be matched on one of its auxiliary identities rather than the key
struct's description string.
Reported-by: Arjan van de Ven <arjan@linux.intel.com> Reported-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from mainline commit e7c87bef7de2417b219d4dbfe8d33a0098a8df54) Signed-off-by: Dan Duval <dan.duval@oracle.com>
Filipe Manana [Wed, 29 Jul 2015 16:21:17 +0000 (17:21 +0100)]
Btrfs: teach backref walking about backrefs with underflowed offset values
When cloning/deduplicating file extents (through the clone and extent_same
ioctls) we can get data back references with offset values that are a
result of an unsigned integer arithmetic underflow, that is, values that
are much larger than they could be otherwise.
This is not a problem when decrementing or dropping the back references
(happens when we overwrite the extents or punch a hole for example, through
__btrfs_drop_extents()), since we compute the same too large offset value,
but it is a problem for the backref walking code, used by an incremental
send and the ioctls that are used by the btrfs tool "inspect-internal"
commands, as it makes it miss the corresponding file extent items because
the search key is set for an extent item that starts at an offset matching
the exceptionally large offset value of the data back reference. For an
incremental send this causes the send ioctl to fail with -EIO.
So teach the backref walking code to deal with these cases by setting the
search key's offset to 0 if the backref's offset value is larger than
LLONG_MAX (the largest possible file offset). This makes sure the backref
walking code finds the corresponding file extent items at the expense of
scanning more items and leafs in the btree.
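The underflow and the fallback can be sketched in a few lines of user-space C. The helper names backref_offset() and search_key_offset() are hypothetical; only the arithmetic and the LLONG_MAX comparison come from the description above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Data back reference offset as produced by unsigned arithmetic:
 * file_offset - data_offset wraps around when data_offset is larger.
 */
uint64_t backref_offset(uint64_t file_offset, uint64_t data_offset)
{
	return file_offset - data_offset;
}

/*
 * Offset to use in the search key when walking backrefs. A stored
 * offset above LLONG_MAX cannot be a real file offset, so start the
 * search at 0 and accept scanning more items.
 */
uint64_t search_key_offset(uint64_t backref_off)
{
	if (backref_off > (uint64_t)LLONG_MAX)
		return 0;
	return backref_off;
}
```

For a clone to file offset 0 with a data offset of 16K, backref_offset() yields 18446744073709535232, and search_key_offset() maps that value to 0.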
Fixing the clone/dedup ioctls to not produce such underflowed results would
require major changes breaking backward compatibility, updating user space
tools, etc.
Simple reproducer case for fstests:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
# Create our test file with a single extent of 64K starting at file
# offset 128K.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo \
| _filter_xfs_io
# Now clone parts of the original extent into lower offsets of the file.
#
# The first clone operation adds a file extent item to file offset 0
# that points to our initial extent with a data offset of 16K. The
# corresponding data back reference in the extent tree has an offset of
# 18446744073709535232, which is the result of file_offset - data_offset
# = 0 - 16K.
#
# The second clone operation adds a file extent item to file offset 16K
# that points to our initial extent with a data offset of 48K. The
# corresponding data back reference in the extent tree has an offset of
# 18446744073709518848, which is the result of file_offset - data_offset
# = 16K - 48K.
#
# Those large back reference offsets (result of unsigned arithmetic
# underflow) confused the back reference walking code (used by an
# incremental send and the multiple inspect-internal ioctls) and made it
# miss the back references, which for the case of an incremental send it
# made it fail with -EIO and print a message like the following to
# dmesg:
#
# "BTRFS error (device sdc): did not find backref in send_root. \
# inode=257, offset=0, disk_byte=12845056 found extent=12845056"
#
$CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
$CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) \
-l $((16 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
echo "File digest in the original filesystem:"
md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
# Now recreate the filesystem by receiving both send streams and verify
# we get the same file contents that the original filesystem had.
_scratch_unmount
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount
echo "File digest in the new filesystem:"
md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
status=0
exit
The test's expected golden output is:
wrote 65536/65536 bytes at offset 131072
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
File digest in the original filesystem:
6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo
File digest in the new filesystem:
6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo
But it failed with:
(...)
@@ -1,7 +1,5 @@
QA output created by 097
wrote 65536/65536 bytes at offset 131072
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest in the original filesystem:
-6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo
-File digest in the new filesystem:
-6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo
...
Filipe Manana [Wed, 19 Aug 2015 10:09:40 +0000 (11:09 +0100)]
Btrfs: fix file read corruption after extent cloning and fsync
If we partially clone one extent of a file into a lower offset of the
file, fsync the file, power fail and then mount the fs to trigger log
replay, we can get multiple checksum items in the csum tree that overlap
each other and result in checksum lookup failures later. Those failures
can make file data read requests assume a checksum value of 0, but they
will not return an error (-EIO for example) to userspace exactly because
the expected checksum value 0 is a special value that makes the read bio
endio callback return success and set all the bytes of the corresponding
page with the value 0x01 (at fs/btrfs/inode.c:__readpage_endio_check()).
From a userspace perspective this is equivalent to file corruption
because we are not returning what was written to the file.
Details about how this can happen, and why, are included inline in the
following reproducer test case for fstests and the comment added to
tree-log.c.
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_cloner
_require_metadata_journaling $SCRATCH_DEV
# Create our test file with a single 100K extent starting at file
# offset 800K. We fsync the file here to make sure the fsync log tree gets
# a single csum item that covers the whole 100K extent, which causes
# the second fsync, done after the cloning operation below, to not
# leave in the log tree two csum items covering two sub-ranges
# ([0, 20K[ and [20K, 100K[) of our extent.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 800K 100K" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now clone part of our extent into file offset 400K. This adds a file
# extent item to our inode's metadata that points to the 100K extent
# we created before, using a data offset of 20K and a data length of
# 20K, so that it refers to the sub-range [20K, 40K[ of our original
# extent.
$CLONER_PROG -s $((800 * 1024 + 20 * 1024)) -d $((400 * 1024)) \
-l $((20 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Now fsync our file to make sure the extent cloning is durably
# persisted. This fsync will not add a second csum item to the log
# tree containing the checksums for the blocks in the sub-range
# [20K, 40K[ of our extent, because there was already a csum item in
# the log tree covering the whole extent, added by the first fsync
# we did before.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
# Silently drop all writes and unmount to simulate a crash/power
# failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again, mount to trigger log replay and validate file
# contents.
# The fsync log replay first processes the file extent item
# corresponding to the file offset 400K (the one which refers to the
# [20K, 40K[ sub-range of our 100K extent) and then processes the file
# extent item for file offset 800K. It used to happen that when
# processing the latter, it erroneously left in the csum tree 2 csum
# items that overlapped each other, 1 for the sub-range [20K, 40K[ and
# 1 for the whole range of our extent. This introduced a problem where
# subsequent lookups for the checksums of blocks within the range
# [40K, 100K[ of our extent would not find anything because lookups in
# the csum tree ended up looking only at the smaller csum item, the
# one covering the subrange [20K, 40K[. This made read requests assume
# an expected checksum with a value of 0 for those blocks, which caused
# checksum verification failure when the read operations finished.
# However those checksum failures did not result in read requests
# returning an error to user space (-EIO, for example) because the
# expected checksum value had the special value 0, and in that case
# btrfs set all bytes of the corresponding pages with the value 0x01
# and produced the following warning in dmesg/syslog:
#
# "BTRFS warning (device dm-0): csum failed ino 257 off 917504 csum\
# 1322675045 expected csum 0"
#
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
echo "File digest after log replay:"
# Must match the same digest it had after cloning the extent and
# before the power failure happened.
md5sum $SCRATCH_MNT/foo | _filter_scratch
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from mainline commit b84b8390d6009cde5134f775a251103c14bbed74) Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account for the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.
Analysis on what the check in adjust_branches() is currently doing:
/* adjust offset of jmps if necessary */
if (i < pos && i + insn->off + 1 > pos)
insn->off += delta;
else if (i > pos && i + insn->off + 1 < pos)
insn->off -= delta;
First case is if we cross pos-boundary and the jump instruction was
before pos. This is handled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn->off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment is needed. That part is correct.
Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
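A user-space sketch of the adjusted check, following points i) and ii) above. It reads "both tests" as the two tests of the backward-jump branch, which is an assumption. struct insn_sketch and adjust_branches_sketch() are simplified, hypothetical stand-ins for the eBPF instruction and adjust_branches(); the program passed in is the already patched one, where the instruction at pos has been expanded into delta + 1 instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified instruction: a jump targets index + off + 1. */
struct insn_sketch {
	bool is_jmp;
	int off;
};

/*
 * `prog` is the program after patching: the insn that was at `pos`
 * now occupies pos .. pos + delta. Jump offsets still hold their
 * pre-patch values and are fixed up here.
 */
void adjust_branches_sketch(struct insn_sketch *prog, int len,
			    int pos, int delta)
{
	int i;

	assert(prog != NULL && delta >= 0);

	for (i = 0; i < len; i++) {
		struct insn_sketch *insn = &prog[i];

		if (!insn->is_jmp)
			continue;

		/* Forward jump across the patched region. */
		if (i < pos && i + insn->off + 1 > pos)
			insn->off += delta;
		/* Backward jump across it; pos itself may be the target. */
		else if (i > pos + delta && i + insn->off + 1 <= pos + delta)
			insn->off -= delta;
	}
}
```

For example, with pos = 1 and delta = 2, a backward jump that ends up at index 5 with off = -3 (originally targeting the patched instruction) is rewritten to off = -5, so it lands on index 1 again.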
Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a1b14d27ed0965838350f1377ff97c93ee383492) Signed-off-by: Brian Maly <brian.maly@oracle.com>
The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.
Found by KASAN.
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 07d86ca93db7e5cdf4743564d98292042ec21af7) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Jason Luo [Thu, 25 Feb 2016 08:52:10 +0000 (16:52 +0800)]
bio: Fix kabi error
The two commits:
bio: skip atomic inc/dec of ->bi_remaining for non-chains
bio: skip atomic inc/dec of ->bi_cnt for most use cases
rename some members of struct bio, which causes KABI changes.
Orabug: 22820562 Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
Venkat Venkatsubra [Tue, 1 Mar 2016 22:27:28 +0000 (14:27 -0800)]
RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
This problem occurs when the user gets notified of a successful
rdma write + bcopy message completion but the peer application
does not receive the bcopy message. This happens during a port down/up test.
What seems to happen is the rdma write succeeds but the bcopy message fails.
RDS should not be returning successful completion status to the user
in this case.
When RDS does a rdma followed by a bcopy message the user notification is
supposed to be implemented by method #3 below.
/* If the user asked for a completion notification on this
* message, we can implement three different semantics:
* 1. Notify when we received the ACK on the RDS message
* that was queued with the RDMA. This provides reliable
* notification of RDMA status at the expense of a one-way
* packet delay.
* 2. Notify when the IB stack gives us the completion event for
* the RDMA operation.
* 3. Notify when the IB stack gives us the completion event for
* the accompanying RDS messages.
* Here, we implement approach #3. To implement approach #2,
* we would need to take an event for the rdma WR. To implement #1,
* don't call rds_rdma_send_complete at all, and fall back to the notify
* handling in the ACK processing code.
But unfortunately the user gets notified before the bcopy send status
is known. Right after the rdma write completes, the user gets notified
even though the subsequent bcopy eventually fails.
The fix is to delay signaling completions of rdma op till the
bcopy send completes.
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:23:05 +0000 (19:23 -0800)]
rds: add infrastructure to find more details for reconnect failure
This patch adds run-time support to debug scenarios where reconnect is
not successful for a certain time.
We add two sysctl variables for start time and end time. These are the
number of seconds after reconnect was initiated.
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:18:28 +0000 (19:18 -0800)]
rds: find connection drop reason
This patch attempts to find connection drop details.
The rationale for adding this type of patch is that there are too many
places from which a connection can get dropped.
In some cases, we don't have any idea of the source of the
connection drop. This is especially painful for issues which
are reproducible in the customer environment only.
The idea here is to have a tracker variable which keeps the latest value
of the connection drop source.
We can fetch that tracker variable as needed.
Existing Intel xHCI controllers require a delay of 1 ms
after setting the CMD_RESET bit in the command register, before
accessing any HC registers. This allows the HC to complete
the reset operation and be ready for HC register access.
Without this delay, a subsequent HC register access
may, very rarely, result in a system hang.
Verified CherryView / Braswell platforms go through over
5000 warm reboot cycles (which was not possible without
this patch), without any xHCI reset hang.
Signed-off-by: Rajmohan Mani <rajmohan.mani@intel.com> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a5964396190d0c40dd549c23848c282fffa5d1f2) Signed-off-by: Dan Duval <dan.duval@oracle.com>
Don Brace [Fri, 6 Nov 2015 14:04:55 +0000 (06:04 -0800)]
hpsa: fix rmmod issues
The driver is calling hpsa_shutdown before calling scsi_remove_host.
hpsa_shutdown is disabling interrupts.
scsi_remove_host can trigger I/O operations, such as
SYNCHRONIZE CACHE when multipath is enabled, which hangs the system.
Call scsi_remove_host before calling hpsa_shutdown.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
shane.seymour [Fri, 6 Nov 2015 14:04:55 +0000 (06:04 -0800)]
hpsa: fix issues with multilun devices
A regression was introduced into the hpsa driver a while back so
non-zero LUNs of multi-LUN devices may no longer be presented via
a SAS based Smart Array. I have not done a bisection to discover
the change that caused it.
The CISS firmware specification (available on sourceforge)
defines an 8 byte lunid that describes devices that the Smart
Array can see/present to the system. The current code in the hpsa
driver attempts to find matches for non-zero LUNs with LUN 0 for
a bus/target by zeroing out byte 4 of the lunid and find a match.
This method is sufficient for SCSI based Smart Arrays because
byte 5 is always 0. For SAS based Smart arrays byte 5 of the
lunid contains the path number for a multipath device and
either one or two bits (the documentation does not define how
many bits are used but it appears it may be one only) that
indicate if the given path number in byte 5 must always be
used to access that device. Byte 5 may not always be zero.
The following are lunids (spaces added for clarity) for a
MSL2024 single drive library connected via a H241 Smart Array:
In the 4th byte (counting from 0) you can see that the tape
is LUN 0 and the changer is LUN 1. The 0x80 set in the 5th byte
for the tape drive means the driver should force access to
path 0 (the library in this case was connected to one path only
anyway).
After the changes we can see the following in the dmesg output:
scsi 0:3:0:0: RAID HP H241 1.18 \
PQ: 0 ANSI: 5
scsi 0:2:0:0: Sequential-Access HP Ultrium 6-SCSI 354W \
PQ: 0 ANSI: 6
scsi 0:2:0:1: Medium Changer HP MSL G3 Series 8.70 \
PQ: 0 ANSI: 5
Showing that the changer is correctly identified as LUN 1 of
bus 2 target 0. Before the change the changer device is not seen.
Suggested-by: shane.seymour <shane.seymour@hp.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Scott Benesh [Fri, 6 Nov 2015 14:04:54 +0000 (06:04 -0800)]
hpsa: add in new offline mode
prevent adding volumes that are not available.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Don Brace [Fri, 6 Nov 2015 14:04:54 +0000 (06:04 -0800)]
hpsa: correct static checker warnings on driver init cleanup
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Don Brace [Fri, 6 Nov 2015 14:04:54 +0000 (06:04 -0800)]
hpsa: correct decode sense data
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Don Brace [Fri, 6 Nov 2015 14:04:53 +0000 (06:04 -0800)]
hpsa: Correct double unlock of mutex
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Don Brace [Fri, 6 Nov 2015 13:47:45 +0000 (05:47 -0800)]
hpsa: change driver version
update driver version
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:45 +0000 (05:47 -0800)]
hpsa: cleanup reset
Synchronize completion of the reset with completion of outstanding commands
Extending the newly-added synchronous abort functionality,
now also synchronize resets with the completion of outstanding commands.
Rename the wait queue to reflect the fact that it's being used for both
types of waits. Also, don't complete commands which are terminated
due to a reset operation.
fix for controller lockup during reset
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:45 +0000 (05:47 -0800)]
hpsa: propagate the error code in hpsa_kdump_soft_reset
If hpsa_wait_for_board_state fails, hpsa_kdump_soft_reset
should propagate its return value (e.g., -ENODEV) rather
than just returning -1.
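The change amounts to returning the callee's status unchanged. A minimal user-space sketch, in which kdump_soft_reset_sketch() and the two stub board-state functions are hypothetical stand-ins for hpsa_kdump_soft_reset() and hpsa_wait_for_board_state():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a call to hpsa_wait_for_board_state(). */
typedef int (*wait_fn)(void);

/* Stub: the board never reaches the expected state. */
int board_never_ready(void)
{
	return -ENODEV;
}

/* Stub: the board reaches the expected state. */
int board_ready(void)
{
	return 0;
}

int kdump_soft_reset_sketch(wait_fn wait_for_board_state)
{
	int rc;

	assert(wait_for_board_state != NULL);

	rc = wait_for_board_state();
	if (rc)
		return rc;	/* propagate e.g. -ENODEV, not a bare -1 */

	return 0;
}
```

Callers can then distinguish failure causes by errno value instead of seeing a generic -1.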
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: use scsi host_no as hpsa controller number
Rather than numbering the hpsa controllers with an
incrementing 0..n value (e.g., that shows up in
/proc/interrupts), use the scsi midlayer
host_no (e.g. matching /sys/class/scsi_host/hostNN).
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: use block layer tag for command allocation
Rework slave allocation:
- separate the tagging support setup from the hostdata setup
- make the hostdata setup act consistently when the lookup fails
- make the hostdata setup act consistently when the device is not added
- set up the queue depth consistently across these scenarios
- if the block layer mq support is not available, explicitly enable and
activate the SCSI layer tcq support (and do this at allocation-time so
that the tags will be available for INQUIRY commands)
Tweak slave configuration so that devices which are masked are also
not attached.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: add interrupt number to /proc/interrupts interrupt name
Add the interrupt number to the interrupt names that
appear in /proc/interrupts, so they are unique
Also, delete the IRQ and DAC prints. Other parts of the kernel
already print the IRQ assignments, and dual-address-cycle support
has not been interesting since the parallel PCI bus went from
32 to 64 bits wide.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: create workqueue after the driver is ready for use
Don't create the resubmit workqueue in hpsa_init_one until everything else
is ready to use, so everything can be freed in reverse order of when they
were allocated without risking freeing things while workqueue items are
still active.
Destroy the workqueue in the right order in
hpsa_undo_allocations_after_kdump_soft_reset too.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: fix try_soft_reset error handling
If registering the special interrupt handlers in hpsa_init_one
before a soft reset fails, the error exit needs to deallocate
everything that was allocated before.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: cleanup for init_one step 2 in kdump
In hpsa_undo_allocations_after_kdump_soft_reset,
the things allocated in hpsa_init_one step 2 -
h->resubmit_wq and h->lockup_detected need to
be freed, in the right order.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:44 +0000 (05:47 -0800)]
hpsa: skip free_irq calls if irqs are not allocated
If try_soft_reset fails to re-allocate irqs, the error exit
starts with free_irq calls, which generate kernel WARN
messages since they were already freed a few lines earlier.
Jump to the next exit label to skip the free_irq calls.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: call pci_release_regions after pci_disable_device
Despite the fact that PCI devices are enabled in this order:
1. pci_enable_device
2. pci_request_regions
Documentation/PCI/pci.txt specifies that they be undone
in this order
1. pci_disable_device
2. pci_release_regions
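The two orderings can be made concrete with a small user-space model that only records which step ran when. Everything here (enum pci_step, struct step_log, pci_init_sketch(), pci_teardown_sketch()) is hypothetical scaffolding for illustration; no real PCI calls are made.

```c
#include <assert.h>
#include <stddef.h>

enum pci_step { STEP_ENABLE, STEP_REQUEST, STEP_DISABLE, STEP_RELEASE };

#define MAX_STEPS 8

/* Records the order in which the modelled PCI calls happen. */
struct step_log {
	enum pci_step steps[MAX_STEPS];
	size_t count;
};

static void record(struct step_log *log, enum pci_step step)
{
	assert(log != NULL && log->count < MAX_STEPS);
	log->steps[log->count++] = step;
}

/* Setup order: enable the device, then request its regions. */
void pci_init_sketch(struct step_log *log)
{
	record(log, STEP_ENABLE);	/* models pci_enable_device */
	record(log, STEP_REQUEST);	/* models pci_request_regions */
}

/* Teardown order per the text above: disable first, then release. */
void pci_teardown_sketch(struct step_log *log)
{
	record(log, STEP_DISABLE);	/* models pci_disable_device */
	record(log, STEP_RELEASE);	/* models pci_release_regions */
}
```

Note that the teardown is not the mirror image of the setup, which is the point the commit message makes.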
Tested by injecting error in the call to pci_enable_device
in hpsa_init_one -> hpsa_pci_init:
[ 9.095001] hpsa 0000:04:00.0: failed to enable PCI device
[ 9.095005] hpsa: probe of 0000:04:00.0 failed with error -22
(-22 is -EINVAL)
and then in the call pci_request_regions:
[ 9.178623] hpsa 0000:04:00.0: failed to obtain PCI resources
[ 9.178671] hpsa: probe of 0000:04:00.0 failed with error -16
(-16 is -EBUSY)
and then by adding
reset_devices
to the kernel command line and inject errors into the two
calls to pci_enable_device and the call to pci_request_regions
in hpsa_init_one -> hpsa_init_reset_devices.
(inject on 6th call, 1st to hpsa2)
[ 62.413750] hpsa 0000:04:00.0: Failed to enable PCI device
(inject on 7th call, 2nd to hpsa2)
[ 62.807571] hpsa 0000:04:00.0: failed to enable device.
(inject on 8th call, 3rd to hpsa2)
[ 62.697198] hpsa 0000:04:00.0: failed to obtain PCI resources
[ 62.697234] hpsa: probe of 0000:04:00.0 failed with error -16
The reset_devices path calls return -ENODEV on failure
rather than passing the result, which apparently doesn't
cause the pci driver to print anything.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: performance tweak for hpsa_scatter_gather()
Divide the loop in hpsa_scatter_gather() into two, one for the initial SG list
and a second one for the chained list, if any. This allows the conditional
check which resets the indices for the chained list to be performed outside
the loop instead of being done on every iteration inside the loop.
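The loop split can be illustrated with a simplified user-space model. The types and sizes below (struct sg_lists, INLINE_MAX, CHAIN_MAX, plain int entries) are assumptions for the sketch and do not reflect the driver's real SG descriptors or how it reserves a slot for the chain pointer.

```c
#include <assert.h>
#include <stddef.h>

#define INLINE_MAX 4	/* hypothetical size of the initial SG list */
#define CHAIN_MAX 16	/* hypothetical size of the chained SG list */

struct sg_lists {
	int inline_sg[INLINE_MAX];
	int chained_sg[CHAIN_MAX];
	size_t inline_used;
	size_t chained_used;
};

/* Distribute n entries over the initial list and, if needed, the chain. */
void fill_sg_lists(struct sg_lists *out, const int *entries, size_t n)
{
	size_t inline_n = n < INLINE_MAX ? n : INLINE_MAX;
	size_t i;

	assert(out != NULL && entries != NULL);
	assert(n <= INLINE_MAX + CHAIN_MAX);

	/* Loop 1: the initial list, with no per-iteration chain check. */
	for (i = 0; i < inline_n; i++)
		out->inline_sg[i] = entries[i];
	out->inline_used = inline_n;

	/* Loop 2: the chained list; the index reset happens once, here. */
	for (i = inline_n; i < n; i++)
		out->chained_sg[i - inline_n] = entries[i];
	out->chained_used = n - inline_n;
}
```

When n does not exceed INLINE_MAX, the second loop simply does not execute, so the common case pays for no chain-related branching inside the loop body.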
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: refactor and rework support for sending TEST_UNIT_READY
Factor out the code which sends the TEST_UNIT_READY from
wait_for_device_to_become_ready() into its own function.
Move the code which waits for the TEST_UNIT_READY from
wait_for_device_to_become_ready() into its own function.
If a logical drive has failed, resetting it will ensure
outstanding commands are completed, but polling it with
TURs after the reset will not work because the TURs will
never report good status. So successful TUR should not
be a condition of success for the device reset error
handler.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: don't return abort request until target is complete
Don't return from the abort request until the target command is complete.
Mark outstanding commands which have a pending abort, and do not send them
to the host if we can avoid it.
If the current command has been aborted, do not call the SCSI command
completion routine from the I/O path: when the abort returns successfully,
the SCSI mid-layer will handle the completion implicitly.
The following race was possible in theory.
1. LLD is requested to abort a scsi command
2. scsi command completes
3. The struct CommandList associated with 2 is made available.
4. new io request to LLD to another LUN re-uses struct CommandList
5. abort handler follows scsi_cmnd->host_scribble and
finds struct CommandList and tries to abort it.
Now we have aborted the wrong command.
Fix by resetting the scsi_cmd field of struct CommandList
upon completion and making the abort handler check that
the scsi_cmd pointer in the CommandList struct matches the
scsi_cmnd that it has been asked to abort.
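
The pointer check described above might look roughly like this. It is a sketch under stated assumptions: `scsi_cmnd_stub` and `command_stub` are hypothetical stand-ins for the real scsi_cmnd and CommandList structures, and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct scsi_cmnd and struct CommandList. */
struct scsi_cmnd_stub {
    void *host_scribble;              /* points at the driver's command */
};

struct command_stub {
    struct scsi_cmnd_stub *scsi_cmd;  /* back-pointer, reset on completion */
};

/* On completion, break the back-pointer so a stale lookup cannot match. */
static void complete_command(struct command_stub *c)
{
    c->scsi_cmd = NULL;
}

/* The abort handler aborts only if the command still belongs to sc. */
static bool abort_target_still_valid(const struct scsi_cmnd_stub *sc)
{
    const struct command_stub *c = sc->host_scribble;

    return c != NULL && c->scsi_cmd == sc;
}
```

If the command structure was completed or re-used for another request in the meantime, the comparison fails and the handler can skip the abort instead of hitting the wrong command.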
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: use helper routines for finishing commands
cleanup command completions
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: add support sending aborts to physical devices via the ioaccel2 path
add support for tmf when in ioaccel2 mode
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:43 +0000 (05:47 -0800)]
hpsa: do not print ioaccel2 warning messages about unusual completions.
The SCSI midlayer already prints more detail about completions,
and has logging level options to filter them if not wanted.
These just slow down the system if a lot of errors occur,
stressing error handling even more.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: clean up some error reporting output in abort handler
report more useful information on aborts
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: clean up driver init
Improve initialization error handling in hpsa_init_one
Clean up style and indent issues
Rename functions for consistency
Improve error messaging on allocations
Fix return status from hpsa_put_ctlr_into_performant_mode
Correct free order in hpsa_init_one using new function
hpsa_free_performant_mode
Prevent inadvertent use of null pointers by nulling out the parent structures
and zeroing out associated size variables.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: correct return values from driver functions.
correct return codes for error conditions
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: do not check cmd_alloc return value - it cannot return NULL
cmd_alloc can no longer return NULL, so don't check for NULL any more
(which is unreachable code).
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Joe Handzik [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: add more ioaccel2 error handling, including underrun statuses.
improve ioaccel2 error handling, including better handling of
underrun statuses
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: add ioaccel sg chaining for the ioaccel2 path
Increase the request size for ioaccel2 path.
The error, if any, returned by hpsa_allocate_ioaccel2_sg_chain_blocks
to hpsa_alloc_ioaccel2_cmd_and_bft should be returned upstream rather
than assumed to be -ENOMEM.
This differs slightly from hpsa_alloc_ioaccel1_cmd_and_bft,
which does not call another hpsa_allocate function and only
has -ENOMEM to return from some kmalloc calls.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:42 +0000 (05:47 -0800)]
hpsa: refactor freeing of resources into more logical functions
refactor freeing of resources into more logical functions
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: clean up error handling
refactor error cleanup and shutdown
disable interrupts and pci_disable_device on critical failures
add hpsa_free_cfgtables function
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: break hpsa_free_irqs_and_disable_msix into two functions
replace calls to hpsa_free_irqs_and_disable_msix with
hpsa_free_irqs and hpsa_disable_interrupt_mode
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Joe Handzik [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: Get queue depth from identify physical bmic for physical disks.
get drive queue depth to help avoid task set full conditions.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Joe Handzik [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: use ioaccel2 path to submit IOs to physical drives in HBA mode.
use ioaccel2 path to submit I/O to physical drives in HBA mode
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: print accurate SSD Smart Path Enabled status
offload_enabled changes are deferred until after the
added/updated prints occur, so the values are incorrect.
defer printing SSD Smart Path Enabled status information until the
information is correct
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: factor out hpsa_ioaccel_submit function
clean up command submission
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:41 +0000 (05:47 -0800)]
hpsa: try resubmitting down raid path on task set full
allow the controller firmware to queue up commands when the ioaccel device
queue is full.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: do not ignore return value of hpsa_register_scsi
add error handling for failure when registering with SCSI subsystem.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: factor out hpsa_init_cmd function
Factor out hpsa_cmd_init from cmd_alloc(). We also need
this for resubmitting commands down the default RAID path
when they have returned from the ioaccel paths with errors.
In particular, reinitialize the cmd_type and busaddr fields as these
will not be correct for submitting down the RAID stack path
after ioaccel command completion.
This saves time when submitting commands.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Robert Elliott [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: make function names consistent
make function names more consistent and meaningful
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: allow lockup detected to be viewed via sysfs
expose a detected lockup via sysfs
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: hpsa decode sense data for io and tmf
In hba mode, we could get sense data in descriptor format so
we need to handle that.
It's possible for CommandStatus to have value 0x0D
"TMF Function Status", which we should handle. We will get
this from a P1224 when aborting a non-existent tag, for
example. The "ScsiStatus" field of the errinfo field
will contain the TMF function status value.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: decrement h->commands_outstanding in fail_all_outstanding_cmds
make tracking of outstanding commands more robust
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:40 +0000 (05:47 -0800)]
hpsa: clean up aborts
Do not send aborts to logical devices that do not support aborts
Instead of relying on what the Smart Array claims for supporting logical
drives, simply try an abort and see how it responds at device discovery
time. This way devices that do support aborts (e.g. MSA2000) can work
and we do not waste time trying to send aborts to logical drives that do
not support them (important for high IOPS devices.)
While rescanning devices only test whether devices support aborts
the first time we encounter a device rather than every time.
Some Smart Arrays required aborts to be sent with tags in
the wrong endian byte order. To avoid having to know about
this, we would send two aborts with tags with each endian order.
On high IOPS devices, this turns out to be not such a hot idea.
So we now have a list of the devices that got the tag backwards,
and we only send it one way.
If all available commands are outstanding and the abort handler
is invoked, the abort handler may not be able to allocate a command
and may busy-wait excessively. Reserve a small number of commands
for the abort handler and limit the number of concurrent abort
requests to the number of reserved commands.
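
One way to picture the reserved-command limit is a small counter of abort slots. The sketch below uses C11 atomics in userspace; the names and the value of `RESERVED_ABORT_CMDS` are assumptions for illustration and are not taken from the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESERVED_ABORT_CMDS 2   /* hypothetical reserve size */

static atomic_int abort_cmds_avail = RESERVED_ABORT_CMDS;

/* Try to take one reserved slot; fail at once instead of busy-waiting. */
static bool abort_slot_acquire(void)
{
    int cur = atomic_load(&abort_cmds_avail);

    while (cur > 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&abort_cmds_avail, &cur, cur - 1))
            return true;
    }
    return false;
}

/* Return the slot once the abort request has finished. */
static void abort_slot_release(void)
{
    atomic_fetch_add(&abort_cmds_avail, 1);
}
```

Because the number of concurrent aborts never exceeds the reserve, an abort that does get a slot is assured of a command to send.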
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:39 +0000 (05:47 -0800)]
hpsa: rework controller command submission
Allow driver initiated commands to have a timeout. It does not
yet try to do anything with timeouts on such commands.
We are sending a reset in order to get rid of a command we want to abort.
If we make it return on the same reply queue as the command we want to abort,
the completion of the aborted command will not race with the completion of
the reset command.
Rename hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd(), since
this function is the interface for issuing commands to the controller and
not the "core" of that implementation. Add a parameter to it which allows
the caller to specify the reply queue to be used. Modify existing callers
to specify the default reply queue.
Rename __hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd_core(),
since this routine is the "core" implementation of the "do simple command"
function and there is no longer any other function with a similar name.
Modify the existing callers of this routine (other than
hpsa_scsi_do_simple_cmd()) to instead call hpsa_scsi_do_simple_cmd(), since
it will now accept the reply_queue parameter, and it provides a controller
lock-up check. (Also, tweak two related message strings to make them
distinct from each other.)
Submitting a command to a locked up controller always results in a timeout,
so check for controller lock-up before submitting.
This is to enable fixing a race between command completions and
abort completions on different reply queues in a subsequent patch.
We want to be able to specify which reply queue an abort completion
should occur on so that it cannot race the completion of the command
it is trying to abort.
The following race was possible in theory:
1. Abort command is sent to hardware.
2. Command to be aborted simultaneously completes on another
reply queue.
3. Hardware receives abort command, decides command has already
completed and indicates this to the driver via another different
reply queue.
4. driver processes abort completion finds that the hardware does not know
about the command, concludes that therefore the command cannot complete,
returns SUCCESS indicating to the mid-layer that the scsi_cmnd may be
re-used.
5. Command from step 2 is processed and completed back to scsi mid
layer (after we already promised that would never happen.)
Fix by forcing aborts to complete on the same reply queue as the command
they are aborting.
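
In outline, the fix amounts to copying the reply queue of the target command into the abort request. The sketch below assumes a hypothetical `cmd_stub` with a `reply_queue` field; the real command structures and submission path differ.

```c
/* Hypothetical, simplified command record. */
struct cmd_stub {
    unsigned int tag;
    int reply_queue;    /* queue on which the completion will arrive */
};

/*
 * Build an abort for `target`. Routing the abort completion to the same
 * reply queue as the target means the two completions are seen in order
 * on one queue rather than racing across two.
 */
static void build_abort(struct cmd_stub *abort_cmd,
                        const struct cmd_stub *target,
                        unsigned int abort_tag)
{
    abort_cmd->tag = abort_tag;
    abort_cmd->reply_queue = target->reply_queue;
}
```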
Piggybacking device rescanning functionality onto the lockup
detection thread is not a good idea because if the controller
locks up during device rescanning, then the thread could get
stuck, then the lockup isn't detected. Use separate work
queues for device rescanning and lockup detection.
Detect controller lockup in abort handler.
After a lockup is detected, return DID_NO_CONNECT which results in immediate
termination of commands rather than DID_ERROR which results in retries.
Modify detect_controller_lockup() to return the result, to remove the need for a separate check.
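
A rough sketch of the lockup handling described above, assuming a heartbeat-style check. All names and result codes here are hypothetical stand-ins and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Stand-in result codes; the kernel's host byte values are not used here. */
enum stub_result { STUB_SUBMITTED, STUB_NO_CONNECT };

struct ctlr_stub {
    unsigned int heartbeat;       /* advanced by a healthy controller */
    unsigned int last_heartbeat;  /* value seen at the previous check */
    bool lockup_detected;
};

/* Returns the result directly, so callers need no separate check. */
static bool detect_lockup_stub(struct ctlr_stub *h)
{
    if (h->heartbeat == h->last_heartbeat)
        h->lockup_detected = true;
    h->last_heartbeat = h->heartbeat;
    return h->lockup_detected;
}

/* Check before submitting: a locked-up controller would only time out. */
static enum stub_result submit_stub(struct ctlr_stub *h)
{
    if (h->lockup_detected)
        return STUB_NO_CONNECT;   /* terminate at once, do not retry */
    return STUB_SUBMITTED;
}
```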
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Webb Scales [Fri, 6 Nov 2015 13:47:39 +0000 (05:47 -0800)]
hpsa: clean up host, channel, target, lun prints
We had a mix of formats used for specifying controller, bus, target,
and lun address of devices.
change to the format used by the scsi midlayer and upper layer (2:3:0:0)
so you can easily follow the information from hpsa to scsi midlayer
to sd upper layer.
Also add this information:
- product ID
- vendor ID
- RAID level
- SSD Smart Path capable and enabled
- exposure level (sg-only)
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Stephen Cameron [Fri, 6 Nov 2015 13:47:39 +0000 (05:47 -0800)]
hpsa: add masked physical devices into h->dev[] array
Cache the ioaccel handle so that when we need to abort commands sent
down the ioaccel2 path, we can look up the LUN ID in h->dev[] instead of
having to do I/O to the controller.
Add a field to elements in h->dev[] to keep track of how the device is exposed
to the SCSI mid layer: Not at all, without an upper level driver
(no_uld_attach) or normally exposed.
Since masked physical devices are now present in h->dev[] array
it would be perfectly possible to do
and bring them online. This was previously not allowed for masked
physical devices.
Ensure that the mapping of physical disks to logical drives gets updated in a
consistent way when a RAID migration occurs and is not touched until updates
to it are complete.
now instead of doing CISS_REPORT_PHYSICAL to get the LUNID for
the physical disk in hpsa_get_pdisk_of_ioaccel2(), just get
it out of h->dev[] where we already have it cached.
do not touch phys_disk[] for ioaccel enabled logical drives during rescan
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com>
Orabug: 22075051 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Change-ID: Ifa19aadaa892ad103f1b96fe2361fa690912c6a3 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b8f1343a13c669aaa3d475ed8513a32154ae5ffd) Signed-off-by: Brian Maly <brian.maly@oracle.com>
If we reset a VF, its VSI goes away, and it gets a new one. So don't
hang on to the now-stale local VSI pointer. It just leads to suffering
and kernel panics.
Change-ID: Ia8823b4e85893e95e963acee284968022b29177a Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 35f3472a750b3549f7f914ed96f41f0c2ca284f3) Signed-off-by: Brian Maly <brian.maly@oracle.com>
We need to suspend scheduling or any pending service task during driver
unload process, so that new task will not be scheduled. This patch sets
the suspend flag bit during reload which avoids service task execution.
Change-ID: I017c57b5d6656564556e3c5387da671369a572ac Signed-off-by: Pandi Kumar Maharajan <pandi.maharajan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a4618ec88de95a86f290d01c74c506552f1a5d95) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Use the new AdminQ functions for safely accessing the Rx control
registers that may be affected by heavy small packet traffic.
We can't use AdminQ calls in i40e_clear_hw() because the HW is being
initialized and the AdminQ is not alive. We recently added an AQ
related replacement for reading PFLAN_QALLOC, and this patch puts
back the original register read.
Change-ID: Ib027168c954a5733299aa3a4ce5f8218c6bb5636 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 272cdaf2472ab7713deebe060bb90319b0382a94) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Use the new AdminQ functions for safely accessing the Rx control
registers that may be affected by heavy small packet traffic.
Change-ID: Ibb00983e8dcba71f4b760222a609a5fcaa726f18 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f658137cbb1fddbe40ec7f1a2cebaf9dc9484ea7) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add the new opcodes and struct used for asking the firmware to update Rx
control registers that need extra care when being accessed while under
heavy traffic - e.g. sustained 64byte packets at line rate on all ports.
The firmware will take extra steps to be sure the register accesses
are successful.
Change-ID: I56c8144000da66ad99f68948d8a184b2ec2aeb3e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3336514381f9ef99c50e5337ae1bf36f8138679d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Return from i40e_vsi_reinit_setup() if vsi param is NULL.
This makes this code consistent with all the other code that
checks for NULL before using one of the VSI pointers accessed
with an indexed variable. (Indexed VSI pointers are
intentionally set to NULL in i40e_vsi_clear() and
i40e_remove().)
Change-ID: I3bc8b909c70fd2439334eeae994d151f61480985 Signed-off-by: John Underwood <johnx.underwood@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f534039dd8ab39cb3259e5860d2be3b0e70aacbf) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch adds 7 new register definitions for programming the
parser, flow director and RSS blocks in the HW.
Change-ID: I31e76673125275f3c69a14c646361919d04dc987 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit fe726082728da9f653d4e747baf0816d826fc626) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This fixes an issue where a previously removed message
has returned. Changing the message type to dev_dbg
leaves the info, if desired, but takes it out of normal
everyday usage. Also changed the call to only provide port
data when it's valid and not when it's not (delete case).
Change-ID: Ief6f33b915f6364c24fa8e5789c2fc3168b5e2ed Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 730a8f8777e55912f445c2c29234d51cceb1dfc2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Just as for Tx queues, don't wait for Rx queues to be disabled before
DCB has been reconfigured.
Check the queues are disabled only after the DCB configuration has
been applied to the VSI(s) managed by the PF driver.
In case of any timeout, issue a PF reset to recover.
Change-ID: Ic51e94c25baf9a5480cee983f35d15575a88642c Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3fe06f415b31ad06d2c2923216292057e899eb0d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
When linking with particular PHY types (ex: copper PHY), the amount of
time it takes for the GLGEN_RSTAT_DEVSTATE to be set increases greatly,
which can lead to a timeout and failure to load the driver.
Change-ID: If02be0dfcd7c57fdde2d5c81cd63651260cd2029 Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4d7cec078de864b7ba5459aa688278c4e6f3ad42) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch fixes a problem where the ethtool identify adapter
functionality did not work for some copper PHYs. Without this
patch, the blink LED functionality fails on some parts. This
patch adds PHY write code to blink LEDs on parts where this
functionality is contained in the PHY rather than the MAC.
Change-ID: Iee7b3453f61d5ffd0b3d03f720ee4f17f919fcc2 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 31b606d0c40a1435c54bff18e4d3d3c33af1c3cf) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e.h
This patch adds functions to blink led on devices using
10GBaseT PHY since MAC registers used in other designs
do not work in this device configuration.
Change-ID: Id4b88c93c649fd2b88073a00b42867a77c761ca3 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit fd077cd3399b164548f538625f07f3e9f1d7ef00) Signed-off-by: Brian Maly <brian.maly@oracle.com>
On all of the other Intel drivers we place checksum close to TSO as they
have a significant amount in common and it can help to reduce the decision
tree for how to handle the frame as the first check in TSO is to see if
checksumming is offloaded, and if it is not we can skip _BOTH_ TSO and Tx
checksum offload based on a single check.
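
The single-check idea can be sketched as below. `frame_stub`, `csum_mode`, and `plan_offloads` are hypothetical names; in the driver the decision is made on the skb's checksum state and feeds the descriptor setup, which is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the frame's checksum state. */
enum csum_mode { CSUM_NONE, CSUM_PARTIAL };

struct frame_stub {
    enum csum_mode ip_summed;
    unsigned int gso_size;      /* non-zero when segmentation is requested */
};

struct offload_plan {
    bool tso;
    bool tx_csum;
};

static struct offload_plan plan_offloads(const struct frame_stub *f)
{
    struct offload_plan p = { false, false };

    /* One test rules out both TSO and Tx checksum offload. */
    if (f->ip_summed != CSUM_PARTIAL)
        return p;

    p.tso = f->gso_size != 0;
    p.tx_csum = true;
    return p;
}
```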
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 3bc67973e81d5104908a4ba7c2aab98a4f7bd64e) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch is meant to rewrite the logic for how we determine if we can
transmit the frame or if it needs to be linearized.
The previous code for this function was using a mix of division and modulus
division as a part of computing if we need to take the slow path. Instead
I have replaced this by simply working with a sliding window which will
tell us if the frame would be capable of causing a single packet to span
several descriptors.
The logic for the scan is fairly simple. If any given group of 6 fragments
is less than gso_size - 1 then it is possible for us to have one byte
coming out of the first fragment, 6 fragments, and one or more bytes coming
out of the last fragment. This gives us a total of 8 fragments
which exceeds what we can allow so we send such frames to be linearized.
Arguably the use of modulus might be more exact as the approach I propose
may generate some false positives. However the likelihood of us taking much
of a hit for those false positives is fairly low, and I would rather not
add more overhead in the case where we are receiving a frame composed of 4K
pages.
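
The window scan described above can be sketched as follows. This is a simplified userspace illustration, not the driver's function: `needs_linearize`, `frag_len`, and `WINDOW` are names assumed here, and the real code has further early exits that are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define WINDOW 6   /* fragments per sliding window, as in the text above */

/*
 * Returns true if some run of WINDOW consecutive fragments totals less
 * than gso_size - 1 bytes, i.e. one segment could span more descriptors
 * than allowed and the frame should be linearized.
 */
static bool needs_linearize(const unsigned int *frag_len, size_t nr_frags,
                            unsigned int gso_size)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < nr_frags; i++) {
        sum += frag_len[i];               /* fragment entering the window */
        if (i >= WINDOW)
            sum -= frag_len[i - WINDOW];  /* fragment leaving the window */

        /* "sum < gso_size - 1", written so gso_size == 0 cannot wrap. */
        if (i >= WINDOW - 1 && sum + 1 < gso_size)
            return true;
    }
    return false;
}
```

A running sum keeps the scan linear in the number of fragments and avoids the division and modulus of the previous approach, at the cost of the occasional false positive mentioned above.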
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2d37490b82afe1d1b745811e6ce0a4d16bc5e996) Signed-off-by: Brian Maly <brian.maly@oracle.com>
In an upcoming patch I would like to have access to the descriptor count
used for the data portion of the frame. For this reason I am splitting up
the descriptor count function from the function that stops the ring.
Also in order to try and reduce unnecessary duplication of code I am moving
the slow-path portions of the code out of being inline calls so that we can
just jump to them and process them instead of having to build them into
each function that calls them.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 4ec441df25a686518fb369086e2b34a1cedaa6c9) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch updates the code for determining the L4 protocol and L3 header
length so that when IPv6 extension headers are being used we can determine
the offset and type of the L4 protocol.
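
As an illustration of what walking the extension headers involves, here is a userspace sketch over a raw byte buffer. It is not the driver's code (which works on an skb); `ipv6_find_l4` and `is_ext_hdr` are hypothetical names. ESP is not walked, and the fragment offset is ignored, so a non-first fragment would be misreported.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40

/* Hop-by-hop (0), routing (43), fragment (44), AH (51), dest options (60). */
static int is_ext_hdr(uint8_t nh)
{
    return nh == 0 || nh == 43 || nh == 44 || nh == 51 || nh == 60;
}

/*
 * pkt points at the start of the IPv6 header. Returns the L4 protocol
 * number and stores the L4 header offset in *l4_off, or returns -1 if
 * the buffer is too short to walk the header chain.
 */
static int ipv6_find_l4(const uint8_t *pkt, size_t len, size_t *l4_off)
{
    size_t off = IPV6_HDR_LEN;
    uint8_t nh;

    if (len < IPV6_HDR_LEN)
        return -1;
    nh = pkt[6];                      /* Next Header of the fixed header */

    while (is_ext_hdr(nh)) {
        size_t hlen;

        if (len - off < 8)            /* every extension header is >= 8 */
            return -1;
        if (nh == 44)
            hlen = 8;                                 /* fragment: fixed */
        else if (nh == 51)
            hlen = ((size_t)pkt[off + 1] + 2) * 4;    /* AH: 4-byte units */
        else
            hlen = ((size_t)pkt[off + 1] + 1) * 8;    /* 8-byte units */
        if (hlen > len - off)
            return -1;
        nh = pkt[off];                /* Next Header of this extension */
        off += hlen;
    }

    *l4_off = off;
    return nh;
}
```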
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit ffcc55c0c2a85835a4ac080bc1053c3a277b88e2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums. As such we can update the
feature flags to reflect that.
In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.
I also found one spot where we were setting the same flag twice. It looks
like it was probably a git merge error that resulted in the line being
duplicated. As such I have dropped it in this patch.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit f608e6a60fc85e4f261daab5e7aac6225e2120d6) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums. As such we can update the
feature flags to reflect that.
In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit bc5d252b363cca63b7ddc1e20dd8b8b242631006) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
None of the documentation in the datasheets for the XL710 calls out
any reason to exclude support for IPv6 based tunnels. As such I am
dropping the code that was excluding these tunnel types from having their
port numbers recognized. This way we can take advantage of things such as
checksum offload for inner headers over IPv6 based VXLAN or GENEVE
tunnels.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 84d5946d49cf9552d0f1740ad62d0f126cb3b6a9) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch contains a number of fixes to make certain that we are using
the correct protocols when parsing both the inner and outer headers of a
frame that is mixed between IPv4 and IPv6 for inner and outer.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 6b037cd465ff6e5f6b96524658f47d59d1acc554) Signed-off-by: Brian Maly <brian.maly@oracle.com>
The XL722 has support for providing the outer UDP tunnel checksum on
transmits. Make use of this feature to support segmenting UDP tunnels with
outer checksums enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5453205cd0975b845f6f8837f0c2b7c8cb80fcf8) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This is mostly a minor clean-up for the Rx checksum path in order to avoid
some of the unnecessary conditional checks that were being applied.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit fad57330b6d0710fdf39dc1c2b28ccebb97ae8a1) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add exception handling to the Tx checksum path so that we can handle cases
of TSO where the frame is bad, or Tx checksum where we didn't recognize a
protocol.
Drop I40E_TX_FLAGS_CSUM as it is unused, and move the CHECKSUM_PARTIAL check
into the function itself so that we can decrease indent.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 529f1f652e3c3c6db6ab5a6e3a35469ddfd9575d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch defers writing to the Tx descriptor bits until we know we have
successfully completed a given operation. For example, we defer updating
the tunneling portion of the context descriptor until we have fully
identified the type.
The advantage of this approach is that we can assemble values as we go
instead of having to kludge everything together all at once. As a
result we can significantly clean up the tunneling configuration, for
instance, as we can just do a pointer walk and do the math for the distance
between each set of points.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 475b4205aa52c16feef08d55c8fd76e815b6bee7) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch adds support for IPv6 extension headers in setting up the Tx
checksum. Without this patch extension headers would cause IPv6 traffic to
fail as the transport protocol could not be identified.
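As an illustration of why extension headers matter here, the following is a minimal user-space sketch of walking the IPv6 Next Header chain until a transport protocol is reached. It is not the driver's code: find_l4_proto() and the enum names are hypothetical, only a few extension header types are handled, and a Fragment header is simply skipped even though non-first fragments carry no transport header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the headers handled in this sketch. */
enum {
	HDR_HOPOPTS  = 0,
	HDR_TCP      = 6,
	HDR_UDP      = 17,
	HDR_ROUTING  = 43,
	HDR_FRAGMENT = 44,
	HDR_DSTOPTS  = 60
};

/*
 * Hypothetical helper: pkt points at the fixed 40-byte IPv6 header.
 * Returns the first non-extension protocol number and stores its byte
 * offset in *l4_off, or returns -1 if the packet is truncated.
 */
static int find_l4_proto(const uint8_t *pkt, size_t len, size_t *l4_off)
{
	size_t off = 40;		/* length of the fixed IPv6 header */
	uint8_t nexthdr;

	assert(pkt != NULL && l4_off != NULL);
	if (len < off)
		return -1;
	nexthdr = pkt[6];		/* Next Header field of the fixed header */

	for (;;) {
		size_t hdrlen;

		switch (nexthdr) {
		case HDR_HOPOPTS:
		case HDR_ROUTING:
		case HDR_DSTOPTS:
			if (len < off + 2)
				return -1;
			/* Hdr Ext Len counts 8-byte units beyond the first 8 bytes */
			hdrlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case HDR_FRAGMENT:
			if (len < off + 2)
				return -1;
			hdrlen = 8;	/* Fragment header has a fixed size */
			break;
		default:
			*l4_off = off;
			return nexthdr;
		}
		if (len < off + hdrlen)
			return -1;
		nexthdr = pkt[off];	/* first byte of each extension header */
		off += hdrlen;
	}
}
```

With a single Hop-by-Hop header in front of TCP, reading only the fixed header's Next Header field yields 0 rather than 6, which is the failure mode described above.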
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a3fd9d8876a589f05725237aced606b995956860) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch fixes two issues. First was the fact that ip_hdr(skb)->protocol
was being used to test for the outer transport protocol. This completely
breaks IPv6 support. Second was the fact that we cleared the flag for v4
going to v6, but we didn't take care of txflags going the other way. As
such we would have the v6 flag still set even if the inner header was v4.
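A small user-space sketch of the first issue, assuming raw header bytes rather than an skb: the transport protocol lives at a different offset in IPv4 and IPv6, so the IP version has to be checked before reading it. outer_l4_proto() is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the protocol that follows the outer L3
 * header, chosen by IP version, or -1 if the header is unusable.
 */
static int outer_l4_proto(const uint8_t *l3, size_t len)
{
	assert(l3 != NULL);
	if (len < 1)
		return -1;

	switch (l3[0] >> 4) {		/* version is the top nibble of byte 0 */
	case 4:
		return len >= 20 ? l3[9] : -1;	/* IPv4 Protocol field */
	case 6:
		return len >= 40 ? l3[6] : -1;	/* IPv6 Next Header field */
	default:
		return -1;
	}
}
```

Reading byte 9 of an IPv6 header, as an IPv4-only lookup would, returns part of the source address instead of a protocol number.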
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a0064728f8a34f7a5afd9df86d9cdd8210977c8d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
The Tx checksum path was maintaining a set of 3 pointers and two lengths in
order to prepare the packet for being checksummed. The thing is we only
really needed 2 pointers, and the lengths that were being maintained can
easily be computed.
As such we can replace the IPv4 and IPv6 header pointers with one single
union that represents both, or a generic pointer to the start of the
network header. For the L4 headers we can do the same with TCP and a
generic pointer to the start of the transport header. The length of the
TCP header is obtained by simply multiplying doff by 4, and the network
header length can be obtained by subtracting the network header pointer
from the transport header pointer.
While I was at it I renamed l4_hdr to l4_proto to make it a bit more clear
and less likely to be confused with l4.hdr which is the transport header
pointer.
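The two-pointer arrangement can be sketched in user space as follows. This is an illustration under assumptions, not the driver's definitions: the *_min structs, the union names, and tcp_hdr_lens() are hypothetical, and the structs use only byte members so the layout does not depend on endianness or alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal byte-only header layouts (hypothetical, for illustration). */
struct ipv4_min { uint8_t ver_ihl, tos, tot_len[2], id[2], frag[2],
		  ttl, protocol, check[2], saddr[4], daddr[4]; };
struct ipv6_min { uint8_t vtf[4], payload_len[2], nexthdr, hop_limit,
		  saddr[16], daddr[16]; };
struct tcp_min  { uint8_t ports[4], seq[4], ack[4], doff_flags[2],
		  window[2], check[2], urg[2]; };

/* One pointer per layer, viewable as a typed header or as raw bytes. */
union l3_hdr {
	const struct ipv4_min *v4;
	const struct ipv6_min *v6;
	const uint8_t *hdr;
};
union l4_hdr {
	const struct tcp_min *tcp;
	const uint8_t *hdr;
};

struct hdr_lens { size_t l3_len, l4_len; };

/* Derive both header lengths from the two pointers alone. */
static struct hdr_lens tcp_hdr_lens(const uint8_t *net, const uint8_t *transport)
{
	union l3_hdr ip = { .hdr = net };
	union l4_hdr l4 = { .hdr = transport };
	struct hdr_lens out;

	assert(transport >= net);
	/* network header length: distance between the two pointers */
	out.l3_len = (size_t)(l4.hdr - ip.hdr);
	/* TCP header length: data offset (top nibble) counted in 32-bit words */
	out.l4_len = (size_t)(l4.tcp->doff_flags[0] >> 4) * 4;
	return out;
}
```

Because the network header length is computed as a pointer difference, the same code covers IPv4 with options and IPv6 with extension headers without tracking separate length variables.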
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b96b78f2b789ab8398e7ec0111bb8b4588ed42bf) Signed-off-by: Brian Maly <brian.maly@oracle.com>
This patch goes through and pulls all of the spots where we were updating
either the TCP or IP checksums in the TSO and checksum path into the TSO
function. The general idea here is that we should only be updating the
header after we verify we have completed a skb_cow_head check to verify the
head is writable.
One other advantage to doing this is that it makes things much more
obvious. For example, in the case of IPv6 there was one spot where the
offset of the IPv4 header checksum was being updated which is obviously
incorrect.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c777019af1dc7343be8dc44bb4d32f5e2ef072dd) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_txrx.c
This patch makes it so that the L4 header offsets and such can be ignored
when dealing with the L3 checksum and length update. This is done by making
use of two things.
First we can just use the offset from the L4 header to the start of the
packet to determine the L4 offset, and from that we can then make use of
the data offset to determine the full length of the headers.
Second, to adjust the checksum to remove the length, we can simply add the
inverse of the length instead of having to recompute the entire
pseudo-header without the length. In the case of an IPv6 header this
should be significantly cheaper since we can make use of a value we already
needed instead of having to read the source and destination address out of
the packet.
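The "add the inverse" step relies on ones' complement arithmetic, where adding the bitwise complement of a word cancels that word out of a sum. A minimal sketch with 16-bit words follows; the function names are hypothetical and the values in oc_demo() are arbitrary, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement addition of two 16-bit words. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	return fold16((uint32_t)a + b);
}

/* Remove a previously added word by adding its complement. */
static uint16_t oc_sub(uint16_t sum, uint16_t word)
{
	return oc_add(sum, (uint16_t)~word);
}

/* Round trip: add a length to a partial sum, then cancel it again. */
static void oc_demo(void)
{
	uint16_t no_len = oc_add(oc_add(0x1234, 0xabcd), 0x0006);
	uint16_t with_len = oc_add(no_len, 0x05dc);

	/* Holds for a non-zero partial sum; a zero sum would come back
	 * as 0xffff, the other ones' complement encoding of zero. */
	assert(oc_sub(with_len, 0x05dc) == no_len);
}
```

The cancellation needs only the length itself, which is why the addresses in the pseudo-header never have to be re-read.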
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c49a7bc3308099a8d5f9e2e38adfc5ab969804aa) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Instead of casting u32 values to u64 it makes more sense to just start out
with u64 values in the first place. This way we don't need to create a
mess with all of the casts needed to populate a 64b value.
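A minimal sketch of the idea, assuming made-up field positions: when the inputs are already 64 bits wide, every shift happens in 64-bit arithmetic and no per-term cast is needed. The DESC_* shifts and build_desc_qword() are hypothetical and do not reflect the real descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field positions, for illustration only. */
#define DESC_CMD_SHIFT   4
#define DESC_OFF_SHIFT  16
#define DESC_SIZE_SHIFT 34

/* Pack three fields into one 64-bit descriptor word. */
static uint64_t build_desc_qword(uint64_t cmd, uint64_t offset, uint64_t size)
{
	/* size must fit in the 30 bits above DESC_SIZE_SHIFT */
	assert(size < (1ULL << 30));

	return (cmd << DESC_CMD_SHIFT) |
	       (offset << DESC_OFF_SHIFT) |
	       (size << DESC_SIZE_SHIFT);
}
```

Had size been a 32-bit value, shifting it by 34 without first widening it would be undefined behavior in C, which is the kind of mistake the per-term casts were guarding against.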
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 03f9d6a59f94f70ae775ca2aae04f2accc01a973) Signed-off-by: Brian Maly <brian.maly@oracle.com>
The i40e and i40evf drivers contained code for inserting an outer checksum
on UDP tunnels. The issue, however, is that the upper levels of the stack
never requested such an offload, which can result in errors.
In addition the same logic was being applied to the Rx side where it was
attempting to validate the outer checksum, but the logic there was
incorrect in that it was testing for the resultant sum to be equal to the
header checksum instead of being equal to 0.
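For reference, the usual Internet checksum check can be sketched as below: summing a header with its checksum field left in place and complementing the result gives 0 when the header is valid. This is a generic user-space illustration with hypothetical function names, not the code that was removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Complemented 16-bit ones' complement sum over big-endian words. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len <= 65535);	/* keeps the 32-bit accumulator from overflowing */
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Valid when the checksum over the data, checksum field included, is 0. */
static int csum_ok(const uint8_t *data, size_t len)
{
	return inet_csum(data, len) == 0;
}
```

Comparing the computed value against the checksum stored in the header, rather than against 0, rejects valid packets whenever the field is summed in place.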
Since this code is so massively flawed, and does things that we didn't ask
it to do, I am just dropping it, and will bring it back later to use as
an offload for SKB_GSO_UDP_TUNNEL_CSUM which can make use of such a
feature.
As for the Rx feature, I am dropping it completely since it would need to
be massively expanded and applied to IPv4 and IPv6 checksums for all parts,
not just the one that supports Tx checksum offload for the outer.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a9c9a81f5892eb984234223399ee624f7dbd15e8) Signed-off-by: Brian Maly <brian.maly@oracle.com>