When RAC tries to scale RDS-TCP, they are hitting bottlenecks
due to inefficiencies in rds_bind_lookup. Each call to
rds_bind_lookup results in an irqsave/irqrestore sequence, and
when the list of RDS sockets is large, we end up having IRQs
suppressed for long intervals. This trigger flow-control assertions
and causes TX queue watchdog hangs in the sender. The current
implementation makes this even worse, by superfluously calling
rds_bind_lookup(). This patch set takes the first step to solving
this problem by avoiding one of the redundant calls to rds_bind_lookup.
When errors such as connection hangs or failures are encountered
over RDS-TCP, the sending RDS, in an attempt at HA, will try to
reconnect, and trip up on all sorts of data structures intended
for ToS support. The ToS feature is currently only supported for
RDS-IB, and unplanned/untested usage of these data
structures by RDS-TCP causes deadlocks and panics.
Until we properly design, support, and test the ToS feature for
RDS-TCP, such paths should not be wandered into. Thus this patchset
adds defensive checks to ignore rs_tos settings in rds_sendmsg() for
TCP transports, and prevents the sending of ToS heartbeat pings
Until we properly design, support, and test the ToS feature for
RDS-TCP, such paths should not be wandered into. Thus this patchset
adds defensive checks to ignore rs_tos settings in rds_sendmsg() for
TCP transports, and prevents the sending of ToS heartbeat pings
in rds_send_hb() for TCP transport.
For reference, the deadlock that can be encountered in the
hb ping path is:
shamir rabinovitch [Wed, 16 Mar 2016 13:57:19 +0000 (09:57 -0400)]
rds: rds-stress show all zeros after few minutes
Issue can be seen on platforms that use 8K and above page size
while rds fragment size is 4K. On those platforms single page is
shared between 2 or more rds fragments. Each fragment has it's own
offeset and rds cong map code need to take this offset to account.
Not taking this offset to account lead to reading the data fragment
as congestion map fragment and hang of the rds transmit due to far
cong map corruption.
Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Lets use atomics to avoid such an issue.
Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
to offending code with spin lock based fix.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
rds_fmr_flush workqueue is calling ib_unmap_fmr
to invalidate a list of FMRs. Today, this workqueue
can be scheduled at any CPUs. In a NUMA-aware system,
schedule this workqueue to run on a CPU core closer to
ib_device can improve performance. As for now, we use
"sequential-low" policy. This policy selects two lower
cpu cores closer to HCA. In a non-NUMA aware system,
schedule rds_fmr_flush workqueue in a fixed cpu core
improves performance.
The mapping of cpu to the rds_fmr_flush workqueue
can be enabled/disabled via sysctl and it is enable
by default. To disable the feature, use below sysctl.
rds_ib_sysctl_disable_unmap_fmr_cpu = 1
Putting down some of the rds-stress performance number
comparing default and sequential-low policy in a NUMA
system with Oracle M4 QDR and Mellanox CX3.
rds-stress 4 conns, 32 threads, 16 depths, RDMA write
and unidirectional (higher is better).
Santosh Shilimkar [Wed, 21 Oct 2015 23:47:28 +0000 (16:47 -0700)]
RDS: IB: support larger frag size up to 16KB
Infiniband (IB) transport supports larger message size
than RDS_FRAG_SIZE, which is usually in 4KB PAGE_SIZE.
Nevertheless, RDS always fragments each payload into
RDS_FRAG_SIZE before hands it over to the underlying
IB HCA.
One of the important message size required for database
is 8448 (8K + 256B control message) for BCOPY. This RDS
message, even with IB transport, will generate three
IB work requests (WR) with each having its own RDS header.
This series of patches improve RDS performance by allowing
IB transport to send/receive RDS message with a larger
RDS_FRAG_SIZE (Ideally, using a single WR).
In order to maintain the backward compatibility and
interoperability between various RDS versions, and at
the same time to support various FRAG_SIZE, the IB
fragment size is negotiated per connection.
Although IB is capable of supporting 4GB of message size,
currently we limit the IB RDS_FRAG_SIZE up to 16KB due to
two reasons:-
1. This is needed for current 8448 RDS message size usecase.
2. Minizing the size for each receive queue entry in order
to optimal memory usage.
In term of implementation, The 'dp_reserved2' field of
'struct rds_ib_connect_private' now carries information about
supported IB fragment size. Since we are just
using the IB connection private data and a reserved field,
the protocol version is not bumped up. Furthermore, the feature
is enabled only for RDS_PROTOCOL_v4.1 and above (future).
To keep thing simpler for user, a module parameter
'rds_ib_max_frag' is provided. Without module parameter,
the default PAGE_SIZE frag will be used. During the connection
establishment, the smallest fragment size will be
chosen. If the fragment size is 0, it means RDS module
doesn't support large fragment size and the default
RDS_FRAG_SIZE will be used.
Upto ~10+ % improvement seen with Orion and ~9+ % with RDBMS
update queries.
Santosh Shilimkar [Mon, 21 Mar 2016 06:24:32 +0000 (23:24 -0700)]
RDS: IB: purge receive frag cache on connection shutdown
RDS IB connections can be formed with different fragment size across
reconnect and hence the current frag cache needs to be purged to
avoid stale frag usages.
Santosh Shilimkar [Wed, 4 Nov 2015 21:42:39 +0000 (13:42 -0800)]
RDS: IB: scale rds_ib_allocation based on fragment size
The 'rds_ib_sysctl_max_recv_allocation' allocation is used to manage
and allocate the size of IB receive queue entry (RQE) for each IB
connection. However, it relies on the hardcoded RDS_FRAG_SIZE.
Lets make it scalable based on supported fragment sizes for different
IB connection. Each connection can allocate different RQE size
depending on the per connection fragment_size.
Santosh Shilimkar [Wed, 21 Oct 2015 23:47:28 +0000 (16:47 -0700)]
RDS: IB: make fragment size (RDS_FRAG_SIZE) dynamic
IB fabric is capable of fragment 4GB of data payload into
send_first, send_middle and send_last. Nevertheless,
RDS fragments each payload into PAGE_SIZE, which is usually
4KB. This patch makes the RDS_FRAG_SIZE for RDS IB transport
dynamic.
In the preperation for subsequent patch(es), this patch
adds per connection peer negotiation to determine the
supported fragment size for IB transport.
Orabug: 21894138 Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Wei Lin Guay [Fri, 20 Nov 2015 23:14:48 +0000 (15:14 -0800)]
RDS: fix the sg allocation based on actual message size
Fix an issue where only PAGE_SIZE bytes are allocated per
scatter-gather entry (SGE) regardless of the actual message
size: Furthermore, use buddy memory allocation technique to
allocate/free memmory (if possible) to reduce SGE.
Santosh Shilimkar [Mon, 16 Nov 2015 21:28:11 +0000 (13:28 -0800)]
RDS: make congestion code independent of PAGE_SIZE
RDS congestion map code is designed with base assumption of
4K page size. The map update as well transport code assumes
it that way. Ofcourse it breaks when transport like IB starts
supporting larger fragments than 4K.
To overcome this limitation without too many changes to the core
congestion map update logic, define indepedent RDS_CONG_PAGE_SIZE
and use it.
While at it we also move rds_message_map_pages() whose sole
purpose it to map congestion pages to congestion code.
Santosh Shilimkar [Tue, 16 Feb 2016 18:24:31 +0000 (10:24 -0800)]
RDS: Back out OoO send status fix since it causes the regression
With the DB build, the crash was observed which was boiled down to
this change on UEK2. Proactively we back this out on UEK4 as well
till the issue gets addresssed.
net/mlx4_core: Modify default value of log_rdmarc_per_qp to be consistent with HW capability
This value is used to calculate max_qp_dest_rdma.
Default value of 4 brings us to 16 while HW supports 128
(max_requester_per_qp)
Although this value can be changed by module param it is best that default
will be optimized
Roman Gushchin [Mon, 12 Oct 2015 13:33:44 +0000 (16:33 +0300)]
fuse: break infinite loop in fuse_fill_write_pages()
I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.
Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.
A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.
Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.
Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: ea9b9907b82a ("fuse: implement perform_write") Cc: <stable@vger.kernel.org>
(cherry picked from commit 3ca8138f014a913f98e6ef40e939868e1e9ea876)
Alexey Dobriyan [Thu, 25 Jun 2015 22:00:54 +0000 (15:00 -0700)]
proc: fix PAGE_SIZE limit of /proc/$PID/cmdline
/proc/$PID/cmdline truncates output at PAGE_SIZE. It is easy to see with
$ cat /proc/self/cmdline $(seq 1037) 2>/dev/null
However, command line size was never limited to PAGE_SIZE but to 128 KB
and relatively recently limitation was removed altogether.
People noticed and ask questions:
http://stackoverflow.com/questions/199130/how-do-i-increase-the-proc-pid-cmdline-4096-byte-limit
seq file interface is not OK, because it kmalloc's for whole output and
open + read(, 1) + sleep will pin arbitrary amounts of kernel memory. To
not do that, limit must be imposed which is incompatible with arbitrary
sized command lines.
I apologize for hairy code, but this it direct consequence of command line
layout in memory and hacks to support things like "init [3]".
The loops are "unrolled" otherwise it is either macros which hide control
flow or functions with 7-8 arguments with equal line count.
There should be real setproctitle(2) or something.
[akpm@linux-foundation.org: fix a billion min() warnings] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jan Stancek <jstancek@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Orabug: 23093364
Mainline commit c2c0bb44620dece7ec97e28167e32c343da22867 Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
The problem comes with the fact that many such jobs (for the same device)
are being scheduled simultaneously. While scsi_remove_device() uses
shost->scan_mutex and scsi_device_lookup() will fail for a device in
SDEV_DEL state there is no protection against someone who did
scsi_device_lookup() before we actually entered __scsi_remove_device(). So
the whole scenario looks like that: two callers do simultaneous (or
preemption happens) calls to scsi_device_lookup() ant these calls succeed
for both of them, after that they try doing scsi_remove_device().
shost->scan_mutex only serializes their calls to __scsi_remove_device()
and we end up doing the cleanup path twice.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit be821fd8e62765de43cc4f0e2db363d0e30a7e9b)
Orabug: 23021563 Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
David S. Miller [Mon, 14 Mar 2016 03:28:00 +0000 (23:28 -0400)]
ipv4: Don't do expensive useless work during inetdev destroy.
When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:
1) Address promotion. We are deleting all addresses, so there is no
point in doing this.
2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing
masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: Solar Designer <solar@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
(cherry picked from commit fbd40ea0180a2d328c5adc61414dc8bab9335ce2)
Filipe Manana [Sat, 20 Jun 2015 17:20:09 +0000 (18:20 +0100)]
Btrfs: fix shrinking truncate when the no_holes feature is enabled
If the no_holes feature is enabled, we attempt to shrink a file to a size
that ends up in the middle of a hole and we don't have any file extent
items in the fs/subvol tree that go beyond the new file size (or any
ordered extents that will insert such file extent items), we end up not
updating the inode's disk_i_size, we only update the inode's i_size.
This means that after unmounting and mounting the filesystem, or after
the inode is evicted and reloaded, its i_size ends up being incorrect
(an inode's i_size is set to the disk_i_size field when an inode is
loaded). This happens when btrfs_truncate_inode_items() doesn't find
any file extent items to drop - in this case it never makes a call to
btrfs_ordered_update_i_size() in order to update the inode's disk_i_size.
Example reproducer:
$ mkfs.btrfs -O no-holes -f /dev/sdd
$ mount /dev/sdd /mnt
# Create our test file with some data and durably persist it.
$ xfs_io -f -c "pwrite -S 0xaa 0 128K" /mnt/foo
$ sync
# Append some data to the file, increasing its size, and leave a hole
# between the old size and the start offset if the following write. So
# our file gets a hole in the range [128Kb, 256Kb[.
$ xfs_io -c "truncate 160K" /mnt/foo
# We expect to see our file with a size of 160Kb, with the first 128Kb
# of data all having the value 0xaa and the remaining 32Kb of data all
# having the value 0x00.
$ od -t x1 /mnt/foo 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
* 0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
* 0500000
# Now cleanly unmount and mount again the filesystem.
$ umount /mnt
$ mount /dev/sdd /mnt
# We expect to get the same result as before, a file with a size of
# 160Kb, with the first 128Kb of data all having the value 0xaa and the
# remaining 32Kb of data all having the value 0x00.
$ od -t x1 /mnt/foo 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
* 0400000
In the example above the file size/data do not match what they were before
the remount.
Fix this by always calling btrfs_ordered_update_i_size() with a size
matching the size the file was truncated to if btrfs_truncate_inode_items()
is not called for a log tree and no file extent items were dropped. This
ensures the same behaviour as when the no_holes feature is not enabled.
Even though we delay the rename of directories when they become
descendents of other directories that were also renamed in the send
root to prevent infinite path build loops, we were doing it in cases
where this was not needed and was actually harmful resulting in
infinite path build loops as we ended up with a circular dependency
of delayed directory renames.
Consider the following reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt2
This reproducer resulted in an infinite path build loop when building the
path for inode 266 because the following circular dependency of delayed
directory renames was created:
Where the notation "X <- Y" means the rename of inode X is delayed by the
rename of inode Y (X will be renamed after Y is renamed). This resulted
in an infinite path build loop of inode 266 because that inode has inode
261 as an ancestor in the send root and inode 261 is in the circular
dependency of delayed renames listed above.
Fix this by not delaying the rename of a directory inode if an ancestor of
the inode in the send root, which has a delayed rename operation, is not
also a descendent of the inode in the parent root.
Thanks to Robbie Ko for sending the reproducer example.
A test case for xfstests follows soon.
Reported-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from mainline commit 80aa6027561eef12b49031d46fd6724daf1e7fb6) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
We have another case where after an fsync log replay we get an inode with
a wrong link count (smaller than it should be) and a number of directory
entries greater than its link count. This happens when we add a new link
hard link to our inode A and then we fsync some other inode B that has
the side effect of logging the parent directory inode too. In this case
at log replay time we add the new hard link to our inode (the item with
key BTRFS_INODE_REF_KEY) when processing the parent directory but we
never adjust the link count of our inode A. As a result we get stale dir
entries for our inode A that can never be deleted and therefore it makes
it impossible to remove the parent directory (as its i_size can never
decrease back to 0).
A simple reproducer for fstests that triggers this issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_metadata_journaling $SCRATCH_DEV
# Create our test directory and files.
mkdir $SCRATCH_MNT/testdir
touch $SCRATCH_MNT/testdir/foo
touch $SCRATCH_MNT/testdir/bar
# Make sure everything done so far is durably persisted.
sync
# Create one hard link for file foo and another one for file bar. After
# that fsync only the file bar.
ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link
ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar
# Silently drop all writes on scratch device to simulate power failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again and mount the fs to trigger log/journal replay.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# Now verify both our files have a link count of 2.
echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)"
echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)"
# We should be able to remove all the links of our files in testdir, and
# after that the parent directory should become empty and therefore
# possible to remove it.
rm -f $SCRATCH_MNT/testdir/*
rmdir $SCRATCH_MNT/testdir
_unmount_flakey
# The fstests framework will call fsck against our filesystem which will verify
# that all metadata is in a consistent state.
status=0
exit
The test fails with:
-Link count for file foo: 2
+Link count for file foo: 1
Link count for file bar: 2
+rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle
+rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
(...)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
And fsck's output:
(...)
checking fs roots
root 5 inode 258 errors 2001, no inode item, link count wrong
unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref
Checking filesystem on /dev/sdc
(...)
So fix this by marking inodes for link count fixup at log replay time
whenever a directory entry is replayed if the entry was created in the
transaction where the fsync was made and if it points to a non-directory
inode.
This isn't a new problem/regression, the issue exists for a long time,
possibly since the log tree feature was added (2008).
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from mainline commit bb53eda9029fd52b466fa501ba4aa58e94789b18) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Andy Lutomirski [Wed, 23 Mar 2016 10:38:23 +0000 (03:38 -0700)]
x86/iopl/64: properly context-switch IOPL on Xen PV
On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL. We need to context
switch it manually.
I'm doing this without going through paravirt because this is
specific to Xen PV. After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.
This is XSA-171.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Jan Beulich <jbeulich@suse.com>
Orabug: 22926124
Conflicts:
arch/x86/kernel/process_64.c
X86_FEATURE_XENPV not defined, too much extra to bring it in.
Replaced with xen_pv_domain() which is trigger to set X86_FEATURE_XENPV.
CVE: CVE-2016-3157 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Ashish Samant [Tue, 15 Mar 2016 02:23:37 +0000 (19:23 -0700)]
fuse: Do not mask return value from fuse_direct_io for partially valid data
If a user calls writev/readv in direct io mode with partially valid data
in the iovec array such that any vector other than the first one in the
array contains invalid data, we currently return the error for the invalid
iovec.
Instead, we should return the number of bytes already written/read and not
the error as we do in the non direct_io case.
Jason Luo [Thu, 17 Mar 2016 03:29:50 +0000 (11:29 +0800)]
fnic: Using rport-_dd_data to check rport online instead of rport_lookup
From: Satish Kharat <satishkh@cisco.com>
When issuing I/O we check if rport is online through libfc
rport_lookup() function which needs to be protected by mutex lock
that cannot acquired in I/O context. The change is to use midlayer
remote port’s dd_data which is preserved until its devloss timeout
and no protection is required.
Fnic driver version changed from 1.6.0.20 to 1.6.0.21
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Orabug: 22918200 Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
Jason Luo [Thu, 17 Mar 2016 03:14:24 +0000 (11:14 +0800)]
fnic: Cleanup the I_O that has timed out and is used to issue LUN reset
From: Satish Kharat <satishkh@cisco.com>
In case of LUN reset, the device reset command is issued with one of
the I/Os that has timed out on that LUN. The change is to also return
this I/O with error status set to DID_RESET.
Fnic driver version changed from 1.6.0.19 to 1.6.0.20
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Orabug: 22918200 Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
Jason Luo [Thu, 17 Mar 2016 02:59:23 +0000 (10:59 +0800)]
fnic: Fix to cleanup aborted IO to avoid device being offlined by mid-layer
From: Satish Kharat <satishkh@cisco.com>
If an I/O times out and an abort issued by host, if the abort is
successful we need to set scsi status as DID_ABORT. Or else the
mid-layer error handler which looks for this error code, will
offline the device. Also if the original I/O is not found in fnic
firmware, we will consider the abort as successful.
Fnic driver version changed from 1.6.0.17a to 1.6.0.19,
version 1.6.0.18 has been skipped
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Orabug: 22918200 Signed-off-by: Jason Luo <zhangqing.luo@oracle.com>
This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.
When a snapshot is received, its received_uuid is set to the original
uuid of the subvolume. When that snapshot is then resent to a third
filesystem, it's received_uuid is set to the second uuid
instead of the original one. The same was true for the parent_uuid.
This behaviour was partially changed in 37b8d27d, but in that patch
only the parent_uuid was taken from the real original,
not the uuid itself, causing the search for the parent to fail in
the case below.
This happens for example when trying to send a series of linked
snapshots (e.g. created by snapper) from the backup file system back
to the original one.
The following commands reproduce the issue in v4.2.1
(no error in 4.1.6)
# setup three test file systems
for i in 1 2 3; do
truncate -s 50M fs$i
mkfs.btrfs fs$i
mkdir $i
mount fs$i $i
done
echo "content" > 1/testfile
btrfs su snapshot -r 1/ 1/snap1
echo "changed content" > 1/testfile
btrfs su snapshot -r 1/ 1/snap2
Signed-off-by: Robin Ruede <rruede+git@gmail.com> Fixes: 37b8d27de5d0 ("Btrfs: use received_uuid of parent during send") Cc: stable@vger.kernel.org # v4.2+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by: Ed Tomlinson <edt@aei.ca>
(cherry picked from commit b96b1db039ebc584d03a9933b279e0d3e704c528) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Hans Westgaard Ry [Wed, 3 Feb 2016 08:26:57 +0000 (09:26 +0100)]
net:Add sysctl_max_skb_frags
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5f74f82ea34c0da80ea0b49192bb5ea06e063593)
David Jeffery [Fri, 28 Aug 2015 04:50:45 +0000 (14:50 +1000)]
xfs: return errors from partial I/O failures to files
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.
The issue can occur when there are more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error. If a successful bio completes after any failed bio, no
error is reported do to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.
xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear. ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.
Cc: stable@vger.kernel.org Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit c9eb256eda4420c06bb10f5e8fbdbe1a34bc98e0)
David Jeffery [Fri, 28 Aug 2015 04:50:45 +0000 (14:50 +1000)]
xfs: return errors from partial I/O failures to files
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.
The issue can occur when there are more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error. If a successful bio completes after any failed bio, no
error is reported do to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.
xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear. ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.
Cc: stable@vger.kernel.org Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Orabug : 22681464
Mainline v4.3 commit c9eb256eda4420c06bb10f5e8fbdbe1a34bc98e0 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
A previous commit, bbb87ba5690c72821614acdd465c24eeb99fa923,
introduced a call to efi_secure_boot_init() into the start_kernel()
sequence. The new call was not conditionalized based on platform.
Secure Boot is only implmented on the x86 platform; the definition
of efi_secure_boot_init() is in arch/x86/platform/efi/efi.c. On
non-x86 architectures, this caused a implicit-declaration error and
broke the build.
This commit wraps the call in #ifdef CONFIG_X86/#endif to eliminate
the error.
Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 289ef170a213f9b078886327a162cbcb7a325838) Signed-off-by: Dan Duval <dan.duval@oracle.com>
With UEFI Secure Boot enabled and securelevel set, after a kernel is loaded
using kexec and booted, securelevel is disabled. With the securelevel patch
set, the state of UEFI Secure Boot is queried when booted via the efi stub, but
kexec does not use the efi stub.
To allow kernels which are not loaded through the efi stub to properly set
securelevel as well, add a new init routine to start_kernel() to query the
state of UEFI Secure Boot and enable securelevel if needed.
Taken from https://bugzilla.redhat.com/attachment.cgi?id=1052836 .
Signed-off-by: Linn Crosetto <linn@hp.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit a954f7350658a8fde4b893c7b74de8137864ad12) Signed-off-by: Dan Duval <dan.duval@oracle.com>
Pradeep Gopanapalli [Tue, 1 Mar 2016 01:45:59 +0000 (01:45 +0000)]
Fixed vnic issue after saturn reset
After saturn reboots uVnic will not recover
The issue happens When a host connected to NM3 via infiniband switch.
The fact here is xve driver sets OPER_DOWN when it losses HEART_BEAT and
there is no way for it set this state back .
Fixing this by disjoining Multicast group once HeartBeat is lost and
joining back once saturn comes back
Pradeep Gopanapalli [Thu, 11 Feb 2016 23:29:39 +0000 (23:29 +0000)]
uvnic issues
When there is Heartbeat loss because of Link event on EDR fabric, uvnic
recovery fails(some times).
This is due to the fact that XVE_OPER_UP state would have cleared and
xve driver has to check this once multicast group membership is retained
While running MAXq test we are running into soft crash around
skb_try_coalesce .Since xve driver allocates PAGE_SIZE it has to use
full PAGE_SIZE for truesize of skb.
xve driver doesn't allocate enough tailroom in skbs, so IP/TCP
stacks need to reallocate skb head to pull IP/TCP headers.
this .
xve allocates some resources which are not needed by uVnic
functionality.
Fix this by using cm_supported flag.
Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mainline v4.4 commit fb53c439d84387621c53808a3957ffd9876e5094 Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>
Inside compat IOCTL hook of driver, driver was using wrong address of
ioc->frame.raw which leads sense_ioc_ptr to be calculated wrongly and
failing IOCTL.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ea1c928bb6051ec4ccf24826898aa2361eaa71e5) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver assumes that VFs always have peers present whenever they have
same LD IDs. But this is not the case. This patch handles the above
mentioned by explicitly checking for a peer before making HA/non-HA path
decision.
Signed-off-by: Uday Lingala <uday.lingala@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8f67c8c518f324874e8caf93d1f4468d25754333) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pd_seq_sync pointer can't be NULL, we have to check its entries
instead.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 546e559c79b1a8d27c23262907a00fc209e392a0) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d92ca9d3de862cb123d7ef0fd42dc1e9867dd590) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes online controller resets on SRIOV-enabled series of
Avago controllers.
1) Remove late detection heartbeat.
2) Change in the behavior if the FW found in READY/OPERATIONAL state.
Signed-off-by: Uday Lingala <uday.lingala@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3885c26b773750bf2e7e071a5b0b72f079196d60) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will introduce module-parameter for SCSI command timeout
value and fix setting of resetwaittime beyond a value.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e3d178ca773ff997c6c94989d0b14a2c0eae761c) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ccc7507de27a639c9e1327d6e56ef1f357962b09) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make instance->adprecovery variable atomic and removes hba_lock spinlock
while accessing instance->adprecovery.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8a01a41d864771fbc3cfc80a9629e06189479cce) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will add capability in driver to tell firmware that it can
throttle IOs in case controller's queue depth is downgraded post OFU
(online firmware upgrade). This feature will ensure firmware can be
downgraded from higher queue depth to lower queue depth without needing
system reboot. Added throttling code in IO path of driver, in case OS
tries to send more IOs than post OFU firmware's queue depth.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 52b62ac7c66e1a11eb8b3e3b0212847749af3b2d) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Controller-wide queue depth will be greater among the two. Using this
new feature, iMR can provide larger Queue depth(QD) for JBOD and limited
QD for Virtual Disk(VD).
2. megaraid_sas driver will throttle read/write LDIOs based on "LDIO
Queue Depth".
3. Dual queue depth can be enabled/disabled via module parameter. It is
enabled by default if the firmware supports it. Only specific firmware
builds will enable the feature.
4. Added sysfs parameter "ldio_outstanding" which permits querying the
number of outstanding LDIO requests at runtime.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 308ec459bc1975d9856cfeb3d1cd6461794a3976) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
build_and_issue_cmd should return SCSI_MLQUEUE_HOST_BUSY for a few error
cases instead of returning 1.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f9a9dee6a1fd8570884a0ab6f19c6b5cca05bd49) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will create a reply queue pool for each MSI-X index and will
provide an array of base addresses instead of the single address of
legacy mode. Using this new interface the driver can support higher
queue depths through scattered DMA pools.
If array mode is not supported driver will fall back to the legacy
method of reply pool allocation. This limits controller queue depth to
1K max. To enable a queue depth of more than 1K driver requires firmware
to support array mode and scratch_pad3 will provide the new queue depth
value.
When RDPQ is used, downgrading to an older firmware release should not
be permitted. This may cause firmware fault and is not supported.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 179ac14291a0e1cf8c2b2dfedce7c5af66696cc9) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Firmware will fill out per-LD data to tell driver whether a particular
LD supports region lock bypass. If yes, then driver will send non-FP
LDIO to region lock bypass FIFO. With this change in driver, firmware
will optimize certain code to improve performance.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8f05024cd3dbd3ec85923f3e8da05bf6db187d57) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will update device Queue depth based on interface type(SAS,
SATA..) for sysPDs. For Virtual disks(VDs), there will be no change in
queue depth (will remain 256). To fetch interface type (SAS or SATA or
FC..) of syspD, driver will send DCMD MR_DCMD_PD_GET_INFO.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2216c30523b0a1835b6d522ffe73ca167f199f00) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds task management for SCSI commands. Added functions are
task abort and target reset.
1. Currently, megaraid_sas driver performs controller reset when any IO
times out. With task management support added, task abort and target
reset will be tried to recover timed out IO. If task management fails,
then controller reset will be performaned. If the task management
request times out, fail the request and escalate to the next
level (controller reset).
2. mr_device_priv_data will be allocated for all generations of
controller, but is_tm_capable flag will never be set for
controllers (prior to Invader series) as firmware support is not
available for task management.
3. Task management capable firmware will set is_tm_capable flag in
firmware API.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 18365b138508bfbce0405f9904639fa3b7caf3c9) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2c048351c8e3e2b90b3c8b9dea3ee1b709853a9d) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will do proper error handling for DCMD timeout failure cases
for Fusion adapters:
1. For MFI adapters, in case of DCMD timeout (DCMD which must return
SUCCESS) driver will call kill adapter.
2. What action needs to be taken in case of DCMD timeout is decided by
function dcmd_timeout_ocr_possible(). DCMD timeout causing OCR is
applicable to the following commands:
3. If DCMD fails from driver init path there are certain DCMDs which
must return SUCCESS. If those DCMDs fail, driver bails out. For optional
DCMDs like pd_info etc., driver continues without executing certain
functionality.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6d40afbc7d13359b30a5cd783e3db6ebefa5f40a) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will do synhronization between OCR function and AEN function
using "reset_mutex" lock. reset_mutex will be acquired only in the
first half of the AEN function which issues a DCMD. Second half of the
function which calls SCSI API (scsi_add_device/scsi_remove_device)
should be out of reset_mutex to avoid deadlock between scsi_eh thread
and driver.
During chip reset (inside OCR function), there should not be any PCI
access and AEN function (which is called in delayed context) may be
firing DCMDs (doing PCI writes) when chip reset is happening in parallel
which will cause FW fault. This patch will solve the problem by making
AEN thread and OCR thread mutually exclusive.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 11c71cb4ab7cd901b9d6f0ff267c102778c1c8ef) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This adds the needed check after the call to the function
mraid_mm_alloc_kioc in order to make sure that this function has not
returned NULL and therefore makes sure we do not deference a NULL
pointer if one is returned by mraid_mm_alloc_kioc. Further more add
needed comments explaining that this function call can return NULL if
the list head is empty for the pointer passed in order to allow furture
users to understand this required pointer check.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Acked-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7296f62f0322d808362b21064deb34f20799c20d) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9fb74c4e66daab5c3fb3b949d37c15684d7ee82a) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The DELL PERC5 controller firmware does not list tape drives in response
to MR_DCMD_PD_LIST_QUERY. This causes tape drives not be exposed to the
OS when connected to a PERC5 controller.
This patch permits detection of tape drives connected to a PERC5
controller by exposing non-TYPE_DISK devices unconditionally.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit aed335eecf8f09c28588b01c7f7e24ee78156e28) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c8051156d1d3dd99d02e0bf5b127fc8d32f30f69) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit afb2b5ddac9f3727983030bc4450e9e3b5956b2a) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit
OS. Do not access user memory from kernel code. The SMAP bit restricts
accessing user memory from kernel code.
Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 323c4a02c631d00851d8edc4213c4d184ef83647) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0b48d12d0365a628d2257a4560b3b06c825fe1cd) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It may happen (kdump), that an interrupt is invoked just after the
setup_irqs function was called but before the tasklet was initialised.
At this phase the hw ints should have been disabled, but for unknown
reason this mechanism seems to not work properly.
From: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 91626c2701acad605c434b5e8245cbeea6671382) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3222251dbbe9f155e7b8c910b770d6ff922fb47e) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c4bd265415d5b06d7e3615c53036f589f300076e) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Do not use PAGE_SIZE marco to calculate max_sectors per I/O
request. Driver code assumes PAGE_SIZE will be always 4096 which can
lead to wrongly calculated value if PAGE_SIZE is not 4096. This issue
was reported in Ubuntu Bugzilla Bug #1475166.
Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 357ae967ad66e357f78b5cfb5ab6ca07fb4a7758) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7364d34b878d78c4df90d0e6a5e06f8ad0c283e4) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 90c204bc59a313bf03a0641caee3e2b5945629b5) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove PCI id based checks and use instance->ctrl_context to decide
whether controller is MFI-based or a Fusion adapter. Additionally,
Fusion adapters are divided into two categories: Thunderbolt and
Invader.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5a8cb85b569b2349493aadb81a747e077766907d) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0d5b47a724bab0ebaaa933d6ff5e584957aaa188) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 609fb07b2bdebe5d2c6a9da7e9d8e05155086418) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some of these code changes were proposed by David Binderman.
Removed redudant check of requestorId. Redundant condition:
instance.requestorId. Check for plasma firmware 1.11 are now
restructured to support only specific device id.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 92bb6505785b632a1b0f735b21c4b34326ec048f) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Syncro firmware supports round robin I/O switching on dual path. Driver
uses validHandles to check for dual path. However, it is supposed to
check for values > 1 (not > 2).
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 75b96061eb95c9b9f8a1da6995f1c314526f3572) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Print firmware events in human-readable form. This will help users track
any critical firmware events without special application support.
Sample syslogd output:
megaraid_sas 0000:02:00.0: 8619 (491648347s/0x0020/WARN) - Controller temperature threshold exceeded. This may indicate inadequate system cooling. Switching to low performance mode.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 714f517745729e060ef716d16026e26e5fce591f) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Try to do chip reset at driver load time. If firmware fails to reach
ready state, try chip reset using adp_reset() callback. For Fusion
adapters the call back was previously void. Provide a suitable reset
function.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 79b82c2c560025afbb88ba7ad5cddb9c2203cf2e) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver will expose max sge = 256 (earlier it was 64) if firmware
supports extended IO size (1M).
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bd5f9484262a13397a0725f4a43f7baaa3341125) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4dbbe3cec443f0c6867e7ef549704966fbd6f48b) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implemented JBOD map which will provide quick access for JBOD path and
also provide sequence number. This will help hardware to fail command
to the FW in case of any sequence mismatch.
Fast Path I/O for JBOD will refer JBOD map (which has sequence number
per JBOD device) instead of RAID map. Previously, the driver used RAID
map to get device handle for fast path I/O and this not have sequence
number information. Now, driver will use JBOD map instead. As part of
error handling, if JBOD map is failed/not supported by firmware, driver
will continue using legacy behavior.
Now there will be three IO paths for JBOD (syspd):
- JBOD map with sequence number (Fast Path)
- RAID map without sequence number (Fast Path)
- FW path via h/w exception queue deliberately setup devhandle
0xFFFF (FW path).
Relevant data structures:
- Driver send new DCMD MR_DCMD_SYSTEM_PD_MAP_GET_INFO for this purpose.
- struct MR_PD_CFG_SEQ- This structure represent map of single physical
device.
- struct MR_PD_CFG_SEQ_NUM_SYNC- This structure represent whole JBOD
map in general(size, count of sysPDs configured, struct MR_PD_CFG_SEQ
of syspD with 0 index).
- JBOD sequence map size is: sizeof(struct MR_PD_CFG_SEQ_NUM_SYNC)
+ (sizeof(struct MR_PD_CFG_SEQ) * (MAX_PHYSICAL_DEVICES - 1)) which
is allocated while setting up JBOD map at driver load time.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3761cb4cf65ec78846b4b8cba9c0578bb10f92d5) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e0bd0874f2de21613e572669b2de1e4b0c3a97de) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0be3f4c9e6b8f12c8c0d7b156b995b30134c7448) Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>