Don Brace [Fri, 6 May 2016 19:17:57 +0000 (12:17 -0700)]
hpsa: correct handling of HBA device removal
Need to report HBA device removal faster than the
event handler polling interval.
Stop I/O to the removed disk and wait for all
I/O operations to flush before removing the device.
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:17:36 +0000 (12:17 -0700)]
hpsa: correct ioaccel2 error procecssing.
set offload_to_be_enabled to 0 when an ioaccel2 error is processed.
Before, an ioaccel completion error would turn of ioaccel but a rescan
would turn it back on again.
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:17:16 +0000 (12:17 -0700)]
hpsa: correct ioaccel state change operation
offload_to_be_enabled also needs to be set to 0 during a state
change.
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:16:56 +0000 (12:16 -0700)]
hpsa: add timeouts for driver initiated commands
faulty drives can cause the driver to hang during a
scan operation.
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Joseph T Handzik [Fri, 6 May 2016 19:16:35 +0000 (12:16 -0700)]
hpsa: add sas_address to sysfs device attibute
There have been companies requesting a sysfs entry
to obtain the sas address of device.
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:16:15 +0000 (12:16 -0700)]
hpsa: correct initialization order issue
The driver was calling scsi_scan_host before enabling interrupts.
This has gone unnoticed except for customers running in intx mode.
Calling scsi_scan_host before interrupts are enabled causes
"irq XX: nobody cared" messages and the driver to hang.
This patch enables interrupts before the call to scsi_scan_host.
Reported-by: Piotr Karbowski <piotr.karbowski@gmail.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Don Brace <don.brace@microsemi.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:15:55 +0000 (12:15 -0700)]
hpsa: update copyright information
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:15:14 +0000 (12:15 -0700)]
hpsa: check for a null phys_disk pointer in ioaccel2
path
An oops can occur when submitting ioaccel2 commands when the phys_disk
pointer is NULL in hpsa_scsi_ioaccel_raid_map. Happens when there are
configuration changes during I/O operations.
If the phys_disk pointer is NULL, send the command down the RAID path.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:14:54 +0000 (12:14 -0700)]
hpsa: correct abort tmf for hba devices
Aborts were not being sent down to HBA devices
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:14:34 +0000 (12:14 -0700)]
hpsa: correct lun data caching bitmap definition
The bitmap was changed after this definition was added to the
driver. Correcting the bitmap definition.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:14:14 +0000 (12:14 -0700)]
hpsa: add SMR drive support
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:13:53 +0000 (12:13 -0700)]
hpsa: do not get enclosure info for external devices
Stop annoying "Error, could not get enclosure information"
messages.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Reviewed-by: Justin Lindley <justin.lindley@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:13:33 +0000 (12:13 -0700)]
hpsa: Add box and bay information for enclosure
devices
Adding a new method to display enclosure device information.
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:13:13 +0000 (12:13 -0700)]
hpsa: Change SAS transport devices to bus 0.
SAS transport places devices on bus 0 but driver was setting the bus to
3.
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Dan Carpenter [Fri, 6 May 2016 19:12:32 +0000 (12:12 -0700)]
hpsa: logical vs bitwise AND typo
HPSA_DIAG_OPTS_DISABLE_RLD_CACHING is a mask and bitwise AND was
intended here instead of logical &&. This bug is essentially harmless,
it means that sometimes we don't print a warning message which we wanted
to print.
Fixes: c2adae44e916 ('hpsa: disable report lun data caching') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Christoph Hellwig [Fri, 6 May 2016 19:12:12 +0000 (12:12 -0700)]
scsi: use host wide tags by default
This patch changes the !blk-mq path to the same defaults as the blk-mq
I/O path by always enabling block tagging, and always using host wide
tags. We've had blk-mq available for a few releases so bugs with
this mode should have been ironed out, and this ensures we get better
coverage of over tagging setup over different configs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:11:52 +0000 (12:11 -0700)]
hpsa: bump the driver version
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Gerry Morong <gerry.morong.pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:11:32 +0000 (12:11 -0700)]
hpsa: add in sas transport class
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
so hopefully output_len contains the combined length of the eight
strings. Otherwise, snprintf will stop copying to the output
buffer, but still end up reporting that combined length - which
in turn would result in user-space getting a bunch of useless nul
bytes (thankfully the upper sysfs layer seems to clear the output
buffer before passing it to the various ->show routines). But we have
so output_len at best contains the length of the last string printed.
Inside the loop, we then otherwise add to output_len. By magic,
we still have PATH_STRING_LEN available every time... This
wouldn't really be a problem if the bean-counting has been done
properly and each line actually does fit in 50 bytes, and maybe
it does, but I don't immediately see why. Suppose we end up
taking this branch:
An optimistic estimate says this uses strlen("BOX: 1 BAY: 2
Active\n") which is 21. Now add the 20 bytes guaranteed by the
%20.20s and then some for the rest of that format string, and
we're easily over 50 bytes. I don't think we can get over 100
bytes even being pessimistic, so this just means we'll scribble
into the next path[i+1] and maybe get that overwritten later,
leading to some garbled output (in fact, since we'd overwrite the
previous string's 0-terminator, we could end up with one very
long string and then print various suffixes of that, leading to
much more than 400 bytes of output). Except of course when we're
filling path[7], where overrunning it means writing random stuff
to the kernel stack, which is usually a lot of fun.
We can fix all of that and get rid of the 400 byte stack buffer by
simply writing directly to the given output buffer, which the upper
layer guarantees is at least PAGE_SIZE. s[c]nprintf doesn't care where
it is writing to, so this doesn't make the spin lock hold time any
longer. Using scnprintf ensures that output_len always represents the
number of bytes actually written to the buffer, so we'll report the
proper amount to the upper layer.
Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:10:47 +0000 (12:10 -0700)]
hpsa: enhance device messages
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Teel [Fri, 6 May 2016 19:10:20 +0000 (12:10 -0700)]
hpsa: disable report lun data caching
When external target arrays are present, disable the firmware's
normal behavior of returning a cached copy of the report lun data,
and force it to collect new data each time we request a report luns.
This is necessary for external arrays, since there may be no
reliable signal from the external array to the smart array when
lun configuration changes, and thus when driver requests
report luns, it may be stale data.
Use diag options to turn off RPL data caching.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Teel [Fri, 6 May 2016 19:09:59 +0000 (12:09 -0700)]
hpsa: add discovery polling for PT RAID devices.
There are problems with getting configuration change notification
in pass-through RAID environments. So, activate flag
h->discovery_polling when one of these devices is detected in
update_scsi_devices.
After discovery_polling is set, execute a report luns from
rescan_controller_worker (every 30 seconds).
If the data from report_luns is different than last
time (binary compare), execute a full rescan via update_scsi_devices.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Teel [Fri, 6 May 2016 19:09:39 +0000 (12:09 -0700)]
hpsa: eliminate fake lun0 enclosures
We don't need to create fake enclosure devices at Lun0
in external target array configurations anymore.
This was done to support Pre-SCSI rev 5 controllers
that didn't suppoprt report luns commands, so the
SCSI layer had to scan targets. If there was no
LUN at LUN 0, then the target scan would stop, and
move to the next target. Lun0 enclosure device
was added to prevent sparsely-numbered LUNs from
being missed.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Teel [Fri, 6 May 2016 19:09:19 +0000 (12:09 -0700)]
hpsa: generalize external arrays
External array LUNs must use target and lun numbers assigned by the
external array. So the driver must treat these differently from
local LUNs when assigning lun/target.
LUN's 'model' field has been used to detect Lun types that need
special treatment, but the desire is to eliminate the need to reference
specific array models, and support any external array.
Pass-through RAID (PTRAID) luns are not luns of the local controller,
so they are not reported in LUN count of command 'ID controller'.
However, they ARE reported in "Report logical Luns" command.
Local luns are listed first, then PTRAID LUNs.
The number of luns from "Report LUNs" in excess of those reported by
'ID controller' are therefore the PTRAID LUNS.
We can now remove function is_ext_target, and the 'white list'
array of supported model names.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:08:59 +0000 (12:08 -0700)]
hpsa: move scsi_add_device and scsi_remove_device
calls to new function
preparation for adding the sas transport class
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:08:38 +0000 (12:08 -0700)]
hpsa: refactor hpsa_figure_bus_target_lun
setup for sas transport. Need to set the
bus and target accordingly.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:08:18 +0000 (12:08 -0700)]
hpsa: enhance hpsa_get_device_id
use an index into vpd data for SAS/SATA drives
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:07:58 +0000 (12:07 -0700)]
hpsa: add function is_logical_device
simplify checking for logical/physical devices
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:07:38 +0000 (12:07 -0700)]
hpsa: simplify update scsi devices
remove repeated calculation that checks for physical
or logical devices.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:07:17 +0000 (12:07 -0700)]
hpsa: simplify check for device exposure
remove macros and cleanup device exposure checking
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:06:57 +0000 (12:06 -0700)]
hpsa: correct ioaccel2 sg chain len
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:06:37 +0000 (12:06 -0700)]
hpsa: correct check for non-disk devices
The driver is using two MACROs which seemingly are looking in
the wrong location for the device_flags returned from
CISS_REPORT_PHYS. Both MACROs, NON_DISK_PHYS_DEV and
PHYS_IOACCEL, are using the pointer returned from figure_lunaddrbytes
which is the address of the LUN.lunid element in
the extended CISS_REPORT_PHYS. But the MACROS are using offsets
beyond the range of the element (offset 17 of an 8 byte element).
These MACROs actually are looking at the correct location but
they fail static checker analysis. It also will not work
if any new elements are added to the extended LUN structure.
Change the code to use the structure elements directly
since this MACRO is only used in one location.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Teel [Fri, 6 May 2016 19:06:17 +0000 (12:06 -0700)]
hpsa: fix physical target reset
Set reset type in device_reset_handler to do either
logical unit reset for logical devices, or physical
target reset, for physical devices.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:05:56 +0000 (12:05 -0700)]
hpsa: fix hpsa_adjust_hpsa_scsi_table
Fix a NULL pointer issue in the driver when devices are removed
during a reset.
Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:05:36 +0000 (12:05 -0700)]
hpsa: correct transfer length for 6 byte read/write
commands
handle block counts of 0. Cleanup block and block count calculations.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:05:16 +0000 (12:05 -0700)]
hpsa: abandon rescans on memory alloaction failures.
Abandon and reschedule rescan process only if device inquiries
fail due to mem alloc failures, which are likely to occur for
all devices.
Otherwise, skip device if inquiry fails for other reasons,
and continue rescanning process for other devices.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:04:56 +0000 (12:04 -0700)]
hpsa: allow driver requested rescans
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by; Hannes Reinecke <hare@suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:03:55 +0000 (12:03 -0700)]
hpsa: change devtype to unsigned
This member is used in calls to scsi_device_type.
It should be unsigned since the kernel checks for upper bounds
and it should never be negative.
Suggested-by: Tomas Henzl <thenzl@redhat.com> Suggested-by: Hannes Reinecke <hare@suse.de> Suggested-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:03:14 +0000 (12:03 -0700)]
hpsa: stop zeroing reset_cmds_out and
ioaccel_cmds_out during rescan
pulling the rug out from under the reset handler
likewise for ioaccel_cmds_out
Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:02:54 +0000 (12:02 -0700)]
hpsa: remove unused parameter hostno
This parameter was once used before scan_start was defined
but now it is no longer used.
Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Scott Benesh [Fri, 6 May 2016 19:02:34 +0000 (12:02 -0700)]
hpsa: add in new offline mode
prevent adding volumes that are not available.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Justin Lindley <justin.lindley@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Kevin Barnett [Fri, 6 May 2016 19:02:14 +0000 (12:02 -0700)]
Change how controllers in mixed mode are handled.
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Don Brace [Fri, 6 May 2016 19:01:53 +0000 (12:01 -0700)]
hpsa: update controller names
replace PM8068/69 with actual names
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Scott Teel <scott.teel@pmcs.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
Dan Carpenter [Fri, 6 May 2016 19:01:33 +0000 (12:01 -0700)]
hpsa: fix an sprintf() overflow in the reset handler
The string "cmd %d RESET FAILED, new lockup detected" is not quite
large enough so the sprintf() will overflow. I have increased the size
of the buffer and also changed the sprintf calls to snprintf.
Fixes: 73153fe533bc ('hpsa: use block layer tag for command allocation') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Don Brace <don.brace@pmcs.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Orabug: 23064595 Signed-off-by: Manjunath Govindashetty <manjunath.govindashetty@oracle.com>
We should be doing this, it's weird we hadn't been doing this.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 0d2b2372e097cd3b4150d3ec91e79ac3c5cc750e) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
If we remove a hard link from an inode, the inode gets evicted, then
we fsync the inode and then power fail/crash, when the log tree is
replayed, the parent directory inode still has entries pointing to
the name that no longer exists, while our inode no longer has the
BTRFS_INODE_REF_KEY item matching the deleted hard link (as expected),
leaving the filesystem in an inconsistent state. The stale directory
entries can not be deleted (an attempt to delete them causes -ESTALE
errors), which makes it impossible to delete the parent directory.
This happens because we track the id of the transaction where the last
unlink operation for the inode happened (last_unlink_trans) in an
in-memory only field of the inode, that is, a value that is never
persisted in the inode item stored on the fs/subvol btree. So if an
inode is evicted and loaded again, the value for last_unlink_trans is
set to 0, which prevents the fsync from logging the parent directory
at btrfs_log_inode_parent(). So fix this by setting last_unlink_trans
to the id of the transaction that last modified the inode when we
load the inode. This is a pessimistic approach but it always ensures
correctness with the trade off of ocassional full transaction commits
when an fsync is done against the inode in the same transaction where
it was evicted and reloaded when our inode is a directory and often
logging its parent unnecessarily when our inode is not a directory.
The following test case for fstests triggers the problem:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_metadata_journaling $SCRATCH_DEV
# Create our test file with 2 hard links.
mkdir $SCRATCH_MNT/testdir
touch $SCRATCH_MNT/testdir/foo
ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar
# Make sure everything done so far is durably persisted.
sync
# Now remove one of the links, trigger inode eviction and then fsync
# our inode.
unlink $SCRATCH_MNT/testdir/bar
echo 2 > /proc/sys/vm/drop_caches
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo
# Silently drop all writes on our scratch device to simulate a power failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again and mount the fs to trigger log/journal replay.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# Now verify our directory entries.
echo "Entries in testdir:"
ls -1 $SCRATCH_MNT/testdir
# If we remove our inode, its parent should become empty and therefore we should
# be able to remove the parent.
rm -f $SCRATCH_MNT/testdir/*
rmdir $SCRATCH_MNT/testdir
_unmount_flakey
# The fstests framework will call fsck against our filesystem which will verify
# that all metadata is in a consistent state.
status=0
exit
The test failed on btrfs with:
generic/098 4s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad)
--- tests/generic/098.out 2015-07-23 18:01:12.616175932 +0100
+++ /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad 2015-07-23 18:04:58.924138308 +0100
@@ -1,3 +1,6 @@
QA output created by 098
Entries in testdir:
+bar
foo
+rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo': Stale file handle
+rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
...
(Run 'diff -u tests/generic/098.out /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad' to see the entire diff)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//generic/098.full)
$ cat /home/fdmanana/git/hub/xfstests/results//generic/098.full
(...)
checking fs roots
root 5 inode 258 errors 2001, no inode item, link count wrong
unresolved ref dir 257 index 0 namelen 3 name foo filetype 1 errors 6, no dir index, no inode ref
unresolved ref dir 257 index 3 namelen 3 name bar filetype 1 errors 5, no dir item, no inode ref
Checking filesystem on /dev/sdc
(...)
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit bde6c242027b0f1d697d5333950b3a05761d40e4) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Andrey Ryabinin [Wed, 7 Oct 2015 11:39:55 +0000 (14:39 +0300)]
lockd: get rid of reference-counted NSM RPC clients
Currently we have reference-counted per-net NSM RPC client
which created on the first monitor request and destroyed
after the last unmonitor request. It's needed because
RPC client need to know 'utsname()->nodename', but utsname()
might be NULL when nsm_unmonitor() called.
So instead of holding the rpc client we could just save nodename
in struct nlm_host and pass it to the rpc_create().
Thus ther is no need in keeping rpc client until last
unmonitor request. We could create separate RPC clients
for each monitor/unmonitor requests.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(cherry picked from commit 0d0f4aab4e4d290138a4ae7f2ef8469e48c9a669)
Commit cb7323fffa85 ("lockd: create and use per-net NSM
RPC clients on MON/UNMON requests") introduced per-net
NSM RPC clients. Unfortunately this doesn't make any sense
without per-net nsm_handle.
E.g. the following scenario could happen
Two hosts (X and Y) in different namespaces (A and B) share
the same nsm struct.
1. nsm_monitor(host_X) called => NSM rpc client created,
nsm->sm_monitored bit set.
2. nsm_mointor(host-Y) called => nsm->sm_monitored already set,
we just exit. Thus in namespace B ln->nsm_clnt == NULL.
3. host X destroyed => nsm->sm_count decremented to 1
4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr
dereference of *ln->nsm_clnt
So this could be fixed by making per-net nsm_handles list,
instead of global. Thus different net namespaces will not be able
share the same nsm_handle.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(cherry picked from commit 0ad95472bf169a3501991f8f33f5147f792a8116)
Keith Busch [Mon, 24 Aug 2015 13:48:16 +0000 (08:48 -0500)]
PCI: Set MPS to match upstream bridge
Firmware typically configures the PCIe fabric with a consistent Max Payload
Size setting based on the devices present at boot. A hot-added device
typically has the power-on default MPS setting (128 bytes), which may not
match the fabric.
The previous Linux default, in the absence of any "pci=pcie_bus_*" options,
was PCIE_BUS_TUNE_OFF, in which we never touch MPS, even for hot-added
devices.
Add a new default setting, PCIE_BUS_DEFAULT, in which we make sure every
device's MPS setting matches the upstream bridge. This makes it more
likely that a hot-added device will work in a system with optimized MPS
configuration.
Note that if we hot-add a device that only supports 128-byte MPS, it still
likely won't work because we don't reconfigure the rest of the fabric.
Booting with "pci=pcie_bus_peer2peer" is a workaround for this because it
sets MPS to 128 for everything.
[bhelgaas: changelog, new default, rework for pci_configure_device() path] Tested-by: Keith Busch <keith.busch@intel.com> Tested-by: Jordan Hargrave <jharg93@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org>
(cherry picked from commit 27d868b5e6cfaee4fec66b388e4085ff94050fa7)
Bjorn Helgaas [Thu, 20 Aug 2015 21:08:27 +0000 (16:08 -0500)]
PCI: Move MPS configuration check to pci_configure_device()
Previously we checked for invalid MPS settings, i.e., a device with MPS
different than its upstream bridge, in pcie_bus_detect_mps(). We only did
this if the arch or hotplug driver called pcie_bus_configure_settings(),
and then only if PCIe bus tuning was disabled (PCIE_BUS_TUNE_OFF).
Move the MPS checking code to pci_configure_device(), so we do it in the
pci_device_add() path for every device.
In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
it isn't validated against the remaining amount of data before being added
to the cursor. With a sufficiently large size indicated, the check:
datalen - dp < 2
may then fail due to integer overflow.
Fix this by checking the length indicated against the amount of remaining
data in both places a definite length is determined.
Whilst we're at it, make the following changes:
(1) Check the maximum size of extended length does not exceed the capacity
of the variable it's being stored in (len) rather than the type that
variable is assumed to be (size_t).
(2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
integer 0.
(3) To reduce confusion, move the initialisation of len outside of:
for (len = 0; n > 0; n--) {
since it doesn't have anything to do with the loop counter n.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Peter Jones <pjones@redhat.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Backport of upstream commit bd7c5f983f31 ("RDS: TCP: Synchronize accept()
and connect() paths on t_conn_lock.")
An arbitration scheme for duelling SYNs is implemented as part of
commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.
The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).
Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Backport of upstream commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock")
There is a race condition between rds_send_xmit -> rds_tcp_xmit
and the code that deals with resolution of duelling syns added
by commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()").
Specifically, we may end up derefencing a null pointer in rds_send_xmit
if we have the interleaving sequence:
rds_tcp_accept_one rds_send_xmit
The race condition can be avoided without adding the overhead of
additional locking in the xmit path: have rds_tcp_accept_one wait
for rds_tcp_xmit threads to complete before resetting callbacks.
The synchronization can be done in the same manner as rds_conn_shutdown().
First set the rds_conn_state to something other than RDS_CONN_UP
(so that new threads cannot get into rds_tcp_xmit()), then wait for
RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
threads in rds_tcp_xmit are done.
Fixes: 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.
The existing code uses the sequence
clone = skb_clone(..);
pskb_pull(clone, off, ..);
pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pksb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().
To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rds-stress experiments with request size 256 bytes, 8K acks,
using 16 threads show a 40% improvment when pskb_extract()
replaces the {skb_clone(..); pskb_pull(..); pskb_trim(..);}
pattern in the Rx path, so we leverage the perf gain with
this commit.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 05cc5a39ddb7 "bnx2x: add vlan filtering offload" introduced
a regression in regard for vlans for 57710, 57711 adapters -
Loading 8021q module on a machine with such an adapter would cause
a null pointer dereference, as the driver mistakenly publishes it
has capabilities for vlan CTAG filtering.
Reported-by: Otto Sabart <osabart@redhat.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ab6d7846cf80affc43b9d412fed5e25dfcf4f35d) Signed-off-by: Brian Maly <brian.maly@oracle.com>
IPoIB collects statistics of traffic including number of packets
sent/received, number of bytes transferred, and certain errors. This
patch makes these statistics available to be queried by ethtool.
IPoIB puts skb-fragments in SGEs adding 1 extra SGE when SG is enabled.
Current codepath assumes that the max number of SGEs a device supports
is at least MAX_SKB_FRAGS+1, there is no interaction with upper layers
to limit number of fragments in an skb if a device suports fewer SGEs.
The assumptions also lead to requesting a fixed number of SGEs when
IPoIB creates queue-pairs with SG enabled.
A fallback/slowpath is implemented using skb_linearize to
handle cases where the conversion would result in more sges than supported.
Change-Id: Ia81e69d7231987208ac298300fc5b9734f193a2d Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Junxiao Bi [Mon, 21 Sep 2015 07:54:06 +0000 (15:54 +0800)]
ocfs2: o2hb: don't negotiate if last hb fail
Sometimes io error is returned when storage is down for a while.
Like for iscsi device, stroage is made offline when session timeout,
and this will make all io return -EIO. For this case, nodes shouldn't
do negotiate timeout but should fence self. So let nodes fence self
when o2hb_do_disk_heartbeat return an error, this is the same behavior
with o2hb without negotiate timer.
Junxiao Bi [Fri, 18 Sep 2015 05:57:33 +0000 (13:57 +0800)]
ocfs2: o2hb: add NEGOTIATE_APPROVE message
This message is used to re-queue write timeout timer and negotiate timer
when all nodes suffer a write hung to storage, this makes node not fence
self if storage down.
Junxiao Bi [Fri, 18 Sep 2015 05:47:39 +0000 (13:47 +0800)]
ocfs2: o2hb: add NEGO_TIMEOUT message
This message is sent to master node when non-master nodes's
negotiate timer expired. Master node records these nodes in
a bitmap which is used to do write timeout timer re-queue
decision.
Junxiao Bi [Fri, 18 Sep 2015 07:15:31 +0000 (15:15 +0800)]
ocfs2: o2hb: add negotiate timer
When storage down, all nodes will fence self due to write timeout.
The negotiate timer is designed to avoid this, with it node will
wait until storage up again.
Negotiate timer working in the following way:
1. The timer expires before write timeout timer, its timeout is half
of write timeout now. It is re-queued along with write timeout timer.
If expires, it will send NEGO_TIMEOUT message to master node(node with
lowest node number). This message does nothing but marks a bit in a
bitmap recording which nodes are negotiating timeout on master node.
2. If storage down, nodes will send this message to master node, then
when master node finds its bitmap including all online nodes, it sends
NEGO_APPROVL message to all nodes one by one, this message will re-queue
write timeout timer and negotiate timer.
For any node doesn't receive this message or meets some issue when
handling this message, it will be fenced.
If storage up at any time, o2hb_thread will run and re-queue all the
timer, nothing will be affected by these two steps.
Peter Hurley [Mon, 11 Jan 2016 06:40:55 +0000 (22:40 -0800)]
tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
ioctl(TIOCGETD) retrieves the line discipline id directly from the
ldisc because the line discipline id (c_line) in termios is untrustworthy;
userspace may have set termios via ioctl(TCSETS*) without actually
changing the line discipline via ioctl(TIOCSETD).
However, directly accessing the current ldisc via tty->ldisc is
unsafe; the ldisc ptr dereferenced may be stale if the line discipline
is changing via ioctl(TIOCSETD) or hangup.
Wait for the line discipline reference (just like read() or write())
to retrieve the "current" line discipline id.
Cc: <stable@vger.kernel.org> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5c17c861a357e9458001f021a7afa7aab9937439)
Alan Stern [Wed, 16 Dec 2015 18:32:38 +0000 (13:32 -0500)]
USB: fix invalid memory access in hub_activate()
Commit 8520f38099cc ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue. However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so. As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated. Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.
This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running. It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Alexandru Cornea <alexandru.cornea@intel.com> Tested-by: Alexandru Cornea <alexandru.cornea@intel.com> Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea)
Commit 8b13eddfdf04cbfa561725cfc42d6868fe896f56 ("netfilter: refactor NAT
redirect IPv4 to use it from nf_tables") has introduced a trivial logic
change which can result in the following crash.
unsigned int
nf_nat_redirect_ipv4(struct sk_buff *skb,
...
{
...
rcu_read_lock();
indev = __in_dev_get_rcu(skb->dev);
if (indev != NULL) {
ifa = indev->ifa_list;
newdst = ifa->ifa_local; <---
}
rcu_read_unlock();
...
}
Before the commit, 'ifa' had been always checked before access. After the
commit, however, it could be accessed even if it's NULL. Interestingly,
this was once fixed in 2003.
In addition to the original one, we have seen the crash when packets that
need to be redirected somehow arrive on an interface which hasn't been
yet fully configured.
This change just reverts the logic to the old behavior to avoid the crash.
Fixes: 8b13eddfdf04 ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables") Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 94f9cd81436c85d8c3a318ba92e236ede73752fc)
Andy Lutomirski [Wed, 6 Jan 2016 20:21:01 +0000 (12:21 -0800)]
x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent. In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.
Document all the barriers that make this work correctly and add
a couple that were missing.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 71b3c126e61177eb693423f2e18a1914205b165e)
From Avinash Repaka <avinash.repaka@oracle.com>:
This patch reverts the fix for Orabug: 22661521, since the fix assumes
that the memory region is always aligned on a page boundary, causing an
EMSGSIZE error when trying to register 1MB region that isn't 4KB aligned.
These issues were observed on kernel 4.1.12-39.el6uek tag.
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake. Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.
Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
CslRx/12087 is trying to release lock (slock-AF_INET) at:
[<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
but there are no more locks to release!
other info that might help us debug this:
2 locks held by CslRx/12087:
#0: (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
#1: (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.
Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 635682a14427d241bab7bbdeebb48a7d7b91638e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
Conflicts:
net/sctp/sm_sideeffect.c
chris hyser [Fri, 1 Apr 2016 19:44:22 +0000 (12:44 -0700)]
sparc64: Fix I/O NUMA parsing and sysfs display code.
I/O NUMA node parsing has been broke since T5 and did not work on
T7. The code also did not correctly handle PCIe root complexes
crossbar connected to multiple memory/cpu NUMA nodes. Additionally,
the numa_node attributes displayed in sysfs were incorrect.
Example: T7-4 showing round-robin spread of multiply connected root
complexes.
[ 3723.288247] /pci@305: On NUMA node 0
[ 3723.363398] /pci@304: On NUMA node 2
[ 3723.437486] /pci@307: On NUMA node 0
[ 3723.510510] /pci@306: On NUMA node 2
[ 3723.582582] /pci@313: On NUMA node 0
[ 3723.655276] /pci@308: On NUMA node 2
[ 3723.728077] /pci@302: On NUMA node 0
[ 3723.800774] /pci@30a: On NUMA node 2
[ 3723.874895] /pci@309: On NUMA node 0
[ 3723.947089] /pci@301: On NUMA node 2
[ 3724.020218] /pci@30b: On NUMA node 1
[ 3724.092902] /pci@300: On NUMA node 3
[ 3724.167630] /pci@303: On NUMA node 1
[ 3724.240287] /pci@30c: On NUMA node 3
[ 3724.312245] /pci@312: On NUMA node 1
[ 3724.384857] /pci@30e: On NUMA node 3
[ 3724.457482] /pci@30d: On NUMA node 1
[ 3724.531679] /pci@310: On NUMA node 3
[ 3724.603621] /pci@30f: On NUMA node 1
[ 3724.675695] /pci@311: On NUMA node 3
chris hyser [Thu, 7 Apr 2016 20:55:56 +0000 (13:55 -0700)]
sparc64: Set up core sibling list correctly for T7.
The important definition of core sibling is that some level of cache is shared.
The prior SPARC notion of socket was defined as highest level of shared cache.
On T7 platforms, the MD record now describes the CPUs that share the physical
socket and this is no longer tied to shared cache. This patch correctly
separates these two concepts.
chris hyser [Thu, 7 Apr 2016 19:12:05 +0000 (12:12 -0700)]
sparc64: Fix CPU package information in /sys
CPU package information in
/sys/bus/cpu/devices/cpu*/topology/physical_package_id
is inconisistent with the use by tools such as irqbalance. This patch
uses the socket ID to be consistent and useful.
chris hyser [Thu, 7 Apr 2016 19:32:48 +0000 (12:32 -0700)]
sparc64: Add 3rd level cache info to /sys
This patch pulls line size and cache size info from the machine description and
adds l3 caches files to /sys/bus/cpu/devices/cpu* directories. It also
structures the information in the same directory hierachy as x86 so that user
programs like irqbalance can find the needed information to work correctly.
Rob Gardner [Sun, 27 Mar 2016 22:39:13 +0000 (16:39 -0600)]
sparc64: Add lightweight syscall mechanism for lwp_info
This patch introduces a new "light weight" system call
mechanism which has the ability to retrieve small bits
of information and/or perform minor computations without
the need for a full blown save/switch/restore context.
Solaris provides _lwp_info(), which returns basically the
same information as getrusage(RUSAGE_THREAD) but much faster.
This is used extensively by the database code, and returns
the utime and stime for the calling thread.
(This patch also provides a fast getcpu function just as
a demonstration of how additional calls might be added.
Unlike x86, there is no unprivileged instruction to do this,
and so it is a fairly expensive system call.)