Chad Dupuis [Fri, 13 Jan 2012 15:08:35 +0000 (09:08 -0600)]
qla2xxx: Hard code the number of loop entries at 128.
Do not use ha->max_fibre_devices in loop topology since the maximum number of
entries will always be 128 and so we don't have to worry about changing
ha->max_fibre_devices back.
Giridhar Malavali [Wed, 14 Dec 2011 01:17:47 +0000 (17:17 -0800)]
qla2xxx: Complete mailbox command timedout to avoid initialization failures during next reset cycle.
Complete the timed-out mailbox command before initiating another abort cycle
to recover, so that mailbox commands issued during the next reset cycle don't
fail due to a pending mailbox access timeout.
Andrew Vasquez [Tue, 10 Apr 2012 11:52:14 +0000 (17:22 +0530)]
qla2xxx: Cache swl during fabric discovery.
Rather than continuously allocating and freeing swl within the discovery
process, simply pre-allocate it the first time that it's needed, cache it
through the rest of the lifecycle of the driver and free it at module unload.
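The allocate-once-and-cache pattern described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct host`, `get_swl` and `release_swl` are hypothetical names, and `calloc`/`free` stand in for whatever allocator the driver actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the cached switch-list buffer. */
struct host {
    void  *swl;       /* NULL until the first discovery pass needs it */
    size_t swl_size;  /* size to allocate on first use */
};

/* Return the cached buffer, allocating it the first time it is needed.
 * Returns NULL only if that first allocation fails. */
static void *get_swl(struct host *h)
{
    assert(h != NULL && h->swl_size > 0);
    if (h->swl == NULL)
        h->swl = calloc(1, h->swl_size);
    return h->swl;
}

/* Free the cached buffer once, at teardown (module unload in the driver). */
static void release_swl(struct host *h)
{
    free(h->swl);
    h->swl = NULL;
}
```

Every discovery pass would call `get_swl()` and reuse the same buffer, and `release_swl()` would run once at unload.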
Joe Carnuccio [Tue, 10 Apr 2012 11:50:36 +0000 (17:20 +0530)]
qla2xxx: Remove EDC sysfs interface.
Since the new firmware periodically resets the EDC, the EDC can no longer
be flashed while the firmware is running. User applications must
therefore be prevented from flashing the EDC, which is achieved by
removing the EDC sysfs interface.
Michael Christie [Tue, 10 Apr 2012 11:20:23 +0000 (16:50 +0530)]
qla2xxx: Remove check for null fcport from host reset handler.
Remove the check for a NULL fcport so that the host reset will run
unconditionally to unwedge any commands before the device is offlined and to
prevent a quick runthrough of the SCSI error handling.
Andrew Vasquez [Fri, 28 Oct 2011 21:40:44 +0000 (14:40 -0700)]
qla2xxx: Correct out of bounds read of ISP2200 mailbox registers.
From Olatunji:
A tool that I'm building for finding memory faults in
Linux drivers is reporting that the following loop, in
qla2x00_mbx_completion(), reads outside the allocated io memory
while reading ISP2200 mailbox registers. I would appreciate your
help in confirming this bug.
During isp2200 initialization (qla2x00_probe_one), ha->mbx_count
is set to 32, even though isp2200 has 24 mailbox registers
(mailbox0 ... mailbox23). Therefore the loop runs for
cnt=[1..31], wptr walks off the allocated mailbox register region
at cnt==24, and results in out-of-bounds reads.
Although I observed this problem in Linux 2.6.17.1, I
confirmed that it also exists in 2.6.37 and 3.1-rc4.
Fortunately, the reads outside the 24 mailbox registers are
benign. For correctness, limit the driver's read to 24.
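A minimal sketch of the bound discussed in the report, written as user-space C rather than driver code. The names `ISP2200_MBX_REGS` and `read_mailboxes`, and the array standing in for the register window, are hypothetical. Only the counts 24 and 32 come from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the report states the ISP2200 has 24 mailbox registers. */
#define ISP2200_MBX_REGS 24

/*
 * Copy mailbox registers 1..limit-1 from `regs` into `out`, where `limit`
 * is the requested count clamped to the number of registers that exist.
 * Returns the number of registers copied.
 */
static size_t read_mailboxes(const uint16_t *regs, size_t regs_available,
                             size_t requested, uint16_t *out)
{
    assert(regs != NULL && out != NULL);
    size_t limit = requested < regs_available ? requested : regs_available;
    size_t copied = 0;

    for (size_t cnt = 1; cnt < limit; cnt++) {
        out[cnt] = regs[cnt];
        copied++;
    }
    return copied;
}
```

With `requested` set to 32 and `regs_available` set to 24, the loop stops at index 23 instead of walking past the end of the register region.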
Andrew Vasquez [Thu, 20 Oct 2011 17:14:16 +0000 (10:14 -0700)]
qla2xxx: Clear options-flags while issuing stop-firmware mbx command.
Not clearing the options flags in mbx1 could lead the firmware
into interpreting old data in mbx1 through mbx8. This could
lead to inadvertent DMA read/write operations to stale memory.
During command failure/non-recognition, the upper-layer
FC-transport expects the drivers to set
job-reply->reply_payload_rcv_len. Do this in a consistent manner
to avoid duplication.
Andrew Vasquez [Fri, 4 Nov 2011 14:31:51 +0000 (09:31 -0500)]
qla2xxx: Perform implicit logout during rport tear-down.
During rport tear-down, make sure we do an implicit LOGO of the fcport in our
firmware to try to clear any residual commands associated with that fcport.
Rework the structures related to SRB processing to minimize the memory
allocations per I/O and manage resources associated with and completions
from common routines.
Nagalakshmi Nandigama [Mon, 7 May 2012 20:40:00 +0000 (13:40 -0700)]
[mpt2sas] fix NULL pointer at ioc->pfacts
Orabug: 14040678
The ioc->pfacts member in the IOC structure is getting set to zero
following a call to _base_get_ioc_facts due to the memset in that routine.
So if the ioc->pfacts was read after a host reset, there would be a NULL
pointer dereference. The routine _base_get_ioc_facts is called from context
of host reset. The problem in _base_get_ioc_facts is the size of
Mpi2IOCFactsReply is 64, whereas the sizeof "struct mpt2sas_facts" is 60,
so there is a four byte overflow resulting from the memset.
Also, the memset in _base_get_port_facts uses the incorrect structure;
it should be "struct mpt2sas_port_facts" instead of Mpi2PortFactsReply.
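The four-byte overflow can be illustrated with a small user-space sketch. The struct names below are hypothetical stand-ins. Only the sizes 64 and 60 come from the description above, and `next_field` merely represents whatever member follows the cached facts in the real structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: a 64-byte wire reply and a 60-byte driver copy. */
struct wire_reply   { uint8_t bytes[64]; };
struct cached_facts { uint8_t bytes[60]; };

struct adapter {
    struct cached_facts facts;
    uint32_t next_field;  /* stands in for the member that follows the facts */
};

/* Clear the cached facts using the size of the destination object itself. */
static void clear_facts(struct adapter *a)
{
    assert(a != NULL);
    /* 60 bytes; using sizeof(struct wire_reply) here would also zero next_field */
    memset(&a->facts, 0, sizeof(a->facts));
}
```

Sizing the `memset` from the destination (`sizeof(a->facts)`) rather than from an unrelated type keeps the clear inside the object regardless of how the two structures drift apart.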
Nagalakshmi Nandigama [Mon, 7 May 2012 20:39:25 +0000 (13:39 -0700)]
[mpt2sas] A hard drive is going OFFLINE when there is a hard reset issued
and simultaneously another hard drive is hot unplugged
Orabug: 14040678
Following the host reset, firmware discovery reassigns another
hard drive in the topology to the same device handle as the device that is
being hot removed. Until the driver's device removal routine is called,
there are two hard drives with the same device handle in the
internal device linked list. The device removal routine calls a separate
function that moves the device from the BLOCKED state into the OFFLINE state.
Because that function takes the device handle as its input parameter,
it traverses the internal device linked list searching for a
matching device handle. It finds two devices with the matching
device handle, so both devices go OFFLINE.
To fix this issue, the input parameter is changed from the device handle to the
SAS address, so only the device that is hot unplugged is placed
in the OFFLINE state.
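A sketch of why keying the lookup on the SAS address avoids the double match. All names here (`struct sas_dev`, `offline_by_sas_address`, the state enum) are hypothetical and only model the situation described above, where two list entries briefly share one device handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dev_state { DEV_RUNNING, DEV_BLOCKED, DEV_OFFLINE };

/* Hypothetical list node: the handle may be reused, the SAS address is unique. */
struct sas_dev {
    uint16_t        handle;
    uint64_t        sas_address;
    enum dev_state  state;
    struct sas_dev *next;
};

/* Move BLOCKED devices with the given SAS address to OFFLINE.
 * Returns the number of devices whose state changed. */
static int offline_by_sas_address(struct sas_dev *head, uint64_t sas_address)
{
    int changed = 0;

    for (struct sas_dev *d = head; d != NULL; d = d->next) {
        if (d->sas_address == sas_address && d->state == DEV_BLOCKED) {
            d->state = DEV_OFFLINE;
            changed++;
        }
    }
    return changed;
}
```

Two entries with the same handle but different SAS addresses are told apart, so only the unplugged drive changes state.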
Nagalakshmi Nandigama [Mon, 7 May 2012 20:38:49 +0000 (13:38 -0700)]
[mpt2sas] Set the phy identifier of the end device to the phy number of the parent device
it is linked to
Orabug: 14040678
The phy_identifier inside the routine _transport_set_identify()
is set to sas_device_page_zero->PhyNum. This returns the
phy number of the parent device this device is linked to.
Nagalakshmi Nandigama [Mon, 7 May 2012 20:37:52 +0000 (13:37 -0700)]
[mpt2sas] While enabling phy, read the current port number from sas iounit page 0
instead of page 1
Orabug: 14040678
The port number changes after disabling/enabling phys using the sysfs interface.
This is because the firmware behavior changed: it now reads the port number
and then sets it to a different value even though Auto Port Config is turned on.
With this change of behavior in the FW, it is possible that expanders are moved
from one port to another after disabling/enabling phys. This occurs because
the port number in sas iounit page 1 does not match the current port in
page 0. To fix this, the driver is modified to read the current
port number from sas iounit page 0 instead of page 1. Also copy the
port and phy flags over from page 0 to page 1.
Nagalakshmi Nandigama [Mon, 7 May 2012 20:35:49 +0000 (13:35 -0700)]
[mpt2sas] Modify the source code as per the findings reported by the source
code analysis tool
Orabug: 14040678
Modified the source code as per the findings reported by the source
code analysis tool. Source code for the following functionalities
has been touched. None of the driver functionalities has changed.
- SMP Passthrough IOCTL
- Debug messages for MPT Replies (i.e. bit 9 of Logging Level)
- Task Management using sysfs
- Device removal, i.e. when a target device (including any PD within a volume) is removed, and Volume Deletion.
- Trace Buffer
Nagalakshmi Nandigama [Mon, 7 May 2012 20:29:11 +0000 (13:29 -0700)]
[mpt2sas] Improvements were made to better protect the sas_device, raid_device,
and expander_device lists
There were possible race conditions around reading an object
from the linked list while another context in the driver was
removing it. The nature of this enhancement is to rearrange locking
so the linked lists are better protected.
Change set:
(1) numerous routines were rearranged so spin locks are held for
the entire time a linked list object is being read from or written to.
(2) added new routines for object deletion from the linked list. These ensure
the lock is held during the deletion of the linked list object, and the memory
for the object is freed outside the lock. The memory is freed outside the lock
so the driver has access to the device object info, which is required for
notifying the scsi mid layer that a device is getting deleted.
(3) added the ioc->blocking_handles parameter. This is a bitmask used
to identify which devices need blocking when there is device loss. It was
introduced so that the lock can be held for the entire time the linked list
objects are traversed, while the bitmask is set to indicate which device
handles need blocking. Outside the lock, the ioc->blocking_handles bitmask is
traversed, and for each marked device handle the scsi mid layer is called to
move the device into the blocking state.
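The two-phase approach in item (3) can be sketched in user-space C. This is a sketch only: a pthread mutex stands in for the driver's spin lock, and `struct dev`, `mark_lost_devices`, `drain_marked` and `MAX_HANDLES` are hypothetical names. Only the idea of a `blocking_handles` bitmask comes from the description above.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_HANDLES   64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_HANDLES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct dev { unsigned handle; int lost; struct dev *next; };

struct ioc {
    pthread_mutex_t lock;                         /* stands in for the spin lock */
    struct dev     *devices;
    unsigned long   blocking_handles[MASK_WORDS]; /* one bit per device handle */
};

/* Phase 1: walk the list with the lock held and only record handles. */
static void mark_lost_devices(struct ioc *ioc)
{
    pthread_mutex_lock(&ioc->lock);
    for (struct dev *d = ioc->devices; d != NULL; d = d->next) {
        if (d->lost && d->handle < MAX_HANDLES)
            ioc->blocking_handles[d->handle / BITS_PER_WORD] |=
                1UL << (d->handle % BITS_PER_WORD);
    }
    pthread_mutex_unlock(&ioc->lock);
}

/* Phase 2: with the lock released, drain the bitmask into `out`.
 * The caller would then notify the upper layer for each handle. */
static size_t drain_marked(struct ioc *ioc, unsigned *out, size_t cap)
{
    size_t n = 0;

    for (unsigned h = 0; h < MAX_HANDLES && n < cap; h++) {
        unsigned long bit = 1UL << (h % BITS_PER_WORD);
        if (ioc->blocking_handles[h / BITS_PER_WORD] & bit) {
            ioc->blocking_handles[h / BITS_PER_WORD] &= ~bit;
            out[n++] = h;
        }
    }
    return n;
}
```

The point of the split is that the potentially slow per-device notification never runs while the list lock is held.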
Nagalakshmi Nandigama [Mon, 7 May 2012 20:27:04 +0000 (13:27 -0700)]
[mpt2sas] Added multisegment mode support for Linux BSG Driver
Orabug: 14040678
Added support for Block IO requests with multiple segments (vectors) in
the SMP handler of the SAS Transport Class. This is required by the
BSG driver. Multisegment support is added for both Request and Response.
Nagalakshmi Nandigama [Mon, 7 May 2012 20:23:48 +0000 (13:23 -0700)]
[mpt2sas] remove the global mutex
Orabug: 14040678
When the lock_kernel and unlock_kernel routines were removed in the
2.6.39 kernel, a global mutex was added on top of the mutex
that already existed. With this implementation, only one IOCTL
can be active at any time, no matter how many controllers
are present. This causes poor performance.
Removed the global mutex so that the driver can work with the
semaphore that was already part of the existing code.
Nagalakshmi Nandigama [Mon, 7 May 2012 20:22:18 +0000 (13:22 -0700)]
[mpt2sas] MPI next revision header update
Orabug: 14040678
Changeset in MPI headers:
1) Bumped MPI2_HEADER_VERSION_UNIT
2) Added 4K sectors supported bit to CapabilitiesFlags field of IOC Page 6.
3) Added UEFIVersion field to BIOS Page 1 and defined additional
BiosOptions bits to control UEFI behavior.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:28 +0000 (17:01 -0500)]
Use PCI configure space read to flush PCI function reset register write to avoid MMIO issues (CR 128101)
When a PCI read was added after the LPe16000 port PCI function reset PortControl
register write to flush the PCI pipe, the LPe16000 PortStatus register was
used for the PCI readl(). However, this might be an issue on platforms that do
not allow MMIO reads to master abort, because the PCI device is not expected to
respond to a readl() following the function reset.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:27 +0000 (17:01 -0500)]
Fixed system panic when extents enabled with large number of small blocks (CR 128010)
When LPe16000 port extents are enabled with 24 extents of small blocks, the
system will crash at driver load time. This is because the total number of
sgls posted was not calculated correctly.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:27 +0000 (17:01 -0500)]
Fixed the system panic during EEH recovery (CR 127062)
During the EEH recovery process, while preparing for function reset, the
mbox_sys_shutdown routine was invoked to shut down the driver's internal mailbox
queue, including the pending (outstanding) mailbox command. There is a window
for a race condition in the pending mailbox command handling when that mailbox
command has just been released by lpfc_sli4_post_async_mbox because it could
not be posted while the PCI bus was frozen.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:27 +0000 (17:01 -0500)]
Fix resource leak when acc fails for received plogi (CR 127847)
When a port tries to respond to a plogi that it receives and the issue of the
acc fails, the mailbox command that was allocated to register the RPI is not
freed. Now, if the issue of the acc or reject fails, free the mailbox command
that was allocated to register the RPI.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:27 +0000 (17:01 -0500)]
Fixed SLI4 driver module load and unload test in a loop crashes the system (CR 126397)
Loading/unloading the lpfc driver overnight when an LPe16000 sees multiple
targets can result in a kernel panic. This is because the board was not
being reset correctly. Now correctly clear the Status register so the proper
reset is done.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:27 +0000 (17:01 -0500)]
Fixed missing CVL event causing round-robin FCF failover process to stop (CR 123367)
It was found during virtual fabric testing that a virtual link mismatch was
created. This is because a CVL was causing the driver to stop the FCF
failover process. The driver will now not break out of the failover process
if the flogi fails and the CVL event is expected.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:26 +0000 (17:01 -0500)]
Fix same RPI registered multiple times after HBA reset (CR 127176)
After HBA reset, I/Os never complete and the system eventually hangs. This is
because the BE adapters need to have RPI headers reposted before rpis can be
assigned to NDLPs. Moving the call to lpfc_sli4_node_prep to after the
call to post_all_rpi headers posts the headers before the rpis are
assigned.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:26 +0000 (17:01 -0500)]
Fix driver handling of XRI Aborted CQE response (CR 127345)
When the driver issued an ABTS, the aborted IO and ABTS WQE request would complete
as expected, but the XRI_ABORTED_CQE notification that the ABTS protocol
completed caused the driver to tear down the rport mapping for each completed
ABTS. Now process the status and extended status before making a decision to
tear down the target status.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:26 +0000 (17:01 -0500)]
Fixed port and system failure in handling SLI4 FC port function reset (CR 126551)
When performing a function reset on an LPe16000 port that is connected to a fabric
and has targets (LUNs) discovered in the zone, the reset can sometimes fail and
leave the port unresponsive or cause a system crash or hang. Now, when
reposting the SCSI SGL list after function reset, set the logical XRI allocated
bit in the logical XRI bmask to account for the XRIs posted, so that a duplicate
XRI will not be allocated later for new SGLs. Also, reset the used xri counts
and update them properly during function reset.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:26 +0000 (17:01 -0500)]
Fix for SLI4 Port delivery for BLS ABORT ACC (CR 126289)
An unsolicited ABTS received on an SLI4 port does not properly complete the
exchange.
Fix:
In __lpfc_sli_issue_iocb_s4, allow CMD_XMIT_BLS_RSP64_CX to allocate an SGL.
In lpfc_sli4_bpl2sgl, allow CMD_XMIT_BLS_RSP64_CX to just return the xri_tag.
In lpfc_sli4_iocb2wqe, setup CT context to use VPI for CMD_XMIT_BLS_RSP64_CX.
In lpfc_sli4_seq_abort_rsp_cmpl log port error.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:25 +0000 (17:01 -0500)]
Fix ndlp nodelist not empty wait timeout during driver unloading (CR 127052)
In lpfc_set_rrq_active the code allocates an RRQ while locks are held. The
driver either needs to make this allocation ATOMIC or move the allocation out
from under the lock. We chose to remove the locked version of lpfc_set_rrq_active
since there were no users of this function, then rearranged the code so that the
allocation does not occur while the lock is held.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:25 +0000 (17:01 -0500)]
Fix management communication issues by creating character device to take a reference on the driver (CR 126082)
The management userspace applications have no way to take a lock on the driver
to prevent it from unloading. To remedy this, a character device is created that
increments the reference count on lpfc by one when it is opened and decrements
it by one when it is closed.
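The open/close reference counting described above can be modeled in a few lines of user-space C. This is an illustration only. `struct module_ref`, `mgmt_open`, `mgmt_close` and `can_unload` are hypothetical names and do not represent the kernel's module reference API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver's reference count. */
struct module_ref { int count; };

/* Called when the management character device is opened. */
static void mgmt_open(struct module_ref *m)
{
    m->count++;
}

/* Called when the management character device is closed. */
static void mgmt_close(struct module_ref *m)
{
    assert(m->count > 0);  /* every close must pair with an earlier open */
    m->count--;
}

/* The driver may only be unloaded when no management client holds it open. */
static bool can_unload(const struct module_ref *m)
{
    return m->count == 0;
}
```

While any management application keeps the device open, `can_unload()` stays false, which is the effect the character device is meant to provide.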
Vaios Papadimitriou [Tue, 8 May 2012 22:01:25 +0000 (17:01 -0500)]
Fix for FDISC failures after firmware reset or link bounce (CR 126779)
The driver failed to discover targets on vports after a link bounce. This is a
regression on SLI4 adapters where the SID in the FDISC was set to a non-zero
value. There is no use case for a non-zero SID in the FDISC. The fix is to clear
fc_myDID to guarantee a zero SID.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:25 +0000 (17:01 -0500)]
Fix for driver using duplicate RPIs after LPe16000 port reset (CR 126723)
The RPI bit map is reinitialized in the adapter port 'online' path. SLI4 RPIs are
designed to be 'long lived', so when the adapter port is taken offline, the
driver will reuse the RPI if the port is recovered within devloss tmo.
These stale RPI values can collide when new RPIs are allocated. We now free RPIs
on all active nodes and then allocate new RPIs.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:25 +0000 (17:01 -0500)]
Fix discovery problem when in pt2pt (CR 126887)
When a target is direct connected in pt2pt topology, it is not discovered by the
driver. The confirm nport routine is called during plogi completion. It
looks up the ndlp using the service parameter wwpn in the target response. If the
ndlp that is returned by the lookup does not match the ndlp that the plogi was
sent with, confirm_nport updates the new ndlp with the old ndlp's information.
Confirm nport has to make sure that only one ndlp with that wwpn is active
before returning, so it sets the old ndlp's state to NPR. It set the state
before it copied the state to the new ndlp, so both ndlps ended up in NPR. When
the plogi completion routine calls the state machine with the plogi complete
event and the ndlp is in NPR, the ndlp's state stays in NPR. The state machine is
stopped for this ndlp. Because it was the only target, discovery is completed.
The old ndlp state is now copied to the new ndlp before setting the old one to
NPR.
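The ordering bug comes down to two assignments. A minimal sketch, with hypothetical names (`struct node`, `replace_node`, and the state enum) standing in for the ndlp and its states:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node states; NODE_NPR models the "needs port recovery" state. */
enum node_state { NODE_UNUSED, NODE_PLOGI_ISSUE, NODE_NPR };

struct node {
    unsigned long long wwpn;
    enum node_state    state;
};

/* Hand the old node's state over to the new node, then retire the old one. */
static void replace_node(struct node *old_node, struct node *new_node)
{
    assert(old_node != NULL && new_node != NULL);
    new_node->state = old_node->state;  /* copy first ...                     */
    old_node->state = NODE_NPR;         /* ... then move the old node to NPR  */
}
```

Swapping the two assignments reproduces the failure described above: the new node would inherit NPR and the state machine would have nothing left to advance.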
Vaios Papadimitriou [Tue, 8 May 2012 22:01:24 +0000 (17:01 -0500)]
Fixed failure in handling large CQ/EQ identifiers in an IOV environment (CR 126856)
In an SR-IOV environment, when creating virtual functions, the driver failed to
issue INIT_LINK mailbox commands properly when attaching to the virtual
functions. The driver will now write into CQ/EQ doorbell registers by taking
both the lower and the possible higher bit CQ/EQ identifier fields into
consideration to comply with the spec for handling INIT_LINK mailbox
commands.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:24 +0000 (17:01 -0500)]
Fix Locking code raising IRQ twice
Remove the irq part of the locking and unlocking calls. This could have caused
a deadlock because the cpu could have interrupted this thread while the hbalock
was still held.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:23 +0000 (17:01 -0500)]
Fix driver does not reset port when reset is needed during fw_dump (CR 125807)
A port error was detected during reset. This is because the driver was not
looking for the RN flag in the status reg. Now only fail the reset if the ERR
bit is set and the reset needed flag is not.
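The new failure condition is a two-flag test. In the sketch below the bit positions and the names `STATUS_ERR`, `STATUS_RN` and `reset_should_fail` are assumptions made for illustration. The real register layout is not given in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the actual status register layout may differ. */
#define STATUS_ERR (1u << 31)  /* port reported an error       */
#define STATUS_RN  (1u << 24)  /* port reports "reset needed"  */

/* Fail the reset only when ERR is set and RN is not. */
static bool reset_should_fail(uint32_t status)
{
    return (status & STATUS_ERR) != 0 && (status & STATUS_RN) == 0;
}
```

An error accompanied by the reset-needed flag is treated as recoverable, so the driver goes on to reset the port instead of giving up.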
Vaios Papadimitriou [Tue, 8 May 2012 22:01:23 +0000 (17:01 -0500)]
Fix ELS FDISC failing with local reject / invalid RPI (CR 126350)
No FDISC is seen on the wire when running with SLI4; the ELS command fails with
local reject / invalid RPI. Now allow the FDISC ELS command to use the
temporary RPI and the Destination DID for SLI4-FC.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:23 +0000 (17:01 -0500)]
Fix SLI4 FC port internal loopback (CR 126409)
LPe16000s could fail internal loopback tests due to an issue with the serdes.
The loopback was changed to use internal rather than serdes internal.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:23 +0000 (17:01 -0500)]
Fix bug with driver processing an els command using 16Gb FC Adapter (CR 126345)
ELS echo fails on an LPe16000 adapter because the driver was not setting up
the ulpContext correctly. The ulpContext is now properly set from the rpi_ids
table for SLI4 devices.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:22 +0000 (17:01 -0500)]
Fixed SLI4 FC port obtained link type and number dependent on link connection (CR 126264)
There are places in the driver diagnostic code that picked up the
SLI4 FC port link type and number from the asynchronous link event, which
depends on the link connection. In those cases, instead of using the link type
and link number obtained from the asynchronous link event, use the link type
and link number obtained from the READ_CONFIG mailbox command in the SLI4 setup
routine, which do not depend on an external link or loopback plug being present.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:22 +0000 (17:01 -0500)]
Fixed SLI4 FC port internal loopback without SFP and external link/loopback plug (CR 125843)
When performing an internal loopback diagnostic test on an LPe16000 port without
an SFP present or without an external link/loopback plug plugged in, internal
loopback on port1 failed. Instead of using the link type and link number reported
by the asynchronous link event, use the link type and link number obtained
from the READ_CONFIG mailbox command in the SLI4 setup routine, which do not
depend on an external link or loopback plug being present.
Vaios Papadimitriou [Tue, 8 May 2012 22:01:22 +0000 (17:01 -0500)]
Fix driver incorrectly building fcpCdb during scsi command prep (CR 126209)
Some scsi inquiry commands were failing with sense key 0x5 and ASC/ASCQ values
of 24/00. At times, this failure caused retries over several hours because the
driver was returning DID_BUS_BUSY. These retries and failures were clogging up
the console logs. Now, always initialize the fcpCdb to 0 during
lpfc_scsi_prep_cmnd. After the memset, only copy scsi_cmnd->cmd_len bytes into
the fcpCdb.
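The zero-then-copy step can be sketched as follows. `FCP_CDB_LEN` (taken to be 16 here) and `prep_cdb` are hypothetical. The source only states that the fcpCdb is cleared and that `cmd_len` bytes are then copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the CDB field in the FCP command for this sketch. */
#define FCP_CDB_LEN 16

/* Zero the whole CDB field, then copy only the bytes the command really has. */
static void prep_cdb(uint8_t fcp_cdb[FCP_CDB_LEN],
                     const uint8_t *cmnd, size_t cmd_len)
{
    assert(cmd_len <= FCP_CDB_LEN);
    memset(fcp_cdb, 0, FCP_CDB_LEN);  /* no stale bytes from a previous command */
    memcpy(fcp_cdb, cmnd, cmd_len);   /* e.g. 6 bytes for a short INQUIRY CDB   */
}
```

Without the `memset`, a 6-byte CDB placed in a reused buffer would carry leftover bytes in positions 6 to 15, which a target may reject as an invalid field in the CDB.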
Somnath Kotur [Wed, 2 May 2012 03:40:49 +0000 (03:40 +0000)]
be2net: Record receive queue index in skb to aid RPS.
Signed-off-by: Sarveshwar Bandi <Sarveshwar.Bandi@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>