James Smart [Fri, 28 May 2021 21:22:40 +0000 (14:22 -0700)]
scsi: lpfc: Fix failure to transmit ABTS on FC link
The abort_cmd_ia flag in an abort wqe describes whether an ABTS basic link
service should be transmitted on the FC link or not. Code added in
lpfc_sli4_issue_abort_iotag() set the abort_cmd_ia flag incorrectly,
surpressing ABTS transmission.
A previous LPFC change to build an abort wqe inverted prior logic that
determined whether an ABTS was to be issued on the FC link.
Revert this logic to its proper state.
Link: https://lore.kernel.org/r/20210528212240.11387-1-jsmart2021@gmail.com Fixes: db7531d2b377 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers") Cc: <stable@vger.kernel.org> # v5.11+ Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Maurizio Lombardi [Mon, 31 May 2021 12:13:26 +0000 (14:13 +0200)]
scsi: target: core: Fix warning on realtime kernels
On realtime kernels, spin_lock_irq*(spinlock_t) do not disable the
interrupts, a call to irqs_disabled() will return false thus firing a
warning in __transport_wait_for_tasks().
Remove the warning and also replace assert_spin_locked() with
lockdep_assert_held()
Resetting interrupt aggregation counters first and reading the
DOOR_BELL afterward allows us to handle all the completed requests. In
order to prevent other interrupts starvation the DB is read once after
reset. The down side of this solution is the possibility of false
interrupt if device completes another request after resetting
aggregation and before reading the DB.
Prevent that ufshcd_intr() reports a false positive "Unhandled interrupt"
message if the above scenario is triggered.
Link: https://lore.kernel.org/r/20210519202058.12634-2-bvanassche@acm.org Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Can Guo <cang@codeaurora.org> Cc: Bean Huo <beanhuo@micron.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Asutosh Das <asutoshd@codeaurora.org> Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suganath Prabu S [Tue, 18 May 2021 05:16:25 +0000 (10:46 +0530)]
scsi: mpt3sas: Handle firmware faults during second half of IOC init
If a firmware fault occurs while scanning the devices during IOC
initialization then the driver issues the hard reset operation to recover
the IOC. However, the driver is not issuing a Port enable request
message as part of hard reset operation during IOC initialization. Due to
this, the driver will not receive get any device discovery-related events
and hence devices will not be accessible.
Teach the driver to gracefully handle firmware faults while scanning for
target devices during IOC initialization. Make the driver issue a port
enable request message as part of hard reset operation. This permits
receiving device discovery-related events from the firmware after the hard
reset operation completes.
Suganath Prabu S [Tue, 18 May 2021 05:16:24 +0000 (10:46 +0530)]
scsi: mpt3sas: Handle firmware faults during first half of IOC init
During first half of IOC initialization (i.e. before going for device
scanning), if any firmware fault occurs then driver is aborting the IOC
initialization operation.
Modify the driver to issue a diag reset operation to recover IOC from fault
state and reinitialize the IOC.
Suganath Prabu S [Tue, 18 May 2021 05:16:23 +0000 (10:46 +0530)]
scsi: mpt3sas: Fix deadlock while cancelling the running firmware event
Do not cancel current running firmware event work if the event type is
different from MPT3SAS_REMOVE_UNRESPONDING_DEVICES. Otherwise a deadlock
can be observed while cancelling the current firmware event work if a hard
reset operation is called as part of processing the current event.
John Garry [Wed, 19 May 2021 14:31:02 +0000 (22:31 +0800)]
scsi: core: Cap scsi_host cmd_per_lun at can_queue
The sysfs handling function sdev_store_queue_depth() enforces that the sdev
queue depth cannot exceed shost can_queue. The initial sdev queue depth
comes from shost cmd_per_lun. However, the LLDD may manually set
cmd_per_lun to be larger than can_queue, which leads to an initial sdev
queue depth greater than can_queue.
Such an issue was reported in [0], which caused a hang. That has since been
fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, set to
max_queue").
Stop this possibly happening for other drivers by capping shost cmd_per_lun
at shost can_queue.
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
Target de-configuration panics at high CPU load because TPGT and WWPN can
be removed on separate threads.
TPGT removal requests a reset HBA on a separate thread and waits for reset
complete (phase1). Due to high CPU load that HBA reset can be delayed for
some time.
WWPN removal does qlt_stop_phase2(). There it is believed that phase1 has
already completed and thus tgt.tgt_ops is subsequently cleared. However,
tgt.tgt_ops is needed to process incoming traffic and therefore this will
cause one of the following panics:
James Smart [Fri, 14 May 2021 19:55:59 +0000 (12:55 -0700)]
scsi: lpfc: Update lpfc version to 12.8.0.10
Update lpfc version to 12.8.0.10
Link: https://lore.kernel.org/r/20210514195559.119853-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:58 +0000 (12:55 -0700)]
scsi: lpfc: Reregister FPIN types if ELS_RDF is received from fabric controller
FC-LS-5 specifies that a received RDF implies a possible change to fabric
supported diagnostic functions. Endpoints are to re-perform the RDF
exchange with the fabric to enable possible new features or adapt to
changes in values.
This patch adds the logic to RDF receive to re-perform the RDF exchange
with the switch.
Link: https://lore.kernel.org/r/20210514195559.119853-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:57 +0000 (12:55 -0700)]
scsi: lpfc: Add a option to enable interlocked ABTS before job completion
Default behavior for the driver, when aborting an I/O, is to terminate the
I/O with the adapter. The adapter will initiate an ABTS to terminate the
exchange on the link and mark the exchange is terminated so that no further
use of the sgl or any traffic for the exchange is worked on. Completion on
the Abort is then posted to the driver, which as the I/O is terminated can
complete the I/O to the OS. This completion may occur prior to the ABTS
handshake completing on the wire. The ABTS handshake can take a long time
to complete with timeouts and retries reaching 60+ seconds. Note: if
retries fail, LOGO occurs.
Some devices want to ensure that the ABTS handshake fully completes (this
device has fully ack'd it) before the I/O completion is posted back to the
OS, where a failed I/O may be retried via a different path.
To support this behavior, an option was added to the driver to change I/O
completion from the Abort cmd completion to the Exchange termination (aka
ABTS) completion.
Link: https://lore.kernel.org/r/20210514195559.119853-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:56 +0000 (12:55 -0700)]
scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs
The driver is encountering a crash in lpfc_free_iocb_list() while
performing initial attachment.
Code review found this to be an errant failure path that was taken, jumping
to a tag that then referenced structures that were uninitialized.
Fix the failure path.
Link: https://lore.kernel.org/r/20210514195559.119853-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:55 +0000 (12:55 -0700)]
scsi: lpfc: Ignore GID-FT response that may be received after a link flip
When a link bounce happens, there is a possibility that responses to
requests posted prior to the link bounce could be received. This is
problematic as the counter to track reglogin completion after link up can
become out of sync with the real state.
As there is no reason to process a request made in a prior link up context,
eliminate all the disturbance by tagging the request with the event_tag
maintained by the SLI Port for the link. The event_tag will change on every
link state transition. As long as the tag matches the current event_tag,
the response can be processed. If it doesn't match, just discard the
response.
Link: https://lore.kernel.org/r/20210514195559.119853-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:54 +0000 (12:55 -0700)]
scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller
During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.
Change the way these two fabric services are managed, make them behave like
any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.
Note: There is some logic that still has a couple of exceptions when the
Domain controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will it send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.
Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:53 +0000 (12:55 -0700)]
scsi: lpfc: Fix Node recovery when driver is handling simultaneous PLOGIs
When lpfc is handling a solicited and unsolicited PLOGI with another
initiator, the remote initiator is never recovered. The node for the
initiator is erroneouosly removed and all resources released.
In lpfc_cmpl_els_plogi(), when lpfc_els_retry() returns a failure code, the
driver is calling the state machine with a device remove event because the
remote port is not currently registered with the SCSI or NVMe
transports. The issue is that on a PLOGI "collision" the driver correctly
aborts the solicited PLOGI and allows the unsolicited PLOGI to complete the
process, but this process is interrupted with a device_rm event.
Introduce logic in the PLOGI completion to capture the PLOGI collision
event and jump out of the routine. This will avoid removal of the node.
If there is no collision, the normal node removal will occur.
Fixes: 52edb2caf675 ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails") Cc: <stable@vger.kernel.org> # v5.11+ Link: https://lore.kernel.org/r/20210514195559.119853-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:52 +0000 (12:55 -0700)]
scsi: lpfc: Add ndlp kref accounting for resume RPI path
The driver is crashing due to a bad pointer during driver load due in an
adisc acc receive routine. The driver is missing node get/put in the
mbx_resume_rpi paths.
Fix by adding the proper gets and puts into the resume_rpi path.
Link: https://lore.kernel.org/r/20210514195559.119853-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:51 +0000 (12:55 -0700)]
scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
An 'unexpected timeout' message may be seen in a point-2-point topology.
The message occurs when a PLOGI is received before the driver is notified
of FLOGI completion. The FLOGI completion failure causes discovery to be
triggered for a second time. The discovery timer is restarted but no new
discovery activity is initiated, thus the timeout message eventually
appears.
In point-2-point, when discovery has progressed before the FLOGI completion
is processed, it is not a failure. Add code to FLOGI completion to detect
that discovery has progressed and exit the FLOGI handling (noop'ing it).
Link: https://lore.kernel.org/r/20210514195559.119853-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:50 +0000 (12:55 -0700)]
scsi: lpfc: Fix non-optimized ERSP handling
When processing an NVMe ERSP IU which didn't match the optimized CQE-only
path, the status was being left to the WQE status. WQE status is non-zero
as it is indicating a non-optimized completion that needs to be handled by
the driver.
Fix by clearing the status field when falling into the non-optimized
case. Log message added to track optimized vs non-optimized debug.
Link: https://lore.kernel.org/r/20210514195559.119853-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 14 May 2021 19:55:49 +0000 (12:55 -0700)]
scsi: lpfc: Fix unreleased RPIs when NPIV ports are created
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a log of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move the location to so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Fixes: a70e63eee1c1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting") Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin Wilck [Fri, 14 May 2021 15:32:14 +0000 (17:32 +0200)]
scsi: scsi_dh_alua: Retry RTPG on a different path after failure
If an RTPG fails, we can't infer anything wrt. the state of the ports in
the port group except that we were unable to reach the one port on which
the RTPG had failed. "offline" is just a secondary port state, which means
that we can't infer the state of any port in the PG from the failure (in
fact, even the failed port might still be in "active/optimized" primary
port access state).
Therefore, when we encounter an RTPG failure, we should retry the RTPG on a
different port. This avoids falsely setting port states to offline for
unreachable ports. To do this, ports on which an RTPG has failed are
temporarily set to "disabled" to avoid repeating the failed I/O on the same
target port. Once the RTPG has either succeeded on one port or failed on
all ports of the PG, the ports are enabled again.
Matt Wang [Wed, 19 May 2021 09:49:32 +0000 (09:49 +0000)]
scsi: vmw_pvscsi: Set correct residual data length
Some commands (such as INQUIRY) may return less data than the initiator
requested. To avoid conducting useless information, set the right residual
count to make upper layer aware of this.
Link: https://lore.kernel.org/r/20210515230358.GA97544@60d1edce16e0 Fixes: 9814b55cde05 ("scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found") CC: Bodo Stroesser <bostroesser@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Wed, 19 May 2021 20:20:58 +0000 (13:20 -0700)]
scsi: ufs: Use designated initializers in ufs_pm_lvl_states[]
The comments in the enum ufs_pm_level definition are redundant. Remove the
comments from the ufs_pm_level enum and use designated initializers in the
ufs_pm_lvl_states[] definition instead.
Link: https://lore.kernel.org/r/20210519202058.12634-3-bvanassche@acm.org Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Can Guo <cang@codeaurora.org> Cc: Bean Huo <beanhuo@micron.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sergey Shtylyov [Wed, 19 May 2021 19:20:15 +0000 (22:20 +0300)]
scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw()
After commit 6c11dc060427 ("scsi: hisi_sas: Fix IRQ checks") we have the
error codes returned by platform_get_irq() ready for the propagation
upsream in interrupt_init_v1_hw() -- that will fix still broken deferred
probing. Let's propagate the error codes from devm_request_irq() as well
since I don't see the reason to override them with -ENOENT...
Daniel Wagner [Thu, 20 May 2021 07:31:27 +0000 (09:31 +0200)]
scsi: scsi_transport_fc: Remove double FC_FPORT_DELETED in mask creation
Remove the double listed FC_FPORT_DELETING from the mask creation.
Commit 260f4aeddb48 ("scsi: scsi_transport_fc: return -EBUSY for deleted
vport") added VC_VPORT_DELETING to the flag masks. This is not necessary as
FC_FPORT_DEL is defined as VC_FPORT_DELETED | FC_FPORT_DELETING.
This has us use raw_smp_processor_id() in iblock's plug_device callout.
smp_processor_id() is not needed here, because we are running from a per
CPU work item that is also queued to run on a worker thread that is
normally bound to a specific CPU. If the worker thread did end up switching
CPUs then it's handled the same way we handle when the work got moved to a
different CPU's worker thread, where we will just end up sending I/O from
the new CPU.
Bodo Stroesser [Wed, 19 May 2021 13:54:40 +0000 (15:54 +0200)]
scsi: target: tcmu: Fix xarray RCU warning
Commit f5ce815f34bc ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N *
PAGE_SIZE") introduced xas_next() calls to iterate xarray elements. These
calls triggered the WARNING "suspicious RCU usage" at tcmu device set up
[1]. In the call stack of xas_next(), xas_load() was called. According to
its comment, this function requires "the xa_lock or the RCU lock".
To avoid the warning:
- Guard the small loop calling xas_next() in tcmu_get_empty_block with RCU
lock.
- In the large loop in tcmu_copy_data using RCU lock would possibly
disable preemtion for a long time (copy multi MBs). Therefore replace
XA_STATE, xas_set and xas_next with a single xa_load.
Shin'ichiro Kawasaki [Sat, 15 May 2021 07:03:15 +0000 (16:03 +0900)]
scsi: target: core: Avoid smp_processor_id() in preemptible code
The BUG message "BUG: using smp_processor_id() in preemptible [00000000]
code" was observed for TCMU devices with kernel config DEBUG_PREEMPT.
The message was observed when blktests block/005 was run on TCMU devices
with fileio backend or user:zbc backend [1]. The commit 1130b499b4a7
("scsi: target: tcm_loop: Use LIO wq cmd submission helper") triggered the
symptom. The commit modified work queue to handle commands and changed
'current->nr_cpu_allowed' at smp_processor_id() call.
The message was also observed at system shutdown when TCMU devices were not
cleaned up [2]. The function smp_processor_id() was called in SCSI host
work queue for abort handling, and triggered the BUG message. This symptom
was observed regardless of the commit 1130b499b4a7 ("scsi: target:
tcm_loop: Use LIO wq cmd submission helper").
To avoid the preemptible code check at smp_processor_id(), get CPU ID with
raw_smp_processor_id() instead. The CPU ID is used for performance
improvement then thread move to other CPU will not affect the code.
Bart Van Assche [Sun, 9 May 2021 21:38:17 +0000 (14:38 -0700)]
scsi: ufs: ufs-exynos: Move definitions from .h to .c
In the Linux kernel definitions of data structures should occur in .c
files. Hence move the exynos7_uic_attr definition from a .h into a .c
file. Additionally, declare exynos_ufs_drvs static. This patch fixes the
following two sparse warnings:
drivers/scsi/ufs/ufs-exynos.h:248:28: warning: symbol 'exynos_ufs_drvs' was not declared. Should it be static?
drivers/scsi/ufs/ufs-exynos.h:250:28: warning: symbol 'exynos7_uic_attr' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20210509213817.4348-1-bvanassche@acm.org Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: Kiwoong Kim <kwmad.kim@samsung.com> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ajish Koshy [Wed, 5 May 2021 12:01:03 +0000 (17:31 +0530)]
scsi: pm80xx: Fix drives missing during rmmod/insmod loop
When driver is loaded after rmmod some drives are not showing up during
discovery.
SATA drives are directly attached to the controller connected phys. During
device discovery, the IDENTIFY command (qc timeout (cmd 0xec)) is timing out
during revalidation. This will trigger abort from host side and controller
successfully aborts the command and returns success. Post this successful
abort response ATA library decides to mark the disk as NODEV.
To overcome this, inside pm8001_scan_start() after phy_start() call, add get
start response and wait for few milliseconds to trigger next phy start.
This millisecond delay will give sufficient time for the controller state
machine to accept next phy start.
Samuel Holland [Tue, 27 Apr 2021 23:59:15 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Fix endianness issues in command packets
The controller expects all data it sends/receives to be little-endian.
Therefore, the packet struct definitions should use the __le16/32/64
types. Once those are correct, sparse reports several issues with the
driver code, which are fixed here as well.
The main issue observed was at the call to scsi_set_resid(), where the
byteswapped parameter would eventually trigger the alignment check at
drivers/scsi/sd.c:2009. At that point, the kernel would continuously
complain about an "Unaligned partial completion", and no further I/O could
occur.
This gets the controller working on big endian powerpc64.
Samuel Holland [Tue, 27 Apr 2021 23:59:14 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Reduce scope of structure packing
Currently, all command packet structs used by this driver are packed.
However, only one (TW_SG_Entry) actually needs to be packed, because it
uses 64-bit addresses at 32-bit alignment. To improve the quality of
generated code, stop packing all of the other command packet structs. This
requires adjusting the type of one misaligned "reserved" member.
After this change, pahole reports that only one type had its layout change:
the tw_compat_info member of TW_Device_Extension is now naturally aligned.
Samuel Holland [Tue, 27 Apr 2021 23:59:13 +0000 (18:59 -0500)]
scsi: 3w-9xxx: Use flexible array members to avoid struct padding
In preparation for removing the "#pragma pack(1)" from the driver, fix all
instances where a trailing array member could be replaced by a flexible
array member. Since a flexible array member has zero size, it introduces no
padding, whether or not the struct is packed.
Nigel Christian [Thu, 13 May 2021 22:20:32 +0000 (17:20 -0500)]
scsi: be2iscsi: Remove redundant initialization
The nested for loop variables i and j in beiscsi_free_mem() are initialized
twice. The values outside of the loops are redundant and can be removed.
Addresses-Coverity: ("Unused value") Link: https://lore.kernel.org/r/YJ2mMHNqAgTNVVj+@fedora Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Thu, 13 May 2021 17:12:29 +0000 (10:12 -0700)]
scsi: ufs: core: Remove usfhcd_is_*_pm() macros
Remove these macros to make the UFS driver source code easier to read.
These macros were introduced by commit 57d104c153d3 ("ufs: add UFS power
management support").
Link: https://lore.kernel.org/r/20210513171229.7439-1-bvanassche@acm.org Cc: Can Guo <cang@codeaurora.org> Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Bean Huo <beanhuo@micron.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Konstantin Shelekhin [Thu, 13 May 2021 19:28:03 +0000 (22:28 +0300)]
scsi: target: core: Bump INQUIRY VERSION to SPC-4
Bump the SCSI primary command set standard to SPC-4. The upcoming version
descriptors will report newer SCSI standards (like SBC-3) that are not
defined in SPC-3.
Jiapeng Chong [Thu, 13 May 2021 10:49:37 +0000 (18:49 +0800)]
scsi: target: sbp_target: Remove redundant assignment to pg_size
Variable pg_size is set to '0x100 << pg_size', but this value is never read
and it is not used later on. Hence it is a redundant assignment and can be
removed.
Clean up the following clang-analyzer warning:
drivers/target/sbp/sbp_target.c:1264:3: warning: Value stored to
'pg_size' is never read [clang-analyzer-deadcode.DeadStores].
Brian King [Tue, 11 May 2021 18:12:20 +0000 (13:12 -0500)]
scsi: ibmvfc: Reinit target retries
If rport target discovery commands fail for some reason, they get retried
up to a set number of retries. Once the retry limit is exceeded, the target
is deleted. In order to delete the target, we either need to do an implicit
logout or a move login. In the move login case, if the move login fails, we
want to retry it. This ensures the retry counter gets reinitialized so the
move login will get retried.
Brian King [Tue, 11 May 2021 18:12:19 +0000 (13:12 -0500)]
scsi: ibmvfc: Avoid move login if fast fail is enabled
If fast fail is enabled and we encounter a WWPN moving from one port id to
another port id with I/O outstanding, if we use the move login MAD,
although it will work, it will leave any outstanding I/O still outstanding
to the old port id. Eventually, the SCSI command timers will fire and we
will abort these commands, however, this is generally much longer than the
fast fail timeout, which can lead to I/O operations being outstanding for a
long time. This patch changes the behavior to avoid the move login if fast
fail is enabled. Once terminate_rport_io cleans up the rport, then we force
the target back through the delete process, which re-drives the implicit
logout, then kicks us back into discovery where we will discover the WWPN
at the new location and do a PLOGI to it.
Brian King [Tue, 11 May 2021 18:12:18 +0000 (13:12 -0500)]
scsi: ibmvfc: Handle move login failure
When service is being performed on an SVC with NPIV enabled, the WWPN of
the canister / node being serviced fails over to the another canister /
node. This looks to the ibmvfc driver as a WWPN moving from one SCSI ID to
another. The driver will first attempt to do an implicit logout of the old
SCSI ID. If this works, we simply delete the rport at the old location and
add an rport at the new location and the FC transport class handles
everything. However, if there is I/O outstanding, this implicit logout will
fail, in which case we will send a "move login" request to the VIOS. This
will cancel any outstanding I/O to that port, logout the port, and PLOGI
the new port. Recently we've encountered a scenario where the move login
fails. This was resulting in an attempted plogi to the new scsi id, without
the old scsi id getting logged out, which is a VIOS protocol violation. To
solve this, we want to keep tracking the old scsi id as the current scsi
id. That way, once terminate_rport_io cancels the outstanding i/o, it will
send us back through to do an implicit logout of the old scsi id, rather
than the new scsi id, and then we can plogi the new scsi id.
API qedf_link_update() is getting called from QED but by that time
shost_data is not initialised. This results in a NULL pointer dereference
when we try to dereference shost_data while updating supported_speeds.
Add a NULL pointer check before dereferencing shost_data.
Link: https://lore.kernel.org/r/20210512072533.23618-1-jhasan@marvell.com Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Mon, 10 May 2021 11:35:26 +0000 (19:35 +0800)]
scsi: core: Fix a comment in function scsi_host_dev_release()
Commit 3be8828fc507 ("scsi: core: Avoid that ATA error handling can
trigger a kernel hang or oops") moved rcu to scsi_cmnd instead of
shost. Modify "shost->rcu" to "scmd->rcu" in a comment.
Guenter Roeck [Mon, 10 May 2021 04:12:11 +0000 (21:12 -0700)]
scsi: qedf: Drop unnecessary NULL checks after container_of()
The result of container_of() operations is never NULL unless the embedded
element is the first element of the structure, which is not the case here.
The NULL checks are therefore unnecessary and misleading. Remove them.
The changes in this patch were made automatically using the following
Coccinelle script.
@@
type t;
identifier v;
statement s;
@@
<+...
(
t v = container_of(...);
|
v = container_of(...);
)
...
when != v
- if (\( !v \| v == NULL \) ) s
...+>
Guenter Roeck [Mon, 10 May 2021 04:08:17 +0000 (21:08 -0700)]
scsi: target: iscsi: Drop unnecessary container_of()
The structure pointer passed to container_of() is never NULL; that was
already checked. That means that the result of container_of() operations
on it is also never NULL, even though se_node_acl is the first element
of the structure embedding it. On top of that, it is misleading to perform
a NULL check on the result of container_of() because the position of the
contained element could change, which would make the test invalid.
Remove the unnecessary NULL check.
As it turns out, the container_of operation was only made for the purpose
of the NULL check. If the container_of is actually needed, it is repeated
later. Remove the container_of operation as well.
The NULL check was identified and removed with the following Coccinelle
script.
@@
type t;
identifier v;
statement s;
@@
<+...
(
t v = container_of(...);
|
v = container_of(...);
)
...
when != v
- if (\( !v \| v == NULL \) ) s
...+>
Link: https://lore.kernel.org/r/20210510040817.2050266-1-linux@roeck-us.net Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Hou Pu <houpu@bytedance.com> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Thu, 13 May 2021 16:49:12 +0000 (09:49 -0700)]
scsi: ufs: core: Increase the usable queue depth
With the current implementation of the UFS driver active_queues is 1
instead of 0 if all UFS request queues are idle. That causes
hctx_may_queue() to divide the queue depth by 2 when queueing a request and
hence reduces the usable queue depth.
The shared tag set code in the block layer keeps track of the number of
active request queues. blk_mq_tag_busy() is called before a request is
queued onto a hwq and blk_mq_tag_idle() is called some time after the hwq
became idle. blk_mq_tag_idle() is called from inside blk_mq_timeout_work().
Hence, blk_mq_tag_idle() is only called if a timer is associated with each
request that is submitted to a request queue that shares a tag set with
another request queue.
Adds a blk_mq_start_request() call in ufshcd_exec_dev_cmd(). This doubles
the queue depth on my test setup from 16 to 32.
In addition to increasing the usable queue depth, also fix the
documentation of the 'timeout' parameter in the header above
ufshcd_exec_dev_cmd().
Link: https://lore.kernel.org/r/20210513164912.5683-1-bvanassche@acm.org Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts") Cc: Can Guo <cang@codeaurora.org> Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Bean Huo <beanhuo@micron.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Matt Wang [Tue, 11 May 2021 03:04:37 +0000 (03:04 +0000)]
scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
Commit 391e2f25601e ("[SCSI] BusLogic: Port driver to 64-bit")
introduced a serious issue for 64-bit systems. With this commit,
64-bit kernel will enumerate 8*15 non-existing disks. This is caused
by the broken CCB structure. The change from u32 data to void *data
increased CCB length on 64-bit system, which introduced an extra 4
byte offset of the CDB. This leads to incorrect response to INQUIRY
commands during enumeration.
Fix disk enumeration failure by reverting the portion of the commit
above which switched the data pointer from u32 to void.
Peter Wang [Wed, 12 May 2021 10:01:45 +0000 (18:01 +0800)]
scsi: ufs: ufs-mediatek: Fix power down spec violation
As per spec, e.g. JESD220E chapter 7.2, while powering off the UFS device,
RST_N signal should be between VSS(Ground) and VCCQ/VCCQ2. The power down
sequence after fixing:
Add a new sysfs group which has nodes to monitor data/request transfer
performance. This sysfs group has nodes showing total sectors/requests
transferred, total busy time spent and max/min/avg/sum latencies. This
group can be enhanced later to show more UFS driver layer performance
statistics data during runtime.
Gustavo A. R. Silva [Wed, 21 Apr 2021 18:56:11 +0000 (13:56 -0500)]
scsi: aacraid: Replace one-element array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Refactor the code according to the use of a flexible-array member in struct
aac_raw_io2 instead of one-element array, and use the struct_size() helper.
Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warnings:
drivers/scsi/aacraid/aachba.c: In function ‘aac_build_sgraw2’:
drivers/scsi/aacraid/aachba.c:3970:18: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
3970 | if (rio2->sge[j].length % (i*PAGE_SIZE)) {
| ~~~~~~~~~^~~
drivers/scsi/aacraid/aachba.c:3974:27: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
3974 | nseg_new += (rio2->sge[j].length / (i*PAGE_SIZE));
| ~~~~~~~~~^~~
drivers/scsi/aacraid/aachba.c:4011:28: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
4011 | for (j = 0; j < rio2->sge[i].length / (pages * PAGE_SIZE); ++j) {
| ~~~~~~~~~^~~
drivers/scsi/aacraid/aachba.c:4012:24: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
4012 | addr_low = rio2->sge[i].addrLow + j * pages * PAGE_SIZE;
| ~~~~~~~~~^~~
drivers/scsi/aacraid/aachba.c:4014:33: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
4014 | sge[pos].addrHigh = rio2->sge[i].addrHigh;
| ~~~~~~~~~^~~
drivers/scsi/aacraid/aachba.c:4015:28: warning: array subscript 1 is above array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
4015 | if (addr_low < rio2->sge[i].addrLow)
| ~~~~~~~~~^~~
Asutosh Das [Sat, 24 Apr 2021 00:20:16 +0000 (17:20 -0700)]
scsi: ufs: core: Enable power management for wlun
During runtime-suspend of ufs host, the SCSI devices are already suspended
and so are the queues associated with them. However, the ufs host sends SSU
(START_STOP_UNIT) to the wlun during runtime-suspend.
During the process blk_queue_enter() checks if the queue is not in suspended
state. If so, it waits for the queue to resume, and never comes out of
it. Commit 52abca64fd94 ("scsi: block: Do not accept any requests while
suspended") adds the check to see if the queue is in suspended state in
blk_queue_enter().
Fix this by registering ufs device wlun as a SCSI driver and registering it
for block runtime-pm. Also make this a supplier for all other LUNs. This
way the wlun device suspends after all the consumers and resumes after HBA
resumes. This also registers a new SCSI driver for rpmb wlun. This new
driver is mostly used to clear rpmb uac.
[mkp: resolve merge conflict with 5.13-rc1 and fix doc warning]
Fixed smatch warnings: Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/4662c462e79e3e7f541f54f88f8993f421026d83.1619223249.git.asutoshd@codeaurora.org Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Co-developed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Tue, 20 Apr 2021 10:49:19 +0000 (11:49 +0100)]
scsi: megaraid_mbox: Remove redundant initialization of pointer mbox
The pointer mbox is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.
Randy Dunlap [Sun, 18 Apr 2021 20:32:59 +0000 (13:32 -0700)]
scsi: message: fusion: Documentation cleanup
Fix kernel-doc warnings, spellos, and typos.
../drivers/message/fusion/mptsas.c:432: warning: Function parameter or member 'sas_address' not described in 'mptsas_find_portinfo_by_sas_address'
../drivers/message/fusion/mptsas.c:432: warning: Excess function parameter 'handle' description in 'mptsas_find_portinfo_by_sas_address'
../drivers/message/fusion/mptsas.c:581: warning: Function parameter or member 'slot' not described in 'mptsas_add_device_component'
../drivers/message/fusion/mptsas.c:581: warning: Function parameter or member 'enclosure_logical_id' not described in 'mptsas_add_device_component'
../drivers/message/fusion/mptsas.c:678: warning: Function parameter or member 'starget' not described in 'mptsas_add_device_component_starget_ir'
../drivers/message/fusion/mptsas.c:678: warning: Excess function parameter 'channel' description in 'mptsas_add_device_component_starget_ir'
../drivers/message/fusion/mptsas.c:678: warning: Excess function parameter 'id' description in 'mptsas_add_device_component_starget_ir'
../drivers/message/fusion/mptsas.c:990: warning: Function parameter or member 'ioc' not described in 'mptsas_find_vtarget'
../drivers/message/fusion/mptsas.c:990: warning: Function parameter or member 'channel' not described in 'mptsas_find_vtarget'
../drivers/message/fusion/mptsas.c:990: warning: Function parameter or member 'id' not described in 'mptsas_find_vtarget'
../drivers/message/fusion/mptsas.c:990: warning: expecting prototype for csmisas_find_vtarget(). Prototype was for mptsas_find_vtarget() instead
../drivers/message/fusion/mptsas.c:1064: warning: Function parameter or member 'ioc' not described in 'mptsas_target_reset'
../drivers/message/fusion/mptsas.c:1064: warning: Function parameter or member 'channel' not described in 'mptsas_target_reset'
../drivers/message/fusion/mptsas.c:1064: warning: Function parameter or member 'id' not described in 'mptsas_target_reset'
../drivers/message/fusion/mptsas.c:1135: warning: Function parameter or member 'ioc' not described in 'mptsas_target_reset_queue'
../drivers/message/fusion/mptsas.c:1135: warning: Function parameter or member 'sas_event_data' not described in 'mptsas_target_reset_queue'
../drivers/message/fusion/mptsas.c:1217: warning: Function parameter or member 'mf' not described in 'mptsas_taskmgmt_complete'
../drivers/message/fusion/mptsas.c:1217: warning: Function parameter or member 'mr' not described in 'mptsas_taskmgmt_complete'
../drivers/message/fusion/mptsas.c:1311: warning: Function parameter or member 'ioc' not described in 'mptsas_ioc_reset'
../drivers/message/fusion/mptsas.c:1311: warning: Function parameter or member 'reset_phase' not described in 'mptsas_ioc_reset'
../drivers/message/fusion/mptsas.c:1311: warning: expecting prototype for mptscsih_ioc_reset(). Prototype was for mptsas_ioc_reset() instead
../drivers/message/fusion/mptsas.c:1951: warning: expecting prototype for mptsas_mptsas_eh_timed_out(). Prototype was for mptsas_eh_timed_out() instead
../drivers/message/fusion/mptsas.c:3623: warning: Function parameter or member 'fw_event' not described in 'mptsas_send_expander_event'
../drivers/message/fusion/mptsas.c:3623: warning: Excess function parameter 'ioc' description in 'mptsas_send_expander_event'
../drivers/message/fusion/mptsas.c:3623: warning: Excess function parameter 'expander_data' description in 'mptsas_send_expander_event'
../drivers/message/fusion/mptsas.c:4010: warning: Excess function parameter 'sas_address' description in 'mptsas_scan_sas_topology'
../drivers/message/fusion/mptsas.c:4783: warning: Function parameter or member 'issue_reset' not described in 'mptsas_issue_tm'
../drivers/message/fusion/mptsas.c:4856: warning: Function parameter or member 'fw_event' not described in 'mptsas_broadcast_primitive_work'
../drivers/message/fusion/mptsas.c:4856: warning: Excess function parameter 'work' description in 'mptsas_broadcast_primitive_work'
mptsas.c:984: warning: missing initial short description on line:
* csmisas_find_vtarget
mptsas.c:993: warning: expecting prototype for csmisas_find_vtarget(). Prototype
was for mptsas_find_vtarget() instead
mptsas.c:1053: warning: missing initial short description on line:
* mptsas_target_reset
mptsas.c:1057: warning: contents before sections
mptsas.c:1125: warning: missing initial short description on line:
* mptsas_target_reset_queue
mptsas.c:1131: warning: contents before sections
mptsas.c:1308: warning: missing initial short description on line:
* mptsas_ioc_reset
Link: https://lore.kernel.org/r/20210418203259.835-1-rdunlap@infradead.org Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: MPT-FusionLinux.pdl@broadcom.com Cc: linux-scsi@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Randy Dunlap [Sun, 18 Apr 2021 20:32:46 +0000 (13:32 -0700)]
scsi: mpt3sas: Documentation cleanup
Fix kernel-doc warnings, spellos, and typos.
drivers/scsi/mpt3sas/mpt3sas_base.c:5430: warning: Excess function parameter 'ct' description in '_base_allocate_pcie_sgl_pool'
drivers/scsi/mpt3sas/mpt3sas_base.c:5493: warning: Excess function parameter 'ctr' description in '_base_allocate_chain_dma_pool'
mpt3sas_base.c:1362: warning: missing initial short description on line:
* _base_display_reply_info -
mpt3sas_base.c:2151: warning: contents before sections
mpt3sas_base.c:2314: warning: missing initial short description on line:
* base_make_prp_nvme -
Link: https://lore.kernel.org/r/20210418203246.782-1-rdunlap@infradead.org Cc: linux-scsi@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: MPT-FusionLinux.pdl@broadcom.com Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>