]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
3 years agoscsi: message: fusion: Remove usage of the deprecated "pci-dma-compat.h" API
Christophe JAILLET [Thu, 6 Jan 2022 21:54:05 +0000 (22:54 +0100)]
scsi: message: fusion: Remove usage of the deprecated "pci-dma-compat.h" API

In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.

Some reasons why this API should be removed have been given by Julia Lawall
in [2].

A coccinelle script has been used to perform the needed transformation.  It
can be found in [3].

In this patch, all functions but pci_alloc_consistent() are handled.
pci_alloc_consistent() needs more attention and explanation.

[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/
[3]: https://lore.kernel.org/kernel-janitors/20200716192821.321233-1-christophe.jaillet@wanadoo.fr/

Link: https://lore.kernel.org/r/e38e897fbd3314718315b0e357c824e3f01775d6.1641500561.git.christophe.jaillet@wanadoo.fr
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: megaraid: Avoid mismatched storage type sizes
Kees Cook [Wed, 5 Jan 2022 17:36:33 +0000 (09:36 -0800)]
scsi: megaraid: Avoid mismatched storage type sizes

Remove needless use of mbox_t, replacing with just struct
mbox_out. Silences compiler warnings under a -Warray-bounds build:

drivers/scsi/megaraid.c: In function 'megaraid_probe_one':
drivers/scsi/megaraid.c:3615:30: error: array subscript 'mbox_t[0]' is partly outside array bounds of 'unsigned char[15]' [-Werror=array-bounds]
 3615 |         mbox->m_out.xferaddr = (u32)adapter->buf_dma_handle;
      |         ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/megaraid.c:3599:23: note: while referencing 'raw_mbox'
 3599 |         unsigned char raw_mbox[sizeof(struct mbox_out)];
      |                       ^~~~~~~~

Link: https://lore.kernel.org/r/20220105173633.2421129-1-keescook@chromium.org
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: megaraidlinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Remove unused variable and check in hisi_sas_send_ata_reset_each_phy()
Xiang Chen [Tue, 4 Jan 2022 12:42:06 +0000 (20:42 +0800)]
scsi: hisi_sas: Remove unused variable and check in hisi_sas_send_ata_reset_each_phy()

In commit 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to
asd_sas_port->phy_list"), we use asd_sas_port->phy_mask instead of
accessing asd_sas_port->phy_list, and it is enough to use
asd_sas_port->phy_mask to check the state of phy, so remove the unused
check and variable.

Link: https://lore.kernel.org/r/1641300126-53574-1-git-send-email-chenxiang66@hisilicon.com
Fixes: 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Colin King <colin.i.king@gmail.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: aic79xx: Remove redundant error variable
Minghao Chi [Tue, 4 Jan 2022 11:24:52 +0000 (11:24 +0000)]
scsi: aic79xx: Remove redundant error variable

Return the value from ahd_linux_queue_abort_cmd() directly instead of using
a redundant variable.

Link: https://lore.kernel.org/r/20220104112452.601899-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: pm80xx: Port reset timeout error handling correction
Ajish Koshy [Tue, 28 Dec 2021 11:17:53 +0000 (16:47 +0530)]
scsi: pm80xx: Port reset timeout error handling correction

Error handling steps were not in sequence as per the programmers
manual. Expected sequence:

 - PHY_DOWN (PORT_IN_RESET)

 - PORT_RESET_TIMER_TMO

 - Host aborts pending I/Os

 - Host deregister the device

 - Host sends HW_EVENT_PHY_DOWN ACK

Previously we were sending HW_EVENT_PHY_DOWN ACK first and then deregister
the device. Fix this to use the expected sequence.

Link: https://lore.kernel.org/r/20211228111753.10802-1-Ajish.Koshy@microchip.com
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Fix formatting problems in some kernel-doc comments
Yang Li [Fri, 31 Dec 2021 08:23:50 +0000 (16:23 +0800)]
scsi: mpi3mr: Fix formatting problems in some kernel-doc comments

Remove some warnings found by running scripts/kernel-doc, which is caused
by using 'make W=1'.

drivers/scsi/mpi3mr/mpi3mr_fw.c:2188: warning: Function parameter or
member 'reason_code' not described in 'mpi3mr_check_rh_fault_ioc'

drivers/scsi/mpi3mr/mpi3mr_fw.c:3650: warning: Excess function parameter
'init_type' description in 'mpi3mr_init_ioc'

drivers/scsi/mpi3mr/mpi3mr_fw.c:4177: warning: bad line

Link: https://lore.kernel.org/r/20211231082350.19315-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Fix some spelling mistakes
Colin Ian King [Fri, 24 Dec 2021 17:52:40 +0000 (17:52 +0000)]
scsi: mpi3mr: Fix some spelling mistakes

There are some spelling mistakes in some literal strings. Fix them.

Link: https://lore.kernel.org/r/20211224175240.1348942-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpt3sas: Update persistent trigger pages from sysfs interface
Suganath Prabu S [Mon, 27 Dec 2021 05:30:55 +0000 (11:00 +0530)]
scsi: mpt3sas: Update persistent trigger pages from sysfs interface

Store sysfs-provided trigger values into the corresponding persistent
trigger pages. Otherwise trigger entries are not persistent across system
reboots.

Link: https://lore.kernel.org/r/20211227053055.289537-1-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: core: Fix scsi_mode_select() interface
Damien Le Moal [Wed, 29 Sep 2021 09:17:44 +0000 (18:17 +0900)]
scsi: core: Fix scsi_mode_select() interface

The modepage argument is unused. Remove it.

Link: https://lore.kernel.org/r/20210929091744.706003-3-damien.lemoal@wdc.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: aacraid: Fix spelling of "its"
Randy Dunlap [Thu, 23 Dec 2021 06:11:19 +0000 (22:11 -0800)]
scsi: aacraid: Fix spelling of "its"

Use the possessive "its" instead of the contraction "it's" in user
messages.

Link: https://lore.kernel.org/r/20211223061119.18304-1-rdunlap@infradead.org
Cc: Adaptec OEM Raid Solutions <aacraid@microsemi.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: qedf: Fix potential dereference of NULL pointer
Jiasheng Jiang [Thu, 16 Dec 2021 10:14:49 +0000 (18:14 +0800)]
scsi: qedf: Fix potential dereference of NULL pointer

The return value of dma_alloc_coherent() needs to be checked to avoid use
of NULL pointer in case of an allocation failure.

Link: https://lore.kernel.org/r/20211216101449.375953-1-jiasheng@iscas.ac.cn
Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Bump driver version to 8.0.0.61.0
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:59 +0000 (19:41 +0530)]
scsi: mpi3mr: Bump driver version to 8.0.0.61.0

Update the driver version to newer version format i.e. 8.0.0.61.0.

Link: https://lore.kernel.org/r/20211220141159.16117-26-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Fixes around reply request queues
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:58 +0000 (19:41 +0530)]
scsi: mpi3mr: Fixes around reply request queues

Set reply queue depth of 1K for B0 and 4K for A0.

While freeing the segmented request queues use the actual queue depth that
is used while creating them.

Link: https://lore.kernel.org/r/20211220141159.16117-25-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Enhanced Task Management Support Reply handling
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:57 +0000 (19:41 +0530)]
scsi: mpi3mr: Enhanced Task Management Support Reply handling

Enhance driver to consider MPI3_IOCSTATUS_SCSI_IOC_TERMINATED as a success
for TMs issued by it and check the pending I/Os to decide the success or
failure of the task management requests instead of just considering the
MPI3_IOCSTATUS_SCSI_IOC_TERMINATED as a failure of the task management
request.

Link: https://lore.kernel.org/r/20211220141159.16117-24-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Use TM response codes from MPI3 headers
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:56 +0000 (19:41 +0530)]
scsi: mpi3mr: Use TM response codes from MPI3 headers

Remove locally defined TM response codes and use codes from MPI3 headers.

Link: https://lore.kernel.org/r/20211220141159.16117-23-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Add io_uring interface support in I/O-polled mode
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:55 +0000 (19:41 +0530)]
scsi: mpi3mr: Add io_uring interface support in I/O-polled mode

Add support for the io_uring interface in I/O-polled mode.

This feature is disabled in the driver by default. To enable the feature, a
module parameter "poll_queues" has to be set with the desired number of
polling queues.

When the feature is enabled, the driver reserves a certain number of
operational queue pairs for the poll_queues either from the available queue
pairs or creates additional queue pairs based on the operational queue
availability.

The Polling queues will have corresponding IRQ and ISR functions as similar
to default queues. However, the IRQ line is disabled by the driver for
poll_queues.

Link: https://lore.kernel.org/r/20211220141159.16117-22-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Print cable mngnt and temp threshold events
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:54 +0000 (19:41 +0530)]
scsi: mpi3mr: Print cable mngnt and temp threshold events

Print cable management & temperature threshold event data.

Use vendor id & device id macro definitions from MPI3 headers.

Link: https://lore.kernel.org/r/20211220141159.16117-21-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Support Prepare for Reset event
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:53 +0000 (19:41 +0530)]
scsi: mpi3mr: Support Prepare for Reset event

The IOC sends a Prepare for Reset Event to the host to prepare for a Soft
Reset. This event data has two reason codes:

 1. Start - The host is expected to gracefully quiesce all I/O within
    approximately 1 second.

 2. Abort - The IOC is requesting to abort a previous Prepare for Reset
    Event request. Normal I/O may be resumed.

Link: https://lore.kernel.org/r/20211220141159.16117-20-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Add Event acknowledgment logic
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:52 +0000 (19:41 +0530)]
scsi: mpi3mr: Add Event acknowledgment logic

Add Event acknowledgment logic.

Link: https://lore.kernel.org/r/20211220141159.16117-19-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Gracefully handle online FW update operation
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:51 +0000 (19:41 +0530)]
scsi: mpi3mr: Gracefully handle online FW update operation

Enhance driver to gracefully handle discrepancies in certain key data sizes
between firmware update operations as mentioned below:

 - The driver displays an error message and marks the controller as
   unrecoverable if the firmware reports ReplyFrameSize that is greater
   than the current ReplyFrameSize.

 - If the firmware reports ReplyFrameSize greater than the current
   ReplyFrameSize then the driver uses the current ReplyFrameSize while
   copying the reply messages.

 - The driver displays an error message and marks the controller as
   unrecoverable if the firmware reports MaxOperationalReplyQueues less
   than the currently allocated operational reply queues count.

 - If the firmware reports MaxOperationalReplyQueues that is greater than
   the currently allocated operational reply queue count then the driver
   ignores the new increased value and uses the previously allocated number
   of operational queues only.

 - If the firmware reports MaxDevHandle greater than the previously used
   MaxDevHandle value after a reset then the driver re-allocates the
   'device remove pending bitmap' buffer with the newer size using
   krealloc().

Link: https://lore.kernel.org/r/20211220141159.16117-18-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Detect async reset that occurred in firmware
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:50 +0000 (19:41 +0530)]
scsi: mpi3mr: Detect async reset that occurred in firmware

Detect asynchronous reset that occurred in the firmware by polling for
reset history bit of IOC status register is set and if that bit is set,
then the driver waits for the controller to become ready and then
re-initializes the controller.

Also reduce the time driver is waiting for the controller to acknowledge
the reset action after issuing a specific reset action to the
controller. The wait time is reduced from 510 seconds to 30 seconds. If the
controller didn't acknowledge a specific reset action within the time
interval then the driver marks the controller as unrecoverable instead of
retrying two more times prior to giving up.

Link: https://lore.kernel.org/r/20211220141159.16117-17-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Add IOC reinit function
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:49 +0000 (19:41 +0530)]
scsi: mpi3mr: Add IOC reinit function

Add IOC reinitialization function.

Link: https://lore.kernel.org/r/20211220141159.16117-16-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Handle offline FW activation in graceful manner
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:48 +0000 (19:41 +0530)]
scsi: mpi3mr: Handle offline FW activation in graceful manner

Currently the driver marks the controller as unrecoverable if there is an
asynchronous reset or fault during the initialization, reinitialization
post reset, and OS resume.

Enhance driver to retry the initialization, re-initialization, and resume
sequences for a maximum of 3 times if the controller became faulty or
asynchronously reset due to a firmware activation during the initialization
sequence.

Link: https://lore.kernel.org/r/20211220141159.16117-15-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Code refactor of IOC init - part2
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:47 +0000 (19:41 +0530)]
scsi: mpi3mr: Code refactor of IOC init - part2

Move the IOC initialization's bring up logic to mpi3mr_bring_ioc_ready()
routine.

Link: https://lore.kernel.org/r/20211220141159.16117-14-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Code refactor of IOC init - part1
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:46 +0000 (19:41 +0530)]
scsi: mpi3mr: Code refactor of IOC init - part1

Separate out reply and sense buffer allocation and initialization into two
routines and call only initialization routine while issuing the IOC Init
request message.

Also move out the event enable logic to a separate function.

Link: https://lore.kernel.org/r/20211220141159.16117-13-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Fault IOC when internal command gets timeout
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:45 +0000 (19:41 +0530)]
scsi: mpi3mr: Fault IOC when internal command gets timeout

Save snapdump and fault the controller with the given reason code if it is
already not in the fault or not in asynchronous reset. This ensures that
soft reset is issued from the watchdog thread.  This will also be used to
handle initialization time faults/resets/timeout as in those cases
immediate soft reset invocation is not required.

Link: https://lore.kernel.org/r/20211220141159.16117-12-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Display IOC firmware package version
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:44 +0000 (19:41 +0530)]
scsi: mpi3mr: Display IOC firmware package version

Display IOC firmware package version by reading component image upload
data.

Link: https://lore.kernel.org/r/20211220141159.16117-11-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Handle unaligned PLL in unmap cmnds
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:43 +0000 (19:41 +0530)]
scsi: mpi3mr: Handle unaligned PLL in unmap cmnds

The following special handling is needed for UNMAP commands issued to NVMe
drives:

 - On B0 boards, if the parameter list length is greater than 24 and not a
   16-byte multiple, then truncate the parameter list length to a 16-byte
   multiple.

 - On A0 boards, if the parameter list length is greater than block
   descriptor data length + 8, then truncate the parameter list length to
   block descriptor data length + 8 value.

Link: https://lore.kernel.org/r/20211220141159.16117-10-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Increase internal cmnds timeout to 60s
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:42 +0000 (19:41 +0530)]
scsi: mpi3mr: Increase internal cmnds timeout to 60s

 - Increase internal command timeout to 60 seconds.

 - Enable 16 device removal handshake processing in parallel in the device
   removal handshake infrastructure.

Link: https://lore.kernel.org/r/20211220141159.16117-9-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Do access status validation before adding devices
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:41 +0000 (19:41 +0530)]
scsi: mpi3mr: Do access status validation before adding devices

Add validation for various access statuses prior to exposing attached
target device to the operating system.

Link: https://lore.kernel.org/r/20211220141159.16117-8-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Add support for PCIe Managed Switch SES device
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:40 +0000 (19:41 +0530)]
scsi: mpi3mr: Add support for PCIe Managed Switch SES device

The SAS4 Controller firmware exposes the SES devices in Managed PCIe Switch
as a PCIe Device Type SCSI Device
(MPI3_DEVICE0_PCIE_DEVICE_INFO_TYPE_SCSI_DEVICE).

Driver is enhanced to handle this device type by:

 - Exposing the device to the upper layers and

 - Not updating any hardware sectors & virtual boundary settings as these
   settings are needed only for NVMe devices.

Link: https://lore.kernel.org/r/20211220141159.16117-7-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Update MPI3 headers - part2
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:39 +0000 (19:41 +0530)]
scsi: mpi3mr: Update MPI3 headers - part2

Continued updating MPI3 headers.

Link: https://lore.kernel.org/r/20211220141159.16117-6-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Update MPI3 headers - part1
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:38 +0000 (19:41 +0530)]
scsi: mpi3mr: Update MPI3 headers - part1

Update MPI3 headers.

Link: https://lore.kernel.org/r/20211220141159.16117-5-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Don't reset IOC if cmnds flush with reset status
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:37 +0000 (19:41 +0530)]
scsi: mpi3mr: Don't reset IOC if cmnds flush with reset status

Don't issue the soft reset if internal commands are flushed out with reset
status. Soft reset needs to be issued only if commands are really timed
out.

Link: https://lore.kernel.org/r/20211220141159.16117-4-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Replace spin_lock() with spin_lock_irqsave()
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:36 +0000 (19:41 +0530)]
scsi: mpi3mr: Replace spin_lock() with spin_lock_irqsave()

Use spin_lock_irqsave() instead of spin_lock() while acquiring
reply_free_queue_lock & sbq_lock locks.

Link: https://lore.kernel.org/r/20211220141159.16117-3-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mpi3mr: Add debug APIs based on logging_level bits
Sreekanth Reddy [Mon, 20 Dec 2021 14:11:35 +0000 (19:41 +0530)]
scsi: mpi3mr: Add debug APIs based on logging_level bits

Add debug print functions which will print messages based on logging_level
bits enabled.

Link: https://lore.kernel.org/r/20211220141159.16117-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: pmcraid: Don't use GFP_DMA in pmcraid_alloc_sglist()
Christoph Hellwig [Wed, 22 Dec 2021 09:22:47 +0000 (10:22 +0100)]
scsi: pmcraid: Don't use GFP_DMA in pmcraid_alloc_sglist()

The driver doesn't express DMA addressing limitation under 32-bits anywhere
else, so remove the spurious GFP_DMA allocation.

Link: https://lore.kernel.org/r/20211222092247.928711-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: snic: Don't use GFP_DMA in snic_queue_report_tgt_req()
Christoph Hellwig [Wed, 22 Dec 2021 09:20:48 +0000 (10:20 +0100)]
scsi: snic: Don't use GFP_DMA in snic_queue_report_tgt_req()

The driver doesn't express DMA addressing limitation under 32-bits anywhere
else, so remove the spurious GFP_DMA allocation.

Link: https://lore.kernel.org/r/20211222092048.925829-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: myrs: Don't use GFP_DMA
Christoph Hellwig [Wed, 22 Dec 2021 09:19:35 +0000 (10:19 +0100)]
scsi: myrs: Don't use GFP_DMA

The myrs devices supports 64-bit addressing, so remove the spurious GFP_DMA
allocations.

Link: https://lore.kernel.org/r/20211222091935.925624-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: myrb: Don't use GFP_DMA in myrb_pdev_slave_alloc()
Christoph Hellwig [Wed, 22 Dec 2021 09:18:01 +0000 (10:18 +0100)]
scsi: myrb: Don't use GFP_DMA in myrb_pdev_slave_alloc()

The driver doesn't express DMA addressing limitation under 32-bits anywhere
else, so remove the spurious GFP_DMA allocation.

Link: https://lore.kernel.org/r/20211222091801.924745-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: initio: Don't use GFP_DMA in initio_probe_one()
Christoph Hellwig [Wed, 22 Dec 2021 09:16:30 +0000 (10:16 +0100)]
scsi: initio: Don't use GFP_DMA in initio_probe_one()

The driver doesn't express DMA addressing limitation under 32-bits anywhere
else, so remove the spurious GFP_DMA allocation.

Link: https://lore.kernel.org/r/20211222091630.922788-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: sr: Don't use GFP_DMA
Christoph Hellwig [Wed, 22 Dec 2021 09:08:42 +0000 (10:08 +0100)]
scsi: sr: Don't use GFP_DMA

The allocated buffers are used as a command payload, for which the block
layer and/or DMA API do the proper bounce buffering if needed.

Link: https://lore.kernel.org/r/20211222090842.920724-1-hch@lst.de
Reported-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ch: Don't use GFP_DMA
Christoph Hellwig [Wed, 22 Dec 2021 09:03:11 +0000 (10:03 +0100)]
scsi: ch: Don't use GFP_DMA

The allocated buffers are used as a command payload, for which the block
layer and/or DMA API do the proper bounce buffering if needed.

Link: https://lore.kernel.org/r/20211222090311.916624-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Use autosuspend for the host controller
Xiang Chen [Mon, 20 Dec 2021 11:21:38 +0000 (19:21 +0800)]
scsi: hisi_sas: Use autosuspend for the host controller

The controller may frequently enter and exit suspend for each I/O which we
need to deal with. This is inefficient and may cause too much suspend and
resume activity for the controller.  To avoid this, use a default 5s
autosuspend for the controller to stop frequently suspending and
resuming. This value may still be modified via sysfs interfaces.

Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Keep host active while processing events
Xiang Chen [Mon, 20 Dec 2021 11:21:37 +0000 (19:21 +0800)]
scsi: libsas: Keep host active while processing events

Processing events such as PORTE_BROADCAST_RCVD may cause dependency issues
for runtime power management support.  Such a problem would be that
handling a PORTE_BROADCAST_RCVD event requires that the host is resumed to
send SMP commands. However, in resuming the host, the phyup events
generated from re-enabling the phys are processed in the same workqueue as
the original PORTE_BROADCAST_RCVD event. As such, the host will never
finish resuming (as it waits for the phyup event processing), and then the
PORTE_BROADCAST_RCVD event can't be processed as the SMP commands are
blocked, and so we have a deadlock.  Solve this problem by ensuring that
libsas keeps the host active until completely finished phy or port events,
such as PORTE_BYTES_DMAED. As such, we don't have to worry about resuming
the host for processing individual SMP commands in this example.

Link: https://lore.kernel.org/r/1639999298-244569-15-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed
Xiang Chen [Mon, 20 Dec 2021 11:21:36 +0000 (19:21 +0800)]
scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed

It is possible that controller may become suspended between processing a
phyup interrupt and the event being processed by libsas. As such, we can't
ensure the controller is active when processing the phyup event - this may
cause the phyup event to be lost or other issues.  To avoid any possible
issues, add pm_runtime_get_noresume() in phyup interrupt handler and
pm_runtime_put_sync() in the work handler exit to ensure that we stay
always active. Since we only want to call pm_runtime_get_noresume() for v3
hw, signal this will a new event, HISI_PHYE_PHY_UP_PM.

Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Defer works of new phys during suspend
Xiang Chen [Mon, 20 Dec 2021 11:21:35 +0000 (19:21 +0800)]
scsi: libsas: Defer works of new phys during suspend

During the processing of event PORT_BYTES_DMAED, the driver queues work
DISCE_DISCOVER_DOMAIN and then flushes workqueue ha->disco_q.  If a new
phyup event occurs during resuming the controller, the work
PORTE_BYTES_DMAED of new phy occurs before suspended phy's. The work
DISCE_DISCOVER_DOMAIN of new phy requires an active SAS controller (it
needs to resume SAS controller by function scsi_sysfs_add_sdev() and some
other functions such as function add_device_link()). However, the
activation of the SAS controller requires completion of work
PORTE_BYTES_DMAED of suspended phys while it is blocked by new phy's work
on ha->event_q. So there is a deadlock and it is released only after resume
timeout.

To solve the issue, defer works of new phys during suspend and queue those
defer works after SAS controller becomes active.

Link: https://lore.kernel.org/r/1639999298-244569-13-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Refactor sas_queue_deferred_work()
Xiang Chen [Mon, 20 Dec 2021 11:21:34 +0000 (19:21 +0800)]
scsi: libsas: Refactor sas_queue_deferred_work()

In the second part of function __sas_drain_work(), deferred work is queued.
This functionality is required other places so factor it out into the
function sas_queue_deferred_work().

Link: https://lore.kernel.org/r/1639999298-244569-12-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Add flag SAS_HA_RESUMING
Xiang Chen [Mon, 20 Dec 2021 11:21:33 +0000 (19:21 +0800)]
scsi: libsas: Add flag SAS_HA_RESUMING

Add a flag SAS_HA_RESUMING and use it to indicate the state of resuming the
host controller.

Link: https://lore.kernel.org/r/1639999298-244569-11-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Resume host while sending SMP I/Os
Xiang Chen [Mon, 20 Dec 2021 11:21:32 +0000 (19:21 +0800)]
scsi: libsas: Resume host while sending SMP I/Os

When sending SMP I/Os to the host we need to ensure that the host is not
suspended and can process the commands. This is a better approach than
replying on the host to resume itself to handle such commands. Use
pm_runtime_get_sync() and pm_runtime_put_sync() calls for the host when
executing SMP I/Os.

Link: https://lore.kernel.org/r/1639999298-244569-10-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Add more logs for runtime suspend/resume
Xiang Chen [Mon, 20 Dec 2021 11:21:31 +0000 (19:21 +0800)]
scsi: hisi_sas: Add more logs for runtime suspend/resume

Add some logs at the beginning and end of suspend/resume.

Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Insert PORTE_BROADCAST_RCVD event for resuming host
Xiang Chen [Mon, 20 Dec 2021 11:21:30 +0000 (19:21 +0800)]
scsi: libsas: Insert PORTE_BROADCAST_RCVD event for resuming host

If a new disk is inserted through an expander when the host was suspended,
it will not necessarily be detected as the topology is not re-scanned
during resume.  To detect possible changes in topology during suspension,
insert a PORTE_BROADCAST_RCVD event per port when resuming to trigger a
revalidation.

Link: https://lore.kernel.org/r/1639999298-244569-8-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: mvsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list
Xiang Chen [Mon, 20 Dec 2021 11:21:29 +0000 (19:21 +0800)]
scsi: mvsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list

phy_list_lock is not held when using asd_sas_port->phy_list in the mvsas
driver. Add spin_lock/unlock in those places.

Link: https://lore.kernel.org/r/1639999298-244569-7-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list
Xiang Chen [Mon, 20 Dec 2021 11:21:28 +0000 (19:21 +0800)]
scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list

Most places that use asd_sas_port->phy_list are protected by spinlock
asd_sas_port->phy_list_lock, however there are still some places which miss
grabbing the lock. Add it in function hisi_sas_refresh_port_id() when
accessing asd_sas_port->phy_list. This carries a risk that list mutates
while at the same time dropping the lock in function
hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of
accessing asd_sas_port->phy_list to avoid this risk.

Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list
Xiang Chen [Mon, 20 Dec 2021 11:21:27 +0000 (19:21 +0800)]
scsi: libsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list

Most places that use asd_sas_port->phy_list in libsas are protected by
spinlock asd_sas_port->phy_list_lock. However, there are still a few places
which miss the lock. Add it in those places.

Link: https://lore.kernel.org/r/1639999298-244569-5-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()
Alan Stern [Mon, 20 Dec 2021 11:21:26 +0000 (19:21 +0800)]
scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()

John Garry reported a deadlock that occurs when trying to access a
runtime-suspended SATA device.  For obscure reasons, the rescan procedure
causes the link to be hard-reset, which disconnects the device.

The rescan tries to carry out a runtime resume when accessing the device.
scsi_rescan_device() holds the SCSI device lock and won't release it until
it can put commands onto the device's block queue.  This can't happen until
the queue is successfully runtime-resumed or the device is unregistered.
But the runtime resume fails because the device is disconnected, and
__scsi_remove_device() can't do the unregistration because it can't get the
device lock.

The best way to resolve this deadlock appears to be to allow the block
queue to start running again even after an unsuccessful runtime resume.
The idea is that the driver or the SCSI error handler will need to be able
to use the queue to resolve the runtime resume failure.

This patch removes the err argument to blk_post_runtime_resume() and makes
the routine act as though the resume was successful always.  This fixes the
deadlock.

Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com
Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
Reported-and-tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend"
John Garry [Mon, 20 Dec 2021 11:21:25 +0000 (19:21 +0800)]
scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend"

This reverts commit b14a37e011d829404c29a5ae17849d7efb034893.

In that commit, we had to filter out phy-up events during suspend, as it
work cause a deadlock between processing the phyup event and the resume HA
function try to drain the HA event workqueue to complete the resume
process.

Now that we no longer try to drain the HA event queue during the HA resume
processor, the deadlock would not occur, so remove the special handling for
it.

Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Don't always drain event workqueue for HA resume
John Garry [Mon, 20 Dec 2021 11:21:24 +0000 (19:21 +0800)]
scsi: libsas: Don't always drain event workqueue for HA resume

For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:

The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.

Other drivers which use libsas don't worry about this as none support
runtime suspend.

The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.

There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.

Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: libsas: Decode SAM status and host byte codes
John Garry [Wed, 15 Dec 2021 14:37:41 +0000 (22:37 +0800)]
scsi: libsas: Decode SAM status and host byte codes

Value 0 is used for SAM status and libsas exec_status bytes codes in
sas_end_task() - use defined macros instead. In addition, change to proper
enum types.

Also replace SAM_STAT_CHECK_CONDITION with SAS_SAM_STAT_CHECK_CONDITION,
the former being a proper member of enum exec_status.

Link: https://lore.kernel.org/r/1639579061-179473-9-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Fix phyup timeout on FPGA
Qi Liu [Wed, 15 Dec 2021 14:37:40 +0000 (22:37 +0800)]
scsi: hisi_sas: Fix phyup timeout on FPGA

The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause hisi_sas_phy.timer timeout to trigger.

To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure
that the timer won't be added after phyup handler completes.

Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Prevent parallel FLR and controller reset
Qi Liu [Wed, 15 Dec 2021 14:37:39 +0000 (22:37 +0800)]
scsi: hisi_sas: Prevent parallel FLR and controller reset

If we issue a controller reset command during executing a FLR a hung task
may be found:

 Call trace:
  __switch_to+0x158/0x1cc
  __schedule+0x2e8/0x85c
  schedule+0x7c/0x110
  schedule_timeout+0x190/0x1cc
  __down+0x7c/0xd4
  down+0x5c/0x7c
  hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
  hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
  smp_execute_task_sg+0xf4/0x23c [libsas]
  sas_smp_phy_control+0x110/0x1e0 [libsas]
  transport_sas_phy_reset+0xc8/0x190 [libsas]
  phy_reset_work+0x2c/0x40 [libsas]
  process_one_work+0x1dc/0x48c
  worker_thread+0x15c/0x464
  kthread+0x160/0x170
  ret_from_fork+0x10/0x18

This is a race condition which occurs when the FLR completes first.

Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem .

Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Prevent parallel controller reset and control phy command
Qi Liu [Wed, 15 Dec 2021 14:37:38 +0000 (22:37 +0800)]
scsi: hisi_sas: Prevent parallel controller reset and control phy command

A user may issue a control phy command from sysfs at any time, even if the
controller is resetting.

If a phy is disabled by hardreset/linkreset command before calling
get_phys_state() in the reset path, the saved phy state may be incorrect.

To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure
that the controller reset may not run at the same time as when the phy
control function is running.

Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Factor out task prep and delivery code
John Garry [Wed, 15 Dec 2021 14:37:37 +0000 (22:37 +0800)]
scsi: hisi_sas: Factor out task prep and delivery code

The task prep code is the same between the normal path (in
hisi_sas_task_prep()) and the internal abort path, so factor is out into a
common function.

Link: https://lore.kernel.org/r/1639579061-179473-5-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Pass abort structure for internal abort
John Garry [Wed, 15 Dec 2021 14:37:36 +0000 (22:37 +0800)]
scsi: hisi_sas: Pass abort structure for internal abort

To help factor out code in future, it's useful to know if we're executing
an internal abort, so pass a pointer to the structure. The idea is that a
NULL pointer means not an internal abort.

Link: https://lore.kernel.org/r/1639579061-179473-4-git-send-email-john.garry@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Make internal abort have no task proto
John Garry [Wed, 15 Dec 2021 14:37:35 +0000 (22:37 +0800)]
scsi: hisi_sas: Make internal abort have no task proto

For an internal abort, the task does not have a protocol, so set to none.

This will make it easier to differentiate internal abort tasks in future.

Link: https://lore.kernel.org/r/1639579061-179473-3-git-send-email-john.garry@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hisi_sas: Start delivery hisi_sas_task_exec() directly
John Garry [Wed, 15 Dec 2021 14:37:34 +0000 (22:37 +0800)]
scsi: hisi_sas: Start delivery hisi_sas_task_exec() directly

Currently we start delivery of commands to the DQ after returning from
hisi_sas_task_exec() with success.

Let's just start delivery directly in that function without having to check
if some local variable is set.

Link: https://lore.kernel.org/r/1639579061-179473-2-git-send-email-john.garry@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: efct: Don't pass GFP_DMA to dma_alloc_coherent()
Christoph Hellwig [Tue, 14 Dec 2021 16:36:05 +0000 (17:36 +0100)]
scsi: efct: Don't pass GFP_DMA to dma_alloc_coherent()

dma_alloc_coherent() ignores the zone specifiers so this is pointless and
confusing.

Link: https://lore.kernel.org/r/20211214163605.416288-1-hch@lst.de
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: core: Fix deadlock issue in ufshcd_wait_for_doorbell_clr()
Bean Huo [Tue, 14 Dec 2021 12:05:37 +0000 (13:05 +0100)]
scsi: ufs: core: Fix deadlock issue in ufshcd_wait_for_doorbell_clr()

Call shost_for_each_device() with holding host->host_lock will cause a
deadlock situation, which will cause the system to stall (the log as
follow). Fix this issue by using __shost_for_each_device() in
ufshcd_pending_cmds().

stalls on CPUs/tasks:
all trace:
__switch_to+0x120/0x170
0xffff800011643998
ask dump for CPU 5:
ask:kworker/u16:2   state:R  running task     stack:    0 pid:   80 ppid:     2 flags:0x0000000a
orkqueue: events_unbound async_run_entry_fn
all trace:
__switch_to+0x120/0x170
0x0
ask dump for CPU 6:
ask:kworker/u16:6   state:R  running task     stack:    0 pid:  164 ppid:     2 flags:0x0000000a
orkqueue: events_unbound async_run_entry_fn
all trace:
__switch_to+0x120/0x170
0xffff54e7c4429f80
ask dump for CPU 7:
ask:kworker/u16:4   state:R  running task     stack:    0 pid:  153 ppid:     2 flags:0x0000000a
orkqueue: events_unbound async_run_entry_fn
all trace:
__switch_to+0x120/0x170
blk_mq_run_hw_queue+0x34/0x110
blk_mq_sched_insert_request+0xb0/0x120
blk_execute_rq_nowait+0x68/0x88
blk_execute_rq+0x4c/0xd8
__scsi_execute+0xec/0x1d0
scsi_vpd_inquiry+0x84/0xf0
scsi_get_vpd_buf+0x34/0xb8
scsi_attach_vpd+0x34/0x140
scsi_probe_and_add_lun+0xa6c/0xab8
__scsi_scan_target+0x438/0x4f8
scsi_scan_channel+0x6c/0xa8
scsi_scan_host_selected+0xf0/0x150
do_scsi_scan_host+0x88/0x90
scsi_scan_host+0x1b4/0x1d0
ufshcd_async_scan+0x248/0x310
async_run_entry_fn+0x30/0x178
process_one_work+0x1e8/0x368
worker_thread+0x40/0x478
kthread+0x174/0x180
ret_from_fork+0x10/0x20

Link: https://lore.kernel.org/r/20211214120537.531628-1-huobean@gmail.com
Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code")
Reported-by: YongQin Liu <yongqin.liu@linaro.org>
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: qla2xxx: Synchronize rport dev_loss_tmo setting
Hannes Reinecke [Tue, 14 Dec 2021 11:11:39 +0000 (12:11 +0100)]
scsi: qla2xxx: Synchronize rport dev_loss_tmo setting

Currently, the dev_loss_tmo setting is only ever used for SCSI
devices. This patch reshuffles initialisation such that the SCSI remote
ports are registered before the NVMe ones, allowing the dev_loss_tmo
setting to be synchronized between SCSI and NVMe.

Link: https://lore.kernel.org/r/20211214111139.52503-1-dwagner@suse.de
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoMerge branch '5.16/scsi-fixes' into 5.17/scsi-staging
Martin K. Petersen [Fri, 17 Dec 2021 03:38:19 +0000 (22:38 -0500)]
Merge branch '5.16/scsi-fixes' into 5.17/scsi-staging

Pull in the 5.16 fixes branch to resolve a conflict in the UFS driver
core.

Conflicts:
drivers/scsi/ufs/ufshcd.c

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: hpsa: Remove an unused variable in hpsa_update_scsi_devices()
Christophe JAILLET [Thu, 9 Dec 2021 21:11:56 +0000 (22:11 +0100)]
scsi: hpsa: Remove an unused variable in hpsa_update_scsi_devices()

'lunzerobits' is unused. Remove it.

This a left over of commit 2d62a33e05d4 ("hpsa: eliminate fake lun0
enclosures")

Link: https://lore.kernel.org/r/9f80ea569867b5f7ae1e0f99d656e5a8bacad34e.1639084205.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Use struct_group to isolate cast to larger object
Kees Cook [Fri, 3 Dec 2021 22:33:51 +0000 (14:33 -0800)]
scsi: lpfc: Use struct_group to isolate cast to larger object

When building under -Warray-bounds, a warning is generated when casting a
u32 into MAILBOX_t (which is larger). This warning is conservative, but
it's not an unreasonable change to make to improve future robustness. Use a
tagged struct_group that can refer to either the specific fields or the
first u32 separately, silencing this warning:

drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_reset_barrier':
drivers/scsi/lpfc/lpfc_sli.c:4787:29: error: array subscript 'MAILBOX_t[0]' is partly outside array bounds of 'volatile uint32_t[1]' {aka 'volatile unsigned int[1]'} [-Werror=array-bounds]
 4787 |         ((MAILBOX_t *)&mbox)->mbxCommand = MBX_KILL_BOARD;
      |                             ^~
drivers/scsi/lpfc/lpfc_sli.c:4752:27: note: while referencing 'mbox'
 4752 |         volatile uint32_t mbox;
      |                           ^~~~

There is no change to the resulting executable instruction code.

Link: https://lore.kernel.org/r/20211203223351.107323-1-keescook@chromium.org
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Use struct_group() to initialize struct lpfc_cgn_info
Kees Cook [Wed, 8 Dec 2021 19:59:57 +0000 (11:59 -0800)]
scsi: lpfc: Use struct_group() to initialize struct lpfc_cgn_info

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add struct_group() to mark "stat" region of struct lpfc_cgn_info that
should be initialized to zero, and refactor the "data" region memset()
to wipe everything up to the cgn_stats region.

Link: https://lore.kernel.org/r/20211208195957.1603092-1-keescook@chromium.org
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: qla2xxx: Format log strings only if needed
Roman Bolshakov [Fri, 12 Nov 2021 14:54:46 +0000 (17:54 +0300)]
scsi: qla2xxx: Format log strings only if needed

Commit 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug
logs") introduced unconditional log string formatting to ql_dbg() even if
ql_dbg_log event is disabled. It harms performance because some strings are
formatted in fastpath and/or interrupt context.

Link: https://lore.kernel.org/r/20211112145446.51210-1-r.bolshakov@yadro.com
Fixes: 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs")
Cc: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Update lpfc version to 14.0.0.4
James Smart [Sat, 4 Dec 2021 00:26:44 +0000 (16:26 -0800)]
scsi: lpfc: Update lpfc version to 14.0.0.4

Update lpfc version to 14.0.0.4.

Link: https://lore.kernel.org/r/20211204002644.116455-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Add additional debugfs support for CMF
James Smart [Sat, 4 Dec 2021 00:26:43 +0000 (16:26 -0800)]
scsi: lpfc: Add additional debugfs support for CMF

Dump raw CMF parameter information in debugfs cgn_buffer.

Link: https://lore.kernel.org/r/20211204002644.116455-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Cap CMF read bytes to MBPI
James Smart [Sat, 4 Dec 2021 00:26:42 +0000 (16:26 -0800)]
scsi: lpfc: Cap CMF read bytes to MBPI

Ensure read bytes data does not go over MBPI for CMF timer intervals that
are purposely shortened.

Link: https://lore.kernel.org/r/20211204002644.116455-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Adjust CMF total bytes and rxmonitor
James Smart [Sat, 4 Dec 2021 00:26:41 +0000 (16:26 -0800)]
scsi: lpfc: Adjust CMF total bytes and rxmonitor

Calculate any extra bytes needed to account for timer accuracy. If we are
less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total
to reflect a full LPFC_CMF_INTERVAL.

Add additional info to rxmonitor, and adjust some log formatting.

Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup
James Smart [Sat, 4 Dec 2021 00:26:40 +0000 (16:26 -0800)]
scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup

Extraneous teardown routines are present in the firmware dump path causing
altered states in firmware captures.

When a firmware dump is requested via sysfs, trigger the dump immediately
without tearing down structures and changing adapter state.

The driver shall rely on pre-existing firmware error state clean up
handlers to restore the adapter.

Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Fix NPIV port deletion crash
James Smart [Sat, 4 Dec 2021 00:26:39 +0000 (16:26 -0800)]
scsi: lpfc: Fix NPIV port deletion crash

The driver is calling schedule_timeout after the DA_ID nameserver request
and LOGO commands are issued to the fabric by the initiator virtual
endport.  These fixed delay functions are causing long delays in the
driver's worker thread when processing discovery I/Os in a serialized
fashion, which is then triggering mailbox timeout errors artificially.

To fix this, don't wait on the DA_ID request to complete and call
wait_event_timeout to allow the vport delete thread to make progress on an
event driven basis rather than fixing the wait time.

Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalance
James Smart [Sat, 4 Dec 2021 00:26:38 +0000 (16:26 -0800)]
scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalance

Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call
trace.

A prior patch reworked the get/put handling by ensuring nlp_get was done
before WQE submission and a put was done in the completion path.
Unfortunately, the issue_els_rscn path had a piece of legacy code that did
a nlp_put, causing an imbalance on the ref counts.

Fixed by removing the unnecessary legacy code snippet.

Link: https://lore.kernel.org/r/20211204002644.116455-4-jsmart2021@gmail.com
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Change return code on I/Os received during link bounce
James Smart [Sat, 4 Dec 2021 00:26:37 +0000 (16:26 -0800)]
scsi: lpfc: Change return code on I/Os received during link bounce

During heavy I/O testing with issue_lip to bounce the link, occasionally
I/O is terminated with status 3 result 9, which means the RPI is suspended.
The I/O is completed and this type of error will result in immediate retry
by the SCSI layer. The retry count expires and the I/O fails and returns
error to the application.

To avoid these quick retry/retries exhausted scenarios change the return
code given to the midlayer to DID_REQUEUE rather than DID_ERROR. This gets
them retried, and eventually succeed when the link recovers.

Link: https://lore.kernel.org/r/20211204002644.116455-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV
James Smart [Sat, 4 Dec 2021 00:26:36 +0000 (16:26 -0800)]
scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV

During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries
were still busy. This situation was only seen doing rmmod after at least 1
vport (NPIV) instance was created and destroyed. The number of messages
scaled with the number of vports created.

When a vport is created, it can receive a PLOGI from another initiator
Nport.  When this happens, the driver prepares to ack the PLOGI and
prepares an RPI for registration (via mbx cmd) which includes an mbuf
allocation. During the unsolicited PLOGI processing and after the RPI
preparation, the driver recognizes it is one of the vport instances and
decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI,
the mailbox struct allocated for RPI registration is freed, but the mbuf
that was also allocated is not released.

Fix by freeing the mbuf with the mailbox struct in the LS_RJT path.

As part of the code review to figure the issue out a couple of other areas
where found that also would not have released the mbuf. Those are cleaned
up as well.

Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Implement polling support
Bart Van Assche [Fri, 3 Dec 2021 23:19:50 +0000 (15:19 -0800)]
scsi: ufs: Implement polling support

The time spent in io_schedule() and also the interrupt latency are
significant when submitting direct I/O to a UFS device. Hence this patch
that implements polling support. User space software can enable polling by
passing the RWF_HIPRI flag to the preadv2() system call or the
IORING_SETUP_IOPOLL flag to the io_uring interface.

Although the block layer supports to partition the tag space for
interrupt-based completions (HCTX_TYPE_DEFAULT) purposes and polling
(HCTX_TYPE_POLL), the choice has been made to use the same hardware queue
for both hctx types because partitioning the tag space would negatively
affect performance.

On my test setup this patch increases IOPS from 2736 to 22000 (8x) for the
following test:

for hipri in 0 1; do
    fio --ioengine=io_uring --iodepth=1 --rw=randread \
    --runtime=60 --time_based=1 --direct=1 --name=qd1 \
    --filename=/dev/block/sda --ioscheduler=none --gtod_reduce=1 \
    --norandommap --hipri=$hipri
done

Link: https://lore.kernel.org/r/20211203231950.193369-18-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Optimize the command queueing code
Bart Van Assche [Fri, 3 Dec 2021 23:19:49 +0000 (15:19 -0800)]
scsi: ufs: Optimize the command queueing code

Remove the clock scaling lock from ufshcd_queuecommand() since it is a
performance bottleneck. Instead check the SCSI device budget bitmaps in the
code that waits for ongoing ufshcd_queuecommand() calls. A bit is set in
sdev->budget_map just before scsi_queue_rq() is called and a bit is cleared
from that bitmap if scsi_queue_rq() does not submit the request or after
the request has finished. See also the blk_mq_{get,put}_dispatch_budget()
calls in the block layer.

There is no risk for a livelock since the block layer delays queue reruns
if queueing a request fails because the SCSI host has been blocked.

Link: https://lore.kernel.org/r/20211203231950.193369-17-bvanassche@acm.org
Cc: Asutosh Das (asd) <asutoshd@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Stop using the clock scaling lock in the error handler
Bart Van Assche [Fri, 3 Dec 2021 23:19:48 +0000 (15:19 -0800)]
scsi: ufs: Stop using the clock scaling lock in the error handler

Instead of locking and unlocking the clock scaling lock, surround the
command queueing code with an RCU reader lock and call synchronize_rcu().
This patch prepares for removal of the clock scaling lock.

Link: https://lore.kernel.org/r/20211203231950.193369-16-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Fix a kernel crash during shutdown
Bart Van Assche [Fri, 3 Dec 2021 23:19:47 +0000 (15:19 -0800)]
scsi: ufs: Fix a kernel crash during shutdown

Fix the following kernel crash:

Unable to handle kernel paging request at virtual address ffffffc91e735000
Call trace:
 __queue_work+0x26c/0x624
 queue_work_on+0x6c/0xf0
 ufshcd_hold+0x12c/0x210
 __ufshcd_wl_suspend+0xc0/0x400
 ufshcd_wl_shutdown+0xb8/0xcc
 device_shutdown+0x184/0x224
 kernel_restart+0x4c/0x124
 __arm64_sys_reboot+0x194/0x264
 el0_svc_common+0xc8/0x1d4
 do_el0_svc+0x30/0x8c
 el0_svc+0x20/0x30
 el0_sync_handler+0x84/0xe4
 el0_sync+0x1bc/0x1c0

Fix this crash by ungating the clock before destroying the work queue on
which clock gating work is queued.

Link: https://lore.kernel.org/r/20211203231950.193369-15-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Improve SCSI abort handling further
Bart Van Assche [Fri, 3 Dec 2021 23:19:46 +0000 (15:19 -0800)]
scsi: ufs: Improve SCSI abort handling further

Release resources when aborting a command. Make sure that aborted commands
are completed once by clearing the corresponding tag bit from
hba->outstanding_reqs. This patch is an improved version of commit
3ff1f6b6ba6f ("scsi: ufs: core: Improve SCSI abort handling").

Link: https://lore.kernel.org/r/20211203231950.193369-14-bvanassche@acm.org
Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver")
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Introduce ufshcd_release_scsi_cmd()
Bart Van Assche [Fri, 3 Dec 2021 23:19:45 +0000 (15:19 -0800)]
scsi: ufs: Introduce ufshcd_release_scsi_cmd()

The only functional change in this patch is that scsi_done() is now called
after ufshcd_release() and ufshcd_clk_scaling_update_busy() instead of
before.

The next patch in this series will introduce a call to
ufshcd_release_scsi_cmd() in the abort handler.

Link: https://lore.kernel.org/r/20211203231950.193369-13-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove the 'update_scaling' local variable
Bart Van Assche [Fri, 3 Dec 2021 23:19:44 +0000 (15:19 -0800)]
scsi: ufs: Remove the 'update_scaling' local variable

This patch does not change any functionality but makes the next patch in
this series easier to read.

Link: https://lore.kernel.org/r/20211203231950.193369-12-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove hba->cmd_queue
Bart Van Assche [Fri, 3 Dec 2021 23:19:43 +0000 (15:19 -0800)]
scsi: ufs: Remove hba->cmd_queue

The previous patch removed all code that uses hba->cmd_queue. Hence also
remove hba->cmd_queue itself.

Link: https://lore.kernel.org/r/20211203231950.193369-11-bvanassche@acm.org
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Fix a deadlock in the error handler
Bart Van Assche [Fri, 3 Dec 2021 23:19:42 +0000 (15:19 -0800)]
scsi: ufs: Fix a deadlock in the error handler

The following deadlock has been observed on a test setup:

 - All tags allocated

 - The SCSI error handler calls ufshcd_eh_host_reset_handler()

 - ufshcd_eh_host_reset_handler() queues work that calls
   ufshcd_err_handler()

 - ufshcd_err_handler() locks up as follows:

Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
Call trace:
 __switch_to+0x298/0x5d8
 __schedule+0x6cc/0xa94
 schedule+0x12c/0x298
 blk_mq_get_tag+0x210/0x480
 __blk_mq_alloc_request+0x1c8/0x284
 blk_get_request+0x74/0x134
 ufshcd_exec_dev_cmd+0x68/0x640
 ufshcd_verify_dev_init+0x68/0x35c
 ufshcd_probe_hba+0x12c/0x1cb8
 ufshcd_host_reset_and_restore+0x88/0x254
 ufshcd_reset_and_restore+0xd0/0x354
 ufshcd_err_handler+0x408/0xc58
 process_one_work+0x24c/0x66c
 worker_thread+0x3e8/0xa4c
 kthread+0x150/0x1b4
 ret_from_fork+0x10/0x30

Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved
request.

Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Rework ufshcd_change_queue_depth()
Bart Van Assche [Fri, 3 Dec 2021 23:19:41 +0000 (15:19 -0800)]
scsi: ufs: Rework ufshcd_change_queue_depth()

Prepare for making sdev->host->can_queue less than hba->nutrs. This patch
does not change any functionality.

Link: https://lore.kernel.org/r/20211203231950.193369-9-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove ufshcd_any_tag_in_use()
Bart Van Assche [Fri, 3 Dec 2021 23:19:40 +0000 (15:19 -0800)]
scsi: ufs: Remove ufshcd_any_tag_in_use()

Use hba->outstanding_reqs instead of ufshcd_any_tag_in_use(). This patch
prepares for removal of the blk_mq_start_request() call from
ufshcd_wait_for_dev_cmd(). blk_mq_tagset_busy_iter() only iterates over
started requests.

Link: https://lore.kernel.org/r/20211203231950.193369-8-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Fix race conditions related to driver data
Bart Van Assche [Fri, 3 Dec 2021 23:19:39 +0000 (15:19 -0800)]
scsi: ufs: Fix race conditions related to driver data

The driver data pointer must be set before any callbacks are registered
that use that pointer. Hence move the initialization of that pointer from
after the ufshcd_init() call to inside ufshcd_init().

Link: https://lore.kernel.org/r/20211203231950.193369-7-bvanassche@acm.org
Fixes: 3b1d05807a9a ("[SCSI] ufs: Segregate PCI Specific Code")
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove dead code
Bart Van Assche [Fri, 3 Dec 2021 23:19:38 +0000 (15:19 -0800)]
scsi: ufs: Remove dead code

Commit 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
conflicts") guarantees that 'tag' is not in use by any SCSI command.
Remove the check that returns early if a conflict occurs.

Link: https://lore.kernel.org/r/20211203231950.193369-6-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove the sdev_rpmb member
Bart Van Assche [Fri, 3 Dec 2021 23:19:37 +0000 (15:19 -0800)]
scsi: ufs: Remove the sdev_rpmb member

Since the sdev_rpmb member of struct ufs_hba is only used inside
ufshcd_scsi_add_wlus(), convert it into a local variable.

Link: https://lore.kernel.org/r/20211203231950.193369-5-bvanassche@acm.org
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Remove is_rpmb_wlun()
Bart Van Assche [Fri, 3 Dec 2021 23:19:36 +0000 (15:19 -0800)]
scsi: ufs: Remove is_rpmb_wlun()

Commit edc0596cc04b ("scsi: ufs: core: Stop clearing UNIT ATTENTIONS")
removed all callers of is_rpmb_wlun(). Hence also remove the function
itself.

Link: https://lore.kernel.org/r/20211203231950.193369-4-bvanassche@acm.org
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: ufs: Rename a function argument
Bart Van Assche [Fri, 3 Dec 2021 23:19:35 +0000 (15:19 -0800)]
scsi: ufs: Rename a function argument

The new name makes it clear what the meaning of the function argument is.

Link: https://lore.kernel.org/r/20211203231950.193369-3-bvanassche@acm.org
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Keoseong Park <keosung.park@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Acked-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: core: Fix scsi_device_max_queue_depth()
Bart Van Assche [Fri, 3 Dec 2021 23:19:34 +0000 (15:19 -0800)]
scsi: core: Fix scsi_device_max_queue_depth()

The comment above scsi_device_max_queue_depth() and also the description of
commit ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <=
max(shost->can_queue, 1024)") contradict the implementation of the function
scsi_device_max_queue_depth(). Additionally, the maximum queue depth of a
SCSI LUN never exceeds host->can_queue. Fix scsi_device_max_queue_depth()
by changing max_t() into min_t().

Link: https://lore.kernel.org/r/20211203231950.193369-2-bvanassche@acm.org
Fixes: ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)")
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>