www.infradead.org Git - users/jedix/linux-maple.git/log

Merge branch '5.15/scsi-fixes' into 5.16/scsi-staging

Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS
conflict reported by sfr.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update version to 2.1.12-055

Update driver version to reflect changes.

Link: https://lore.kernel.org/r/20210928235442.201875-12-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add 3252-8i PCI id

Add PCI ID information for the Adaptec SmartRAID 3252-8i controller:

9005 / 028F / 9005 / 14A2

Link: https://lore.kernel.org/r/20210928235442.201875-11-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Fix duplicate device nodes for tape changers

Stop the OS from re-discovering multiple LUNs for tape drive and medium
changer.

Duplicate device nodes for Ultrium tape drive and medium changer are being
created.

The Ultrium tape drive is a multi-LUN SCSI target. It presents a LUN for
the tape drive and a 2nd LUN for the medium changer. Our controller FW
lists both LUNs in the RPL results.

As a result, the smartpqi driver exposes both devices to the OS. Then the
OS does its normal device discovery via the SCSI REPORT LUNS command, which
causes it to re-discover both devices a 2nd time, which results in the
duplicate device nodes.

Link: https://lore.kernel.org/r/20210928235442.201875-10-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Fix boot failure during LUN rebuild

Move the delay in the register polling loop to the beginning of the loop to
ensure there is always a delay between writing the register and reading it.

Link: https://lore.kernel.org/r/20210928235442.201875-9-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add extended report physical LUNs

Add support for the new extended formats in the data returned from the
Report Physical LUNs command for controllers that enable this feature.

The new formats allow the reporting of 16-byte WWIDs.

Link: https://lore.kernel.org/r/20210928235442.201875-8-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Avoid failing I/Os for offline devices

Prevent kernel crash by failing outstanding I/O request when the OS takes
device offline.

When posted I/Os to the controller's inbound queue are not picked by the
controller, the driver will halt the controller and take the controller
offline.

When the driver takes the controller offline, the driver will fail all the
outstanding requests which can sometimes lead to an OS crash.

Link: https://lore.kernel.org/r/20210928235442.201875-7-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add TEST UNIT READY check for SANITIZE operation

Send a TEST UNIT READY to HBA disks and do not present them to the OS if
0x02/0x04/0x1b (SANITIZE IN PROGRESS) is returned.

During boot-up, some OSes appear to hang when there are one or more disks
undergoing a sanitize operation.

According to SCSI SBC4 specification section 4.11.2 "Commands allowed
during SANITIZE", some SCSI commands are permitted, but read/write
operations are not.

When the OS attempts to read the disk partition table a CHECK CONDITION ASC
0x04 ASCQ 0x1b is returned which causes the OS to retry the read until
SANITIZE has completed. This can take hours.

According to document HPE Smart Storage Administrator User Guide, during
the sanitize erase operation, the drive is unusable. I.e. the expected
behavior for SANITIZE is the that disk remains offline even after SANITIZE
has completed. The customer is expected to re-enable the disk using the
management utility.

Link: https://lore.kernel.org/r/20210928235442.201875-6-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update LUN reset handler

Enhance check for commands queued to the controller.  Add new function
pqi_nonempty_inbound_queue_count() that will wait for all I/O queued for
submission to controller across all queue groups to drain.  Add helper
functions to obtain queue command counts for each queue group.  These
queues should drain quickly as they are already staged to be submitted down
to the controller's IB queue.

Enhance check for outstanding command completion.  Update the count of
outstanding commands while waiting.  This value was not re-obtained and was
potentially causing infinite wait for all completions.

Link: https://lore.kernel.org/r/20210928235442.201875-5-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Capture controller reason codes

In some rare cases, the driver can halt the controller. Add a reason code
describing why the controller was halted. Store this reason code in a
controller register to aid in debugging the issue.

Link: https://lore.kernel.org/r/20210928235442.201875-4-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add controller handshake during kdump

Correct kdump hangs when controller is locked up.

There are occasions when a controller reboot (controller soft reset) is
issued when a controller firmware crash dump is in progress.

This leads to incomplete controller firmware crash dump:

- When the controller crash dump is in progress, and a kdump is initiated,
   the driver issues inbound doorbell reset to bring back the controller in
   SIS mode.

- If the controller is in locked up state, the inbound doorbell reset does
   not work causing controller initialization failures. This results in the
   driver hanging waiting for SIS mode.

To avoid an incomplete controller crash dump, add in a controller crash
dump handshake:

- Controller will indicate start and end of the controller crash dump by
   setting some register bits.

- Driver will look these bits when a kdump is initiated.  If a controller
   crash dump is in progress, the driver will wait for the controller crash
   dump to complete before issuing the controller soft reset then complete
   driver initialization.

Link: https://lore.kernel.org/r/20210928235442.201875-3-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update device removal management

Update device removal path to handle issues for:

- rmmod: Correct stack trace when removing devices.
- rmmod: Synchronize SCSI cache.
- Update handling for removing devices using sysfs.

Link: https://lore.kernel.org/r/20210928235442.201875-2-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Clean up mpi3mr_print_ioc_info()

This function is more complicated than necessary.

If we change from scnprintf() to snprintf() that lets us remove the if
bytes_wrote < sizeof(protocol) checks. Also, we can use bytes_wrote ? ","
: "" to print the comma and remove the separate if statement and the
"is_string_nonempty" variable.

[mkp: a few formatting cleanups and s/wrote/written/]

Link: https://lore.kernel.org/r/20210916132605.GF25094@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: iscsi: Fix iscsi_task use after free

Commit d39df158518c ("scsi: iscsi: Have abort handler get ref to conn")
added iscsi_get_conn()/iscsi_put_conn() calls during abort handling but
then also changed the handling of the case where we detect an already
completed task where we now end up doing a goto to the common put/cleanup
code. This results in a iscsi_task use after free, because the common
cleanup code will do a put on the iscsi_task.

This reverts the goto and moves the iscsi_get_conn() to after we've checked
if the iscsi_task is valid.

Link: https://lore.kernel.org/r/20211004210608.9962-1-michael.christie@oracle.com
Fixes: d39df158518c ("scsi: iscsi: Have abort handler get ref to conn")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix memory overwrite during FC-GS I/O abort handling

When an FC-GS I/O is aborted by lpfc, the driver requires a node pointer
for a dereference operation. In the abort I/O routine, the driver miscasts
a context pointer to the wrong data type and overwrites a single byte
outside of the allocated space. This miscast is done in the abort I/O
function handler because the handler works on both FC-GS and FC-LS
commands. However, the code neglected to get the correct job location for
the node.

Fix this by acquiring the necessary node pointer from the correct job
structure depending on the I/O type.

Link: https://lore.kernel.org/r/20211004231210.35524-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: elx: efct: Delete stray unlock statement

It's not holding the lock at this stage and the IRQ "flags" are not correct
so it would restore something bogus. Delete the unlock statement.

Link: https://lore.kernel.org/r/20211004103851.GE25015@kili
Fixes: 3e6414003bf9 ("scsi: elx: efct: SCSI I/O handling routines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()

pm8001_mpi_get_nvmd_resp() handles a GET_NVMD_DATA response, not a
SET_NVMD_DATA response, as the log statement implies.

Fixes: 1f889b58716a ("scsi: pm80xx: Fix pm8001_mpi_get_nvmd_resp() race condition")
Link: https://lore.kernel.org/r/20210929025847.646999-1-ipylypiv@google.com
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Replace open coded check with dev_is_expander()

This is a follow up cleanup to the commit 924a3541eab0 ("scsi: libsas:
aic94xx: hisi_sas: mvsas: pm8001: Use dev_is_expander()")

Link: https://lore.kernel.org/r/20210929025807.646589-1-ipylypiv@google.com
Reviewed-by: Vishakha Channapattan <vishakhavc@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcmu: Use struct_size() helper in kmalloc()

Make use of the struct_size() helper instead of an open-coded version, in
order to avoid any potential type mistakes or integer overflows that, in
the worst scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Link: https://lore.kernel.org/r/20210927224344.GA190701@embeddedor
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: usb: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-8-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: ibm_vscsi: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-7-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: srpt: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-6-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: sbp: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-5-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: qla2xxx: Replace enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-4-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: iscsi: Replace tpg enable attr with ops.enable

Remove tpg/enable attribute. Add fabric ops enable_tpg implementation
instead.

Link: https://lore.kernel.org/r/20210910084133.17956-3-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Add common tpg/enable attribute

Many fabric modules provide their own implementation of enable attribute in
tpg.

Provide a way to remove code duplication in the fabric modules and
automatically add "enable" attribute if a fabric module has an
implementation of fabric_enable_tpg().

Link: https://lore.kernel.org/r/20210910084133.17956-2-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Driver version update to 07.719.03.00-rc1

Link: https://lore.kernel.org/r/20210929124022.24605-4-sumit.saxena@broadcom.com
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add helper functions for irq_context

Adding helper functions for ISR access and release to improve readability.

Link: https://lore.kernel.org/r/20210929124022.24605-3-sumit.saxena@broadcom.com
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Fix concurrent access to ISR between IRQ polling and real interrupt

IRQ polling thread calls ISR after enable_irq() to handle any missed I/O
completion. The atomic flag "in_used" was added to have the synchronization
between the IRQ polling thread and the interrupt context. There is a bug
around it leading to a race condition.

Below is the sequence:

- IRQ polling thread accesses ISR, fetches the reply descriptor.

- Real interrupt arrives and pre-empts polling thread (enable_irq() is
   already called).

- Interrupt context picks the same reply descriptor as fetched by polling
   thread, processes it, and exits.

- Polling thread resumes and processes the descriptor which is already
   processed by interrupt thread leads to kernel crash.

Setting the "in_used" flag before fetching the reply descriptor ensures
synchronized access to ISR.

Link: https://www.spinics.net/lists/linux-scsi/msg159440.html
Link: https://lore.kernel.org/r/20210929124022.24605-2-sumit.saxena@broadcom.com
Fixes: 9bedd36e9146 ("scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs")
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: advansys: Fix kernel pointer leak

Pointers should be printed with %p or %px rather than cast to 'unsigned
long' and printed with %lx.

Change %lx to %p to print the hashed pointer.

Link: https://lore.kernel.org/r/20210929122538.1158235-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Make logs less verbose

Change the log level of the following message to debug:

Unsupported SCSI Opcode 0xXX, sending CHECK_CONDITION.

This message is mostly helpful during debugging sessions in order to
understand errors on the initiator side. But most of the time it's just
useless and makes reading logs much harder.

It gets particularly annoying if there are many initiators that come and go
or if an initiator runs a program that does not care whether the command is
supported and just keeps sending it.

Link: https://lore.kernel.org/r/20210929114959.705852-1-k.shelekhin@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Do not exit ufshcd_err_handler() unless operational or dead

Callers of ufshcd_err_handler() expect it to return in an operational
state. However, the code does not check the state before exiting.

Add a check for the state and perform retries until either success or the
maximum number of retries is reached.

Link: https://lore.kernel.org/r/20211002154550.128511-3-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Do not exit ufshcd_reset_and_restore() unless operational or dead

Callers of ufshcd_reset_and_restore() expect it to return in an operational
state. However, the code only checks direct errors and so the ufshcd_state
may not be UFSHCD_STATE_OPERATIONAL due to error interrupts.

Fix by also checking ufshcd_state, still allowing non-fatal errors which
are left for the error handler to deal with.

Link: https://lore.kernel.org/r/20211002154550.128511-2-adrian.hunter@intel.com
Reviewed-by: Avri altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Stop clearing UNIT ATTENTIONS

Commit aa53f580e67b ("scsi: ufs: Minor adjustments to error handling")
introduced a ufshcd_clear_ua_wluns() call in
ufshcd_err_handling_unprepare(). As explained in detail by Adrian Hunter,
this can trigger a deadlock. Avoid that deadlock by removing the code that
clears the unit attention. This is safe because the only software that
relies on clearing unit attentions is the Android Trusty software and
because support for handling unit attentions has been added in the Trusty
software.

See also https://lore.kernel.org/linux-scsi/20210930124224.114031-2-adrian.hunter@intel.com/

Note that "scsi: ufs: Retry START_STOP on UNIT_ATTENTION" is a prerequisite
for this commit.

Link: https://lore.kernel.org/r/20211001182015.1347587-3-jaegeuk@kernel.org
Fixes: aa53f580e67b ("scsi: ufs: Minor adjustments to error handling")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Retry START_STOP on UNIT_ATTENTION

Commit 57d104c153d3 ("ufs: add UFS power management support") made the UFS
driver submit a REQUEST SENSE command before submitting a power management
command to a WLUN to clear the POWER ON unit attention. Instead of
submitting a REQUEST SENSE command before submitting a power management
command, retry the power management command until it succeeds.

This is the preparation to get rid of all UNIT ATTENTION code which should
be handled by users.

Link: https://lore.kernel.org/r/20211001182015.1347587-2-jaegeuk@kernel.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Remove return statement in void function

Return statement is not useful at the end of "void" function.

Link: https://lore.kernel.org/r/20210929200640.828611-4-huobean@gmail.com
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Fix ufshcd_probe_hba() prototype to match the definition

Since commit 568dd9959611 ("scsi: ufs: Rename the second ufshcd_probe_hba()
argument"), the second ufshcd_probe_hba() argument has been changed to
init_dev_params.

Link: https://lore.kernel.org/r/20210929200640.828611-3-huobean@gmail.com
Fixes: 568dd9959611 ("scsi: ufs: Rename the second ufshcd_probe_hba() argument")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Fix NULL pointer dereference

Calling ufshcd_rpm_{get/put}_sync() prior to ufshcd_scsi_add_wlus() being
called will trigger a NULL pointer dereference. This is because
hba->sdev_ufs_device is initialized in ufshcd_scsi_add_wlus().

    Unable to handle kernel NULL pointer dereference at virtual address
    0000000000000348
    Mem abort info:
      ESR = 0x96000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    [0000000000000348] user address but active_mm is swapper
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 91 Comm: kworker/u16:1 Not tainted 5.15.0-rc1-beanhuo-linaro-1423
    Hardware name: MicronRB (DT)
    Workqueue: events_unbound async_run_entry_fn
    pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : pm_runtime_drop_link+0x128/0x338
    lr : ufshpb_get_dev_info+0x8c/0x148
    sp : ffff800012573c10
    x29: ffff800012573c10 x28: 0000000000000000 x27: 0000000000000003
    x26: ffff000001d21298 x25: 000000005abcea60 x24: ffff800011d89000
    x23: 0000000000000001 x22: ffff000001d21880 x21: ffff000001ec9300
    x20: 0000000000000004 x19: 0000000000000198 x18: ffffffffffffffff
    x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000041400
    x14: 5eee00201100200a x13: 000000000000bb03 x12: 0000000000000000
    x11: 0000000000000100 x10: 0200000000000000 x9 : bb0000021a162c01
    x8 : 0302010021021003 x7 : 0000000000000000 x6 : ffff800012573af0
    x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000200
    x2 : 0000000000000348 x1 : 0000000000000348 x0 : ffff80001095308c
    Call trace:
     pm_runtime_drop_link+0x128/0x338
     ufshpb_get_dev_info+0x8c/0x148
     ufshcd_probe_hba+0xda0/0x11b8
     ufshcd_async_scan+0x34/0x330
     async_run_entry_fn+0x38/0x180
     process_one_work+0x1f4/0x498
     worker_thread+0x48/0x480
     kthread+0x140/0x158
     ret_from_fork+0x10/0x20
    Code: 88027c01 35ffffa2 17fff6c4 f9800051 (885f7c40)
    ---[ end trace 2ba541335f595c95 ]

ufshpb_get_dev_info() is only called during asynchronous scanning and at
that time pm_runtime_get_sync() has been called:

    ...
    /* Hold auto suspend until async scan completes */
    pm_runtime_get_sync(dev);
    atomic_set(&hba->scsi_block_reqs_cnt, 0);
    ...
    ufshcd_async_scan()
        ufshcd_probe_hba(hba, true);
            ufshcd_device_params_init(hba);
                ufshpb_get_dev_info();
    ...
        pm_runtime_put_sync(hba->dev);

Remove ufshcd_rpm_{get/put}_sync() from ufshpb_get_dev_info() to fix this
problem.

Link: https://lore.kernel.org/r/20210929200640.828611-2-huobean@gmail.com
Fixes: 351b3a849ac7 ("scsi: ufs: ufshpb: Use proper power management API")
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Fix task management completion

The UFS driver uses blk_mq_tagset_busy_iter() when identifying task
management requests to complete, however blk_mq_tagset_busy_iter() doesn't
work.

blk_mq_tagset_busy_iter() only iterates requests dispatched by the block
layer. That appears as if it might have started since commit 37f4a24c2469
("blk-mq: centralise related handling into blk_mq_get_driver_tag") which
removed 'data->hctx->tags->rqs[rq->tag] = rq' from blk_mq_rq_ctx_init()
which gets called:

blk_get_request
blk_mq_alloc_request
__blk_mq_alloc_request
blk_mq_rq_ctx_init

Since UFS task management requests are not dispatched by the block layer,
hctx->tags->rqs[rq->tag] remains NULL, and since blk_mq_tagset_busy_iter()
relies on finding requests using hctx->tags->rqs[rq->tag], UFS task
management requests are never found by blk_mq_tagset_busy_iter().

By using blk_mq_tagset_busy_iter(), the UFS driver was relying on internal
details of the block layer, which was fragile and subsequently got
broken. Fix by removing the use of blk_mq_tagset_busy_iter() and having the
driver keep track of task management requests.

Link: https://lore.kernel.org/r/20210922091059.4040-1-adrian.hunter@intel.com
Fixes: 1235fc569e0b ("scsi: ufs: core: Fix task management request completion timeout")
Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Cc: stable@vger.kernel.org
Tested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: acornscsi: Remove scsi_cmd_to_tag() reference

Commit 756fb6a895af ("scsi: acornscsi: Remove tagged queuing vestiges")
mistakenly introduced a reference to function scsi_cmd_to_tag(). This
function does not exist as it was removed from an earlier series version
when I upstreamed the named commit - originally authored By Hannes - but
this reference still remained.

Fix by replacing the reference to scsi_cmd_to_tag() with
scsi_cmd_to_rq(scsi_scmd)->tag, which scsi_cmd_to_tag() was a wrapper for.

Link: https://lore.kernel.org/r/1633002717-79765-1-git-send-email-john.garry@huawei.com
Fixes: 756fb6a895af ("scsi: acornscsi: Remove tagged queuing vestiges")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix spelling in a source code comment

The typo in this source code comment makes the comment confusing. Clear up
the confusion by fixing the typo.

Link: https://lore.kernel.org/r/20210929182318.2060489-1-bvanassche@acm.org
Fixes: bc85dc500f9d ("scsi: remove scsi_end_request")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sd: Fix sd_do_mode_sense() buffer length handling

For devices that explicitly asked for MODE SENSE(10) use, make sure that
scsi_mode_sense() is called with a buffer of at least 8 bytes so that the
sense header fits.

Link: https://lore.kernel.org/r/20210820070255.682775-4-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix scsi_mode_select() buffer length handling

The MODE SELECT(6) command allows handling mode page buffers that are up to
255 bytes, including the 4 byte header needed in front of the page
buffer. For requests larger than this limit, automatically use the MODE
SELECT(10) command.

In both cases, since scsi_mode_select() adds the mode select page header,
checks on the buffer length value must include this header size to avoid
overflows of the command CDB allocation length field.

While at it, use put_unaligned_be16() for setting the header block
descriptor length and CDB allocation length when using MODE SELECT(10).

[mkp: fix MODE SENSE vs. MODE SELECT confusion]

Link: https://lore.kernel.org/r/20210820070255.682775-3-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Fix scsi_mode_sense() buffer length handling

Several problems exist with scsi_mode_sense() buffer length handling:

1) The allocation length field of the MODE SENSE(10) command is 16-bits,
    occupying bytes 7 and 8 of the CDB. With this command, access to mode
    pages larger than 255 bytes is thus possible. However, the CDB
    allocation length field is set by assigning len to byte 8 only, thus
    truncating buffer length larger than 255.

2) If scsi_mode_sense() is called with len smaller than 8 with
    sdev->use_10_for_ms set, or smaller than 4 otherwise, the buffer length
    is increased to 8 and 4 respectively, and the buffer is zero filled
    with these increased values, thus corrupting the memory following the
    buffer.

Fix these 2 problems by using put_unaligned_be16() to set the allocation
length field of MODE SENSE(10) CDB and by returning an error when len is
too small.

Furthermore, if len is larger than 255B, always try MODE SENSE(10) first,
even if the device driver did not set sdev->use_10_for_ms. In case of
invalid opcode error for MODE SENSE(10), access to mode pages larger than
255 bytes are not retried using MODE SENSE(6). To avoid buffer length
overflows for the MODE_SENSE(10) case, check that len is smaller than 65535
bytes.

While at it, also fix the folowing:

* Use get_unaligned_be16() to retrieve the mode data length and block
   descriptor length fields of the mode sense reply header instead of using
   an open coded calculation.

* Fix the kdoc dbd argument explanation: the DBD bit stands for Disable
   Block Descriptor, which is the opposite of what the dbd argument
   description was.

Link: https://lore.kernel.org/r/20210820070255.682775-2-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Delete scsi_{get,free}_host_dev()

Since commit 0653c358d2dc ("scsi: Drop gdth driver"), functions
scsi_{get,free}_host_dev() no longer have any in-tree users, so delete
them.

Link: https://lore.kernel.org/r/1631528047-30150-1-git-send-email-john.garry@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nacked-by: Hannes Reinecke <hare@suse.de>

scsi: elx: efct: Switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Link: https://lore.kernel.org/r/3899b1ed4abac581c30845d82f33ec6df8b38976.1629633207.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-qcom: Enter and exit hibern8 during clock scaling

Qualcomm controller needs to be in hibern8 before scaling clocks. This
change puts the controller in hibern8 state before scaling and brings it
out after scaling of clocks.

Link: https://lore.kernel.org/r/212b7aaf6d834c4a8c682fdac4a59b84013ed573.1632818942.git.nguyenb@codeaurora.org
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Export hibern8 entry and exit functions

Qualcomm controllers need to be in hibern8 before scaling up or down the
clocks. Hence, export the hibern8 entry and exit functions.

Link: https://lore.kernel.org/r/a29bfdd0c8f1d1a3e5fb69e43ea277c97a7f0cb6.1632818942.git.nguyenb@codeaurora.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Add support for optional PLDV handling

At adapter attachment or SLI port initialization, read the SLIPORT_STATUS
register to check for pldv_enable. If found, the driver will perform a PCIe
configuration space write when attaching to an SLI port instance that is an
LPe32000 series adapter.

Link: https://lore.kernel.org/r/20210927183518.22130-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: csiostor: Add module softdep on cxgb4

Both cxgb4 and csiostor drivers run on their own independent Physical
Function. But when cxgb4 and csiostor are both being loaded in parallel via
modprobe, there is a race when firmware upgrade is attempted by both the
drivers.

When the cxgb4 driver initiates the firmware upgrade, it halts the firmware
and the chip until upgrade is complete. When the csiostor driver is coming
up in parallel, the firmware mailbox communication fails with timeouts and
the csiostor driver probe fails.

Add a module soft dependency on cxgb4 driver to ensure loading csiostor
triggers cxgb4 to load first when available to avoid the firmware upgrade
race.

Link: https://lore.kernel.org/r/1632759248-15382-1-git-send-email-rahul.lakkireddy@chelsio.com
Fixes: a3667aaed569 ("[SCSI] csiostor: Chelsio FCoE offload driver")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: SCSI_UFS_HWMON depends on HWMON=y

When building an allmodconfig kernel, the following build error shows up:

aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_probe':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177: undefined reference to `hwmon_device_register_with_info'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177:(.text+0x510): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_register_with_info'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_remove':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195: undefined reference to `hwmon_device_unregister'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195:(.text+0x5c8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_unregister'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_notify_event':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206:(.text+0x64c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'
aarch64-linux-gnu-ld: /home/anders/src/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209:(.text+0x66c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'

Since SCSI_UFS_HWMON can't be built as a module, SCSI_UFS_HWMON has to
depend on HWMON=y.

Link: https://lore.kernel.org/r/20210927084615.1938432-1-anders.roxell@linaro.org
Fixes: e88e2d32200a ("scsi: ufs: core: Probe for temperature notification support")
Also-reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Return NULL rather than a plain 0 integer

Function lpfc_sli4_perform_vport_cvl() returns a pointer to struct
lpfc_nodelist so returning a plain 0 integer isn't good practice. Fix this
by returning a NULL instead.

Link: https://lore.kernel.org/r/20210925224113.183040-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aic7xxx: Fix a function name in comments

Use dma_alloc_coherent() instead of pci_alloc_consistent().

Link: https://lore.kernel.org/r/20210925132931.95-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix a function name in comments

Use dma_map_sg() instead of pci_map_sg() in comments.

Link: https://lore.kernel.org/r/20210925125324.1760-3-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: advansys: Prefer struct_size() over open-coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.

Use the struct_size() helper to do the arithmetic instead of the argument
"size + count * size" in the kzalloc() function.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/r/20210925114205.11377-1-len.baker@gmx.com
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix excessive messages during device logout

Disable default logging of some I/O path messages. If desired, the messages
can be turned back on by setting ql2xextended_error_logging.

Link: https://lore.kernel.org/r/20210925035154.29815-1-njavali@marvell.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"

There are a couple of spelling mistakes in pr_info and pr_err messages.
Fix them.

Link: https://lore.kernel.org/r/20210924230330.143785-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: exynos: Unify naming

Use "Samsung" and "Exynos", not the uppercase versions.

Link: https://lore.kernel.org/r/20210924132658.109814-2-krzysztof.kozlowski@canonical.com
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ses: Fix unsigned comparison with less than zero

Fix the following coccicheck warning:

./drivers/scsi/ses.c:137:10-16: WARNING: Unsigned expression compared
with zero: result > 0.

Link: https://lore.kernel.org/r/1632477113-90378-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Fix illegal offset in UPIU event trace

Fix incorrect index for UTMRD reference in ufshcd_add_tm_upiu_trace().

Link: https://lore.kernel.org/r/20210924085848.25500-1-jonathan.hsu@mediatek.com
Fixes: 4b42d557a8ad ("scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs")
Cc: stable@vger.kernel.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jonathan Hsu <jonathan.hsu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ses: Retry failed Send/Receive Diagnostic commands

Setting SCSI logging level with error=3, we saw some errors from enclosues:

[108017.360833] ses 0:0:9:0: tag#641 Done: NEEDS_RETRY Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[108017.360838] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00
[108017.427778] ses 0:0:9:0: Power-on or device reset occurred
[108017.427784] ses 0:0:9:0: tag#641 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[108017.427788] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00
[108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention [current]
[108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset function occurred
[108017.427801] ses 0:0:9:0: Failed to get diagnostic page 0x1
[108017.427804] ses 0:0:9:0: Failed to bind enclosure -19
[108017.427895] ses 0:0:10:0: Attached Enclosure device
[108017.427942] ses 0:0:10:0: Attached scsi generic sg18 type 13

Retry if the Send/Receive Diagnostic commands complete with a transient
error status (NOT_READY or UNIT_ATTENTION with ASC 0x29).

Link: https://lore.kernel.org/r/1631849061-10210-2-git-send-email-wenxiong@linux.ibm.com
Reviewed-by: Brian King <brking@linux.ibm.com>
Reviewed-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix mailbox command failure during driver initialization

Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those driver requested by the driver.

After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.

Correct by having thread setting SLI_ACTIVE spot an active mailbox command
and allow it complete before proceeding.

Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: dc395: Fix error case unwinding

dc395x_init_one()->adapter_init() might fail. In this case, the acb is
already cleaned up by adapter_init(), no need to do that in
adapter_uninit(acb) again.

[    1.252251] dc395x: adapter init failed
[    1.254900] RIP: 0010:adapter_uninit+0x94/0x170 [dc395x]
[    1.260307] Call Trace:
[    1.260442]  dc395x_init_one.cold+0x72a/0x9bb [dc395x]

Link: https://lore.kernel.org/r/20210907040702.1846409-1-ztong0001@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT"

There is a spelling mistake in a dev_err message. Fix it.

Link: https://lore.kernel.org/r/20210920183206.17477-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix gcc -Wstringop-overread warning, again

I fixed a stringop-overread warning earlier this year, now a second copy of
the original code was added and the warning came back:

drivers/scsi/lpfc/lpfc_attr.c: In function 'lpfc_cmf_info_show':
drivers/scsi/lpfc/lpfc_attr.c:289:25: error: 'strnlen' specified bound 4095 exceeds source size 24 [-Werror=stringop-overread]
289 | strnlen(LPFC_INFO_MORE_STR, PAGE_SIZE - 1),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix it the same way as the other copy.

Link: https://lore.kernel.org/r/20210920095628.1191676-1-arnd@kernel.org
Fixes: ada48ba70f6b ("scsi: lpfc: Fix gcc -Wstringop-overread warning")
Fixes: 74a7baa2a3ee ("scsi: lpfc: Add cmf_info sysfs entry")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Use correct scnprintf() limit

The limit should be "PAGE_SIZE - len" instead of "PAGE_SIZE". We're not
going to hit the limit so this fix will not affect runtime.

Link: https://lore.kernel.org/r/20210916132331.GE25094@kili
Fixes: 5b9e70b22cc5 ("scsi: lpfc: raise sg count for nvme to use available sg resources")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()

This scnprintf() uses the wrong limit. It should be
"LPFC_FPIN_WWPN_LINE_SZ - len" instead of LPFC_FPIN_WWPN_LINE_SZ.

Link: https://lore.kernel.org/r/20210916132251.GD25094@kili
Fixes: 428569e66fa7 ("scsi: lpfc: Expand FPIN and RDF receive logging")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Remove 'current_tag'

The 'current_tag' field in struct scsi_device is unused now; remove it.

Link: https://lore.kernel.org/r/1631696835-136198-4-git-send-email-john.garry@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: acornscsi: Remove tagged queuing vestiges

The acornscsi driver has a config option to enable tagged queuing, but this
option gets disabled in the driver itself with the comment 'needs to be
debugged'. As this is a _really_ old driver I doubt anyone will be wanting
to invest time here, so remove the tagged queue vestiges and make our lives
easier.

[jpg: Use scsi_cmd_to_rq()]

Link: https://lore.kernel.org/r/1631696835-136198-3-git-send-email-john.garry@huawei.com
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: fas216: Kill scmd->tag

The driver is attempting to allocate a tag internally which is a no-go with
blk-mq. Switch the driver to use the request tag and kill usage of
scmd->tag and scmd->device->current_tag.

[jpg: Change to use scsi_cmd_to_rq()]

Link: https://lore.kernel.org/r/1631696835-136198-2-git-send-email-john.garry@huawei.com
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Add temperature notification exception handling

The device may notify the host of an extreme temperature by using the
exception event mechanism. The exception can be raised when the device’s
Tcase temperature is either too high or too low.

It is essentially up to the platform to decide what further actions need to
be taken. leave a placeholder for a designated vop for that.

Link: https://lore.kernel.org/r/20210915060407.40-3-avri.altman@wdc.com
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Probe for temperature notification support

Probe the dExtendedUFSFeaturesSupport register for the device's temperature
notification support and, if supported, add a hardware monitor device.

Link: https://lore.kernel.org/r/20210915060407.40-2-avri.altman@wdc.com
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Decrease area under spinlock

Under the session level spinlock node->active_ios_lock in
efct_scsi_io_alloc() we are taking another spinlock for the port. This
leads to contention between sessions and even between I/Os in the same
session.

Reduce the locked region to active_ios list for which active_ios_lock is
intended. Spinlock CPU usage decreases from 18% down to 13%. IOPS are
increased from 220 kIOPS to 264 kIOPS for one LUN.

Link: https://lore.kernel.org/r/20210914105539.6942-4-d.bogdanov@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Fix nport free

nport_free for an empty nport hangs the state machine waiting for mbox
completion if nport is not yet attached thinking that it is attaching right
now. Add a check for nport attaching state and complete nport free.

Link: https://lore.kernel.org/r/20210914105539.6942-3-d.bogdanov@yadro.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: efct: Add state in nport sm trace printout

Similar to other state machine traces and to make debug easier, add the
state name to nport sm trace printout.

Link: https://lore.kernel.org/r/20210914105539.6942-2-d.bogdanov@yadro.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Restore initiator in dual mode

In dual mode in case of disabling the target, the whole port goes offline
and initiator is turned off too.

Fix restoring initiator mode after disabling target in dual mode.

Link: https://lore.kernel.org/r/20210915153239.8035-1-d.bogdanov@yadro.com
Fixes: 0645cb8350cd ("scsi: qla2xxx: Add mode control for each physical port")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Unbreak the reset handler

A command tag is passed as the second argument of the
__ufshcd_transfer_req_compl() call in ufshcd_eh_device_reset_handler()
instead of a bitmask. Fix this by passing a bitmask as argument instead of
a command tag.

Link: https://lore.kernel.org/r/20210916175408.2260084-1-bvanassche@acm.org
Fixes: a45f937110fa ("scsi: ufs: Optimize host lock on transfer requests send/compl paths")
Cc: Can Guo <cang@codeaurora.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Remove include <scsi/scsi_host.h> from scsi_cmnd.h

There are no dependencies in <scsi/scsi_cmnd.h> on the <scsi/scsi_host.h>
header file. Hence remove the scsi_host.h include directive from
scsi_cmnd.h. This include directive was introduced in February 2021 by
commit af1830956dc3 ("scsi: core: Add mq_poll support to SCSI layer").

Link: https://lore.kernel.org/r/20210917212751.2676054-1-bvanassche@acm.org
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sd_zbc: Support disks with more than 2**32 logical blocks

This patch addresses the following Coverity report about the zno *
sdkp->zone_blocks expression:

CID 1475514 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression zno *
sdkp->zone_blocks with type unsigned int (32 bits, unsigned) is evaluated
using 32-bit arithmetic, and then used in a context that expects an
expression of type sector_t (64 bits, unsigned).

Link: https://lore.kernel.org/r/20210917212314.2362324-1-bvanassche@acm.org
Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling"

This reverts commit a113eaaf86373362b053279049907ff82b5df6c8.

There are a couple of issues with the commit:

1. It causes deadlocks.

2. It causes the shost->eh_cmd_q list of failed requests not to be
    processed, ever.

So revert it.

1. Deadlocks

The SCSI error handler runs with requests blocked beginning when
scsi_schedule_eh() sets SHOST_RECOVERY state, continuing through
scsi_error_handler() callback ->eh_strategy_handler() until
scsi_restart_operations() is called.  By setting eh_strategy_handler to
ufshcd_err_handler, the patch changed the UFS error handler to run with
requests blocked, including PM requests, for the entire run of the error
handler.

That conflicts with UFS error handler existing synchronization with UFS
device PM operations.  The UFS error handler synchronizes with runtime PM
by doing pm_runtime_get_sync() prior to blocking requests itself.  It
synchronizes with system PM by use of hba->host_sem, again before blocking
requests itself.  However, if requests are already blocked, then PM
operations will block.  So:

   the UFS error handler blocks waiting on PM
+ PM blocks waiting on SCSI PM requests to process or fail
+ PM requests are blocked waiting on error handling to finish
=  deadlock

This happens both for runtime PM and system PM.

Prior to the patch, these deadlocks could not happen even if SCSI error
handling was running, because the presence of requests in shost->eh_cmd_q
would mean the queues could not be suspended, which would mean that, should
the UFS error handler run at the same time, it would not need to wait for
PM or vice versa.

Please note these scenarios are not just theoretical, they were found
during testing on a Samsung Galaxy Book S.

2. ->eh_strategy_handler() must process shost->eh_cmd_q list of failed
requests, as all other eh_strategy_handler's do except UFS error handler.
Refer for example: scsi_unjam_host(), ata_scsi_error() and
sas_scsi_recover_host().

Link: https://lore.kernel.org/r/20210917144349.14058-1-adrian.hunter@intel.com
Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: Remove unused function arguments

The se_cmd is unused in these functions, just remove it.

Link: https://lore.kernel.org/r/20210913083045.3670648-1-fengli@smartx.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Li Feng <fengli@smartx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-mediatek: Change dbg select by check IP version

Mediatek UFS dbg select setting is changed in new IP version. Check the IP
version before setting dbg select.

Link: https://lore.kernel.org/r/1630918387-8333-1-git-send-email-peter.wang@mediatek.com
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufshpb: Use proper power management API

In ufshpb, pm_runtime_{get,put}_sync() are used to avoid unwanted runtime
suspend during query requests. Whereas commit b294ff3e3449 ("scsi: ufs:
core: Enable power management for wlun") modified the driver core to use
ufshcd_rpm_{get,put}_sync() APIs.

Switch to these APIs in HPB module as well.

Link: https://lore.kernel.org/r/20210902003534epcms2p1937a0f0eeb48a441cb69f5ef13ff8430@epcms2p1
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-qcom: Remove unneeded variable 'err'

'err' is never set in this functon. Remove the declaration and just return
0.

Link: https://lore.kernel.org/r/20210907044111.29632-1-cw9316.lee@samsung.com
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: documentation: Document Fibre Channel sysfs node for appid

Update documentation for sysfs node within /sys/class/fc/fc_udev_device/.

Link: https://lore.kernel.org/r/20210913015853.2086512-1-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: elx: libefc: Prefer kcalloc() over open coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.

Use the purpose specific kcalloc() function instead of the argument count *
size in the kzalloc() function.

[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/r/20210905062448.6587-1-len.baker@gmx.com
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Update lpfc version to 14.0.0.2

Update lpfc version to 14.0.0.2.

Link: https://lore.kernel.org/r/20210910233159.115896-15-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Improve PBDE checks during SGL processing

The PBDE feature, setting payload buffer address explicitly in the WQE so
it doesn't have to be fetched from the SGL, only makes sense when there is
a single buffer for the I/O. When there are multiple buffers it actually
hurts performance as the SGL subsequently has to be fetched.

Rework the SGL logic to only use PBDE when a single buffer.

Link: https://lore.kernel.org/r/20210910233159.115896-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Zero CGN stats only during initial driver load and stat reset

Currently congestion management framework results are cleared whenever the
framework settings changed (such as it being turned off then back on). This
unfortunately means prior stats, rolled up to higher time windows lose
meaning.

Change such that stats are not cleared. Thus they pause and resume with
prior values still being considered.

Link: https://lore.kernel.org/r/20210910233159.115896-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix I/O block after enabling managed congestion mode

If the congestion management framework dynamically enables, it may do so
while I/O is in flight. The updates of cmf info due to inflight I/O
completing may happen before values have been initialized.

Fix by ensure cmf_max_bytes_per_interval is initialized when checking
bandwidth utilization for SCSI layer blocking.

Link: https://lore.kernel.org/r/20210910233159.115896-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Adjust bytes received vales during cmf timer interval

The newly added congestion mgmt framework is seeing unexpected congestion
FPINs and signals. In analysis, time values given to the adapter are not
at hard time intervals. Thus the drift vs the transfer count seen is
affecting how the framework manages things.

Adjust counters to cover the drift.

Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix EEH support for NVMe I/O

Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.

There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.

Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.

The following modifications are made:

- A new flag is set in PCI error entrypoints so the driver can track being
   called by that path.

- An interlock is added in the SLI hw error path and the PCI error path
   such that only one of the paths proceeds with the teardown logic.

- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
   cmds in cases of hw error.

- If entering the SLI port re-init calls, a case where SLI error teardown
   was quick and beat the PCI calls now reporting error, check whether the
   SLI port is still live on the PCI bus.

- In the PCI reset code to bring the adapter back, recheck the IRQ
   settings. Different checks for SLI3 vs SLI4.

- In I/O completions, that may be called as part of the cleanup or
   underway just before the hw error, check the state of the adapter.  If
   in error, shortcut handling that would expect further adapter
   completions as the hw error won't be sending them.

- In routines waiting on I/O completions, which may have been in progress
   prior to the hw error, detect the device is being torn down and abort
   from their waits and just give up. This points to a larger issue in the
   driver on ref-counting for data structures, as it doesn't have
   ref-counting on q and port structures. We'll do this fix for now as it
   would be a major rework to be done differently.

- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
   failed back due to hw error.

- In I/O buf allocation, done at the start of new I/Os, check hw state and
   fail if hw error.

Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix FCP I/O flush functionality for TMF routines

A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting
of outstanding aborted I/Os and ABORT IOCBs.  Thus,
lpfc_reset_flush_io_context() called from any TMF routine does not properly
wait to flush all outstanding FCP IOCBs leading to a block layer crash on
an invalid scsi_cmnd->request pointer.

  kernel BUG at ../block/blk-core.c:1489!
  RIP: 0010:blk_requeue_request+0xaf/0xc0
  ...
  Call Trace:
  <IRQ>
  __scsi_queue_insert+0x90/0xe0 [scsi_mod]
  blk_done_softirq+0x7e/0x90
  __do_softirq+0xd2/0x280
  irq_exit+0xd5/0xe0
  do_IRQ+0x4c/0xd0
  common_interrupt+0x87/0x87
  </IRQ>

Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ,
LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a
new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to
build an ABORT iocb.

Restore lpfc_reset_flush_io_context() functionality by including counting
of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb().

Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com
Fixes: e1364711359f ("scsi: lpfc: Fix illegal memory access on Abort IOCBs")
Cc: <stable@vger.kernel.org> # v5.12+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix NVMe I/O failover to non-optimized path

Currently, we hold off unregistering with NVMe transport layer until GID_FT
or ADISC completes upon receipt of RSCN. In the ADISC discovery routine,
for nodes not found in the GID_FT response, the nodes are unregistered from
the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue
to be outstanding and were not failed back to the OS. If an NVMe device,
this mean there wasn't initial termination of the I/Os so they could be
issued on a different NVMe path.

Fix by unregistering the RPI so that I/O is cancelled.

Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com
Fixes: 0614568361b0 ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Don't remove ndlp on PRLI errors in P2P mode

In pt-2-pt mode, the initiator does not log into the target after a PRLI
error. In pt-2-pt mode, the target responded to the PRLI by sending a
LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI
to fail. The PRLI completion path caused the discovery node to be dropped
to avoid being stick in an UNUSED (not logged in) state. As the node was
dropped there is no retry of the login and as it is pt-2-pt, there is no
RSCN to retrigger discovery. Thus the other end is not seen by the OS.

Fix by ensuring the discovery node is not dropped if connecting pt-2-pt.
This will cause PLOGI to be retried.

Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix rediscovery of tape device after LIP

On link up and node discovery, a remote port is registered with the SCSI
transport and the driver sets fc4_xpt_flags to track transport
registration.

A link down event causes the driver to deregister with the SCSI transport,
starting the devloss timer, and calls a local unreg routine to clear the
login state. Part of the login state is the fc4_xpt_flags. However, with
tape devices that support sequence level error recovery, which wants to
preserve the login, the local unreg routine is skipped, thus the flags
aren't cleared.

A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node()
routine is called. As the fc4_xpt_flags is not clear, it's believed the
node is already registered with the transport. Unfortunately, the
registration was already terminated. Eventually the devloss tmo timer
expires and tears down the device.

Fix by ensuring the tape device, known by the ADISC flag, is always
unregistered if the link drops.

Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix hang on unload due to stuck fport node

A test scenario encountered an unload hang while an FLOGI ELS was in flight
when a link down condition occurred.  The driver fails unload as it never
releases the fport node.

For most nodes, when the link drops, devloss tmo is started and the timeout
will cause the final node release. For the Fport, as it has not yet
registered with the SCSI transport, there is no devloss timer to be
started, so there is no final release.  Additionally, the link down
sequence causes ABORTS to be issued for pending ELS's. The completions from
the ABORTS perform the release of node references.  However, as the adapter
is being reset to be unloaded, those completions will never occur.

Fix by the following:

- In the ELS cleanup, recognize when unloading and place the ELS's on a
   different list that immediately cleans up/completes the ELS's.  It's
   recognized that this condition primarily affects only the fport, with
   other ports having normal clean up logic that handles things.

- Resolve the devloss issue by, when cleaning up nodes on after link down,
   recognizing when the fabric node does not have a completed state (its
   state is UNUSED) and removing a reference so the node can delete after
   the ELS reference is released.

Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix premature rpi release for unsolicited TPLS and LS_RJT

A test scenario has a target issuing a TPLS after accepting the driver's
PRLI.  TPLS is not supported by the driver so it rejects the ELS.  However,
the reject was only happening on the primary N_Port.  If the TPLS was to a
NPIV vport, not only would it reject the ELS, but it would act on the TPLS,
starting devloss, then unregister from the SCSI transport and release the
node. When devloss expired, it would access the node again and cause a page
faul.

Fix by altering the NPIV code to recognize that a correctly registered node
can reject unsolicited ELS I/O and to not unregister with the SCSI
transport and tear the node down.  Add a check of the fc4_xpt_flags so that
only a zero value allows the unreg and teardown.

Link: https://lore.kernel.org/r/20210910233159.115896-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding

In a rarely executed path, FLOGI failure, there is a refcounting error. If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.

In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.

Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.

Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee146d4 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@vger.kernel.org> # v5.13+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()

When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.

Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.

Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.

Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>