Bean Huo [Thu, 1 Dec 2022 14:04:34 +0000 (15:04 +0100)]
scsi: ufs: core: Split ufshcd_map_sg()
Take out the "map scatter-gather list to prdt" part of the code in
ufshcd_map_sg() and split it into a new function ufshcd_sgl_to_prdt().
Signed-off-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bean Huo [Thu, 1 Dec 2022 14:04:33 +0000 (15:04 +0100)]
scsi: ufs: bsg: Clean up ufs_bsg_request()
Move sg_copy_from_buffer() below its associated case statement.
Signed-off-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bean Huo [Thu, 1 Dec 2022 14:04:32 +0000 (15:04 +0100)]
scsi: ufs: bsg: Remove unnecessary length checkup
Remove checks on job->request_len and job->reply_len because msgcode checks
in a subseqent commit will rule out malicious requests.
Signed-off-by: Bean Huo <beanhuo@micron.com> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Fri, 18 Nov 2022 23:37:03 +0000 (15:37 -0800)]
scsi: ufs: core: Fix the polling implementation
Fix the following issues in ufshcd_poll():
- If polling succeeds, return a positive value.
- Do not complete polling requests from interrupt context because the
block layer expects these requests to be completed from thread
context. From block/bio.c:
If REQ_ALLOC_CACHE is set, the final put of the bio MUST be done from
process context, not hard/soft IRQ.
Fixes: eaab9b573054 ("scsi: ufs: Implement polling support") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221118233717.441298-1-bvanassche@acm.org Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jie Zhan [Fri, 18 Nov 2022 08:37:13 +0000 (16:37 +0800)]
scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset
SATA devices on an expander may be removed and not be found again when I_T
nexus reset and revalidation are processed simultaneously.
The issue comes from:
- Revalidation can remove SATA devices in link reset, e.g. in
hisi_sas_clear_nexus_ha().
- However, hisi_sas_debug_I_T_nexus_reset() polls the state of a SATA
device on an expander after sending link_reset, where it calls:
hisi_sas_debug_I_T_nexus_reset
sas_ata_wait_after_reset
ata_wait_after_reset
ata_wait_ready
smp_ata_check_ready
sas_ex_phy_discover
sas_ex_phy_discover_helper
sas_set_ex_phy
The ex_phy's change count is updated in sas_set_ex_phy(), so SATA
devices after a link reset may not be found later through revalidation.
A similar issue was reported in:
commit 0f3fce5cc77e ("[SCSI] libsas: fix ata_eh clobbering ex_phys via
smp_ata_check_ready")
commit 87c8331fcf72 ("[SCSI] libsas: prevent domain rediscovery competing
with ata error handling").
To address this issue, in hisi_sas_debug_I_T_nexus_reset(), we now call
smp_ata_check_ready_type() that only polls the device type while not
updating the ex_phy's data of libsas.
This is now unnecessary to solve the SATA devices missing issue in
hisi_sas_clear_nexus_ha(). Hence, we should not ignore bcast events during
sas_eh_handle_sas_errors() in case of missing bcast events, unless a
justified need is found and a mechanism to defer (but not ignore) bcast
events in sas_eh_handle_sas_errors() is provided.
Also, in hisi_sas_clear_nexus_ha(), there is nothing further to handle in
"out: " other than return, so that part can be reverted.
Draining or flushing events in hisi_sas_rescan_topology() can hang the
driver, typically with phy up or phy down events being processed,
i.e. sas_porte_bytes_dmaed() or sas_phye_loss_of_signal().
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com> Link: https://lore.kernel.org/r/20221118045242.2770-1-cw9316.lee@samsung.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ChanWoo Lee [Fri, 18 Nov 2022 04:41:36 +0000 (13:41 +0900)]
scsi: ufs: ufs-mediatek: Remove unneeded code
Remove unnecessary if/goto code.
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com> Link: https://lore.kernel.org/r/20221118044136.921-1-cw9316.lee@samsung.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Thu, 17 Nov 2022 18:36:26 +0000 (10:36 -0800)]
scsi: device_handler: alua: Call scsi_device_put() from non-atomic context
Since commit f93ed747e2c7 ("scsi: core: Release SCSI devices
synchronously"), scsi_device_put() might sleep. Avoid calling it from
alua_rtpg_queue() with the pg_lock held. The lock only pretects h->pg,
anyway. To avoid the pg being freed under us, because of a race with
another thread, take a temporary reference. In alua_rtpg_queue(), verify
that the pg still belongs to the sdev being passed before actually queueing
the RTPG.
This patch fixes the following smatch warning:
drivers/scsi/device_handler/scsi_dh_alua.c:1013 alua_rtpg_queue() warn: sleeping in atomic context
Cc: Martin Wilck <mwilck@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Sachin Sant <sachinp@linux.ibm.com> Cc: Benjamin Block <bblock@linux.ibm.com> Suggested-by: Martin Wilck <mwilck@suse.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221117183626.2656196-3-bvanassche@acm.org Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Thu, 17 Nov 2022 18:36:25 +0000 (10:36 -0800)]
scsi: device_handler: alua: Revert "Move a scsi_device_put() call out of alua_check_vpd()"
There is a bug in commit 0b25e17e9018 ("scsi: alua: Move a
scsi_device_put() call out of alua_check_vpd()"): that patch may cause
alua_rtpg_queue() callers to call scsi_device_put() even if that function
should not be called. Revert that commit to prepare for a different
solution.
Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Wilck <mwilck@suse.com> Cc: Sachin Sant <sachinp@linux.ibm.com> Cc: Benjamin Block <bblock@linux.ibm.com> Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reported-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221117183626.2656196-2-bvanassche@acm.org Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gaosheng Cui [Thu, 17 Nov 2022 03:51:00 +0000 (11:51 +0800)]
scsi: snic: Fix possible UAF in snic_tgt_create()
Smatch reports a warning as follows:
drivers/scsi/snic/snic_disc.c:307 snic_tgt_create() warn:
'&tgt->list' not removed from list
If device_add() fails in snic_tgt_create(), tgt will be freed, but
tgt->list will not be removed from snic->disc.tgt_list, then list traversal
may cause UAF.
Remove from snic->disc.tgt_list before free().
Fixes: c8806b6c9e82 ("snic: driver for Cisco SCSI HBA") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20221117035100.2944812-1-cuigaosheng1@huawei.com Acked-by: Narsimhulu Musini <nmusini@cisco.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gleb Chesnokov [Tue, 15 Nov 2022 09:38:08 +0000 (12:38 +0300)]
scsi: qla2xxx: Initialize vha->unknown_atio_[list, work] for NPIV hosts
Initialization of vha->unknown_atio_list and vha->unknown_atio_work only
happens for base_vha in qlt_probe_one_stage1(). But there is no
initialization for NPIV hosts that are created in qla24xx_vport_create().
This causes a crash when trying to access these NPIV host fields.
Fix this by adding initialization to qla_vport_create().
Gleb Chesnokov [Tue, 15 Nov 2022 09:38:05 +0000 (12:38 +0300)]
scsi: qla2xxx: Remove duplicate of vha->iocb_work initialization
Commit 9b3e0f4d4147 ("scsi: qla2xxx: Move work element processing out of
DPC thread") introduced the initialization of vha->iocb_work in
qla2x00_create_host() function.
This initialization is also called from qla2x00_probe_one() function, just
after qla2x00_create_host().
Hence remove this duplicate call since it has already been called before.
Chen Zhongjin [Tue, 15 Nov 2022 09:24:42 +0000 (17:24 +0800)]
scsi: fcoe: Fix transport not deattached when fcoe_if_init() fails
fcoe_init() calls fcoe_transport_attach(&fcoe_sw_transport), but when
fcoe_if_init() fails, &fcoe_sw_transport is not detached and leaves freed
&fcoe_sw_transport on fcoe_transports list. This causes panic when
reinserting module.
BUG: unable to handle page fault for address: fffffbfff82e2213
RIP: 0010:fcoe_transport_attach+0xe1/0x230 [libfcoe]
Call Trace:
<TASK>
do_one_initcall+0xd0/0x4e0
load_module+0x5eee/0x7210
...
Fixes: 78a582463c1e ("[SCSI] fcoe: convert fcoe.ko to become an fcoe transport provider driver") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Link: https://lore.kernel.org/r/20221115092442.133088-1-chenzhongjin@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shin'ichiro Kawasaki [Tue, 15 Nov 2022 00:29:05 +0000 (09:29 +0900)]
scsi: sd: Use 16-byte SYNCHRONIZE CACHE on ZBC devices
ZBC Zoned Block Commands specification mandates SYNCHRONIZE CACHE(16) for
host-managed zoned block devices, but does not mandate SYNCHRONIZE
CACHE(10). Call SYNCHRONIZE CACHE(16) in place of SYNCHRONIZE CACHE(10) to
ensure that the command is always supported. For this purpose, add
use_16_for_sync flag to struct scsi_device in same manner as use_16_for_rw
flag.
To be precise, ZBC does not mandate SYNCHRONIZE CACHE(16) for host-aware
zoned block devices. However, modern devices should support 16-byte
commands. Hence, call SYNCHRONIZE CACHE (16) on both types of ZBC devices,
host-aware and host-managed. Of note is that READ(16) and WRITE(16) have
same story and they are already called for both types of ZBC devices.
Another note is that this patch depends on the fix commit ea045fd344cb
("ata: libata-scsi: fix SYNCHRONIZE CACHE (16) command failure").
Shang XiaoJing [Sun, 13 Nov 2022 06:45:13 +0000 (14:45 +0800)]
scsi: ipr: Fix WARNING in ipr_init()
ipr_init() will not call unregister_reboot_notifier() when
pci_register_driver() fails, which causes a WARNING. Call
unregister_reboot_notifier() when pci_register_driver() fails.
Yang Yingliang [Sat, 12 Nov 2022 13:10:10 +0000 (21:10 +0800)]
scsi: scsi_debug: Fix possible name leak in sdebug_add_host_helper()
Afer commit 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id
string array"), the name of device is allocated dynamically, it needs be
freed when device_register() returns error.
As comment of device_register() says, one should use put_device() to give
up the reference in the error path. Fix this by calling put_device(), then
the name can be freed in kobject_cleanup(), and sdbg_host is freed in
sdebug_release_adapter().
When the device release is not set, it means the device is not initialized.
We can not call put_device() in this case. Use kfree() to free memory.
Yang Yingliang [Sat, 12 Nov 2022 09:43:10 +0000 (17:43 +0800)]
scsi: fcoe: Fix possible name leak when device_register() fails
If device_register() returns an error, the name allocated by dev_set_name()
needs to be freed. As the comment of device_register() says, one should use
put_device() to give up the reference in the error path. Fix this by
calling put_device(), then the name can be freed in kobject_cleanup().
The 'fcf' is freed in fcoe_fcf_device_release(), so the kfree() in the
error path can be removed.
The 'ctlr' is freed in fcoe_ctlr_device_release(), so don't use the error
label, just return NULL after calling put_device().
Harshit Mogalapalli [Sat, 12 Nov 2022 07:06:12 +0000 (23:06 -0800)]
scsi: scsi_debug: Fix a warning in resp_report_zones()
As 'alloc_len' is user controlled data, if user tries to allocate memory
larger than(>=) MAX_ORDER, then kcalloc() will fail, it creates a stack
trace and messes up dmesg with a warning.
Add __GFP_NOWARN in order to avoid too large allocation warning. This is
detected by static analysis using smatch.
Harshit Mogalapalli [Sat, 12 Nov 2022 07:00:31 +0000 (23:00 -0800)]
scsi: scsi_debug: Fix a warning in resp_verify()
As 'vnum' is controlled by user, so if user tries to allocate memory larger
than(>=) MAX_ORDER, then kcalloc() will fail, it creates a stack trace and
messes up dmesg with a warning.
Add __GFP_NOWARN in order to avoid too large allocation warning. This is
detected by static analysis using smatch.
Chen Zhongjin [Fri, 11 Nov 2022 07:40:46 +0000 (15:40 +0800)]
scsi: efct: Fix possible memleak in efct_device_init()
In efct_device_init(), when efct_scsi_reg_fc_transport() fails,
efct_scsi_tgt_driver_exit() is not called to release memory for
efct_scsi_tgt_driver_init() and causes memleak:
Yang Yingliang [Fri, 11 Nov 2022 04:30:12 +0000 (12:30 +0800)]
scsi: hpsa: Fix possible memory leak in hpsa_add_sas_device()
If hpsa_sas_port_add_rphy() returns an error, the 'rphy' allocated in
sas_end_device_alloc() needs to be freed. Address this by calling
sas_rphy_free() in the error path.
If hpsa_sas_port_add_phy() returns an error, hpsa_free_sas_phy() can not be
called to free the memory because the port and the phy have not been added
yet.
Replace hpsa_free_sas_phy() with sas_phy_free() and kfree() to avoid kernel
crash in this case.
Yang Yingliang [Wed, 9 Nov 2022 03:24:03 +0000 (11:24 +0800)]
scsi: mpt3sas: Fix possible resource leaks in mpt3sas_transport_port_add()
In mpt3sas_transport_port_add(), if sas_rphy_add() returns error,
sas_rphy_free() needs be called to free the resource allocated in
sas_end_device_alloc(). Otherwise a kernel crash will happen:
Because transport_add_device() is not called when sas_rphy_add() fails, the
device is not added. When sas_rphy_remove() is subsequently called to
remove the device in the remove() path, a NULL pointer dereference happens.
Yuan Can [Tue, 22 Nov 2022 01:57:51 +0000 (01:57 +0000)]
scsi: hpsa: Fix possible memory leak in hpsa_init_one()
The hpda_alloc_ctlr_info() allocates h and its field reply_map. However, in
hpsa_init_one(), if alloc_percpu() failed, the hpsa_init_one() jumps to
clean1 directly, which frees h and leaks the h->reply_map.
Fix by calling hpda_free_ctlr_info() to release h->replay_map and h instead
free h directly.
Fixes: 8b834bff1b73 ("scsi: hpsa: fix selection of reply queue") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221122015751.87284-1-yuancan@huawei.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wenchao Hao [Wed, 23 Nov 2022 12:21:37 +0000 (20:21 +0800)]
scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed
If scsi_dispatch_cmd() failed, the SCSI command was not sent to the target.
scsi_queue_rq() would return BLK_STS_RESOURCE if scsi_dispatch_cmd()
failed, and the related request would be requeued. The timeout of this
request would not fire, so noone would increase iodone_cnt.
Wenchao Hao [Wed, 23 Nov 2022 12:21:36 +0000 (20:21 +0800)]
scsi: core: Increase scsi_device's iodone_cnt in scsi_timeout()
If a SCSI command times out and is going to be aborted, we should increase
the iodone_cnt of the related scsi_device. Otherwise the iodone_cnt would
be smaller than iorequest_cnt.
Increasing iodone_cnt in scsi_timeout() would not cause a double accounting
issue. Brief analysis follows:
- We add the iodone_cnt when BLK_EH_DONE is returned in
scsi_timeout(). The related command's timeout event would not happen.
- If the abort succeeds and the command is not retried, the command would
be completed with scsi_finish_command() which would not increase
iodone_cnt.
- If the abort succeeds and the command is retried, it would be requeue. A
scsi_dispatch_cmd() would be called and iorequest_cnt would be increased
again.
- If the abort fails, the error handler successfully recovers the device,
and the command is not retried, the command would be completed with
scsi_finish_command() which would not increase iodone_cnt.
- If the abort fails, the error handler successfully recovers the device,
and the command is retried, the iorequest_cnt would be increased again.
Wenchao Hao [Tue, 22 Nov 2022 18:11:05 +0000 (18:11 +0000)]
scsi: iscsi: Rename iscsi_set_param() to iscsi_if_set_param()
There are two iscsi_set_param() functions defined in libiscsi.c and
scsi_transport_iscsi.c respectively which is confusing.
Rename the one in scsi_transport_iscsi.c to iscsi_if_set_param().
Signed-off-by: Wenchao Hao <haowenchao@huawei.com> Link: https://lore.kernel.org/r/20221122181105.4123935-1-haowenchao@huawei.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Maurizio Lombardi [Mon, 21 Nov 2022 09:27:03 +0000 (10:27 +0100)]
scsi: target: core: Fix hard lockup when executing a compare-and-write command
While handling an I/O completion for the compare portion of a
COMPARE_AND_WRITE command, it may happen that the
compare_and_write_callback function submits new bio structs while still in
softirq context.
Low level drivers like md raid5 do not expect their make_request call to be
used in softirq context, they call into schedule() and create a deadlocked
system.
This problem appears to be an issue between target_cmd_interrupted() and
compare_and_write_callback(). target_cmd_interrupted() calls the se_cmd's
transport_complete_callback function pointer if the se_cmd is being stopped
or aborted, and CMD_T_ABORTED was set on the se_cmd.
When calling compare_and_write_callback(), the success parameter was set to
false. target_cmd_interrupted() seems to expect this means the callback
will do cleanup that does not require a process context. But
compare_and_write_callback() ignores the parameter if there was I/O done
for the compare part of COMPARE_AND_WRITE.
Since there was data, the function continued on, passed the compare, and
issued a write while ignoring the value of the success parameter. The
submit of a bio for the write portion of the COMPARE_AND_WRITE then causes
schedule to be unsafely called from the softirq context.
Fix the bug in compare_and_write_callback by jumping to the out label if
success == "false", after checking if we have been called by
transport_generic_request_failure(); The command is being aborted or
stopped so there is no need to submit the write bio for the write part of
the COMPARE_AND_WRITE command.
Maurizio Lombardi [Tue, 15 Nov 2022 12:56:38 +0000 (13:56 +0100)]
scsi: target: iscsi: Fix a race condition between login_work and the login thread
In case a malicious initiator sends some random data immediately after a
login PDU; the iscsi_target_sk_data_ready() callback will schedule the
login_work and, at the same time, the negotiation may end without clearing
the LOGIN_FLAGS_INITIAL_PDU flag (because no additional PDU exchanges are
required to complete the login).
The login has been completed but the login_work function will find the
LOGIN_FLAGS_INITIAL_PDU flag set and will never stop from rescheduling
itself; at this point, if the initiator drops the connection, the
iscsit_conn structure will be freed, login_work will dereference a released
socket structure and the kernel crashes.
Anastasia Kovaleva [Mon, 14 Nov 2022 10:25:00 +0000 (13:25 +0300)]
scsi: target: core: Change the way target_xcopy_do_work() sets restiction on max I/O
To determine how many blocks sends in one command, the minimum value is
selected from the hw_max_sectors of both devices. In target_xcopy_do_work,
hw_max_sectors are used as blocks, not sectors; it also ignores the fact
that sectors can be of different sizes, for example 512 and 4096
bytes. Because of this, a number of blocks can be transmitted that the
device will not be able to accept.
Change the selection of max transmission size into bytes.
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Link: https://lore.kernel.org/r/20221114102500.88892-4-a.kovaleva@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Anastasia Kovaleva [Mon, 14 Nov 2022 10:24:59 +0000 (13:24 +0300)]
scsi: target: core: Make hw_max_sectors store the sectors amount in blocks
By default, hw_max_sectors stores its value in 512 blocks in iblock,
despite the fact that the block size can be 4096 bytes. Change
hw_max_sectors to store the number of sectors in hw_block_size blocks.
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Link: https://lore.kernel.org/r/20221114102500.88892-3-a.kovaleva@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Anastasia Kovaleva [Mon, 14 Nov 2022 10:24:58 +0000 (13:24 +0300)]
scsi: target: core: Send max transfer length in blocks
A MAXIMUM TRANSFER LENGTH value indicates the maximum transfer length in
logical blocks that the device server accepts for a single command. Fix
function sending the length in sectors instead of blocks.
This patch also removes the special casing for fileio in block_size_store
since this logic in now unified in spc_emulate_evpd_b0() for all backends.
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Link: https://lore.kernel.org/r/20221114102500.88892-2-a.kovaleva@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Thomas Gleixner [Sun, 13 Nov 2022 20:33:59 +0000 (21:33 +0100)]
scsi: lpfc: Remove linux/msi.h include
Nothing in this file needs anything from linux/msi.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221113202428.436270297@linutronix.de Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Wed, 16 Nov 2022 01:19:19 +0000 (17:19 -0800)]
scsi: lpfc: Fix crash involving race between FLOGI timeout and devloss handler
When a FLOGI completes with a sequence timeout error, a freed kref ptr
dereference crash can occur due to a timing race involving ndlp referencing
in lpfc_dev_loss_tmo_callbk.
Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss
is active. Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the
FLOGI is retried to allow the driver to handle the reference counts
correctly in lpfc_dev_loss_tmo_handler.
Reported-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Tested-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Wed, 16 Nov 2022 01:19:18 +0000 (17:19 -0800)]
scsi: lpfc: Fix MI capability display in cmf_info sysfs attribute
The dynamic mi_ver value holds the currently configured MI setting. mi_ver
was being displayed as part of the cmf_info sysfs attribute, when the
output string meant to display MI capabilities instead.
Add a mi_cap member in the lpfc_pc_sli4_params structure that will store MI
capabilities during initialization so that cmf_info prints out capabilities
instead of current configuration.
Justin Tee [Wed, 16 Nov 2022 01:19:16 +0000 (17:19 -0800)]
scsi: lpfc: Fix WQ|CQ|EQ resource check
Adapter configurations with limited EQ resources may fail to initialize.
Firmware resources are queried in lpfc_sli4_read_config(). The driver
parameters cfg_irq_chann and cfg_hdw_queue are adjusted from defaults if
constrained by firmware resources.
The minimum resource check includes a special allocation for queues such as
ELS, MBOX, NVME LS. However the additional reservation was also incorrectly
applied to EQ resources.
Reordered WQ|CQ|EQ resource checks to apply the special allocation
adjustment to WQ and CQ resources only.
Gustavo A. R. Silva [Tue, 15 Nov 2022 20:25:16 +0000 (14:25 -0600)]
scsi: bfa: Replace one-element array with flexible-array member
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element array with flexible-array
member in struct fdmi_attr_s.
Important to mention is that doing a build before/after this patch results
in no binary output differences.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help us make progress towards globally enabling
-fstrict-flex-arrays=3 [1].
If you try to allocate a memory larger than(>=) MAX_ORDER, then kmalloc()
will definitely fail. It creates a stack trace and messes up dmesg. The
user controls the size here so if they specify a too large size it will
fail.
Add __GFP_NOWARN in order to avoid too large allocation warning. This is
detected by static analysis using smatch.
Fixes: 481b5e5c7949 ("scsi: scsi_debug: add resp_write_scat function") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20221111100526.1790533-1-harshit.m.mogalapalli@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 8 Nov 2022 19:22:14 +0000 (13:22 -0600)]
scsi: smartpqi: Change version to 2.1.20-035
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Gerry Morong <gerry.morong@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793533417.322537.3074216622272955440.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 8 Nov 2022 19:22:09 +0000 (13:22 -0600)]
scsi: smartpqi: Initialize feature section info
Initialize features to 0 before processing.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike Mcgowan <mike.mcgowan@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793532902.322537.2436075977808555348.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gilbert Wu [Tue, 8 Nov 2022 19:22:03 +0000 (13:22 -0600)]
scsi: smartpqi: Add controller cache flush during rmmod
Add in a call to flush the controller cache during driver removal.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike Mcgowan <mike.mcgowan@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793532388.322537.878022136408270892.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kumar Meiyappan [Tue, 8 Nov 2022 19:21:58 +0000 (13:21 -0600)]
scsi: smartpqi: Correct device removal for multi-actuator devices
Correct device count for multi-actuator drives which can cause kernel
panics.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike Mcgowan <mike.mcgowan@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Kumar Meiyappan <Kumar.Meiyappan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793531872.322537.9003385780343419275.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Tue, 8 Nov 2022 19:21:53 +0000 (13:21 -0600)]
scsi: smartpqi: Change sysfs raid_level attribute to N/A for controllers
Change the sysfs raid_level attribute from "RAID-0" to N/A.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowan <mike.mcgowan@microchip.com> Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793531357.322537.8639138137605612362.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Tue, 8 Nov 2022 19:21:48 +0000 (13:21 -0600)]
scsi: smartpqi: Correct max LUN number
Correct maximum LUN number for multi-actuator devices.
When multi-actuator support was added to smartpqi, the maximum number of
LUNs supported for multi-actuator devices was supposed to be changed from
unlimited to 256, but the setting was inadvertently left at unlimited.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793530842.322537.816949081443241857.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793530327.322537.6056884426657539311.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@microchip.com> Reviewed-by: Mike McGowen <Mike.McGowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/166793529811.322537.3294617845448383948.stgit@brunhilda Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/qla2xxx/qla_init.c: In function ‘qla24xx_async_abort_cmd’:
drivers/scsi/qla2xxx/qla_init.c:171:17: warning: variable ‘bail’ set but not used [-Wunused-but-set-variable]
171 | uint8_t bail;
| ^~~~
drivers/scsi/qla2xxx/qla_init.c: In function ‘qla2x00_async_tm_cmd’:
drivers/scsi/qla2xxx/qla_init.c:2023:17: warning: variable ‘bail’ set but not used [-Wunused-but-set-variable]
2023 | uint8_t bail;
| ^~~~
Cc: Arun Easi <arun.easi@qlogic.com> Cc: Giridhar Malavali <giridhar.malavali@qlogic.com> Fixes: feafb7b1714c ("[SCSI] qla2xxx: Fix vport delete issues") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221031224818.2607882-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221031224728.2607760-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221031224728.2607760-2-bvanassche@acm.org Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nathan Chancellor [Wed, 2 Nov 2022 16:19:06 +0000 (09:19 -0700)]
scsi: elx: libefc: Fix second parameter type in state callbacks
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function pointer
prototype to make sure the call target is valid to help mitigate ROP
attacks. If they are not identical, there is a failure at run time, which
manifests as either a kernel panic or thread getting killed. A proposed
warning in clang aims to catch these at compile time, which reveals:
The type of the second parameter in the prototypes of ->current_state() and
->nodedb_state() ('u32') does not match the implementations, which have a
second parameter type of 'enum efc_sm_event'. Update the prototypes to have
the correct second parameter type, clearing up all the warnings and CFI
failures.
Bart Van Assche [Mon, 31 Oct 2022 18:34:21 +0000 (11:34 -0700)]
scsi: ufs: core: Introduce ufshcd_abort_all()
Move the code for aborting all SCSI commands and TMFs into a new function.
This patch makes the ufshcd_err_handler() easier to read. Except for adding
more logging, this patch does not change any functionality.
John Garry [Wed, 26 Oct 2022 10:56:04 +0000 (18:56 +0800)]
scsi: pm8001: Drop !task check in pm8001_abort_task()
In commit 0b639decf651 ("scsi: pm8001: Modify task abort handling for SATA
task"), code was introduced to dereference "task" pointer in
pm8001_abort_task(). However there was a pre-existing later check for
"!task", which spooked the kernel test robot.
Function pm8001_abort_task() should never be passed NULL for "task"
pointer, so remove that check. Also remove the "unlikely" hint, as this is
not fastpath code.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1666781764-123090-1-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bean Huo [Tue, 25 Oct 2022 22:24:30 +0000 (00:24 +0200)]
scsi: ufs: core: Use is_visible to control UFS unit descriptor sysfs nodes
UFS Boot and Device W-LUs do not have unit descriptors and RPMB does not
support WB. Use is_visible() to control which nodes are visible and which
are not.
Signed-off-by: Bean Huo <beanhuo@micron.com> Link: https://lore.kernel.org/r/20221025222430.277768-4-beanhuo@iokpp.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Arthur Simchaev <arthur.simchaev@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bean Huo [Tue, 25 Oct 2022 22:24:29 +0000 (00:24 +0200)]
scsi: ufs: core: Clean up ufshcd_slave_alloc()
Combine ufshcd_get_lu_power_on_wp_status() and ufshcd_set_queue_depth()
into one single ufshcd_lu_init(), so that we only need to read the LUN
descriptor once.
Signed-off-by: Bean Huo <beanhuo@micron.com> Link: https://lore.kernel.org/r/20221025222430.277768-3-beanhuo@iokpp.de Reviewed-by: Arthur Simchaev <arthur.simchaev@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bean Huo [Tue, 25 Oct 2022 22:24:28 +0000 (00:24 +0200)]
scsi: ufs: core: Revert "WB is only available on LUN #0 to #7"
Ccommit d3d9c4570285 ("scsi: ufs: Fix memory corruption by
ufshcd_read_desc_param()") has properly fixed stack overflow issue.
As a result, commit a2fca52ee640 ("scsi: ufs: WB is only available on LUN
#0 to #7") is no longer required. Revert it.
Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Link: https://lore.kernel.org/r/20221025222430.277768-2-beanhuo@iokpp.de Reviewed-by: Arthur Simchaev <arthur.simchaev@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Peter Wang [Mon, 24 Oct 2022 12:06:02 +0000 (20:06 +0800)]
scsi: ufs: core: Print events for WLUN suspend and resume failures
WLUN suspend and resume events are currently not handled by
ufshcd_print_evt_hist(). Add the missing events.
Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20221024120602.30019-1-peter.wang@mediatek.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Asutosh Das <quic_asutoshd@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: core: Dynamically set DPO and FUA in usage_bits
libiscsi tests check the support of DPO & FUA bits in usage bits of RSOC
response. This patch adds support for dynamic usage bits for each opcode.
Set support of DPO & FUA bits in usage_bits of RSOC response depending on
support DPOFUA in the backstore device.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-7-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: core: Check emulate_3pc for RECEIVE COPY
RECEIVE COPY RESULTS is an opcode from 3rd party copy command set and shall
be rejected if emulate_3pc attribute is off like EXTENDED COPY.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-6-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allow support for RSOC to be turned off via the emulate_rsoc attibute.
This is just for testing purposes.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-5-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: core: Dynamic opcode support in RSOC
Report supported opcodes depending on a dynamic device configuration.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-4-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fill the strucures for supported opcodes and usage bits that are reported
in REPORT SUPPORTED OPERATION CODES command response.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-3-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support for REPORT SUPPORTED OPERATION CODES command according to SPC4.
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20220906103421.22348-2-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 18 Oct 2022 20:29:58 +0000 (13:29 -0700)]
scsi: ufs: Fix a deadlock between PM and the SCSI error handler
The following deadlock has been observed on multiple test setups:
* ufshcd_wl_suspend() is waiting for blk_execute_rq(START STOP UNIT) to
complete while ufshcd_wl_suspend() holds host_sem.
* The SCSI error handler is activated, changes the host state to
SHOST_RECOVERY, ufshcd_eh_host_reset_handler() and ufshcd_err_handler()
are called and the latter function tries to obtain host_sem.
This is a deadlock because blk_execute_rq() can't execute SCSI commands
while the host is in the SHOST_RECOVERY state and because the error handler
cannot make progress because host_sem is held by another thread.
Fix this deadlock as follows:
* Fail attempts to suspend the system while the SCSI error handler is in
progress by setting the SCMD_FAIL_IF_RECOVERING flag for START STOP UNIT
commands.
* If the system is suspending and a START STOP UNIT command times out,
handle the SCSI command timeout from inside the context of the SCSI
timeout handler instead of activating the SCSI error handler.
The runtime power management code is not affected by this deadlock since
hba->host_sem is not touched by the runtime power management functions in
the UFS driver.
Bart Van Assche [Tue, 18 Oct 2022 20:29:57 +0000 (13:29 -0700)]
scsi: ufs: Introduce the function ufshcd_execute_start_stop()
Open-code scsi_execute() because a later patch will modify scmd->flags and
because scsi_execute() does not support setting scmd->flags. No
functionality is changed.
Bart Van Assche [Tue, 18 Oct 2022 20:29:56 +0000 (13:29 -0700)]
scsi: ufs: Track system suspend / resume activity
Add a new boolean variable that tracks whether the system is suspending,
suspended or resuming. This information will be used in a later commit to
fix a deadlock between the SCSI error handler and the suspend code.
Bart Van Assche [Tue, 18 Oct 2022 20:29:55 +0000 (13:29 -0700)]
scsi: ufs: Try harder to change the power mode
Instead of only retrying the START STOP UNIT command if a unit attention is
reported, repeat it if any SCSI error is reported by the device or if the
command timed out.
Bart Van Assche [Tue, 18 Oct 2022 20:29:54 +0000 (13:29 -0700)]
scsi: ufs: Reduce the START STOP UNIT timeout
Reduce the START STOP UNIT command timeout to one second since on Android
devices a kernel panic is triggered if an attempt to suspend the system
takes more than 20 seconds. One second should be enough for the START STOP
UNIT command since this command completes in less than a millisecond for
the UFS devices I have access to.
Bart Van Assche [Tue, 18 Oct 2022 20:29:53 +0000 (13:29 -0700)]
scsi: ufs: Use 'else' in ufshcd_set_dev_pwr_mode()
Convert if (ret) { ... } if (!ret) { ... } into
if (ret) { ... } else { ... }.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-6-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 18 Oct 2022 20:29:52 +0000 (13:29 -0700)]
scsi: ufs: Remove an outdated comment
Although the host lock had to be held by ufshcd_clk_scaling_start_busy()
callers when that function was introduced, that is no longer the case
today. Hence remove the comment that claims that callers of this function
must hold the host lock.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-5-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 18 Oct 2022 20:29:51 +0000 (13:29 -0700)]
scsi: core: Support failing requests while recovering
The current behavior for SCSI commands submitted while error recovery is
ongoing is to retry command submission after error recovery has finished.
See also the scsi_host_in_recovery() check in scsi_host_queue_ready(). Add
support for failing SCSI commands while host recovery is in progress. This
functionality will be used to fix a deadlock in the UFS driver.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-4-bvanassche@acm.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 18 Oct 2022 20:29:50 +0000 (13:29 -0700)]
scsi: core: Change the return type of .eh_timed_out()
Commit 6600593cbd93 ("block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE")
made it impossible for .eh_timed_out() implementations to call
scsi_done() without causing a crash.
Restore support for SCSI timeout handlers to call scsi_done() as follows:
* Change all .eh_timed_out() handlers as follows:
- Change the return type into enum scsi_timeout_action.
- Change BLK_EH_RESET_TIMER into SCSI_EH_RESET_TIMER.
- Change BLK_EH_DONE into SCSI_EH_NOT_HANDLED.
* In scsi_timeout(), convert the SCSI_EH_* values into BLK_EH_* values.
Reviewed-by: Lee Duncan <lduncan@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-3-bvanassche@acm.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 18 Oct 2022 20:29:49 +0000 (13:29 -0700)]
scsi: core: Fix a race between scsi_done() and scsi_timeout()
If there is a race between scsi_done() and scsi_timeout() and if
scsi_timeout() loses the race, scsi_timeout() should not reset the request
timer. Hence change the return value for this case from BLK_EH_RESET_TIMER
into BLK_EH_DONE.
Although the block layer holds a reference on a request (req->ref) while
calling a timeout handler, restarting the timer (blk_add_timer()) while a
request is being completed is racy.
Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 15f73f5b3e59 ("blk-mq: move failure injection out of blk_mq_complete_request") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>