From e6286fae32c9e90e0975ba59c28b9893ef8ade14 Mon Sep 17 00:00:00 2001
From: David Jeffery
Date: Thu, 11 Oct 2012 05:39:24 +0530
Subject: [PATCH] qla2xxx: Test and clear FCPORT_UPDATE_NEEDED atomically.

When the qla2xxx driver loses access to multiple remote ports, a race
condition can occur which keeps a request stuck on a scsi request queue
indefinitely.

While testing path loss, a customer encountered a case where the loss of
the FC connection could result in a process which sent a command through
the sg driver becoming hung and unresponsive. It had sent a command to
one of the now unreachable LUNs, and the command and its request never
completed.

But this isn't a case of the target and LUNs being blocked. From a
vmcore, the FC transport still considered the port as in a working,
FC_PORTSTATE_ONLINE state. Instead, the request was on the queue, but
the target structure was in blocked state. qla2xxx's queuecommand
function was returning SCSI_MLQUEUE_TARGET_BUSY, causing the request to
keep being requeued and retried without ever advancing.

The qla2xxx driver was rejecting the command with
SCSI_MLQUEUE_TARGET_BUSY because its internal port state for this port
was FCS_DEVICE_LOST. This should not happen. qla2xxx should not think
the port is lost while the fc transport thinks the device is in good,
working order.

This bad state occurred due to a race between how the
FCPORT_UPDATE_NEEDED bit is set in qla2x00_schedule_rport_del() and how
it is cleared in qla2x00_do_dpc(). The problem port has its drport
pointer set, but it has never been processed by the driver to inform the
fc transport that the port has been lost. qla2x00_schedule_rport_del()
sets drport, and then sets the FCPORT_UPDATE_NEEDED bit. In
qla2x00_do_dpc(), the port lists are walked, any drport pointer is
handled and the fc transport is informed of the port loss, and then the
FCPORT_UPDATE_NEEDED bit is cleared.
This leaves a race where the dpc thread is processing one port removal,
another port removal is marked with a call to
qla2x00_schedule_rport_del(), and the dpc thread clears the bit for both
removals, even though only the first removal was actually handled. Until
another event occurs to set FCPORT_UPDATE_NEEDED, the later port removal
is never finished and qla2xxx stays in a bad state which causes requests
to become stuck on request queues.

The attached patch updates the driver to test and clear
FCPORT_UPDATE_NEEDED atomically. This ensures the port state changes are
processed and not lost. If a race occurs, the dpc thread will walk the
ports an extra time as FCPORT_UPDATE_NEEDED will have become set again.

JIRA Key: V2632FC-283
Acked-by: Giridhar Malavali
Acked-by: Armen Baloyan
Signed-off-by: Chad Dupuis
Signed-off-by: Saurav Kashyap
Signed-off-by: Jerry Snitselaar
---
 drivers/scsi/qla2xxx/qla_os.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index e14c579300df..9c0ba5969dac 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4453,9 +4453,9 @@ qla2x00_do_dpc(void *data)
 			    "ISP abort end.\n");
 		}
 
-		if (test_bit(FCPORT_UPDATE_NEEDED, &base_vha->dpc_flags)) {
+		if (test_and_clear_bit(FCPORT_UPDATE_NEEDED,
+		    &base_vha->dpc_flags)) {
 			qla2x00_update_fcports(base_vha);
-			clear_bit(FCPORT_UPDATE_NEEDED, &base_vha->dpc_flags);
 		}
 
 		if (test_bit(ISP_QUIESCE_NEEDED, &base_vha->dpc_flags)) {
-- 
2.50.1