scsi: target: Move delayed/ordered tracking to per CPU
The atomic use from the delayed/ordered tracking is causing perf issues
when using higher perf backend devices and multiple queues. This moves
the values to a per CPU counter. Combined with the per CPU stats patch,
this improves IOPS by up to 33% for 8K IOS when using 4 or more queues
from the initiator.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20250424032741.16216-3-michael.christie@oracle.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>