www.infradead.org Git - users/hch/blktests.git/log

block/022: skip test when only 1 cpu available

Below is the failure log on a single-CPU node.
block/022 (Test hang caused by freeze/unfreeze sequence)     [failed]
    runtime    ...  30.138s
    --- tests/block/022.out 2020-10-09 12:43:48.000000000 +0000
    +++ /usr/local/blktests/results/nodev/block/022.out.bad 2020-10-09 13:45:18.594417401 +0000
    @@ -1,2 +1,3 @@
     Running block/022
    +taskset: failed to set pid 13212's affinity: Invalid argument
     Test complete
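
A minimal sketch of such a check, assuming the CPU count is read from
nproc. The helper name have_multiple_cpus is hypothetical and is not
taken from blktests:

```shell
# Hypothetical helper, not part of blktests: succeeds only when more than
# one CPU is available, so that taskset has a second CPU to pin to.
# The CPU count may be passed as $1 for testing; by default it comes
# from nproc.
have_multiple_cpus() {
    local ncpus="${1:-$(nproc)}"
    [ "$ncpus" -gt 1 ]
}
```

A test could call it from its requires step and skip itself when the
helper fails.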

Signed-off-by: Xiao Liang <xiliang@redhat.com>
[Shin'ichiro: fixed commit message typo and resolved merge conflict]
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

nvmeof-mp/001: Set expected count properly

The number of block devices will increase according
to the number of RDMA-capable NICs.
For example, nvmeof-mp/001 with two RDMA-capable NICs
got the following error:
-------------------------------------
    Configured NVMe target driver
    -count_devices(): 1 <> 1
    +count_devices(): 2 <> 1
    Passed
-------------------------------------

Set expected count properly by calculating the number
of RDMA-capable NICs.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

srp/011: Avoid $dev becoming invalid during test

$dev becomes invalid if log_out has completed before fio starts
running. In this case the subsequent fio run throws the following
error:
-------------------------------------
    From diff -u 011.out 011.out.bad
    Configured SRP target driver
    -Passed

    From 011.full:
    fio: looks like your file system does not support direct=1/buffered=0
    fio: destination does not support O_DIRECT
    run_fio exit code: 1
-------------------------------------
This issue happens randomly.

Try to fix the issue by holding $dev before test.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>

CONTRIBUTING, README: transfer maintainer role

To offload blktests maintenance overhead from Omar, I volunteer to take
the blktests maintainer role. Replace Omar's name and e-mail address in
CONTRIBUTING.md with mine. Also note his original authorship in
README.md.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

common/multipath-over-rdma: Remove unused debug operation

The loop ("for m in ;") will never be entered and it seems
unnecessary to debug several modules during the test. So remove the
debug operation.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>

tests/nvme: add tests for error logging

Test nvme error logging by injecting errors. Kernel must have FAULT_INJECTION
and FAULT_INJECTION_DEBUG_FS configured to use error injector. Tests can be
run with or without NVME_VERBOSE_ERRORS configured.

Test for commit bd83fe6f2cd2 ("nvme: add verbose error logging").

Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>

tests/nvme: add helper routine to use error injector

nvme tests can use these helper routines to setup and use
the nvme error injector.

Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>

Documentation: Fix typo nvme-trtype -> nvme_trtype

Fixes: 3be78490def5 ("Documentation: add document for nvme-rdma nvmeof-mp srp tests")
Reviewed-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>

tests/nvme/rc: Fix possible endless loop

A failed connection leads to the endless loop marked below, so return
directly in that case.

_nvmet_passthru_target_connect() {
        local trtype=$1
        local subsys_name=$2

        _nvme_connect_subsys "${trtype}" "${subsys_name}" || return
        nsdev=$(_find_nvme_passthru_loop_dev "${subsys_name}")

        # The following tests can race with the creation
        # of the device so ensure the block device exists
        # before continuing
        while [ ! -b "${nsdev}" ]; do sleep 1; done   # <<< endless loop

        echo "${nsdev}"
}
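
As a complementary idea (not what this patch does), a wait loop of this
kind can also be bounded by a timeout. A sketch, where the helper name
wait_for_block_dev and the 10 second default are assumptions:

```shell
# Hypothetical helper: wait until $1 is a block device, giving up after
# $2 seconds (default 10). Returns 0 when the device shows up and 1 on
# timeout, so the caller can bail out instead of spinning forever.
wait_for_block_dev() {
    local dev="$1" timeout="${2:-10}" waited=0
    while [ ! -b "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```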

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

scsi/003: remove unnecessary out file

The test case scsi/003 was removed with the commit 5e803ca0ae99 ("Remove
partition rereading tests for reverted fixes"), but its out file was
left. Remove it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

scsi/006: skip cache types which disable read cache for SATA drives

The test case scsi/006 sets four cache types to test target SCSI
devices. Two cache types out of the four, "none" and "write back, no
read (daft)", disable read cache. However, these two types do not work
for SATA drives since the SAT specification requires that Disable Read
Cache always be set to zero in the caching mode page. Setting them
results in an invalid argument error and a test case failure.

To avoid the failure, skip the cache types which disable read cache if
the test devices are SATA drives. To check the device, add a helper
function _test_dev_is_sata in scsi/rc.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

scsi/006: whitelist for zoned mode

Define CAN_BE_ZONED=1 in scsi/006. This test case can be executed
without problem against zoned SCSI devices specified in TEST_DEVS, such
as SMR HDDs.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

block/027, scsi/004: whitelist scsi_debug test cases for zoned mode

Define CAN_BE_ZONED=1 in block/027 and scsi/004. These test cases can be
executed in zoned mode without problem against scsi_debug devices in
zoned mode.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

common/scsi_debug: prepare scsi_debug in zoned mode

To allow running tests using scsi_debug device with the zoned mode
disabled (current setup) as well as enabled, modify the _init_scsi_debug
helper function. When RUN_FOR_ZONED is set, specify zbc=host-managed
parameter to scsi_debug module so that the scsi_debug devices are
prepared in zoned mode.
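
A sketch of the parameter handling described above, assuming that
RUN_FOR_ZONED is either unset, 0 or 1. The function name
scsi_debug_module_args is hypothetical, and the real _init_scsi_debug
may be structured differently:

```shell
# Hypothetical sketch: print the scsi_debug module parameters to use.
# The caller's parameters are passed through unchanged, and
# zbc=host-managed is appended when RUN_FOR_ZONED is set and non-zero.
scsi_debug_module_args() {
    local args="$*"
    if [ -n "${RUN_FOR_ZONED:-}" ] && [ "$RUN_FOR_ZONED" != 0 ]; then
        args="${args:+$args }zbc=host-managed"
    fi
    printf '%s\n' "$args"
}
```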

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/008: check no stale page cache after BLKRESETZONE ioctl

Run two processes which repeat data read and BLKRESETZONE ioctl, and
check that the race does not leave stale page cache. This catches the
bug fixed by commit e5113505904e ("block: Discard page cache of zone
reset target range").

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

nvme tests should use nvme_trtype when setting up passthru target

No matter what was passed in with nvme_trtype, the target was being
set up with trtype as "loop". This caused several passthru tests
to fail when testing tcp or rdma.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>

block/008: check CPU offline failure due to many IRQs

When systems have more IRQs than a single CPU can handle, the test case
block/008 fails with a kernel message such as:

"CPU 31 has 111 vectors, 90 available. Cannot disable CPU"

The cause of the failure is that the test case offlined too many CPUs
and the remaining online CPU cannot hold all of the required IRQ
vectors. To avoid this failure, check the error message of CPU offline.
If the CPU offline failure is caused by an IRQ vector resource shortage,
do not handle it as a failure. Also keep the actual number of CPUs which
can be offlined without the failure and use this number for the test.
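
The message check could be sketched as follows. The function name
is_irq_vector_shortage is hypothetical, and the pattern is derived only
from the example kernel message quoted above:

```shell
# Hypothetical helper: succeed when the given kernel message reports
# that a CPU could not be offlined because of an IRQ vector shortage.
is_irq_vector_shortage() {
    printf '%s\n' "$1" |
        grep -Eq 'CPU [0-9]+ has [0-9]+ vectors, [0-9]+ available\. Cannot disable CPU'
}
```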

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

tests/srp: fix module loading issue during srp tests

The ib_isert/ib_srpt modules are automatically loaded after the first
rdma_rxe/siw setup, which causes the srp tests to fail.

$ modprobe rdma_rxe
$ echo eno1 >/sys/module/rdma_rxe/parameters/add
$ lsmod | grep -E "ib_srpt|iscsi_target_mod|ib_isert"
ib_srpt               167936  0
ib_isert              139264  0
iscsi_target_mod      843776  1 ib_isert
target_core_mod      1069056  3 iscsi_target_mod,ib_srpt,ib_isert
rdma_cm               315392  5 rpcrdma,ib_srpt,ib_iser,ib_isert,rdma_ucm
ib_cm                 344064  2 rdma_cm,ib_srpt
ib_core              1101824  10 rdma_cm,rdma_rxe,rpcrdma,ib_srpt,iw_cm,ib_iser,ib_isert,rdma_ucm,ib_uverbs,ib_cm

$ ./check srp/001
srp/001 (Create and remove LUNs)                             [failed]
    runtime    ...  3.675s
    --- tests/srp/001.out 2021-10-13 01:18:50.846740093 -0400
    +++ /root/blktests/results/nodev/srp/001.out.bad 2021-10-14 03:24:18.593852208 -0400
    @@ -1,3 +1 @@
    -Configured SRP target driver
    -count_luns(): 3 <> 3
    -Passed
    +insmod: ERROR: could not insert module /lib/modules/5.15.0-rc5.fix+/kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko: File exists
modprobe: FATAL: Module iscsi_target_mod is in use.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

tests/nvme: misc fix and coding style update

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

tests/scsi/007: Add a test that triggers the SCSI error handler

Since none of the existing tests guarantee that the SCSI error handler
will be triggered, add a test that guarantees this.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests/srp/012: Fix the displayed test status

The _have_* scripts set the SKIP_REASON variable. That variable controls
whether the status [passed] or [not run] is displayed. This patch causes
the status [passed] to be displayed instead of [not run] if legacy dm
support is not available.

Fixes: 0a2cbbd1874d ("srp and nvmeof-mp: Check whether legacy dm is supported")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Fix multiple shellcheck warnings

The latest version of shellcheck reports the following warnings:

check:307:9: warning: Quote arguments to unset so they're not glob expanded. [SC2184]
check:505:10: warning: Quote arguments to unset so they're not glob expanded. [SC2184]
check:506:10: warning: Quote arguments to unset so they're not glob expanded. [SC2184]
check:539:11: warning: Quote arguments to unset so they're not glob expanded. [SC2184]
new:102:19: note: Expansions inside ${..} need to be quoted separately, otherwise they match as patterns. [SC2295]
common/rc:272:37: warning: -ne treats this as an arithmetic expression. Use != to compare as string (or expand explicitly with $((expr))). [SC2309]
tests/block/008:65:10: warning: Quote arguments to unset so they're not glob expanded. [SC2184]
tests/block/008:71:10: warning: Quote arguments to unset so they're not glob expanded. [SC2184]

This patch fixes the above warnings.
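
As an illustration of one of these warnings, SC2295 asks for the inner
expansion of a prefix removal to be quoted so that it is matched
literally. The function strip_prefix is a made-up example and is not
the code at new:102:

```shell
# Made-up example for SC2295: without the inner quotes, glob characters
# in $2 would be treated as a pattern instead of literal text.
strip_prefix() {
    printf '%s\n' "${1#"$2"}"
}
```

With the quotes, strip_prefix 'a*b/c' 'a*' removes the literal "a*" and
leaves "b/c". Without them, the pattern a* would strip only the leading
"a".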

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests/block: add the missing _have_fio check for block/029 block/031

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

Documentation: add document for nvme-rdma nvmeof-mp srp tests

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

nvmeof-mp/001: fix failure when CONFIG_NVME_HWMON enabled

Skip checking ng0n1/hwmon5 in count_devices.

$ use_siw=1  ./check nvmeof-mp/001
nvmeof-mp/001 (Log in and log out)                           [failed]
    runtime  3.695s  ...  4.002s
    --- tests/nvmeof-mp/001.out 2021-09-12 05:35:17.866892187 -0400
    +++ /root/blktests/results/nodev/nvmeof-mp/001.out.bad 2021-09-12 06:49:25.621880616 -0400
    @@ -1,3 +1,3 @@
     Configured NVMe target driver
    -count_devices(): 1 <> 1
    +count_devices(): 3 <> 1
     Passed
$ ls -l /sys/class/nvme-fabrics/ctl/*/*/device
lrwxrwxrwx. 1 root root 0 Sep 12 06:49 /sys/class/nvme-fabrics/ctl/nvme0/hwmon5/device -> ../../nvme0
lrwxrwxrwx. 1 root root 0 Sep 12 06:49 /sys/class/nvme-fabrics/ctl/nvme0/ng0n1/device -> ../../nvme0
lrwxrwxrwx. 1 root root 0 Sep 12 06:49 /sys/class/nvme-fabrics/ctl/nvme0/nvme0n1/device -> ../../nvme0
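
One way to count only namespace block devices is to filter on the
nvmeXnY name pattern. This is a sketch under that assumption, with a
hypothetical function name, and the actual change to count_devices may
differ:

```shell
# Hypothetical filter: read entry names from stdin, one per line, and
# print how many of them look like NVMe namespace block devices
# (nvme0n1). Entries such as hwmon5 and ng0n1 are not counted.
count_nvme_namespaces() {
    grep -Ec '^nvme[0-9]+n[0-9]+$'
}
```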

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

block/001: don't exit test with pending async scan

We have to run scan and delete together, otherwise a pending async
scan may prevent scsi_debug from being unloaded and cause the failure
'modprobe: FATAL: Module scsi_debug is in use.'

Fix the issue by always running both scan and delete together.

Fixes: f3bcd8c ("block/001: wait until device is added")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Create test name from most recently used test number

The 'new' script can inadvertently use a test name that was removed, e.g.
nvme/001, which may create confusion if identically-named tests exist
among different versions. Instead, generate a test name at the numerical
tail end of the test group.
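
A sketch of the idea, assuming tests are files named with three digits
inside the group directory. The function next_test_name is hypothetical
and is not the code of the 'new' script:

```shell
# Hypothetical sketch: print the test number that follows the highest
# existing three-digit test in the group directory $1, so gaps left by
# removed tests are never reused.
next_test_name() {
    local group_dir="$1" last=0 f n
    for f in "$group_dir"/[0-9][0-9][0-9]; do
        [ -e "$f" ] || continue
        # Strip leading zeros so the number is not read as octal.
        n=$(printf '%s\n' "${f##*/}" | sed 's/^0*//')
        [ -n "$n" ] || n=0
        if [ "$n" -gt "$last" ]; then
            last="$n"
        fi
    done
    printf '%03d\n' "$((last + 1))"
}
```

With tests 002 and 009 present and 001 removed, this prints 010 rather
than reusing 001.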

Signed-off-by: Jon Derrick <jonathan.derrick@linux.dev>

block/001: wait until device is added

Writing to the scan attribute of a scsi host usually triggers one sync
scan, but devices found by this sync scan may be added with a delay if
there is a concurrent async scan.

So wait until the device is added in block/001 to avoid failing the
test.

Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[Omar: fix quoting and simplify logic]
Signed-off-by: Omar Sandoval <osandov@fb.com>

zbd/007: Reset test target zones at test end

The test case zbd/007 checks write pointer mapping between a logical
device and its container device. To do so, it moves write pointers of
the container device by writing data to the container device. When the
logical device is a dm-crypt device, this test case works as expected,
but the data written to the container device is not encrypted, then it
leaves broken data on the logical, dm-crypt device. This results in I/O
errors in the following operations to the dm-crypt device.

To avoid the I/O errors, reset the test target zones of the logical
device at the test case end to wipe out the broken data.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/rc: Support dm-crypt

Linux kernel 5.9 added zoned block device support to dm-crypt. To test
dm-crypt devices, modify the function _get_dev_container_and_sector().
To handle device-mapper table format difference between dm-crypt and
dm-linear/flakey, add dev_idx and off_idx local variables.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

tests/nvme/031: add the missing steps for loop_dev clean up

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

tests/block/031: Add a test for sharing a tag set across hardware queues

Support for sharing a tag set across hardware queues has been added
recently to the Linux kernel. See also the BLK_MQ_F_TAG_HCTX_SHARED flag,
Linux kernel commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
tagset"; v5.10) and commit 0905053bdb5b ("null_blk: Support shared tag
bitmap"; v5.10). Add a test that triggers the shared tag set code in the
block layer core.

Cc: John Garry <john.garry@huawei.com>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

multipath: work around false shellcheck error

Shellcheck seems to think that [ -z "$debug" ] is setting the debug
variable and warns:

common/multipath-over-rdma:190:11: note: Modification of debug is local (to subshell caused by pipeline). [SC2030]
common/multipath-over-rdma:606:8: note: debug was modified in a subshell. That change might be lost. [SC2031]

Work around this by using test instead.

Signed-off-by: Omar Sandoval <osandov@fb.com>

tests/block/014: ignore dd error messages

The kernel commit de3510e52b0a ("null_blk: fix command timeout
completion handling") fixed null_blk driver to report ETIMEDOUT errors
for IO operations failed with a timeout. This change causes the dd call
in block/014 case to print the following error message:

dd: error reading '/dev/nullb0': Connection timed out

The presence of this message results in a failure of the test case even
without a kernel crash or hang, which is what the block/014 case is
testing. Avoid this failure by ignoring dd error messages using a
redirection of dd stderr to /dev/null.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

tests/srp/rc, tests/nvmeof-mp/rc: add fio check to group_requires

Most of the srp and nvmeof-mp tests need fio, so add a fio check
before running the tests.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

rdma: Use rdma link instead of /sys/class/infiniband/*/parent

The approach of verifying whether or not an RDMA interface is associated
with the rdma_rxe interface by looking up its parent device is deprecated
and will be removed soon from the Linux kernel. Hence this patch that uses
the rdma link command instead.

Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests/srp/rc: Improve reliability of stop_lio_srpt()

Remove the 'np' directory if it exists. Unload the iscsi_target_mod kernel
module if it has been loaded.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests/block/030: Make this test less noisy

Since test block/030 injects blk_mq_realloc_hw_ctxs() failures, it is
expected that writes into the 'submit_queues' attribute can fail. Send
the 'nproc: write error: Cannot allocate memory' failures to $FULL instead
of stderr. See also commit a668c61064f2 ("Add a test that triggers the
blk_mq_realloc_hw_ctxs() error path").

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

nvmeof-mp/rc: fix nvmeof-mp failure when NVME_TARGET_PASSTHRU enabled

$ ./check nvmeof-mp/001
nvmeof-mp/001 (Log in and log out) [passed]
runtime 0.400s ... 0.457s
rmdir: failed to remove 'subsystems/nvme-test/passthru/admin_timeout': Not a directory
rmdir: failed to remove 'subsystems/nvme-test/passthru/device_path': Not a directory
rmdir: failed to remove 'subsystems/nvme-test/passthru/enable': Not a directory
rmdir: failed to remove 'subsystems/nvme-test/passthru/io_timeout': Not a directory

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

zbd/005: Provide max_active/open_zones limit to fio command

When test target zoned block devices have a max_open_zones or
max_active_zones limit, the high queue depth sequential write in the
test case zbd/005 may result in parallel writes to more zones than the
limit allows. This causes I/O errors.

To avoid the errors, specify the limit to fio command in the test case.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

common/rc: Check both max_active_zones and max_open_zones

Linux kernel 5.9 introduced new sysfs attributes max_active_zones and
max_open_zones for zoned block devices. Max_open_zones is the limit of
number of zones in open status. Max_active_zones is the limit of number
of zones in open or closed status. Currently, the helper function
_test_dev_max_active_zones() checks only max_active_zones, but that is
not enough. When the device has a max_open_zones limit, checking
max_active_zones alone cannot avoid errors for write operations.

To avoid the errors, improve the function _test_dev_max_active_zones()
to check both limits. Rename it to _test_dev_max_open_active_zones().
When only one of the limits is available for the test target device,
return it. If both limits are available, return the smaller limit.
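
The selection rule can be sketched as below, assuming that a value of 0
means "no limit". The function smaller_zone_limit is hypothetical; the
real helper reads the values from sysfs:

```shell
# Hypothetical sketch of the selection rule: $1 is max_open_zones and
# $2 is max_active_zones, where 0 means the limit is not available.
# Prints the only available limit, the smaller one when both are
# available, or 0 when neither is.
smaller_zone_limit() {
    local max_open="$1" max_active="$2"
    if [ "$max_open" -eq 0 ]; then
        echo "$max_active"
    elif [ "$max_active" -eq 0 ] || [ "$max_open" -lt "$max_active" ]; then
        echo "$max_open"
    else
        echo "$max_active"
    fi
}
```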

Also modify block/004 and zbd/003 to call the renamed helper function
and update comment description.

Fixes: e6981bb2d9ce ("common/rc: Add _test_dev_max_active_zones() helper function")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

common/rc: confirm pcie hotplug capabilities

It turns out some PCIe slots report hotplug surprise but are not hotplug
capable. Despite the contradiction, the spec seems to allow that.

The linux pciehp driver needs hotplug capable to bind to the slot, and
the block/019 test requires hotplug surprise to handle the unannounced
link-down. Verify both bits in the slot capabilities register are set.
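
A sketch of the bit test, assuming the slot capabilities value has
already been read as a hexadecimal string. In the PCIe Slot Capabilities
register, Hot-Plug Surprise is bit 5 (0x20) and Hot-Plug Capable is
bit 6 (0x40). The function name is hypothetical:

```shell
# Hypothetical helper: $1 is the slot capabilities value in hex without
# a 0x prefix. Succeeds only when both Hot-Plug Surprise (0x20) and
# Hot-Plug Capable (0x40) are set.
slot_has_hotplug_bits() {
    [ "$(( 0x$1 & 0x60 ))" -eq "$(( 0x60 ))" ]
}
```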

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

common/multipath-over-rdma: allow to set use_siw

With this change, we can change to use siw for nvme-rdma/nvmeof-mp/srp
testing from cmdline:

$ use_siw=1 nvme_trtype=rdma ./check nvme/
$ use_siw=1 ./check nvmeof-mp/
$ use_siw=1 ./check srp/

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

common/rc: _have_iproute2 fix for "ip -V" change

With the commit below, the version string is generated from the tag:
fbef6555 replace SNAPSHOT with auto-generated version string

To reproduce it:
$ ./check srp/015
common/rc: line 98: [: ip utility, iproute2-5.9.0: integer expression expected
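
To illustrate the problem, the old "ip -V" output carried a snapshot
date such as iproute2-ss200602, while the new output carries a version
such as iproute2-5.9.0, which is not an integer. A sketch that handles
both forms, with a hypothetical function name that is not the actual
_have_iproute2 code:

```shell
# Hypothetical parser: print the six-digit snapshot date from an
# old-style "ip -V" string, or nothing for a new-style version string,
# so callers never feed a non-integer to an arithmetic test.
iproute2_snapshot_date() {
    printf '%s\n' "$1" |
        sed -n 's/^ip utility, iproute2-ss\([0-9]\{6\}\).*/\1/p'
}
```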

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

nvmeof-mp/012, srp/012: fix the scheduler list

The latest kernel has no cfq scheduler and has the newly added kyber
scheduler. Introduce get_scheduler_list and fix nvmeof-mp/012 and srp/012.

To reproduce it:
$ ./check nvmeof-mp/012
nvmeof-mp/012 (dm-mpath on top of multiple I/O schedulers)   [passed]
    runtime  5.922s  ...  8.804s

$ cat results/nodev/nvmeof-mp/012.full  | grep -n "Changing scheduler"
31:Changing scheduler of dm-3 from mq-deadline kyber [bfq] none into cfq failed
47:Changing scheduler of dm-3 from mq-deadline kyber [bfq] none into cfq failed
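
A sketch of how a scheduler list could be derived from the sysfs
scheduler string instead of being hard-coded. The function name
parse_scheduler_list is hypothetical, and get_scheduler_list itself may
be implemented differently:

```shell
# Hypothetical sketch: turn the contents of a queue/scheduler sysfs
# attribute, where the active scheduler is shown in brackets, into a
# plain space-separated list of available schedulers.
parse_scheduler_list() {
    printf '%s\n' "$1" | tr -d '[]'
}
```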

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

tests/nvmeof-mp/rc: run nvmeof-mp tests if we set multipath=N

To enable it, just do the step below before running the tests:
$ echo "options nvme_core multipath=N" >/etc/modprobe.d/nvme.conf

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

tests/srp/rc: update the ib_srpt module name

Fix the ib_srpt module insmod failure. On some distros the module file
name ends with .xz, as below on Fedora:
/lib/modules/$(uname -r)/kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko.xz
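
One way to tolerate compressed module files is to accept any suffix
after .ko. This is a sketch with a hypothetical function name, and the
patch itself may resolve the file name differently:

```shell
# Hypothetical helper: print the path of module $2 inside directory $1,
# accepting both a plain .ko file and compressed variants such as
# .ko.xz. Returns 1 when no matching file exists.
find_module_file() {
    local dir="$1" name="$2" f
    for f in "$dir/$name".ko "$dir/$name".ko.*; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```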

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

Migrate to GitHub Actions

Travis CI is no longer offering free open source CI, so migrate to
GitHub Actions.

Signed-off-by: Omar Sandoval <osandov@fb.com>

nvme/038: Test removal of un-enabled subsystem and ports

Test that we can remove a subsystem that has not been enabled by
passthru or any ns. Do the same for ports while we are at it.

This was an issue in the original passthru patches and is
not commonly tested. So this test will ensure we don't regress this.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme/037: Add test which loops passthru connect and disconnect

Similar to test nvme/031 except for passthru controllers.

Note: it's normal to get I/O errors in this test as when the controller
disconnects it races with the partition table read.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme/036: Add test for testing reset command on nvme-passthru

Similar to test 022 but for passthru controllers.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme/035: Add test to verify passthru controller with a filesystem

This is a similar test as nvme/012 and nvme/013, except with a
passthru controller.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme/034: Add test for passthru data verification

Similar to test nvme/010 and nvme/011 but for a passthru controller

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme/033: Simple test to create and connect to a passthru target

This test creates and connects to a passthru controller backed
by a test NVMe namespace. It then verifies that some common fields
in id-ctrl and id-ns are the same in the target and the original
device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme: Add common helpers for passthru tests

Add some simple helpers to setup a passthru target that passes through
to a nvme test device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

nvme: Search for specific subsysnqn in _find_nvme_loop_dev

This ensures we find the correct nvme loop device if others exist on a
given system (which is generally not expected on test systems).

Additionally, this will be required in the upcoming test nvme/037 which
will have controllers racing with ones being destroyed.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

common/xfs: Create common helper to verify block device with xfs

Make a common helper from the code in tests nvme/012 and nvme/013
to run an fio verify on a XFS file system backed by the
specified block device.

While we are at it, all the output is redirected to $FULL instead of
/dev/null.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

common/xfs: Create common helper to check for XFS support

Two nvme tests create and mount XFS filesystems and check for mkfs.xfs.

They should also check for XFS support in the kernel so create a common
helper for this.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

common/fio: Remove state file in common helper

Instead of each individual test removing this file, just do it
in the common helper.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

zbd/003: Reset zones when the test device has max_active_zones limit

When the test target device has the max_active_zones limit, write
operations by test case zbd/003 may open zones beyond the limit and
trigger write failures.

To avoid the failure, check max_active_zones limit of the test target
device. If the limit is valid, reset all zones of the device at test
start to ensure that number of open zones does not exceed the limit.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

block/004: Provide max_active_zones to fio command

If the test target device is a zoned block device with max_active_zones
limit, the fio command in block/004 opens zones beyond the limit and
fails with I/O errors.

To avoid the failure, pass the limit value to fio using --max_open_zones
option. This option, which was introduced to fio together with
zonemode=zbd, keeps the number of open zones within the specified value.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

common/rc: Add _test_dev_max_active_zones() helper function

Linux kernel 5.9 introduced a new sysfs attribute "max_active_zones". It
is an attribute of zoned block devices which indicates the limit of zones
in open or closed status. To refer to the attribute from test cases,
introduce the helper function _test_dev_max_active_zones(). If the
attribute is available, the function returns the attribute value.
Otherwise, it returns 0 to indicate that the device does not have the limit.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

nvme: support rdma transport type

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

common: move module_unload to common

It creates a dependency between multipath-over-rdma and test/nvmeof/rc
(and test/srp/rc) which is not a natural home for it.

Move it to common helpers.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: support nvme-tcp when running tests

run with: nvme_trtype=tcp ./check nvme

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

tests/nvme: restrict tests to specific transports

Protect against running tests with the wrong transport type. Most tests
cannot have nvme_trtype=nvme and discovery tests expect the $trtype to
be written and verified in the .out file. Add a couple of helpers
to restrict the transport types in tests.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: make tests transport type agnostic

Pass in nvme_trtype to common routines that can
support multiple transport types.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: consolidate some nvme-cli utility functions

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: consolidate nvme requirements based on transport type

Right now, only pci and loop have tests, hence these are
the only ones that are allowed. The user can pass an env
variable nvme_trtype and check for the necessary modules.

This prepares us to support other transport types.

Note that test 031 is designed to run only with nvme, hence
it overrides the environment variable to nvme_trtype=pci.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme/005: add the missing _have_program nvme

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

common/multipath-over-rdma: make block scheduler directory optional

We currently fail the following tests if the directory
/lib/modules/$(uname -r)/kernel/block does not exist. Just make
this optional. Older distributions won't have this directory.

srp/001
srp/002
srp/013
srp/014

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

zbd/002: Check write pointers only when zones have valid conditions

Per ZBC, ZAC and ZNS specifications, when zones have condition "read
only", "full" or "offline", the zones may not have valid write pointers.
In such a case, do not check validity of write pointers.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/005: Enable zonemode=zbd when zone capacity is less than zone size

The test case zbd/005 runs fio to issue sequential write requests with
high queue depth. This workload does not require zonemode=zbd for zones
with zone capacity same as zone length. However, when the zone has
smaller zone capacity than zone size, it issues write beyond zone
capacity and triggers write errors.

To allow fio skipping the writes beyond zone capacity, specify the option
zonemode=zbd to fio when the test target zone has zone capacity smaller
than zone size.

Also remove unused sysfs access in the test case.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/004: Check zone boundary writes using zones without zone capacity gap

The test case zbd/004 checks zone boundary write handling by block layer
using two contiguous sequential write required zones. This test is valid
when the first zone has same zone capacity as zone size. However, if the
zone has zone capacity smaller than zone size, the write in the zone
beyond zone capacity limit causes write error and the test fails.

To avoid the write error, find the two zones with first zone that has
zone capacity same as zone size. If such zones are not found, skip the
test case.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/002: Check validity of zone capacity

The zone descriptor interface of Linux kernel 5.9 added a new zone
capacity field defining the range of sectors usable within a zone. Add a
check to ensure that the zone capacity is smaller than or equal to the
zone size.
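
As an illustration only, the check amounts to a single comparison. The
function name is hypothetical and both values are assumed to be in the same
unit.

```shell
# Hypothetical sketch: succeed when the zone capacity does not exceed
# the zone size.
zone_capacity_is_valid() {
	local zone_cap=$1 zone_len=$2
	(( zone_cap <= zone_len ))
}
```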

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

zbd/rc: Support zone capacity report by blkzone

The zone descriptor interface of Linux kernel 5.9 added a new zone
capacity field defining the range of sectors usable within a zone. The
blkzone tool recently gained support for the zone capacity in its report
zone feature. Modify the helper function _get_blkzone_report() to support
the zone capacity field.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Remove partition rereading tests for reverted fixes

The change that block/013 tested for was reverted in Linux kernel commit
10c70d95c0f2 ("block: remove the bd_openers checks in
blk_drop_partitions"). To quote Christoph:

"That check only catches file systems that use a single block device
(e.g. not btrfs multi-device or XFS or ext4 with log devices) and also
doesn't catch non-filesystem users. I first tried to generalized it,
but that ran into a chain of other problems. And there really isn't
much of a problem re-reading partitions on a mounted file system - it is
pointless but not actually harmful."

So, we shouldn't expect that check to come back. Let's remove the
test.

Similarly, the change that scsi/003 tested for was reverted in Linux
kernel commit 8acf608e602f ("Revert "scsi: sd: Keep disk read-only when
re-reading partition""). According to that commit, this can be fixed, so
when that happens we can reintroduce the test.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>

tests/srp/rc: Separate login parameters with a comma

This patch fixes a syntax error in the SRP login string.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

srp tests: Use _{init,exit}_scsi_debug() instead of duplicating these functions

This patch does not change any functionality but reduces code duplication.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

common/multipath-over-rdma: Log mkfs output

The mkfs output is important, hence log it in $FULL.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

srp, nvmeof-mp: Use no_path_retry instead of queue_if_no_path

queue_if_no_path has been deprecated, hence use no_path_retry instead.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests/nvmeof-mp/rc: Make login failures easier to debug

Record the login parameters in $FULL.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

block/012: add comment explaining second --setro

Closes #73.

Signed-off-by: Omar Sandoval <osandov@fb.com>

tests/srp/rc: Fix a shellcheck warning

Fix the following shellcheck warning:

tests/srp/rc:519:11: warning: The = here is literal. To assign by index, use
( [index]=value ) with no spaces. To keep as literal, quote it. [SC2191]

Reported-by: Shin'ichiro Kawasaki
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

tests: mark tests with CAN_BE_ZONED=1

Zoned devices should have no issues running block/{008,019} and
nvme/032, so mark the tests with CAN_BE_ZONED=1.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

common/cpuhotplug: fix ALL_CPUS

_have_cpu_hotplug needs to set cpu before adding it to ALL_CPUS.

Closes #68.

tests/srp/rc: Do not pass an empty string to dd

Instead of passing an empty string as argument to dd, do not pass any
argument when not using direct I/O.
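
The difference between an empty string and no argument can be shown with a
small bash sketch. count_args is a hypothetical stand-in for dd that only
reports how many arguments it received.

```shell
# Hypothetical stand-in for dd: print the number of arguments received.
count_args() { echo "$#"; }

direct=""	# direct I/O not requested

# An empty quoted variable is still passed as one (empty) argument.
count_args if=/dev/zero "$direct"

# An array that stays empty expands to no argument at all.
args=()
if [[ -n $direct ]]; then
	args+=("$direct")
fi
count_args if=/dev/zero "${args[@]}"
```

The first call reports 2 arguments and the second reports 1.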

Fixes: 577caa7d2b4a ("Fix unquoted integer shellcheck errors")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

zbd/007: Add --force option to blkzone reset

The test case zbd/007 uses the blkzone command from the util-linux project
to reset zones of test target devices. Recently, blkzone was modified to
report an EBUSY error when it is called to change the zone status of
devices used by the system. This avoids unintended zone status changes
and is good for most use cases.

However, this change made the test case zbd/007 fail with the EBUSY
error. The test case executes blkzone to reset zones of block devices
which the system maps to container devices such as dm-linear.

To avoid this failure, modify zbd/007 to check whether blkzone supports
the --force option and, if so, add it to the blkzone command line. This
option was introduced to blkzone to allow zone status changes of devices
even when the system uses them.
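
One way to probe for the option is to search the tool's help text, as in
the sketch below. The helper name is hypothetical, and the use of
"blkzone --help" is an assumption for illustration; the actual check in
zbd/007 may differ.

```shell
# Hypothetical helper: run the given command and succeed if its output
# mentions --force.
help_mentions_force() {
	"$@" 2>&1 | grep -q -e '--force'
}

# Build the optional arguments for "blkzone reset".
reset_opts=()
if help_mentions_force blkzone --help; then
	reset_opts+=(--force)
fi
```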

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Fix ./check: line 275: LAST_TEST_RUN["$key"]: unbound variable

For the first srp run, where /path/to/blktests/results is empty, ./check
throws errors like:

$ ./check srp/001 srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013 srp/015
srp/001 (Create and remove LUNs)----------------------------
srp/001 (Create and remove LUNs)                             [failed]

==> /tmp/stderr <==
./check: line 275: LAST_TEST_RUN["$key"]: unbound variable

==> /tmp/stdout <==
    --- tests/srp/001.out 2020-06-09 13:44:19.000000000 +0800
    +++ /lkp/benchmarks/blktests/results/nodev/srp/001.out.bad 2020-06-09 16:18:22.594012394 +0800
    @@ -1,3 +1,4 @@
     Configured SRP target driver
    +tests/srp/rc: line 105: use_blk_mq: Permission denied
     count_luns(): 3 <> 3
     Passed
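
The error class can be reproduced and guarded against as sketched below.
Under "set -u", reading a key that was never assigned in an associative
array is an unbound variable error, and a default expansion avoids it. This
is one common guard and not necessarily the exact change made to ./check;
the function name is hypothetical.

```shell
set -u
declare -A LAST_TEST_RUN=()

# Hypothetical helper: print the stored value for a key, or an empty
# string when the key has never been set.
last_status() {
	local key=$1
	echo "${LAST_TEST_RUN[$key]:-}"
}
```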

Signed-off-by: Li Zhijian <zhijianx.li@intel.com>

Fix unquoted integer shellcheck errors

Shellcheck apparently got stricter about SC2086 ("Double quote to
prevent globbing and word splitting") because now it is warning about
unquoted integers.

travis: update shellcheck URL

The latest build failed with this error:

```
You are downloading ShellCheck from an outdated URL!
Please update to the new URL:
https://github.com/koalaman/shellcheck/releases/download/stable/shellcheck-stable.linux.x86_64.tar.xz
For more information, see:
https://github.com/koalaman/shellcheck/issues/1871
PS: Sorry for breaking your build :(
```

tests/srp/rc: Make the SRP tests pass against kernel v5.7

Linux kernel commit 569334014370 ("scsi: core: Delete scsi_use_blk_mq")
removed the use_blk_mq sysfs attribute. Hence only write into the
use_blk_mq sysfs attribute if it exists.
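
A minimal sketch of such a guarded write. The function name is
hypothetical, and the attribute path is passed in by the caller.

```shell
# Hypothetical helper: write a value to a sysfs attribute only if the
# attribute exists. Does nothing and succeeds when it is absent.
write_attr_if_present() {
	local attr=$1 value=$2
	if [[ -e $attr ]]; then
		echo "$value" > "$attr"
	fi
}
```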

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Add $DESCRIPTION to the TEST_RUN

Signed-off-by: Sebastian Chlad <schlad@suse.de>

Fix unintentional skipping of tests

cd11d001fe86 ("Support skipping tests from test{,_device}()") breaks a
good handful of tests.

For example, block/005 uses _test_dev_is_rotational to check if the
device is rotational and uses the result to size up the fio run. As a
side-effect, _test_dev_is_rotational also sets SKIP_REASON, which (since
commit cd11d001fe86) causes the test to print out a "[not run]" even
though the test actually ran successfully.

Fix this by renaming the existing helpers to _require_foo (e.g. a
_require_test_dev_is_rotational) and add the non-_require variant where
needed.
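
The split can be sketched as follows. ROTATIONAL_ATTR is a hypothetical
variable standing in for the path of the device's queue/rotational sysfs
attribute, and the skip message is illustrative.

```shell
# Plain predicate: answers the question and has no side effects, so a
# test can use it to size its workload.
_test_dev_is_rotational() {
	[[ $(cat "$ROTATIONAL_ATTR") == 1 ]]
}

# Requirement check: additionally records why the test is skipped.
_require_test_dev_is_rotational() {
	if ! _test_dev_is_rotational; then
		SKIP_REASON="test device is not rotational"
		return 1
	fi
	return 0
}
```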

Fixes: cd11d001fe86 ("Support skipping tests from test{,_device}()")
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[Omar: simplify new _test_dev helpers]
Signed-off-by: Omar Sandoval <osandov@fb.com>

Add a test that triggers the blk_mq_realloc_hw_ctxs() error path

Add a test that triggers the code touched by commit d0930bb8f46b ("blk-mq:
Fix a recently introduced regression in blk_mq_realloc_hw_ctxs()"). This
test only runs if a recently added fault injection feature is available,
namely commit 596444e75705 ("null_blk: Add support for init_hctx() fault
injection").

Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Introduce the function _configure_null_blk()

Introduce a function for creating a null_blk device instance through
configfs.
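
For illustration, creating an instance through configfs follows the pattern
below: make a directory, write the attributes, then power the device on.
This is a sketch and not the body of _configure_null_blk(). NULLB_ROOT is a
hypothetical variable that would be /sys/kernel/config/nullb on a real
system with null_blk loaded.

```shell
# Hypothetical sketch: create a null_blk instance named $1 and apply the
# remaining "attribute=value" arguments before powering it on.
configure_nullb_sketch() {
	local name=$1 param
	shift
	local dir="${NULLB_ROOT}/${name}"
	mkdir "$dir" || return 1
	for param in "$@"; do
		# ${param%%=*} is the attribute name, ${param#*=} its value.
		echo "${param#*=}" > "${dir}/${param%%=*}" || return 1
	done
	echo 1 > "${dir}/power"
}
```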

Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Use _{init,exit}_null_blk instead of open-coding these functions

This patch reduces code duplication.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Make _exit_null_blk remove all null_blk device instances

Instead of making every test remove null_blk device instances before calling
_exit_null_blk(), move the null_blk device instance removal code into
_exit_null_blk().

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>

common/fio: do not use norandommap with verify

As per the fio documentation, using norandommap with an async I/O engine
and I/O depth > 1 can cause verification errors.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

Support skipping tests from test{,_device}()

Most of the time, test requirements can be checked without much setup.
However, in some cases, it's not possible to know if the test can be run
until we're halfway through the test. Allow setting SKIP_REASON from the
test function. This should only be used as a last resort.