www.infradead.org Git - users/hch/block.git/log

]> www.infradead.org Git - users/hch/block.git/log

projects / users / hch / block.git / log

Christoph Hellwig [Sun, 25 Feb 2024 06:53:44 +0000 (07:53 +0100)]

drbd: atomically update queue limits in drbd_reconsider_queue_parameters

Switch drbd_reconsider_queue_parameters to set up the queue parameters
in an on-stack queue_limits structure and apply the atomically. Remove
various helpers that have become so trivial that they can be folded into
drbd_reconsider_queue_parameters.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 25 Feb 2024 06:49:28 +0000 (07:49 +0100)]

drbd: split out a drbd_discard_supported helper

Add a helper to check if discard is supported for a given connection /
backing device combination.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Sun, 25 Feb 2024 06:52:18 +0000 (07:52 +0100)]

drbd: don't set max_write_zeroes_sectors in decide_on_discard_support

fixup_write_zeroes always overrides the max_write_zeroes_sectors value
a little further down the callchain, so don't bother to setup a limit
in decide_on_discard_support.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 18:25:06 +0000 (19:25 +0100)]

drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters

drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters
and there is no really clear boundary of responsibilities between the
two.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 18:23:13 +0000 (19:23 +0100)]

drbd: refactor the backing dev max_segments calculation

Factor out a drbd_backing_dev_max_segments helper that checks the
backing device limitation.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 07:34:16 +0000 (08:34 +0100)]

drbd: refactor drbd_reconsider_queue_parameters

Split out a drbd_max_peer_bio_size helper for the peer I/O size,
and condense the various checks to a nested min3(..., max())) instead
of using a lot of local variables.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 07:19:58 +0000 (08:19 +0100)]

drbd: pass the max_hw_sectors limit to blk_alloc_disk

Pass a queue_limits structure with the max_hw_sectors limit to
blk_alloc_disk instead of updating the limit on the allocated gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 15:55:54 +0000 (16:55 +0100)]

block: remove disk_stack_limits

disk_stack_limits is unused now, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 14:30:13 +0000 (09:30 -0500)]

md: remove mddev->queue

Just use the request_queue from the gendisk pointer in the relatively
few places that sill need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 14:06:19 +0000 (15:06 +0100)]

md: don't initialize queue limits

Initial queue limits are now set from ->run. Remove the superfluous
initialization in md_alloc and level_store.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 13:44:42 +0000 (14:44 +0100)]

md/raid10: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into separate helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 13:50:55 +0000 (14:50 +0100)]

md/raid5: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into separate helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 13:41:16 +0000 (14:41 +0100)]

md/raid1: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into a separate helper function.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 13:36:58 +0000 (14:36 +0100)]

md/raid0: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into a separate helper function.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 22 Feb 2024 11:56:40 +0000 (12:56 +0100)]

md: add queue limit helpers

Add a few helpers that wrap the block queue limits API for use in MD.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 14:28:50 +0000 (09:28 -0500)]

md: add a mddev_is_dm helper

Add a helper to check for a DM-mapped MD device instead of using
the obfuscated ->gendisk or ->queue NULL checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 14:21:03 +0000 (09:21 -0500)]

md: add a mddev_add_trace_msg helper

Add a small wrapper around blk_add_trace_msg that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Mon, 26 Feb 2024 13:23:28 +0000 (14:23 +0100)]

md: add a mddev_trace_remap helper

Add a helper to trace bio remapping that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Fri, 23 Feb 2024 06:04:25 +0000 (07:04 +0100)]

dm: use queue_limits_set

Use queue_limits_set which validates the limits and takes care of
updating the readahead settings instead of directly assigning them to
the queue. For that make sure all limits are actually updated before
the assignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 22 Feb 2024 10:07:09 +0000 (11:07 +0100)]

block: add a queue_limits_stack_bdev helper

Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and prints an error in case of misaligned
device. The name fits into the new queue limits API and the intent is
to eventually replace disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Christoph Hellwig [Thu, 22 Feb 2024 11:47:37 +0000 (12:47 +0100)]

block: add a queue_limits_set helper

Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits, but set an
entirely new set.

Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Chengming Zhou [Sat, 24 Feb 2024 13:46:46 +0000 (13:46 +0000)]

bdev: remove SLAB_MEM_SPREAD flag usage

The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove
its usage so we can delete it from slab. No functional change.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20240224134646.829105-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Qais Yousef [Fri, 23 Feb 2024 15:57:49 +0000 (15:57 +0000)]

block/blk-mq: Don't complete locally if capacities are different

The logic in blk_mq_complete_need_ipi() assumes SMP systems where all
CPUs have equal compute capacities and only LLC cache can make
a different on perceived performance. But this assumption falls apart on
HMP systems where LLC is shared, but the CPUs have different capacities.
Staying local then can have a big performance impact if the IO request
was done from a CPU with higher capacity but the interrupt is serviced
on a lower capacity CPU.

Use the new cpus_equal_capacity() function to check if we need to send
an IPI.

Without the patch I see the BLOCK softirq always running on little cores
(where the hardirq is serviced). With it I can see it running on all
cores.

This was noticed after the topology change [1] where now on a big.LITTLE
we truly get that the LLC is shared between all cores where as in the
past it was being misrepresented for historical reasons. The logic
exposed a missing dependency on capacities for such systems where there
can be a big performance difference between the CPUs.

This of course introduced a noticeable change in behavior depending on
how the topology is presented. Leading to regressions in some workloads
as the performance of the BLOCK softirq on littles can be noticeably
worse on some platforms.

Worth noting that we could have checked for capacities being greater
than or equal instead for equality. This will lead to favouring higher
performance always. But opted for equality instead to match the
performance of the requester without making an assumption that can lead
to power trade-offs which these systems tend to be sensitive about. If
the requester would like to run faster, it's better to rely on the
scheduler to give the IO requester via some facility to run on a faster
core; and then if the interrupt triggered on a CPU with different
capacity we'll make sure to match the performance the requester is
supposed to run at.

[1] https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf

Signed-off-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240223155749.2958009-3-qyousef@layalina.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Qais Yousef [Fri, 23 Feb 2024 15:57:48 +0000 (15:57 +0000)]

sched: Add a new function to compare if two cpus have the same capacity

The new helper function is needed to help blk-mq check if it needs to
dispatch the softirq on another CPU to match the performance level the
IO requester is running at. This is important on HMP systems where not
all CPUs have the same compute capacity.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240223155749.2958009-2-qyousef@layalina.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Keith Busch [Fri, 23 Feb 2024 15:59:10 +0000 (07:59 -0800)]

blk-lib: check for kill signal

Some of these block operations can access a significant capacity and
take longer than the user expected. A user may change their mind about
wanting to run that command and attempt to kill the process and do
something else with their device. But since the task is uninterruptable,
they have to wait for it to finish, which could be many hours.

Check for a fatal signal at each iteration so the user doesn't have to
wait for their regretted operation to complete naturally.

Reported-by: Conrad Meyer <conradmeyer@meta.com>
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240223155910.3622666-5-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Keith Busch [Fri, 23 Feb 2024 15:59:09 +0000 (07:59 -0800)]

block: io wait hang check helper

This is the same in two places, and another will be added soon. Create a
helper for it.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240223155910.3622666-4-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Keith Busch [Fri, 23 Feb 2024 15:59:08 +0000 (07:59 -0800)]

block: cleanup __blkdev_issue_write_zeroes

Use min to calculate the next number of sectors like everyone else.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240223155910.3622666-3-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Keith Busch [Fri, 23 Feb 2024 15:59:07 +0000 (07:59 -0800)]

block: blkdev_issue_secure_erase loop style

Use consistent coding style in this file. All the other loops for the
same purpose use "while (nr_sects)", so they win.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240223155910.3622666-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Li Nan [Wed, 21 Feb 2024 09:01:22 +0000 (17:01 +0800)]

block: fix deadlock between bd_link_disk_holder and partition scan

'open_mutex' of gendisk is used to protect open/close block devices. But
in bd_link_disk_holder(), it is used to protect the creation of symlink
between holding disk and slave bdev, which introduces some issues.

When bd_link_disk_holder() is called, the driver is usually in the process
of initialization/modification and may suspend submitting io. At this
time, any io hold 'open_mutex', such as scanning partitions, can cause
deadlocks. For example, in raid:

T1                              T2
bdev_open_by_dev
lock open_mutex [1]
...
  efi_partition
  ...
   md_submit_bio
md_ioctl mddev_syspend
  -> suspend all io
md_add_new_disk
  bind_rdev_to_array
   bd_link_disk_holder
    try lock open_mutex [2]
    md_handle_request
     -> wait mddev_resume

T1 scan partition, T2 add a new device to raid. T1 waits for T2 to resume
mddev, but T2 waits for open_mutex held by T1. Deadlock occurs.

Fix it by introducing a local mutex 'blk_holder_mutex' to replace
'open_mutex'.

Fixes: 1b0a2d950ee2 ("md: use new apis to suspend array for ioctls involed array reconfiguration")
Reported-by: mgperkow@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218459
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240221090122.1281868-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Damien Le Moal [Thu, 22 Feb 2024 13:17:24 +0000 (22:17 +0900)]

block: Do not include rbtree.h in blk-zoned.c

The block zone code does not use RB-tree. So remove the include of
linux/rbtree.h as it is not needed.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240222131724.1803520-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Damien Le Moal [Thu, 22 Feb 2024 13:17:23 +0000 (22:17 +0900)]

block: Clear zone limits for a non-zoned stacked queue

Device mapper may create a non-zoned mapped device out of a zoned device
(e.g., the dm-zoned target). In such case, some queue limit such as the
max_zone_append_sectors and zone_write_granularity endup being non zero
values for a block device that is not zoned. Avoid this by clearing
these limits in blk_stack_limits() when the stacked zoned limit is
false.

Fixes: 3093a479727b ("block: inherit the zoned characteristics in blk_stack_limits")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240222131724.1803520-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

John Garry [Thu, 22 Feb 2024 08:34:20 +0000 (08:34 +0000)]

null_blk: Delete nullb.{queue_depth, nr_queues}

Since commit 8b631f9cf0b8 ("null_blk: remove the bio based I/O path"),
struct nullb members queue_depth and nr_queues are only ever written, so
delete them.

With that, null_exit_hctx() can also be deleted.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240222083420.6026-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 22 Feb 2024 07:36:47 +0000 (08:36 +0100)]

pktcdvd: set queue limits at disk allocation time

Remove pkt_init_queue and just pass the two parameters directly to
blk_alloc_disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240222073647.3776769-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 22 Feb 2024 07:36:46 +0000 (08:36 +0100)]

pktcdvd: stop setting q->queuedata

The two users can get the private data from the gendisk with one less
pointer dereference, and we can drop the useless q parameter from
pkt_make_request_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240222073647.3776769-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Wed, 21 Feb 2024 12:50:10 +0000 (13:50 +0100)]

block: fix virt_boundary handling in blk_validate_limits

Don't set the default max_segment_size value when a virt_boundary is
used.

Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240221125010.3609444-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 20 Feb 2024 09:32:48 +0000 (10:32 +0100)]

null_blk: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_mq_alloc_disk instead of
setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 20 Feb 2024 09:32:47 +0000 (10:32 +0100)]

null_blk: remove null_gendisk_register

null_gendisk_register isn't a very useful abstraction given that it
doesn't even allocate the gendisk. Merge it into the only caller
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 20 Feb 2024 09:32:46 +0000 (10:32 +0100)]

null_blk: refactor tag_set setup

Move the tagset initialization out of null_add_dev into a new
null_setup_tagset helper, and move the shared vs local differences
out of null_init_tag_set into the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 20 Feb 2024 09:32:45 +0000 (10:32 +0100)]

null_blk: initialize the tag_set timeout in null_init_tag_set

Otherwise it will be reset to the always same value when initializing a
device using the shared tag_set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 20 Feb 2024 09:32:44 +0000 (10:32 +0100)]

null_blk: remove the bio based I/O path

The bio based I/O path complicates null_blk and also make various
data structures, including the per-command one way bigger than
required for the main request based interface. As the bio-based
path is mostly used by stacking drivers and simple memory based
drivers, and brd is a good example driver for the latter there is
no need to have a bio based path in null_blk. Remove the path
to simplify the driver and make future block layer API changes
simpler by not having to deal with the complex two API setup in
null_blk.

Note that the queue_mode field in struct nullb_device is kept as
that is simpler than having two different places to check the
value and fully open coding the debugfs helpers as the existing
ones won't work without a named struct member.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:03:00 +0000 (08:03 +0100)]

mmc: pass queue_limits to blk_mq_alloc_disk

Pass the queue limit set at initialization time directly to
blk_mq_alloc_disk instead of updating it right after the allocation.

This requires refactoring the code a bit so that what was mmc_setup_queue
before also allocates the gendisk now and actually sets all limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20240215070300.2200308-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:59 +0000 (08:02 +0100)]

ublk: pass queue_limits to blk_mq_alloc_disk

Pass the limits ublk imposes directly to blk_mq_alloc_disk instead of
setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:58 +0000 (08:02 +0100)]

scm_blk: pass queue_limits to blk_mq_alloc_disk

Pass the few limits scm_block imposes directly to blk_mq_alloc_disk
instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:57 +0000 (08:02 +0100)]

ubiblock: pass queue_limits to blk_mq_alloc_disk

Pass the few limits ubiblock imposes directly to blk_mq_alloc_disk
instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20240215070300.2200308-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:56 +0000 (08:02 +0100)]

mtd_blkdevs: pass queue_limits to blk_mq_alloc_disk

Pass the few limits mtd_blkdevs imposes directly to blk_mq_alloc_disk
instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:55 +0000 (08:02 +0100)]

mspro_block: pass queue_limits to blk_mq_alloc_disk

Pass the few limits mspro_block imposes directly to blk_mq_alloc_disk
instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:54 +0000 (08:02 +0100)]

ms_block: pass queue_limits to blk_mq_alloc_disk

Pass the few limits ms_block imposes directly to blk_mq_alloc_disk
instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:53 +0000 (08:02 +0100)]

gdrom: pass queue_limits to blk_mq_alloc_disk

Pass the few limits gdrom imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:52 +0000 (08:02 +0100)]

sunvdc: pass queue_limits to blk_mq_alloc_disk

Pass the few limits sunvdc imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:51 +0000 (08:02 +0100)]

rnbd-clt: pass queue_limits to blk_mq_alloc_disk

Pass the limits rnbd-clt imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

While at it don't set an explicit number of discard segments, as 1 is
the default (which most drivers rely on).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20240215070300.2200308-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:50 +0000 (08:02 +0100)]

rbd: pass queue_limits to blk_mq_alloc_disk

Pass the limits rbd imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:49 +0000 (08:02 +0100)]

ps3disk: pass queue_limits to blk_mq_alloc_disk

Pass the few limits ps3disk imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:48 +0000 (08:02 +0100)]

nbd: pass queue_limits to blk_mq_alloc_disk

Pass the few limits nbd imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:47 +0000 (08:02 +0100)]

mtip: pass queue_limits to blk_mq_alloc_disk

Pass the few limits mtip imposes directly to blk_mq_alloc_disk instead
of setting them one at a time and drop the pointless setting of a io_min
that is equal to the physical block size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:46 +0000 (08:02 +0100)]

floppy: pass queue_limits to blk_mq_alloc_disk

Pass the few limits floppy imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20240215070300.2200308-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:45 +0000 (08:02 +0100)]

aoe: pass queue_limits to blk_mq_alloc_disk

Pass the few limits aoe imposes directly to blk_mq_alloc_disk instead
of setting them one at a time and improve the way the default
max_hw_sectors is initialized while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:02:44 +0000 (08:02 +0100)]

ubd: pass queue_limits to blk_mq_alloc_disk

Pass the few limits ubd imposes directly to blk_mq_alloc_disk instead
of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240215070300.2200308-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:55 +0000 (08:10 +0100)]

dcssblk: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:54 +0000 (08:10 +0100)]

pmem: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:53 +0000 (08:10 +0100)]

btt: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:52 +0000 (08:10 +0100)]

bcache: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:51 +0000 (08:10 +0100)]

zram: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:50 +0000 (08:10 +0100)]

n64cart: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:49 +0000 (08:10 +0100)]

brd: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:48 +0000 (08:10 +0100)]

nfblock: pass queue_limits to blk_mq_alloc_disk

Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Thu, 15 Feb 2024 07:10:47 +0000 (08:10 +0100)]

block: pass a queue_limits argument to blk_alloc_disk

Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Also change blk_alloc_disk to return an ERR_PTR instead of just NULL
which can't distinguish errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Navid Emamdoost [Sun, 18 Feb 2024 04:25:38 +0000 (20:25 -0800)]

nbd: null check for nla_nest_start

nla_nest_start() may fail and return NULL. Insert a check and set errno
based on other call sites within the same source code.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Fixes: 47d902b90a32 ("nbd: add a status netlink command")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240218042534.it.206-kees@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jens Axboe [Fri, 16 Feb 2024 22:43:58 +0000 (15:43 -0700)]

Merge tag 'md-6.9-20240216' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block

Pull MD changes from Song:

"1. Cleanup redundant checks, by Yu Kuai.
2. Remove deprecated headers, by Marc Zyngier and Song Liu.
3. Concurrency fixes, by Li Lingfeng.
4. Memory leak fix, by Li Nan."

* tag 'md-6.9-20240216' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix kmemleak of rdev->serial
  md/multipath: Remove md-multipath.h
  md/linear: Get rid of md-linear.h
  md: use RCU lock to protect traversal in md_spares_need_change()
  md: get rdev->mddev with READ_ONCE()
  md: remove redundant md_wakeup_thread()
  md: remove redundant check of 'mddev->sync_thread'

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:25 +0000 (08:34 +0100)]

loop: use the atomic queue limits update API

Pass the default limits to blk_mq_alloc_disk and then use the
queue_limits_{start,commit}_update API to change the limits in an
atomic way on existing loop gendisks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:24 +0000 (08:34 +0100)]

loop: pass queue_limits to blk_mq_alloc_disk

Pass the max_hw_sector limit loop sets at initialization time directly to
blk_mq_alloc_disk instead of updating it right after the allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:23 +0000 (08:34 +0100)]

loop: cleanup loop_config_discard

Initialize the local variables for the discard max sectors and
granularity to zero as a sensible default, and then merge the
calls assigning them to the queue limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:22 +0000 (08:34 +0100)]

virtio_blk: pass queue_limits to blk_mq_alloc_disk

Call virtblk_read_limits and most of virtblk_probe_zoned_device before
allocating the gendisk and thus request_queue and make them read into
a queue_limits structure instead. Pass this initialized queue_limits
to blk_mq_alloc_disk to set the queue up with the right parameters
from the start and only leave a few final touches for zoned devices
to be done just before adding the disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:21 +0000 (08:34 +0100)]

virtio_blk: split virtblk_probe

Split out a virtblk_read_limits helper that just reads the various
queue limits to separate it from the higher level probing logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:20 +0000 (08:34 +0100)]

block: pass a queue_limits argument to blk_mq_alloc_disk

Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:19 +0000 (08:34 +0100)]

block: pass a queue_limits argument to blk_mq_init_queue

Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Also rename the function to blk_mq_alloc_queue as that is a much better
name for a function that allocates a queue and always pass the queuedata
argument instead of having a separate version for the extra argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:18 +0000 (08:34 +0100)]

block: pass a queue_limits argument to blk_alloc_queue

Pass a queue_limits to blk_alloc_queue and apply it after validating and
capping the values using blk_validate_limits. This will allow allocating
queues with valid queue limits instead of setting the values one at a
time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:17 +0000 (08:34 +0100)]

block: use queue_limits_commit_update in queue_discard_max_store

Convert queue_discard_max_store to use queue_limits_commit_update to
check and update the max_discard_sectors limit and freeze the queue
before doing so to ensure we don't have requests in flight while
changing the limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:16 +0000 (08:34 +0100)]

block: add a max_user_discard_sectors queue limit

Add a new max_user_discard_sectors limit that mirrors max_user_sectors
and stores the value that the user manually set. This now allows
updates of the max_hw_discard_sectors to not worry about the user
limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:15 +0000 (08:34 +0100)]

block: use queue_limits_commit_update in queue_max_sectors_store

Convert queue_max_sectors_store to use queue_limits_commit_update to
check and update the max_sectors limit and freeze the queue before
doing so to ensure we don't have requests in flight while changing
the limits.

Note that this removes the previously held queue_lock that doesn't
protect against any other reader or writer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:14 +0000 (08:34 +0100)]

block: add an API to atomically update queue limits

Add a new queue_limits_{start,commit}_update pair of functions that
allows taking an atomic snapshot of queue limits, update it, and
commit it if it passes validity checking. Also use the low-level
validation helper to implement blk_set_default_limits instead of
duplicating the initialization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:13 +0000 (08:34 +0100)]

block: decouple blk_set_stacking_limits from blk_set_default_limits

blk_set_stacking_limits uses very little from blk_set_default_limits.
Open code these initializations in preparation for rewriting
blk_set_default_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:12 +0000 (08:34 +0100)]

block: refactor disk_update_readahead

Factor out a blk_apply_bdi_limits limits helper that can be used with
an explicit queue_limits argument, which will be useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Christoph Hellwig [Tue, 13 Feb 2024 07:34:11 +0000 (08:34 +0100)]

block: move max_{open,active}_zones to struct queue_limits

The maximum number of open and active zones is a limit on the queue
and should be places there so that we can including it in the upcoming
queue limits batch update API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Arnd Bergmann [Tue, 13 Feb 2024 10:03:01 +0000 (11:03 +0100)]

drbd: fix function cast warnings in state machine

There are four state machines in drbd that use a common infrastructure, with
a cast to an incompatible function type in REMEMBER_STATE_CHANGE that clang-16
now warns about:

drivers/block/drbd/drbd_state.c:1632:3: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_resource_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
1632 |                 REMEMBER_STATE_CHANGE(notify_resource_state_change,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1633 |                                       resource_state_change, NOTIFY_CHANGE);
      |                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/block/drbd/drbd_state.c:1619:17: note: expanded from macro 'REMEMBER_STATE_CHANGE'
1619 |            last_func = (typeof(last_func))func; \
      |                        ^~~~~~~~~~~~~~~~~~~~~~~
drivers/block/drbd/drbd_state.c:1641:4: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_connection_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
1641 |                         REMEMBER_STATE_CHANGE(notify_connection_state_change,
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1642 |                                               connection_state_change, NOTIFY_CHANGE);
      |                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change these all to actually expect a void pointer to be passed, which
matches the caller.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213100354.457128-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Arnd Bergmann [Tue, 13 Feb 2024 09:59:07 +0000 (10:59 +0100)]

floppy: fix function pointer cast warnings

clang-16 complains about a control flow integrity (kcfi) violation
casting between incompatible pointers:

drivers/block/floppy.c:2001:11: error: cast from 'void (*)(void)' to 'done_f' (aka 'void (*)(int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
2001 | .done = (done_f)empty
| ^~~~~~~~~~~~~

Just add another empty function with the correct prototype as a
workaround.

The warning is for code that was added before the start of the normal
git history, but I tracked it done to an early change in the reconstructed
linux-history.git.

Fixes: 598a477afe06 ("Import 1.1.41")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213095918.455478-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Li Nan [Thu, 8 Feb 2024 08:55:56 +0000 (16:55 +0800)]

md: fix kmemleak of rdev->serial

If kobject_add() is fail in bind_rdev_to_array(), 'rdev->serial' will be
alloc not be freed, and kmemleak occurs.

unreferenced object 0xffff88815a350000 (size 49152):
  comm "mdadm", pid 789, jiffies 4294716910
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc f773277a):
    [<0000000058b0a453>] kmemleak_alloc+0x61/0xe0
    [<00000000366adf14>] __kmalloc_large_node+0x15e/0x270
    [<000000002e82961b>] __kmalloc_node.cold+0x11/0x7f
    [<00000000f206d60a>] kvmalloc_node+0x74/0x150
    [<0000000034bf3363>] rdev_init_serial+0x67/0x170
    [<0000000010e08fe9>] mddev_create_serial_pool+0x62/0x220
    [<00000000c3837bf0>] bind_rdev_to_array+0x2af/0x630
    [<0000000073c28560>] md_add_new_disk+0x400/0x9f0
    [<00000000770e30ff>] md_ioctl+0x15bf/0x1c10
    [<000000006cfab718>] blkdev_ioctl+0x191/0x3f0
    [<0000000085086a11>] vfs_ioctl+0x22/0x60
    [<0000000018b656fe>] __x64_sys_ioctl+0xba/0xe0
    [<00000000e54e675e>] do_syscall_64+0x71/0x150
    [<000000008b0ad622>] entry_SYSCALL_64_after_hwframe+0x6c/0x74

Fixes: 963c555e75b0 ("md: introduce mddev_create/destroy_wb_pool for the change of member device")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240208085556.2412922-1-linan666@huaweicloud.com

commit | commitdiff | tree

Kanchan Joshi [Thu, 1 Feb 2024 13:01:26 +0000 (18:31 +0530)]

nvme: allow integrity when PI is not in first bytes

NVM command set 1.0 (or later) mandates PI to be in the last bytes of
metadata. But this was not supported in the block-layer, and driver
registered a nop profile.

Since block-integrity can now handle flexible PI offset, change the
driver to support this configuration.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-4-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Kanchan Joshi [Thu, 1 Feb 2024 13:01:25 +0000 (18:31 +0530)]

block: support PI at non-zero offset within metadata

Block layer integrity processing assumes that protection information
(PI) is placed in the first bytes of each metadata block.

Remove this limitation and include the metadata before the PI in the
calculation of the guard tag.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Chinmay Gameti <c.gameti@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Kanchan Joshi [Thu, 1 Feb 2024 13:01:24 +0000 (18:31 +0530)]

block: refactor guard helpers

Allow computation using the existing guard value.
This is a prep patch.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Johannes Thumshirn [Mon, 29 Jan 2024 07:52:20 +0000 (23:52 -0800)]

block: remove gfp_flags from blkdev_zone_mgmt

Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use
memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can
drop the gfp_mask parameter from blkdev_zone_mgmt() as well as
blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Johannes Thumshirn [Mon, 29 Jan 2024 07:52:19 +0000 (23:52 -0800)]

f2fs: guard blkdev_zone_mgmt with nofs scope

Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope.
This helps us getting rid of the GFP_NOFS argument to blkdev_zone_mgmt();

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-4-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Johannes Thumshirn [Mon, 29 Jan 2024 07:52:18 +0000 (23:52 -0800)]

btrfs: zoned: call blkdev_zone_mgmt in nofs scope

Add a memalloc_nofs scope around all calls to blkdev_zone_mgmt(). This
allows us to further get rid of the GFP_NOFS argument for
blkdev_zone_mgmt().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-3-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Johannes Thumshirn [Mon, 29 Jan 2024 07:52:17 +0000 (23:52 -0800)]

dm: dm-zoned: guard blkdev_zone_mgmt with noio scope

Guard the calls to blkdev_zone_mgmt() with a memalloc_noio scope.
This helps us getting rid of the GFP_NOIO argument to blkdev_zone_mgmt();

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-2-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Johannes Thumshirn [Mon, 29 Jan 2024 07:52:16 +0000 (23:52 -0800)]

zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call

Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in
zonefs_zone_mgmt().

As as zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called
from a place that can recurse back into the filesystem on memory reclaim,
it is save to call blkdev_zone_mgmt() with GFP_KERNEL.

Link: https://lore.kernel.org/all/ZZcgXI46AinlcBDP@casper.infradead.org/
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-1-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Miroslav Franc [Fri, 9 Feb 2024 12:45:22 +0000 (13:45 +0100)]

s390/dasd: fix double module refcount decrement

Once the discipline is associated with the device, deleting the device
takes care of decrementing the module's refcount. Doing it manually on
this error path causes refcount to artificially decrease on each error
while it should just stay the same.

Fixes: c020d722b110 ("s390/dasd: fix panic during offline processing")
Signed-off-by: Miroslav Franc <mfranc@suse.cz>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240209124522.3697827-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jan Höppner [Fri, 9 Feb 2024 12:45:21 +0000 (13:45 +0100)]

s390/dasd: Improve ERP error messages

Some ERP errors still share the same message format and only add
different reason codes to it. These reason codes don't have any meaning
anymore.
Make the individual error messages more explicit and remove the reason
codes altogether. Comments around the error messages are also removed as
they provide no additional value anymore with more explicit messages.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240209124522.3697827-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Shin'ichiro Kawasaki [Tue, 30 Jan 2024 04:21:34 +0000 (13:21 +0900)]

null_blk: add configfs variable shared_tags

Allow setting shared_tags through configfs, which could only be set as a
module parameter. For that purpose, delay tag_set initialization from
null_init() to null_add_dev(). Refer tag_set.ops as the flag to check if
tag_set is initialized or not.

The following parameters can not be set through configfs yet:

    timeout
    requeue
    init_hctx

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240130042134.2463659-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Kunwu Chan [Wed, 31 Jan 2024 09:43:23 +0000 (17:43 +0800)]

block: Simplify the allocation of slab caches

Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240131094323.146659-1-chentao@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 7 Feb 2024 14:14:29 +0000 (14:14 +0000)]

block: optimise in irq bio put caching

When enlisting a bio into ->free_list_irq we protect the list by
disabling irqs. It's likely they're already disabled and performance of
local_irq_{save,restore}() is decent, but it's not zero cost.

Let's only use the irq cache when when we're serving a hard irq, which
allows to remove local_irq_{save,restore}(), and fall back to bio_free()
in all left cases.

Profiles indicate that the bio_put() cost is reduced by ~3.5 times
(1.76% -> 0.49%), and total throughput of a CPU bound benchmark improve
by around 1% (t/io_uring with high QD and several drives).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/36d207540b7046c653cc16e5ff08fe7234b19f81.1707314970.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 7 Feb 2024 14:14:28 +0000 (14:14 +0000)]

block: extend bio caching to task context

bio_put_percpu_cache() puts all non-iopoll bios into the irq-safe list,
which entails disabling irqs. The overhead of that is not that bad when
interrupts are already off but getting worse otherwise. We can optimise
it when we're in the task context by using ->free_list directly just as
the IOPOLL path does.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4774e1a0f905f96c63174b0f3e4f79f0d9b63246.1707314970.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom