]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
3 years agoudf: use sb_bdev_nr_blocks block-kill-bd_inode-part1
Christoph Hellwig [Mon, 11 Oct 2021 15:01:25 +0000 (17:01 +0200)]
udf: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
3 years agoreiserfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:58:24 +0000 (16:58 +0200)]
reiserfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
3 years agontfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:55:32 +0000 (16:55 +0200)]
ntfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it and clean up
ntfs_fill_super a bit by moving an assignment a little earlier that has
no negative side effects.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
3 years agojfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:49:33 +0000 (16:49 +0200)]
jfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
3 years agoext4: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:47:47 +0000 (16:47 +0200)]
ext4: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Theodore Ts'o <tytso@mit.edu>
3 years agoblock: add a sb_bdev_nr_blocks helper
Christoph Hellwig [Mon, 11 Oct 2021 14:45:59 +0000 (16:45 +0200)]
block: add a sb_bdev_nr_blocks helper

Add a helper to return the size of sb->s_bdev in sb->s_blocksize_bits
based unites.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agoblock: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
Christoph Hellwig [Mon, 11 Oct 2021 14:41:37 +0000 (16:41 +0200)]
block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
3 years agosquashfs: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:59:54 +0000 (16:59 +0200)]
squashfs: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
3 years agoreiserfs: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:57:23 +0000 (16:57 +0200)]
reiserfs: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size and remove two
cargo culted checks that can't be false.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
3 years agopstore/blk: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:24:50 +0000 (10:24 +0200)]
pstore/blk: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kees Cook <keescook@chromium.org>
3 years agontfs3: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:21:28 +0000 (10:21 +0200)]
ntfs3: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonilfs2: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:53:27 +0000 (16:53 +0200)]
nilfs2: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
3 years agonfs/blocklayout: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:51:48 +0000 (16:51 +0200)]
nfs/blocklayout: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agojfs: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:50:44 +0000 (16:50 +0200)]
jfs: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
3 years agohfsplus: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:22:50 +0000 (13:22 +0200)]
hfsplus: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agohfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:22:17 +0000 (13:22 +0200)]
hfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agofat: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:23:47 +0000 (13:23 +0200)]
fat: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agocramfs: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:16:15 +0000 (10:16 +0200)]
cramfs: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agobtrfs: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:33:41 +0000 (13:33 +0200)]
btrfs: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: David Sterba <dsterba@suse.com>
3 years agoaffs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:20:59 +0000 (13:20 +0200)]
affs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agofs: simplify init_page_buffers
Christoph Hellwig [Mon, 11 Oct 2021 15:09:05 +0000 (17:09 +0200)]
fs: simplify init_page_buffers

No need to convert from bdev to inode and back.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
3 years agofs: use bdev_nr_bytes instead of open coding it in blkdev_max_block
Christoph Hellwig [Mon, 11 Oct 2021 11:36:06 +0000 (13:36 +0200)]
fs: use bdev_nr_bytes instead of open coding it in blkdev_max_block

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
3 years agotarget/iblock: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:28:23 +0000 (13:28 +0200)]
target/iblock: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
3 years agonvmet: use bdev_nr_bytes instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:26:29 +0000 (13:26 +0200)]
nvmet: use bdev_nr_bytes instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
3 years agomd: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:19:53 +0000 (13:19 +0200)]
md: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Song Liu <song@kernel.org>
3 years agodm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them
Christoph Hellwig [Mon, 11 Oct 2021 11:07:46 +0000 (13:07 +0200)]
dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them

Use the proper helpers to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
3 years agodrbd: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:03:17 +0000 (13:03 +0200)]
drbd: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
3 years agobcache: remove bdev_sectors
Christoph Hellwig [Mon, 11 Oct 2021 11:04:16 +0000 (13:04 +0200)]
bcache: remove bdev_sectors

Use the equivalent block layer helper instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Coly Li <colyli@suse.de>
3 years agoblock: add a bdev_nr_bytes helper
Christoph Hellwig [Thu, 14 Oct 2021 11:17:57 +0000 (13:17 +0200)]
block: add a bdev_nr_bytes helper

Add a helper to query the size of a block device in bytes.  This
will be used to remove open coded access to ->bd_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
3 years agoblock: move the SECTOR_SIZE related definitions to blk_types.h
Christoph Hellwig [Thu, 14 Oct 2021 11:20:08 +0000 (13:20 +0200)]
block: move the SECTOR_SIZE related definitions to blk_types.h

Ensure these are always available for inlines in the various block layer
headers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
3 years agoblock: only check previous entry for plug merge attempt
Jens Axboe [Thu, 14 Oct 2021 13:24:07 +0000 (07:24 -0600)]
block: only check previous entry for plug merge attempt

Currently we scan the entire plug list, which is potentially very
expensive. In an IOPS bound workload, we can drive about 5.6M IOPS with
merging enabled, and profiling shows that the plug merge check is the
(by far) most expensive thing we're doing:

  Overhead  Command   Shared Object     Symbol
  +   20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
  +    4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
  +    4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
  +    4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

Instead of browsing the whole list, just check the previously inserted
entry. That is enough for a naive merge check and will catch most cases,
and for devices that need full merging, the IO scheduler attached to
such devices will do that anyway. The plug merge is meant to be an
inexpensive check to avoid getting a request, but if we repeatedly
scan the list for every single insert, it is very much not a cheap
check.

With this patch, the workload instead runs at ~7.0M IOPS, providing
a 25% improvement. Disabling merging entirely yields another 5%
improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move CONFIG_BLOCK guard to top Makefile
Masahiro Yamada [Mon, 27 Sep 2021 14:00:00 +0000 (23:00 +0900)]
block: move CONFIG_BLOCK guard to top Makefile

Every object under block/ depends on CONFIG_BLOCK.

Move the guard to the top Makefile since there is no point to
descend into block/ if CONFIG_BLOCK=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-5-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move menu "Partition type" to block/partitions/Kconfig
Masahiro Yamada [Mon, 27 Sep 2021 13:59:59 +0000 (22:59 +0900)]
block: move menu "Partition type" to block/partitions/Kconfig

Move the menu to the relevant place.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-4-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: simplify Kconfig files
Masahiro Yamada [Mon, 27 Sep 2021 13:59:58 +0000 (22:59 +0900)]
block: simplify Kconfig files

Everything under block/ depends on BLOCK. BLOCK_HOLDER_DEPRECATED is
selected from drivers/md/Kconfig, which is entirely dependent on BLOCK.

Extend the 'if BLOCK' ... 'endif' so it covers the whole block/Kconfig.

Also, clean up the definition of BLOCK_COMPAT and BLK_MQ_PCI because
COMPAT and PCI are boolean.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-3-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove redundant =y from BLK_CGROUP dependency
Masahiro Yamada [Mon, 27 Sep 2021 13:59:57 +0000 (22:59 +0900)]
block: remove redundant =y from BLK_CGROUP dependency

CONFIG_BLK_CGROUP is a boolean option, that is, its value is 'y' or 'n'.
The comparison to 'y' is redundant.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-2-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: improve batched tag allocation
Jens Axboe [Sat, 9 Oct 2021 19:10:39 +0000 (13:10 -0600)]
block: improve batched tag allocation

Add a blk_mq_get_tags() helper, which uses the new sbitmap API for
allocating a batch of tags all at once. This both simplifies the block
code for batched allocation, and it is also more efficient than just
doing repeated calls into __sbitmap_queue_get().

This reduces the sbitmap overhead in peak runs from ~3% to ~1% and
yields a performanc increase from 6.6M IOPS to 6.8M IOPS for a single
CPU core.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agosbitmap: add __sbitmap_queue_get_batch()
Jens Axboe [Sat, 9 Oct 2021 19:02:23 +0000 (13:02 -0600)]
sbitmap: add __sbitmap_queue_get_batch()

The block layer tag allocation batching still calls into sbitmap to get
each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
which returns a mask of tags all at once, along with an offset for
those tags.

An example return would be 0xff, where bits 0..7 are set, with
tag_offset == 128. The valid tags in this case would be 128..135.

A batch is specific to an individual sbitmap_map, hence it cannot be
larger than that. The requested number of tags is automatically reduced
to the max that can be satisfied with a single map.

On failure, 0 is returned. Caller should fall back to single tag
allocation at that point/

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: optimise *end_request non-stat path
Pavel Begunkov [Wed, 13 Oct 2021 08:57:13 +0000 (09:57 +0100)]
blk-mq: optimise *end_request non-stat path

We already have a blk_mq_need_time_stamp() check in
__blk_mq_end_request() to get a timestamp, hide all the statistics
accounting under it. It cuts some cycles for requests that don't need
stats, and is free otherwise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e0f2ea812e93a8adcd07101212e7d7e70ca304e7.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: mark bio_truncate static
Christoph Hellwig [Tue, 12 Oct 2021 16:18:04 +0000 (18:18 +0200)]
block: mark bio_truncate static

bio_truncate is only used in bio.c, so mark it static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move bio_get_{first,last}_bvec out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:18:03 +0000 (18:18 +0200)]
block: move bio_get_{first,last}_bvec out of bio.h

bio_get_first_bvec and bio_get_last_bvec are only used in blk-merge.c,
so move them there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: mark __bio_try_merge_page static
Christoph Hellwig [Tue, 12 Oct 2021 16:18:02 +0000 (18:18 +0200)]
block: mark __bio_try_merge_page static

Mark __bio_try_merge_page static and move it up a bit to avoid the need
for a forward declaration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move bio_full out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:18:01 +0000 (18:18 +0200)]
block: move bio_full out of bio.h

bio_full is only used in bio.c, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: fold bio_cur_bytes into blk_rq_cur_bytes
Christoph Hellwig [Tue, 12 Oct 2021 16:18:00 +0000 (18:18 +0200)]
block: fold bio_cur_bytes into blk_rq_cur_bytes

Fold bio_cur_bytes into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move bio_mergeable out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:17:59 +0000 (18:17 +0200)]
block: move bio_mergeable out of bio.h

bio_mergeable is only needed by I/O schedulers, so move it to
blk-mq-sched.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: don't include <linux/ioprio.h> in <linux/bio.h>
Christoph Hellwig [Tue, 12 Oct 2021 16:17:58 +0000 (18:17 +0200)]
block: don't include <linux/ioprio.h> in <linux/bio.h>

bio.h doesn't need any of the definitions from ioprio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove BIO_BUG_ON
Christoph Hellwig [Tue, 12 Oct 2021 16:17:57 +0000 (18:17 +0200)]
block: remove BIO_BUG_ON

BIO_DEBUG is always defined, so just switch the two instances to use
BUG_ON directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: inline hot part of __blk_mq_sched_restart
Pavel Begunkov [Sat, 9 Oct 2021 12:25:42 +0000 (13:25 +0100)]
blk-mq: inline hot part of __blk_mq_sched_restart

Extract a fast check out of __block_mq_sched_restart() and inline it for
performance reasons.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/894abaa0998e5999f2fe18f271e5efdfc2c32bd2.1633781740.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: inline hot paths of blk_account_io_*()
Pavel Begunkov [Sat, 9 Oct 2021 12:25:41 +0000 (13:25 +0100)]
block: inline hot paths of blk_account_io_*()

Extract hot paths of __blk_account_io_start() and
__blk_account_io_done() into inline functions, so we don't always pay
for function calls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: merge block_ioctl into blkdev_ioctl
Christoph Hellwig [Tue, 12 Oct 2021 10:44:50 +0000 (12:44 +0200)]
block: merge block_ioctl into blkdev_ioctl

Simplify the ioctl path and match the code structure on the compat side.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move the *blkdev_ioctl declarations out of blkdev.h
Christoph Hellwig [Tue, 12 Oct 2021 10:44:49 +0000 (12:44 +0200)]
block: move the *blkdev_ioctl declarations out of blkdev.h

These are only used inside of block/.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: unexport blkdev_ioctl
Christoph Hellwig [Tue, 12 Oct 2021 10:44:48 +0000 (12:44 +0200)]
block: unexport blkdev_ioctl

With the raw driver gone, there is no modular user left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: cleanup blk_mq_submit_bio
Christoph Hellwig [Tue, 12 Oct 2021 10:40:45 +0000 (12:40 +0200)]
blk-mq: cleanup blk_mq_submit_bio

Move the blk_mq_alloc_data stack allocation only into the branch
that actually needs it, and use rq->mq_hctx instead of data.hctx
to refer to the hctx.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104045.658051-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: cleanup and rename __blk_mq_alloc_request
Christoph Hellwig [Tue, 12 Oct 2021 10:40:44 +0000 (12:40 +0200)]
blk-mq: cleanup and rename __blk_mq_alloc_request

The newly added loop for the cached requests in __blk_mq_alloc_request
is a little too convoluted for my taste, so unwind it a bit.  Also
rename the function to __blk_mq_alloc_requests now that it can allocate
more than a single request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104045.658051-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: pre-allocate requests if plug is started and is a batch
Jens Axboe [Wed, 6 Oct 2021 12:34:11 +0000 (06:34 -0600)]
block: pre-allocate requests if plug is started and is a batch

The caller typically has a good (or even exact) idea of how many requests
it needs to submit. We can make the request/tag allocation a lot more
efficient if we just allocate N requests/tags upfront when we queue the
first bio from the batch.

Provide a new plug start helper that allows the caller to specify how many
IOs are expected. This sets plug->nr_ios, and we can use that for smarter
request allocation. The plug provides a holding spot for requests, and
request allocation will check it before calling into the normal request
allocation path.

The blk_finish_plug() is called, check if there are unused requests and
free them. This should not happen in normal operations. The exception is
if we get merging, then we may be left with requests that need freeing
when done.

This raises the per-core performance on my setup from ~5.8M to ~6.1M
IOPS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: bump max plugged deferred size from 16 to 32
Jens Axboe [Wed, 6 Oct 2021 18:01:07 +0000 (12:01 -0600)]
block: bump max plugged deferred size from 16 to 32

Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive. But can be noticed even on
native hardware.

Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.

While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or
should use.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: inherit request start time from bio for BLK_CGROUP
Jens Axboe [Tue, 5 Oct 2021 15:23:59 +0000 (09:23 -0600)]
block: inherit request start time from bio for BLK_CGROUP

Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
time just doing ktime_get_ns() -> readtsc. We essentially read and
set the start time twice, one for the bio and then again when that bio
is mapped to a request.

Given that the time between the two is very short, inherit the bio
start time instead of reading it again. This cuts 1/3rd of the overhead
of the time keeping.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move blk-throtl fast path inline
Jens Axboe [Tue, 5 Oct 2021 15:11:56 +0000 (09:11 -0600)]
block: move blk-throtl fast path inline

Even if no policies are defined, we spend ~2% of the total IO time
checking. Move the fast path inline.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Change shared sbitmap naming to shared tags
John Garry [Tue, 5 Oct 2021 10:23:39 +0000 (18:23 +0800)]
blk-mq: Change shared sbitmap naming to shared tags

Now that shared sbitmap support really means shared tags, rename symbols
to match that.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Stop using pointers for blk_mq_tags bitmap tags
John Garry [Tue, 5 Oct 2021 10:23:38 +0000 (18:23 +0800)]
blk-mq: Stop using pointers for blk_mq_tags bitmap tags

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae03cdd ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is removed also, since it would be only
a wrappper for blk_mq_init_bitmaps().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Use shared tags for shared sbitmap support
John Garry [Tue, 5 Oct 2021 10:23:37 +0000 (18:23 +0800)]
blk-mq: Use shared tags for shared sbitmap support

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full sets of static requests are used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
John Garry [Tue, 5 Oct 2021 10:23:36 +0000 (18:23 +0800)]
blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()

Refactor blk_mq_free_map_and_requests() such that it can be used at many
sites at which the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Add blk_mq_alloc_map_and_rqs()
John Garry [Tue, 5 Oct 2021 10:23:35 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_alloc_map_and_rqs()

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some function only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
John Garry [Tue, 5 Oct 2021 10:23:34 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()

Put the functionality to update the sched shared sbitmap size in a common
function.

Since the same formula is always used to resize, and it can be got from
the request queue argument, so just pass the request queue pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Don't clear driver tags own mapping
John Garry [Tue, 5 Oct 2021 10:23:33 +0000 (18:23 +0800)]
blk-mq: Don't clear driver tags own mapping

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in driver tags rqs[].

But there is no need for a driver tags to clear its own mapping, so skip
clearing the mapping in this scenario.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
John Garry [Tue, 5 Oct 2021 10:23:32 +0000 (18:23 +0800)]
blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
John Garry [Tue, 5 Oct 2021 10:23:31 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
John Garry [Tue, 5 Oct 2021 10:23:30 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()

Function blk_mq_sched_alloc_tags() does same as
__blk_mq_alloc_map_and_request(), so give a similar name to be consistent.

Similarly rename label err_free_tags -> err_free_map_and_rqs.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Invert check in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:29 +0000 (18:23 +0800)]
blk-mq: Invert check in blk_mq_update_nr_requests()

It's easier to read:

if (x)
X;
else
Y;

over:

if (!x)
Y;
else
X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:28 +0000 (18:23 +0800)]
blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()

For shared sbitmap, if the call to blk_mq_tag_update_depth() was
successful for any hctx when hctx->sched_tags is not set, then it would be
successful for all (due to nature in which blk_mq_tag_update_depth()
fails).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call until after the hctx iteration under the
!q->elevator check, which is equivalent (to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
John Garry [Tue, 5 Oct 2021 10:23:27 +0000 (18:23 +0800)]
block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ

It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as
the name BLKDEV_MAX_RQ would imply the max requests always, which it is
not.

Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
the default number of requests assigned when allocating a request queue.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Change rqs check in blk_mq_free_rqs()
John Garry [Tue, 5 Oct 2021 10:23:26 +0000 (18:23 +0800)]
blk-mq: Change rqs check in blk_mq_free_rqs()

The original code in commit 24d2f90309b23 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check non-NULL
tags->rqs.

Check tags->static_rqs as non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: print the current process in handle_bad_sector
Christoph Hellwig [Tue, 28 Sep 2021 05:27:55 +0000 (07:27 +0200)]
block: print the current process in handle_bad_sector

Make the bad sector information a little more useful by printing
current->comm to identify the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20210928052755.113016-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Prioritize high-priority requests
Bart Van Assche [Mon, 27 Sep 2021 22:03:28 +0000 (15:03 -0700)]
block/mq-deadline: Prioritize high-priority requests

In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
Prioritize high-priority requests""), this patch uses 'jiffies' instead
of ktime_get() in the code for aging lower priority requests.

This patch has been tested as follows:

Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: 555 K IOPS.

Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: about 380 K IOPS.

Ran the following script:

set -e
scriptdir=$(dirname "$0")
if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
modprobe scsi_debug ndelay=1000000 max_queue=16
sd=''
while [ -z "$sd" ]; do
  sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done
echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
if [ -e /sys/fs/cgroup/io.prio.class ]; then
  cd /sys/fs/cgroup
  echo restrict-to-be >io.prio.class
  echo +io > cgroup.subtree_control
else
  cd /sys/fs/cgroup/blkio/
  echo restrict-to-be >blkio.prio.class
fi
echo $$ >cgroup.procs
mkdir -p hipri
cd hipri
if [ -e io.prio.class ]; then
  echo none-to-rt >io.prio.class
else
  echo none-to-rt >blkio.prio.class
fi
{ "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
echo $$ >cgroup.procs
"${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt

Result:
* 11000 IOPS for the high-priority job
*    40 IOPS for the low-priority job

If the prio aging expiry time is changed from 100s into 0, the IOPS results
change into 6712 and 6796 IOPS.

The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20210927220328.1410161-5-bvanassche@acm.org
[axboe: @latest -> @latest_start]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Stop using per-CPU counters
Bart Van Assche [Mon, 27 Sep 2021 22:03:27 +0000 (15:03 -0700)]
block/mq-deadline: Stop using per-CPU counters

Calculating the sum over all CPUs of per-CPU counters frequently is
inefficient. Hence switch from per-CPU to individual counters. Three
counters are protected by the mq-deadline spinlock since these are
only accessed from contexts that already hold that spinlock. The fourth
counter is atomic because protecting it with the mq-deadline spinlock
would trigger lock contention.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Add an invariant check
Bart Van Assche [Mon, 27 Sep 2021 22:03:26 +0000 (15:03 -0700)]
block/mq-deadline: Add an invariant check

Check a statistics invariant at module unload time. When running
blktests, the invariant is verified every time a request queue is
removed and hence is verified at least once per test.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Improve request accounting further
Bart Van Assche [Mon, 27 Sep 2021 22:03:25 +0000 (15:03 -0700)]
block/mq-deadline: Improve request accounting further

The scheduler .insert_requests() callback is called when a request is
queued for the first time and also when it is requeued. Only count a
request the first time it is queued. Additionally, since the mq-deadline
scheduler only performs zone locking for requests that have been
inserted, skip the zone unlock code for requests that have not been
inserted into the mq-deadline scheduler.

Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move struct request to blk-mq.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:28 +0000 (14:33 +0200)]
block: move struct request to blk-mq.h

struct request is only used by blk-mq drivers, so move it and all
related declarations to blk-mq.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move integrity handling out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:27 +0000 (14:33 +0200)]
block: move integrity handling out of <linux/blkdev.h>

Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move a few merge helpers out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:26 +0000 (14:33 +0200)]
block: move a few merge helpers out of <linux/blkdev.h>

These are block-layer internal helpers, so move them to block/blk.h and
block/blk-merge.c.  Also update a comment a bit to use better grammar.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drop unused includes in <linux/genhd.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:25 +0000 (14:33 +0200)]
block: drop unused includes in <linux/genhd.h>

Drop various include not actually used in genhd.h itself, and
move the remaning includes closer together.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drop unused includes in <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:24 +0000 (14:33 +0200)]
block: drop unused includes in <linux/blkdev.h>

Drop various include not actually used in blkdev.h itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move elevator.h to block/
Christoph Hellwig [Mon, 20 Sep 2021 12:33:23 +0000 (14:33 +0200)]
block: move elevator.h to block/

Except for the features passed to blk_queue_required_elevator_features,
elevator.h is only needed internally to the block layer.  Move the
ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to
block/, dropping all the spurious includes outside of that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the struct blk_queue_ctx forward declaration
Christoph Hellwig [Mon, 20 Sep 2021 12:33:22 +0000 (14:33 +0200)]
block: remove the struct blk_queue_ctx forward declaration

This type doesn't exist at all, so no need to forward declare it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the cmd_size field from struct request_queue
Christoph Hellwig [Mon, 20 Sep 2021 12:33:21 +0000 (14:33 +0200)]
block: remove the cmd_size field from struct request_queue

Entirely unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the unused blk_queue_state enum
Christoph Hellwig [Mon, 20 Sep 2021 12:33:20 +0000 (14:33 +0200)]
block: remove the unused blk_queue_state enum

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the unused rq_end_sector macro
Christoph Hellwig [Mon, 20 Sep 2021 12:33:19 +0000 (14:33 +0200)]
block: remove the unused rq_end_sector macro

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agosched: move the <linux/blkdev.h> include out of kernel/sched/sched.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:18 +0000 (14:33 +0200)]
sched: move the <linux/blkdev.h> include out of kernel/sched/sched.h

Only core.c needs blkdev.h, so move the #include statement there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:17 +0000 (14:33 +0200)]
kernel: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoarch: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:16 +0000 (14:33 +0200)]
arch: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:15 +0000 (14:33 +0200)]
mm: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blkdev.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:14 +0000 (14:33 +0200)]
mm: don't include <linux/blkdev.h> in <linux/backing-dev.h>

Move inode_to_bdi out of line to avoid having to include blkdev.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:13 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:12 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>

blk-cgroup.h pulls in blkdev.h and thus pretty much all the block
headers.  Break this dependency chain by turning wbc_blkcg_css into a
macro and dropping the blk-cgroup.h include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoLinux 5.15-rc3
Linus Torvalds [Sun, 26 Sep 2021 21:08:19 +0000 (14:08 -0700)]
Linux 5.15-rc3

3 years agoMerge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sun, 26 Sep 2021 19:46:45 +0000 (12:46 -0700)]
Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
 "Five fixes for the ksmbd kernel server, including three security
  fixes:

   - remove follow symlinks support

   - use LOOKUP_BENEATH to prevent out of share access

   - SMB3 compounding security fix

   - fix for returning the default streams correctly, fixing a bug when
     writing ppt or doc files from some clients

   - logging more clearly that ksmbd is experimental (at module load
     time)"

* tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: use LOOKUP_BENEATH to prevent the out of share access
  ksmbd: remove follow symlinks support
  ksmbd: check protocol id in ksmbd_verify_smb_message()
  ksmbd: add default data stream name in FILE_STREAM_INFORMATION
  ksmbd: log that server is experimental at module load

3 years agoMerge tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Sep 2021 19:18:10 +0000 (12:18 -0700)]
Merge tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:
 "Fix two EDAC drivers using the wrong value type for the DIMM mode"

* tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/dmc520: Assign the proper type to dimm->edac_mode
  EDAC/synopsys: Fix wrong value type assignment for edac_mode

3 years agoMerge tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/therma...
Linus Torvalds [Sun, 26 Sep 2021 19:11:58 +0000 (12:11 -0700)]
Merge tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal fixes from Daniel Lezcano:

 - Fix thermal shutdown after a suspend/resume due to a wrong TCC value
   restored on Intel platform (Antoine Tenart)

 - Fix potential buffer overflow when building the list of policies. The
   buffer size is not updated after writing to it (Dan Carpenter)

 - Fix wrong check against IS_ERR instead of NULL (Ansuel Smith)

* tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
  thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
  thermal/drivers/int340x: Do not set a wrong tcc offset on resume

3 years agoMerge tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 26 Sep 2021 17:09:20 +0000 (10:09 -0700)]
Merge tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Prevent sending the wrong signal when protection keys are enabled
     and the kernel handles a fault in the vsyscall emulation.

   - Invoke early_reserve_memory() before invoking e820_memory_setup()
     which is required to make the Xen dom0 e820 hooks work correctly.

   - Use the correct data type for the SETZ operand in the EMQCMDS
     instruction wrapper.

   - Prevent undefined behaviour to the potential unaligned accesss in
     the instruction decoder library"

* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
  x86/asm: Fix SETZ size enqcmds() build failure
  x86/setup: Call early_reserve_memory() earlier
  x86/fault: Fix wrong signal when vsyscall fails with pkey

3 years agoMerge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Sep 2021 17:00:16 +0000 (10:00 -0700)]
Merge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the recently introduced regression in posix CPU
  timers which failed to stop the timer when requested. That caused
  unexpected signals to be sent to the process/thread causing
  malfunction"

* tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Prevent spuriously armed 0-value itimer

3 years agoMerge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 26 Sep 2021 16:55:22 +0000 (09:55 -0700)]
Merge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Work around a bad GIC integration on a Renesas platform which can't
     handle byte-sized MMIO access

   - Plug a potential memory leak in the GICv4 driver

   - Fix a regression in the Armada 370-XP IPI code which was caused by
     issuing EOI instack of ACK.

   - A couple of small fixes here and there"

* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic: Work around broken Renesas integration
  irqchip/renesas-rza1: Use semicolons instead of commas
  irqchip/gic-v3-its: Fix potential VPE leak on error
  irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
  irqchip/mbigen: Repair non-kernel-doc notation
  irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
  irqchip/armada-370-xp: Fix ack/eoi breakage
  Documentation: Fix irq-domain.rst build warning