]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
3 years agopartitions/ibm: use bdev_nr_sectors instead of open coding it block-kill-bd_inode
Christoph Hellwig [Tue, 12 Oct 2021 08:36:52 +0000 (10:36 +0200)]
partitions/ibm: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agopartitions/efi: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:34:27 +0000 (10:34 +0200)]
partitions/efi: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock/ioctl: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:33:30 +0000 (10:33 +0200)]
block/ioctl: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agomtd/block2mtd: use bdev_read_cache_page in read_part_sector
Christoph Hellwig [Tue, 12 Oct 2021 08:29:00 +0000 (10:29 +0200)]
mtd/block2mtd: use bdev_read_cache_page in read_part_sector

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoscsi: use bdev_read_cache_page in scsi_bios_ptable
Christoph Hellwig [Tue, 12 Oct 2021 08:13:17 +0000 (10:13 +0200)]
scsi: use bdev_read_cache_page in scsi_bios_ptable

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock: use bdev_read_cache_page in read_part_sector
Christoph Hellwig [Tue, 12 Oct 2021 08:11:56 +0000 (10:11 +0200)]
block: use bdev_read_cache_page in read_part_sector

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoerofs: use bdev_read_cache_page
Christoph Hellwig [Tue, 12 Oct 2021 08:07:45 +0000 (10:07 +0200)]
erofs: use bdev_read_cache_page

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agocramfs: use bdev_read_cache_page
Christoph Hellwig [Tue, 12 Oct 2021 08:17:37 +0000 (10:17 +0200)]
cramfs: use bdev_read_cache_page

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agobtrfs: use bdev_read_cache_page
Christoph Hellwig [Tue, 12 Oct 2021 08:04:11 +0000 (10:04 +0200)]
btrfs: use bdev_read_cache_page

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agobcache: use bdev_read_cache_page
Christoph Hellwig [Tue, 12 Oct 2021 08:02:32 +0000 (10:02 +0200)]
bcache: use bdev_read_cache_page

Use bdev_read_cache_page instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock: add a bdev_read_cache_page helper
Christoph Hellwig [Tue, 12 Oct 2021 08:01:51 +0000 (10:01 +0200)]
block: add a bdev_read_cache_page helper

Add a helper to read a pagecache page from a block device.  This allows
to hide accesses to ->bd_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoudf: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 15:01:25 +0000 (17:01 +0200)]
udf: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoreiserfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:58:24 +0000 (16:58 +0200)]
reiserfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agontfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:55:32 +0000 (16:55 +0200)]
ntfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agojfs: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:49:33 +0000 (16:49 +0200)]
jfs: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoext4: use sb_bdev_nr_blocks
Christoph Hellwig [Mon, 11 Oct 2021 14:47:47 +0000 (16:47 +0200)]
ext4: use sb_bdev_nr_blocks

Use the sb_bdev_nr_blocks helper instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock: add a sb_bdev_nr_blocks helper
Christoph Hellwig [Mon, 11 Oct 2021 14:45:59 +0000 (16:45 +0200)]
block: add a sb_bdev_nr_blocks helper

Add a helper to return the size of sb->s_bdev in sb->s_blocksize_bits
based unites.  Note that SECTOR_SHIFT has to be open coded due to
include dependency issues for now, but I have a plan to sort that out
eventually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock: use bdev_nr_sectors instead of open coding it in blkdev_fallocate
Christoph Hellwig [Mon, 11 Oct 2021 14:41:37 +0000 (16:41 +0200)]
block: use bdev_nr_sectors instead of open coding it in blkdev_fallocate

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agosquashfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:59:54 +0000 (16:59 +0200)]
squashfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoreiserfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:57:23 +0000 (16:57 +0200)]
reiserfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size and remove two
cargo culted checks that can't be false.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agopstore/blk: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:24:50 +0000 (10:24 +0200)]
pstore/blk: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agontfs3: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:21:28 +0000 (10:21 +0200)]
ntfs3: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonilfs2: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:53:27 +0000 (16:53 +0200)]
nilfs2: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonfs/blocklayout: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:51:48 +0000 (16:51 +0200)]
nfs/blocklayout: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agojfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 14:50:44 +0000 (16:50 +0200)]
jfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agohfsplus: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:22:50 +0000 (13:22 +0200)]
hfsplus: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agohfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:22:17 +0000 (13:22 +0200)]
hfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agofat: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:23:47 +0000 (13:23 +0200)]
fat: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agocramfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Tue, 12 Oct 2021 08:16:15 +0000 (10:16 +0200)]
cramfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agobtrfs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:33:41 +0000 (13:33 +0200)]
btrfs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoaffs: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:20:59 +0000 (13:20 +0200)]
affs: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agofs: simplify init_page_buffers
Christoph Hellwig [Mon, 11 Oct 2021 15:09:05 +0000 (17:09 +0200)]
fs: simplify init_page_buffers

No need to convert from bdev to inode and back.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agofs: use bdev_nr_sectors instead of open coding it in blkdev_max_block
Christoph Hellwig [Mon, 11 Oct 2021 11:36:06 +0000 (13:36 +0200)]
fs: use bdev_nr_sectors instead of open coding it in blkdev_max_block

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agotarget/iblock: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:28:23 +0000 (13:28 +0200)]
target/iblock: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:26:29 +0000 (13:26 +0200)]
nvmet: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agomtd/block2mtd: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:25:32 +0000 (13:25 +0200)]
mtd/block2mtd: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agomd: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:19:53 +0000 (13:19 +0200)]
md: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agodm: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:07:46 +0000 (13:07 +0200)]
dm: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agodrbd: use bdev_nr_sectors instead of open coding it
Christoph Hellwig [Mon, 11 Oct 2021 11:03:17 +0000 (13:03 +0200)]
drbd: use bdev_nr_sectors instead of open coding it

Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agobcache: remove bdev_sectors
Christoph Hellwig [Mon, 11 Oct 2021 11:04:16 +0000 (13:04 +0200)]
bcache: remove bdev_sectors

Use the equivalent block layer helper instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblock: pre-allocate requests if plug is started and is a batch
Jens Axboe [Wed, 6 Oct 2021 12:34:11 +0000 (06:34 -0600)]
block: pre-allocate requests if plug is started and is a batch

The caller typically has a good (or even exact) idea of how many requests
it needs to submit. We can make the request/tag allocation a lot more
efficient if we just allocate N requests/tags upfront when we queue the
first bio from the batch.

Provide a new plug start helper that allows the caller to specify how many
IOs are expected. This sets plug->nr_ios, and we can use that for smarter
request allocation. The plug provides a holding spot for requests, and
request allocation will check it before calling into the normal request
allocation path.

The blk_finish_plug() is called, check if there are unused requests and
free them. This should not happen in normal operations. The exception is
if we get merging, then we may be left with requests that need freeing
when done.

This raises the per-core performance on my setup from ~5.8M to ~6.1M
IOPS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: bump max plugged deferred size from 16 to 32
Jens Axboe [Wed, 6 Oct 2021 18:01:07 +0000 (12:01 -0600)]
block: bump max plugged deferred size from 16 to 32

Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive. But can be noticed even on
native hardware.

Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.

While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or
should use.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: inherit request start time from bio for BLK_CGROUP
Jens Axboe [Tue, 5 Oct 2021 15:23:59 +0000 (09:23 -0600)]
block: inherit request start time from bio for BLK_CGROUP

Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
time just doing ktime_get_ns() -> readtsc. We essentially read and
set the start time twice, one for the bio and then again when that bio
is mapped to a request.

Given that the time between the two is very short, inherit the bio
start time instead of reading it again. This cuts 1/3rd of the overhead
of the time keeping.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move blk-throtl fast path inline
Jens Axboe [Tue, 5 Oct 2021 15:11:56 +0000 (09:11 -0600)]
block: move blk-throtl fast path inline

Even if no policies are defined, we spend ~2% of the total IO time
checking. Move the fast path inline.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Change shared sbitmap naming to shared tags
John Garry [Tue, 5 Oct 2021 10:23:39 +0000 (18:23 +0800)]
blk-mq: Change shared sbitmap naming to shared tags

Now that shared sbitmap support really means shared tags, rename symbols
to match that.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Stop using pointers for blk_mq_tags bitmap tags
John Garry [Tue, 5 Oct 2021 10:23:38 +0000 (18:23 +0800)]
blk-mq: Stop using pointers for blk_mq_tags bitmap tags

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae03cdd ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is removed also, since it would be only
a wrappper for blk_mq_init_bitmaps().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Use shared tags for shared sbitmap support
John Garry [Tue, 5 Oct 2021 10:23:37 +0000 (18:23 +0800)]
blk-mq: Use shared tags for shared sbitmap support

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full sets of static requests are used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
John Garry [Tue, 5 Oct 2021 10:23:36 +0000 (18:23 +0800)]
blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()

Refactor blk_mq_free_map_and_requests() such that it can be used at many
sites at which the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Add blk_mq_alloc_map_and_rqs()
John Garry [Tue, 5 Oct 2021 10:23:35 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_alloc_map_and_rqs()

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some function only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
John Garry [Tue, 5 Oct 2021 10:23:34 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()

Put the functionality to update the sched shared sbitmap size in a common
function.

Since the same formula is always used to resize, and it can be got from
the request queue argument, so just pass the request queue pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Don't clear driver tags own mapping
John Garry [Tue, 5 Oct 2021 10:23:33 +0000 (18:23 +0800)]
blk-mq: Don't clear driver tags own mapping

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in driver tags rqs[].

But there is no need for a driver tags to clear its own mapping, so skip
clearing the mapping in this scenario.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
John Garry [Tue, 5 Oct 2021 10:23:32 +0000 (18:23 +0800)]
blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
John Garry [Tue, 5 Oct 2021 10:23:31 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
John Garry [Tue, 5 Oct 2021 10:23:30 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()

Function blk_mq_sched_alloc_tags() does same as
__blk_mq_alloc_map_and_request(), so give a similar name to be consistent.

Similarly rename label err_free_tags -> err_free_map_and_rqs.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Invert check in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:29 +0000 (18:23 +0800)]
blk-mq: Invert check in blk_mq_update_nr_requests()

It's easier to read:

if (x)
X;
else
Y;

over:

if (!x)
Y;
else
X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:28 +0000 (18:23 +0800)]
blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()

For shared sbitmap, if the call to blk_mq_tag_update_depth() was
successful for any hctx when hctx->sched_tags is not set, then it would be
successful for all (due to nature in which blk_mq_tag_update_depth()
fails).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call until after the hctx iteration under the
!q->elevator check, which is equivalent (to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
John Garry [Tue, 5 Oct 2021 10:23:27 +0000 (18:23 +0800)]
block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ

It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as
the name BLKDEV_MAX_RQ would imply the max requests always, which it is
not.

Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
the default number of requests assigned when allocating a request queue.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblk-mq: Change rqs check in blk_mq_free_rqs()
John Garry [Tue, 5 Oct 2021 10:23:26 +0000 (18:23 +0800)]
blk-mq: Change rqs check in blk_mq_free_rqs()

The original code in commit 24d2f90309b23 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check non-NULL
tags->rqs.

Check tags->static_rqs as non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: print the current process in handle_bad_sector
Christoph Hellwig [Tue, 28 Sep 2021 05:27:55 +0000 (07:27 +0200)]
block: print the current process in handle_bad_sector

Make the bad sector information a little more useful by printing
current->comm to identify the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20210928052755.113016-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Prioritize high-priority requests
Bart Van Assche [Mon, 27 Sep 2021 22:03:28 +0000 (15:03 -0700)]
block/mq-deadline: Prioritize high-priority requests

In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
Prioritize high-priority requests""), this patch uses 'jiffies' instead
of ktime_get() in the code for aging lower priority requests.

This patch has been tested as follows:

Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: 555 K IOPS.

Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: about 380 K IOPS.

Ran the following script:

set -e
scriptdir=$(dirname "$0")
if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
modprobe scsi_debug ndelay=1000000 max_queue=16
sd=''
while [ -z "$sd" ]; do
  sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done
echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
if [ -e /sys/fs/cgroup/io.prio.class ]; then
  cd /sys/fs/cgroup
  echo restrict-to-be >io.prio.class
  echo +io > cgroup.subtree_control
else
  cd /sys/fs/cgroup/blkio/
  echo restrict-to-be >blkio.prio.class
fi
echo $$ >cgroup.procs
mkdir -p hipri
cd hipri
if [ -e io.prio.class ]; then
  echo none-to-rt >io.prio.class
else
  echo none-to-rt >blkio.prio.class
fi
{ "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
echo $$ >cgroup.procs
"${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt

Result:
* 11000 IOPS for the high-priority job
*    40 IOPS for the low-priority job

If the prio aging expiry time is changed from 100s into 0, the IOPS results
change into 6712 and 6796 IOPS.

The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20210927220328.1410161-5-bvanassche@acm.org
[axboe: @latest -> @latest_start]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Stop using per-CPU counters
Bart Van Assche [Mon, 27 Sep 2021 22:03:27 +0000 (15:03 -0700)]
block/mq-deadline: Stop using per-CPU counters

Calculating the sum over all CPUs of per-CPU counters frequently is
inefficient. Hence switch from per-CPU to individual counters. Three
counters are protected by the mq-deadline spinlock since these are
only accessed from contexts that already hold that spinlock. The fourth
counter is atomic because protecting it with the mq-deadline spinlock
would trigger lock contention.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Add an invariant check
Bart Van Assche [Mon, 27 Sep 2021 22:03:26 +0000 (15:03 -0700)]
block/mq-deadline: Add an invariant check

Check a statistics invariant at module unload time. When running
blktests, the invariant is verified every time a request queue is
removed and hence is verified at least once per test.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock/mq-deadline: Improve request accounting further
Bart Van Assche [Mon, 27 Sep 2021 22:03:25 +0000 (15:03 -0700)]
block/mq-deadline: Improve request accounting further

The scheduler .insert_requests() callback is called when a request is
queued for the first time and also when it is requeued. Only count a
request the first time it is queued. Additionally, since the mq-deadline
scheduler only performs zone locking for requests that have been
inserted, skip the zone unlock code for requests that have not been
inserted into the mq-deadline scheduler.

Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move struct request to blk-mq.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:28 +0000 (14:33 +0200)]
block: move struct request to blk-mq.h

struct request is only used by blk-mq drivers, so move it and all
related declarations to blk-mq.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move integrity handling out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:27 +0000 (14:33 +0200)]
block: move integrity handling out of <linux/blkdev.h>

Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move a few merge helpers out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:26 +0000 (14:33 +0200)]
block: move a few merge helpers out of <linux/blkdev.h>

These are block-layer internal helpers, so move them to block/blk.h and
block/blk-merge.c.  Also update a comment a bit to use better grammar.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drop unused includes in <linux/genhd.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:25 +0000 (14:33 +0200)]
block: drop unused includes in <linux/genhd.h>

Drop various include not actually used in genhd.h itself, and
move the remaning includes closer together.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: drop unused includes in <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:24 +0000 (14:33 +0200)]
block: drop unused includes in <linux/blkdev.h>

Drop various include not actually used in blkdev.h itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: move elevator.h to block/
Christoph Hellwig [Mon, 20 Sep 2021 12:33:23 +0000 (14:33 +0200)]
block: move elevator.h to block/

Except for the features passed to blk_queue_required_elevator_features,
elevator.h is only needed internally to the block layer.  Move the
ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to
block/, dropping all the spurious includes outside of that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the struct blk_queue_ctx forward declaration
Christoph Hellwig [Mon, 20 Sep 2021 12:33:22 +0000 (14:33 +0200)]
block: remove the struct blk_queue_ctx forward declaration

This type doesn't exist at all, so no need to forward declare it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the cmd_size field from struct request_queue
Christoph Hellwig [Mon, 20 Sep 2021 12:33:21 +0000 (14:33 +0200)]
block: remove the cmd_size field from struct request_queue

Entirely unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the unused blk_queue_state enum
Christoph Hellwig [Mon, 20 Sep 2021 12:33:20 +0000 (14:33 +0200)]
block: remove the unused blk_queue_state enum

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: remove the unused rq_end_sector macro
Christoph Hellwig [Mon, 20 Sep 2021 12:33:19 +0000 (14:33 +0200)]
block: remove the unused rq_end_sector macro

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agosched: move the <linux/blkdev.h> include out of kernel/sched/sched.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:18 +0000 (14:33 +0200)]
sched: move the <linux/blkdev.h> include out of kernel/sched/sched.h

Only core.c needs blkdev.h, so move the #include statement there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:17 +0000 (14:33 +0200)]
kernel: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoarch: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:16 +0000 (14:33 +0200)]
arch: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:15 +0000 (14:33 +0200)]
mm: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blkdev.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:14 +0000 (14:33 +0200)]
mm: don't include <linux/blkdev.h> in <linux/backing-dev.h>

Move inode_to_bdi out of line to avoid having to include blkdev.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:13 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:12 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>

blk-cgroup.h pulls in blkdev.h and thus pretty much all the block
headers.  Break this dependency chain by turning wbc_blkcg_css into a
macro and dropping the blk-cgroup.h include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoLinux 5.15-rc3
Linus Torvalds [Sun, 26 Sep 2021 21:08:19 +0000 (14:08 -0700)]
Linux 5.15-rc3

3 years agoMerge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sun, 26 Sep 2021 19:46:45 +0000 (12:46 -0700)]
Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
 "Five fixes for the ksmbd kernel server, including three security
  fixes:

   - remove follow symlinks support

   - use LOOKUP_BENEATH to prevent out of share access

   - SMB3 compounding security fix

   - fix for returning the default streams correctly, fixing a bug when
     writing ppt or doc files from some clients

   - logging more clearly that ksmbd is experimental (at module load
     time)"

* tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: use LOOKUP_BENEATH to prevent the out of share access
  ksmbd: remove follow symlinks support
  ksmbd: check protocol id in ksmbd_verify_smb_message()
  ksmbd: add default data stream name in FILE_STREAM_INFORMATION
  ksmbd: log that server is experimental at module load

3 years agoMerge tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Sep 2021 19:18:10 +0000 (12:18 -0700)]
Merge tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:
 "Fix two EDAC drivers using the wrong value type for the DIMM mode"

* tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/dmc520: Assign the proper type to dimm->edac_mode
  EDAC/synopsys: Fix wrong value type assignment for edac_mode

3 years agoMerge tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/therma...
Linus Torvalds [Sun, 26 Sep 2021 19:11:58 +0000 (12:11 -0700)]
Merge tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal fixes from Daniel Lezcano:

 - Fix thermal shutdown after a suspend/resume due to a wrong TCC value
   restored on Intel platform (Antoine Tenart)

 - Fix potential buffer overflow when building the list of policies. The
   buffer size is not updated after writing to it (Dan Carpenter)

 - Fix wrong check against IS_ERR instead of NULL (Ansuel Smith)

* tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
  thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
  thermal/drivers/int340x: Do not set a wrong tcc offset on resume

3 years agoMerge tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 26 Sep 2021 17:09:20 +0000 (10:09 -0700)]
Merge tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Prevent sending the wrong signal when protection keys are enabled
     and the kernel handles a fault in the vsyscall emulation.

   - Invoke early_reserve_memory() before invoking e820_memory_setup()
     which is required to make the Xen dom0 e820 hooks work correctly.

   - Use the correct data type for the SETZ operand in the EMQCMDS
     instruction wrapper.

   - Prevent undefined behaviour to the potential unaligned accesss in
     the instruction decoder library"

* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
  x86/asm: Fix SETZ size enqcmds() build failure
  x86/setup: Call early_reserve_memory() earlier
  x86/fault: Fix wrong signal when vsyscall fails with pkey

3 years agoMerge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Sep 2021 17:00:16 +0000 (10:00 -0700)]
Merge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the recently introduced regression in posix CPU
  timers which failed to stop the timer when requested. That caused
  unexpected signals to be sent to the process/thread causing
  malfunction"

* tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Prevent spuriously armed 0-value itimer

3 years agoMerge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 26 Sep 2021 16:55:22 +0000 (09:55 -0700)]
Merge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Work around a bad GIC integration on a Renesas platform which can't
     handle byte-sized MMIO access

   - Plug a potential memory leak in the GICv4 driver

   - Fix a regression in the Armada 370-XP IPI code which was caused by
     issuing EOI instack of ACK.

   - A couple of small fixes here and there"

* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic: Work around broken Renesas integration
  irqchip/renesas-rza1: Use semicolons instead of commas
  irqchip/gic-v3-its: Fix potential VPE leak on error
  irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
  irqchip/mbigen: Repair non-kernel-doc notation
  irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
  irqchip/armada-370-xp: Fix ack/eoi breakage
  Documentation: Fix irq-domain.rst build warning

3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 25 Sep 2021 23:20:34 +0000 (16:20 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "16 patches.

  Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
  lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
  debug, and pagemap)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix uninitialized use in overcommit_policy_handler
  mm/memory_failure: fix the missing pte_unmap() call
  kasan: always respect CONFIG_KASAN_STACK
  sh: pgtable-3level: fix cast to pointer from integer of different size
  mm/debug: sync up latest migrate_reason to migrate_reason_names
  mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
  mm: fs: invalidate bh_lrus for only cold path
  lib/zlib_inflate/inffast: check config in C to avoid unused function warning
  tools/vm/page-types: remove dependency on opt_file for idle page tracking
  scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
  ocfs2: drop acl cache for directories too
  mm/shmem.c: fix judgment error in shmem_is_huge()
  xtensa: increase size of gcc stack frame check
  mm/damon: don't use strnlen() with known-bogus source length
  kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
  mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 25 Sep 2021 23:05:56 +0000 (16:05 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Thirty-three fixes, I'm afraid.

  Essentially the build up from the last couple of weeks while I've been
  dealling with Linux Plumbers conference infrastructure issues. It's
  mostly the usual assortment of spelling fixes and minor corrections.

  The only core relevant changes are to the sd driver to reduce the spin
  up message spew and fix a small memory leak on the freeing path"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (33 commits)
  scsi: ses: Retry failed Send/Receive Diagnostic commands
  scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT"
  scsi: lpfc: Fix gcc -Wstringop-overread warning, again
  scsi: lpfc: Use correct scnprintf() limit
  scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()
  scsi: core: Remove 'current_tag'
  scsi: acornscsi: Remove tagged queuing vestiges
  scsi: fas216: Kill scmd->tag
  scsi: qla2xxx: Restore initiator in dual mode
  scsi: ufs: core: Unbreak the reset handler
  scsi: sd_zbc: Support disks with more than 2**32 logical blocks
  scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling"
  scsi: bsg: Fix device unregistration
  scsi: sd: Make sd_spinup_disk() less noisy
  scsi: ufs: ufs-pci: Fix Intel LKF link stability
  scsi: mpt3sas: Clean up some inconsistent indenting
  scsi: megaraid: Clean up some inconsistent indenting
  scsi: sr: Fix spelling mistake "does'nt" -> "doesn't"
  scsi: Remove SCSI CDROM MAINTAINERS entry
  scsi: megaraid: Fix Coccinelle warning
  ...

3 years agoMerge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 25 Sep 2021 22:51:08 +0000 (15:51 -0700)]
Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "This one looks a bit bigger than it is, but that's mainly because 2/3
  of it is enabling IORING_OP_CLOSE to close direct file descriptors.

  We've had a few folks using them and finding it confusing that the way
  to close them is through using -1 for file update, this just brings
  API symmetry for direct descriptors. Hence I think we should just do
  this now and have a better API for 5.15 release. There's some room for
  de-duplicating the close code, but we're leaving that for the next
  merge window.

  Outside of that, just small fixes:

   - Poll race fixes (Hao)

   - io-wq core dump exit fix (me)

   - Reschedule around potentially intensive tctx and buffer iterators
     on teardown (me)

   - Fix for always ending up punting files update to io-wq (me)

   - Put the provided buffer meta data under memcg accounting (me)

   - Tweak for io_write(), removing dead code that was added with the
     iterator changes in this release (Pavel)"

* tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
  io_uring: make OP_CLOSE consistent with direct open
  io_uring: kill extra checks in io_write()
  io_uring: don't punt files update to io-wq unconditionally
  io_uring: put provided buffer meta data under memcg accounting
  io_uring: allow conditional reschedule for intensive iterators
  io_uring: fix potential req refcount underflow
  io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
  io_uring: fix race between poll completion and cancel_hash insertion
  io-wq: ensure we exit if thread group is exiting

3 years agoMerge tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 25 Sep 2021 22:44:05 +0000 (15:44 -0700)]
Merge tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - keep ctrl->namespaces ordered (Christoph Hellwig)
      - fix incorrect h2cdata pdu offset accounting in nvme-tcp (Sagi
        Grimberg)
      - handled updated hw_queues in nvme-fc more carefully (Daniel
        Wagner, James Smart)

 - md lock order fix (Christoph)

 - fallocate locking fix (Ming)

 - blktrace UAF fix (Zhihao)

 - rq-qos bio tracking fix (Ming)

* tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
  block: hold ->invalidate_lock in blkdev_fallocate
  blktrace: Fix uaf in blk_trace access after removing by sysfs
  block: don't call rq_qos_ops->done_bio if the bio isn't tracked
  md: fix a lock order reversal in md_alloc
  nvme: keep ctrl->namespaces ordered
  nvme-tcp: fix incorrect h2cdata pdu offset accounting
  nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
  nvme-fc: avoid race between time out and tear down
  nvme-fc: update hardware queues before using them

3 years agoMerge tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Sep 2021 22:37:31 +0000 (15:37 -0700)]
Merge tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some minor cleanups and fixes of some theoretical bugs, as well as a
  fix of a bug introduced in 5.15-rc1"

* tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/x86: fix PV trap handling on secondary processors
  xen/balloon: fix balloon kthread freezing
  swiotlb-xen: this is PV-only on x86
  xen/pci-swiotlb: reduce visibility of symbols
  PCI: only build xen-pcifront in PV-enabled environments
  swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests
  Xen/gntdev: don't ignore kernel unmapping error
  xen/x86: drop redundant zeroing from cpu_initialize_context()

3 years agoMerge tag 'linux-kselftest-fixes-5.15-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 25 Sep 2021 22:30:29 +0000 (15:30 -0700)]
Merge tag 'linux-kselftest-fixes-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

 - fix to Kselftest common framework header install to run before other
   targets for it work correctly in parallel build case.

 - fixes to kvm test to not ignore fscanf() returns which could result
   in inconsistent test behavior and failures.

* tag 'linux-kselftest-fixes-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: kvm: fix get_run_delay() ignoring fscanf() return warn
  selftests: kvm: move get_run_delay() into lib/test_util
  selftests:kvm: fix get_trans_hugepagesz() ignoring fscanf() return warn
  selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
  selftests: be sure to make khdr before other targets

3 years agoMerge tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Sep 2021 18:31:48 +0000 (11:31 -0700)]
Merge tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Two bugfixes to fix the 4KiB blockmap chunk format availability and a
  dangling pointer usage. There is also a trivial cleanup to clarify
  compacted_2b if compacted_4b_initial > totalidx.

  Summary:

   - fix the dangling pointer use in erofs_lookup tracepoint

   - fix unsupported chunk format check

   - zero out compacted_2b if compacted_4b_initial > totalidx"

* tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: clear compacted_2b if compacted_4b_initial > totalidx
  erofs: fix misbehavior of unsupported chunk format check
  erofs: fix up erofs_lookup tracepoint

3 years agoMerge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 25 Sep 2021 18:08:12 +0000 (11:08 -0700)]
Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Six small cifs/smb3 fixes, two for stable:

   - important fix for deferred close (found by a git functional test)
     related to attribute caching on close.

   - four (two cosmetic, two more serious) small fixes for problems
     pointed out by smatch via Dan Carpenter

   - fix for comment formatting problems pointed out by W=1"

* tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix incorrect check for null pointer in header_assemble
  smb3: correct server pointer dereferencing check to be more consistent
  smb3: correct smb3 ACL security descriptor
  cifs: Clear modified attribute bit from inode flags
  cifs: Deal with some warnings from W=1
  cifs: fix a sign extension bug

3 years agoMerge tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 25 Sep 2021 17:29:14 +0000 (10:29 -0700)]
Merge tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char and misc driver fixes for 5.15-rc3.

  Nothing huge in here, just fixes for a number of small issues that
  have been reported. These include:

   - habanalabs race conditions and other bugs fixed

   - binder driver fixes

   - fpga driver fixes

   - coresight build warning fix

   - nvmem driver fix

   - comedi memory leak fix

   - bcm-vk tty race fix

   - other tiny driver fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  comedi: Fix memory leak in compat_insnlist()
  nvmem: NVMEM_NINTENDO_OTP should depend on WII
  misc: bcm-vk: fix tty registration race
  fpga: dfl: Avoid reads to AFU CSRs during enumeration
  fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
  fpga: machxo2-spi: Return an error on failure
  habanalabs: expose a single cs seq in staged submissions
  habanalabs: fix wait offset handling
  habanalabs: rate limit multi CS completion errors
  habanalabs/gaudi: fix LBW RR configuration
  habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK"
  habanalabs: fail collective wait when not supported
  habanalabs/gaudi: use direct MSI in single mode
  habanalabs: fix kernel OOPs related to staged cs
  habanalabs: fix potential race in interrupt wait ioctl
  mcb: fix error handling in mcb_alloc_bus()
  misc: genwqe: Fixes DMA mask setting
  coresight: syscfg: Fix compiler warning
  nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM
  binder: make sure fd closes complete
  ...

3 years agoMerge tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 25 Sep 2021 17:19:49 +0000 (10:19 -0700)]
Merge tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are two small staging driver fixes for 5.15-rc3:

   - greybus tty use-after-free bugfix

   - r8188eu ioctl overlap build warning fix

  Note, the r8188eu ioctl has been entirely removed for 5.16-rc1, but
  it's good to get this fixed now for people using this in 5.15.

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8188eu: fix -Wrestrict warnings
  staging: greybus: uart: fix tty use after free

3 years agoMerge tag 'tty-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 25 Sep 2021 17:15:55 +0000 (10:15 -0700)]
Merge tag 'tty-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are four small tty/serial driver fixes for 5.15-rc3. They
  include:

   - remove an export now that no one is using it anymore

   - mvebu-uart tx_empty callback fix

   - 8250_omap bugfix

   - synclink_gt build fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: unexport tty_ldisc_release
  tty: synclink_gt: rename a conflicting function name
  serial: mvebu-uart: fix driver's tx_empty callback
  serial: 8250: 8250_omap: Fix RX_LVL register offset

3 years agoMerge tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 25 Sep 2021 17:10:38 +0000 (10:10 -0700)]
Merge tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some USB driver fixes and new device ids for 5.15-rc3.

  They include:

   - usb-storage quirk additions

   - usb-serial new device ids

   - usb-serial driver fixes

   - USB roothub registration bugfix to resolve a long-reported issue

   - usb gadget driver fixes for a large number of small things

   - dwc2 driver fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
  USB: serial: option: add device id for Foxconn T99W265
  USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
  USB: serial: cp210x: add part-number debug printk
  USB: serial: cp210x: fix dropped characters with CP2102
  MAINTAINERS: usb, update Peter Korsgaard's entries
  usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
  usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
  Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
  USB: serial: option: remove duplicate USB device ID
  USB: serial: mos7840: remove duplicated 0xac24 device ID
  arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property
  usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval
  usb: gadget: f_uac2: Add missing companion descriptor for feedback EP
  usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
  usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()
  xhci: Set HCD flag to defer primary roothub registration
  usb: core: hcd: Add support for deferring roothub registration
  usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
  usb: dwc3: core: balance phy init and exit
  Revert "USB: bcma: Add a check for devm_gpiod_get"
  ...

3 years agoksmbd: use LOOKUP_BENEATH to prevent the out of share access
Hyunchul Lee [Fri, 24 Sep 2021 15:06:16 +0000 (00:06 +0900)]
ksmbd: use LOOKUP_BENEATH to prevent the out of share access

instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.

ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Boehme <slow@samba.org>
Tested-by: Steve French <smfrench@gmail.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>