]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
4 years agoio_uring: don't sleep when polling for I/O
Christoph Hellwig [Tue, 12 Oct 2021 11:12:20 +0000 (13:12 +0200)]
io_uring: don't sleep when polling for I/O

There is no point in sleeping for the expected I/O completion timeout
in the io_uring async polling model as we never poll for a specific
I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: replace the spin argument to blk_iopoll with a flags argument
Christoph Hellwig [Tue, 12 Oct 2021 11:12:19 +0000 (13:12 +0200)]
block: replace the spin argument to blk_iopoll with a flags argument

Switch the boolean spin argument to blk_poll to passing a set of flags
instead.  This will allow to control polling behavior in a more fine
grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: remove blk_qc_t_valid
Christoph Hellwig [Tue, 12 Oct 2021 11:12:18 +0000 (13:12 +0200)]
blk-mq: remove blk_qc_t_valid

Move the trivial check into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: remove blk_qc_t_to_tag and blk_qc_t_is_internal
Christoph Hellwig [Tue, 12 Oct 2021 11:12:17 +0000 (13:12 +0200)]
blk-mq: remove blk_qc_t_to_tag and blk_qc_t_is_internal

Merge both functions into their only caller to keep the blk-mq tag to
blk_qc_t mapping as private as possible in blk-mq.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: factor out a "classic" poll helper
Christoph Hellwig [Tue, 12 Oct 2021 11:12:16 +0000 (13:12 +0200)]
blk-mq: factor out a "classic" poll helper

Factor the code to do the classic full metal polling out of blk_poll into
a separate blk_mq_poll_classic helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: factor out a blk_qc_to_hctx helper
Christoph Hellwig [Tue, 12 Oct 2021 11:12:15 +0000 (13:12 +0200)]
blk-mq: factor out a blk_qc_to_hctx helper

Add a helper to get the hctx from a request_queue and cookie, and fold
the blk_qc_t_to_queue_num helper into it as no other callers are left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoio_uring: fix a layering violation in io_iopoll_req_issued
Christoph Hellwig [Tue, 12 Oct 2021 11:12:14 +0000 (13:12 +0200)]
io_uring: fix a layering violation in io_iopoll_req_issued

syscall-level code can't just poke into the details of the poll cookie,
which is private information of the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoiomap: don't try to poll multi-bio I/Os in __iomap_dio_rw
Christoph Hellwig [Tue, 12 Oct 2021 11:12:13 +0000 (13:12 +0200)]
iomap: don't try to poll multi-bio I/Os in __iomap_dio_rw

If an iocb is split into multiple bios we can't poll for both.  So don't
bother to even try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: don't try to poll multi-bio I/Os in __blkdev_direct_IO
Christoph Hellwig [Tue, 12 Oct 2021 11:12:12 +0000 (13:12 +0200)]
block: don't try to poll multi-bio I/Os in __blkdev_direct_IO

If an iocb is split into multiple bios we can't poll for both.  So don't
even bother to try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agodirect-io: remove blk_poll support
Christoph Hellwig [Tue, 12 Oct 2021 11:12:11 +0000 (13:12 +0200)]
direct-io: remove blk_poll support

The polling support in the legacy direct-io support is a little crufty.
It already doesn't support the asynchronous polling needed for io_uring
polling, and is hard to adopt to upcoming changes in the polling
interfaces.  Given that all the major file systems already use the iomap
direct I/O code, just drop the polling support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: only check previous entry for plug merge attempt
Jens Axboe [Thu, 14 Oct 2021 13:24:07 +0000 (07:24 -0600)]
block: only check previous entry for plug merge attempt

Currently we scan the entire plug list, which is potentially very
expensive. In an IOPS bound workload, we can drive about 5.6M IOPS with
merging enabled, and profiling shows that the plug merge check is the
(by far) most expensive thing we're doing:

  Overhead  Command   Shared Object     Symbol
  +   20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
  +    4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
  +    4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
  +    4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

Instead of browsing the whole list, just check the previously inserted
entry. That is enough for a naive merge check and will catch most cases,
and for devices that need full merging, the IO scheduler attached to
such devices will do that anyway. The plug merge is meant to be an
inexpensive check to avoid getting a request, but if we repeatedly
scan the list for every single insert, it is very much not a cheap
check.

With this patch, the workload instead runs at ~7.0M IOPS, providing
a 25% improvement. Disabling merging entirely yields another 5%
improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move CONFIG_BLOCK guard to top Makefile
Masahiro Yamada [Mon, 27 Sep 2021 14:00:00 +0000 (23:00 +0900)]
block: move CONFIG_BLOCK guard to top Makefile

Every object under block/ depends on CONFIG_BLOCK.

Move the guard to the top Makefile since there is no point to
descend into block/ if CONFIG_BLOCK=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-5-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move menu "Partition type" to block/partitions/Kconfig
Masahiro Yamada [Mon, 27 Sep 2021 13:59:59 +0000 (22:59 +0900)]
block: move menu "Partition type" to block/partitions/Kconfig

Move the menu to the relevant place.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-4-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: simplify Kconfig files
Masahiro Yamada [Mon, 27 Sep 2021 13:59:58 +0000 (22:59 +0900)]
block: simplify Kconfig files

Everything under block/ depends on BLOCK. BLOCK_HOLDER_DEPRECATED is
selected from drivers/md/Kconfig, which is entirely dependent on BLOCK.

Extend the 'if BLOCK' ... 'endif' so it covers the whole block/Kconfig.

Also, clean up the definition of BLOCK_COMPAT and BLK_MQ_PCI because
COMPAT and PCI are boolean.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-3-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove redundant =y from BLK_CGROUP dependency
Masahiro Yamada [Mon, 27 Sep 2021 13:59:57 +0000 (22:59 +0900)]
block: remove redundant =y from BLK_CGROUP dependency

CONFIG_BLK_CGROUP is a boolean option, that is, its value is 'y' or 'n'.
The comparison to 'y' is redundant.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-2-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: improve batched tag allocation
Jens Axboe [Sat, 9 Oct 2021 19:10:39 +0000 (13:10 -0600)]
block: improve batched tag allocation

Add a blk_mq_get_tags() helper, which uses the new sbitmap API for
allocating a batch of tags all at once. This both simplifies the block
code for batched allocation, and it is also more efficient than just
doing repeated calls into __sbitmap_queue_get().

This reduces the sbitmap overhead in peak runs from ~3% to ~1% and
yields a performanc increase from 6.6M IOPS to 6.8M IOPS for a single
CPU core.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agosbitmap: add __sbitmap_queue_get_batch()
Jens Axboe [Sat, 9 Oct 2021 19:02:23 +0000 (13:02 -0600)]
sbitmap: add __sbitmap_queue_get_batch()

The block layer tag allocation batching still calls into sbitmap to get
each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
which returns a mask of tags all at once, along with an offset for
those tags.

An example return would be 0xff, where bits 0..7 are set, with
tag_offset == 128. The valid tags in this case would be 128..135.

A batch is specific to an individual sbitmap_map, hence it cannot be
larger than that. The requested number of tags is automatically reduced
to the max that can be satisfied with a single map.

On failure, 0 is returned. Caller should fall back to single tag
allocation at that point/

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: optimise *end_request non-stat path
Pavel Begunkov [Wed, 13 Oct 2021 08:57:13 +0000 (09:57 +0100)]
blk-mq: optimise *end_request non-stat path

We already have a blk_mq_need_time_stamp() check in
__blk_mq_end_request() to get a timestamp, hide all the statistics
accounting under it. It cuts some cycles for requests that don't need
stats, and is free otherwise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e0f2ea812e93a8adcd07101212e7d7e70ca304e7.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: mark bio_truncate static
Christoph Hellwig [Tue, 12 Oct 2021 16:18:04 +0000 (18:18 +0200)]
block: mark bio_truncate static

bio_truncate is only used in bio.c, so mark it static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move bio_get_{first,last}_bvec out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:18:03 +0000 (18:18 +0200)]
block: move bio_get_{first,last}_bvec out of bio.h

bio_get_first_bvec and bio_get_last_bvec are only used in blk-merge.c,
so move them there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: mark __bio_try_merge_page static
Christoph Hellwig [Tue, 12 Oct 2021 16:18:02 +0000 (18:18 +0200)]
block: mark __bio_try_merge_page static

Mark __bio_try_merge_page static and move it up a bit to avoid the need
for a forward declaration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move bio_full out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:18:01 +0000 (18:18 +0200)]
block: move bio_full out of bio.h

bio_full is only used in bio.c, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: fold bio_cur_bytes into blk_rq_cur_bytes
Christoph Hellwig [Tue, 12 Oct 2021 16:18:00 +0000 (18:18 +0200)]
block: fold bio_cur_bytes into blk_rq_cur_bytes

Fold bio_cur_bytes into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move bio_mergeable out of bio.h
Christoph Hellwig [Tue, 12 Oct 2021 16:17:59 +0000 (18:17 +0200)]
block: move bio_mergeable out of bio.h

bio_mergeable is only needed by I/O schedulers, so move it to
blk-mq-sched.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: don't include <linux/ioprio.h> in <linux/bio.h>
Christoph Hellwig [Tue, 12 Oct 2021 16:17:58 +0000 (18:17 +0200)]
block: don't include <linux/ioprio.h> in <linux/bio.h>

bio.h doesn't need any of the definitions from ioprio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove BIO_BUG_ON
Christoph Hellwig [Tue, 12 Oct 2021 16:17:57 +0000 (18:17 +0200)]
block: remove BIO_BUG_ON

BIO_DEBUG is always defined, so just switch the two instances to use
BUG_ON directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: inline hot part of __blk_mq_sched_restart
Pavel Begunkov [Sat, 9 Oct 2021 12:25:42 +0000 (13:25 +0100)]
blk-mq: inline hot part of __blk_mq_sched_restart

Extract a fast check out of __block_mq_sched_restart() and inline it for
performance reasons.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/894abaa0998e5999f2fe18f271e5efdfc2c32bd2.1633781740.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: inline hot paths of blk_account_io_*()
Pavel Begunkov [Sat, 9 Oct 2021 12:25:41 +0000 (13:25 +0100)]
block: inline hot paths of blk_account_io_*()

Extract hot paths of __blk_account_io_start() and
__blk_account_io_done() into inline functions, so we don't always pay
for function calls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: merge block_ioctl into blkdev_ioctl
Christoph Hellwig [Tue, 12 Oct 2021 10:44:50 +0000 (12:44 +0200)]
block: merge block_ioctl into blkdev_ioctl

Simplify the ioctl path and match the code structure on the compat side.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move the *blkdev_ioctl declarations out of blkdev.h
Christoph Hellwig [Tue, 12 Oct 2021 10:44:49 +0000 (12:44 +0200)]
block: move the *blkdev_ioctl declarations out of blkdev.h

These are only used inside of block/.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: unexport blkdev_ioctl
Christoph Hellwig [Tue, 12 Oct 2021 10:44:48 +0000 (12:44 +0200)]
block: unexport blkdev_ioctl

With the raw driver gone, there is no modular user left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: don't dereference request after flush insertion
Jens Axboe [Sat, 16 Oct 2021 13:34:49 +0000 (07:34 -0600)]
block: don't dereference request after flush insertion

We could have a race here, where the request gets freed before we call
into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
of the request.

Grab the hardware context before inserting the flush.

Fixes: 0f38d7664615 ("blk-mq: cleanup blk_mq_submit_bio")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: cleanup blk_mq_submit_bio
Christoph Hellwig [Tue, 12 Oct 2021 10:40:45 +0000 (12:40 +0200)]
blk-mq: cleanup blk_mq_submit_bio

Move the blk_mq_alloc_data stack allocation only into the branch
that actually needs it, and use rq->mq_hctx instead of data.hctx
to refer to the hctx.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104045.658051-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: cleanup and rename __blk_mq_alloc_request
Christoph Hellwig [Tue, 12 Oct 2021 10:40:44 +0000 (12:40 +0200)]
blk-mq: cleanup and rename __blk_mq_alloc_request

The newly added loop for the cached requests in __blk_mq_alloc_request
is a little too convoluted for my taste, so unwind it a bit.  Also
rename the function to __blk_mq_alloc_requests now that it can allocate
more than a single request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104045.658051-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: pre-allocate requests if plug is started and is a batch
Jens Axboe [Wed, 6 Oct 2021 12:34:11 +0000 (06:34 -0600)]
block: pre-allocate requests if plug is started and is a batch

The caller typically has a good (or even exact) idea of how many requests
it needs to submit. We can make the request/tag allocation a lot more
efficient if we just allocate N requests/tags upfront when we queue the
first bio from the batch.

Provide a new plug start helper that allows the caller to specify how many
IOs are expected. This sets plug->nr_ios, and we can use that for smarter
request allocation. The plug provides a holding spot for requests, and
request allocation will check it before calling into the normal request
allocation path.

The blk_finish_plug() is called, check if there are unused requests and
free them. This should not happen in normal operations. The exception is
if we get merging, then we may be left with requests that need freeing
when done.

This raises the per-core performance on my setup from ~5.8M to ~6.1M
IOPS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: bump max plugged deferred size from 16 to 32
Jens Axboe [Wed, 6 Oct 2021 18:01:07 +0000 (12:01 -0600)]
block: bump max plugged deferred size from 16 to 32

Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive. But can be noticed even on
native hardware.

Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.

While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or
should use.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: inherit request start time from bio for BLK_CGROUP
Jens Axboe [Tue, 5 Oct 2021 15:23:59 +0000 (09:23 -0600)]
block: inherit request start time from bio for BLK_CGROUP

Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
time just doing ktime_get_ns() -> readtsc. We essentially read and
set the start time twice, one for the bio and then again when that bio
is mapped to a request.

Given that the time between the two is very short, inherit the bio
start time instead of reading it again. This cuts 1/3rd of the overhead
of the time keeping.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move blk-throtl fast path inline
Jens Axboe [Tue, 5 Oct 2021 15:11:56 +0000 (09:11 -0600)]
block: move blk-throtl fast path inline

Even if no policies are defined, we spend ~2% of the total IO time
checking. Move the fast path inline.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Change shared sbitmap naming to shared tags
John Garry [Tue, 5 Oct 2021 10:23:39 +0000 (18:23 +0800)]
blk-mq: Change shared sbitmap naming to shared tags

Now that shared sbitmap support really means shared tags, rename symbols
to match that.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Stop using pointers for blk_mq_tags bitmap tags
John Garry [Tue, 5 Oct 2021 10:23:38 +0000 (18:23 +0800)]
blk-mq: Stop using pointers for blk_mq_tags bitmap tags

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae03cdd ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is removed also, since it would be only
a wrappper for blk_mq_init_bitmaps().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Use shared tags for shared sbitmap support
John Garry [Tue, 5 Oct 2021 10:23:37 +0000 (18:23 +0800)]
blk-mq: Use shared tags for shared sbitmap support

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full sets of static requests are used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
John Garry [Tue, 5 Oct 2021 10:23:36 +0000 (18:23 +0800)]
blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()

Refactor blk_mq_free_map_and_requests() such that it can be used at many
sites at which the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Add blk_mq_alloc_map_and_rqs()
John Garry [Tue, 5 Oct 2021 10:23:35 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_alloc_map_and_rqs()

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some function only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
John Garry [Tue, 5 Oct 2021 10:23:34 +0000 (18:23 +0800)]
blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()

Put the functionality to update the sched shared sbitmap size in a common
function.

Since the same formula is always used to resize, and it can be got from
the request queue argument, so just pass the request queue pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Don't clear driver tags own mapping
John Garry [Tue, 5 Oct 2021 10:23:33 +0000 (18:23 +0800)]
blk-mq: Don't clear driver tags own mapping

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in driver tags rqs[].

But there is no need for a driver tags to clear its own mapping, so skip
clearing the mapping in this scenario.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
John Garry [Tue, 5 Oct 2021 10:23:32 +0000 (18:23 +0800)]
blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
John Garry [Tue, 5 Oct 2021 10:23:31 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
John Garry [Tue, 5 Oct 2021 10:23:30 +0000 (18:23 +0800)]
blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()

Function blk_mq_sched_alloc_tags() does same as
__blk_mq_alloc_map_and_request(), so give a similar name to be consistent.

Similarly rename label err_free_tags -> err_free_map_and_rqs.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Invert check in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:29 +0000 (18:23 +0800)]
blk-mq: Invert check in blk_mq_update_nr_requests()

It's easier to read:

if (x)
X;
else
Y;

over:

if (!x)
Y;
else
X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
John Garry [Tue, 5 Oct 2021 10:23:28 +0000 (18:23 +0800)]
blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()

For shared sbitmap, if the call to blk_mq_tag_update_depth() was
successful for any hctx when hctx->sched_tags is not set, then it would be
successful for all (due to nature in which blk_mq_tag_update_depth()
fails).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call until after the hctx iteration under the
!q->elevator check, which is equivalent (to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
John Garry [Tue, 5 Oct 2021 10:23:27 +0000 (18:23 +0800)]
block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ

It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as
the name BLKDEV_MAX_RQ would imply the max requests always, which it is
not.

Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
the default number of requests assigned when allocating a request queue.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Change rqs check in blk_mq_free_rqs()
John Garry [Tue, 5 Oct 2021 10:23:26 +0000 (18:23 +0800)]
blk-mq: Change rqs check in blk_mq_free_rqs()

The original code in commit 24d2f90309b23 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check non-NULL
tags->rqs.

Check tags->static_rqs as non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: print the current process in handle_bad_sector
Christoph Hellwig [Tue, 28 Sep 2021 05:27:55 +0000 (07:27 +0200)]
block: print the current process in handle_bad_sector

Make the bad sector information a little more useful by printing
current->comm to identify the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20210928052755.113016-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/mq-deadline: Prioritize high-priority requests
Bart Van Assche [Mon, 27 Sep 2021 22:03:28 +0000 (15:03 -0700)]
block/mq-deadline: Prioritize high-priority requests

In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
Prioritize high-priority requests""), this patch uses 'jiffies' instead
of ktime_get() in the code for aging lower priority requests.

This patch has been tested as follows:

Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: 555 K IOPS.

Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: about 380 K IOPS.

Ran the following script:

set -e
scriptdir=$(dirname "$0")
if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
modprobe scsi_debug ndelay=1000000 max_queue=16
sd=''
while [ -z "$sd" ]; do
  sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done
echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
if [ -e /sys/fs/cgroup/io.prio.class ]; then
  cd /sys/fs/cgroup
  echo restrict-to-be >io.prio.class
  echo +io > cgroup.subtree_control
else
  cd /sys/fs/cgroup/blkio/
  echo restrict-to-be >blkio.prio.class
fi
echo $$ >cgroup.procs
mkdir -p hipri
cd hipri
if [ -e io.prio.class ]; then
  echo none-to-rt >io.prio.class
else
  echo none-to-rt >blkio.prio.class
fi
{ "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
echo $$ >cgroup.procs
"${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt

Result:
* 11000 IOPS for the high-priority job
*    40 IOPS for the low-priority job

If the prio aging expiry time is changed from 100s into 0, the IOPS results
change into 6712 and 6796 IOPS.

The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20210927220328.1410161-5-bvanassche@acm.org
[axboe: @latest -> @latest_start]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/mq-deadline: Stop using per-CPU counters
Bart Van Assche [Mon, 27 Sep 2021 22:03:27 +0000 (15:03 -0700)]
block/mq-deadline: Stop using per-CPU counters

Calculating the sum over all CPUs of per-CPU counters frequently is
inefficient. Hence switch from per-CPU to individual counters. Three
counters are protected by the mq-deadline spinlock since these are
only accessed from contexts that already hold that spinlock. The fourth
counter is atomic because protecting it with the mq-deadline spinlock
would trigger lock contention.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/mq-deadline: Add an invariant check
Bart Van Assche [Mon, 27 Sep 2021 22:03:26 +0000 (15:03 -0700)]
block/mq-deadline: Add an invariant check

Check a statistics invariant at module unload time. When running
blktests, the invariant is verified every time a request queue is
removed and hence is verified at least once per test.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/mq-deadline: Improve request accounting further
Bart Van Assche [Mon, 27 Sep 2021 22:03:25 +0000 (15:03 -0700)]
block/mq-deadline: Improve request accounting further

The scheduler .insert_requests() callback is called when a request is
queued for the first time and also when it is requeued. Only count a
request the first time it is queued. Additionally, since the mq-deadline
scheduler only performs zone locking for requests that have been
inserted, skip the zone unlock code for requests that have not been
inserted into the mq-deadline scheduler.

Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move struct request to blk-mq.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:28 +0000 (14:33 +0200)]
block: move struct request to blk-mq.h

struct request is only used by blk-mq drivers, so move it and all
related declarations to blk-mq.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move integrity handling out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:27 +0000 (14:33 +0200)]
block: move integrity handling out of <linux/blkdev.h>

Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move a few merge helpers out of <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:26 +0000 (14:33 +0200)]
block: move a few merge helpers out of <linux/blkdev.h>

These are block-layer internal helpers, so move them to block/blk.h and
block/blk-merge.c.  Also update a comment a bit to use better grammar.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: drop unused includes in <linux/genhd.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:25 +0000 (14:33 +0200)]
block: drop unused includes in <linux/genhd.h>

Drop various include not actually used in genhd.h itself, and
move the remaning includes closer together.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: drop unused includes in <linux/blkdev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:24 +0000 (14:33 +0200)]
block: drop unused includes in <linux/blkdev.h>

Drop various include not actually used in blkdev.h itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move elevator.h to block/
Christoph Hellwig [Mon, 20 Sep 2021 12:33:23 +0000 (14:33 +0200)]
block: move elevator.h to block/

Except for the features passed to blk_queue_required_elevator_features,
elevator.h is only needed internally to the block layer.  Move the
ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to
block/, dropping all the spurious includes outside of that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove the struct blk_queue_ctx forward declaration
Christoph Hellwig [Mon, 20 Sep 2021 12:33:22 +0000 (14:33 +0200)]
block: remove the struct blk_queue_ctx forward declaration

This type doesn't exist at all, so no need to forward declare it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove the cmd_size field from struct request_queue
Christoph Hellwig [Mon, 20 Sep 2021 12:33:21 +0000 (14:33 +0200)]
block: remove the cmd_size field from struct request_queue

Entirely unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove the unused blk_queue_state enum
Christoph Hellwig [Mon, 20 Sep 2021 12:33:20 +0000 (14:33 +0200)]
block: remove the unused blk_queue_state enum

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove the unused rq_end_sector macro
Christoph Hellwig [Mon, 20 Sep 2021 12:33:19 +0000 (14:33 +0200)]
block: remove the unused rq_end_sector macro

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agosched: move the <linux/blkdev.h> include out of kernel/sched/sched.h
Christoph Hellwig [Mon, 20 Sep 2021 12:33:18 +0000 (14:33 +0200)]
sched: move the <linux/blkdev.h> include out of kernel/sched/sched.h

Only core.c needs blkdev.h, so move the #include statement there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agokernel: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:17 +0000 (14:33 +0200)]
kernel: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoarch: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:16 +0000 (14:33 +0200)]
arch: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agomm: remove spurious blkdev.h includes
Christoph Hellwig [Mon, 20 Sep 2021 12:33:15 +0000 (14:33 +0200)]
mm: remove spurious blkdev.h includes

Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agomm: don't include <linux/blkdev.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:14 +0000 (14:33 +0200)]
mm: don't include <linux/blkdev.h> in <linux/backing-dev.h>

Move inode_to_bdi out of line to avoid having to include blkdev.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agomm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:13 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agomm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>
Christoph Hellwig [Mon, 20 Sep 2021 12:33:12 +0000 (14:33 +0200)]
mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>

blk-cgroup.h pulls in blkdev.h and thus pretty much all the block
headers.  Break this dependency chain by turning wbc_blkcg_css into a
macro and dropping the blk-cgroup.h include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu
Tejun Heo [Thu, 14 Oct 2021 23:20:22 +0000 (13:20 -1000)]
blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu

c3df5fb57fe8 ("cgroup: rstat: fix A-A deadlock on 32bit around
u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: syzbot+9738c8815b375ce482a1@syzkaller.appspotmail.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoLinux 5.15-rc6
Linus Torvalds [Mon, 18 Oct 2021 06:00:13 +0000 (20:00 -1000)]
Linux 5.15-rc6

4 years agoMerge tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Mon, 18 Oct 2021 05:39:22 +0000 (19:39 -1000)]
Merge tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull libata fixes from Damien Le Moal:
 "Two fixes for this cycle:

   - Fix a null pointer dereference in ahci-platform driver (from Hai)

   - Fix uninitialized variables in pata_legacy driver (from Dan)"

* tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: ahci_platform: fix null-ptr-deref in ahci_platform_enable_regulators()
  pata_legacy: fix a couple uninitialized variable bugs

4 years agoMerge tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block
Linus Torvalds [Mon, 18 Oct 2021 05:25:20 +0000 (19:25 -1000)]
Merge tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Bigger than usual for this point in time, the majority is fixing some
  issues around BDI lifetimes with the move from the request_queue to
  the disk in this release. In detail:

   - Series on draining fs IO for del_gendisk() (Christoph)

   - NVMe pull request via Christoph:
        - fix the abort command id (Keith Busch)
        - nvme: fix per-namespace chardev deletion (Adam Manzanares)

   - brd locking scope fix (Tetsuo)

   - BFQ fix (Paolo)"

* tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
  block, bfq: reset last_bfqq_created on group change
  block: warn when putting the final reference on a registered disk
  brd: reduce the brd_devices_mutex scope
  kyber: avoid q->disk dereferences in trace points
  block: keep q_usage_counter in atomic mode after del_gendisk
  block: drain file system I/O on del_gendisk
  block: split bio_queue_enter from blk_queue_enter
  block: factor out a blk_try_enter_queue helper
  block: call submit_bio_checks under q_usage_counter
  nvme: fix per-namespace chardev deletion
  block/rnbd-clt-sysfs: fix a couple uninitialized variable bugs
  nvme-pci: Fix abort command id

4 years agoMerge tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block
Linus Torvalds [Mon, 18 Oct 2021 05:20:13 +0000 (19:20 -1000)]
Merge tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Just a single fix for a wrong condition for grabbing a lock, a
  regression in this merge window"

* tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
  io_uring: fix wrong condition to grab uring lock

4 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Mon, 18 Oct 2021 04:17:19 +0000 (18:17 -1000)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Fixes up some issues in rc5"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: Fix the wrong input in config_cb
  VDUSE: fix documentation underline warning
  Revert "virtio-blk: Add validation for block size in config space"
  vhost_vdpa: unset vq irq before freeing irq
  virtio: write back F_VERSION_1 before validate

4 years agoMerge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Mon, 18 Oct 2021 04:01:32 +0000 (18:01 -1000)]
Merge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug where guests on P9 with interrupts passed through could get
   stuck in synchronize_irq().

 - Fix a bug in KVM on P8 where secondary threads entering a guest would
   write outside their allocated stack.

 - Fix a bug in KVM on P8 where secondary threads could confuse the host
   offline code and cause the guest or host to crash.

Thanks to Cédric Le Goater.

* tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
  KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
  powerpc/xive: Discard disabled interrupts in get_irqchip_state()

4 years agoMerge tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 18 Oct 2021 03:41:39 +0000 (17:41 -1000)]
Merge tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Update section headers before the respective relocations to not
   trigger a safety check in elftoolchain's implementation of libelf

 - Do not add garbage data to the .rela.orc_unwind_ip section

* tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Update section header before relocations
  objtool: Check for gelf_update_rel[a] failures

4 years agoMerge tag 'edac_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 18 Oct 2021 03:36:39 +0000 (17:36 -1000)]
Merge tag 'edac_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

 - Log the "correct" uncorrectable error count in the armada_xp driver

* tag 'edac_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/armada-xp: Fix output of uncorrectable error counter

4 years agoMerge tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 18 Oct 2021 03:34:18 +0000 (17:34 -1000)]
Merge tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Add Sapphire Rapids to the list of CPUs supporting the SMI count MSR

* tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/msr: Add Sapphire Rapids CPU support

4 years agoMerge tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 18 Oct 2021 03:30:49 +0000 (17:30 -1000)]
Merge tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Borislav Petkov:
 "Forwarded from Ard Biesheuvel through the tip tree. Ard will send
  stuff directly in the near future.

  Low priority fixes but fixes nonetheless:

   - update stub diagnostic print that is no longer accurate

   - avoid statically allocated buffer for CPER error record decoding

   - avoid sleeping on the efi_runtime semaphore when calling the
     ResetSystem EFI runtime service"

* tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
  efi/cper: use stack buffer for error record decoding
  efi/libstub: Simplify "Exiting bootservices" message

4 years agoMerge tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 18 Oct 2021 03:27:22 +0000 (17:27 -1000)]
Merge tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Do not enable AMD memory encryption in Kconfig by default due to
   shortcomings of some platforms, leading to boot failures.

 - Mask out invalid bits in the MXCSR for 32-bit kernels again because
   Thomas and I don't know how to mask out bits properly. Third time's
   the charm.

* tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Mask out the invalid MXCSR bits properly
  x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically

4 years agoMerge tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 18 Oct 2021 03:17:28 +0000 (17:17 -1000)]
Merge tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core fixes for 5.15-rc6, all of which have
  been in linux-next for a while with no reported issues.

  They include:

   - kernfs negative dentry bugfix

   - simple pm bus fixes to resolve reported issues"

* tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers: bus: Delete CONFIG_SIMPLE_PM_BUS
  drivers: bus: simple-pm-bus: Add support for probing simple bus only devices
  driver core: Reject pointless SYNC_STATE_ONLY device links
  kernfs: don't create a negative dentry if inactive node exists

4 years agoMerge tag 'char-misc-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Mon, 18 Oct 2021 03:14:00 +0000 (17:14 -1000)]
Merge tag 'char-misc-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for 5.15-rc6 for reported
  issues that include:

   - habanalabs driver fixes

   - mei driver fixes and new ids

   - fpga new device ids

   - MAINTAINER file updates for fpga subsystem

   - spi module id table additions and fixes

   - fastrpc locking fixes

   - nvmem driver fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  eeprom: 93xx46: fix MODULE_DEVICE_TABLE
  nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells
  mei: hbm: drop hbm responses on early shutdown
  mei: me: add Ice Lake-N device id.
  eeprom: 93xx46: Add SPI device ID table
  eeprom: at25: Add SPI ID table
  misc: HI6421V600_IRQ should depend on HAS_IOMEM
  misc: fastrpc: Add missing lock before accessing find_vma()
  cb710: avoid NULL pointer subtraction
  misc: gehc: Add SPI ID table
  MAINTAINERS: Drop outdated FPGA Manager website
  MAINTAINERS: Add Hao and Yilun as maintainers
  habanalabs: fix resetting args in wait for CS IOCTL
  fpga: ice40-spi: Add SPI device ID table

4 years agoMerge tag 'staging-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Mon, 18 Oct 2021 03:10:00 +0000 (17:10 -1000)]
Merge tag 'staging-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
 "Here are a number of small IIO and staging driver fixes for 5.15-rc6.

  They include:

   - vc04_services bugfix for reported problem

   - r8188eu array underflow fix

   - iio driver fixes for a lot of tiny reported issues.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8188eu: prevent array underflow in rtw_hal_update_ra_mask()
  staging: vc04_services: shut up out-of-range warning
  iio: light: opt3001: Fixed timeout error when 0 lux
  iio: adis16480: fix devices that do not support sleep mode
  iio: mtk-auxadc: fix case IIO_CHAN_INFO_PROCESSED
  iio: adis16475: fix deadlock on frequency set
  iio: ssp_sensors: add more range checking in ssp_parse_dataframe()
  iio: ssp_sensors: fix error code in ssp_print_mcu_debug()
  iio: adc: ad7793: Fix IRQ flag
  iio: adc: ad7780: Fix IRQ flag
  iio: adc: ad7192: Add IRQ flag
  iio: adc: aspeed: set driver data when adc probe.
  iio: adc: rzg2l_adc: add missing clk_disable_unprepare() in rzg2l_adc_pm_runtime_resume()
  iio: adc: max1027: Fix the number of max1X31 channels
  iio: adc: max1027: Fix wrong shift with 12-bit devices
  iio: adc128s052: Fix the error handling path of 'adc128_probe()'
  iio: adc: rzg2l_adc: Fix -EBUSY timeout error return
  iio: accel: fxls8962af: return IRQ_HANDLED when fifo is flushed
  iio: dac: ti-dac5571: fix an error code in probe()

4 years agoMerge tag 'tty-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Mon, 18 Oct 2021 03:06:31 +0000 (17:06 -1000)]
Merge tag 'tty-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fix from Greg KH:
 "Here is a single 8250 Kconfig fix for 5.15-rc6 that resolves a
  regression that showed up in 5.15-rc1. It has been in linux-next for a
  while with no reported issues"

* tag 'tty-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250: allow disabling of Freescale 16550 compile test

4 years agoMerge tag 'usb-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Mon, 18 Oct 2021 03:02:00 +0000 (17:02 -1000)]
Merge tag 'usb-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes that resolve a number of tiny issues.
  They include:

   - new USB serial driver ids

   - xhci driver fixes for a bunch of issues

   - musb error path fixes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: musb: dsps: Fix the probe error path
  xhci: Enable trust tx length quirk for Fresco FL11 USB controller
  xhci: Fix command ring pointer corruption while aborting a command
  USB: xhci: dbc: fix tty registration race
  xhci: add quirk for host controllers that don't update endpoint DCS
  xhci: guard accesses to ep_state in xhci_endpoint_reset()
  USB: serial: qcserial: add EM9191 QDL support
  USB: serial: option: add Quectel EC200S-CN module support
  USB: serial: option: add prod. id for Quectel EG91
  USB: serial: option: add Telit LE910Cx composition 0x1204

4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 18 Oct 2021 02:57:06 +0000 (16:57 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - a new product ID for the xpad joystick driver

 - fixes to resistive-adc-touch and snvs_pwrkey drivers

 - a change to touchscreen helpers to make clang happier

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: touchscreen - avoid bitwise vs logical OR warning
  Input: xpad - add support for another USB ID of Nacon GC-100
  Input: resistive-adc-touch - fix division by zero error on z1 == 0
  Input: snvs_pwrkey - add clk handling

4 years agoblock, bfq: reset last_bfqq_created on group change
Paolo Valente [Fri, 15 Oct 2021 14:43:36 +0000 (16:43 +0200)]
block, bfq: reset last_bfqq_created on group change

Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
queues"), BFQ maintains a per-group pointer to the last bfq_queue
created. If such a queue, say bfqq, happens to move to a different
group, then bfqq is no more a valid last bfq_queue created for its
previous group. That pointer must then be cleared. Not resetting such
a pointer may also cause UAF, if bfqq happens to also be freed after
being moved to a different group. This commit performs this missing
reset. As such it fixes commit 430a67f9d616 ("block, bfq: merge bursts
of newly-created queues").

Such a missing reset is most likely the cause of the crash reported in [1].
With some analysis, we found that this crash was due to the
above UAF. And such UAF did go away with this commit applied [1].

Anyway, before this commit, that crash happened to be triggered in
conjunction with commit 2d52c58b9c9b ("block, bfq: honor already-setup
queue merges"). The latter was then reverted by commit ebc69e897e17
("Revert "block, bfq: honor already-setup queue merges""). Yet commit
2d52c58b9c9b ("block, bfq: honor already-setup queue merges") contains
no error related with the above UAF, and can then be restored.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503

Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Grzegorz Kowal <custos.mentis@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: warn when putting the final reference on a registered disk
Christoph Hellwig [Thu, 14 Oct 2021 13:02:31 +0000 (15:02 +0200)]
block: warn when putting the final reference on a registered disk

Warn when the last reference on a live disk is put without calling
del_gendisk first.  There are some BDI related bug reports that look
like a case of this, so make sure we have the proper instrumentation
to catch it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobrd: reduce the brd_devices_mutex scope
Tetsuo Handa [Tue, 7 Sep 2021 10:18:00 +0000 (19:18 +0900)]
brd: reduce the brd_devices_mutex scope

As with commit 8b52d8be86d72308 ("loop: reorder loop_exit"),
unregister_blkdev() needs to be called first in order to avoid calling
brd_alloc() from brd_probe() after brd_del_one() from brd_exit(). Then,
we can avoid holding global mutex during add_disk()/del_gendisk() as with
commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope").

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e205f13d-18ff-a49c-0988-7de6ea5ff823@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge tag 'perf-tools-fixes-for-v5.15-2021-10-16' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sat, 16 Oct 2021 18:11:07 +0000 (11:11 -0700)]
Merge tag 'perf-tools-fixes-for-v5.15-2021-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix 'perf test evsel' build error on !x86 architectures

 - Fix libperf's test_stat_cpu mixup of CPU numbers and CPU indexes

 - Output offsets for decompressed records, not just useless zeros

* tag 'perf-tools-fixes-for-v5.15-2021-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  libperf tests: Fix test_stat_cpu
  libperf test evsel: Fix build error on !x86 architectures
  perf report: Output non-zero offset for decompressed records

4 years agoMerge tag 'fixes-2021-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Sat, 16 Oct 2021 17:57:13 +0000 (10:57 -0700)]
Merge tag 'fixes-2021-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
 "Fix handling of NOMAP regions with kmemleak.

  NOMAP regions don't have linear map entries so an attempt to scan
  these areas in kmemleak would fault.

  Prevent such faults by excluding NOMAP regions from kmemleak"

* tag 'fixes-2021-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: exclude NOMAP regions from kmemleak

4 years agoMerge tag 'trace-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sat, 16 Oct 2021 17:51:41 +0000 (10:51 -0700)]
Merge tag 'trace-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Tracing fixes for 5.15:

 - Fix defined but not use warning/error for osnoise function

 - Fix memory leak in event probe

 - Fix memblock leak in bootconfig

 - Fix the API of event probes to be like kprobes

 - Added test to check removal of event probe API

 - Fix recordmcount.pl for nds32 failed build

* tag 'trace-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  nds32/ftrace: Fix Error: invalid operands (*UND* and *UND* sections) for `^'
  selftests/ftrace: Update test for more eprobe removal process
  tracing: Fix event probe removal from dynamic events
  tracing: Fix missing * in comment block
  bootconfig: init: Fix memblock leak in xbc_make_cmdline()
  tracing: Fix memory leak in eprobe_register()
  tracing: Fix missing osnoise tracer on max_latency

4 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 16 Oct 2021 17:22:08 +0000 (10:22 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk driver fixes from Stephen Boyd:
 "Clk driver fixes for critical issues found in the past few weeks:

   - Select gdsc config so qcom sm6350 driver probes

   - Fix a register offset in qcom gcc-sm6115 so the correct clk is
     controlled

   - Fix inverted logic in Renesas RZ/G2L .is_enabled()

   - Mark some more clks critical in Renesas clk driver

   - Remove a duplicate clk in the agilex driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: add select QCOM_GDSC for SM6350
  clk: qcom: gcc-sm6115: Fix offset for hlos1_vote_turing_mmu_tbu0_gdsc
  clk: socfpga: agilex: fix duplicate s2f_user0_clk
  clk: renesas: rzg2l: Fix clk status function
  clk: renesas: r9a07g044: Mark IA55_CLK and DMAC_ACLK critical

4 years agoMerge tag 'for-5.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Sat, 16 Oct 2021 17:12:21 +0000 (10:12 -0700)]
Merge tag 'for-5.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM verity target to skip redundant processing on I/O errors.

 - Fix request-based DM so that it doesn't queue request to blk-mq when
   DM device is suspended.

 - Fix DM core mempool NULL pointer race when completing IO.

 - Make DM clone target's 'descs' array static.

* tag 'for-5.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix mempool NULL pointer race when completing IO
  dm rq: don't queue request to blk-mq during DM suspend
  dm clone: make array 'descs' static
  dm verity: skip redundant verity_handle_err() on I/O errors