Christoph Hellwig [Wed, 25 Jan 2023 16:55:15 +0000 (17:55 +0100)]
block: add a bvec_set_folio helper
A smaller wrapper around bvec_set_page that takes a folio instead.
There are only two potential users for this in the tree, but the number
will grow in the future.
Christoph Hellwig [Mon, 23 Jan 2023 07:47:18 +0000 (08:47 +0100)]
ps3vram: remove bio splitting
ps3vram iterates over the bio one segment, that is page aligned and max
page sized chunk, a time. Because of that there is no point in
calling bio_split_to_limits, or explicitly setting the default limits
that are only used by bio_split_to_limits.
Jens Axboe [Fri, 20 Jan 2023 14:51:07 +0000 (07:51 -0700)]
block: treat poll queue enter similarly to timeouts
We ran into an issue where a production workload would randomly grind to
a halt and not continue until the pending IO had timed out. This turned
out to be a complicated interaction between queue freezing and polled
IO:
1) You have an application that does polled IO. At any point in time,
there may be polled IO pending.
2) You have a monitoring application that issues a passthrough command,
which is marked with side effects such that it needs to freeze the
queue.
3) Passthrough command is started, which calls blk_freeze_queue_start()
on the device. At this point the queue is marked frozen, and any
attempt to enter the queue will fail (for non-blocking) or block.
4) Now the driver calls blk_mq_freeze_queue_wait(), which will return
when the queue is quiesced and pending IO has completed.
5) The pending IO is polled IO, but any attempt to poll IO through the
normal iocb_bio_iopoll() -> bio_poll() will fail when it gets to
bio_queue_enter() as the queue is frozen. Rather than poll and
complete IO, the polling threads will sit in a tight loop attempting
to poll, but failing to enter the queue to do so.
The end result is that progress for either application will be stalled
until all pending polled IO has timed out. This causes obvious huge
latency issues for the application doing polled IO, but also long delays
for passthrough command.
Fix this by treating queue enter for polled IO just like we do for
timeouts. This allows quick quiesce of the queue as we still poll and
complete this IO, while still disallowing queueing up new IO.
Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Li Nan [Tue, 17 Jan 2023 07:08:06 +0000 (15:08 +0800)]
blk-iocost: change div64_u64 to DIV64_U64_ROUND_UP in ioc_refresh_params()
vrate_min is calculated by DIV64_U64_ROUND_UP, but vrate_max is calculated
by div64_u64. Vrate_min may be 1 greater than vrate_max if the input
values min and max of cost.qos are equal.
calc_lcoefs() uses the input value of cost.model in DIV_ROUND_UP_ULL,
overflow would happen if bps plus IOC_PAGE_SIZE is greater than
ULLONG_MAX, it can cause divide by 0 error.
Arnd Bergmann [Wed, 18 Jan 2023 08:07:01 +0000 (09:07 +0100)]
blk-iocost: avoid 64-bit division in ioc_timer_fn
The behavior of 'enum' types has changed in gcc-13, so now the
UNBUSY_THR_PCT constant is interpreted as a 64-bit number because
it is defined as part of the same enum definition as some other
constants that do not fit within a 32-bit integer. This in turn
leads to some inefficient code on 32-bit architectures as well
as a link error:
arm-linux-gnueabi/bin/arm-linux-gnueabi-ld: block/blk-iocost.o: in function `ioc_timer_fn':
blk-iocost.c:(.text+0x68e8): undefined reference to `__aeabi_uldivmod'
arm-linux-gnueabi-ld: blk-iocost.c:(.text+0x6908): undefined reference to `__aeabi_uldivmod'
Split the enum definition to keep the 64-bit timing constants in
a separate enum type from those constants that can clearly fit
within a smaller type.
Ming Lei [Wed, 18 Jan 2023 04:23:18 +0000 (12:23 +0800)]
block: ublk: fix doc build warning
Fix the following warning:
Documentation/block/ublk.rst:157: WARNING: Enumerated list ends without a blank line; unexpected unindent.
Documentation/block/ublk.rst:171: WARNING: Enumerated list ends without a blank line; unexpected unindent.
Fixes: 56f5160bc1b8 ("ublk_drv: add mechanism for supporting unprivileged ublk device") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230118042318.127900-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pankaj Raghav [Tue, 10 Jan 2023 14:36:35 +0000 (15:36 +0100)]
block: introduce bdev_zone_no helper
Add a generic bdev_zone_no() helper to calculate zone number for a
given sector in a block device. This helper internally uses disk_zone_no()
to find the zone number.
Use the helper bdev_zone_no() to calculate nr of zones. This lets us
make modifications to the math if needed in one place.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230110143635.77300-4-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anuj Gupta [Tue, 17 Jan 2023 12:06:37 +0000 (17:36 +0530)]
nvme: set REQ_ALLOC_CACHE for uring-passthru request
This patch sets REQ_ALLOC_CACHE flag for uring-passthru requests.
This is a prep-patch so that normal / IRQ-driven uring-passthru
I/Os can also leverage bio-cache.
Ming Lei [Fri, 6 Jan 2023 04:17:11 +0000 (12:17 +0800)]
ublk_drv: add mechanism for supporting unprivileged ublk device
unprivileged ublk device is helpful for container use case, such
as: ublk device created in one unprivileged container can be controlled
and accessed by this container only.
Implement this feature by adding flag of UBLK_F_UNPRIVILEGED_DEV, and if
this flag isn't set, any control command has been run from privileged
user. Otherwise, any control command can be sent from any unprivileged
user, but the user has to be permitted to access the ublk char device
to be controlled.
In case of UBLK_F_UNPRIVILEGED_DEV:
1) for command UBLK_CMD_ADD_DEV, it is always allowed, and user needs
to provide owner's uid/gid in this command, so that udev can set correct
ownership for the created ublk device, since the device owner uid/gid
can be queried via command of UBLK_CMD_GET_DEV_INFO.
2) for other control commands, they can only be run successfully if the
current user is allowed to access the specified ublk char device, for
running the permission check, path of the ublk char device has to be
provided by these commands.
Also add one control of command UBLK_CMD_GET_DEV_INFO2 which always
include the char dev path in payload since userspace may not have
knowledge if this device is created in unprivileged mode.
For applying this mechanism, system administrator needs to take
the following policies:
Ming Lei [Fri, 6 Jan 2023 04:17:10 +0000 (12:17 +0800)]
ublk_drv: add module parameter of ublks_max for limiting max allowed ublk dev
Prepare for supporting unprivileged ublk device by limiting max number
ublk devices added. Otherwise too many ublk devices could be added by
un-trusted user, which can be thought as one DoS.
Userspace side only knows device ID, but the associated path of ublkc* and
ublkb* could be changed by udev, and that depends on userspace's policy, so
add parameter of UBLK_PARAM_TYPE_DEVT for retrieving major/minor of the
ublkc* and ublkb*, then user may figure out major/minor of the ublk disks
he/she owns. With major/minor, it is easy to find the device node path.
Ming Lei [Fri, 6 Jan 2023 04:17:07 +0000 (12:17 +0800)]
ublk_drv: don't probe partitions if the ubq daemon isn't trusted
If any ubq daemon is unprivileged, the ublk char device is allowed
for unprivileged user actually, and we can't trust the current user,
so not probe partitions.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230106041711.914434-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 16 Jan 2023 15:55:53 +0000 (08:55 -0700)]
block: don't allow multiple bios for IOCB_NOWAIT issue
If we're doing a large IO request which needs to be split into multiple
bios for issue, then we can run into the same situation as the below
marked commit fixes - parts will complete just fine, one or more parts
will fail to allocate a request. This will result in a partially
completed read or write request, where the caller gets EAGAIN even though
parts of the IO completed just fine.
Do the same for large bios as we do for splits - fail a NOWAIT request
with EAGAIN. This isn't technically fixing an issue in the below marked
patch, but for stable purposes, we should have either none of them or
both.
This depends on: 613b14884b85 ("block: handle bio_split_to_limits() NULL return")
Cc: stable@vger.kernel.org # 5.15+ Fixes: 9cea62b2cbab ("block: don't allow splitting of a REQ_NOWAIT bio") Link: https://github.com/axboe/liburing/issues/766 Reported-and-tested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Böhmwalder [Fri, 13 Jan 2023 12:35:35 +0000 (13:35 +0100)]
drbd: remove macros using require_context
This require_context attribute originated in a proposed sparse patch by
Philipp Reisner back in 2008. Johannes Berg had a different solution to
a similar problem, and that patch "won" in the end; so the require_context
thing never got merged. The whole history can be read at [0].
DRBD kept using these annotations anyway for a while. Nowadays, on a
modern unmodified sparse, they obviously do nothing, and they are hardly
used anymore anyway.
Robert Altnoeder [Fri, 13 Jan 2023 12:35:32 +0000 (13:35 +0100)]
drbd: fix DRBD_VOLUME_MAX 65535 -> 65534
The protocol uses -1 as a reserved value for
'no specific volume', and since the protocol field
is a 16 bit unsigned value, -1 is converted to
65535. Therefore, limit the range of valid volume
numbers to [0, 65534].
Christoph Böhmwalder [Fri, 13 Jan 2023 12:35:31 +0000 (13:35 +0100)]
drbd: adjust drbd_limits license header
See also commit 93c68cc46a07 ("drbd: use consistent license"). We only
want to license drbd under GPL-2.0, so use the corresponding SPDX header
consistently.
Keith Busch [Thu, 5 Jan 2023 20:51:46 +0000 (12:51 -0800)]
block: save user max_sectors limit
The user can set the max_sectors limit to any valid value via sysfs
/sys/block/<dev>/queue/max_sectors_kb attribute. If the device limits
are ever rescanned, though, the limit reverts back to the potentially
artificially low BLK_DEF_MAX_SECTORS value.
Preserve the user's setting as the max_sectors limit as long as it's
valid. The user can reset back to defaults by writing 0 to the sysfs
file.
Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230105205146.3610282-3-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Thu, 5 Jan 2023 20:51:45 +0000 (12:51 -0800)]
block: make BLK_DEF_MAX_SECTORS unsigned
This is used as an unsigned value, so define it that way to avoid
having to cast it.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230105205146.3610282-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Davide Zini [Tue, 3 Jan 2023 14:55:03 +0000 (15:55 +0100)]
block, bfq: balance I/O injection among underutilized actuators
Upon the invocation of its dispatch function, BFQ returns the next I/O
request of the in-service bfq_queue, unless some exception holds. One
such exception is that there is some underutilized actuator, different
from the actuator for which the in-service queue contains I/O, and
that some other bfq_queue happens to contain I/O for such an
actuator. In this case, the next I/O request of the latter bfq_queue,
and not of the in-service bfq_queue, is returned (I/O is injected from
that bfq_queue). To find such an actuator, a linear scan, in
increasing index order, is performed among actuators.
Performing a linear scan entails a prioritization among actuators: an
underutilized actuator may be considered for injection only if all
actuators with a lower index are currently fully utilized, or if there
is no pending I/O for any lower-index actuator that happens to be
underutilized.
This commits breaks this prioritization and tends to distribute
injection uniformly across actuators. This is obtained by adding the
following condition to the linear scan: even if an actuator A is
underutilized, A is however skipped if its load is higher than that of
the next actuator.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Davide Zini <davidezini2@gmail.com> Link: https://lore.kernel.org/r/20230103145503.71712-9-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Davide Zini [Tue, 3 Jan 2023 14:55:02 +0000 (15:55 +0100)]
block, bfq: inject I/O to underutilized actuators
The main service scheme of BFQ for sync I/O is serving one sync
bfq_queue at a time, for a while. In particular, BFQ enforces this
scheme when it deems the latter necessary to boost throughput or
to preserve service guarantees. Unfortunately, when BFQ enforces
this policy, only one actuator at a time gets served for a while,
because each bfq_queue contains I/O only for one actuator. The
other actuators may remain underutilized.
Actually, BFQ may serve (inject) extra I/O, taken from other
bfq_queues, in parallel with that of the in-service queue. This
injection mechanism may provide the ground for dealing also with
the above actuator-underutilization problem. Yet BFQ does not take
the actuator load into account when choosing which queue to pick
extra I/O from. In addition, BFQ may happen to inject extra I/O
only when the in-service queue is temporarily empty.
In view of these facts, this commit extends the
injection mechanism in such a way that the latter:
(1) takes into account also the actuator load;
(2) checks such a load on each dispatch, and injects I/O for an
underutilized actuator, if there is one and there is I/O for it.
To perform the check in (2), this commit introduces a load
threshold, currently set to 4. A linear scan of each actuator is
performed, until an actuator is found for which the following two
conditions hold: the load of the actuator is below the threshold,
and there is at least one non-in-service queue that contains I/O
for that actuator. If such a pair (actuator, queue) is found, then
the head request of that queue is returned for dispatch, instead
of the head request of the in-service queue.
We have set the threshold, empirically, to the minimum possible
value for which an actuator is fully utilized, or close to be
fully utilized. By doing so, injected I/O 'steals' as few
drive-queue slots as possibile to the in-service queue. This
reduces as much as possible the probability that the service of
I/O from the in-service bfq_queue gets delayed because of slot
exhaustion, i.e., because all the slots of the drive queue are
filled with I/O injected from other queues (NCQ provides for 32
slots).
This new mechanism also counters actuator underutilization in the
case of asymmetric configurations of bfq_queues. Namely if there
are few bfq_queues containing I/O for some actuators and many
bfq_queues containing I/O for other actuators. Or if the
bfq_queues containing I/O for some actuators have lower weights
than the other bfq_queues.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Davide Zini <davidezini2@gmail.com> Link: https://lore.kernel.org/r/20230103145503.71712-8-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Federico Gavioli [Tue, 3 Jan 2023 14:55:01 +0000 (15:55 +0100)]
block, bfq: retrieve independent access ranges from request queue
This patch implements the code to gather the content of the
independent_access_ranges structure from the request_queue and copy
it into the queue's bfq_data. This copy is done at queue initialization.
We copy the access ranges into the bfq_data to avoid taking the queue
lock each time we access the ranges.
This implementation, however, puts a limit to the maximum independent
ranges supported by the scheduler. Such a limit is equal to the constant
BFQ_MAX_ACTUATORS. This limit was placed to avoid the allocation of
dynamic memory.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Co-developed-by: Rory Chen <rory.c.chen@seagate.com> Signed-off-by: Rory Chen <rory.c.chen@seagate.com> Signed-off-by: Federico Gavioli <f.gavioli97@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20230103145503.71712-7-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Tue, 3 Jan 2023 14:54:59 +0000 (15:54 +0100)]
block, bfq: turn bfqq_data into an array in bfq_io_cq
When a bfq_queue Q is merged with another queue, several pieces of
information are saved about Q. These pieces are stored in the
bfqq_data field in the bfq_io_cq data structure of the process
associated with Q.
Yet, with a multi-actuator drive, a process may get associated with
multiple bfq_queues: one queue for each of the N actuators. Each of
these queues may undergo a merge. So, the bfq_io_cq data structure
must be able to accommodate the above information for N queues.
This commit solves this problem by turning the bfqq_data scalar field
into an array of N elements (and by changing code so as to handle
this array).
This solution is written under the assumption that bfq_queues
associated with different actuators cannot be cross-merged. This
assumption holds naturally with basic queue merging: the latter is
triggered by spatial locality, and sectors for different actuators are
not close to each other (apart from the corner case of the last
sectors served by a given actuator and the first sectors served by the
next actuator). As for stable cross-merging, the assumption here is
that it is disabled.
Paolo Valente [Tue, 3 Jan 2023 14:54:58 +0000 (15:54 +0100)]
block, bfq: move io_cq-persistent bfqq data into a dedicated struct
With a multi-actuator drive, a process may get associated with multiple
bfq_queues: one queue for each of the N actuators. So, the bfq_io_cq
data structure must be able to accommodate its per-queue persistent
information for N queues. Currently it stores this information for
just one queue, in several scalar fields.
This is a preparatory commit for moving to accommodating persistent
information for N queues. In particular, this commit packs all the
above scalar fields into a single data structure. Then there is now
only one field, in bfq_io_cq, that stores all the above information. This
scalar field will then be turned into an array by a following commit.
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Gianmarco Lusvardi <glusvardi@posteo.net> Signed-off-by: Giulio Barabino <giuliobarabino99@gmail.com> Signed-off-by: Emiliano Maccaferri <inbox@emilianomaccaferri.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20230103145503.71712-4-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Tue, 3 Jan 2023 14:54:57 +0000 (15:54 +0100)]
block, bfq: forbid stable merging of queues associated with different actuators
If queues associated with different actuators are merged, then control
is lost on each actuator. Therefore some actuator may be
underutilized, and throughput may decrease. This problem cannot occur
with basic queue merging, because the latter is triggered by spatial
locality, and sectors for different actuators are not close to each
other. Yet it may happen with stable merging. To address this issue,
this commit prevents stable merging from occurring among queues
associated with different actuators.
Paolo Valente [Tue, 3 Jan 2023 14:54:56 +0000 (15:54 +0100)]
block, bfq: split sync bfq_queues on a per-actuator basis
Single-LUN multi-actuator SCSI drives, as well as all multi-actuator
SATA drives appear as a single device to the I/O subsystem [1]. Yet
they address commands to different actuators internally, as a function
of Logical Block Addressing (LBAs). A given sector is reachable by
only one of the actuators. For example, Seagate’s Serial Advanced
Technology Attachment (SATA) version contains two actuators and maps
the lower half of the SATA LBA space to the lower actuator and the
upper half to the upper actuator.
Evidently, to fully utilize actuators, no actuator must be left idle
or underutilized while there is pending I/O for it. The block layer
must somehow control the load of each actuator individually. This
commit lays the ground for allowing BFQ to provide such a per-actuator
control.
BFQ associates an I/O-request sync bfq_queue with each process doing
synchronous I/O, or with a group of processes, in case of queue
merging. Then BFQ serves one bfq_queue at a time. While in service, a
bfq_queue is emptied in request-position order. Yet the same process,
or group of processes, may generate I/O for different actuators. In
this case, different streams of I/O (each for a different actuator)
get all inserted into the same sync bfq_queue. So there is basically
no individual control on when each stream is served, i.e., on when the
I/O requests of the stream are picked from the bfq_queue and
dispatched to the drive.
This commit enables BFQ to control the service of each actuator
individually for synchronous I/O, by simply splitting each sync
bfq_queue into N queues, one for each actuator. In other words, a sync
bfq_queue is now associated to a pair (process, actuator). As a
consequence of this split, the per-queue proportional-share policy
implemented by BFQ will guarantee that the sync I/O generated for each
actuator, by each process, receives its fair share of service.
This is just a preparatory patch. If the I/O of the same process
happens to be sent to different queues, then each of these queues may
undergo queue merging. To handle this event, the bfq_io_cq data
structure must be properly extended. In addition, stable merging must
be disabled to avoid loss of control on individual actuators. Finally,
also async queues must be split. These issues are described in detail
and addressed in next commits. As for this commit, although multiple
per-process bfq_queues are provided, the I/O of each process or group
of processes is still sent to only one queue, regardless of the
actuator the I/O is for. The forwarding to distinct bfq_queues will be
enabled after addressing the above issues.
Linus Torvalds [Sun, 15 Jan 2023 13:17:44 +0000 (07:17 -0600)]
Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the poking PGD is pinned for Xen PV as it requires it this
way
- Fixes for two resctrl races when moving a task or creating a new
monitoring group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
not return a UC- type mapping type on memremap() and thus cause a
serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: fix poking_init() for Xen PV guests
x86/resctrl: Fix event counts regression in reused RMIDs
x86/resctrl: Fix task CLOSID/RMID update race
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
Linus Torvalds [Sun, 15 Jan 2023 13:12:58 +0000 (07:12 -0600)]
Merge tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix the EDAC device's confusion in the polling setting units
- Fix a memory leak in highbank's probing function
* tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/highbank: Fix memory leak in highbank_mc_probe()
EDAC/device: Fix period calculation in edac_device_reset_delay_period()
Linus Torvalds [Sun, 15 Jan 2023 13:09:41 +0000 (07:09 -0600)]
Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a build failure with some versions of ld that have an odd version
string
- Fix incorrect use of mutex in the IMC PMU driver
Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and
Yang Yingliang.
* tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/hash: Make stress_hpt_timer_fn() static
powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
powerpc/boot: Fix incorrect version calculation issue in ld_version
Linus Torvalds [Sat, 14 Jan 2023 16:48:15 +0000 (10:48 -0600)]
Merge tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Core: Fix an iommu-group refcount leak
- Fix overflow issue in IOVA alloc path
- ARM-SMMU fixes from Will:
- Fix VFIO regression on NXP SoCs by reporting IOMMU_CAP_CACHE_COHERENCY
- Fix SMMU shutdown paths to avoid device unregistration race
- Error handling fix for Mediatek IOMMU driver
* tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
iommu/iova: Fix alloc iova overflows issue
iommu: Fix refcount leak in iommu_device_claim_dma_owner
iommu/arm-smmu-v3: Don't unregister on shutdown
iommu/arm-smmu: Don't unregister on shutdown
iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer
Linus Torvalds [Sat, 14 Jan 2023 16:08:08 +0000 (10:08 -0600)]
Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"memblock: always release pages to the buddy allocator in
memblock_free_late()
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the
deferred init process.
memblock_free_pages() is called by memblock_free_late(), which is used
to free reserved ranges after memblock_free_all() has run. All pages
in reserved ranges have been initialized at that point, and
accordingly, those pages are not touched by the deferred init process.
This means that currently, if the pages that memblock_free_late()
intends to release are in the deferred range, they will never be
released to the buddy allocator. They will forever be reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call
__free_pages_core() directly instead.
One case where this issue can occur in the wild is EFI boot on x86_64.
The x86 EFI code reserves all EFI boot services memory ranges via
memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively).
If any of those ranges happens to fall within the deferred init range,
the pages will not be released and that memory will be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm: Always release pages to the buddy allocator in memblock_free_late().
Linus Torvalds [Sat, 14 Jan 2023 16:04:00 +0000 (10:04 -0600)]
Merge tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening fixes from Kees Cook:
- Fix CFI hash randomization with KASAN (Sami Tolvanen)
- Check size of coreboot table entry and use flex-array
* tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: Fix CFI hash randomization with KASAN
firmware: coreboot: Check size of table entry and use flex-array
Linus Torvalds [Sat, 14 Jan 2023 14:08:25 +0000 (08:08 -0600)]
Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- memory leak and double free fix
- two symlink fixes
- minor cleanup fix
- two smb1 fixes
* tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix uninitialized memory read for smb311 posix symlink create
cifs: fix potential memory leaks in session setup
cifs: do not query ifaces on smb1 mounts
cifs: fix double free on failed kerberos auth
cifs: remove redundant assignment to the variable match
cifs: fix file info setting in cifs_open_file()
cifs: fix file info setting in cifs_query_path_info()
Linus Torvalds [Sat, 14 Jan 2023 13:57:25 +0000 (07:57 -0600)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes in the hisi_sas driver which only impact enterprise
style multi-expander and shared disk situations and no core changes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
Linus Torvalds [Sat, 14 Jan 2023 13:52:11 +0000 (07:52 -0600)]
Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to prevent building the pata_cs5535 driver with user mode
linux as it uses msr operations that are not defined with UML"
* tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_cs5535: Don't build on UML
Linus Torvalds [Fri, 13 Jan 2023 23:41:19 +0000 (17:41 -0600)]
Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Nothing major in here, just a collection of NVMe fixes and dropping a
wrong might_sleep() that static checkers tripped over but which isn't
valid"
* tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux:
MAINTAINERS: stop nvme matching for nvmem files
nvme: don't allow unprivileged passthrough on partitions
nvme: replace the "bool vec" arguments with flags in the ioctl path
nvme: remove __nvme_ioctl
nvme-pci: fix error handling in nvme_pci_enable()
nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
block: Drop spurious might_sleep() from blk_put_queue()
Linus Torvalds [Fri, 13 Jan 2023 23:37:09 +0000 (17:37 -0600)]
Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A fix for a regression that happened last week, rest is fixes that
will be headed to stable as well. In detail:
- Fix for a regression added with the leak fix from last week (me)
- In writing a test case for that leak, inadvertently discovered a
case where we a poll request can race. So fix that up and mark it
for stable, and also ensure that fdinfo covers both the poll tables
that we have. The latter was an oversight when the split poll table
were added (me)
- Fix for a lockdep reported issue with IOPOLL (Pavel)"
* tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux:
io_uring: lock overflowing for IOPOLL
io_uring/poll: attempt request issue after racy poll wakeup
io_uring/fdinfo: include locked hash table in fdinfo output
io_uring/poll: add hash if ready poll request can't complete inline
io_uring/io-wq: only free worker if it was allocated for creation
Linus Torvalds [Fri, 13 Jan 2023 23:32:22 +0000 (17:32 -0600)]
Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Work around apparent firmware issue that made Linux reject MMCONFIG
space, which broke PCI extended config space (Bjorn Helgaas)
- Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
Bulwahn)
* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
x86/pci: Simplify is_mmconf_reserved() messages
PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
Sami Tolvanen [Thu, 12 Jan 2023 22:49:48 +0000 (22:49 +0000)]
kbuild: Fix CFI hash randomization with KASAN
Clang emits a asan.module_ctor constructor to each object file
when KASAN is enabled, and these functions are indirectly called
in do_ctors. With CONFIG_CFI_CLANG, the compiler also emits a CFI
type hash before each address-taken global function so they can
pass indirect call checks.
However, in commit 0c3e806ec0f9 ("x86/cfi: Add boot time hash
randomization"), x86 implemented boot time hash randomization,
which relies on the .cfi_sites section generated by objtool. As
objtool is run against vmlinux.o instead of individual object
files with X86_KERNEL_IBT (enabled by default), CFI types in
object files that are not part of vmlinux.o end up not being
included in .cfi_sites, and thus won't get randomized and trip
CFI when called.
Only .vmlinux.export.o and init/version-timestamp.o are linked
into vmlinux separately from vmlinux.o. As these files don't
contain any functions, disable KASAN for both of them to avoid
breaking hash randomization.
Kees Cook [Thu, 12 Jan 2023 23:03:16 +0000 (15:03 -0800)]
firmware: coreboot: Check size of table entry and use flex-array
The memcpy() of the data following a coreboot_table_entry couldn't
be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it
easier to reason about, add an explicit flexible array member to struct
coreboot_device so the entire entry can be copied at once. Additionally,
validate the sizes before copying. Avoids this run-time false positive
warning:
memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8)
Nicholas Piggin [Thu, 12 Jan 2023 10:54:26 +0000 (20:54 +1000)]
kallsyms: Fix scheduling with interrupts disabled in self-test
kallsyms_on_each* may schedule so must not be called with interrupts
disabled. The iteration function could disable interrupts, but this
also changes lookup_symbol() to match the change to the other timing
code.
Peter Foley [Fri, 13 Jan 2023 04:37:06 +0000 (23:37 -0500)]
ata: pata_cs5535: Don't build on UML
This driver uses MSR functions that aren't implemented under UML.
Avoid building it to prevent tripping up allyesconfig.
e.g.
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr'
Signed-off-by: Peter Foley <pefoley2@pefoley.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Fri, 13 Jan 2023 20:41:50 +0000 (14:41 -0600)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the PMCR_EL0 reset value after the PMU rework
- Correctly handle S2 fault triggered by a S1 page table walk by not
always classifying it as a write, as this breaks on R/O memslots
- Document why we cannot exit with KVM_EXIT_MMIO when taking a write
fault from a S1 PTW on a R/O memslot
- Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1 before
it
- Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
- Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
- Documentation improvements
- Do not return host topology information from KVM_GET_SUPPORTED_CPUID"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
Documentation: kvm: fix SRCU locking order docs
KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
KVM: nSVM: clarify recalc_intercepts() wrt CR8
MAINTAINERS: Remove myself as a KVM/arm64 reviewer
MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
KVM: arm64: Fix S1PTW handling on RO memslots
KVM: arm64: PMU: Fix PMCR_EL0 reset value
Mateusz Guzik [Fri, 13 Jan 2023 18:44:47 +0000 (19:44 +0100)]
lockref: stop doing cpu_relax in the cmpxchg loop
On the x86-64 architecture even a failing cmpxchg grants exclusive
access to the cacheline, making it preferable to retry the failed op
immediately instead of stalling with the pause instruction.
To illustrate the impact, below are benchmark results obtained by
running various will-it-scale tests on top of the 6.2-rc3 kernel and
Cascade Lake (2 sockets * 24 cores * 2 threads) CPU.
All results in ops/s. Note there is some variance in re-runs, but the
code is consistently faster when contention is present.
That is, fixing the problems in access itself *reduces* scalability
after the cacheline ping-pong only happens in lockref with the pause
instruction.
Note that fstat and access benchmarks are not currently integrated into
will-it-scale, but interested parties can find them in pull requests to
said project.
Code at hand has a rather tortured history. First modification showed
up in commit d472d9d98b46 ("lockref: Relax in cmpxchg loop"), written
with Itanium in mind. Later it got patched up to use an arch-dependent
macro to stop doing it on s390 where it caused a significant regression.
Said macro had undergone revisions and was ultimately eliminated later,
going back to cpu_relax.
While I intended to only remove cpu_relax for x86-64, I got the
following comment from Linus:
I would actually prefer just removing it entirely and see if
somebody else hollers. You have the numbers to prove it hurts on
real hardware, and I don't think we have any numbers to the
contrary.
So I think it's better to trust the numbers and remove it as a
failure, than say "let's just remove it on x86-64 and leave
everybody else with the potentially broken code"
Additionally, Will Deacon (maintainer of the arm64 port, one of the
architectures previously benchmarked):
So, from the arm64 side of the fence, I'm perfectly happy just
removing the cpu_relax() calls from lockref.
As such, come back full circle in history and whack it altogether.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bjorn Helgaas [Tue, 10 Jan 2023 18:02:43 +0000 (12:02 -0600)]
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub. After 07eab0901ede, that E820 entry is removed, so we reject this ECAM space,
which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.
The lack of extended config space breaks anything that relies on it,
including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891 Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org Reported-by: Kan Liang <kan.liang@linux.intel.com> Reported-by: Tony Luck <tony.luck@intel.com> Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reported-by: Yunying Sun <yunying.sun@intel.com> Reported-by: Baowen Zheng <baowen.zheng@corigine.com> Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reported-by: Yang Lixiao <lixiao.yang@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Linus Torvalds [Fri, 13 Jan 2023 16:35:26 +0000 (10:35 -0600)]
Merge tag 'docs-6.2-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three documentation fixes (or rather two and one warning):
- Sphinx 6.0 broke our configuration mechanism, so fix it
- I broke our configuration for non-Alabaster themes; Akira fixed it
- Deprecate Sphinx < 2.4 with an eye toward future removal"
* tag 'docs-6.2-fixes' of git://git.lwn.net/linux:
docs/conf.py: Use about.html only in sidebar of alabaster theme
docs: Deprecate use of Sphinx < 2.4.x
docs: Fix the docs build with Sphinx 6.0
Ard Biesheuvel [Mon, 9 Jan 2023 09:44:31 +0000 (10:44 +0100)]
efi: tpm: Avoid READ_ONCE() for accessing the event log
Nathan reports that recent kernels built with LTO will crash when doing
EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
misaligned load from the TPM event log, which is annotated with
READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
which does not tolerate misaligned accesses.
Interestingly, this does not happen when booting the same kernel
straight from the UEFI shell, and so the fact that the event log may
appear misaligned in memory may be caused by a bug in GRUB or SHIM.
However, using READ_ONCE() to access firmware tables is slightly unusual
in any case, and here, we only need to ensure that 'event' is not
dereferenced again after it gets unmapped, but this is already taken
care of by the implicit barrier() semantics of the early_memunmap()
call.
Cc: <stable@vger.kernel.org> Cc: Peter Jones <pjones@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1782 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
There is no real problem for normal IOPOLL as flush is also called with
uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL,
for which __io_cqring_overflow_flush() happens from the CQ waiting path.
Linus Torvalds [Fri, 13 Jan 2023 14:20:29 +0000 (08:20 -0600)]
Merge tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a slightly big update, but it's more or less expected, as
the first batch after holidays.
All changes (but for the last two last-minute fixes) have been stewed
in linux-next long enough, so it's fairly safe to take:
- PCM UAF fix in 32bit compat layer
- ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm
- SOF power management fixes
- ASoC Intel link failure fixes
- A series of fixes for USB-audio regressions
- CS35L41 HD-audio codec regression fixes
- HD-audio device-specific fixes / quirks
Note that one SPI patch has been taken in ASoC subtree mistakenly, and
the same fix is found in spi tree, but it should be OK to apply"
* tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF
ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate()
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets
ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC
ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list
ALSA: control-led: use strscpy in set_led_id()
ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format()
ASoC: dt-bindings: qcom,lpass-tx-macro: correct clocks on SC7280
ASoC: dt-bindings: qcom,lpass-wsa-macro: correct clocks on SM8250
ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE
ALSA: hda: cs35l41: Check runtime suspend capability at runtime_idle
ALSA: hda: cs35l41: Don't return -EINVAL from system suspend/resume
ASoC: fsl_micfil: Correct the number of steps on SX controls
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform
Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing"
ALSA: usb-audio: More refactoring of hw constraint rules
ALSA: usb-audio: Relax hw constraints for implicit fb sync
ALSA: usb-audio: Make sure to stop endpoints before closing EPs
ALSA: hda - Enable headset mic on another Dell laptop with ALC3254
...
Linus Torvalds [Fri, 13 Jan 2023 13:38:14 +0000 (07:38 -0600)]
Merge tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix assorted issues in the ARM cpufreq drivers and in the AMD
P-state driver.
Specifics:
- Fix cpufreq policy reference counting in amd-pstate to prevent it
from crashing on removal (Perry Yuan)
- Fix double initialization and set suspend-freq for Apple's cpufreq
driver (Arnd Bergmann, Hector Martin)
- Fix reading of "reg" property, update cpufreq-dt's blocklist and
update DT documentation for Qualcomm's cpufreq driver (Konrad
Dybcio, Krzysztof Kozlowski)
- Replace 0 with NULL in the Armada cpufreq driver (Miles Chen)
- Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois)
- Update blocklist for the Tegra234 Soc cpufreq driver (Sumit Gupta)"
* tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering
cpufreq: armada-37xx: stop using 0 as NULL pointer
cpufreq: apple-soc: Switch to the lowest frequency on suspend
dt-bindings: cpufreq: cpufreq-qcom-hw: document interrupts
cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist
cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist
cpufreq: qcom-hw: Fix reading "reg" with address/size-cells != 2
cpufreq: CPPC: Add u64 casts to avoid overflowing
cpufreq: apple: remove duplicate intializer
Linus Torvalds [Fri, 13 Jan 2023 13:32:55 +0000 (07:32 -0600)]
Merge tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These add one more ACPI IRQ override quirk, improve ACPI companion
lookup for backlight devices and add missing kernel command line
option values for backlight detection.
Specifics:
- Improve ACPI companion lookup for backlight devices in the cases
when there is more than one candidate ACPI device object (Hans de
Goede)
- Add missing support for manual selection of NVidia-WMI-EC or Apple
GMUX backlight in the kernel command line to the ACPI backlight
driver (Hans de Goede)
- Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan)"
* tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops
ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline
ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA
Linus Torvalds [Fri, 13 Jan 2023 13:26:40 +0000 (07:26 -0600)]
Merge tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A set of assorted fixes and hardware-id additions"
* tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode
platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
platform/x86/amd: Fix refcount leak in amd_pmc_probe
platform/x86: intel/pmc/core: Add Meteor Lake mobile support
platform/x86: simatic-ipc: add another model
platform/x86: simatic-ipc: correct name of a model
platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present
platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting
platform/x86: asus-wmi: Don't load fan curves without fan
platform/x86: asus-wmi: Ignore fan on E410MA
platform/x86: asus-wmi: Add quirk wmi_ignore_fan
platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Add alternate mapping for KEY_CAMERA
platform/surface: aggregator: Add missing call to ssam_request_sync_free()
platform/surface: aggregator: Ignore command messages not intended for us
platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD
platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe
Linus Torvalds [Fri, 13 Jan 2023 13:18:59 +0000 (07:18 -0600)]
Merge tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"There is a bit of a post-holiday build up here I expect, small fixes
across the board, amdgpu and msm being the main leaders, with others
having a few. One code removal patch for nouveau:
buddy:
- benchmark regression fix for top-down buddy allocation
panel:
- add Lenovo panel orientation quirk
ttm:
- fix kernel oops regression
amdgpu:
- fix missing fence references
- fix missing pipeline sync fencing
- SMU13 fan speed fix
- SMU13 fix power cap handling
- SMU13 BACO fix
- Fix a possible segfault in bo validation error case
- Delay removal of firmware framebuffer
- Fix error when unloading
amdkfd:
- SVM fix when clearing vram
- GC11 fix for multi-GPU
i915:
- Reserve enough fence slot for i915_vma_unbind_vsync
- Fix potential use after free
- Reset engines twice in case of reset failure
- Use multi-cast registers for SVG Unit registers
msm:
- display:
- doc warning fixes
- dt attribs cleanups
- memory leak fix
- error handing in hdmi probe fix
- dp_aux_isr incorrect signalling fix
- shutdown path fix
- accel:
- a5xx: fix quirks to be a bitmask
- a6xx: fix gx halt to avoid 1s hang
- kexec shutdown fix
- fix potential double free
vmwgfx:
- drop rcu usage to make code more robust
virtio:
- fix use-after-free in gem handle code
nouveau:
- drop unused nouveau_fbcon.c"
* tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm: (35 commits)
drm: Optimize drm buddy top-down allocation method
drm/ttm: Fix a regression causing kernel oops'es
drm/i915/gt: Cover rest of SVG unit MCR registers
drm/nouveau: Remove file nouveau_fbcon.c
drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
drm/amd/pm/smu13: BACO is supported when it's in BACO state
drm/amdkfd: Add sync after creating vram bo
drm/i915/gt: Reset twice
drm/amdgpu: fix pipeline sync v2
drm/vmwgfx: Remove rcu locks from user resources
drm/virtio: Fix GEM handle creation UAF
drm/amdgpu: Fixed bug on error when unloading amdgpu
drm/amd: Delay removal of the firmware framebuffer
drm/amdgpu: Fix potential NULL dereference
drm/i915: Fix potential context UAFs
drm/i915: Reserve enough fence slot for i915_vma_unbind_async
drm: Add orientation quirk for Lenovo ideapad D330-10IGL
drm/msm/a6xx: Avoid gx gbit halt during rpm suspend
drm/msm/adreno: Make adreno quirks not overwrite each other
drm/msm: another fix for the headless Adreno GPU
...
Clement Lecigne [Fri, 13 Jan 2023 12:07:45 +0000 (13:07 +0100)]
ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF
Takes rwsem lock inside snd_ctl_elem_read instead of snd_ctl_elem_read_user
like it was done for write in commit 1fa4445f9adf1 ("ALSA: control - introduce
snd_ctl_notify_one() helper"). Doing this way we are also fixing the following
locking issue happening in the compat path which can be easily triggered and
turned into an use-after-free.
Linus Torvalds [Fri, 13 Jan 2023 13:11:45 +0000 (07:11 -0600)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
could possibly go wrong?
The obvious reason we have so much here is because of the holiday
season right after the merge window, but we've also brought back an
erratum workaround that was previously dropped at the last minute and
there's an MTE coredumping fix that strays outside of the arch/arm64
directory.
Summary:
- Fix PAGE_TABLE_CHECK failures on hugepage splitting path
- Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
- Fix NULL deref when accessing debugfs node if PSCI is not present
- Fix MTE core dumping when VMA list is being updated concurrently
- Fix SME signal frame handling when SVE is not implemented by the
CPU
- Fix asm constraints for cmpxchg_double() to hazard both words
- Fix build failure with stack tracer and older versions of Clang
- Bring back workaround for Cortex-A715 erratum 2645198"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
firmware/psci: Don't register with debugfs if PSCI isn't available
firmware/psci: Fix MEM_PROTECT_RANGE function numbers
arm64/signal: Always allocate SVE signal frames on SME only systems
arm64/signal: Always accept SVE signal frames on SME only systems
arm64/sme: Fix context switch for SME only systems
arm64: cmpxchg_double*: hazard against entire exchange variable
arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
arm64: mte: Avoid the racy walk of the vma list during core dump
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
arm64/mm: fix incorrect file_map_count for invalid pmd