Christoph Hellwig [Mon, 11 Nov 2024 15:46:25 +0000 (16:46 +0100)]
block: remove the ioprio field from struct request
The request ioprio is only initialized from the first attached bio,
so requests without a bio already never set it. Directly use the
bio field instead.
The zram->comp_algs[ZRAM_PRIMARY_COMP] can be NULL in zram_add() if
comp_algorithm_set() has not been called. User can access the zram device
by sysfs after device_add_disk(), so there is a time window to trigger
the NULL pointer dereference. Move it ahead device_add_disk() to make sure
when user can access the zram device, it is ready. comp_algorithm_set() is
protected by zram->init_lock in other places and no such problem.
In case of an early failure in dasd_init, dasd_proc_init is never
called and /proc/dasd* files are never created. That can happen, for
example, if an incompatible or incorrect argument is provided to the
dasd_mod.dasd= kernel parameter.
However, the attempted removal of /proc/dasd* files causes 8 warnings
and backtraces in this case. 4 on the error path within dasd_init and
4 when the dasd module is unloaded. Notice the "removing permanent
/proc entry 'devices'" message that is caused by the dasd_proc_exit
function trying to remove /proc/devices instead of /proc/dasd/devices
since dasd_proc_root_entry is NULL and /proc/devices is indeed
permanent. Example:
------------[ cut here ]------------
removing permanent /proc entry 'devices'
WARNING: CPU: 6 PID: 557 at fs/proc/generic.c:701 remove_proc_entry+0x22e/0x240
While the cause is a user failure, the dasd module should handle the
situation more gracefully. One of the simplest solutions is to make
removal of the /proc/dasd* entries idempotent.
Jens Axboe [Sun, 10 Nov 2024 03:06:30 +0000 (20:06 -0700)]
Merge branch 'for-6.13/block' into for-next
* for-6.13/block:
loop: fix type of block size
MAINTAINERS: Make Yu Kuai co-maintainer of md/raid subsystem
md/raid5: Wait sync io to finish before changing group cnt
Jens Axboe [Sun, 10 Nov 2024 03:06:13 +0000 (20:06 -0700)]
Merge tag 'md-6.13-20241107' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.13/block
Pull raid5 fix from Song.
* tag 'md-6.13-20241107' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
MAINTAINERS: Make Yu Kuai co-maintainer of md/raid subsystem
md/raid5: Wait sync io to finish before changing group cnt
Li Wang [Sat, 9 Nov 2024 02:27:44 +0000 (10:27 +0800)]
loop: fix type of block size
PAGE_SIZE may be 64K, and the max block size can be PAGE_SIZE, so any
variable for holding block size can't be defined as 'unsigned short'.
Unfortunately commit 473516b36193 ("loop: use the atomic queue limits
update API") passes 'bsize' with type of 'unsigned short' to
loop_reconfigure_limits(), and causes LTP/ioctl_loop06 test failure:
12 ioctl_loop06.c:76: TINFO: Using LOOP_SET_BLOCK_SIZE with arg > PAGE_SIZE
13 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
...
18 ioctl_loop06.c:76: TINFO: Using LOOP_CONFIGURE with block_size > PAGE_SIZE
19 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
Fixes the issue by defining 'block size' variable with 'unsigned int', which is
aligned with block layer's definition.
(improve commit log & add fixes tag)
Fixes: 473516b36193 ("loop: use the atomic queue limits update API") Cc: John Garry <john.g.garry@oracle.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Li Wang <liwang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241109022744.1126003-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Xiao Ni [Wed, 6 Nov 2024 09:51:24 +0000 (17:51 +0800)]
md/raid5: Wait sync io to finish before changing group cnt
One customer reports a bug: raid5 is hung when changing thread cnt
while resync is running. The stripes are all in conf->handle_list
and new threads can't handle them.
Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
pers->quiesce from mddev_suspend/resume. Before this patch, mddev_suspend
needs to wait for all ios including sync io to finish. Now it's used
to only wait normal io.
Fix this by calling raid5_quiesce from raid5_store_group_thread_cnt
directly to wait all sync requests to finish before changing the group
cnt.
Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") Cc: stable@vger.kernel.org Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241106095124.74577-1-xni@redhat.com Signed-off-by: Song Liu <song@kernel.org>
Jens Axboe [Thu, 7 Nov 2024 23:27:28 +0000 (16:27 -0700)]
Merge branch 'for-6.13/block' into for-next
* for-6.13/block:
block: don't verify IO lock for freeze/unfreeze in elevator_init_mq()
block: always verify unfreeze lock on the owner task
rbd: unfreeze queue after marking disk as dead
block: remove blk_freeze_queue()
Ming Lei [Thu, 31 Oct 2024 13:37:20 +0000 (21:37 +0800)]
block: don't verify IO lock for freeze/unfreeze in elevator_init_mq()
elevator_init_mq() is only called at the entry of add_disk_fwnode() when
disk IO isn't allowed yet.
So not verify io lock(q->io_lockdep_map) for freeze & unfreeze in
elevator_init_mq().
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Reported-by: Lai Yi <yi1.lai@linux.intel.com> Fixes: f1be1788a32e ("block: model freeze & enter queue as lock for supporting lockdep") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241031133723.303835-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 31 Oct 2024 13:37:19 +0000 (21:37 +0800)]
block: always verify unfreeze lock on the owner task
commit f1be1788a32e ("block: model freeze & enter queue as lock for
supporting lockdep") tries to apply lockdep for verifying freeze &
unfreeze. However, the verification is only done the outmost freeze and
unfreeze. This way is actually not correct because q->mq_freeze_depth
still may drop to zero on other task instead of the freeze owner task.
Fix this issue by always verifying the last unfreeze lock on the owner
task context, and make sure both the outmost freeze & unfreeze are
verified in the current task.
Damien Le Moal [Thu, 7 Nov 2024 06:43:00 +0000 (15:43 +0900)]
block: Add a public bdev_zone_is_seq() helper
Turn the private disk_zone_is_conv() function in blk-zoned.c into a
public and documented bdev_zone_is_seq() helper with the inverse
polarity of the original function, also adding a check for non-zoned
devices so that all file systems can use the helper, even with a regular
block device.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241107064300.227731-3-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Thu, 7 Nov 2024 06:42:59 +0000 (15:42 +0900)]
block: RCU protect disk->conv_zones_bitmap
Ensure that a disk revalidation changing the conventional zones bitmap
of a disk does not cause invalid memory references when using the
disk_zone_is_conv() helper by RCU protecting the disk->conv_zones_bitmap
pointer.
disk_zone_is_conv() is modified to operate under the RCU read lock and
the function disk_set_conv_zones_bitmap() is added to update a disk
conv_zones_bitmap pointer using rcu_replace_pointer() with the disk
zone_wplugs_lock spinlock held.
disk_free_zone_resources() is modified to call
disk_update_zone_resources() with a NULL bitmap pointer to free the disk
conv_zones_bitmap. disk_set_conv_zones_bitmap() is also used in
disk_update_zone_resources() to set the new (revalidated) bitmap and
free the old one.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241107064300.227731-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
zhangguopeng [Thu, 7 Nov 2024 10:42:58 +0000 (18:42 +0800)]
block: Replace sprintf() with sysfs_emit()
Per Documentation/filesystems/sysfs.rst, show() should only use
sysfs_emit() or sysfs_emit_at() when formatting the value to be
returned to user space.
Jens Axboe [Thu, 7 Nov 2024 22:24:51 +0000 (15:24 -0700)]
Merge branch 'for-6.13/block' into for-next
* for-6.13/block: (61 commits)
block: Switch to using refcount_t for zone write plugs
Revert "block: pre-calculate max_zone_append_sectors"
mtip32xx: Replace deprecated PCI functions
md/md-bitmap: Add missing destroy_work_on_stack()
md/raid5: don't set Faulty rdev for blocked_rdev
md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: factor out helper to handle blocked rdev from raid1_write_request()
md: don't record new badblocks for faulty rdev
md: don't wait faulty rdev in md_wait_for_blocked_rdev()
md: add a new helper rdev_blocked()
md/raid5-ppl: Use atomic64_inc_return() in ppl_new_iounit()
block: pre-calculate max_zone_append_sectors
block: remove the max_zone_append_sectors check in blk_revalidate_disk_zones
block: update blk_stack_limits documentation
lib/iov_iter: fix bvec iterator setup
loop: Simplify discard granularity calc
block: remove bio_add_zone_append_page
block: remove zone append special casing from the direct I/O path
lib/iov_iter.c: initialize bi.bi_idx before iterating over bvec
...
Ming Lei [Fri, 25 Oct 2024 12:22:41 +0000 (20:22 +0800)]
io_uring: support SQE grouping
SQE group is defined as one chain of SQEs starting with the first SQE
that has IOSQE_GROUP_LINK set, and ending with the first subsequent SQE
that doesn't have it set, and it is similar to a chain of linked SQEs.
However, unlike linked SQEs, where each sqe is issued after the previous
has completed, with SQE grouping all SQEs in one group can be submitted
in parallel. To simplify the implementation from beginning, all members
are queued after the leader is completed, however, this way may change
and the leader and members may be issued concurrently in future.
The 1st SQE is the group leader, and the other SQEs are group members.
The whole group shares single IOSQE_IO_LINK and IOSQE_IO_DRAIN from the
group leader, and the two flags can't be set for group members. For the
sake of simplicity, IORING_OP_LINK_TIMEOUT is disallowed for SQE group
now.
When the group is in one link chain, this group isn't submitted until
the previous SQE or group is completed. And the following SQE or group
can't be started before this group has completed. Failure from any group
member will fail the group leader, which in turn will terminate the link
chain.
When IOSQE_IO_DRAIN is set for the group leader, all requests in this
group and previous requests submitted are drained. Given IOSQE_IO_DRAIN
can be set for group leader only, we respect IO_DRAIN by always
completing group leader as the last one in the group. Meanwhile it is
natural to post leader's CQE as the last one from application viewpoint.
Combined with IOSQE_IO_LINK, SQE grouping provides a flexible way to
support N:M dependencies, such as:
- group A is chained with group B together
- group A has N SQEs
- group B has M SQEs
then M SQEs in group B depend on N SQEs in group A.
N:M dependency can support some interesting use cases in an efficient
way:
1) read from multiple files, then write the read data into single file
2) read from single file, and write the read data into multiple files
3) write same data into multiple files, and read data from multiple
files and compare if correct data is written
Note: this grabs the last sqe flag bit available. If we need more flags
in the future, then see the linked patch/discussion for how that could
be done with a setup flag.
Ming Lei [Thu, 7 Nov 2024 11:01:34 +0000 (19:01 +0800)]
io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers
`io_rsrc_node` instance won't be shared among different io_uring ctxs,
and its allocation 'ctx' is always same with the user's 'ctx', so it is
safe to pass user 'ctx' reference to rsrc helpers. Even in io_clone_buffers(),
`io_rsrc_node` instance is allocated actually for destination io_uring_ctx.
Then io_rsrc_node_ctx() can be removed, and the 8 bytes `ctx` pointer will be
removed from `io_rsrc_node` in the following patch.
Linus Torvalds [Thu, 7 Nov 2024 17:41:34 +0000 (07:41 -1000)]
Merge tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"Fix period setting in imx-tpm driver and a maintainer update
Erik Schumacher found and fixed a problem in the calculation of the
PWM period setting yielding too long periods. Trevor Gamblin - who
already cared about mainlining the pwm-axi-pwmgen driver - stepped
forward as an additional reviewer.
Thanks to Erik and Trevor"
* tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
MAINTAINERS: add self as reviewer for AXI PWM GENERATOR
pwm: imx-tpm: Use correct MODULO value for EPWM mode
David Wang [Wed, 6 Nov 2024 02:12:28 +0000 (10:12 +0800)]
proc/softirqs: replace seq_printf with seq_put_decimal_ull_width
seq_printf is costy, on a system with n CPUs, reading /proc/softirqs
would yield 10*n decimal values, and the extra cost parsing format string
grows linearly with number of cpus. Replace seq_printf with
seq_put_decimal_ull_width have significant performance improvement.
On an 8CPUs system, reading /proc/softirqs show ~40% performance
gain with this patch.
Signed-off-by: David Wang <00107082@163.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 6 Nov 2024 23:09:22 +0000 (13:09 -1000)]
Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes that came up during the nfs bakeathon the other
week.
Stable Fixes:
- Fix KMSAN warning in decode_getfattr_attrs()
Other Bugfixes:
- Handle -ENOTCONN in xs_tcp_setup_socked()
- NFSv3: only use NFS timeout for MOUNT when protocols are compatible
- Fix attribute delegation behavior on exclusive create and a/mtime
changes
- Fix localio to cope with racing nfs_local_probe()
- Avoid i_lock contention in fs_clear_invalid_mapping()"
* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
nfs_common: fix localio to cope with racing nfs_local_probe()
NFS: Further fixes to attribute delegation a/mtime changes
NFS: Fix attribute delegation behaviour on exclusive create
nfs: Fix KMSAN warning in decode_getfattr_attrs()
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
Pavel Begunkov [Tue, 5 Nov 2024 02:12:33 +0000 (02:12 +0000)]
io_uring: avoid normal tw intermediate fallback
When a DEFER_TASKRUN io_uring is terminating it requeues deferred task
work items as normal tw, which can further fallback to kthread
execution. Avoid this extra step and always push them to the fallback
kthread.
Olivier Langlois [Sun, 13 Oct 2024 18:29:24 +0000 (14:29 -0400)]
io_uring/napi: add static napi tracking strategy
Add the static napi tracking strategy. That allows the user to manually
manage the napi ids list for busy polling, and eliminate the overhead of
dynamically updating the list from the fast path.
Olivier Langlois [Sun, 13 Oct 2024 18:29:12 +0000 (14:29 -0400)]
io_uring/napi: clean up __io_napi_do_busy_loop
__io_napi_do_busy_loop now requires to have loop_end in its parameters.
This makes the code cleaner and also has the benefit of removing a
branch since the only caller not passing NULL for loop_end_arg is also
setting the value conditionally.
Olivier Langlois [Sun, 13 Oct 2024 18:28:50 +0000 (14:28 -0400)]
io_uring/napi: improve __io_napi_add
1. move the sock->sk pointer validity test outside the function to
avoid the function call overhead and to make the function more
more reusable
2. change its name to __io_napi_add_id to be more precise about it is
doing
3. return an error code to report errors
Pavel Begunkov [Mon, 4 Nov 2024 12:02:47 +0000 (12:02 +0000)]
io_uring: prevent speculating sq_array indexing
The SQ index array consists of user provided indexes, which io_uring
then uses to index the SQ, and so it's susceptible to speculation. For
all other queues io_uring tracks heads and tails in kernel, and they
shouldn't need any special care.
Jens Axboe [Sun, 3 Nov 2024 17:23:38 +0000 (10:23 -0700)]
io_uring: move struct io_kiocb from task_struct to io_uring_task
Rather than store the task_struct itself in struct io_kiocb, store
the io_uring specific task_struct. The life times are the same in terms
of io_uring, and this avoids doing some dereferences through the
task_struct. For the hot path of putting local task references, we can
deref req->tctx instead, which we'll need anyway in that function
regardless of whether it's local or remote references.
This is mostly straight forward, except the original task PF_EXITING
check needs a bit of tweaking. task_work is _always_ run from the
originating task, except in the fallback case, where it's run from a
kernel thread. Replace the potentially racy (in case of fallback work)
checks for req->task->flags with current->flags. It's either the still
the original task, in which case PF_EXITING will be sane, or it has
PF_KTHREAD set, in which case it's fallback work. Both cases should
prevent moving forward with the given request.
Jens Axboe [Sun, 3 Nov 2024 17:22:43 +0000 (10:22 -0700)]
io_uring: move cancelations to be io_uring_task based
Right now the task_struct pointer is used as the key to match a task,
but in preparation for some io_kiocb changes, move it to using struct
io_uring_task instead. No functional changes intended in this patch.
Jens Axboe [Sun, 3 Nov 2024 15:46:07 +0000 (08:46 -0700)]
io_uring/rsrc: split io_kiocb node type assignments
Currently the io_rsrc_node assignment in io_kiocb is an array of two
pointers, as two nodes may be assigned to a request - one file node,
and one buffer node. However, the buffer node can co-exist with the
provided buffers, as currently it's not supported to use both provided
and registered buffers at the same time.
This crucially brings struct io_kiocb down to 4 cache lines again, as
before it spilled into the 5th cacheline.
Jens Axboe [Sun, 3 Nov 2024 15:17:28 +0000 (08:17 -0700)]
io_uring/rsrc: encode node type and ctx together
Rather than keep the type field separate rom ctx, use the fact that we
can encode up to 4 types of nodes in the LSB of the ctx pointer. Doesn't
reclaim any space right now on 64-bit archs, but it leaves a full int
for future use.
Linus Torvalds [Wed, 6 Nov 2024 19:29:15 +0000 (09:29 -1000)]
Merge tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys fixes from Jarkko Sakkinen:
"A couple of fixes for keys and trusted keys"
* tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation
security/keys: fix slab-out-of-bounds in key_task_permission
Linus Torvalds [Wed, 6 Nov 2024 18:08:39 +0000 (08:08 -1000)]
Merge tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracefs fixes from Steven Rostedt:
"Fix tracefs mount options.
Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API")
broke the gid setting when set by fstab or other mount utility. It is
ignored when it is set. Fix the code so that it recognises the option
again and will honor the settings on mount at boot up.
Update the internal documentation and create a selftest to make sure
it doesn't break again in the future"
* tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/selftests: Add tracefs mount options test
tracing: Document tracefs gid mount option
tracing: Fix tracefs mount options
Linus Torvalds [Wed, 6 Nov 2024 18:03:19 +0000 (08:03 -1000)]
Merge tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- AMD PMF: Add new hardware id
- AMD PMC: Fix crash when loaded with enable_stb=1 on devices without STB
- Dell: Add Alienware hwid for Alienware systems with Dell WMI interface
- thinkpad_acpi: Quirk to fix wrong fan speed readings on L480
- New hotkey mappings for Dell and Lenovo laptops
* tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed
platform/x86: ideapad-laptop: add missing Ideapad Pro 5 fn keys
platform/x86: dell-wmi-base: Handle META key Lock/Unlock events
platform/x86: dell-smbios-base: Extends support to Alienware products
platform/x86/amd/pmc: Detect when STB is not available
platform/x86/amd/pmf: Add SMU metrics table support for 1Ah family 60h model
Linus Torvalds [Wed, 6 Nov 2024 17:56:47 +0000 (07:56 -1000)]
Merge tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- fix memory safety bugs in dm-cache
- fix restart/panic logic in dm-verity
- fix 32-bit unsigned integer overflow in dm-unstriped
- fix a device mapper crash if blk_alloc_disk fails
* tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix potential out-of-bounds access on the first resume
dm cache: optimize dirty bit checking with find_next_bit when resizing
dm cache: fix out-of-bounds access to the dirty bitset when resizing
dm cache: fix flushing uninitialized delayed_work on cache_ctr error
dm cache: correct the number of origin blocks to match the target length
dm-verity: don't crash if panic_on_corruption is not selected
dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
dm: fix a crash if blk_alloc_disk fails
Jens Axboe [Wed, 6 Nov 2024 14:55:19 +0000 (07:55 -0700)]
Merge tag 'md-6.13-20241105' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.13/block
Pull MD changes from Song:
"1. Enhance handling of faulty and blocked devices, by Yu Kuai.
2. raid5-ppl atomic improvement, by Uros Bizjak.
3. md-bitmap fix, by Yuan Can."
* tag 'md-6.13-20241105' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/md-bitmap: Add missing destroy_work_on_stack()
md/raid5: don't set Faulty rdev for blocked_rdev
md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: factor out helper to handle blocked rdev from raid1_write_request()
md: don't record new badblocks for faulty rdev
md: don't wait faulty rdev in md_wait_for_blocked_rdev()
md: add a new helper rdev_blocked()
md/raid5-ppl: Use atomic64_inc_return() in ppl_new_iounit()
Philipp Stanner [Wed, 6 Nov 2024 14:52:50 +0000 (15:52 +0100)]
mtip32xx: Replace deprecated PCI functions
pcim_iomap_table() and pcim_request_regions() have been deprecated in
commit e354bb84a4c1 ("PCI: Deprecate pcim_iomap_table(),
pcim_iomap_regions_request_all()") and commit d140f80f60358 ("PCI:
Deprecate pcim_iomap_regions() in favor of pcim_iomap_region()"),
respectively.
Vishnu Sankar [Tue, 5 Nov 2024 23:55:05 +0000 (08:55 +0900)]
platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed
Fix for Thinkpad's with ECFW showing incorrect fan speed. Some models use
decimal instead of hexadecimal for the speed stored in the EC registers.
For example the rpm register will have 0x4200 instead of 0x1068, here
the actual RPM is "4200" in decimal.
Add a quirk to handle this.
Signed-off-by: Vishnu Sankar <vishnuocv@gmail.com> Suggested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20241105235505.8493-1-vishnuocv@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Yuan Can [Tue, 5 Nov 2024 13:01:05 +0000 (21:01 +0800)]
md/md-bitmap: Add missing destroy_work_on_stack()
This commit add missed destroy_work_on_stack() operations for
unplug_work.work in bitmap_unplug_async().
Fixes: a022325ab970 ("md/md-bitmap: add a new helper to unplug bitmap asynchrously") Cc: stable@vger.kernel.org Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241105130105.127336-1-yuancan@huawei.com Signed-off-by: Song Liu <song@kernel.org>
Yu Kuai [Thu, 31 Oct 2024 03:31:14 +0000 (11:31 +0800)]
md/raid5: don't set Faulty rdev for blocked_rdev
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Yu Kuai [Thu, 31 Oct 2024 03:31:13 +0000 (11:31 +0800)]
md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Yu Kuai [Thu, 31 Oct 2024 03:31:12 +0000 (11:31 +0800)]
md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Yu Kuai [Thu, 31 Oct 2024 03:31:11 +0000 (11:31 +0800)]
md/raid1: factor out helper to handle blocked rdev from raid1_write_request()
Currently raid1 is preparing IO for underlying disks while checking if
any disk is blocked, if so allocated resources must be released, then
waiting for rdev to be unblocked and try to prepare IO again.
Make code cleaner by checking blocked rdev first, it doesn't matter if
rdev is blocked while issuing IO, the IO will wait for rdev to be
unblocked or not.
Yu Kuai [Thu, 31 Oct 2024 03:31:10 +0000 (11:31 +0800)]
md: don't record new badblocks for faulty rdev
Faulty will be checked before issuing IO to the rdev, however, rdev can
be faulty at any time, hence it's possible that rdev_set_badblocks()
will be called for faulty rdev. In this case, mddev->sb_flags will be
set and some other path can be blocked by updating super block.
Since faulty rdev will not be accesed anymore, there is no need to
record new babblocks for faulty rdev and forcing updating super block.
Noted this is not a bugfix, just prevent updating superblock in some
corner cases, and will help to slice a bug related to external
metadata[1], testing also shows that devices are removed faster in the
case IO error.
Yu Kuai [Thu, 31 Oct 2024 03:31:09 +0000 (11:31 +0800)]
md: don't wait faulty rdev in md_wait_for_blocked_rdev()
md_wait_for_blocked_rdev() is called for write IO while rdev is
blocked, howerver, rdev can be faulty after choosing this rdev to write,
and faulty rdev should never be accessed anymore, hence there is no point
to wait for faulty rdev to be unblocked.
Yu Kuai [Thu, 31 Oct 2024 03:31:08 +0000 (11:31 +0800)]
md: add a new helper rdev_blocked()
The helper will be used in later patches for raid1/raid10/raid5, the
difference is that Faulty rdev with unacknowledged bad block will not
be considered blocked.
Uros Bizjak [Mon, 7 Oct 2024 08:48:04 +0000 (10:48 +0200)]
md/raid5-ppl: Use atomic64_inc_return() in ppl_new_iounit()
Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
to use optimized implementation and ease register pressure around
the primitive for targets that implement optimized variant.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Link: https://lore.kernel.org/r/20241007084831.48067-1-ubizjak@gmail.com Signed-off-by: Song Liu <song@kernel.org>
Linus Torvalds [Tue, 5 Nov 2024 01:23:26 +0000 (15:23 -1000)]
Merge tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Where the last set of fixes was mostly drivers, this time the
devicetree changes all come at once, targeting mostly the Rockchips,
Qualcomm and NXP platforms.
The Qualcomm bugfixes target the Snapdragon X Elite laptops,
specifically problems with PCIe and NVMe support to improve
reliability, and a boot regresion on msm8939.
Also for Snapdragon platforms, there are a number of correctness
changes in the several platform specific device drivers, but none of
these are as impactful.
On the NXP i.MX platform, the fixes are all for 64-bit i.MX8 variants,
correcting individual entries in the devicetree that were incorrect
and causing the media, video, mmc and spi drivers to misbehave in
minor ways.
The Arm SCMI firmware driver gets fixes for a use-after-free bug and
for correctly parsing firmware information.
On the RISC-V side, there are three minor devicetree fixes for
starfive and sophgo, again addressing only minor mistakes. One device
driver patch fixes a problem with spurious interrupt handling"
* tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (63 commits)
firmware: arm_scmi: Use vendor string in max-rx-timeout-ms
dt-bindings: firmware: arm,scmi: Add missing vendor string
riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes
arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes
ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin
ARM: dts: rockchip: Fix the spi controller on rk3036
ARM: dts: rockchip: drop grf reference from rk3036 hdmi
ARM: dts: rockchip: fix rk3036 acodec node
arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro
soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
rpmsg: glink: Handle rejected intent request better
arm64: dts: qcom: x1e80100: fix PCIe5 interconnect
arm64: dts: qcom: x1e80100: fix PCIe4 interconnect
arm64: dts: qcom: x1e80100: Fix up BAR spaces
MAINTAINERS: invert Misc RISC-V SoC Support's pattern
soc: qcom: socinfo: fix revision check in qcom_socinfo_probe()
arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch
arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch
arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch
...
David Gstir [Tue, 29 Oct 2024 11:34:01 +0000 (12:34 +0100)]
KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation
When sealing or unsealing a key blob we currently do not wait for
the AEAD cipher operation to finish and simply return after submitting
the request. If there is some load on the system we can exit before
the cipher operation is done and the buffer we read from/write to
is already removed from the stack. This will e.g. result in NULL
pointer dereference errors in the DCP driver during blob creation.
Fix this by waiting for the AEAD cipher operation to finish before
resuming the seal and unseal calls.
Cc: stable@vger.kernel.org # v6.10+ Fixes: 0e28bf61a5f9 ("KEYS: trusted: dcp: fix leak of blob encryption key") Reported-by: Parthiban N <parthiban@linumiz.com> Closes: https://lore.kernel.org/keyrings/254d3bb1-6dbc-48b4-9c08-77df04baee2f@linumiz.com/ Signed-off-by: David Gstir <david@sigma-star.at> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Chen Ridong [Tue, 8 Oct 2024 12:46:39 +0000 (12:46 +0000)]
security/keys: fix slab-out-of-bounds in key_task_permission
KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410
security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362
It can be reproduced by following these steps(more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the
pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.
The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the
slots in a node(below tag ascend_to_node), if the slot pointer is meta
and node->back_pointer != NULL(it means a root), it will proceed to
descend_to_node. However, there is an exception. If node is the root,
and one of the slots points to a shortcut, it will be treated as a
keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function.
However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as
ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT
has keys with hashes that are not similar (e.g. slot 0) and it splits
NODE A without using a shortcut. When NODE A is filled with keys that
all hashes are xxe6, the keys are similar, NODE A will split with a
shortcut. Finally, it forms the tree as shown below, where slot 6 points
to a shortcut.
4. As mentioned above, If a slot(slot 6) of the root points to a shortcut,
it may be mistakenly transferred to a key*, leading to a read
out-of-bounds read.
To fix this issue, one should jump to descend_to_node if the ptr is a
shortcut, regardless of whether the node is root or not.
[jarkko: tweaked the commit message a bit to have an appropriate closes
tag.] Fixes: b2a4df200d57 ("KEYS: Expand the capacity of a keyring") Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Linus Torvalds [Mon, 4 Nov 2024 18:07:22 +0000 (08:07 -1000)]
Merge tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull mmc fixes from Ulf Hansson:
- sdhci-pci-gli: A couple of fixes for low power mode on GL9767
* tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci-gli: GL9767: Fix low power mode in the SD Express process
mmc: sdhci-pci-gli: GL9767: Fix low power mode on the set clock function
Christoph Hellwig [Mon, 4 Nov 2024 07:39:32 +0000 (08:39 +0100)]
block: pre-calculate max_zone_append_sectors
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Christoph Hellwig [Mon, 4 Nov 2024 07:39:31 +0000 (08:39 +0100)]
block: remove the max_zone_append_sectors check in blk_revalidate_disk_zones
With the lock layer zone append emulation, we are now always setting a
max_zone_append_sectors value for zoned devices and this check can't
ever trigger.
Ming-Hung Tsai [Tue, 22 Oct 2024 07:13:54 +0000 (15:13 +0800)]
dm cache: fix potential out-of-bounds access on the first resume
Out-of-bounds access occurs if the fast device is expanded unexpectedly
before the first-time resume of the cache table. This happens because
expanding the fast device requires reloading the cache table for
cache_create to allocate new in-core data structures that fit the new
size, and the check in cache_preresume is not performed during the
first resume, leading to the issue.
2. load a cache table of 512 cache blocks, and deliberately expand the
fast device before resuming the cache, making the in-core data
structures inadequate.
Ming-Hung Tsai [Tue, 22 Oct 2024 07:13:39 +0000 (15:13 +0800)]
dm cache: optimize dirty bit checking with find_next_bit when resizing
When shrinking the fast device, dm-cache iteratively searches for a
dirty bit among the cache blocks to be dropped, which is less efficient.
Use find_next_bit instead, as it is twice as fast as the iterative
approach with test_bit.
Ming-Hung Tsai [Tue, 22 Oct 2024 07:13:16 +0000 (15:13 +0800)]
dm cache: fix out-of-bounds access to the dirty bitset when resizing
dm-cache checks the dirty bits of the cache blocks to be dropped when
shrinking the fast device, but an index bug in bitset iteration causes
out-of-bounds access.
Reproduce steps:
1. create a cache device of 1024 cache blocks (128 bytes dirty bitset)
Ming-Hung Tsai [Tue, 22 Oct 2024 07:12:49 +0000 (15:12 +0800)]
dm cache: fix flushing uninitialized delayed_work on cache_ctr error
An unexpected WARN_ON from flush_work() may occur when cache creation
fails, caused by destroying the uninitialized delayed_work waker in the
error path of cache_create(). For example, the warning appears on the
superblock checksum error.
(snip)
WARNING: CPU: 0 PID: 84 at kernel/workqueue.c:4178 __flush_work+0x5d4/0x890
Fix by pulling out the cancel_delayed_work_sync() from the constructor's
error path. This patch doesn't affect the use-after-free fix for
concurrent dm_resume and dm_destroy (commit 6a459d8edbdb ("dm cache: Fix
UAF in destroy()")) as cache_dtr is not changed.
Ming-Hung Tsai [Tue, 22 Oct 2024 07:12:22 +0000 (15:12 +0800)]
dm cache: correct the number of origin blocks to match the target length
When creating a cache device, the actual size of the cache origin might
be greater than the specified cache target length. In such case, the
number of origin blocks should match the cache target length, not the
full size of the origin device, since access beyond the cache target is
not possible. This issue occurs when reducing the origin device size
using lvm, as lvreduce preloads the new cache table before resuming the
cache origin, which can result in incorrect sizes for the discard bitset
and smq hotspot blocks.
Reproduce steps:
1. create a cache device consists of 4096 origin blocks
Mikulas Patocka [Tue, 29 Oct 2024 11:17:13 +0000 (12:17 +0100)]
dm-verity: don't crash if panic_on_corruption is not selected
If the user sets panic_on_error and doesn't set panic_on_corruption,
dm-verity should not panic on data mismatch. But, currently it panics,
because it treats data mismatch as I/O error.
This commit fixes the logic so that if there is data mismatch and
panic_on_corruption or restart_on_corruption is not selected, the system
won't restart or panic.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Fixes: f811b83879fb ("dm-verity: introduce the options restart_on_error and panic_on_error")
Zichen Xie [Mon, 21 Oct 2024 19:54:45 +0000 (14:54 -0500)]
dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
This was found by a static analyzer.
There may be a potential integer overflow issue in
unstripe_ctr(). uc->unstripe_offset and uc->unstripe_width are
defined as "sector_t"(uint64_t), while uc->unstripe,
uc->chunk_size and uc->stripes are all defined as "uint32_t".
The result of the calculation will be limited to "uint32_t"
without correct casting.
So, we recommend adding an extra cast to prevent potential
integer overflow.
Mike Snitzer [Fri, 18 Oct 2024 21:15:41 +0000 (17:15 -0400)]
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
Multi-threaded buffered reads to the same file exposed significant
inode spinlock contention in nfs_clear_invalid_mapping().
Eliminate this spinlock contention by checking flags without locking,
instead using smp_rmb and smp_load_acquire accordingly, but then take
spinlock and double-check these inode flags.
Also refactor nfs_set_cache_invalid() slightly to use
smp_store_release() to pair with nfs_clear_invalid_mapping()'s
smp_load_acquire().
While this fix is beneficial for all multi-threaded buffered reads
issued by an NFS client, this issue was identified in the context of
surprisingly low LOCALIO performance with 4K multi-threaded buffered
read IO. This fix dramatically speeds up LOCALIO performance:
Fixes: 17dfeb911339 ("NFS: Fix races in nfs_revalidate_mapping") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Mike Snitzer [Wed, 23 Oct 2024 20:34:42 +0000 (16:34 -0400)]
nfs_common: fix localio to cope with racing nfs_local_probe()
Fix the possibility of racing nfs_local_probe() resulting in:
list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:35!
Add nfs_uuid_init() to properly initialize all nfs_uuid_t members
(particularly its list_head).
Switch to returning bool from nfs_uuid_begin(), returns false if
nfs_uuid_t is already in-use (its list_head is on a list). Update
nfs_local_probe() to return early if the nfs_client's cl_uuid
(nfs_uuid_t) is in-use.
Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to
list_add_tail() -- rculist was used in an earlier version of the
localio code that had a lockless nfs_uuid_lookup interface.
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Trond Myklebust [Wed, 23 Oct 2024 21:05:48 +0000 (17:05 -0400)]
NFS: Further fixes to attribute delegation a/mtime changes
When asked to set both an atime and an mtime to the current system time,
ensure that the setting is atomic by calling inode_update_timestamps()
only once with the appropriate flags.
Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Trond Myklebust [Wed, 23 Oct 2024 13:35:43 +0000 (09:35 -0400)]
NFS: Fix attribute delegation behaviour on exclusive create
When the client does an exclusive create and the server decides to store
the verifier in the timestamps, a SETATTR is subsequently sent to fix up
those timestamps. When that is the case, suppress the exceptions for
attribute delegations in nfs4_bitmap_copy_adjust().
Fixes: 32215c1f893a ("NFSv4: Don't request atime/mtime/size if they are delegated to us") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The KMSAN warning is triggered in decode_getfattr_attrs(), when calling
decode_attr_mdsthreshold(). It appears that fattr->mdsthreshold is not
initialized.
Fix the issue by initializing fattr->mdsthreshold to NULL in
nfs_fattr_init().
NeilBrown [Fri, 4 Oct 2024 01:07:23 +0000 (11:07 +1000)]
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
If a timeout is specified in the mount options, it currently applies to
both the NFS protocol and (with v3) the MOUNT protocol. This is
sensible when they both use the same underlying protocol, or those
protocols are compatible w.r.t timeouts as RDMA and TCP are.
However if, for example, NFS is using TCP and MOUNT is using UDP then
using the same timeout doesn't make much sense.
If you
mount -o vers=3,proto=tcp,mountproto=udp,timeo=600,retrans=5 \
server:/path /mountpoint
then the timeo=600 which was intended for the NFS/TCP request will
apply to the MOUNT/UDP requests with the result that there will only be
one request sent (because UDP has a maximum timeout of 60 seconds).
This is not what a reasonable person might expect.
This patch disables the sharing of timeout information in cases where
the underlying protocols are not compatible.
Fixes: c9301cb35b59 ("nfs: hornor timeo and retrans option when mounting NFSv3") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
NeilBrown [Wed, 9 Oct 2024 05:28:06 +0000 (16:28 +1100)]
sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
xs_tcp_finish_connecting() can return -ENOTCONN but the switch statement
in xs_tcp_setup_socket() treats that as an unhandled error.
If we treat it as a known error it would propagate back to
call_connect_status() which does handle that error code. This appears
to be the intention of the commit (given below) which added -ENOTCONN as
a return status for xs_tcp_finish_connecting().
So add -ENOTCONN to the switch statement as an error to pass through to
the caller.
Arnd Bergmann [Mon, 4 Nov 2024 13:22:53 +0000 (14:22 +0100)]
Merge tag 'qcom-drivers-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm driver fixes for v6.12
The Qualcomm EDAC driver's configuration of interrupts is made optional,
to avoid violating security constriants on X Elite platform .
The SCM drivers' detection mechanism for the presence of SHM bridge in QTEE,
is corrected to handle the case where firmware successfully returns that
the interface isn't supported.
The GLINK driver and the PMIC GLINK interface is updated to handle
buffer allocation issues during initialization of the communication
channel.
Allocation error handling in the socinfo dirver is corrected, and then
the fix is corrected.
* tag 'qcom-drivers-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
rpmsg: glink: Handle rejected intent request better
soc: qcom: socinfo: fix revision check in qcom_socinfo_probe()
firmware: qcom: scm: Return -EOPNOTSUPP for unsupported SHM bridge enabling
EDAC/qcom: Make irq configuration optional
firmware: qcom: scm: fix a NULL-pointer dereference
firmware: qcom: scm: suppress download mode error
soc: qcom: Add check devm_kasprintf() returned value
MAINTAINERS: Qualcomm SoC: Match reserved-memory bindings
Renato Caldas [Sat, 2 Nov 2024 18:31:16 +0000 (18:31 +0000)]
platform/x86: ideapad-laptop: add missing Ideapad Pro 5 fn keys
The scancodes for the Mic Mute and Airplane keys on the Ideapad Pro 5
(14AHP9 at least, probably the other variants too) are different and
were not being picked up by the driver. This adds them to the keymap.
Apart from what is already supported, the remaining fn keys are
unfortunately producing windows-specific key-combos.
Kurt Borja [Thu, 31 Oct 2024 15:44:42 +0000 (12:44 -0300)]
platform/x86: dell-wmi-base: Handle META key Lock/Unlock events
Some Alienware devices have a key that locks/unlocks the Meta key. This
key triggers a WMI event that should be ignored by the kernel, as it's
handled by internally the firmware.
There is no known way of changing this default behavior. The firmware
would lock/unlock the Meta key, regardless of how the event is handled.
Tested on an Alienware x15 R1.
Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20241031154441.6663-2-kuurtb@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Kurt Borja [Thu, 31 Oct 2024 15:40:24 +0000 (12:40 -0300)]
platform/x86: dell-smbios-base: Extends support to Alienware products
Fixes the following error:
dell_smbios: Unable to run on non-Dell system
Which is triggered after dell-wmi driver fails to initialize on
Alienware systems, as it depends on dell-smbios.
This effectively extends dell-wmi, dell-smbios and dcdbas support to
Alienware devices, that might share some features of the SMBIOS intereface
calling interface with other Dell products.
Tested on an Alienware X15 R1.
Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20241031154023.6149-2-kuurtb@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Corey Hickey [Mon, 28 Oct 2024 18:02:41 +0000 (11:02 -0700)]
platform/x86/amd/pmc: Detect when STB is not available
Loading the amd_pmc module as:
amd_pmc enable_stb=1
...can result in the following messages in the kernel ring buffer:
amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff
ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
WARNING: CPU: 10 PID: 2151 at arch/x86/mm/ioremap.c:217 __ioremap_caller+0x2cd/0x340
Further debugging reveals that this occurs when the requests for
S2D_PHYS_ADDR_LOW and S2D_PHYS_ADDR_HIGH return a value of 0,
indicating that the STB is inaccessible. To prevent the ioremap
warning and provide clarity to the user, handle the invalid address
and display an error message.
Shyam Sundar S K [Wed, 23 Oct 2024 06:32:41 +0000 (12:02 +0530)]
platform/x86/amd/pmf: Add SMU metrics table support for 1Ah family 60h model
Add SMU metrics table support for 1Ah family 60h model. This information
will be used by the PMF driver to alter the system thermals.
Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241023063245.1404420-2-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jarkko Sakkinen [Thu, 31 Oct 2024 00:16:09 +0000 (02:16 +0200)]
tpm: Lock TPM chip in tpm_pm_suspend() first
Setting TPM_CHIP_FLAG_SUSPENDED in the end of tpm_pm_suspend() can be racy
according, as this leaves window for tpm_hwrng_read() to be called while
the operation is in progress. The recent bug report gives also evidence of
this behaviour.
Aadress this by locking the TPM chip before checking any chip->flags both
in tpm_pm_suspend() and tpm_hwrng_read(). Move TPM_CHIP_FLAG_SUSPENDED
check inside tpm_get_random() so that it will be always checked only when
the lock is reserved.
Cc: stable@vger.kernel.org # v6.4+ Fixes: 99d464506255 ("tpm: Prevent hwrng from activating during resume") Reported-by: Mike Seo <mikeseohyungjin@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219383 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Mike Seo <mikeseohyungjin@gmail.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Linus Torvalds [Sun, 3 Nov 2024 20:25:05 +0000 (10:25 -1000)]
Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
Linus Torvalds [Sun, 3 Nov 2024 20:15:50 +0000 (10:15 -1000)]
Merge tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
- TI driver fix to set EOP for cyclic BCDMA transfers
- sh rz-dmac driver fix for handling config with zero address
* tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer
dmaengine: sh: rz-dmac: handle configs where one address is zero
Linus Torvalds [Sun, 3 Nov 2024 18:51:53 +0000 (08:51 -1000)]
Merge tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core revert from Greg KH:
"Here is a single driver core revert for 6.12-rc6. It reverts a change
that came in -rc1 that was supposed to resolve a reported problem, but
caused another one, so revert it for now so that we can get this all
worked out properly in 6.13.
The revert has been in linux-next all week with no reported issues"
* tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Fix uevent_show() vs driver detach race"
Linus Torvalds [Sun, 3 Nov 2024 18:48:11 +0000 (08:48 -1000)]
Merge tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
have been sitting in my tree this week. Included in here are the
following:
- thunderbolt driver fixes for reported issues
- USB typec driver fixes
- xhci driver fixes for reported problems
- dwc2 driver revert for a broken change
- usb phy driver fix
- usbip tool fix
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
usb: phy: Fix API devm_usb_put_phy() can not release the phy
usb: typec: use cleanup facility for 'altmodes_node'
usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
xhci: Fix Link TRB DMA in command ring stopped completion event
xhci: Use pm_runtime_get to prevent RPM on unsupported systems
usbip: tools: Fix detach_port() invalid port error path
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
Yu Zhao [Sat, 19 Oct 2024 01:29:39 +0000 (01:29 +0000)]
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted. Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.
Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.
[jthoughton@google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Reported-by: David Stevens <stevensd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>