Linus Torvalds [Sat, 28 Jun 2025 00:58:32 +0000 (17:58 -0700)]
Merge tag 'cxl-fixes-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
"These fixes address a few issues in the CXL subsystem, including
dealing with some bugs in the CXL EDAC and RAS drivers:
- Fix return value of cxlctl_validate_set_features()
- Fix min_scrub_cycle of a region miscaculation and add additional
documentation
- Fix potential memory leak issues for CXL EDAC
- Fix CPER handler device confusion for CXL RAS
- Fix using wrong repair type to check DRAM event record"
* tag 'cxl-fixes-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/edac: Fix using wrong repair type to check dram event record
cxl/ras: Fix CPER handler device confusion
cxl/edac: Fix potential memory leak issues
cxl/Documentation: Add more description about min/max scrub cycle
cxl/edac: Fix the min_scrub_cycle of a region miscalculation
cxl: fix return value in cxlctl_validate_set_features()
Linus Torvalds [Sat, 28 Jun 2025 00:32:30 +0000 (17:32 -0700)]
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fix from Eric Biggers:
"Fix a regression where the purgatory code sometimes fails to build"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: sha256: Mark sha256_choose_blocks as __always_inline
Linus Torvalds [Fri, 27 Jun 2025 19:08:36 +0000 (12:08 -0700)]
Merge tag 'acpi-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a commit that attempted to fix a memory leak in an error code
path and introduced a different issue (Zhe Qiao)"
* tag 'acpi-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()"
Linus Torvalds [Fri, 27 Jun 2025 16:02:33 +0000 (09:02 -0700)]
Merge tag 'block-6.16-20250626' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fixes for ublk:
- fix C++ narrowing warnings in the uapi header
- update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header
- fix for the ublk ->queue_rqs() implementation, limiting a batch
to just the specific task AND ring
- ublk_get_data() error handling fix
- sanity check more arguments in ublk_ctrl_add_dev()
- selftest addition
- NVMe pull request via Christoph:
- reset delayed remove_work after reconnect
- fix atomic write size validation
- Fix for a warning introduced in bdev_count_inflight_rw() in this
merge window
* tag 'block-6.16-20250626' of git://git.kernel.dk/linux:
block: fix false warning in bdev_count_inflight_rw()
ublk: sanity check add_dev input for underflow
nvme: fix atomic write size validation
nvme: refactor the atomic write unit detection
nvme: reset delayed remove_work after reconnect
ublk: setup ublk_io correctly in case of ublk_get_data() failure
ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
ublk: fix narrowing warnings in UAPI header
selftests: ublk: don't take same backing file for more than one ublk devices
ublk: build batch from IOs in same io_ring_ctx and io task
Linus Torvalds [Fri, 27 Jun 2025 15:55:57 +0000 (08:55 -0700)]
Merge tag 'io_uring-6.16-20250626' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Two tweaks for a recent fix: fixing a memory leak if multiple iovecs
were initially mapped but only the first was used and hence turned
into a UBUF rathan than an IOVEC iterator, and catching a case where
a retry would be done even if the previous segment wasn't full
- Small series fixing an issue making the vm unhappy if debugging is
turned on, hitting a VM_BUG_ON_PAGE()
- Fix a resource leak in io_import_dmabuf() in the error handling case,
which is a regression in this merge window
- Mark fallocate as needing to be write serialized, as is already done
for truncate and buffered writes
* tag 'io_uring-6.16-20250626' of git://git.kernel.dk/linux:
io_uring/kbuf: flag partial buffer mappings
io_uring/net: mark iov as dynamically allocated even for single segments
io_uring: fix resource leak in io_import_dmabuf()
io_uring: don't assume uaddr alignment in io_vec_fill_bvec
io_uring/rsrc: don't rely on user vaddr alignment
io_uring/rsrc: fix folio unpinning
io_uring: make fallocate be hashed work
Linus Torvalds [Fri, 27 Jun 2025 15:26:25 +0000 (08:26 -0700)]
Merge tag 's390-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix incorrectly dropped dereferencing of the stack nth entry
introduced with a previous KASAN false positive fix
- Use a proper memdup_array_user() helper to prevent overflow in a
protected key size calculation
* tag 's390-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ptrace: Fix pointer dereferencing in regs_get_kernel_stack_nth()
s390/pkey: Prevent overflow in size calculation for memdup_user()
Linus Torvalds [Fri, 27 Jun 2025 02:49:12 +0000 (19:49 -0700)]
Merge tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Lots of small check/repair fixes, primarily in subvol loop and
directory structure loop (when involving snapshots).
- Fix a few 6.16 regressions: rare UAF in the foreground allocator path
when taking a transaction restart from the transaction bump
allocator, and some small fallout from the change to log the error
being corrected in the journal when repairing errors, also some
fallout from the btree node read error logging improvements.
(Alan, Bharadwaj)
- New option: journal_rewind
This lets the entire filesystem be reset to an earlier point in time.
Note that this is only a disaster recovery tool, and right now there
are major caveats to using it (discards should be disabled, in
particular), but it successfully restored the filesystem of one of
the users who was bit by the subvolume deletion bug and didn't have
backups. I'll likely be making some changes to the discard path in
the future to make this a reliable recovery tool.
- Some new btree iterator tracepoints, for tracking down some
livelock-ish behaviour we've been seeing in the main data write path.
* tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs: (51 commits)
bcachefs: Plumb correct ip to trans_relock_fail tracepoint
bcachefs: Ensure we rewind to run recovery passes
bcachefs: Ensure btree node scan runs before checking for scanned nodes
bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix
bcachefs: fix bch2_journal_keys_peek_prev_min() underflow
bcachefs: Use wait_on_allocator() when allocating journal
bcachefs: Check for bad write buffer key when moving from journal
bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked
bcachefs: Fix range in bch2_lookup_indirect_extent() error path
bcachefs: fix spurious error_throw
bcachefs: Add missing bch2_err_class() to fileattr_set()
bcachefs: Add missing key type checks to check_snapshot_exists()
bcachefs: Don't log fsck err in the journal if doing repair elsewhere
bcachefs: Fix *__bch2_trans_subbuf_alloc() error path
bcachefs: Fix missing newlines before ero
bcachefs: fix spurious error in read_btree_roots()
bcachefs: fsck: Fix oops in key_visible_in_snapshot()
bcachefs: fsck: fix unhandled restart in topology repair
bcachefs: fsck: Fix check_directory_structure when no check_dirents
bcachefs: Fix restart handling in btree_node_scrub_work()
...
Linus Torvalds [Thu, 26 Jun 2025 19:26:39 +0000 (12:26 -0700)]
Merge tag 'devicetree-fixes-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Convert altr,uart-1.0 and altr,juart-1.0 to DT schema. These were
applied for nios2, but never sent upstream.
- Fix extra '/' in fsl,ls1028a-reset '$id' path
- Fix warnings in ti,sn65dsi83 schema due to unnecessary $ref.
* tag 'devicetree-fixes-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: serial: Convert altr,uart-1.0 to DT schema
dt-bindings: serial: Convert altr,juart-1.0 to DT schema
dt-bindings: soc: fsl,ls1028a-reset: Drop extra "/" in $id
dt-bindings: drm/bridge: ti-sn65dsi83: drop $ref to fix lvds-vod* warnings
Jens Axboe [Thu, 26 Jun 2025 18:17:48 +0000 (12:17 -0600)]
io_uring/kbuf: flag partial buffer mappings
A previous commit aborted mapping more for a non-incremental ring for
bundle peeking, but depending on where in the process this peeking
happened, it would not necessarily prevent a retry by the user. That can
create gaps in the received/read data.
Add struct buf_sel_arg->partial_map, which can pass this information
back. The networking side can then map that to internal state and use it
to gate retry as well.
Since this necessitates a new flag, change io_sr_msg->retry to a
retry_flags member, and store both the retry and partial map condition
in there.
Cc: stable@vger.kernel.org Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* tag 'nvme-6.16-2025-06-26' of git://git.infradead.org/nvme:
nvme: fix atomic write size validation
nvme: refactor the atomic write unit detection
nvme: reset delayed remove_work after reconnect
Christoph Hellwig [Wed, 11 Jun 2025 04:54:56 +0000 (06:54 +0200)]
nvme: fix atomic write size validation
Don't mix the namespace and controller values, and validate the
per-controller limit when probing the controller. This avoid spurious
failures for controllers with namespaces that have different namespaces
with different logical block sizes, or report the per-namespace values
only for some namespaces.
It also fixes a missing queue_limits_cancel_update in an error path by
removing that error path.
Fixes: 8695f060a029 ("nvme: all namespaces in a subsystem must adhere to a common atomic write size") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Tested-by: Yi Zhang <yi.zhang@redhat.com>
Keith Busch [Wed, 11 Jun 2025 04:50:47 +0000 (06:50 +0200)]
nvme: reset delayed remove_work after reconnect
The remove_work will proceed with permanently disconnecting on the
initial final path failure if the head shows no paths after the delay.
If a new path connects while the remove_work is pending, and if that new
path happens to disconnect before that remove_work executes, the delayed
removal should reset based on the most recent path disconnect time, but
queue_delayed_work() won't do anything if the work is already pending.
Attempt to cancel the delayed work when a new path connects, and use
mod_delayed_work() in case the remove_work remains pending anyway.
Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Jiawen Wu [Wed, 25 Jun 2025 02:39:24 +0000 (10:39 +0800)]
net: libwx: fix the creation of page_pool
'rx_ring->size' means the count of ring descriptors multiplied by the
size of one descriptor. When increasing the count of ring descriptors,
it may exceed the limit of pool size.
[ 864.209610] page_pool_create_percpu() gave up with errno -7
[ 864.209613] txgbe 0000:11:00.0: Page pool creation failed: -7
Fix to set the pool_size to the count of ring descriptors.
Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/434C72BFB40E350A+20250625023924.21821-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 24 Jun 2025 18:32:58 +0000 (11:32 -0700)]
net: selftests: fix TCP packet checksum
The length in the pseudo header should be the length of the L3 payload
AKA the L4 header+payload. The selftest code builds the packet from
the lower layers up, so all the headers are pushed already when it
constructs L4. We need to subtract the lower layer headers from skb->len.
Salvatore Bonaccorso [Wed, 25 Jun 2025 18:41:28 +0000 (20:41 +0200)]
ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X507UAR
The built-in mic of ASUS VivoBook X507UAR is broken recently by the fix
of the pin sort. The fixup ALC256_FIXUP_ASUS_MIC_NO_PRESENCE is working
for addressing the regression, too.
Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort") Reported-by: Igor Tamara <igor.tamara@gmail.com> Closes: https://bugs.debian.org/1108069 Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Link: https://lore.kernel.org/CADdHDco7_o=4h_epjEAb92Dj-vUz_PoTC2-W9g5ncT2E0NzfeQ@mail.gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Thu, 26 Jun 2025 04:09:02 +0000 (21:09 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix use-after-free in libbpf when map is resized (Adin Scannell)
- Fix verifier assumptions about 2nd argument of bpf_sysctl_get_name
(Jerome Marchand)
- Fix verifier assumption of nullness of d_inode in dentry (Song Liu)
- Fix global starvation of LRU map (Willem de Bruijn)
- Fix potential NULL dereference in btf_dump__free (Yuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: adapt one more case in test_lru_map to the new target_free
libbpf: Fix possible use-after-free for externs
selftests/bpf: Convert test_sysctl to prog_tests
bpf: Specify access type of bpf_sysctl_get_name args
libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
bpf: Adjust free target to avoid global starvation of LRU map
bpf: Mark dentry->d_inode as trusted_or_null
Kent Overstreet [Wed, 25 Jun 2025 04:48:14 +0000 (00:48 -0400)]
bcachefs: Ensure we rewind to run recovery passes
Fix a 6.16 regression from the recovery pass rework, which introduced a
bug where calling bch2_run_explicit_recovery_pass() would only return
the error code to rewind recovery for the first call that scheduled that
recovery pass.
If the error code from the first call was swallowed (because it was
called by an asynchronous codepath), subsequent calls would go "ok, this
pass is already marked as needing to run" and return 0.
Fixing this ensures that check_topology bails out to run btree_node_scan
before doing any repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 25 Jun 2025 16:45:11 +0000 (12:45 -0400)]
bcachefs: Ensure btree node scan runs before checking for scanned nodes
Previously, calling bch2_btree_has_scanned_nodes() when btree node
scan hadn't actually run would erroniously return false - causing us to
think a btree was entirely gone.
This fixes a 6.16 regression from moving the scheduling of btree node
scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule
it persistently in the superblock) and only scheduling it when
check_toploogy() is asking for scanned btree nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Thu, 26 Jun 2025 03:48:48 +0000 (20:48 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"Several mount-related fixes"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
userns and mnt_idmap leak in open_tree_attr(2)
attach_recursive_mnt(): do not lock the covering tree when sliding something under it
replace collect_mounts()/drop_collected_mounts() with a safer variant
Kuniyuki Iwashima [Tue, 24 Jun 2025 21:45:00 +0000 (14:45 -0700)]
atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister().
syzbot reported a warning below during atm_dev_register(). [0]
Before creating a new device and procfs/sysfs for it, atm_dev_register()
looks up a duplicated device by __atm_dev_lookup(). These operations are
done under atm_dev_mutex.
However, when removing a device in atm_dev_deregister(), it releases the
mutex just after removing the device from the list that __atm_dev_lookup()
iterates over.
So, there will be a small race window where the device does not exist on
the device list but procfs/sysfs are still not removed, triggering the
splat.
Let's hold the mutex until procfs/sysfs are removed in
atm_dev_deregister().
====================
netlink: specs: enforce strict naming of properties
I got annoyed once again by the name properties in the ethtool spec
which use underscore instead of dash. I previously assumed that there
is a lot of such properties in the specs so fixing them now would
be near impossible. On a closer look, however, I only found 22
(rough grep suggests we have ~4.8k names in the specs, so bad ones
are just 0.46%).
Add a regex to the JSON schema to enforce the naming, fix the few
bad names. I was hoping we could start enforcing this from newer
families, but there's no correlation between the protocol and the
number of errors. If anything classic netlink has more recently
added specs so it has fewer errors.
The regex is just for name properties which will end up visible
to the user (in Python or YNL CLI). I left the c-name properties
alone, those don't matter as much. C codegen rewrites them, anyway.
I'm not updating the spec for genetlink-c. Looks like it has no
users, new families use genetlink, all old ones need genetlink-legacy.
If these patches are merged I will remove genetlink-c completely
in net-next.
====================
Jakub Kicinski [Tue, 24 Jun 2025 21:10:02 +0000 (14:10 -0700)]
netlink: specs: enforce strict naming of properties
Add a regexp to make sure all names which may end up being visible
to the user consist of lower case characters, numbers and dashes.
Underscores keep sneaking into the specs, which is not visible
in the C code but makes the Python and alike inconsistent.
Note that starting with a number is okay, as in C the full
name will include the family name.
For legacy families we can't enforce the naming in the family
name or the multicast group names, as these are part of the
binary uAPI of the kernel.
For classic netlink we need to allow capital letters in names
of struct members. TC has some structs with capitalized members.
Jakub Kicinski [Tue, 24 Jun 2025 21:10:01 +0000 (14:10 -0700)]
netlink: specs: tc: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Jakub Kicinski [Tue, 24 Jun 2025 21:10:00 +0000 (14:10 -0700)]
netlink: specs: rt-link: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250624211002.3475021-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:59 +0000 (14:09 -0700)]
netlink: specs: mptcp: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp") Reviewed-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250624211002.3475021-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:58 +0000 (14:09 -0700)]
netlink: specs: ovs_flow: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: 93b230b549bc ("netlink: specs: add ynl spec for ovs_flow") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Link: https://patch.msgid.link/20250624211002.3475021-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:57 +0000 (14:09 -0700)]
netlink: specs: devlink: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: 429ac6211494 ("devlink: define enum for attr types of dynamic attributes") Fixes: f2f9dd164db0 ("netlink: specs: devlink: add the remaining command to generate complete split_ops") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250624211002.3475021-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:56 +0000 (14:09 -0700)]
netlink: specs: dpll: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: 3badff3a25d8 ("dpll: spec: Add Netlink spec in YAML") Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250624211002.3475021-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:55 +0000 (14:09 -0700)]
netlink: specs: ethtool: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen replaces special chars in names)
but gives more uniform naming in Python.
Fixes: 13e59344fb9d ("net: ethtool: add support for symmetric-xor RSS hash") Fixes: 46fb3ba95b93 ("ethtool: Add an interface for flashing transceiver modules' firmware") Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250624211002.3475021-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 24 Jun 2025 21:09:54 +0000 (14:09 -0700)]
netlink: specs: fou: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Jakub Kicinski [Tue, 24 Jun 2025 21:09:53 +0000 (14:09 -0700)]
netlink: specs: nfsd: replace underscores with dashes in names
We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.
Fixes: 13727f85b49b ("NFSD: introduce netlink stubs") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250624211002.3475021-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Tue, 24 Jun 2025 16:35:12 +0000 (17:35 +0100)]
net: enetc: Correct endianness handling in _enetc_rd_reg64
enetc_hw.h provides two versions of _enetc_rd_reg64.
One which simply calls ioread64() when available.
And another that composes the 64-bit result from ioread32() calls.
In the second case the code appears to assume that each ioread32() call
returns a little-endian value. However both the shift and logical or
used to compose the return value would not work correctly on big endian
systems if this were the case. Moreover, this is inconsistent with the
first case where the return value of ioread64() is assumed to be in host
byte order.
It appears that the correct approach is for both versions to treat the
return value of ioread*() functions as being in host byte order. And
this patch corrects the ioread32()-based version to do so.
This is a bug but would only manifest on big endian systems
that make use of the ioread32-based implementation of _enetc_rd_reg64.
While all in-tree users of this driver are little endian and
make use of the ioread64-based implementation of _enetc_rd_reg64.
Thus, no in-tree user of this driver is affected by this bug.
Adin Scannell [Wed, 25 Jun 2025 05:02:15 +0000 (22:02 -0700)]
libbpf: Fix possible use-after-free for externs
The `name` field in `obj->externs` points into the BTF data at initial
open time. However, some functions may invalidate this after opening and
before loading (e.g. `bpf_map__set_value_size`), which results in
pointers into freed memory and undefined behavior.
The simplest solution is to simply `strdup` these strings, similar to
the `essent_name`, and free them at the same time.
In order to test this path, the `global_map_resize` BPF selftest is
modified slightly to ensure the presence of an extern, which causes this
test to fail prior to the fix. Given there isn't an obvious API or error
to test against, I opted to add this to the existing test as an aspect
of the resizing feature rather than duplicate the test.
Li Ming [Fri, 20 Jun 2025 05:29:24 +0000 (13:29 +0800)]
cxl/edac: Fix using wrong repair type to check dram event record
cxl_find_rec_dram() is used to find a DRAM event record based on the
inputted attributes. Different repair_type of the inputted attributes
will check the DRAM event record in different ways.
When EDAC driver is performing a memory rank sparing, it should use
CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM
event record checking.
Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Linus Torvalds [Wed, 25 Jun 2025 18:20:14 +0000 (11:20 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes all in drivers.
ufs and megaraid_sas are small and obvious.
The large diffstat in fnic comes from two pieces: the addition of
quite a bit of logging (no change to function) and the reworking of
the timeout allocation path for the two conditions that can occur
simultaneously to prevent reusing the same abort frame and then both
trying to free it"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Fix missing DMA mapping error in fnic_send_frame()
scsi: fnic: Set appropriate logging level for log message
scsi: fnic: Add and improve logs in FDMI and FDMI ABTS paths
scsi: fnic: Turn off FDMI ACTIVE flags on link down
scsi: fnic: Fix crash in fnic_wq_cmpl_handler when FDMI times out
scsi: ufs: core: Fix clk scaling to be conditional in reset and restore
scsi: megaraid_sas: Fix invalid node index
Linus Torvalds [Wed, 25 Jun 2025 18:13:31 +0000 (11:13 -0700)]
Merge tag 'uml-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Johannes Berg:
- fix FP registers in seccomp mode
- prevent duplicate devices in VFIO support
- don't ignore errors in UBD thread start
- reduce stack use with clang 19
* tag 'uml-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: vector: Reduce stack usage in vector_eth_configure()
um: Use correct data source in fpregs_legacy_set()
um: vfio: Prevent duplicate device assignments
um: ubd: Add missing error check in start_io_thread()
Jakub Kicinski [Wed, 25 Jun 2025 17:26:16 +0000 (10:26 -0700)]
Merge tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Just a few fixes:
- iwlegacy: work around large stack with clang/kasan
- mac80211: fix integer overflow
- mac80211: fix link struct init vs. RCU publish
- iwlwifi: fix warning on IFF_UP
* tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: finish link init before RCU publish
wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version
wifi: mac80211: fix beacon interval calculation overflow
wifi: iwlegacy: work around excessive stack usage on clang/kasan
====================
Jens Axboe [Wed, 25 Jun 2025 16:17:06 +0000 (10:17 -0600)]
io_uring/net: mark iov as dynamically allocated even for single segments
A bigger array of vecs could've been allocated, but
io_ring_buffers_peek() still decided to cap the mapped range depending
on how much data was available. Hence don't rely on the segment count
to know if the request should be marked as needing cleanup, always
check upfront if the iov array is different than the fast_iov array.
Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks") Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Tue, 24 Jun 2025 07:40:30 +0000 (09:40 +0200)]
ata: ahci: Use correct DMI identifier for ASUSPRO-D840SA LPM quirk
ASUS store the board name in DMI_PRODUCT_NAME rather than
DMI_PRODUCT_VERSION. (Apparently it is only Lenovo that stores the
model-name in DMI_PRODUCT_VERSION.)
Use the correct DMI identifier, DMI_PRODUCT_NAME, to match the
ASUSPRO-D840SA board, such that the quirk actually gets applied.
Cc: stable@vger.kernel.org Reported-by: Andy Yang <andyybtc79@gmail.com> Tested-by: Andy Yang <andyybtc79@gmail.com> Closes: https://lore.kernel.org/linux-ide/aFb3wXAwJSSJUB7o@ryzen/ Fixes: b5acc3628898 ("ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard") Reviewed-by: Hans de Goede <hansg@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250624074029.963028-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
Tiwei Bie [Mon, 23 Jun 2025 11:08:29 +0000 (19:08 +0800)]
um: vector: Reduce stack usage in vector_eth_configure()
When compiling with clang (19.1.7), initializing *vp using a compound
literal may result in excessive stack usage. Fix it by initializing the
required fields of *vp individually.
Pei Xiao [Tue, 24 Jun 2025 09:00:47 +0000 (17:00 +0800)]
ALSA: usb: qcom: fix NULL pointer dereference in qmi_stop_session
The find_substream() call may return NULL, but the error path
dereferenced 'subs' unconditionally via dev_err(&subs->dev->dev, ...),
causing a NULL pointer dereference when subs is NULL.
Fix by switching to &uadev[idx].udev->dev which is always valid
in this context.
Pavel Begunkov [Tue, 24 Jun 2025 13:40:34 +0000 (14:40 +0100)]
io_uring/rsrc: don't rely on user vaddr alignment
There is no guaranteed alignment for user pointers, however the
calculation of an offset of the first page into a folio after coalescing
uses some weird bit mask logic, get rid of it.
We can pin a tail page of a folio, but then io_uring will try to unpin
the head page of the folio. While it should be fine in terms of keeping
the page actually alive, mm folks say it's wrong and triggers a debug
warning. Use unpin_user_folio() instead of unpin_user_page*.
Cc: stable@vger.kernel.org Debugged-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/io-uring/a28b0f87339ac2acf14a645dad1e95bbcbf18acd.1750771718.git.asml.silence@gmail.com/
[axboe: adapt to current tree, massage commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 24 Jun 2025 10:41:21 +0000 (18:41 +0800)]
ublk: setup ublk_io correctly in case of ublk_get_data() failure
If ublk_get_data() fails, -EIOCBQUEUED is returned and the current command
becomes ASYNC. And the only reason is that mapping data can't move on,
because of no enough pages or pending signal, then the current ublk request
has to be requeued.
Once the request need to be requeued, we have to setup `ublk_io` correctly,
including io->cmd and flags, otherwise the request may not be forwarded to
ublk server successfully.
Fixes: 9810362a57cb ("ublk: don't call ublk_dispatch_req() for NEED_GET_DATA") Reported-by: Changhui Zhong <czhong@redhat.com> Closes: https://lore.kernel.org/linux-block/CAGVVp+VN9QcpHUz_0nasFf5q9i1gi8H8j-G-6mkBoqa3TyjRHA@mail.gmail.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Link: https://lore.kernel.org/r/20250624104121.859519-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
UBLK_F_SUPPORT_ZERO_COPY has a very old comment describing the initial
idea for how zero-copy would be implemented. The actual implementation
added in commit 1f6540e2aabb ("ublk: zc register/unregister bvec") uses
io_uring registered buffers rather than shared memory mapping.
Remove the inaccurate remarks about mapping ublk request memory into the
ublk server's address space and requiring 4K block size. Replace them
with a description of the current zero-copy mechanism.
When a C++ file compiled with -Wc++11-narrowing includes the UAPI header
linux/ublk_cmd.h, ublk_sqe_addr_to_auto_buf_reg()'s assignments of u64
values to u8, u16, and u32 fields result in compiler warnings. Add
explicit casts to the intended types to avoid these warnings. Drop the
unnecessary bitmasks.
Reported-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250621162842.337452-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 25 Jun 2025 02:25:54 +0000 (10:25 +0800)]
ublk: build batch from IOs in same io_ring_ctx and io task
ublk_queue_cmd_list() dispatches the whole batch list by scheduling task
work via the tail request's io_uring_cmd, this way is fine even though
more than one io_ring_ctx are involved for this batch since it is just
one running context.
However, the task work handler ublk_cmd_list_tw_cb() takes `issue_flags`
of tail uring_cmd's io_ring_ctx for completing all commands. This way is
wrong if any uring_cmd is issued from different io_ring_ctx.
Fixes it by always building batch IOs from same io_ring_ctx and io task
because ublk_dispatch_req() does validate task context, and IO needs to
be aborted in case of running from fallback task work context.
For typical per-queue or per-io daemon implementation, this way shouldn't
make difference from performance viewpoint, because single io_ring_ctx is
taken in each daemon for normal use case.
From the crash dump, we found that the cpu_map_flush_list inside
redirect info is partially corrupted: its list_head->next points to
itself, but list_head->prev points to a valid list of unflushed bq
entries.
This turned out to be a result of missed XDP flush on redirect lists. By
digging in the actual source code, we found that
commit 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX
ring structures") incorrectly overwrites the event mask for XDP_REDIRECT
in bnxt_rx_xdp. We can stably reproduce this crash by returning XDP_TX
and XDP_REDIRECT randomly for incoming packets in a naive XDP program.
Properly propagate the XDP_REDIRECT events back fixes the crash.
Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers") Tested-by: Andrew Rzeznik <arzeznik@cloudflare.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Link: https://patch.msgid.link/aFl7jpCNzscumuN2@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 25 Jun 2025 00:20:43 +0000 (17:20 -0700)]
Merge tag 'selinux-pr-20250624' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"Another small SELinux patch to fix a problem seen by the dracut-ng
folks during early boot when SELinux is enabled, but the policy has
yet to be loaded"
* tag 'selinux-pr-20250624' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: change security_compute_sid to return the ssid or tsid on match
If a userspace application just include <linux/vm_sockets.h> will fail
to build with the following errors:
/usr/include/linux/vm_sockets.h:182:39: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’
182 | unsigned char svm_zero[sizeof(struct sockaddr) -
| ^~~~~~
/usr/include/linux/vm_sockets.h:183:39: error: ‘sa_family_t’ undeclared here (not in a function)
183 | sizeof(sa_family_t) -
|
Include <sys/socket.h> for userspace (guarded by ifndef __KERNEL__)
where `struct sockaddr` and `sa_family_t` are defined.
We already do something similar in <linux/mptcp.h> and <linux/if.h>.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250623100053.40979-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alan Huang [Tue, 24 Jun 2025 19:10:27 +0000 (03:10 +0800)]
bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Having PM put sync in remove function is causing PM underflow during
remove operation. This is caused by the function, runtime_pm_get_sync,
not being called anywhere during the op. Ensure that calls to
pm_runtime_enable()/pm_runtime_disable() and
pm_runtime_get_sync()/pm_runtime_put_sync() match.
Continuous bind/unbind will result in an "Unbalanced pm_runtime_enable" error.
Subsequent unbind attempts will return a "No such device" error, while bind
attempts will return a "Resource temporarily unavailable" error.
Also, change clk_disable_unprepare() to clk_disable() since continuous
bind and unbind operations will trigger a warning indicating that the clock is
already unprepared.
Al Viro [Tue, 24 Jun 2025 14:25:04 +0000 (10:25 -0400)]
userns and mnt_idmap leak in open_tree_attr(2)
Once want_mount_setattr() has returned a positive, it does require
finish_mount_kattr() to release ->mnt_userns. Failing do_mount_setattr()
does not change that.
As the result, we can end up leaking userns and possibly mnt_idmap as
well.
Fixes: c4a16820d901 ("fs: add open_tree_attr()") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Johannes Berg [Tue, 24 Jun 2025 11:07:49 +0000 (13:07 +0200)]
wifi: mac80211: finish link init before RCU publish
Since the link/conf pointers can be accessed without any
protection other than RCU, make sure the data is actually
set up before publishing the structures.
Miri Korenblit [Tue, 24 Jun 2025 07:14:27 +0000 (10:14 +0300)]
wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version
Unfortunately, FWs of some devices don't have the version of the
iwl_mac_config_cmd defined in the TLVs. We send 0 as the 'def argument
to iwl_fw_lookup_cmd_ver, so for such FWs, the return value will be 0,
leading to a warning, and to not sending the command.
Fix this by assuming that the default version is 1.
Harshit Mogalapalli [Mon, 23 Jun 2025 14:26:27 +0000 (07:26 -0700)]
ALSA: qc_audio_offload: Fix missing error code in prepare_qmi_response()
When snd_soc_usb_find_priv_data() fails, return failure instead of
success. While we are at it also use direct returns at first few error
paths where there is no additional cleanup needed.
Paolo Abeni [Tue, 24 Jun 2025 08:10:09 +0000 (10:10 +0200)]
Merge branch 'af_unix-fix-two-oob-issues'
Kuniyuki Iwashima says:
====================
af_unix: Fix two OOB issues.
From: Kuniyuki Iwashima <kuniyu@google.com>
Recently, two issues are reported regarding MSG_OOB.
Patch 1 fixes issues that happen when multiple consumed OOB
skbs are placed consecutively in the recv queue.
Patch 2 fixes an inconsistent behaviour that close()ing a socket
with a consumed OOB skb at the head of the recv queue triggers
-ECONNRESET on the peer's recv().
Kuniyuki Iwashima [Thu, 19 Jun 2025 04:13:57 +0000 (21:13 -0700)]
af_unix: Don't set -ECONNRESET for consumed OOB skb.
Christian Brauner reported that even after MSG_OOB data is consumed,
calling close() on the receiver socket causes the peer's recv() to
return -ECONNRESET:
Kuniyuki Iwashima [Thu, 19 Jun 2025 04:13:56 +0000 (21:13 -0700)]
af_unix: Add test for consecutive consumed OOB.
Let's add a test case where consecutive concumed OOB skbs stay
at the head of the queue.
Without the previous patch, ioctl(SIOCATMARK) assertion fails.
Before:
# RUN msg_oob.no_peek.ex_oob_ex_oob_oob ...
# msg_oob.c:305:ex_oob_ex_oob_oob:Expected answ[0] (0) == oob_head (1)
# ex_oob_ex_oob_oob: Test terminated by assertion
# FAIL msg_oob.no_peek.ex_oob_ex_oob_oob
not ok 12 msg_oob.no_peek.ex_oob_ex_oob_oob
After:
# RUN msg_oob.no_peek.ex_oob_ex_oob_oob ...
# OK msg_oob.no_peek.ex_oob_ex_oob_oob
ok 12 msg_oob.no_peek.ex_oob_ex_oob_oob
Even though a user reads OOB data, the skb holding the data stays on
the recv queue to mark the OOB boundary and break the next recv().
After the last send() in the scenario above, the sk2's recv queue has
2 leading consumed OOB skbs and 1 real OOB skb.
Then, the following happens during the next recv() without MSG_OOB
1. unix_stream_read_generic() peeks the first consumed OOB skb
2. manage_oob() returns the next consumed OOB skb
3. unix_stream_read_generic() fetches the next not-yet-consumed OOB skb
4. unix_stream_read_generic() reads and frees the OOB skb
, and the last recv(MSG_OOB) triggers KASAN splat.
The 3. above occurs because of the SO_PEEK_OFF code, which does not
expect unix_skb_len(skb) to be 0, but this is true for such consumed
OOB skbs.
In addition to this use-after-free, there is another issue that
ioctl(SIOCATMARK) does not function properly with consecutive consumed
OOB skbs.
So, nothing good comes out of such a situation.
Instead of complicating manage_oob(), ioctl() handling, and the next
ECONNRESET fix by introducing a loop for consecutive consumed OOB skbs,
let's not leave such consecutive OOB unnecessarily.
Now, while receiving an OOB skb in unix_stream_recv_urg(), if its
previous skb is a consumed OOB skb, it is freed.
[0]:
BUG: KASAN: slab-use-after-free in unix_stream_read_actor (net/unix/af_unix.c:3027)
Read of size 4 at addr ffff888106ef2904 by task python3/315
The buggy address belongs to the object at ffff888106ef28c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 68 bytes inside of
freed 224-byte region [ffff888106ef28c0, ffff888106ef29a0)
Memory state around the buggy address: ffff888106ef2800: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc ffff888106ef2880: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>ffff888106ef2900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888106ef2980: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888106ef2a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Arnd Bergmann [Fri, 20 Jun 2025 11:39:42 +0000 (13:39 +0200)]
wifi: iwlegacy: work around excessive stack usage on clang/kasan
In some rare randconfig builds, I seem to trigger a bug in clang where
it unrolls a loop but then runs out of registers, which then get
spilled to the stack:
This seems to be the same one I saw in the omapdrm driver, and there is
an easy workaround by not inlining the il4965_rs_rate_scale_clear_win
function.
====================
bpf: Specify access type of bpf_sysctl_get_name args
The second argument of bpf_sysctl_get_name() helper is a pointer to a
buffer that is being written to. However that isn't specify in the
prototype. Until commit 37cce22dbd51a ("bpf: verifier: Refactor helper
access type tracking") that mistake was hidden by the way the verifier
treated helper accesses. Since then, the verifier, working on wrong
infromation from the prototype, can make faulty optimization that
would had been caught by the test_sysctl selftests if it was run by
the CI.
The first patch fixes bpf_sysctl_get_name prototype.
The second patch converts the test_sysctl to prog_tests so that it
will be run by the CI and catch similar issues in the future.
Changes in v3:
- Use ASSERT* macro instead of CHECK_FAIL.
- Remove useless code.
Changes in v2:
- Replace ARG_PTR_TO_UNINIT_MEM by ARG_PTR_TO_MEM | MEM_WRITE.
- Converts test_sysctl to prog_tests.
====================
Jerome Marchand [Thu, 19 Jun 2025 14:06:02 +0000 (16:06 +0200)]
bpf: Specify access type of bpf_sysctl_get_name args
The second argument of bpf_sysctl_get_name() helper is a pointer to a
buffer that is being written to. However that isn't specify in the
prototype.
Until commit 37cce22dbd51a ("bpf: verifier: Refactor helper access
type tracking"), all helper accesses were considered as a possible
write access by the verifier, so no big harm was done. However, since
then, the verifier might make wrong asssumption about the content of
that address which might lead it to make faulty optimizations (such as
removing code that was wrongly labeled dead). This is what happens in
test_sysctl selftest to the tests related to sysctl_get_name.
Add MEM_WRITE flag the second argument of bpf_sysctl_get_name().
Ido Schimmel [Thu, 19 Jun 2025 18:22:28 +0000 (21:22 +0300)]
bridge: mcast: Fix use-after-free during router port configuration
The bridge maintains a global list of ports behind which a multicast
router resides. The list is consulted during forwarding to ensure
multicast packets are forwarded to these ports even if the ports are not
member in the matching MDB entry.
When per-VLAN multicast snooping is enabled, the per-port multicast
context is disabled on each port and the port is removed from the global
router port list:
# ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1
# ip link add name dummy1 up master br1 type dummy
# ip link set dev dummy1 type bridge_slave mcast_router 2
$ bridge -d mdb show | grep router
router ports on br1: dummy1
# ip link set dev br1 type bridge mcast_vlan_snooping 1
$ bridge -d mdb show | grep router
However, the port can be re-added to the global list even when per-VLAN
multicast snooping is enabled:
# ip link set dev dummy1 type bridge_slave mcast_router 0
# ip link set dev dummy1 type bridge_slave mcast_router 2
$ bridge -d mdb show | grep router
router ports on br1: dummy1
Since commit 4b30ae9adb04 ("net: bridge: mcast: re-implement
br_multicast_{enable, disable}_port functions"), when per-VLAN multicast
snooping is enabled, multicast disablement on a port will disable the
per-{port, VLAN} multicast contexts and not the per-port one. As a
result, a port will remain in the global router port list even after it
is deleted. This will lead to a use-after-free [1] when the list is
traversed (when adding a new port to the list, for example):
# ip link del dev dummy1
# ip link add name dummy2 up master br1 type dummy
# ip link set dev dummy2 type bridge_slave mcast_router 2
Similarly, stale entries can also be found in the per-VLAN router port
list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN}
contexts are disabled on each port and the port is removed from the
per-VLAN router port list:
# ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1
# ip link add name dummy1 up master br1 type dummy
# bridge vlan add vid 2 dev dummy1
# bridge vlan global set vid 2 dev br1 mcast_snooping 1
# bridge vlan set vid 2 dev dummy1 mcast_router 2
$ bridge vlan global show dev br1 vid 2 | grep router
router ports: dummy1
# ip link set dev br1 type bridge mcast_vlan_snooping 0
$ bridge vlan global show dev br1 vid 2 | grep router
However, the port can be re-added to the per-VLAN list even when
per-VLAN multicast snooping is disabled:
# bridge vlan set vid 2 dev dummy1 mcast_router 0
# bridge vlan set vid 2 dev dummy1 mcast_router 2
$ bridge vlan global show dev br1 vid 2 | grep router
router ports: dummy1
When the VLAN is deleted from the port, the per-{port, VLAN} multicast
context will not be disabled since multicast snooping is not enabled
on the VLAN. As a result, the port will remain in the per-VLAN router
port list even after it is no longer member in the VLAN. This will lead
to a use-after-free [2] when the list is traversed (when adding a new
port to the list, for example):
# ip link add name dummy2 up master br1 type dummy
# bridge vlan add vid 2 dev dummy2
# bridge vlan del vid 2 dev dummy1
# bridge vlan set vid 2 dev dummy2 mcast_router 2
Fix these issues by removing the port from the relevant (global or
per-VLAN) router port list in br_multicast_port_ctx_deinit(). The
function is invoked during port deletion with the per-port multicast
context and during VLAN deletion with the per-{port, VLAN} multicast
context.
Note that deleting the multicast router timer is not enough as it only
takes care of the temporary multicast router states (1 or 3) and not the
permanent one (2).
[1]
BUG: KASAN: slab-out-of-bounds in br_multicast_add_router.part.0+0x3f1/0x560
Write of size 8 at addr ffff888004a67328 by task ip/384
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x6f/0x350
print_report+0x108/0x205
kasan_report+0xdf/0x110
br_multicast_add_router.part.0+0x3f1/0x560
br_multicast_set_port_router+0x74e/0xac0
br_setport+0xa55/0x1870
br_port_slave_changelink+0x95/0x120
__rtnl_newlink+0x5e8/0xa40
rtnl_newlink+0x627/0xb00
rtnetlink_rcv_msg+0x6fb/0xb70
netlink_rcv_skb+0x11f/0x350
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
__sys_sendmsg+0x124/0x1c0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
BUG: KASAN: slab-use-after-free in br_multicast_add_router.part.0+0x378/0x560
Read of size 8 at addr ffff888009f00840 by task bridge/391
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x6f/0x350
print_report+0x108/0x205
kasan_report+0xdf/0x110
br_multicast_add_router.part.0+0x378/0x560
br_multicast_set_port_router+0x6f9/0xac0
br_vlan_process_options+0x8b6/0x1430
br_vlan_rtm_process_one+0x605/0xa30
br_vlan_rtm_process+0x396/0x4c0
rtnetlink_rcv_msg+0x2f7/0xb70
netlink_rcv_skb+0x11f/0x350
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
__sys_sendmsg+0x124/0x1c0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Linus Torvalds [Mon, 23 Jun 2025 22:02:57 +0000 (15:02 -0700)]
Merge tag 'for-6.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- dm-crypt: fix a crash on 32-bit machines
- dm-raid: replace "rdev" with correct loop variable name "r"
* tag 'for-6.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-raid: fix variable in journal device check
dm-crypt: Extend state buffer size in crypt_iv_lmk_one
Linus Torvalds [Mon, 23 Jun 2025 21:55:40 +0000 (14:55 -0700)]
Merge tag 'f2fs-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
- fix double-unlock introduced by the recent folio conversion
- fix stale page content beyond EOF complained by xfstests/generic/363
* tag 'f2fs-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix to zero post-eof page
f2fs: Fix __write_node_folio() conversion
Breno Leitao [Fri, 20 Jun 2025 18:48:55 +0000 (11:48 -0700)]
net: netpoll: Initialize UDP checksum field before checksumming
commit f1fce08e63fe ("netpoll: Eliminate redundant assignment") removed
the initialization of the UDP checksum, which was wrong and broke
netpoll IPv6 transmission due to bad checksumming.
udph->check needs to be set before calling csum_ipv6_magic().
Linus Torvalds [Mon, 23 Jun 2025 18:16:38 +0000 (11:16 -0700)]
Merge tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fixes:
- fix invalid inode pointer dereferences during log replay
- fix a race between renames and directory logging
- fix shutting down delayed iput worker
- fix device byte accounting when dropping chunk
- in zoned mode, fix offset calculations for DUP profile when
conventional and sequential zones are used together
Regression fixes:
- fix possible double unlock of extent buffer tree (xarray
conversion)
- in zoned mode, fix extent buffer refcount when writing out extents
(xarray conversion)
Error handling fixes and updates:
- handle unexpected extent type when replaying log
- check and warn if there are remaining delayed inodes when putting a
root
- fix assertion when building free space tree
- handle csum tree error with mount option 'rescue=ibadroot'
Other:
- error message updates: add prefix to all scrub related messages,
include other information in messages"
* tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix alloc_offset calculation for partly conventional block groups
btrfs: handle csum tree error with rescue=ibadroots correctly
btrfs: fix race between async reclaim worker and close_ctree()
btrfs: fix assertion when building free space tree
btrfs: don't silently ignore unexpected extent type when replaying log
btrfs: fix invalid inode pointer dereferences during log replay
btrfs: fix double unlock of buffer_tree xarray when releasing subpage eb
btrfs: update superblock's device bytes_used when dropping chunk
btrfs: fix a race between renames and directory logging
btrfs: scrub: add prefix for the error messages
btrfs: warn if leaking delayed_nodes in btrfs_put_root()
btrfs: fix delayed ref refcount leak in debug assertion
btrfs: include root in error message when unlinking inode
btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails