Miguel Ojeda [Fri, 10 Oct 2025 17:43:49 +0000 (19:43 +0200)]
docs: rust: add section on imports formatting
`rustfmt`, by default, formats imports in a way that is prone to conflicts
while merging and rebasing, since in some cases it condenses several
items into the same line.
For instance, Linus mentioned [1] that the following case:
use crate::{
fmt,
page::AsPageIter,
};
is compressed by `rustfmt` into:
use crate::{fmt, page::AsPageIter};
which is undesirable.
Similarly, `rustfmt` may put several items in the same line even if the
braces span already multiple lines, e.g.:
use kernel::{
acpi, c_str,
device::{property, Core},
of, platform,
};
The options that control the formatting behavior around imports are
generally unstable, and `rustfmt` releases do not allow to use nightly
features, unlike the compiler and other Rust tooling [2].
For the moment, we can introduce a workaround to prevent `rustfmt`
from compressing the example above -- the "trailing empty comment":
use crate::{
fmt,
page::AsPageIter, //
};
which is reminiscent of the trailing comma behavior in other formatters.
We already used empty comments for formatting purposes in the past,
e.g. in commit b9b701fce49a ("rust: clarify the language unstable features
in use").
In addition, `rustfmt` actually reformats with a vertical layout (i.e. it
does not put two items in the same line) when seeing such a comment,
i.e. it doesn't just preserve the formatting, which is good in the sense
that we can use it to easily reformat some imports, since it matches
the style we generally want to have.
A Git merge driver would help (suggested by Gary and Wedson), though
maintainers would need to set it up, the diffs would still be larger
and the formatting rules for imports would remain hard to predict.
Thus document the style that we will follow in the coding guidelines
by introducing a new section and explain how the trailing empty comment
works there too.
We discussed the issue with upstream Rust in our usual Rust <-> Rust
for Linux meeting [3], and there have also been a few other discussions
in parallel in issues [4][5] and Zulip [6]. We will see what happens,
but upstream Rust has already created a subteam of `rustfmt` to try
to overcome the bandwidth issue [7], which is a good signal, and some
organization work has already started (e.g. tracking issues). We will
continue our discussions with them about it.
Linus Torvalds [Thu, 16 Oct 2025 17:22:38 +0000 (10:22 -0700)]
Merge tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in tree-checker fix extref bounds check
- reorder send context structure to avoid
-Wflex-array-member-not-at-end warning
- fix extent readahead length for compressed extents
- fix memory leaks on error paths (qgroup assign ioctl, zone loading
with raid stripe tree enabled)
- fix how device specific mount options are applied, in particular the
'ssd' option will be set unexpectedly
- fix tracking of relocation state when tasks are running and
cancellation is attempted
- adjust assertion condition for folios allocated for scrub
- remove incorrect assertion checking for block group when populating
free space tree
* tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: send: fix -Wflex-array-member-not-at-end warning in struct send_ctx
btrfs: tree-checker: fix bounds check in check_inode_extref()
btrfs: fix memory leaks when rejecting a non SINGLE data profile without an RST
btrfs: fix incorrect readahead expansion length
btrfs: do not assert we found block group item when creating free space tree
btrfs: do not use folio_test_partial_kmap() in ASSERT()s
btrfs: only set the device specific options after devices are opened
btrfs: fix memory leak on duplicated memory in the qgroup assign ioctl
btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running
Linus Torvalds [Thu, 16 Oct 2025 17:16:41 +0000 (10:16 -0700)]
Merge tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix RPC hang due to locking bug
- Fix for memory leak in read and refcount leak (in session setup)
- Minor cleanup
* tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix recursive locking in RPC handle list access
smb/server: fix possible refcount leak in smb2_sess_setup()
smb/server: fix possible memory leak in smb2_read()
smb: server: Use common error handling code in smb_direct_rdma_xmit()
Linus Torvalds [Thu, 16 Oct 2025 16:41:21 +0000 (09:41 -0700)]
Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN
Current release - regressions:
- udp: do not use skb_release_head_state() before
skb_attempt_defer_free()
- gro_cells: use nested-BH locking for gro_cell
- dpll: zl3073x: increase maximum size of flash utility
Previous releases - regressions:
- core: fix lockdep splat on device unregister
- tcp: fix tcp_tso_should_defer() vs large RTT
- tls:
- don't rely on tx_work during send()
- wait for pending async decryptions if tls_strp_msg_hold fails
- can: j1939: add missing calls in NETDEV_UNREGISTER notification
handler
- eth: lan78xx: fix lost EEPROM write timeout in
lan78xx_write_raw_eeprom
Previous releases - always broken:
- ip6_tunnel: prevent perpetual tunnel growth
- dpll: zl3073x: handle missing or corrupted flash configuration
- can: m_can: fix pm_runtime and CAN state handling
- eth:
- ixgbe: fix too early devlink_free() in ixgbe_remove()
- ixgbevf: fix mailbox API compatibility
- gve: Check valid ts bit on RX descriptor before hw timestamping
- idpf: cleanup remaining SKBs in PTP flows
- r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"
* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
netdevsim: set the carrier when the device goes up
selftests: tls: add test for short splice due to full skmsg
selftests: net: tls: add tests for cmsg vs MSG_MORE
tls: don't rely on tx_work during send()
tls: wait for pending async decryptions if tls_strp_msg_hold fails
tls: always set record_type in tls_process_cmsg
tls: wait for async encrypt in case of error during latter iterations of sendmsg
tls: trim encrypted message to match the plaintext on short splice
tg3: prevent use of uninitialized remote_adv and local_adv variables
MAINTAINERS: new entry for IPv6 IOAM
gve: Check valid ts bit on RX descriptor before hw timestamping
net: core: fix lockdep splat on device unregister
MAINTAINERS: add myself as maintainer for b53
selftests: net: check jq command is supported
net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
tcp: fix tcp_tso_should_defer() vs large RTT
r8152: add error handling in rtl8152_driver_init
usbnet: Fix using smp_processor_id() in preemptible code warnings
...
Linus Torvalds [Thu, 16 Oct 2025 16:39:29 +0000 (09:39 -0700)]
Merge tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Do not print an error message (and assume that the General Purpose
Log Directory log page is not supported) for a device that reports a
bogus General Purpose Logging Version.
Unsurprisingly, many vendors fail to report the only valid General
Purpose Logging Version (Damien)
* tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: relax checks in ata_read_log_directory()
Eric Dumazet [Wed, 15 Oct 2025 05:27:15 +0000 (05:27 +0000)]
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
Michal reported and bisected an issue after recent adoption
of skb_attempt_defer_free() in UDP.
The issue here is that skb_release_head_state() is called twice per skb,
one time from skb_consume_udp(), then a second time from skb_defer_free_flush()
and napi_consume_skb().
As Sabrina suggested, remove skb_release_head_state() call from
skb_consume_udp().
Add a DEBUG_NET_WARN_ON_ONCE(skb_nfct(skb)) in skb_attempt_defer_free()
Many thanks to Michal, Sabrina, Paolo and Florian for their help.
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()") Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz> Closes: https://lore.kernel.org/netdev/gpjh4lrotyephiqpuldtxxizrsg6job7cvhiqrw72saz2ubs3h@g6fgbvexgl3r/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251015052715.4140493-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I Viswanath [Mon, 13 Oct 2025 18:16:48 +0000 (23:46 +0530)]
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
dev->chipid is used in lan78xx_init_mac_address before it's initialized:
lan78xx_reset() {
lan78xx_init_mac_address()
lan78xx_read_eeprom()
lan78xx_read_raw_eeprom() <- dev->chipid is used here
dev->chipid = ... <- dev->chipid is initialized correctly here
}
Reorder initialization so that dev->chipid is set before calling
lan78xx_init_mac_address().
Fixes: a0db7d10b76e ("lan78xx: Add to handle mux control per chip id") Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Khalid Aziz <khalid@kernel.org> Link: https://patch.msgid.link/20251013181648.35153-1-viswanathiyyappan@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 16 Oct 2025 00:56:20 +0000 (17:56 -0700)]
Merge tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-10-14
The first 2 paches are by Celeste Liu and target the gS_usb driver.
The first patch remove the limitation to 3 CAN interface per USB
device. The second patch adds the missing population of
net_device->dev_port.
The next 4 patches are by me and fix the m_can driver. They add a
missing pm_runtime_disable(), fix the CAN state transition back to
Error Active and fix the state after ifup and suspend/resume.
Another patch by me targets the m_can driver, too and replaces Dong
Aisheng's old email address.
The next 2 patches are by Vincent Mailhol and update the CAN
networking Documentation.
Tetsuo Handa contributes the last patch that add missing cleanup calls
in the NETDEV_UNREGISTER notification handler.
* tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
can: add Transmitter Delay Compensation (TDC) documentation
can: remove false statement about 1:1 mapping between DLC and length
can: m_can: replace Dong Aisheng's old email address
can: m_can: fix CAN state in system PM
can: m_can: m_can_chip_config(): bring up interface in correct state
can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active
can: m_can: m_can_plat_remove(): add missing pm_runtime_disable()
can: gs_usb: gs_make_candev(): populate net_device->dev_port
can: gs_usb: increase max interface to U8_MAX
====================
Breno Leitao [Tue, 14 Oct 2025 09:17:25 +0000 (02:17 -0700)]
netdevsim: set the carrier when the device goes up
Bringing a linked netdevsim device down and then up causes communication
failure because both interfaces lack carrier. Basically a ifdown/ifup on
the interface make the link broken.
Commit 3762ec05a9fbda ("netdevsim: add NAPI support") added supported
for NAPI, calling netif_carrier_off() in nsim_stop(). This patch
re-enables the carrier symmetrically on nsim_open(), in case the device
is linked and the peer is up.
Jakub Kicinski [Thu, 16 Oct 2025 00:41:47 +0000 (17:41 -0700)]
Merge branch 'tls-misc-bugfixes'
Sabrina Dubroca says:
====================
tls: misc bugfixes
Jann Horn reported multiple bugs in kTLS. This series addresses them,
and adds some corresponding selftests for those that are reproducible
(and without failure injection).
====================
Sabrina Dubroca [Tue, 14 Oct 2025 09:17:01 +0000 (11:17 +0200)]
selftests: net: tls: add tests for cmsg vs MSG_MORE
We don't have a test to check that MSG_MORE won't let us merge records
of different types across sendmsg calls.
Add new tests that check:
- MSG_MORE is only allowed for DATA records
- a pending DATA record gets closed and pushed before a non-DATA
record is processed
Sabrina Dubroca [Tue, 14 Oct 2025 09:17:00 +0000 (11:17 +0200)]
tls: don't rely on tx_work during send()
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.
Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:59 +0000 (11:16 +0200)]
tls: wait for pending async decryptions if tls_strp_msg_hold fails
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).
In this case, wait for all pending decryption requests.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:58 +0000 (11:16 +0200)]
tls: always set record_type in tls_process_cmsg
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).
Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:57 +0000 (11:16 +0200)]
tls: wait for async encrypt in case of error during latter iterations of sendmsg
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.
This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:56 +0000 (11:16 +0200)]
tls: trim encrypted message to match the plaintext on short splice
During tls_sw_sendmsg_locked, we pre-allocate the encrypted message
for the size we're expecting to send during the current iteration, but
we may end up sending less, for example when splicing: if we're
getting the data from small fragments of memory, we may fill up all
the slots in the skmsg with less data than expected.
In this case, we need to trim the encrypted message to only the length
we actually need, to avoid pushing uninitialized bytes down the
underlying TCP socket.
Justin Iurman [Tue, 14 Oct 2025 17:06:50 +0000 (19:06 +0200)]
MAINTAINERS: new entry for IPv6 IOAM
Create a maintainer entry for IPv6 IOAM. Add myself as I authored most
if not all of the IPv6 IOAM code in the kernel and actively participate
in the related IETF groups.
Tim Hostetler [Tue, 14 Oct 2025 00:47:39 +0000 (00:47 +0000)]
gve: Check valid ts bit on RX descriptor before hw timestamping
The device returns a valid bit in the LSB of the low timestamp byte in
the completion descriptor that the driver should check before
setting the SKB's hardware timestamp. If the timestamp is not valid, do not
hardware timestamp the SKB.
Cc: stable@vger.kernel.org Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251014004740.2775957-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 15 Oct 2025 14:57:28 +0000 (07:57 -0700)]
Remove long-stale ext3 defconfig option
Inspired by commit c065b6046b34 ("Use CONFIG_EXT4_FS instead of
CONFIG_EXT3_FS in all of the defconfigs") I looked around for any other
left-over EXT3 config options, and found some old defconfig files still
mentioned CONFIG_EXT3_DEFAULTS_TO_ORDERED.
That config option was removed a decade ago in commit c290ea01abb7 ("fs:
Remove ext3 filesystem driver"). It had a good run, but let's remove it
for good.
Linus Torvalds [Wed, 15 Oct 2025 14:51:57 +0000 (07:51 -0700)]
Merge tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
- Fix regression caused by removing CONFIG_EXT3_FS when testing some
very old defconfigs
- Avoid a BUG_ON when opening a file on a maliciously corrupted file
system
- Avoid mm warnings when freeing a very large orphan file metadata
- Avoid a theoretical races between metadata writeback and checkpoints
(it's very hard to hit in practice, since the race requires that the
writeback take a very long time)
* tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs
ext4: free orphan info with kvfree
ext4: detect invalid INLINE_DATA + EXTENTS flag combination
ext4, doc: fix and improve directory hash tree description
ext4: wait for ongoing I/O to complete before freeing blocks
jbd2: ensure that all ongoing I/O complete before freeing blocks
Nicolas Frattaroli [Mon, 13 Oct 2025 07:34:04 +0000 (09:34 +0200)]
PM / devfreq: rockchip-dfi: switch to FIELD_PREP_WM16 macro
The era of hand-rolled HIWORD_UPDATE macros is over, at least for those
drivers that use constant masks.
Like many other Rockchip drivers, rockchip-dfi brings with it its own
HIWORD_UPDATE macro. This variant doesn't shift the value (and like the
others, doesn't do any checking).
Remove it, and replace instances of it with hw_bitfield.h's
FIELD_PREP_WM16. Since FIELD_PREP_WM16 requires contiguous masks and
shifts the value for us, some reshuffling of definitions needs to
happen.
This gives us better compile-time error checking, and in my opinion,
nicer code.
Tested on an RK3568 ODROID-M1 board (LPDDR4X at 1560 MHz, an RK3588
Radxa ROCK 5B board (LPDDR4X at 2112 MHz) and an RK3588 Radxa ROCK 5T
board (LPDDR5 at 2400 MHz). perf measurements were consistent with the
measurements of stress-ng --stream in all cases.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Kernel side:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:0 state:D stack:0 pid:5021 tgid:5021 ppid:2 flags:0x00200000
Workqueue: ksmbd-io handle_ksmbd_work
Call trace:
__schedule from schedule+0x3c/0x58
schedule from schedule_preempt_disabled+0xc/0x10
schedule_preempt_disabled from rwsem_down_read_slowpath+0x1b0/0x1d8
rwsem_down_read_slowpath from down_read+0x28/0x30
down_read from ksmbd_session_rpc_method+0x18/0x3c
ksmbd_session_rpc_method from ksmbd_rpc_open+0x34/0x68
ksmbd_rpc_open from ksmbd_session_rpc_open+0x194/0x228
ksmbd_session_rpc_open from create_smb2_pipe+0x8c/0x2c8
create_smb2_pipe from smb2_open+0x10c/0x27ac
smb2_open from handle_ksmbd_work+0x238/0x3dc
handle_ksmbd_work from process_scheduled_works+0x160/0x25c
process_scheduled_works from worker_thread+0x16c/0x1e8
worker_thread from kthread+0xa8/0xb8
kthread from ret_from_fork+0x14/0x38
Exception stack(0x8529ffb0 to 0x8529fff8)
The task deadlocks because the lock is already held:
ksmbd_session_rpc_open
down_write(&sess->rpc_lock)
ksmbd_rpc_open
ksmbd_session_rpc_method
down_read(&sess->rpc_lock) <-- deadlock
Adjust ksmbd_session_rpc_method() callers to take the lock when necessary.
Fixes: 305853cce3794 ("ksmbd: Fix race condition in RPC handle list access") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Florian Westphal [Mon, 13 Oct 2025 18:50:52 +0000 (20:50 +0200)]
net: core: fix lockdep splat on device unregister
Since blamed commit, unregister_netdevice_many_notify() takes the netdev
mutex if the device needs it.
If the device list is too long, this will lock more device mutexes than
lockdep can handle:
unshare -n \
bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done'
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by kworker/u16:1/69:
#0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work
#1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work
#2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net
#3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch
#4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify
[..]
Add a helper to close and then unlock a list of net_devices.
Devices that are not up have to be skipped - netif_close_many always
removes them from the list without any other actions taken, so they'd
remain in locked state.
Close devices whenever we've used up half of the tracking slots or we
processed entire list without hitting the limit.
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20251013185052.14021-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 14 Oct 2025 16:15:45 +0000 (09:15 -0700)]
Merge tag 'for-linus-6.18-2' of https://github.com/cminyard/linux-ipmi
Pull IPMI fixes from Corey Minyard:
"A few bug fixes for patches that went in this release: a refcount
error and some missing or incorrect error checks"
* tag 'for-linus-6.18-2' of https://github.com/cminyard/linux-ipmi:
ipmi: Fix handling of messages with provided receive message pointer
mfd: ls2kbmc: check for devm_mfd_add_devices() failure
mfd: ls2kbmc: Fix an IS_ERR() vs NULL check in probe()
Wang Liang [Mon, 13 Oct 2025 08:00:39 +0000 (16:00 +0800)]
selftests: net: check jq command is supported
The jq command is used in vlan_bridge_binding.sh, if it is not supported,
the test will spam the following log.
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# TEST: Test bridge_binding on->off when lower down [FAIL]
# Got operstate of , expected 0
The rtnetlink.sh has the same problem. It makes sense to check if jq is
installed before running these tests. After this patch, the
vlan_bridge_binding.sh skipped if jq is not supported:
# timeout set to 3600
# selftests: net: vlan_bridge_binding.sh
# TEST: jq not installed [SKIP]
Fixes: dca12e9ab760 ("selftests: net: Add a VLAN bridge binding selftest") Fixes: 6a414fd77f61 ("selftests: rtnetlink: Add an address proto test") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251013080039.3035898-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Sun, 12 Oct 2025 09:19:44 +0000 (11:19 +0200)]
net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
Completion napi can free out-of-order tx descriptors if hw QoS is
enabled and packets with different priority are queued to same DMA ring.
Take into account possible out-of-order reports checking if the tx queue
is full using circular buffer head/tail pointer instead of the number of
queued packets.
Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251012-airoha-tx-busy-queue-v2-1-a600b08bab2d@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Sat, 11 Oct 2025 11:57:42 +0000 (11:57 +0000)]
tcp: fix tcp_tso_should_defer() vs large RTT
Neal reported that using neper tcp_stream with TCP_TX_DELAY
set to 50ms would often lead to flows stuck in a small cwnd mode,
regardless of the congestion control.
While tcp_stream sets TCP_TX_DELAY too late after the connect(),
it highlighted two kernel bugs.
The following heuristic in tcp_tso_should_defer() seems wrong
for large RTT:
delta = tp->tcp_clock_cache - head->tstamp;
/* If next ACK is likely to come too late (half srtt), do not defer */
if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
goto send_now;
If next ACK is expected to come in more than 1 ms, we should
not defer because we prefer a smooth ACK clocking.
While blamed commit was a step in the good direction, it was not
generic enough.
Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.
For historical and portability reasons, the netif_rx() is usually
run in the softirq or interrupt context, this commit therefore add
local_bh_disable/enable() protection in the usbnet_resume_rx().
Raju Rangoju [Fri, 10 Oct 2025 06:51:42 +0000 (12:21 +0530)]
amd-xgbe: Avoid spurious link down messages during interface toggle
During interface toggle operations (ifdown/ifup), the driver currently
resets the local helper variable 'phy_link' to -1. This causes the link
state machine to incorrectly interpret the state as a link change event,
resulting in spurious "Link is down" messages being logged when the
interface is brought back up.
Preserve the phy_link state across interface toggles to avoid treating
the -1 sentinel value as a legitimate link state transition.
Fixes: 88131a812b16 ("amd-xgbe: Perform phy connect/disconnect at dev open/stop") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Link: https://patch.msgid.link/20251010065142.1189310-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Theodore Ts'o [Tue, 14 Oct 2025 01:50:40 +0000 (21:50 -0400)]
Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs
Commit d6ace46c82fd ("ext4: remove obsolete EXT3 config options")
removed the obsolete EXT3_CONFIG options, since it had been over a
decade since fs/ext3 had been removed. Unfortunately, there were a
number of defconfigs that still used CONFIG_EXT3_FS which the cleanup
commit didn't fix up. This led to a large number of defconfig test
builds to fail. Oops.
Marek Vasut [Sat, 11 Oct 2025 11:02:49 +0000 (13:02 +0200)]
net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present
The driver is currently checking for PHYCR2 register presence in
rtl8211f_config_init(), but it does so after accessing PHYCR2 to
disable EEE. This was introduced in commit bfc17c165835 ("net:
phy: realtek: disable PHY-mode EEE"). Move the PHYCR2 presence
test before the EEE disablement and simplify the code.
Fixes: bfc17c165835 ("net: phy: realtek: disable PHY-mode EEE") Signed-off-by: Marek Vasut <marek.vasut@mailbox.org> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20251011110309.12664-1-marek.vasut@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Intel Wired LAN Driver Updates 2025-10-01 (idpf, ixgbe, ixgbevf)
For idpf:
Milena fixes a memory leak in the idpf reset logic when the driver resets
with an outstanding Tx timestamp.
For ixgbe and ixgbevf:
Jedrzej fixes an issue with reporting link speed on E610 VFs.
Jedrzej also fixes the VF mailbox API incompatibilities caused by the
confusion with API v1.4, v1.5, and v1.6. The v1.4 API introduced IPSEC
offload, but this was only supported on Linux hosts. The v1.5 API
introduced a new mailbox API which is necessary to resolve issues on ESX
hosts. The v1.6 API introduced a new link management API for E610. Jedrzej
introduces a new v1.7 API with a feature negotiation which enables properly
checking if features such as IPSEC or the ESX mailbox APIs are supported.
This resolves issues with compatibility on different hosts, and aligns the
API across hosts instead of having Linux require custom mailbox API
versions for IPSEC offload.
Koichiro fixes a KASAN use-after-free bug in ixgbe_remove().
====================
Koichiro Den [Fri, 10 Oct 2025 00:03:51 +0000 (17:03 -0700)]
ixgbe: fix too early devlink_free() in ixgbe_remove()
Since ixgbe_adapter is embedded in devlink, calling devlink_free()
prematurely in the ixgbe_remove() path can lead to UAF. Move devlink_free()
to the end.
Jedrzej Jagielski [Fri, 10 Oct 2025 00:03:49 +0000 (17:03 -0700)]
ixgbevf: fix mailbox API compatibility by negotiating supported features
There was backward compatibility in the terms of mailbox API. Various
drivers from various OSes supporting 10G adapters from Intel portfolio
could easily negotiate mailbox API.
This convention has been broken since introducing API 1.4.
Commit 0062e7cc955e ("ixgbevf: add VF IPsec offload code") added support
for IPSec which is specific only for the kernel ixgbe driver. None of the
rest of the Intel 10G PF/VF drivers supports it. And actually lack of
support was not included in the IPSec implementation - there were no such
code paths. No possibility to negotiate support for the feature was
introduced along with introduction of the feature itself.
Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication
between PF and VF") increasing API version to 1.5 did the same - it
introduced code supported specifically by the PF ESX driver. It altered API
version for the VF driver in the same time not touching the version
defined for the PF ixgbe driver. It led to additional discrepancies,
as the code provided within API 1.6 cannot be supported for Linux ixgbe
driver as it causes crashes.
The issue was noticed some time ago and mitigated by Jake within the commit d0725312adf5 ("ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5").
As a result we have regression for IPsec support and after increasing API
to version 1.6 ixgbevf driver stopped to support ESX MBX.
To fix this mess add new mailbox op asking PF driver about supported
features. Basing on a response determine whether to set support for IPSec
and ESX-specific enhanced mailbox.
New mailbox op, for compatibility purposes, must be added within new API
revision, as API version of OOT PF & VF drivers is already increased to
1.6 and doesn't incorporate features negotiate op.
Features negotiation mechanism gives possibility to be extended with new
features when needed in the future.
Reported-by: Jacob Keller <jacob.e.keller@intel.com> Closes: https://lore.kernel.org/intel-wired-lan/20241101-jk-ixgbevf-mailbox-v1-5-fixes-v1-0-f556dc9a66ed@intel.com/ Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code") Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-4-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update supported API version and provide handler for
IXGBE_VF_GET_PF_LINK_STATE cmd.
Simply put stored values of link speed and link_up from adapter context.
Jedrzej Jagielski [Fri, 10 Oct 2025 00:03:47 +0000 (17:03 -0700)]
ixgbevf: fix getting link speed data for E610 devices
E610 adapters no longer use the VFLINKS register to read PF's link
speed and linkup state. As a result VF driver cannot get actual link
state and it incorrectly reports 10G which is the default option.
It leads to a situation where even 1G adapters print 10G as actual
link speed. The same happens when PF driver set speed different than 10G.
Add new mailbox operation to let the VF driver request a PF driver
to provide actual link data. Update the mailbox api to v1.6.
Incorporate both ways of getting link status within the legacy
ixgbe_check_mac_link_vf() function.
Fixes: 4c44b450c69b ("ixgbevf: Add support for Intel(R) E610 device") Co-developed-by: Andrzej Wilczynski <andrzejx.wilczynski@intel.com> Signed-off-by: Andrzej Wilczynski <andrzejx.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-2-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Milena Olech [Fri, 10 Oct 2025 00:03:46 +0000 (17:03 -0700)]
idpf: cleanup remaining SKBs in PTP flows
When the driver requests Tx timestamp value, one of the first steps is
to clone SKB using skb_get. It increases the reference counter for that
SKB to prevent unexpected freeing by another component.
However, there may be a case where the index is requested, SKB is
assigned and never consumed by PTP flows - for example due to reset during
running PTP apps.
Add a check in release timestamping function to verify if the SKB
assigned to Tx timestamp latch was freed, and release remaining SKBs.
Fixes: 4901e83a94ef ("idpf: add Tx timestamp capabilities negotiation") Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-1-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Safonov [Thu, 9 Oct 2025 15:02:19 +0000 (16:02 +0100)]
net/ip6_tunnel: Prevent perpetual tunnel growth
Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too.
While ipv4 tunnel headroom adjustment growth was limited in
commit 5ae1e9922bbd ("net: ip_tunnel: prevent perpetual headroom growth"),
ipv6 tunnel yet increases the headroom without any ceiling.
Reflect ipv4 tunnel headroom adjustment limit on ipv6 version.
Credits to Francesco Ruggeri, who was originally debugging this issue
and wrote local Arista-specific patch and a reproducer.
The Broadcom bcm54811 is hardware-strapped to select among RGMII and
GMII/MII/MII-Lite modes. However, the corresponding bit, RGMII Enable
in Miscellaneous Control Register must be also set to select desired
RGMII or MII(-lite)/GMII mode.
Linmao Li [Thu, 9 Oct 2025 12:25:49 +0000 (20:25 +0800)]
r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H
After resume from S4 (hibernate), RTL8168H/RTL8111H truncates incoming
packets. Packet captures show messages like "IP truncated-ip - 146 bytes
missing!".
The issue is caused by RxConfig not being properly re-initialized after
resume. Re-initializing the RxConfig register before the chip
re-initialization sequence avoids the truncation and restores correct
packet reception.
This follows the same pattern as commit ef9da46ddef0 ("r8169: fix data
corruption issue on RTL8402").
Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E") Signed-off-by: Linmao Li <lilinmao@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251009122549.3955845-1-lilinmao@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sebastian Andrzej Siewior [Thu, 9 Oct 2025 09:43:38 +0000 (11:43 +0200)]
net: gro_cells: Use nested-BH locking for gro_cell
The gro_cell data structure is per-CPU variable and relies on disabled
BH for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
Reported-by: syzbot+8715dd783e9b0bef43b1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68c6c3b1.050a0220.2ff435.0382.GAE@google.com/ Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251009094338.j1jyKfjR@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 8 Oct 2025 14:14:45 +0000 (16:14 +0200)]
dpll: zl3073x: Handle missing or corrupted flash configuration
If the internal flash contains missing or corrupted configuration,
basic communication over the bus still functions, but the device
is not capable of normal operation (for example, using mailboxes).
This condition is indicated in the info register by the ready bit.
If this bit is cleared, the probe procedure times out while fetching
the device state.
Handle this case by checking the ready bit value in zl3073x_dev_start()
and skipping DPLL device and pin registration if it is cleared.
Do not report this condition as an error, allowing the devlink device
to be registered and enabling the user to flash the correct configuration.
Prior this patch:
[ 31.112299] zl3073x-i2c 1-0070: Failed to fetch input state: -ETIMEDOUT
[ 31.116332] zl3073x-i2c 1-0070: error -ETIMEDOUT: Failed to start device
[ 31.136881] zl3073x-i2c 1-0070: probe with driver zl3073x-i2c failed with error -110
After this patch:
[ 41.011438] zl3073x-i2c 1-0070: FW not fully ready - missing or corrupted config
Fixes: 75a71ecc24125 ("dpll: zl3073x: Register DPLL devices and pins") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251008141445.841113-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.
In the below equation, however, map->m_pblk was supposed to be the original
address instead of the one from the target block address.
Gustavo A. R. Silva [Fri, 3 Oct 2025 14:11:06 +0000 (15:11 +0100)]
btrfs: send: fix -Wflex-array-member-not-at-end warning in struct send_ctx
The warning -Wflex-array-member-not-at-end was introduced in GCC-14, and
we are getting ready to enable it, globally.
Fix the following warning:
fs/btrfs/send.c:181:24: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
and move the declaration of send_ctx::cur_inode_path to the end.
Notice that struct fs_path contains a flexible array member inline_buf,
but also a padding array and a limit calculated for the usable space of
inline_buf (FS_PATH_INLINE_SIZE). It is not the pattern where flexible
array is in the middle of a structure and could potentially overwrite
other members.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Dan Carpenter [Wed, 8 Oct 2025 15:08:58 +0000 (18:08 +0300)]
btrfs: tree-checker: fix bounds check in check_inode_extref()
The parentheses for the unlikely() annotation were put in the wrong
place so it means that the condition is basically never true and the
bounds checking is skipped.
Fixes: aab9458b9f00 ("btrfs: tree-checker: add inode extref checks") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: fix memory leaks when rejecting a non SINGLE data profile without an RST
At the end of btrfs_load_block_group_zone_info() the first thing we do
is to ensure that if the mapping type is not a SINGLE one and there is
no RAID stripe tree, then we return early with an error.
Doing that, though, prevents the code from running the last calls from
this function which are about freeing memory allocated during its
run. Hence, in this case, instead of returning early, we set the ret
value and fall through the rest of the cleanup code.
Boris Burkov [Wed, 1 Oct 2025 04:05:17 +0000 (21:05 -0700)]
btrfs: fix incorrect readahead expansion length
The intent of btrfs_readahead_expand() was to expand to the length of
the current compressed extent being read. However, "ram_bytes" is *not*
that, in the case where a single physical compressed extent is used for
multiple file extents.
Consider this case with a large compressed extent C and then later two
non-compressed extents N1 and N2 written over C, leaving C1 and C2
pointing to offset/len pairs of C:
[ C ]
[ N1 ][ C1 ][ N2 ][ C2 ]
In such a case, ram_bytes for both C1 and C2 is the full uncompressed
length of C. So starting readahead in C1 will expand the readahead past
the end of C1, past N2, and into C2. This will then expand readahead
again, to C2_start + ram_bytes, way past EOF. First of all, this is
totally undesirable, we don't want to read the whole file in arbitrary
chunks of the large underlying extent if it happens to exist. Secondly,
it results in zeroing the range past the end of C2 up to ram_bytes. This
is particularly unpleasant with fs-verity as it can zero and set
uptodate pages in the verity virtual space past EOF. This incorrect
readahead behavior can lead to verity verification errors, if we iterate
in a way that happens to do the wrong readahead.
Fix this by using em->len for readahead expansion, not em->ram_bytes,
resulting in the expected behavior of stopping readahead at the extent
boundary.
Reported-by: Max Chernoff <git@maxchernoff.ca> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2399898 Fixes: 9e9ff875e417 ("btrfs: use readahead_expand() on compressed extents") CC: stable@vger.kernel.org # 6.17 Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 1 Oct 2025 10:08:13 +0000 (11:08 +0100)]
btrfs: do not assert we found block group item when creating free space tree
Currently, when building a free space tree at populate_free_space_tree(),
if we are not using the block group tree feature, we always expect to find
block group items (either extent items or a block group item with key type
BTRFS_BLOCK_GROUP_ITEM_KEY) when we search the extent tree with
btrfs_search_slot_for_read(), so we assert that we found an item. However
this expectation is wrong since we can have a new block group created in
the current transaction which is still empty and for which we still have
not added the block group's item to the extent tree, in which case we do
not have any items in the extent tree associated to the block group.
The insertion of a new block group's block group item in the extent tree
happens at btrfs_create_pending_block_groups() when it calls the helper
insert_block_group_item(). This typically is done when a transaction
handle is released, committed or when running delayed refs (either as
part of a transaction commit or when serving tickets for space reservation
if we are low on free space).
So remove the assertion at populate_free_space_tree() even when the block
group tree feature is not enabled and update the comment to mention this
case.
Syzbot reported this with the following stack trace:
btrfs: do not use folio_test_partial_kmap() in ASSERT()s
[BUG]
Syzbot reported an ASSERT() triggered inside scrub:
BTRFS info (device loop0): scrub: started on devid 1
assertion failed: !folio_test_partial_kmap(folio) :: 0, in fs/btrfs/scrub.c:697
------------[ cut here ]------------
kernel BUG at fs/btrfs/scrub.c:697!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6077 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:scrub_stripe_get_kaddr+0x1bb/0x1c0 fs/btrfs/scrub.c:697
Call Trace:
<TASK>
scrub_bio_add_sector fs/btrfs/scrub.c:932 [inline]
scrub_submit_initial_read+0xf21/0x1120 fs/btrfs/scrub.c:1897
submit_initial_group_read+0x423/0x5b0 fs/btrfs/scrub.c:1952
flush_scrub_stripes+0x18f/0x1150 fs/btrfs/scrub.c:1973
scrub_stripe+0xbea/0x2a30 fs/btrfs/scrub.c:2516
scrub_chunk+0x2a3/0x430 fs/btrfs/scrub.c:2575
scrub_enumerate_chunks+0xa70/0x1350 fs/btrfs/scrub.c:2839
btrfs_scrub_dev+0x6e7/0x10e0 fs/btrfs/scrub.c:3153
btrfs_ioctl_scrub+0x249/0x4b0 fs/btrfs/ioctl.c:3163
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
---[ end trace 0000000000000000 ]---
Which doesn't make much sense, as all the folios we allocated for scrub
should not be highmem.
[CAUSE]
Thankfully syzbot has a detailed kernel config file, showing that
CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is set to y.
And that debug option will force all folio_test_partial_kmap() to return
true, to improve coverage on highmem tests.
But in our case we really just want to make sure the folios we allocated
are not highmem (and they are indeed not). Such incorrect result from
folio_test_partial_kmap() is just screwing up everything.
[FIX]
Replace folio_test_partial_kmap() to folio_test_highmem() so that we
won't bother those highmem specific debuging options.
Fixes: 5fbaae4b8567 ("btrfs: prepare scrub to support bs > ps cases") Reported-by: syzbot+bde59221318c592e6346@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# mount
[...]
/dev/sdd on /data2 type btrfs (rw,nosuid,nodev,relatime,ssd,space_cache=v2,subvolid=5,subvol=/)
[CAUSE]
The 'ssd' mount option is set by set_device_specific_options(), and it
expects that if there is any rotating device in the btrfs, it will set
fs_devices::rotating.
However after commit bddf57a70781 ("btrfs: delay btrfs_open_devices()
until super block is created"), the device opening is delayed until the
super block is created.
But the timing of set_device_specific_options() is still left as is,
this makes the function be called without any device opened.
Since no device is opened, thus fs_devices::rotating will never be set,
making btrfs incorrectly set 'ssd' mount option.
[FIX]
Only call set_device_specific_options() after btrfs_open_devices().
Also only call set_device_specific_options() after a new mount, if we're
mounting a mounted btrfs, there is no need to set the device specific
mount options again.
Reported-by: HAN Yuwei <hrx@bupt.moe> Link: https://lore.kernel.org/linux-btrfs/C8FF75669DFFC3C5+5f93bf8a-80a0-48a6-81bf-4ec890abc99a@bupt.moe/ Fixes: bddf57a70781 ("btrfs: delay btrfs_open_devices() until super block is created") CC: stable@vger.kernel.org # 6.17 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: fix memory leak on duplicated memory in the qgroup assign ioctl
On 'btrfs_ioctl_qgroup_assign' we first duplicate the argument as
provided by the user, which is kfree'd in the end. But this was not the
case when allocating memory for 'prealloc'. In this case, if it somehow
failed, then the previous code would go directly into calling
'mnt_drop_write_file', without freeing the string duplicated from the
user space.
Filipe Manana [Wed, 24 Sep 2025 15:10:38 +0000 (16:10 +0100)]
btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running
When starting relocation, at reloc_chunk_start(), if we happen to find
the flag BTRFS_FS_RELOC_RUNNING is already set we return an error
(-EINPROGRESS) to the callers, however the callers call reloc_chunk_end()
which will clear the flag BTRFS_FS_RELOC_RUNNING, which is wrong since
relocation was started by another task and still running.
Finding the BTRFS_FS_RELOC_RUNNING flag already set is an unexpected
scenario, but still our current behaviour is not correct.
Fix this by never calling reloc_chunk_end() if reloc_chunk_start() has
returned an error, which is what logically makes sense, since the general
widespread pattern is to have end functions called only if the counterpart
start functions succeeded. This requires changing reloc_chunk_start() to
clear BTRFS_FS_RELOC_RUNNING if there's a pending cancel request.
Fixes: 907d2710d727 ("btrfs: add cancellable chunk relocation support") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
Currently NETDEV_UNREGISTER event handler is not calling
j1939_cancel_active_session() and j1939_sk_queue_drop_all().
This will result in these calls being skipped when j1939_sk_release() is
called. And I guess that the reason syzbot is still reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
is caused by lack of these calls.
Calling j1939_cancel_active_session(priv, sk) from j1939_sk_release() can
be covered by calling j1939_cancel_active_session(priv, NULL) from
j1939_netdev_notify().
Calling j1939_sk_queue_drop_all() from j1939_sk_release() can be covered
by calling j1939_sk_netdev_event_netdown() from j1939_netdev_notify().
Therefore, we can reuse j1939_cancel_active_session(priv, NULL) and
j1939_sk_netdev_event_netdown(priv) for NETDEV_UNREGISTER event handler.
Marc Kleine-Budde [Mon, 13 Oct 2025 19:26:08 +0000 (21:26 +0200)]
Merge patch series "can: add Transmitter Delay Compensation (TDC) documentation"
Vincent Mailhol <mailhol@kernel.org> says:
TDC was added to the kernel in 2021 but I never took time to update the
documentation. The year is now 2025... As we say: "better late than never"!
The first patch is a small clean up which fixes an incorrect statement
concerning the CAN DLC, the second patch is the real thing and adds the
documentation of how to use the ip tool to configure the TDC.
Vincent Mailhol [Mon, 13 Oct 2025 10:10:22 +0000 (19:10 +0900)]
can: remove false statement about 1:1 mapping between DLC and length
The CAN-FD section of can.rst still states that there is a 1:1 mapping
between the Classical CAN DLC and its length. This is only true for
the DLC values up to 8. Beyond that point, the length remains at 8.
For reference, the mapping between the CAN DLC and the length is given
in below table [1]:
Damien Le Moal [Thu, 9 Oct 2025 10:46:00 +0000 (19:46 +0900)]
ata: libata-core: relax checks in ata_read_log_directory()
Commit 6d4405b16d37 ("ata: libata-core: Cache the general purpose log
directory") introduced caching of a device general purpose log directory
to avoid repeated access to this log page during device scan. This
change also added a check on this log page to verify that the log page
version is 0x0001 as mandated by the ACS specifications.
And it turns out that some devices do not bother reporting this version,
instead reporting a version 0, resulting in error messages such as:
ata6.00: Invalid log directory version 0x0000
and to the device being marked as not supporting the general purpose log
directory log page.
Since before commit 6d4405b16d37 the log page version check did not
exist and things were still working correctly for these devices, relax
ata_read_log_directory() version check and only warn about the invalid
log page version number without disabling access to the log directory
page.
Fixes: 6d4405b16d37 ("ata: libata-core: Cache the general purpose log directory") Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220635 Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Markus Elfring [Wed, 1 Oct 2025 19:09:00 +0000 (21:09 +0200)]
smb: server: Use common error handling code in smb_direct_rdma_xmit()
Add two jump targets so that a bit of exception handling can be better
reused at the end of this function implementation.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Nicolas Dichtel [Fri, 10 Oct 2025 14:18:59 +0000 (16:18 +0200)]
doc: fix seg6_flowlabel path
This sysctl is not per interface; it's global per netns.
Fixes: 292ecd9f5a94 ("doc: move seg6_flowlabel to seg6-sysctl.rst") Reported-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 12 Oct 2025 20:27:56 +0000 (13:27 -0700)]
Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"One revert because of a regression in the I2C core which has sadly not
showed up during its time in -next"
* tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: boardinfo: Annotate code used in init phase only"
Convert remaining __init__ files similar to what we did in
commit b615879dbfea ("selftests: drv-net: make linters happy with our imports")
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
There is no error handling for `dma_map_single()` failures.
Add error handling by checking `dma_mapping_error()` and freeing
the `skb` using `dev_kfree_skb()` (process context) when it fails.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Tested-on: D-Link DGE-550T Rev-A3 Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Rex Lu [Thu, 9 Oct 2025 06:29:34 +0000 (08:29 +0200)]
net: mtk: wed: add dma mask limitation and GFP_DMA32 for device with more than 4GB DRAM
Limit tx/rx buffer address to 32-bit address space for board with more
than 4GB DRAM.
Fixes: 804775dfc2885 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Fixes: 6757d345dd7db ("net: ethernet: mtk_wed: introduce hw_rro support for MT7988") Tested-by: Daniel Pawlik <pawlik.dan@gmail.com> Tested-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Rex Lu <rex.lu@mediatek.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: usb: lan78xx: Fix lost EEPROM write timeout error(-ETIMEDOUT) in lan78xx_write_raw_eeprom
The function lan78xx_write_raw_eeprom failed to properly propagate EEPROM
write timeout errors (-ETIMEDOUT). In the timeout fallthrough path, it first
attempted to restore the pin configuration for LED outputs and then
returned only the status of that restore operation, discarding the
original timeout error saved in ret.
As a result, callers could mistakenly treat EEPROM write operation as
successful even though the EEPROM write had actually timed out with no
or partial data write.
To fix this, handle errors in restoring the LED pin configuration separately.
If the restore succeeds, return any prior EEPROM write timeout error saved
in ret to the caller.
Suggested-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: 8b1b2ca83b20 ("net: usb: lan78xx: Improve error handling in EEPROM and OTP operations")
cc: stable@vger.kernel.org Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 12 Oct 2025 15:45:52 +0000 (08:45 -0700)]
Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Skip interrupt ID 0 in sifive-plic during suspend/resume because
ID 0 is reserved and accessing reserved register space could result
in undefined behavior
- Fix a function's retval check in aspeed-scu-ic
* tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume
irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check
Linus Torvalds [Sat, 11 Oct 2025 23:06:04 +0000 (16:06 -0700)]
Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"The previous fix to trace_marker required updating trace_marker_raw as
well. The difference between trace_marker_raw from trace_marker is
that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings. This
is for applications that will read the raw data from the ring buffer
and get the data structures directly. It's a bit quicker than using
the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was
needed for both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly
fixed the trace_marker write function, but the trace_marker_raw
file wasn't fixed properly. The user space data was correctly
written into the per CPU buffer, but the code that wrote into the
ring buffer still used the user space pointer and not the per CPU
buffer that had the user space data already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(),
another issue appeared. As writes to the trace_marker_raw expects
binary data, the first entry is a 4 byte identifier. The entry
structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the
cnt used also includes the entry->id portion. Allocating
sizeof(*entry) plus cnt is actually allocating 4 bytes more than
what is needed.
* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
Linus Torvalds [Sat, 11 Oct 2025 22:47:12 +0000 (15:47 -0700)]
Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Fix UAPI types check in headers_check.pl
- Only enable -Werror for hostprogs with CONFIG_WERROR / W=e
- Ignore fsync() error when output of gen_init_cpio is a pipe
- Several little build fixes for recent modules.builtin.modinfo series
* tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections
kbuild: Add '.rel.*' strip pattern for vmlinux
kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux
gen_init_cpio: Ignore fsync() returning EINVAL on pipes
scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs
kbuild: uapi: Strip comments before size type check
Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Sat, 11 Oct 2025 18:56:47 +0000 (11:56 -0700)]
Merge tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"This cycle, we have a new RTC driver, for the SpacemiT P1. The optee
driver gets alarm support. We also get a fix for a race condition that
was fairly rare unless while stress testing the alarms.
Subsystem:
- Fix race when setting alarm
- Ensure alarm irq is enabled when UIE is enabled
- remove unneeded 'fast_io' parameter in regmap_config
New driver:
- SpacemiT P1 RTC
Drivers:
- efi: Remove wakeup functionality
- optee: add alarms support
- s3c: Drop support for S3C2410
- zynqmp: Restore alarm functionality after kexec transition"
* tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
rtc: interface: Ensure alarm irq is enabled when UIE is enabled
rtc: tps6586x: Fix initial enable_irq/disable_irq balance
rtc: cpcap: Fix initial enable_irq/disable_irq balance
rtc: isl12022: Fix initial enable_irq/disable_irq balance
rtc: interface: Fix long-standing race when setting alarm
rtc: pcf2127: fix watchdog interrupt mask on pcf2131
rtc: zynqmp: Restore alarm functionality after kexec transition
rtc: amlogic-a4: Optimize global variables
rtc: sd2405al: Add I2C address.
rtc: Kconfig: move symbols to proper section
rtc: optee: make optee_rtc_pm_ops static
rtc: optee: Fix error code in optee_rtc_read_alarm()
rtc: optee: fix error code in probe()
dt-bindings: rtc: Convert apm,xgene-rtc to DT schema
rtc: spacemit: support the SpacemiT P1 RTC
rtc: optee: add alarm related rtc ops to optee rtc driver
rtc: optee: remove unnecessary memory operations
rtc: optee: fix memory leak on driver removal
rtc: x1205: Fix Xicor X1205 vendor prefix
dt-bindings: rtc: Fix Xicor X1205 vendor prefix
...
Linus Torvalds [Sat, 11 Oct 2025 18:49:00 +0000 (11:49 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
before or during the merge window.
The most important one is the qla2xxx which reverts a conversion to
fix flexible array member warnings, that went up in this merge window
but which turned out on further testing to be causing data corruption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS
scsi: ufs: sysfs: Make HID attributes visible
scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
scsi: ufs: core: Fix PM QoS mutex initialization
scsi: ufs: core: Fix runtime suspend error deadlock
Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
scsi: target: target_core_configfs: Add length check to avoid buffer overflow
Linus Torvalds [Sat, 11 Oct 2025 18:19:16 +0000 (11:19 -0700)]
Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:
- Remove a bunch of asm implementing condition flags testing in KVM's
emulator in favor of int3_emulate_jcc() which is written in C
- Replace KVM fastops with C-based stubs which avoids problems with the
fastop infra related to latter not adhering to the C ABI due to their
special calling convention and, more importantly, bypassing compiler
control-flow integrity checking because they're written in asm
- Remove wrongly used static branches and other ugliness accumulated
over time in hyperv's hypercall implementation with a proper static
function call to the correct hypervisor call variant
- Add some fixes and modifications to allow running FRED-enabled
kernels in KVM even on non-FRED hardware
- Add kCFI improvements like validating indirect calls and prepare for
enabling kCFI with GCC. Add cmdline params documentation and other
code cleanups
- Use the single-byte 0xd6 insn as the official #UD single-byte
undefined opcode instruction as agreed upon by both x86 vendors
- Other smaller cleanups and touchups all over the place
* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86,retpoline: Optimize patch_retpoline()
x86,ibt: Use UDB instead of 0xEA
x86/cfi: Remove __noinitretpoline and __noretpoline
x86/cfi: Add "debug" option to "cfi=" bootparam
x86/cfi: Standardize on common "CFI:" prefix for CFI reports
x86/cfi: Document the "cfi=" bootparam options
x86/traps: Clarify KCFI instruction layout
compiler_types.h: Move __nocfi out of compiler-specific header
objtool: Validate kCFI calls
x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
x86/fred: Install system vector handlers even if FRED isn't fully enabled
x86/hyperv: Use direct call to hypercall-page
x86/hyperv: Clean up hv_do_hypercall()
KVM: x86: Remove fastops
KVM: x86: Convert em_salc() to C
KVM: x86: Introduce EM_ASM_3WCL
KVM: x86: Introduce EM_ASM_1SRC2
KVM: x86: Introduce EM_ASM_2CL
KVM: x86: Introduce EM_ASM_2W
...
Linus Torvalds [Sat, 11 Oct 2025 17:51:14 +0000 (10:51 -0700)]
Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Simplify inline asm flag output operands now that the minimum
compiler version supports the =@ccCOND syntax
- Remove a bunch of AS_* Kconfig symbols which detect assembler support
for various instruction mnemonics now that the minimum assembler
version supports them all
- The usual cleanups all over the place
* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
x86/mtrr: Remove license boilerplate text with bad FSF address
x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
x86/entry/fred: Push __KERNEL_CS directly
x86/kconfig: Remove CONFIG_AS_AVX512
crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
crypto: X86 - Remove CONFIG_AS_VAES
crypto: x86 - Remove CONFIG_AS_GFNI
x86/kconfig: Drop unused and needless config X86_64_SMP
Linus Torvalds [Sat, 11 Oct 2025 17:40:24 +0000 (10:40 -0700)]
Merge tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"A NULL pointer deref hotfix"
* tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: fix barn NULL pointer dereference on memoryless nodes
- Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
Borkmann)
- Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)
- Use correct context to unpin bpf hash map with special types (KaFai
Wan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for unpinning htab with internal timer struct
bpf: Avoid RCU context warning when unpinning htab with internal structs
xsk: Harden userspace-supplied xdp_desc validation
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
bpf: Finish constification of 1st parameter of bpf_d_path()
Linus Torvalds [Sat, 11 Oct 2025 17:27:52 +0000 (10:27 -0700)]
Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more updates from Andrew Morton:
"Just one series here - Mike Rappoport has taught KEXEC handover to
preserve vmalloc allocations across handover"
* tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
kho: add support for preserving vmalloc allocations
kho: replace kho_preserve_phys() with kho_preserve_pages()
kho: check if kho is finalized in __kho_preserve_order()
MAINTAINERS, .mailmap: update Umang's email address
Linus Torvalds [Sat, 11 Oct 2025 17:14:55 +0000 (10:14 -0700)]
Merge tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 hotfixes. All 7 are cc:stable and all 7 are for MM.
All singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: hugetlb: avoid soft lockup when mprotect to large memory area
fsnotify: pass correct offset to fsnotify_mmap_perm()
mm/ksm: fix flag-dropping behavior in ksm_madvise
mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
memcg: skip cgroup_file_notify if spinning is not allowed
This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.
The size allocated on the ring buffer was actually a bit too big:
size = sizeof(*entry) + cnt;
But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.
Vlastimil Babka [Sat, 11 Oct 2025 08:45:41 +0000 (10:45 +0200)]
slab: fix barn NULL pointer dereference on memoryless nodes
Phil reported a boot failure once sheaves become used in commits 59faa4da7cd4 ("maple_tree: use percpu sheaves for maple_node_cache") and 3accabda4da1 ("mm, vma: use percpu sheaves for vm_area_struct cache"):
Linus decoded the stacktrace to get_barn() and get_node() and determined
that kmem_cache->node[numa_mem_id()] is NULL.
The problem is due to a wrong assumption that memoryless nodes only
exist on systems with CONFIG_HAVE_MEMORYLESS_NODES, where numa_mem_id()
points to the nearest node that has memory. SLUB has been allocating its
kmem_cache_node structures only on nodes with memory and so it does with
struct node_barn.
For kmem_cache_node, get_partial_node() checks if get_node() result is
not NULL, which I assumed was for protection from a bogus node id passed
to kmalloc_node() but apparently it's also for systems where
numa_mem_id() (used when no specific node is given) might return a
memoryless node.
Fix the sheaves code the same way by checking the result of get_node()
and bailing out if it's NULL. Note that cpus on such memoryless nodes
will have degraded sheaves performance, which can be improved later,
preferably by making numa_mem_id() work properly on such systems.
Steven Rostedt [Sat, 11 Oct 2025 03:51:42 +0000 (23:51 -0400)]
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests to
the trace_maker_raw file. The trace_maker_raw file is used by applications
that writes data structures and not strings into the file, and the tools
read the raw ring buffer to process the structures it writes.
The fix that reads the per CPU buffers passes the new per CPU buffer to
the trace_marker file writes, but the update to the trace_marker_raw write
read the data from user space into the per CPU buffer, but then still used
then passed the user space address to the function that records the data.
Pass in the per CPU buffer and not the user space address.
TODO: Add a test to better test trace_marker_raw.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011035243.386098147@kernel.org Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Nathan Chancellor [Fri, 10 Oct 2025 21:49:27 +0000 (14:49 -0700)]
kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
After commit 5ab23c7923a1 ("modpost: Create modalias for builtin
modules"), relocatable RISC-V kernels with CONFIG_KASAN=y start failing
when attempting to strip the module device table symbols:
riscv64-linux-objcopy: not stripping symbol `__mod_device_table__kmod_irq_starfive_jh8100_intc__of__starfive_intc_irqchip_match_table' because it is named in a relocation
make[4]: *** [scripts/Makefile.vmlinux:97: vmlinux] Error 1
The relocation appears to come from .LASANLOC5 in .data.rel.local:
This section appears to come from GCC for including additional
information about global variables that may be protected by KASAN.
There appears to be no way to opt out of the generation of these symbols
through either a flag or attribute. Attempting to remove '.LASANLOC*'
with '--strip-symbol' results in the same error as above because these
symbols may refer to (thus have relocation between) each other.
Avoid this build breakage by switching to '--strip-unneeded-symbol' for
removing __mod_device_table__ symbols, as it will only remove the symbol
when there is no relocation pointing to it. While this may result in a
little more bloat in the symbol table in certain configurations, it is
not as bad as outright build failures.
Fixes: 5ab23c7923a1 ("modpost: Create modalias for builtin modules") Reported-by: Charles Mirabile <cmirabil@redhat.com> Closes: https://lore.kernel.org/20251007011637.2512413-1-cmirabil@redhat.com/ Suggested-by: Alexey Gladkov <legion@kernel.org> Tested-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Linus Torvalds [Fri, 10 Oct 2025 21:06:02 +0000 (14:06 -0700)]
Merge tag 'for-6.18/hpfs-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull hpfs updates from Mikulas Patocka:
- Avoid -Wflex-array-member-not-at-end warnings
- Replace simple_strtoul with kstrtoint
- Fix error code for new_inode() failure
* tag 'for-6.18/hpfs-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
fs/hpfs: Fix error code for new_inode() failure in mkdir/create/mknod/symlink
hpfs: Replace simple_strtoul with kstrtoint in hpfs_parse_param
fs: hpfs: Avoid multiple -Wflex-array-member-not-at-end warnings
amdkfd:
- Fix kfd process ref leak
- mmap write lock handling fix
- Fix comments in IOCTL
xe:
- Fix build with clang 16
- Fix handling of invalid configfs syntax usage and spell out the
expected syntax in the documentation
- Do not try late bind firmware when running as VF since it shouldn't
handle firmware loading
- Fix idle assertion for local BOs
- Fix uninitialized variable for late binding
- Do not require perfmon_capable to expose free memory at page
granularity. Handle it like other drm drivers do
- Fix lock handling on suspend error path
- Fix I2C controller resume after S3
v3d:
- fix fence locking"
* tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/amd/display: Incorrect Mirror Cositing
drm/amd/display: Enable Dynamic DTBCLK Switch
drm/amdgpu: Report individual reset error
drm/amdgpu: partially revert "revert to old status lock handling v3"
drm/amd/display: Fix unsafe uses of kernel mode FPU
drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression
drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
drm/amdgpu: Check swus/ds for switch state save
drm/amdkfd: Fix two comments in kfd_ioctl.h
drm/amd/pm: Avoid interface mismatch messaging
drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init
drm/amd/amdgpu: Fix the mes version that support inv_tlbs
drm/amd: Check whether secure display TA loaded successfully
drm/amdkfd: Fix mmap write lock not release
drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.
drm/amd/display: Disable scaling on DCE6 for now
drm/amd/display: Properly disable scaling on DCE6
drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6
drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs
...
Linus Torvalds [Fri, 10 Oct 2025 20:59:38 +0000 (13:59 -0700)]
Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Some fixes leftover from our fixes branch, just nouveau and vmwgfx:
nouveau:
- Return errno code from TTM move helper
vmwgfx:
- Fix null-ptr access in cursor code
- Fix UAF in validation
- Use correct iterator in validation"
* tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel:
drm/nouveau: fix bad ret code in nouveau_bo_move_prep
drm/vmwgfx: Fix copy-paste typo in validation
drm/vmwgfx: Fix Use-after-free in validation
drm/vmwgfx: Fix a null-ptr access in the cursor snooper