Breno Leitao [Fri, 20 Jun 2025 18:48:55 +0000 (11:48 -0700)]
net: netpoll: Initialize UDP checksum field before checksumming
commit f1fce08e63fe ("netpoll: Eliminate redundant assignment") removed
the initialization of the UDP checksum, which was wrong and broke
netpoll IPv6 transmission due to bad checksumming.
udph->check needs to be set before calling csum_ipv6_magic().
Arnd Bergmann [Fri, 20 Jun 2025 13:09:53 +0000 (15:09 +0200)]
net: qed: reduce stack usage for TLV processing
clang gets a bit confused by the code in the qed_mfw_process_tlv_req and
ends up spilling registers to the stack hundreds of times. When sanitizers
are enabled, this can end up blowing the stack warning limit:
Apparently the problem is the complexity of qed_mfw_update_tlvs()
after inlining, and marking the four main branches of that function
as noinline_for_stack makes this problem completely go away, the stack
usage goes down to 100 bytes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Jun 2025 14:28:44 +0000 (14:28 +0000)]
atm: clip: prevent NULL deref in clip_push()
Blamed commit missed that vcc_destroy_socket() calls
clip_push() with a NULL skb.
If clip_devs is NULL, clip_push() then crashes when reading
skb->truesize.
Fixes: 93a2014afbac ("atm: fix a UAF in lec_arp_clear_vccs()") Reported-by: syzbot+1316233c4c6803382a8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68556f59.a00a0220.137b3.004e.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Gengming Liu <l.dmxcsnsbh@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Faisal Bukhari [Sat, 21 Jun 2025 10:32:04 +0000 (16:02 +0530)]
Fix typo in marvell octeontx2 documentation
Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
Fixes a spelling mistake: "funcionality" → "functionality".
Signed-off-by: Faisal Bukhari <faisalbukhari523@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Long Li [Wed, 18 Jun 2025 01:36:46 +0000 (18:36 -0700)]
net: mana: Record doorbell physical address in PF mode
MANA supports RDMA in PF mode. The driver should record the doorbell
physical address when in PF mode.
The doorbell physical address is used by the RDMA driver to map
doorbell pages of the device to user-mode applications through RDMA
verbs interface. In the past, they have been mapped to user-mode while
the device is in VF mode. With the support for PF mode implemented,
also expose those pages in PF mode.
Support for PF mode is implemented in 290e5d3c49f6 ("net: mana: Add support for Multi Vports on Bare metal")
Linus Torvalds [Thu, 19 Jun 2025 17:21:32 +0000 (10:21 -0700)]
Merge tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless.
The ath12k fix to avoid FW crashes requires adding support for a
number of new FW commands so it's quite large in terms of LoC. The
rest is relatively small.
Current release - fix to a fix:
- ptp: fix breakage after ptp_vclock_in_use() rework
Current release - regressions:
- openvswitch: allocate struct ovs_pcpu_storage dynamically, static
allocation may exhaust module loader limit on smaller systems
Previous releases - regressions:
- tcp: fix tcp_packet_delayed() for peers with no selective ACK
support
Previous releases - always broken:
- wifi: ath12k: don't activate more links than firmware supports
- tcp: make sure sockets open via passive TFO have valid NAPI ID
- eth: bnxt_en: update MRU and RSS table of RSS contexts on queue
reset, prevent Rx queues from silently hanging after queue reset
- NFC: uart: set tty->disc_data only in success path"
* tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
net: airoha: Compute number of descriptors according to reserved memory size
tools: ynl: fix mixing ops and notifications on one socket
net: atm: fix /proc/net/atm/lec handling
net: atm: add lec_mutex
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
net: airoha: Always check return value from airoha_ppe_foe_get_entry()
NFC: nci: uart: Set tty->disc_data only in success path
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
eth: fbnic: avoid double free when failing to DMA-map FW msg
tcp: fix passive TFO socket having invalid NAPI ID
selftests: net: add test for passive TFO socket NAPI ID
selftests: net: add passive TFO test binary
selftests: netdevsim: improve lib.sh include in peer.sh
tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer
Octeontx2-pf: Fix Backpresure configuration
net: ftgmac100: select FIXED_PHY
net: ethtool: remove duplicate defines for family info
...
Lorenzo Bianconi [Thu, 19 Jun 2025 07:07:25 +0000 (09:07 +0200)]
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
EN7581 SoC allows configuring the size and the number of buffers in
hwfd payload queue for both QDMA0 and QDMA1.
In order to reduce the required DRAM used for hwfd buffers queues and
decrease the memory footprint, differentiate hwfd buffer size for QDMA0
and QDMA1 and reduce hwfd buffer size to 1KB for QDMA1 (WAN) while
maintaining 2KB for QDMA0 (LAN).
Lorenzo Bianconi [Thu, 19 Jun 2025 07:07:24 +0000 (09:07 +0200)]
net: airoha: Compute number of descriptors according to reserved memory size
In order to not exceed the reserved memory size for hwfd buffers,
compute the number of hwfd buffers/descriptors according to the
reserved memory size and the size of each hwfd buffer (2KB).
Fixes: 3a1ce9e3d01b ("net: airoha: Add the capability to allocate hwfd buffers via reserved-memory") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-1-49600a9b319a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 19 Jun 2025 15:38:40 +0000 (08:38 -0700)]
Merge tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
More fixes:
- ath12k
- avoid busy-waiting
- activate correct number of links
- iwlwifi
- iwldvm regression (lots of warnings)
- iwlmld merge damage regression (crash)
- fix build with some old gcc versions
- carl9170: don't talk to device w/o FW [syzbot]
- ath6kl: remove bad FW WARN [syzbot]
- ieee80211: use variable-length arrays [syzbot]
- mac80211
- remove WARN on delayed beacon update [syzbot]
- drop OCB frames with invalid source [syzbot]
* tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking
wifi: iwlwifi: dvm: restore n_no_reclaim_cmds setting
wifi: iwlwifi: cfg: Limit cb_size to valid range
wifi: iwlwifi: restore missing initialization of async_handlers_list (again)
wifi: ath6kl: remove WARN on bad firmware input
wifi: carl9170: do not ping device which has failed to load firmware
wifi: ath12k: don't wait when there is no vdev started
wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()
wifi: ath12k: avoid burning CPU while waiting for firmware stats
wifi: ath12k: fix documentation on firmware stats
wifi: ath12k: don't activate more links than firmware supports
wifi: ath12k: update link active in case two links fall on the same MAC
wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command
wifi: ath12k: update freq range for each hardware mode
wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event
wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use
wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT
wifi: mac80211: don't WARN for late channel/color switch
wifi: mac80211: drop invalid source address OCB frames
wifi: remove zero-length arrays
====================
Jakub Kicinski [Wed, 18 Jun 2025 17:17:46 +0000 (10:17 -0700)]
tools: ynl: fix mixing ops and notifications on one socket
The multi message support loosened the connection between the request
and response handling, as we can now submit multiple requests before
we start processing responses. Passing the attr set to NlMsgs decoding
no longer makes sense (if it ever did), attr set may differ message
by messsage. Isolate the part of decoding responsible for attr-set
specific interpretation and call it once we identified the correct op.
Without this fix performing SET operation on an ethtool socket, while
being subscribed to notifications causes:
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1096, in _op
# Exception| return self._ops(ops)[0]
# Exception| ~~~~~~~~~^^^^^
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1040, in _ops
# Exception| nms = NlMsgs(reply, attr_space=op.attr_set)
# Exception| ^^^^^^^^^^^
The value of op we use on line 1040 is stale, it comes form the previous
loop. If a notification comes before a response we will update op to None
and the next iteration thru the loop will break with the trace above.
Fixes: 6fda63c45fe8 ("tools/net/ynl: fix cli.py --subscribe feature") Fixes: ba8be00f68f5 ("tools/net/ynl: Add multi message support to ynl") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250618171746.1201403-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 18 Jun 2025 14:08:43 +0000 (14:08 +0000)]
net: atm: add lec_mutex
syzbot found its way in net/atm/lec.c, and found an error path
in lecd_attach() could leave a dangling pointer in dev_lec[].
Add a mutex to protect dev_lecp[] uses from lecd_attach(),
lec_vcc_attach() and lec_mcast_attach().
Following patch will use this mutex for /proc/net/atm/lec.
BUG: KASAN: slab-use-after-free in lecd_attach net/atm/lec.c:751 [inline]
BUG: KASAN: slab-use-after-free in lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
Read of size 8 at addr ffff88807c7b8e68 by task syz.1.17/6142
David Thompson [Wed, 18 Jun 2025 13:59:02 +0000 (13:59 +0000)]
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
The message "Error getting PHY irq. Use polling instead"
is emitted when the mlxbf_gige driver is loaded by the
kernel before the associated gpio-mlxbf driver, and thus
the call to get the PHY IRQ fails since it is not yet
available. The driver probe() must return -EPROBE_DEFER
if acpi_dev_gpio_irq_get_by() returns the same.
Fixes: 6c2a6ddca763 ("net: mellanox: mlxbf_gige: Replace non-standard interrupt handling") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618135902.346-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krzysztof Kozlowski [Wed, 18 Jun 2025 07:36:50 +0000 (09:36 +0200)]
NFC: nci: uart: Set tty->disc_data only in success path
Setting tty->disc_data before opening the NCI device means we need to
clean it up on error paths. This also opens some short window if device
starts sending data, even before NCIUARTSETDRIVER IOCTL succeeded
(broken hardware?). Close the window by exposing tty->disc_data only on
the success path, when opening of the NCI device and try_module_get()
succeeds.
The code differs in error path in one aspect: tty->disc_data won't be
ever assigned thus NULL-ified. This however should not be relevant
difference, because of "tty->disc_data=NULL" in nci_uart_tty_open().
Kuniyuki Iwashima [Tue, 17 Jun 2025 22:40:42 +0000 (15:40 -0700)]
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
syzkaller reported a null-ptr-deref in sock_omalloc() while allocating
a CALIPSO option. [0]
The NULL is of struct sock, which was fetched by sk_to_full_sk() in
calipso_req_setattr().
Since commit a1a5344ddbe8 ("tcp: avoid two atomic ops for syncookies"),
reqsk->rsk_listener could be NULL when SYN Cookie is returned to its
client, as hinted by the leading SYN Cookie log.
Here are 3 options to fix the bug:
1) Return 0 in calipso_req_setattr()
2) Return an error in calipso_req_setattr()
3) Alaways set rsk_listener
1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie
for CALIPSO. 3) is also no go as there have been many efforts to reduce
atomic ops and make TCP robust against DDoS. See also commit 3b24d854cb35
("tcp/dccp: do not touch listener sk_refcnt under synflood").
As of the blamed commit, SYN Cookie already did not need refcounting,
and no one has stumbled on the bug for 9 years, so no CALIPSO user will
care about SYN Cookie.
Let's return an error in calipso_req_setattr() and calipso_req_delattr()
in the SYN Cookie case.
This can be reproduced by [1] on Fedora and now connect() of nc times out.
Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Mon, 16 Jun 2025 22:44:37 +0000 (15:44 -0700)]
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
Brett Creeley is taking ownership of AMD/Pensando drivers while I wander
off into the sunset with my retirement this month. I'll still keep an
eye out on a few topics for awhile, and maybe do some free-lance work in
the future.
Meanwhile, thank you all for the fun and support and the many learning
opportunities :-).
Special thanks go to DaveM for merging my first patch long ago, the big
ionic patchset a few years ago, and my last patchset last week.
Alexey Kodanev [Mon, 16 Jun 2025 11:37:43 +0000 (11:37 +0000)]
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
Before calling lan743x_ptp_io_event_clock_get(), the 'channel' value
is checked against the maximum value of PCI11X1X_PTP_IO_MAX_CHANNELS(8).
This seems correct and aligns with the PTP interrupt status register
(PTP_INT_STS) specifications.
However, lan743x_ptp_io_event_clock_get() writes to ptp->extts[] with
only LAN743X_PTP_N_EXTTS(4) elements, using channel as an index:
====================
net: fix passive TFO socket having invalid NAPI ID
Found a bug where an accepted passive TFO socket returns an invalid NAPI
ID (i.e. 0) from SO_INCOMING_NAPI_ID. Add a selftest for this using
netdevsim and fix the bug.
Patch 1 is a drive-by fix for the lib.sh include in an existing
drivers/net/netdevsim/peer.sh selftest.
Patch 2 adds a test binary for a simple TFO server/client.
Patch 3 adds a selftest for checking that the NAPI ID of a passive TFO
socket is valid. This will currently fail.
Patch 4 adds a fix for the bug.
====================
David Wei [Tue, 17 Jun 2025 21:21:02 +0000 (14:21 -0700)]
tcp: fix passive TFO socket having invalid NAPI ID
There is a bug with passive TFO sockets returning an invalid NAPI ID 0
from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy
receive relies on a correct NAPI ID to process sockets on the right
queue.
Fix by adding a sk_mark_napi_id_set().
Fixes: e5907459ce7e ("tcp: Record Rx hash and NAPI ID in tcp_child_process") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 Jun 2025 00:47:27 +0000 (17:47 -0700)]
Merge tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix alternate data streams bug
- Important fix for null pointer deref with Kerberos authentication
- Fix oops in smbdirect (RDMA) in free_transport
* tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: handle set/get info file for streamed file
ksmbd: fix null pointer dereference in destroy_previous_session
ksmbd: add free_transport ops in ksmbd connection
Linus Torvalds [Wed, 18 Jun 2025 21:31:16 +0000 (14:31 -0700)]
Merge tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Fix a race condition in Devres::drop(). This depends on two other
patches:
- (Minimal) Rust abstractions for struct completion
- Let Revocable indicate whether its data is already being revoked
- Fix Devres to avoid exposing the internal Revocable
- Add .mailmap entry for Danilo Krummrich
- Add Madhavan Srinivasan to embargoed-hardware-issues.rst
* tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Documentation: embargoed-hardware-issues.rst: Add myself for Power
mailmap: add entry for Danilo Krummrich
rust: devres: do not dereference to the internal Revocable
rust: devres: fix race in Devres::drop()
rust: revocable: indicate whether `data` has been revoked already
rust: completion: implement initial abstraction
Linus Torvalds [Wed, 18 Jun 2025 21:25:50 +0000 (14:25 -0700)]
Merge tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
- In cgroup1 freezer, a task migrating into a frozen cgroup might not
get frozen immediately due to the wrong operation order. Fix it.
* tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup,freezer: fix incomplete freezing when attaching tasks
Linus Torvalds [Wed, 18 Jun 2025 21:22:31 +0000 (14:22 -0700)]
Merge tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
- Fix missed early init of wq_isolated_cpumask
* tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
Each link supports multiple channels for example RPM link supports
16 channels. In case of link oversubsribe, NIX will assert backpressure
on receive channels.
The previous patch considered a single channel per link, resulting in
backpressure not being enabled on the remaining channels
Fixes: a7ef63dbd588 ("octeontx2-af: Disable backpressure between CPT and NIX") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250617063403.3582210-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 18 Jun 2025 21:17:15 +0000 (14:17 -0700)]
Merge tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix a couple bugs in cgroup cpu.weight support
- Add the new sched-ext@lists.linux.dev to MAINTAINERS
* tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()
sched_ext: Make scx_group_set_weight() always update tg->scx.weight
sched_ext: Update mailing list entry in MAINTAINERS
Jakub Kicinski [Wed, 18 Jun 2025 21:15:15 +0000 (14:15 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-06-17 (ice, e1000e)
For ice:
Krishna Kumar modifies aRFS match criteria to correctly identify
matching filters.
Grzegorz fixes a memory leak in eswitch legacy mode.
For e1000e:
Vitaly sets clock frequency on some Nahum systems which may misreport
their value.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13
ice: fix eswitch code memory leak in reset scenario
net: ice: Perform accurate aRFS flow match
====================
Heiner Kallweit [Tue, 17 Jun 2025 18:20:17 +0000 (20:20 +0200)]
net: ftgmac100: select FIXED_PHY
Depending on e.g. DT configuration this driver uses a fixed link.
So we shouldn't rely on the user to enable FIXED_PHY, select it in
Kconfig instead. We may end up with a non-functional driver otherwise.
Jakub Kicinski [Tue, 17 Jun 2025 20:22:40 +0000 (13:22 -0700)]
net: ethtool: remove duplicate defines for family info
Commit under fixes switched to uAPI generation from the YAML
spec. A number of custom defines were left behind, mostly
for commands very hard to express in YAML spec.
Among what was left behind was the name and version of
the generic netlink family. Problem is that the codegen
always outputs those values so we ended up with a duplicated,
differently named set of defines.
Provide naming info in YAML and remove the incorrect defines.
Linus Torvalds [Wed, 18 Jun 2025 21:09:22 +0000 (14:09 -0700)]
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
- Fix a regression in the arm64 Poly1305 code
- Fix a couple compiler warnings
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch()
lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older
lib/crypto: Annotate crypto strings with nonstring
When tasks are migrated to a frozen cgroup, the freezer fails to
immediately freeze the tasks, causing the cgroup to remain in the
"FREEZING".
The freeze_task() function is called before clearing the CGROUP_FROZEN
flag. This causes the freezing() check to incorrectly return false,
preventing __freeze_task() from being invoked for the migrated task.
To fix this issue, clear the CGROUP_FROZEN state before calling
freeze_task().
Linus Torvalds [Wed, 18 Jun 2025 17:32:01 +0000 (10:32 -0700)]
Merge tag 'selinux-pr-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A small SELinux patch to resolve a UBSAN warning in the
xfrm/labeled-IPsec code"
* tag 'selinux-pr-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len
Linus Torvalds [Wed, 18 Jun 2025 17:24:50 +0000 (10:24 -0700)]
Merge tag 'ftrace-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fix from Steven Rostedt:
- Do not blindly enable function_graph tracer when updating
funcgraph-args
When the option to trace function arguments in the function graph
trace is updated, it requires the function graph tracer to switch its
callback routine. It disables function graph tracing, updates the
callback and then re-enables function graph tracing.
The issue is that it doesn't check if function graph tracing is
currently enabled or not. If it is not enabled, it will try to
disable it and re-enable it (which will actually enable it even
though it is not the current tracer). This causes an issue in the
accounting and will trigger a WARN_ON() if the function tracer is
enabled after that.
* tag 'ftrace-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Do not enable function_graph tracer when setting funcgraph-args
Linus Torvalds [Wed, 18 Jun 2025 17:21:39 +0000 (10:21 -0700)]
Merge tag 'ata-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Force PIO for ATAPI devices on VT6415/VT6330 as the controller locks
up on ATAPI DMA (Tasos)
- Fix ACPI PATA cable type detection such that the controller is not
forced down to a slow transfer mode (Tasos)
- Fix build error on 32-bit UML (Johannes)
- Fix a PCI region leak in the pata_macio driver so that the driver no
longer fails to load after rmmod (Philipp)
- Use correct DMI BIOS build date for ThinkPad W541 quirk (me)
- Disallow LPM for ASUSPRO-D840SA motherboard as this board
interestingly enough gets graphical corruptions on the iGPU when LPM
is enabled (me)
- Disallow LPM for Asus B550-F motherboard as this board will get
command timeouts on ports 5 and 6, yet LPM with the same drive works
fine on all other ports (Mikko)
* tag 'ata-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: ahci: Disallow LPM for Asus B550-F motherboard
ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard
ata: ahci: Use correct BIOS build date for ThinkPad W541 quirk
ata: pata_macio: Fix PCI region leak
ata: pata_cs5536: fix build on 32-bit UML
ata: libata-acpi: Do not assume 40 wire cable if no devices are enabled
ata: pata_via: Force PIO for ATAPI devices on VT6415/VT6330
Steven Rostedt [Wed, 18 Jun 2025 11:38:01 +0000 (07:38 -0400)]
fgraph: Do not enable function_graph tracer when setting funcgraph-args
When setting the funcgraph-args option when function graph tracer is net
enabled, it incorrectly enables it. Worse, it unregisters itself when it
was never registered. Then when it gets enabled again, it will register
itself a second time causing a WARNing.
Have the setting of the funcgraph-args check if function_graph tracer is
the current tracer of the instance, and if not, do nothing, as there's
nothing to do (the option is checked when function_graph tracing starts).
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20250618073801.057ea636@gandalf.local.home Fixes: c7a60a733c373 ("ftrace: Have funcgraph-args take affect during tracing") Closes: https://lore.kernel.org/all/4ab1a7bdd0174ab09c7b0d68cdbff9a4@huawei.com/ Reported-by: Changbin Du <changbin.du@huawei.com> Tested-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Neal Cardwell [Fri, 13 Jun 2025 19:30:56 +0000 (15:30 -0400)]
tcp: fix tcp_packet_delayed() for tcp_is_non_sack_preventing_reopen() behavior
After the following commit from 2024:
commit e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent")
...there was buggy behavior where TCP connections without SACK support
could easily see erroneous undo events at the end of fast recovery or
RTO recovery episodes. The erroneous undo events could cause those
connections to suffer repeated loss recovery episodes and high
retransmit rates.
The problem was an interaction between the non-SACK behavior on these
connections and the undo logic. The problem is that, for non-SACK
connections at the end of a loss recovery episode, if snd_una ==
high_seq, then tcp_is_non_sack_preventing_reopen() holds steady in
CA_Recovery or CA_Loss, but clears tp->retrans_stamp to 0. Then upon
the next ACK the "tcp: fix to allow timestamp undo if no retransmits
were sent" logic saw the tp->retrans_stamp at 0 and erroneously
concluded that no data was retransmitted, and erroneously performed an
undo of the cwnd reduction, restoring cwnd immediately to the value it
had before loss recovery. This caused an immediate burst of traffic
and build-up of queues and likely another immediate loss recovery
episode.
This commit fixes tcp_packet_delayed() to ignore zero retrans_stamp
values for non-SACK connections when snd_una is at or above high_seq,
because tcp_is_non_sack_preventing_reopen() clears retrans_stamp in
this case, so it's not a valid signal that we can undo.
Note that the commit named in the Fixes footer restored long-present
behavior from roughly 2005-2019, so apparently this bug was present
for a while during that era, and this was simply not caught.
Fixes: e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent") Reported-by: Eric Wheeler <netdev@lists.ewheeler.net> Closes: https://lore.kernel.org/netdev/64ea9333-e7f9-0df-b0f2-8d566143acab@ewheeler.net/ Signed-off-by: Neal Cardwell <ncardwell@google.com> Co-developed-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 22 May 2025 12:17:03 +0000 (13:17 +0100)]
wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking
The current cmd_ver range checking on cmd_ver < 1 && cmd_ver > 3 can
never be true because the logical operator && is being used, cmd_ver
can never be less than 1 and also greater than 3. Fix this by using
the logical || operator.
Apparently I accidentally removed this setting in my transport
configuration rework, leading to an endless stream of warnings
from the PCIe code when relevant notifications are received by
the driver from firmware. Restore it.
Pei Xiao [Tue, 27 May 2025 08:03:55 +0000 (16:03 +0800)]
wifi: iwlwifi: cfg: Limit cb_size to valid range
on arm64 defconfig build failed with gcc-8:
drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c:208:3:
include/linux/bitfield.h:195:3: error: call to '__field_overflow'
declared with attribute error: value doesn't fit into mask
__field_overflow(); \
^~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:215:2: note: in expansion of macro '____MAKE_OP'
____MAKE_OP(u##size,u##size,,)
^~~~~~~~~~~
include/linux/bitfield.h:218:1: note: in expansion of macro '__MAKE_OP'
__MAKE_OP(32)
Johannes Berg [Wed, 18 Jun 2025 07:05:26 +0000 (09:05 +0200)]
Merge tag 'ath-current-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
Jeff Johnson says:
==================
ath.git updates for v6.16-rc3
Fix the following 3 issues:
wifi: ath12k: avoid burning CPU while waiting for firmware stats
wifi: ath12k: don't activate more links than firmware supports
wifi: carl9170: do not ping device which has failed to load firmware
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 17 Jun 2025 09:45:29 +0000 (11:45 +0200)]
wifi: ath6kl: remove WARN on bad firmware input
If the firmware gives bad input, that's nothing to do with
the driver's stack at this point etc., so the WARN_ON()
doesn't add any value. Additionally, this is one of the
top syzbot reports now. Just print a message, and as an
added bonus, print the sizes too.
Kuniyuki Iwashima [Mon, 16 Jun 2025 18:21:14 +0000 (11:21 -0700)]
atm: atmtcp: Free invalid length skb in atmtcp_c_send().
syzbot reported the splat below. [0]
vcc_sendmsg() copies data passed from userspace to skb and passes
it to vcc->dev->ops->send().
atmtcp_c_send() accesses skb->data as struct atmtcp_hdr after
checking if skb->len is 0, but it's not enough.
Also, when skb->len == 0, skb and sk (vcc) were leaked because
dev_kfree_skb() is not called and sk_wmem_alloc adjustment is missing
to revert atm_account_tx() in vcc_sendmsg(), which is expected
to be done in atm_pop_raw().
Let's properly free skb with an invalid length in atmtcp_c_send().
Dmitry Antipov [Mon, 16 Jun 2025 18:12:05 +0000 (21:12 +0300)]
wifi: carl9170: do not ping device which has failed to load firmware
Syzkaller reports [1, 2] crashes caused by an attempts to ping
the device which has failed to load firmware. Since such a device
doesn't pass 'ieee80211_register_hw()', an internal workqueue
managed by 'ieee80211_queue_work()' is not yet created and an
attempt to queue work on it causes null-ptr-deref.
Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Christian Lamparter <chunkeey@gmail.com> Link: https://patch.msgid.link/20250616181205.38883-1-dmantipov@yandex.ru Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Baochen Qiang [Thu, 12 Jun 2025 01:31:52 +0000 (09:31 +0800)]
wifi: ath12k: don't wait when there is no vdev started
For WMI_REQUEST_VDEV_STAT request, firmware might split response into
multiple events dut to buffer limit, hence currently in
ath12k_wmi_fw_stats_process() host waits until all events received. In
case there is no vdev started, this results in that below condition
would never get satisfied
Baochen Qiang [Thu, 12 Jun 2025 01:31:51 +0000 (09:31 +0800)]
wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()
Currently ath12k_wmi_fw_stats_process() is using static variables to count
firmware stat events. Taking num_vdev as an example, if for whatever
reason (say ar->num_started_vdevs is 0 or firmware bug etc.) the following
condition
(++num_vdev) == total_vdevs_started
is not met, is_end is not set thus num_vdev won't be cleared. Next time
when firmware stats is requested again, even if everything is working
fine, failure is expected due to the condition above will never be
satisfied.
The same applies to num_bcn as well.
Change to use non-static counters and reset them each time before firmware
stats is requested.
Baochen Qiang [Thu, 12 Jun 2025 01:31:50 +0000 (09:31 +0800)]
wifi: ath12k: avoid burning CPU while waiting for firmware stats
ath12k_mac_get_fw_stats() is busy polling fw_stats_done flag while waiting
firmware finishing sending all events. This is not good as CPU is
monopolized and kept burning during the wait.
Baochen Qiang [Thu, 12 Jun 2025 01:31:49 +0000 (09:31 +0800)]
wifi: ath12k: fix documentation on firmware stats
Regarding the firmware stats events handling, the comment in
ath12k_mac_get_fw_stats() says host determines whether all events have been
received based on 'end' tag in TLV. This is wrong as there is no such tag
at all, actually host makes the decision totally by itself based on the
stats type and active pdev/vdev counts etc.
this results in firmware crash when the number of links activated on the
same device is more than supported.
Since firmware supports activating at most 2 links for a ML connection,
to avoid firmware crash, host needs to select 2 links out of the useful
links. As the assoc link has already been chosen, the question becomes
how to determine partner links. A straightforward principle applied
here is that the resulted combination should achieve the best throughput.
For that purpose, ideally various factors like bandwidth, RSSI etc should
be considered. But that would be too complicate. To make it easy, the
choice is to only take hardware modes into consideration.
The SBS (single band simultaneously) mode frequency range covers 5 GHz
and 6 GHz bands. In this mode, the two individual MACs are both active,
with one working on 5g-high band and the other on 5g-low band (from
hardware perspective 5 GHz and 6 GHz bands are referred to as a 'large'
single 5 GHz band). The DBS (dual band simultaneously) mode covers 2 GHz
band and the 'large' 5 GHz band, with one MAC working on 2 GHz band and
the other working on 5 GHz band or 6 GHz band. Since 5,6 GHz bands could
provide higher bandwidth than 2 GHz band, the preference is given to SBS
mode. Other hardware modes results in only one working MAC at any given
time, so it is chosen only when both SBS are DBS are not possible.
For each hardware mode, if there are more than one partner candidate,
just choose the first one.
For now only single device MLO case is handled as it is easy. Other cases
could be addressed in the future.
Baochen Qiang [Thu, 22 May 2025 08:54:14 +0000 (16:54 +0800)]
wifi: ath12k: update link active in case two links fall on the same MAC
In case of two links established on the same device in an ML connection,
depending on device's hardware mode capability, it is possible that both
links fall on the same MAC. Currently, no specific action is taken to
address this but just keep both links active. However this would result
in lower throughput compared to even one link, because switching between
these two links on the resulted MAC significantly impacts throughput.
Check if both links fall in the frequency range of a single MAC. If
so, send WMI_MLO_LINK_SET_ACTIVE_CMDID command to firmware such that
firmware can deactivate one of them. Note the decision of which link
getting deactivated is made by firmware, host only sends the vdev
lists.
Baochen Qiang [Thu, 22 May 2025 08:54:13 +0000 (16:54 +0800)]
wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command
Add WMI_MLO_LINK_SET_ACTIVE_CMDID command. This command allows host to
send required link information to firmware such that firmware can make
decision on activating/deactivating links in various scenarios.
Baochen Qiang [Thu, 22 May 2025 08:54:12 +0000 (16:54 +0800)]
wifi: ath12k: update freq range for each hardware mode
Previous patches parse and save hardware MAC frequency range information
in ath12k_svc_ext_info structure. Such range represents hardware
capability hence needs to be updated based on host information, e.g. guard
the range based on host's low/high boundary.
So update frequency range. The updated range is saved in
ath12k_hw_mode_info structure and would be used when doing vdev activation
and link selection in following patches.
Baochen Qiang [Thu, 22 May 2025 08:54:11 +0000 (16:54 +0800)]
wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event
Firmware sends the boundary between lower and higher bands in
ath12k_wmi_dbs_or_sbs_cap_params structure embedded in
WMI_SERVICE_READY_EXT2_EVENTID event. The boundary is needed when
updating frequency range in the following patch. So parse and save
it for later use. Note ath12k_wmi_dbs_or_sbs_cap_params is placed
after some other structures, so placeholders for them are added
as well.
Baochen Qiang [Thu, 22 May 2025 08:54:10 +0000 (16:54 +0800)]
wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use
WLAN hardware might support various hardware modes such as DBS (dual
band simultaneously), SBS (single band simultaneously) and DBS_OR_SBS
etc, see enum wmi_host_hw_mode_config_type. Firmware advertises actual
supported modes in WMI_SERVICE_READY_EXT_EVENTID event. For each mode,
firmware advertises frequency range each hardware MAC can operate on.
In MLO case such information is necessary during vdev activation and
link selection (which is done in following patches), so add a new
structure ath12k_svc_ext_info to ath12k_wmi_base, then parse and save
those information to it for later use.
Bjorn Andersson [Tue, 10 Jun 2025 03:06:22 +0000 (22:06 -0500)]
wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT
When the ath12k driver is built without CONFIG_ATH12K_DEBUG, the
recently refactored stats code can cause any user space application
(such at NetworkManager) to consume 100% CPU for 3 seconds, every time
stats are read.
Commit 'b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of
debugfs")' moved ath12k_debugfs_fw_stats_request() out of debugfs, by
merging the additional logic into ath12k_mac_get_fw_stats().
Among the added responsibility of ath12k_mac_get_fw_stats() was the
busy-wait for `fw_stats_done`.
Signalling of `fw_stats_done` happens when one of the
WMI_REQUEST_PDEV_STAT, WMI_REQUEST_VDEV_STAT, and WMI_REQUEST_BCN_STAT
messages are received, but the handling of the latter two commands remained
in the debugfs code. As `fw_stats_done` isn't signalled, the calling
processes will spin until the timeout (3 seconds) is reached.
Moving the handling of these two additional responses out of debugfs
resolves the issue.
Jakub Kicinski [Tue, 17 Jun 2025 23:13:12 +0000 (16:13 -0700)]
Merge branch 'ptp_vclock-fixes'
Vladimir Oltean says:
====================
ptp_vclock fixes
While I was intending to test something else related to PTP in net-next
I noticed any time I would run ptp4l on an interface, the kernel would
print "ptp: physical clock is free running" and ptp4l would exit with an
error code.
I then found Jeongjun Park's patch and subsequent explanation provided
to Jakub's question, specifically related to the code which introduced
the breakage I am seeing.
https://lore.kernel.org/netdev/CAO9qdTEjQ5414un7Yw604paECF=6etVKSDSnYmZzZ6Pg3LurXw@mail.gmail.com/
I had to look at the original issue that prompted Jeongjun Park's patch,
and provide an alternative fix for it. Patch 1/2 in this set contains a
logical revert plus the alternative fix, squashed into one.
Patch 2/2 fixes another issue which was confusing during debugging/
characterization, namely: "ok, the kernel clearly thinks that any
physical clock is free-running after this change (despite there being no
vclocks), but why would ptp4l fail to create the clock altogether? Why
not just fail to adjust it?"
By reverting (locally) Jeongjun Park's commit, I could reproduce
the reported lockdep splat using the commands from patch 1/2's commit
message, and this goes away with the reworked implementation.
====================
Vladimir Oltean [Fri, 13 Jun 2025 17:47:49 +0000 (20:47 +0300)]
ptp: allow reading of currently dialed frequency to succeed on free-running clocks
There is a bug in ptp_clock_adjtime() which makes it refuse the
operation even if we just want to read the current clock dialed
frequency, not modify anything (tx->modes == 0). That should be possible
even if the clock is free-running. For context, the kernel UAPI is the
same for getting and setting the frequency of a POSIX clock.
For example, ptp4l errors out at clock_create() -> clockadj_get_freq()
-> clock_adjtime() time, when it should logically only have failed on
actual adjustments to the clock, aka if the clock was configured as
slave. But in master mode it should work.
This was discovered when examining the issue described in the previous
commit, where ptp_clock_freerun() returned true despite n_vclocks being
zero.
Vladimir Oltean [Fri, 13 Jun 2025 17:47:48 +0000 (20:47 +0300)]
ptp: fix breakage after ptp_vclock_in_use() rework
What is broken
--------------
ptp4l, and any other application which calls clock_adjtime() on a
physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c
("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()").
Explanation for the breakage
----------------------------
The blamed commit was based on the false assumption that
ptp_vclock_in_use() callers already test for n_vclocks prior to calling
this function.
This is notably incorrect for the code path below, in which there is, in
fact, no n_vclocks test:
The result is that any clock adjustment on any physical clock is now
impossible. This is _despite_ there not being any vclock over this
physical clock.
$ ptp4l -i eno0 -2 -P -m
ptp4l[58.425]: selected /dev/ptp0 as PTP clock
[ 58.429749] ptp: physical clock is free running
ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy
failed to create a clock
$ cat /sys/class/ptp/ptp0/n_vclocks
0
The patch makes the ptp_vclock_in_use() function say "if it's not a
virtual clock, then this physical clock does have virtual clocks on
top".
Then ptp_clock_freerun() uses this information to say "this physical
clock has virtual clocks on top, so it must stay free-running".
Then ptp_clock_adjtime() uses this information to say "well, if this
physical clock has to be free-running, I can't do it, return -EBUSY".
Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the
test whether vclocks are in use.
What did the blamed commit intend to fix
----------------------------------------
The blamed commit presents a lockdep warning stating "possible recursive
locking detected", with the n_vclocks_store() and ptp_clock_unregister()
functions involved.
The issue can be triggered by creating and then deleting vclocks:
$ echo 2 > /sys/class/ptp/ptp0/n_vclocks
$ echo 0 > /sys/class/ptp/ptp0/n_vclocks
But note that in the original stack trace, the address of the first lock
is different from the address of the second lock. This is because at
step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent
(physical) PTP clock, and at step 2, the lock is of the child (virtual)
PTP clock. They are different locks of different devices.
In this situation there is no real deadlock, the lockdep warning is
caused by the fact that the mutexes have the same lock class on both the
parent and the child. Functionally it is fine.
Proposed alternative solution
-----------------------------
We must reintroduce the body of ptp_vclock_in_use() mostly as it was
structured prior to the blamed commit, but avoid the lockdep warning.
Based on the fact that vclocks cannot be nested on top of one another
(ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we
already know that ptp->n_vclocks is zero for a virtual clock. And
ptp->is_virtual_clock is a runtime invariant, established at
ptp_clock_register() time and never changed. There is no need to
serialize on any mutex in order to read ptp->is_virtual_clock, and we
take advantage of that by moving it outside the lock.
Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at
all, and step 2 in the code walkthrough above can simply go away.
We can simply return false to the question "ptp_vclock_in_use(a virtual
clock)".
Other notes
-----------
Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns
execution seems racy, because the returned value can become stale as
soon as the function returns and before the return value is used (i.e.
n_vclocks_store() can run any time). The locking requirement should
somehow be transferred to the caller, to ensure a longer life time for
the returned value, but this seems out of scope for this severe bug fix.
Because we are also fixing up the logic from the original commit, there
is another Fixes: tag for that.
Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()") Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250613174749.406826-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 17 Jun 2025 22:56:33 +0000 (15:56 -0700)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
Thie first patch fixes a crash during PCIe AER when the bnxt_re RoCE
driver is loaded. The second patch is a refactor patch needed by
patch 3. Patch 3 fixes a packet drop issue if queue restart is done
on a ring belonging to a non-default RSS context. Patch 2 and 3 are
version 2 that has addressed the v1 issue by reducing the scope of
the traffic disruptions:
Pavan Chebbi [Fri, 13 Jun 2025 23:18:41 +0000 (16:18 -0700)]
bnxt_en: Update MRU and RSS table of RSS contexts on queue reset
The commit under the Fixes tag below which updates the VNICs' RSS
and MRU during .ndo_queue_start(), needs to be extended to cover any
non-default RSS contexts which have their own VNICs. Without this
step, packets that are destined to a non-default RSS context may be
dropped after .ndo_queue_start().
We further optimize this scheme by updating the VNIC only if the
RX ring being restarted is in the RSS table of the VNIC. Updating
the VNIC (in particular setting the MRU to 0) will momentarily stop
all traffic to all rings in the RSS table. Any VNIC that has the
RX ring excluded from the RSS table can skip this step and avoid the
traffic disruption.
Note that this scheme is just an improvement. A VNIC with multiple
rings in the RSS table will still see traffic disruptions to all rings
in the RSS table when one of the rings is being restarted. We are
working on a FW scheme that will improve upon this further.
Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table") Reported-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavan Chebbi [Fri, 13 Jun 2025 23:18:40 +0000 (16:18 -0700)]
bnxt_en: Add a helper function to configure MRU and RSS
Add a new helper function that will configure MRU and RSS table
of a VNIC. This will be useful when we configure both on a VNIC
when resetting an RX ring. This function will be used again in
the next bug fix patch where we have to reconfigure VNICs for RSS
contexts.
Suggested-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalesh AP [Fri, 13 Jun 2025 23:18:39 +0000 (16:18 -0700)]
bnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()
Before the commit under the Fixes tag below, bnxt_ulp_stop() and
bnxt_ulp_start() were always invoked in pairs. After that commit,
the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop()
has been called. This may result in the RoCE driver's aux driver
.suspend() method being invoked twice. The 2nd bnxt_re_suspend()
call will crash when it dereferences a NULL pointer:
Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag
is already set. This will preserve the original symmetrical
bnxt_ulp_stop() and bnxt_ulp_start().
Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED
flag after taking the mutex to avoid any race condition. And for
symmetry, only proceed in bnxt_ulp_start() if the
BNXT_EN_FLAG_ULP_STOPPED is set.
Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 17 Jun 2025 22:48:38 +0000 (15:48 -0700)]
Merge tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-06-17
The patch is by Brett Werling, and fixes the power regulator retrieval
during probe of the tcan4x5x glue code for the m_can driver.
* tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: tcan4x5x: fix power regulator retrieval during probe
openvswitch: Allocate struct ovs_pcpu_storage dynamically
====================
Mina Almasry [Sun, 15 Jun 2025 20:07:33 +0000 (20:07 +0000)]
net: netmem: fix skb_ensure_writable with unreadable skbs
skb_ensure_writable should succeed when it's trying to write to the
header of the unreadable skbs, so it doesn't need an unconditional
skb_frags_readable check. The preceding pskb_may_pull() call will
succeed if write_len is within the head and fail if we're trying to
write to the unreadable payload, so we don't need an additional check.
Removing this check restores DSCP functionality with unreadable skbs as
it's called from dscp_tg.
Cc: willemb@google.com Cc: asml.silence@gmail.com Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags") Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250615200733.520113-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Meghana Malladi [Mon, 16 Jun 2025 06:33:19 +0000 (12:03 +0530)]
net: ti: icssg-prueth: Fix packet handling for XDP_TX
While transmitting XDP frames for XDP_TX, page_pool is
used to get the DMA buffers (already mapped to the pages)
and need to be freed/reycled once the transmission is complete.
This need not be explicitly done by the driver as this is handled
more gracefully by the xdp driver while returning the xdp frame.
__xdp_return() frees the XDP memory based on its memory type,
under which page_pool memory is also handled. This change fixes
the transmit queue timeout while running XDP_TX.
logs:
[ 309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms
[ 313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms
[ 319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms
...
Namjae Jeon [Fri, 13 Jun 2025 02:51:29 +0000 (11:51 +0900)]
ksmbd: handle set/get info file for streamed file
The bug only appears when:
- windows 11 copies a file that has an alternate data stream
- streams_xattr is enabled on the share configuration.
Microsoft Edge adds a ZoneIdentifier data stream containing the URL
for files it downloads.
Another way to create a test file:
- open cmd.exe
- echo "hello from default data stream" > hello.txt
- echo "hello again from ads" > hello.txt:ads.txt
If you open the file using notepad, we'll see the first message.
If you run "notepad hello.txt:ads.txt" in cmd.exe, we should see
the second message.
dir /s /r should least all streams for the file.
The truncation happens because the windows 11 client sends
a SetInfo/EndOfFile message on the ADS, but it is instead applied
on the main file, because we don't check fp->stream.
When receiving set/get info file for a stream file, Change to process
requests using stream position and size.
Truncate is unnecessary for stream files, so we skip
set_file_allocation_info and set_end_of_file_info operations.
Reported-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Fri, 13 Jun 2025 01:12:43 +0000 (10:12 +0900)]
ksmbd: fix null pointer dereference in destroy_previous_session
If client set ->PreviousSessionId on kerberos session setup stage,
NULL pointer dereference error will happen. Since sess->user is not
set yet, It can pass the user argument as NULL to destroy_previous_session.
sess->user will be set in ksmbd_krb5_authenticate(). So this patch move
calling destroy_previous_session() after ksmbd_krb5_authenticate().
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27391 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Tue, 10 Jun 2025 09:52:56 +0000 (18:52 +0900)]
ksmbd: add free_transport ops in ksmbd connection
free_transport function for tcp connection can be called from smbdirect.
It will cause kernel oops. This patch add free_transport ops in ksmbd
connection, and add each free_transports for tcp and smbdirect.
Fixes: 21a4e47578d4 ("ksmbd: fix use-after-free in __smb2_lease_break_noti()") Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Chuyi Zhou [Tue, 17 Jun 2025 04:42:16 +0000 (12:42 +0800)]
workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
Now when isolcpus is enabled via the cmdline, wq_isolated_cpumask does
not include these isolated CPUs, even wq_unbound_cpumask has already
excluded them. It is only when we successfully configure an isolate cpuset
partition that wq_isolated_cpumask gets overwritten by
workqueue_unbound_exclude_cpumask(), including both the cmdline-specified
isolated CPUs and the isolated CPUs within the cpuset partitions.
Fix this issue by initializing wq_isolated_cpumask properly in
workqueue_init_early().
Fixes: fe28f631fa94 ("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask") Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
- alienware-wmi-wmax: Revert G-Mode support as it lowers performance
- dell_rbu:
- Fix sparse lock context warning
- Fix list head usage
- Don't overwrite data buffer past the size of the last packet
- ideapad-laptop: Ensure EC is not polled too frequently
- intel-uncore-freq:
- Fail module load when plat_info is NULL
- Avoid a non-literal format string as it triggers a compiler warning
- intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry
- intel/power-domains: Fix error code in tpmi_init()
- samsung-galaxybook: Add support for Notebook 9 Pro and others
(SAM0426)
* tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1"
platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
platform/x86/intel-uncore-freq: avoid non-literal format string
platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry
platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry
MAINTAINERS: .mailmap: Update Hans de Goede's email address
platform/x86: dell_rbu: Bump version
platform/x86: dell_rbu: Stop overwriting data buffer
platform/x86: dell_rbu: Fix list usage
platform/x86: dell_rbu: Fix lock context warning
platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc()
platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice
platform/x86/amd: pmf: Use device managed allocations
x86/platform/amd: replace down_timeout() with down_interruptible()
x86/platform/amd: move final timeout check to after final sleep
platform/x86/amd: pmc: Clear metrics table at start of cycle
platform/x86/intel: power-domains: Fix error code in tpmi_init()
platform/x86: samsung-galaxybook: Add SAM0426
platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL
platform/x86: ideapad-laptop: use usleep_range() for EC polling
Tejun Heo [Mon, 16 Jun 2025 20:13:25 +0000 (10:13 -1000)]
sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()
During task_group creation, sched_create_group() calls
scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext
portion. This is premature and ends up calling ops.cgroup_set_weight() with
an incorrect @cgrp before ops.cgroup_init() is called.
sched_create_group() should just initialize SCX related fields in the new
task_group. Fix it by factoring out scx_tg_init() from sched_init() and
making sched_create_group() call that function instead of
scx_group_set_weight().
v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads
to build failures on !CONFIG_GROUP_SCHED configs.
Vitaly Lifshits [Sun, 25 May 2025 08:38:43 +0000 (11:38 +0300)]
e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13
On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in
the software STRAP is incorrect. This causes the PTP timer to run at the
wrong rate and can lead to synchronization issues.
The STRAP value is configured by the system firmware, and a firmware
update is not always possible. Since the XTAL clock on these systems
always runs at 38.4MHz, the driver may ignore the STRAP and just set
the correct value.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Reviewed-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Grzegorz Nitka [Fri, 16 May 2025 13:09:07 +0000 (15:09 +0200)]
ice: fix eswitch code memory leak in reset scenario
Add simple eswitch mode checker in attaching VF procedure and allocate
required port representor memory structures only in switchdev mode.
The reset flows triggers VF (if present) detach/attach procedure.
It might involve VF port representor(s) re-creation if the device is
configured is switchdev mode (not legacy one).
The memory was blindly allocated in current implementation,
regardless of the mode and not freed if in legacy mode.
Testing hints (ethX is PF netdev):
- create at least one VF
echo 1 > /sys/class/net/ethX/device/sriov_numvfs
- trigger the reset
echo 1 > /sys/class/net/ethX/device/reset
Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Krishna Kumar [Tue, 20 May 2025 17:06:56 +0000 (22:36 +0530)]
net: ice: Perform accurate aRFS flow match
This patch fixes an issue seen in a large-scale deployment under heavy
incoming pkts where the aRFS flow wrongly matches a flow and reprograms the
NIC with wrong settings. That mis-steering causes RX-path latency spikes
and noisy neighbor effects when many connections collide on the same
hash (some of our production servers have 20-30K connections).
set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by
hashing the skb sized by the per rx-queue table size. This results in
multiple connections (even across different rx-queues) getting the same
hash value. The driver steer function modifies the wrong flow to use this
rx-queue, e.g.: Flow#1 is first added:
Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10
Later when a new flow needs to be added:
Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20
The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This
results in both flows getting un-optimized - packets for Flow#1 goes to
q#20, and then reprogrammed back to q#10 later and so on; and Flow #2
programming is never done as Flow#1 is matched first for all misses. Many
flows may wrongly share the same hash and reprogram rules of the original
flow each with their own q#.
Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf
clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s
72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS,
enable aRFS (global value is 144 * rps_flow_cnt).
Test notes about results from ice_rx_flow_steer():
---------------------------------------------------
1. "Skip:" counter increments here:
if (fltr_info->q_index == rxq_idx ||
arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
goto out;
2. "Add:" counter increments here:
ret = arfs_entry->fltr_info.fltr_id;
INIT_HLIST_NODE(&arfs_entry->list_entry);
3. "Update:" counter increments here:
/* update the queue to forward to on an already existing flow */
Runtime comparison: original code vs with the patch for different
rps_flow_cnt values.
+-------------------------------+--------------+--------------+
| rps_flow_cnt | 512 | 2048 |
+-------------------------------+--------------+--------------+
| Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K |
| Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K |
| CPU User | 216 vs 183 | 216 vs 206 |
| CPU System | 1441 vs 1171 | 1447 vs 1320 |
| CPU Softirq | 1245 vs 920 | 1238 vs 961 |
| CPU Total | 29 vs 22.7 | 29 vs 24.9 |
| aRFS Update | 533K vs 59 | 521K vs 32 |
| aRFS Skip | 82M vs 77M | 7.2M vs 4.5M |
+-------------------------------+--------------+--------------+
A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections
showed no performance degradation.
Some points on the patch/aRFS behavior:
1. Enabling full tuple matching ensures flows are always correctly matched,
even with smaller hash sizes.
2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs
and fewer calls to driver for programming on misses.
3. Larger hash tables reduces mis-steering due to more unique flow hashes,
but still has clashes. However, with larger per-device rps_flow_cnt, old
flows take more time to expire and new aRFS flows cannot be added if h/w
limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt
pkts have been processed by this cpu that are not part of the flow).
Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <krikku@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brett Werling [Thu, 12 Jun 2025 19:18:25 +0000 (14:18 -0500)]
can: tcan4x5x: fix power regulator retrieval during probe
Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring
the regulator pointer is not set to NULL in the successful return from
devm_regulator_get_optional().
Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config") Signed-off-by: Brett Werling <brett.werling@garmin.com> Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
PERCPU_MODULE_RESERVE defines the maximum size that can by used for the
per-CPU data size used by modules. This is 8KiB.
Commit 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into
one") restructured the per-CPU memory allocation for the module and
moved the separate alloc_percpu() invocations at module init time to a
static per-CPU variable which is allocated by the module loader.
The size of the per-CPU data section for openvswitch is 6488 bytes which
is ~80% of the available per-CPU memory. Together with a few other
modules it is easy to exhaust the available 8KiB of memory.
Allocate ovs_pcpu_storage dynamically at module init time.
Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com Fixes: 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Johannes Berg [Tue, 17 Jun 2025 08:49:01 +0000 (10:49 +0200)]
wifi: mac80211: don't WARN for late channel/color switch
There's really no value in the WARN stack trace etc., the reason
for this happening isn't directly related to the calling function
anyway. Also, syzbot has been observing it constantly, and there's
no way we can resolve it there - those systems are just slow.
Instead print an error message (once) and add a comment about what
really causes this message.
Johannes Berg [Fri, 13 Jun 2025 22:30:37 +0000 (00:30 +0200)]
wifi: remove zero-length arrays
All of these are really meant to be variable-length, and
in the case of s1g_beacon it's actually accessed. Make that
one in particular, and a couple of others (that aren't used
as arrays now), actually variable.
Mikko Korhonen [Tue, 17 Jun 2025 06:18:41 +0000 (09:18 +0300)]
ata: ahci: Disallow LPM for Asus B550-F motherboard
Asus ROG STRIX B550-F GAMING (WI-FI) motherboard has problems on some
SATA ports with at least one hard drive model (WDC WD20EFAX-68FB5N0)
when LPM is enabled. Disabling LPM solves the issue.
Cc: stable@vger.kernel.org Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Signed-off-by: Mikko Korhonen <mjkorhon@gmail.com> Link: https://lore.kernel.org/r/20250617062055.784827-1-mjkorhon@gmail.com
[cassel: more detailed comment, make single line comments consistent] Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stephen Smalley [Fri, 13 Jun 2025 19:37:05 +0000 (15:37 -0400)]
selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len
We should count the terminating NUL byte as part of the ctx_len.
Otherwise, UBSAN logs a warning:
UBSAN: array-index-out-of-bounds in security/selinux/xfrm.c:99:14
index 60 is out of range for type 'char [*]'
The allocation itself is correct so there is no actual out of bounds
indexing, just a warning.