]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
4 months agonet: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
Alexey Kodanev [Mon, 16 Jun 2025 11:37:43 +0000 (11:37 +0000)]
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()

Before calling lan743x_ptp_io_event_clock_get(), the 'channel' value
is checked against the maximum value of PCI11X1X_PTP_IO_MAX_CHANNELS(8).
This seems correct and aligns with the PTP interrupt status register
(PTP_INT_STS) specifications.

However, lan743x_ptp_io_event_clock_get() writes to ptp->extts[] with
only LAN743X_PTP_N_EXTTS(4) elements, using channel as an index:

    lan743x_ptp_io_event_clock_get(..., u8 channel,...)
    {
        ...
        /* Update Local timestamp */
        extts = &ptp->extts[channel];
        extts->ts.tv_sec = sec;
        ...
    }

To avoid an out-of-bounds write and utilize all the supported GPIO
inputs, set LAN743X_PTP_N_EXTTS to 8.

Detected using the static analysis tool - Svace.
Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250616113743.36284-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoeth: fbnic: avoid double free when failing to DMA-map FW msg
Jakub Kicinski [Mon, 16 Jun 2025 19:55:10 +0000 (12:55 -0700)]
eth: fbnic: avoid double free when failing to DMA-map FW msg

The semantics are that caller of fbnic_mbx_map_msg() retains
the ownership of the message on error. All existing callers
dutifully free the page.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250616195510.225819-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoMerge branch 'net-fix-passive-tfo-socket-having-invalid-napi-id'
Jakub Kicinski [Thu, 19 Jun 2025 01:30:55 +0000 (18:30 -0700)]
Merge branch 'net-fix-passive-tfo-socket-having-invalid-napi-id'

David Wei says:

====================
net: fix passive TFO socket having invalid NAPI ID

Found a bug where an accepted passive TFO socket returns an invalid NAPI
ID (i.e. 0) from SO_INCOMING_NAPI_ID. Add a selftest for this using
netdevsim and fix the bug.

Patch 1 is a drive-by fix for the lib.sh include in an existing
drivers/net/netdevsim/peer.sh selftest.

Patch 2 adds a test binary for a simple TFO server/client.

Patch 3 adds a selftest for checking that the NAPI ID of a passive TFO
socket is valid. This will currently fail.

Patch 4 adds a fix for the bug.
====================

Link: https://patch.msgid.link/20250617212102.175711-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: fix passive TFO socket having invalid NAPI ID
David Wei [Tue, 17 Jun 2025 21:21:02 +0000 (14:21 -0700)]
tcp: fix passive TFO socket having invalid NAPI ID

There is a bug with passive TFO sockets returning an invalid NAPI ID 0
from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy
receive relies on a correct NAPI ID to process sockets on the right
queue.

Fix by adding a sk_mark_napi_id_set().

Fixes: e5907459ce7e ("tcp: Record Rx hash and NAPI ID in tcp_child_process")
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: add test for passive TFO socket NAPI ID
David Wei [Tue, 17 Jun 2025 21:21:01 +0000 (14:21 -0700)]
selftests: net: add test for passive TFO socket NAPI ID

Add a test that checks that the NAPI ID of a passive TFO socket is valid
i.e. not zero.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250617212102.175711-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: add passive TFO test binary
David Wei [Tue, 17 Jun 2025 21:21:00 +0000 (14:21 -0700)]
selftests: net: add passive TFO test binary

Add a simple passive TFO server and client test binary. This will be
used to test the SO_INCOMING_NAPI_ID of passive TFO accepted sockets.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250617212102.175711-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: netdevsim: improve lib.sh include in peer.sh
David Wei [Tue, 17 Jun 2025 21:20:59 +0000 (14:20 -0700)]
selftests: netdevsim: improve lib.sh include in peer.sh

Fix the peer.sh test to run from INSTALL_PATH.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250617212102.175711-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer
Haixia Qu [Tue, 17 Jun 2025 05:56:24 +0000 (05:56 +0000)]
tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer

The reproduction steps:
1. create a tun interface
2. enable l2 bearer
3. TIPC_NL_UDP_GET_REMOTEIP with media name set to tun

tipc: Started in network mode
tipc: Node identity 8af312d38a21, cluster identity 4711
tipc: Enabled bearer <eth:syz_tun>, priority 1
Oops: general protection fault
KASAN: null-ptr-deref in range
CPU: 1 UID: 1000 PID: 559 Comm: poc Not tainted 6.16.0-rc1+ #117 PREEMPT
Hardware name: QEMU Ubuntu 24.04 PC
RIP: 0010:tipc_udp_nl_dump_remoteip+0x4a4/0x8f0

the ub was in fact a struct dev.

when bid != 0 && skip_cnt != 0, bearer_list[bid] may be NULL or
other media when other thread changes it.

fix this by checking media_id.

Fixes: 832629ca5c313 ("tipc: add UDP remoteip dump to netlink API")
Signed-off-by: Haixia Qu <hxqu@hillstonenet.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20250617055624.2680-1-hxqu@hillstonenet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoOcteontx2-pf: Fix Backpresure configuration
Hariprasad Kelam [Tue, 17 Jun 2025 06:34:02 +0000 (12:04 +0530)]
Octeontx2-pf: Fix Backpresure configuration

NIX block can receive packets from multiple links such as
MAC (RPM), LBK and CPT.

       -----------------
 RPM --|     NIX       |
       -----------------
             |
             |
            LBK

Each link supports multiple channels for example RPM link supports
16 channels. In case of link oversubsribe, NIX will assert backpressure
on receive channels.

The previous patch considered a single channel per link, resulting in
backpressure not being enabled on the remaining channels

Fixes: a7ef63dbd588 ("octeontx2-af: Disable backpressure between CPT and NIX")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250617063403.3582210-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 18 Jun 2025 21:15:15 +0000 (14:15 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-06-17 (ice, e1000e)

For ice:
Krishna Kumar modifies aRFS match criteria to correctly identify
matching filters.

Grzegorz fixes a memory leak in eswitch legacy mode.

For e1000e:
Vitaly sets clock frequency on some Nahum systems which may misreport
their value.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13
  ice: fix eswitch code memory leak in reset scenario
  net: ice: Perform accurate aRFS flow match
====================

Link: https://patch.msgid.link/20250617172444.1419560-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ftgmac100: select FIXED_PHY
Heiner Kallweit [Tue, 17 Jun 2025 18:20:17 +0000 (20:20 +0200)]
net: ftgmac100: select FIXED_PHY

Depending on e.g. DT configuration this driver uses a fixed link.
So we shouldn't rely on the user to enable FIXED_PHY, select it in
Kconfig instead. We may end up with a non-functional driver otherwise.

Fixes: 38561ded50d0 ("net: ftgmac100: support fixed link")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/476bb33b-5584-40f0-826a-7294980f2895@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ethtool: remove duplicate defines for family info
Jakub Kicinski [Tue, 17 Jun 2025 20:22:40 +0000 (13:22 -0700)]
net: ethtool: remove duplicate defines for family info

Commit under fixes switched to uAPI generation from the YAML
spec. A number of custom defines were left behind, mostly
for commands very hard to express in YAML spec.

Among what was left behind was the name and version of
the generic netlink family. Problem is that the codegen
always outputs those values so we ended up with a duplicated,
differently named set of defines.

Provide naming info in YAML and remove the incorrect defines.

Fixes: 8d0580c6ebdd ("ethtool: regenerate uapi header from the spec")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250617202240.811179-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: fix tcp_packet_delayed() for tcp_is_non_sack_preventing_reopen() behavior
Neal Cardwell [Fri, 13 Jun 2025 19:30:56 +0000 (15:30 -0400)]
tcp: fix tcp_packet_delayed() for tcp_is_non_sack_preventing_reopen() behavior

After the following commit from 2024:

commit e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent")

...there was buggy behavior where TCP connections without SACK support
could easily see erroneous undo events at the end of fast recovery or
RTO recovery episodes. The erroneous undo events could cause those
connections to suffer repeated loss recovery episodes and high
retransmit rates.

The problem was an interaction between the non-SACK behavior on these
connections and the undo logic. The problem is that, for non-SACK
connections at the end of a loss recovery episode, if snd_una ==
high_seq, then tcp_is_non_sack_preventing_reopen() holds steady in
CA_Recovery or CA_Loss, but clears tp->retrans_stamp to 0. Then upon
the next ACK the "tcp: fix to allow timestamp undo if no retransmits
were sent" logic saw the tp->retrans_stamp at 0 and erroneously
concluded that no data was retransmitted, and erroneously performed an
undo of the cwnd reduction, restoring cwnd immediately to the value it
had before loss recovery.  This caused an immediate burst of traffic
and build-up of queues and likely another immediate loss recovery
episode.

This commit fixes tcp_packet_delayed() to ignore zero retrans_stamp
values for non-SACK connections when snd_una is at or above high_seq,
because tcp_is_non_sack_preventing_reopen() clears retrans_stamp in
this case, so it's not a valid signal that we can undo.

Note that the commit named in the Fixes footer restored long-present
behavior from roughly 2005-2019, so apparently this bug was present
for a while during that era, and this was simply not caught.

Fixes: e37ab7373696 ("tcp: fix to allow timestamp undo if no retransmits were sent")
Reported-by: Eric Wheeler <netdev@lists.ewheeler.net>
Closes: https://lore.kernel.org/netdev/64ea9333-e7f9-0df-b0f2-8d566143acab@ewheeler.net/
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Co-developed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agoMerge branch 'atm-fix-uninit-and-mem-accounting-leak-in-vcc_sendmsg'
Jakub Kicinski [Wed, 18 Jun 2025 01:42:48 +0000 (18:42 -0700)]
Merge branch 'atm-fix-uninit-and-mem-accounting-leak-in-vcc_sendmsg'

Kuniyuki Iwashima says:

====================
atm: Fix uninit and mem accounting leak in vcc_sendmsg().

Patch 1 fixes uninit issue reported by KMSAN, and patch 2 fixes
another issue found by Simon Horman during review for v1 patch.

v1: https://lore.kernel.org/20250613055700.415596-1-kuni1840@gmail.com
====================

Link: https://patch.msgid.link/20250616182147.963333-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoatm: Revert atm_account_tx() if copy_from_iter_full() fails.
Kuniyuki Iwashima [Mon, 16 Jun 2025 18:21:15 +0000 (11:21 -0700)]
atm: Revert atm_account_tx() if copy_from_iter_full() fails.

In vcc_sendmsg(), we account skb->truesize to sk->sk_wmem_alloc by
atm_account_tx().

It is expected to be reverted by atm_pop_raw() later called by
vcc->dev->ops->send(vcc, skb).

However, vcc_sendmsg() misses the same revert when copy_from_iter_full()
fails, and then we will leak a socket.

Let's factorise the revert part as atm_return_tx() and call it in
the failure path.

Note that the corresponding sk_wmem_alloc operation can be found in
alloc_tx() as of the blamed commit.

  $ git blame -L:alloc_tx net/atm/common.c c55fa3cccbc2c~

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20250614161959.GR414686@horms.kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250616182147.963333-3-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoatm: atmtcp: Free invalid length skb in atmtcp_c_send().
Kuniyuki Iwashima [Mon, 16 Jun 2025 18:21:14 +0000 (11:21 -0700)]
atm: atmtcp: Free invalid length skb in atmtcp_c_send().

syzbot reported the splat below. [0]

vcc_sendmsg() copies data passed from userspace to skb and passes
it to vcc->dev->ops->send().

atmtcp_c_send() accesses skb->data as struct atmtcp_hdr after
checking if skb->len is 0, but it's not enough.

Also, when skb->len == 0, skb and sk (vcc) were leaked because
dev_kfree_skb() is not called and sk_wmem_alloc adjustment is missing
to revert atm_account_tx() in vcc_sendmsg(), which is expected
to be done in atm_pop_raw().

Let's properly free skb with an invalid length in atmtcp_c_send().

[0]:
BUG: KMSAN: uninit-value in atmtcp_c_send+0x255/0xed0 drivers/atm/atmtcp.c:294
 atmtcp_c_send+0x255/0xed0 drivers/atm/atmtcp.c:294
 vcc_sendmsg+0xd7c/0xff0 net/atm/common.c:644
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x330/0x3d0 net/socket.c:727
 ____sys_sendmsg+0x7e0/0xd80 net/socket.c:2566
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2620
 __sys_sendmsg net/socket.c:2652 [inline]
 __do_sys_sendmsg net/socket.c:2657 [inline]
 __se_sys_sendmsg net/socket.c:2655 [inline]
 __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2655
 x64_sys_call+0x32fb/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4154 [inline]
 slab_alloc_node mm/slub.c:4197 [inline]
 kmem_cache_alloc_node_noprof+0x818/0xf00 mm/slub.c:4249
 kmalloc_reserve+0x13c/0x4b0 net/core/skbuff.c:579
 __alloc_skb+0x347/0x7d0 net/core/skbuff.c:670
 alloc_skb include/linux/skbuff.h:1336 [inline]
 vcc_sendmsg+0xb40/0xff0 net/atm/common.c:628
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x330/0x3d0 net/socket.c:727
 ____sys_sendmsg+0x7e0/0xd80 net/socket.c:2566
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2620
 __sys_sendmsg net/socket.c:2652 [inline]
 __do_sys_sendmsg net/socket.c:2657 [inline]
 __se_sys_sendmsg net/socket.c:2655 [inline]
 __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2655
 x64_sys_call+0x32fb/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 5798 Comm: syz-executor192 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(undef)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d3c235276f62963e93a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1d3c235276f62963e93a
Tested-by: syzbot+1d3c235276f62963e93a@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250616182147.963333-2-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agompls: Use rcu_dereference_rtnl() in mpls_route_input_rcu().
Kuniyuki Iwashima [Mon, 16 Jun 2025 20:15:12 +0000 (13:15 -0700)]
mpls: Use rcu_dereference_rtnl() in mpls_route_input_rcu().

As syzbot reported [0], mpls_route_input_rcu() can be called
from mpls_getroute(), where is under RTNL.

net->mpls.platform_label is only updated under RTNL.

Let's use rcu_dereference_rtnl() in mpls_route_input_rcu() to
silence the splat.

[0]:
WARNING: suspicious RCU usage
6.15.0-rc7-syzkaller-00082-g5cdb2c77c4c3 #0 Not tainted
 ----------------------------
net/mpls/af_mpls.c:84 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz.2.4451/17730:
 #0: ffffffff9012a3e8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff9012a3e8 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x371/0xe90 net/core/rtnetlink.c:6961

stack backtrace:
CPU: 1 UID: 0 PID: 17730 Comm: syz.2.4451 Not tainted 6.15.0-rc7-syzkaller-00082-g5cdb2c77c4c3 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
 lockdep_rcu_suspicious+0x166/0x260 kernel/locking/lockdep.c:6865
 mpls_route_input_rcu+0x1d4/0x200 net/mpls/af_mpls.c:84
 mpls_getroute+0x621/0x1ea0 net/mpls/af_mpls.c:2381
 rtnetlink_rcv_msg+0x3c9/0xe90 net/core/rtnetlink.c:6964
 netlink_rcv_skb+0x16d/0x440 net/netlink/af_netlink.c:2534
 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
 netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
 __sys_sendmmsg+0x200/0x420 net/socket.c:2709
 __do_sys_sendmmsg net/socket.c:2736 [inline]
 __se_sys_sendmmsg net/socket.c:2733 [inline]
 __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2733
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x230 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0a2818e969
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0a28f52038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f0a283b5fa0 RCX: 00007f0a2818e969
RDX: 0000000000000003 RSI: 0000200000000080 RDI: 0000000000000003
RBP: 00007f0a28210ab1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f0a283b5fa0 R15: 00007ffce5e9f268
 </TASK>

Fixes: 0189197f4416 ("mpls: Basic routing support")
Reported-by: syzbot+8a583bdd1a5cc0b0e068@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68507981.a70a0220.395abc.01ef.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250616201532.1036568-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/sched: fix use-after-free in taprio_dev_notifier
Hyunwoo Kim [Sat, 14 Jun 2025 00:54:57 +0000 (20:54 -0400)]
net/sched: fix use-after-free in taprio_dev_notifier

Since taprio’s taprio_dev_notifier() isn’t protected by an
RCU read-side critical section, a race with advance_sched()
can lead to a use-after-free.

Adding rcu_read_lock() inside taprio_dev_notifier() prevents this.

Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/aEzIYYxt0is9upYG@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'ptp_vclock-fixes'
Jakub Kicinski [Tue, 17 Jun 2025 23:13:12 +0000 (16:13 -0700)]
Merge branch 'ptp_vclock-fixes'

Vladimir Oltean says:

====================
ptp_vclock fixes

While I was intending to test something else related to PTP in net-next
I noticed any time I would run ptp4l on an interface, the kernel would
print "ptp: physical clock is free running" and ptp4l would exit with an
error code.

I then found Jeongjun Park's patch and subsequent explanation provided
to Jakub's question, specifically related to the code which introduced
the breakage I am seeing.
https://lore.kernel.org/netdev/CAO9qdTEjQ5414un7Yw604paECF=6etVKSDSnYmZzZ6Pg3LurXw@mail.gmail.com/

I had to look at the original issue that prompted Jeongjun Park's patch,
and provide an alternative fix for it. Patch 1/2 in this set contains a
logical revert plus the alternative fix, squashed into one.

Patch 2/2 fixes another issue which was confusing during debugging/
characterization, namely: "ok, the kernel clearly thinks that any
physical clock is free-running after this change (despite there being no
vclocks), but why would ptp4l fail to create the clock altogether? Why
not just fail to adjust it?"

By reverting (locally) Jeongjun Park's commit, I could reproduce
the reported lockdep splat using the commands from patch 1/2's commit
message, and this goes away with the reworked implementation.
====================

Link: https://patch.msgid.link/20250613174749.406826-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoptp: allow reading of currently dialed frequency to succeed on free-running clocks
Vladimir Oltean [Fri, 13 Jun 2025 17:47:49 +0000 (20:47 +0300)]
ptp: allow reading of currently dialed frequency to succeed on free-running clocks

There is a bug in ptp_clock_adjtime() which makes it refuse the
operation even if we just want to read the current clock dialed
frequency, not modify anything (tx->modes == 0). That should be possible
even if the clock is free-running. For context, the kernel UAPI is the
same for getting and setting the frequency of a POSIX clock.

For example, ptp4l errors out at clock_create() -> clockadj_get_freq()
-> clock_adjtime() time, when it should logically only have failed on
actual adjustments to the clock, aka if the clock was configured as
slave. But in master mode it should work.

This was discovered when examining the issue described in the previous
commit, where ptp_clock_freerun() returned true despite n_vclocks being
zero.

Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250613174749.406826-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoptp: fix breakage after ptp_vclock_in_use() rework
Vladimir Oltean [Fri, 13 Jun 2025 17:47:48 +0000 (20:47 +0300)]
ptp: fix breakage after ptp_vclock_in_use() rework

What is broken
--------------

ptp4l, and any other application which calls clock_adjtime() on a
physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c
("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()").

Explanation for the breakage
----------------------------

The blamed commit was based on the false assumption that
ptp_vclock_in_use() callers already test for n_vclocks prior to calling
this function.

This is notably incorrect for the code path below, in which there is, in
fact, no n_vclocks test:

ptp_clock_adjtime()
-> ptp_clock_freerun()
   -> ptp_vclock_in_use()

The result is that any clock adjustment on any physical clock is now
impossible. This is _despite_ there not being any vclock over this
physical clock.

$ ptp4l -i eno0 -2 -P -m
ptp4l[58.425]: selected /dev/ptp0 as PTP clock
[   58.429749] ptp: physical clock is free running
ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy
failed to create a clock
$ cat /sys/class/ptp/ptp0/n_vclocks
0

The patch makes the ptp_vclock_in_use() function say "if it's not a
virtual clock, then this physical clock does have virtual clocks on
top".

Then ptp_clock_freerun() uses this information to say "this physical
clock has virtual clocks on top, so it must stay free-running".

Then ptp_clock_adjtime() uses this information to say "well, if this
physical clock has to be free-running, I can't do it, return -EBUSY".

Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the
test whether vclocks are in use.

What did the blamed commit intend to fix
----------------------------------------

The blamed commit presents a lockdep warning stating "possible recursive
locking detected", with the n_vclocks_store() and ptp_clock_unregister()
functions involved.

The recursive locking seems this:
n_vclocks_store()
-> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1
-> device_for_each_child_reverse(..., unregister_vclock)
   -> unregister_vclock()
      -> ptp_vclock_unregister()
         -> ptp_clock_unregister()
            -> ptp_vclock_in_use()
               -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2

The issue can be triggered by creating and then deleting vclocks:
$ echo 2 > /sys/class/ptp/ptp0/n_vclocks
$ echo 0 > /sys/class/ptp/ptp0/n_vclocks

But note that in the original stack trace, the address of the first lock
is different from the address of the second lock. This is because at
step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent
(physical) PTP clock, and at step 2, the lock is of the child (virtual)
PTP clock. They are different locks of different devices.

In this situation there is no real deadlock, the lockdep warning is
caused by the fact that the mutexes have the same lock class on both the
parent and the child. Functionally it is fine.

Proposed alternative solution
-----------------------------

We must reintroduce the body of ptp_vclock_in_use() mostly as it was
structured prior to the blamed commit, but avoid the lockdep warning.

Based on the fact that vclocks cannot be nested on top of one another
(ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we
already know that ptp->n_vclocks is zero for a virtual clock. And
ptp->is_virtual_clock is a runtime invariant, established at
ptp_clock_register() time and never changed. There is no need to
serialize on any mutex in order to read ptp->is_virtual_clock, and we
take advantage of that by moving it outside the lock.

Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at
all, and step 2 in the code walkthrough above can simply go away.
We can simply return false to the question "ptp_vclock_in_use(a virtual
clock)".

Other notes
-----------

Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns
execution seems racy, because the returned value can become stale as
soon as the function returns and before the return value is used (i.e.
n_vclocks_store() can run any time). The locking requirement should
somehow be transferred to the caller, to ensure a longer life time for
the returned value, but this seems out of scope for this severe bug fix.

Because we are also fixing up the logic from the original commit, there
is another Fixes: tag for that.

Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()")
Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250613174749.406826-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'bnxt_en-bug-fixes'
Jakub Kicinski [Tue, 17 Jun 2025 22:56:33 +0000 (15:56 -0700)]
Merge branch 'bnxt_en-bug-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

Thie first patch fixes a crash during PCIe AER when the bnxt_re RoCE
driver is loaded.  The second patch is a refactor patch needed by
patch 3.  Patch 3 fixes a packet drop issue if queue restart is done
on a ring belonging to a non-default RSS context.  Patch 2 and 3 are
version 2 that has addressed the v1 issue by reducing the scope of
the traffic disruptions:

https://lore.kernel.org/netdev/CACKFLi=P9xYHVF4h2Ovjd-8DaoyzFAHnY6Y6H+1b7eGq+BQZzA@mail.gmail.com/
====================

Link: https://patch.msgid.link/20250613231841.377988-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobnxt_en: Update MRU and RSS table of RSS contexts on queue reset
Pavan Chebbi [Fri, 13 Jun 2025 23:18:41 +0000 (16:18 -0700)]
bnxt_en: Update MRU and RSS table of RSS contexts on queue reset

The commit under the Fixes tag below which updates the VNICs' RSS
and MRU during .ndo_queue_start(), needs to be extended to cover any
non-default RSS contexts which have their own VNICs.  Without this
step, packets that are destined to a non-default RSS context may be
dropped after .ndo_queue_start().

We further optimize this scheme by updating the VNIC only if the
RX ring being restarted is in the RSS table of the VNIC.  Updating
the VNIC (in particular setting the MRU to 0) will momentarily stop
all traffic to all rings in the RSS table.  Any VNIC that has the
RX ring excluded from the RSS table can skip this step and avoid the
traffic disruption.

Note that this scheme is just an improvement.  A VNIC with multiple
rings in the RSS table will still see traffic disruptions to all rings
in the RSS table when one of the rings is being restarted.  We are
working on a FW scheme that will improve upon this further.

Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table")
Reported-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobnxt_en: Add a helper function to configure MRU and RSS
Pavan Chebbi [Fri, 13 Jun 2025 23:18:40 +0000 (16:18 -0700)]
bnxt_en: Add a helper function to configure MRU and RSS

Add a new helper function that will configure MRU and RSS table
of a VNIC. This will be useful when we configure both on a VNIC
when resetting an RX ring.  This function will be used again in
the next bug fix patch where we have to reconfigure VNICs for RSS
contexts.

Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()
Kalesh AP [Fri, 13 Jun 2025 23:18:39 +0000 (16:18 -0700)]
bnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()

Before the commit under the Fixes tag below, bnxt_ulp_stop() and
bnxt_ulp_start() were always invoked in pairs.  After that commit,
the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop()
has been called.  This may result in the RoCE driver's aux driver
.suspend() method being invoked twice.  The 2nd bnxt_re_suspend()
call will crash when it dereferences a NULL pointer:

(NULL ib_device): Handle device suspend call
BUG: kernel NULL pointer dereference, address: 0000000000000b78
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S                  6.15.0-rc1 #4 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en]
RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re]
Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00
RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff
RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff
R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070
R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bnxt_ulp_stop+0x69/0x90 [bnxt_en]
bnxt_sp_task+0x678/0x920 [bnxt_en]
? __schedule+0x514/0xf50
process_scheduled_works+0x9d/0x400
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30

Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag
is already set.  This will preserve the original symmetrical
bnxt_ulp_stop() and bnxt_ulp_start().

Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED
flag after taking the mutex to avoid any race condition.  And for
symmetry, only proceed in bnxt_ulp_start() if the
BNXT_EN_FLAG_ULP_STOPPED is set.

Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 17 Jun 2025 22:48:38 +0000 (15:48 -0700)]
Merge tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-06-17

The patch is by Brett Werling, and fixes the power regulator retrieval
during probe of the tcan4x5x glue code for the m_can driver.

* tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: tcan4x5x: fix power regulator retrieval during probe
  openvswitch: Allocate struct ovs_pcpu_storage dynamically
====================

Link: https://patch.msgid.link/20250617155123.2141584-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: netmem: fix skb_ensure_writable with unreadable skbs
Mina Almasry [Sun, 15 Jun 2025 20:07:33 +0000 (20:07 +0000)]
net: netmem: fix skb_ensure_writable with unreadable skbs

skb_ensure_writable should succeed when it's trying to write to the
header of the unreadable skbs, so it doesn't need an unconditional
skb_frags_readable check. The preceding pskb_may_pull() call will
succeed if write_len is within the head and fail if we're trying to
write to the unreadable payload, so we don't need an additional check.

Removing this check restores DSCP functionality with unreadable skbs as
it's called from dscp_tg.

Cc: willemb@google.com
Cc: asml.silence@gmail.com
Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615200733.520113-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ti: icssg-prueth: Fix packet handling for XDP_TX
Meghana Malladi [Mon, 16 Jun 2025 06:33:19 +0000 (12:03 +0530)]
net: ti: icssg-prueth: Fix packet handling for XDP_TX

While transmitting XDP frames for XDP_TX, page_pool is
used to get the DMA buffers (already mapped to the pages)
and need to be freed/reycled once the transmission is complete.
This need not be explicitly done by the driver as this is handled
more gracefully by the xdp driver while returning the xdp frame.
__xdp_return() frees the XDP memory based on its memory type,
under which page_pool memory is also handled. This change fixes
the transmit queue timeout while running XDP_TX.

logs:
[  309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms
[  313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms
[  319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms
...

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250616063319.3347541-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoe1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13
Vitaly Lifshits [Sun, 25 May 2025 08:38:43 +0000 (11:38 +0300)]
e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13

On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in
the software STRAP is incorrect. This causes the PTP timer to run at the
wrong rate and can lead to synchronization issues.

The STRAP value is configured by the system firmware, and a firmware
update is not always possible. Since the XTAL clock on these systems
always runs at 38.4MHz, the driver may ignore the STRAP and just set
the correct value.

Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Reviewed-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agoice: fix eswitch code memory leak in reset scenario
Grzegorz Nitka [Fri, 16 May 2025 13:09:07 +0000 (15:09 +0200)]
ice: fix eswitch code memory leak in reset scenario

Add simple eswitch mode checker in attaching VF procedure and allocate
required port representor memory structures only in switchdev mode.
The reset flows triggers VF (if present) detach/attach procedure.
It might involve VF port representor(s) re-creation if the device is
configured is switchdev mode (not legacy one).
The memory was blindly allocated in current implementation,
regardless of the mode and not freed if in legacy mode.

Kmemeleak trace:
unreferenced object (percpu) 0x7e3bce5b888458 (size 40):
  comm "bash", pid 1784, jiffies 4295743894
  hex dump (first 32 bytes on cpu 45):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 0):
    pcpu_alloc_noprof+0x4c4/0x7c0
    ice_repr_create+0x66/0x130 [ice]
    ice_repr_create_vf+0x22/0x70 [ice]
    ice_eswitch_attach_vf+0x1b/0xa0 [ice]
    ice_reset_all_vfs+0x1dd/0x2f0 [ice]
    ice_pci_err_resume+0x3b/0xb0 [ice]
    pci_reset_function+0x8f/0x120
    reset_store+0x56/0xa0
    kernfs_fop_write_iter+0x120/0x1b0
    vfs_write+0x31c/0x430
    ksys_write+0x61/0xd0
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

Testing hints (ethX is PF netdev):
- create at least one VF
    echo 1 > /sys/class/net/ethX/device/sriov_numvfs
- trigger the reset
    echo 1 > /sys/class/net/ethX/device/reset

Fixes: 415db8399d06 ("ice: make representor code generic")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agonet: ice: Perform accurate aRFS flow match
Krishna Kumar [Tue, 20 May 2025 17:06:56 +0000 (22:36 +0530)]
net: ice: Perform accurate aRFS flow match

This patch fixes an issue seen in a large-scale deployment under heavy
incoming pkts where the aRFS flow wrongly matches a flow and reprograms the
NIC with wrong settings. That mis-steering causes RX-path latency spikes
and noisy neighbor effects when many connections collide on the same
hash (some of our production servers have 20-30K connections).

set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by
hashing the skb sized by the per rx-queue table size. This results in
multiple connections (even across different rx-queues) getting the same
hash value. The driver steer function modifies the wrong flow to use this
rx-queue, e.g.: Flow#1 is first added:
    Flow#1:  <ip1, port1, ip2, port2>, Hash 'h', q#10

Later when a new flow needs to be added:
    Flow#2:  <ip3, port3, ip4, port4>, Hash 'h', q#20

The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This
results in both flows getting un-optimized - packets for Flow#1 goes to
q#20, and then reprogrammed back to q#10 later and so on; and Flow #2
programming is never done as Flow#1 is matched first for all misses. Many
flows may wrongly share the same hash and reprogram rules of the original
flow each with their own q#.

Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf
clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s
72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS,
enable aRFS (global value is 144 * rps_flow_cnt).

Test notes about results from ice_rx_flow_steer():
---------------------------------------------------
1. "Skip:" counter increments here:
    if (fltr_info->q_index == rxq_idx ||
arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
    goto out;
2. "Add:" counter increments here:
    ret = arfs_entry->fltr_info.fltr_id;
    INIT_HLIST_NODE(&arfs_entry->list_entry);
3. "Update:" counter increments here:
    /* update the queue to forward to on an already existing flow */

Runtime comparison: original code vs with the patch for different
rps_flow_cnt values.

+-------------------------------+--------------+--------------+
| rps_flow_cnt                  |      512     |    2048      |
+-------------------------------+--------------+--------------+
| Ratio of Pkts on Good:Bad q's | 214 vs 822K  | 1.1M vs 980K |
| Avoid wrong aRFS programming  | 0 vs 310K    | 0 vs 30K     |
| CPU User                      | 216 vs 183   | 216 vs 206   |
| CPU System                    | 1441 vs 1171 | 1447 vs 1320 |
| CPU Softirq                   | 1245 vs 920  | 1238 vs 961  |
| CPU Total                     | 29 vs 22.7   | 29 vs 24.9   |
| aRFS Update                   | 533K vs 59   | 521K vs 32   |
| aRFS Skip                     | 82M vs 77M   | 7.2M vs 4.5M |
+-------------------------------+--------------+--------------+

A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections
showed no performance degradation.

Some points on the patch/aRFS behavior:
1. Enabling full tuple matching ensures flows are always correctly matched,
   even with smaller hash sizes.
2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs
   and fewer calls to driver for programming on misses.
3. Larger hash tables reduces mis-steering due to more unique flow hashes,
   but still has clashes. However, with larger per-device rps_flow_cnt, old
   flows take more time to expire and new aRFS flows cannot be added if h/w
   limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt
   pkts have been processed by this cpu that are not part of the flow).

Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
Signed-off-by: Krishna Kumar <krikku@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agocan: tcan4x5x: fix power regulator retrieval during probe
Brett Werling [Thu, 12 Jun 2025 19:18:25 +0000 (14:18 -0500)]
can: tcan4x5x: fix power regulator retrieval during probe

Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring
the regulator pointer is not set to NULL in the successful return from
devm_regulator_get_optional().

Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config")
Signed-off-by: Brett Werling <brett.werling@garmin.com>
Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 months agoopenvswitch: Allocate struct ovs_pcpu_storage dynamically
Sebastian Andrzej Siewior [Fri, 13 Jun 2025 12:36:29 +0000 (14:36 +0200)]
openvswitch: Allocate struct ovs_pcpu_storage dynamically

PERCPU_MODULE_RESERVE defines the maximum size that can by used for the
per-CPU data size used by modules. This is 8KiB.

Commit 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into
one") restructured the per-CPU memory allocation for the module and
moved the separate alloc_percpu() invocations at module init time to a
static per-CPU variable which is allocated by the module loader.

The size of the per-CPU data section for openvswitch is 6488 bytes which
is ~80% of the available per-CPU memory. Together with a few other
modules it is easy to exhaust the available 8KiB of memory.

Allocate ovs_pcpu_storage dynamically at module init time.

Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com
Fixes: 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agopldmfw: Select CRC32 when PLDMFW is selected
Simon Horman [Fri, 13 Jun 2025 16:46:20 +0000 (17:46 +0100)]
pldmfw: Select CRC32 when PLDMFW is selected

pldmfw calls crc32 code and depends on it being enabled, else
there is a link error as follows. So PLDMFW should select CRC32.

  lib/pldmfw/pldmfw.o: In function `pldmfw_flash_image':
  pldmfw.c:(.text+0x70f): undefined reference to `crc32_le_base'

This problem was introduced by commit b8265621f488 ("Add pldmfw library
for PLDM firmware update").

It manifests as of commit d69ea414c9b4 ("ice: implement device flash
update via devlink").

And is more likely to occur as of commit 9ad19171b6d6 ("lib/crc: remove
unnecessary prompt for CONFIG_CRC32 and drop 'default y'").

Found by chance while exercising builds based on tinyconfig.

Fixes: b8265621f488 ("Add pldmfw library for PLDM firmware update")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250613-pldmfw-crc32-v1-1-f3fad109eee6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoionic: Prevent driver/fw getting out of sync on devcmd(s)
Brett Creeley [Mon, 9 Jun 2025 21:28:27 +0000 (14:28 -0700)]
ionic: Prevent driver/fw getting out of sync on devcmd(s)

Some stress/negative firmware testing around devcmd(s) returning
EAGAIN found that the done bit could get out of sync in the
firmware when it wasn't cleared in a retry case.

While here, change the type of the local done variable to a bool
to match the return type from ionic_dev_cmd_done().

Fixes: ec8ee714736e ("ionic: stretch heartbeat detection")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250609212827.53842-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 12 Jun 2025 16:50:36 +0000 (09:50 -0700)]
Merge tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and wireless.

  Current release - regressions:

   - af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD

  Current release - new code bugs:

   - eth: airoha: correct enable mask for RX queues 16-31

   - veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
     disappears under traffic

   - ipv6: move fib6_config_validate() to ip6_route_add(), prevent
     invalid routes

  Previous releases - regressions:

   - phy: phy_caps: don't skip better duplex match on non-exact match

   - dsa: b53: fix untagged traffic sent via cpu tagged with VID 0

   - Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
     transient packet loss, exact reason not fully understood, yet

  Previous releases - always broken:

   - net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)

   - sched: sfq: fix a potential crash on gso_skb handling

   - Bluetooth: intel: improve rx buffer posting to avoid causing issues
     in the firmware

   - eth: intel: i40e: make reset handling robust against multiple
     requests

   - eth: mlx5: ensure FW pages are always allocated on the local NUMA
     node, even when device is configure to 'serve' another node

   - wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
     prevent kernel crashes

   - wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
     for 3 sec if fw_stats_done is not set"

* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
  net: ethtool: Don't check if RSS context exists in case of context 0
  af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
  ipv6: Move fib6_config_validate() to ip6_route_add().
  net: drv: netdevsim: don't napi_complete() from netpoll
  net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
  veth: prevent NULL pointer dereference in veth_xdp_rcv
  net_sched: remove qdisc_tree_flush_backlog()
  net_sched: ets: fix a race in ets_qdisc_change()
  net_sched: tbf: fix a race in tbf_change()
  net_sched: red: fix a race in __red_change()
  net_sched: prio: fix a race in prio_tune()
  net_sched: sch_sfq: reject invalid perturb period
  net: phy: phy_caps: Don't skip better duplex macth on non-exact match
  MAINTAINERS: Update Kuniyuki Iwashima's email address.
  selftests: net: add test case for NAT46 looping back dst
  net: clear the dst when changing skb protocol
  net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
  net/mlx5e: Fix leak of Geneve TLV option object
  net/mlx5: HWS, make sure the uplink is the last destination
  ...

4 months agoMerge tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Thu, 12 Jun 2025 15:21:13 +0000 (08:21 -0700)]
Merge tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

 - Add some missing pins on the Qualcomm QCM2290, along with a managed
   resources patch that make it clean and nice

 - Drop an unused function in the ST Micro driver

 - Drop bouncing MAINTAINER entry

 - Drop of_match_ptr() macro to rid compile warnings in the TB10x
   driver

 - Fix up calculation of pin numbers from base in the Sunxi driver

* tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
  pinctrl: tb10x: Drop of_match_ptr for ID table
  pinctrl: MAINTAINERS: Drop bouncing Jianlong Huang
  pinctrl: st: Drop unused st_gpio_bank() function
  pinctrl: qcom: pinctrl-qcm2290: Add missing pins
  pinctrl: qcom: switch to devm_gpiochip_add_data()

4 months agoMerge tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Thu, 12 Jun 2025 15:17:56 +0000 (08:17 -0700)]
Merge tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - arch_atomic64_cmpxchg relaxed variant [Jason]

 - use of inbuilt swap in stack unwinder  [Yu-Chun Lin]

 - use of __ASSEMBLER__ in kernel headers [Thomas Huth]

* tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers
  ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
  ARC: unwind: Use built-in sort swap to reduce code size and improve performance
  ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxed

4 months agoMerge tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 12 Jun 2025 15:16:47 +0000 (08:16 -0700)]
Merge tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Another quick round of updates:

 - revert mwifiex HT40 that was causing issues
 - many ath10k/ath11k/ath12k fixes
 - re-add some iwlwifi code I lost in a merge
 - use kfree_sensitive() on an error path in cfg80211

* tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: use kfree_sensitive() for connkeys cleanup
  wifi: iwlwifi: fix merge damage related to iwl_pci_resume
  Revert "wifi: mwifiex: Fix HT40 bandwidth issue."
  wifi: ath12k: fix uaf in ath12k_core_init()
  wifi: ath12k: Fix hal_reo_cmd_status kernel-doc
  wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850
  wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready
  wifi: ath11k: consistently use ath11k_mac_get_fw_stats()
  wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats()
  wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event()
  wifi: ath11k: move some firmware stats related functions outside of debugfs
  wifi: ath11k: don't wait when there is no vdev started
  wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process()
  wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
  wil6210: fix support for sparrow chipsets
  wifi: ath10k: Avoid vdev delete timeout when firmware is already down
  ath10k: snoc: fix unbalanced IRQ enable in crash recovery
====================

Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'fix-ntuple-rules-targeting-default-rss'
Jakub Kicinski [Thu, 12 Jun 2025 15:15:37 +0000 (08:15 -0700)]
Merge branch 'fix-ntuple-rules-targeting-default-rss'

Gal Pressman says:

====================
Fix ntuple rules targeting default RSS

This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.

The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.

This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
  contexts

Patch 1 fixes the validation logic to skip the existence check for
context 0.

Patch 2 adds a selftest that verifies this behavior.

v3: https://lore.kernel.org/20250609120250.1630125-1-gal@nvidia.com
v2: https://lore.kernel.org/20250225071348.509432-1-gal@nvidia.com
====================

Link: https://patch.msgid.link/20250612071958.1696361-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
Gal Pressman [Thu, 12 Jun 2025 07:19:58 +0000 (10:19 +0300)]
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context

Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).

The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
  context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
  1.

This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
   additional RSS contexts.

The test was written by AI, and reviewed by humans.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ethtool: Don't check if RSS context exists in case of context 0
Gal Pressman [Thu, 12 Jun 2025 07:19:57 +0000 (10:19 +0300)]
net: ethtool: Don't check if RSS context exists in case of context 0

Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.

The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.

For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule

An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
  context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
  1.

This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.

Fixes: de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Thu, 12 Jun 2025 15:13:47 +0000 (08:13 -0700)]
Merge tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - eir: Fix NULL pointer deference on eir_get_service_data
 - eir: Fix possible crashes on eir_create_adv_data
 - hci_sync: Fix broadcast/PA when using an existing instance
 - ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
 - ISO: Fix not using bc_sid as advertisement SID
 - MGMT: Fix sparse errors

* tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: MGMT: Fix sparse errors
  Bluetooth: ISO: Fix not using bc_sid as advertisement SID
  Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
  Bluetooth: eir: Fix possible crashes on eir_create_adv_data
  Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance
  Bluetooth: Fix NULL pointer deference on eir_get_service_data
====================

Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoaf_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
Kuniyuki Iwashima [Wed, 11 Jun 2025 20:27:35 +0000 (13:27 -0700)]
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.

Before the cited commit, the kernel unconditionally embedded SCM
credentials to skb for embryo sockets even when both the sender
and listener disabled SO_PASSCRED and SO_PASSPIDFD.

Now, the credentials are added to skb only when configured by the
sender or the listener.

However, as reported in the link below, it caused a regression for
some programs that assume credentials are included in every skb,
but sometimes not now.

The only problematic scenario would be that a socket starts listening
before setting the option.  Then, there will be 2 types of non-small
race window, where a client can send skb without credentials, which
the peer receives as an "invalid" message (and aborts the connection
it seems ?):

  Client                    Server
  ------                    ------
                            s1.listen()  <-- No SO_PASS{CRED,PIDFD}
  s2.connect()
  s2.send()  <-- w/o cred
                            s1.setsockopt(SO_PASS{CRED,PIDFD})
  s2.send()  <-- w/  cred

or

  Client                    Server
  ------                    ------
                            s1.listen()  <-- No SO_PASS{CRED,PIDFD}
  s2.connect()
  s2.send()  <-- w/o cred
                            s3, _ = s1.accept()  <-- Inherit cred options
  s2.send()  <-- w/o cred                            but not set yet

                            s3.setsockopt(SO_PASS{CRED,PIDFD})
  s2.send()  <-- w/  cred

It's unfortunate that buggy programs depend on the behaviour,
but let's restore the previous behaviour.

Fixes: 3f84d577b79d ("af_unix: Inherit sk_flags at connect().")
Reported-by: Jacek Łuczak <difrost.kernel@gmail.com>
Closes: https://lore.kernel.org/all/68d38b0b-1666-4974-85d4-15575789c8d4@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Jacek Łuczak <difrost.kernel@gmail.com>
Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoipv6: Move fib6_config_validate() to ip6_route_add().
Kuniyuki Iwashima [Wed, 11 Jun 2025 19:35:02 +0000 (12:35 -0700)]
ipv6: Move fib6_config_validate() to ip6_route_add().

syzkaller created an IPv6 route from a malformed packet, which has
a prefix len > 128, triggering the splat below. [0]

This is a similar issue fixed by commit 586ceac9acb7 ("ipv6: Restore
fib6_config validation for SIOCADDRT.").

The cited commit removed fib6_config validation from some callers
of ip6_add_route().

Let's move the validation back to ip6_route_add() and
ip6_route_multipath_add().

[0]:
UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34
index 20 is out of range for type '__u8 [16]'
CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted 6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132
[<ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138
[<ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline]
[<ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120
[<ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129
[<ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233
[<ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455
[<ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline]
[<ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793
[<ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889
[<ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487
[<ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878
[<ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570
[<ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874
[<ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988
[<ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436
[<ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491
[<ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588
[<ffffffff85af6112>] dst_input include/net/dst.h:469 [inline]
[<ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309
[<ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977
[<ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090
[<ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline]
[<ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235
[<ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485
[<ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938
[<ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984
[<ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline]
[<ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686
[<ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738
[<ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline]
[<ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline]
[<ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746
[<ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112
[<ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341
[<ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197

Fixes: fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6849b8c3.a00a0220.1eb5f5.00f0.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: drv: netdevsim: don't napi_complete() from netpoll
Jakub Kicinski [Wed, 11 Jun 2025 17:46:43 +0000 (10:46 -0700)]
net: drv: netdevsim: don't napi_complete() from netpoll

netdevsim supports netpoll. Make sure we don't call napi_complete()
from it, since it may not be scheduled. Breno reports hitting a
warning in napi_complete_done():

WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560
  __napi_poll+0x2d8/0x3a0
  handle_softirqs+0x1fe/0x710

This is presumably after netpoll stole the SCHED bit prematurely.

Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
Dan Carpenter [Wed, 11 Jun 2025 13:14:32 +0000 (16:14 +0300)]
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()

Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.

Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoveth: prevent NULL pointer dereference in veth_xdp_rcv
Jesper Dangaard Brouer [Wed, 11 Jun 2025 12:40:04 +0000 (14:40 +0200)]
veth: prevent NULL pointer dereference in veth_xdp_rcv

The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).

This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.

This fixes a bug introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.

Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/fecfcad0-7a16-42b8-bff2-66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/683da55e.a00a0220.d8eae.0052.GAE@google.com/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net_sched-no-longer-use-qdisc_tree_flush_backlog'
Jakub Kicinski [Thu, 12 Jun 2025 15:05:53 +0000 (08:05 -0700)]
Merge branch 'net_sched-no-longer-use-qdisc_tree_flush_backlog'

Eric Dumazet says:

====================
net_sched: no longer use qdisc_tree_flush_backlog()

This series is based on a report from Gerrard Tai.

Essentially, all users of qdisc_tree_flush_backlog() are racy.

We must instead use qdisc_purge_queue().
====================

Link: https://patch.msgid.link/20250611111515.1983366-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: remove qdisc_tree_flush_backlog()
Eric Dumazet [Wed, 11 Jun 2025 11:15:15 +0000 (11:15 +0000)]
net_sched: remove qdisc_tree_flush_backlog()

This function is no longer used after the four prior fixes.

Given all prior uses were wrong, it seems better to remove it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: ets: fix a race in ets_qdisc_change()
Eric Dumazet [Wed, 11 Jun 2025 11:15:14 +0000 (11:15 +0000)]
net_sched: ets: fix a race in ets_qdisc_change()

Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.

The race is as follows:

CPU 0                                 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
 |
 |                                    [5]: lock root
 |                                    [6]: rehash
 |                                    [7]: qdisc_tree_reduce_backlog()
 |
[4]: qdisc_put()

This can be abused to underflow a parent's qlen.

Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.

Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: tbf: fix a race in tbf_change()
Eric Dumazet [Wed, 11 Jun 2025 11:15:13 +0000 (11:15 +0000)]
net_sched: tbf: fix a race in tbf_change()

Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.

The race is as follows:

CPU 0                                 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
 |
 |                                    [5]: lock root
 |                                    [6]: rehash
 |                                    [7]: qdisc_tree_reduce_backlog()
 |
[4]: qdisc_put()

This can be abused to underflow a parent's qlen.

Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.

Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: red: fix a race in __red_change()
Eric Dumazet [Wed, 11 Jun 2025 11:15:12 +0000 (11:15 +0000)]
net_sched: red: fix a race in __red_change()

Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.

The race is as follows:

CPU 0                                 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
 |
 |                                    [5]: lock root
 |                                    [6]: rehash
 |                                    [7]: qdisc_tree_reduce_backlog()
 |
[4]: qdisc_put()

This can be abused to underflow a parent's qlen.

Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.

Fixes: 0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: prio: fix a race in prio_tune()
Eric Dumazet [Wed, 11 Jun 2025 11:15:11 +0000 (11:15 +0000)]
net_sched: prio: fix a race in prio_tune()

Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.

The race is as follows:

CPU 0                                 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
 |
 |                                    [5]: lock root
 |                                    [6]: rehash
 |                                    [7]: qdisc_tree_reduce_backlog()
 |
[4]: qdisc_put()

This can be abused to underflow a parent's qlen.

Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.

Fixes: 7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet_sched: sch_sfq: reject invalid perturb period
Eric Dumazet [Wed, 11 Jun 2025 08:35:01 +0000 (08:35 +0000)]
net_sched: sch_sfq: reject invalid perturb period

Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.

We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.

Tested:

tc qd add dev lo root sfq perturb -10   # negative value : error
Error: sch_sfq: invalid perturb period.

tc qd add dev lo root sfq perturb 1000000000 # too big : error
Error: sch_sfq: invalid perturb period.

tc qd add dev lo root sfq perturb 2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: phy_caps: Don't skip better duplex macth on non-exact match
Maxime Chevallier [Fri, 6 Jun 2025 09:43:20 +0000 (11:43 +0200)]
net: phy: phy_caps: Don't skip better duplex macth on non-exact match

When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.

Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.

Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/20250603102500.4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes: fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Thu, 12 Jun 2025 02:00:20 +0000 (19:00 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull BPF fixes from Alexei Starovoitov

 - Fix libbpf backward compatibility (Andrii Nakryiko)

 - Add Stanislav Fomichev as bpf/net reviewer

 - Fix resolve_btfid build when cross compiling (Suleiman Souhlal)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  MAINTAINERS: Add myself as bpf networking reviewer
  tools/resolve_btfids: Fix build when cross compiling kernel with clang.
  libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux correctly

4 months agoMAINTAINERS: Update Kuniyuki Iwashima's email address.
Kuniyuki Iwashima [Tue, 10 Jun 2025 23:56:58 +0000 (23:56 +0000)]
MAINTAINERS: Update Kuniyuki Iwashima's email address.

I left Amazon and joined Google, so let's map the email
addresses accordingly.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250610235734.88540-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: add test case for NAT46 looping back dst
Jakub Kicinski [Tue, 10 Jun 2025 00:12:45 +0000 (17:12 -0700)]
selftests: net: add test case for NAT46 looping back dst

Simple test for crash involving multicast loopback and stale dst.
Reuse exising NAT46 program.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610001245.1981782-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: clear the dst when changing skb protocol
Jakub Kicinski [Tue, 10 Jun 2025 00:12:44 +0000 (17:12 -0700)]
net: clear the dst when changing skb protocol

A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
    ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
    ipv6_rcv (net/ipv6/ip6_input.c:306:8)
    process_backlog (net/core/dev.c:6186:4)
    napi_poll (net/core/dev.c:6906:9)
    net_rx_action (net/core/dev.c:7028:13)
    do_softirq (kernel/softirq.c:462:3)
    netif_rx (net/core/dev.c:5326:3)
    dev_loopback_xmit (net/core/dev.c:4015:2)
    ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
    NF_HOOK (./include/linux/netfilter.h:314:9)
    ip_mc_output (net/ipv4/ip_output.c:400:5)
    dst_output (./include/net/dst.h:459:9)
    ip_local_out (net/ipv4/ip_output.c:130:9)
    ip_send_skb (net/ipv4/ip_output.c:1496:8)
    udp_send_skb (net/ipv4/udp.c:1040:8)
    udp_sendmsg (net/ipv4/udp.c:1328:10)

The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.

Clear the dst in all BPF helpers which change the protocol.
Try to preserve metadata dsts, those may carry non-routing
metadata.

Cc: stable@vger.kernel.org
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610001245.1981782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agomm: pgtable: fix pte_swp_exclusive
Magnus Lindholm [Tue, 18 Feb 2025 17:55:14 +0000 (18:55 +0100)]
mm: pgtable: fix pte_swp_exclusive

Make pte_swp_exclusive return bool instead of int.  This will better
reflect how pte_swp_exclusive is actually used in the code.

This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not
returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper
32-bits of PTE (like on alpha).

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Cc: Sam James <sam@gentoo.org>
Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/
Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/
[ Applied as the 'sed' script Al suggested   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge branch 'mlx5-misc-fixes-2025-06-10'
Jakub Kicinski [Wed, 11 Jun 2025 21:41:15 +0000 (14:41 -0700)]
Merge branch 'mlx5-misc-fixes-2025-06-10'

Mark Bloch says:

====================
mlx5 misc fixes 2025-06-10

This patchset includes misc fixes from the team for the mlx5 core
and Ethernet drivers.
====================

Link: https://patch.msgid.link/20250610151514.1094735-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
Shahar Shitrit [Tue, 10 Jun 2025 15:15:14 +0000 (18:15 +0300)]
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper

When the link is up, either eth_proto_oper or ext_eth_proto_oper
typically reports the active link protocol, from which both speed
and number of lanes can be retrieved. However, in certain cases,
such as when a NIC is connected via a non-standard cable, the
firmware may not report the protocol.

In such scenarios, the speed can still be obtained from the
data_rate_oper field in PTYS register. Since data_rate_oper
provides only speed information and lacks lane details, it is
incorrect to derive the number of lanes from it.

This patch corrects the behavior by setting the number of lanes to
UNKNOWN instead of incorrectly using MAX_LANES when relying on
data_rate_oper.

Fixes: 7e959797f021 ("net/mlx5e: Enable lanes configuration when auto-negotiation is off")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5e: Fix leak of Geneve TLV option object
Jianbo Liu [Tue, 10 Jun 2025 15:15:13 +0000 (18:15 +0300)]
net/mlx5e: Fix leak of Geneve TLV option object

Previously, a unique tunnel id was added for the matching on TC
non-zero chains, to support inner header rewrite with goto action.
Later, it was used to support VF tunnel offload for vxlan, then for
Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is
used to parse tunnel options. For Geneve, if there is TLV option, a
object is created, or refcnt is added if already exists. But the
temporary mlx5_flow_spec is directly freed after parsing, which causes
the leak because no information regarding the object is saved in
flow's mlx5_flow_spec, which is used to free the object when deleting
the flow.

To fix the leak, call mlx5_geneve_tlv_option_del() before free the
temporary spec if it has TLV object.

Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: HWS, make sure the uplink is the last destination
Vlad Dogaru [Tue, 10 Jun 2025 15:15:11 +0000 (18:15 +0300)]
net/mlx5: HWS, make sure the uplink is the last destination

When there are more than one destinations, we create a FW flow
table and provide it with all the destinations. FW requires to
have wire as the last destination in the list (if it exists),
otherwise the operation fails with FW syndrome.

This patch fixes the destination array action creation: if it
contains a wire destination, it is moved to the end.

Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: HWS, fix missing ip_version handling in definer
Yevgeny Kliteynik [Tue, 10 Jun 2025 15:15:10 +0000 (18:15 +0300)]
net/mlx5: HWS, fix missing ip_version handling in definer

Fix missing field handling in definer - outer IP version.

Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: HWS, Init mutex on the correct path
Vlad Dogaru [Tue, 10 Jun 2025 15:15:09 +0000 (18:15 +0300)]
net/mlx5: HWS, Init mutex on the correct path

The newly introduced mutex is only used for reformat actions, but it was
initialized for modify header instead.

The struct that contains the mutex is zero-initialized and an all-zero
mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES.

Fixes: b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: Fix return value when searching for existing flow group
Patrisious Haddad [Tue, 10 Jun 2025 15:15:08 +0000 (18:15 +0300)]
net/mlx5: Fix return value when searching for existing flow group

When attempting to add a rule to an existing flow group, if a matching
flow group exists but is not active, the error code returned should be
EAGAIN, so that the rule can be added to the matching flow group once
it is active, rather than ENOENT, which indicates that no matching
flow group was found.

Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: Fix ECVF vports unload on shutdown flow
Amir Tzin [Tue, 10 Jun 2025 15:15:07 +0000 (18:15 +0300)]
net/mlx5: Fix ECVF vports unload on shutdown flow

Fix shutdown flow UAF when a virtual function is created on the embedded
chip (ECVF) of a BlueField device. In such case the vport acl ingress
table is not properly destroyed.

ECVF functionality is independent of ecpf_vport_exists capability and
thus functions mlx5_eswitch_(enable|disable)_pf_vf_vports() should not
test it when enabling/disabling ECVF vports.

kernel log:
[] refcount_t: underflow; use-after-free.
[] WARNING: CPU: 3 PID: 1 at lib/refcount.c:28
   refcount_warn_saturate+0x124/0x220
----------------
[] Call trace:
[] refcount_warn_saturate+0x124/0x220
[] tree_put_node+0x164/0x1e0 [mlx5_core]
[] mlx5_destroy_flow_table+0x98/0x2c0 [mlx5_core]
[] esw_acl_ingress_table_destroy+0x28/0x40 [mlx5_core]
[] esw_acl_ingress_lgcy_cleanup+0x80/0xf4 [mlx5_core]
[] esw_legacy_vport_acl_cleanup+0x44/0x60 [mlx5_core]
[] esw_vport_cleanup+0x64/0x90 [mlx5_core]
[] mlx5_esw_vport_disable+0xc0/0x1d0 [mlx5_core]
[] mlx5_eswitch_unload_ec_vf_vports+0xcc/0x150 [mlx5_core]
[] mlx5_eswitch_disable_sriov+0x198/0x2a0 [mlx5_core]
[] mlx5_device_disable_sriov+0xb8/0x1e0 [mlx5_core]
[] mlx5_sriov_detach+0x40/0x50 [mlx5_core]
[] mlx5_unload+0x40/0xc4 [mlx5_core]
[] mlx5_unload_one_devl_locked+0x6c/0xe4 [mlx5_core]
[] mlx5_unload_one+0x3c/0x60 [mlx5_core]
[] shutdown+0x7c/0xa4 [mlx5_core]
[] pci_device_shutdown+0x3c/0xa0
[] device_shutdown+0x170/0x340
[] __do_sys_reboot+0x1f4/0x2a0
[] __arm64_sys_reboot+0x2c/0x40
[] invoke_syscall+0x78/0x100
[] el0_svc_common.constprop.0+0x54/0x184
[] do_el0_svc+0x30/0xac
[] el0_svc+0x48/0x160
[] el0t_64_sync_handler+0xa4/0x12c
[] el0t_64_sync+0x1a4/0x1a8
[] --[ end trace 9c4601d68c70030e ]---

Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: Ensure fw pages are always allocated on same NUMA
Moshe Shemesh [Tue, 10 Jun 2025 15:15:06 +0000 (18:15 +0300)]
net/mlx5: Ensure fw pages are always allocated on same NUMA

When firmware asks the driver to allocate more pages, using event of
give_pages, the driver should always allocate it from same NUMA, the
original device NUMA. Current code uses dev_to_node() which can result
in different NUMA as it is changed by other driver flows, such as
mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
allocating firmware pages.

Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Wed, 11 Jun 2025 21:28:51 +0000 (14:28 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-06-10 (i40e, iavf, ice, e1000)

For i40e:
Robert Malz improves reset handling for situations where multiple reset
requests could cause some to be missed.

For iavf:
Ahmed adds detection, and handling, of reset that could occur early in
the initialization process to stop long wait/hangs.

For ice:
Anton, properly, sets missed use_nsecs value.

For e1000:
Joe Damato moves cancel_work_sync() call to avoid deadlock.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000: Move cancel_work_sync to avoid deadlock
  ice/ptp: fix crosstimestamp reporting
  iavf: fix reset_task for early reset event
  i40e: retry VFLR handling if there is ongoing VF reset
  i40e: return false from i40e_reset_vf if reset is in progress
====================

Link: https://patch.msgid.link/20250610171348.1476574-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoBluetooth: MGMT: Fix sparse errors
Luiz Augusto von Dentz [Wed, 11 Jun 2025 20:36:27 +0000 (16:36 -0400)]
Bluetooth: MGMT: Fix sparse errors

This fixes the following errors:

net/bluetooth/mgmt.c:5400:59: sparse: sparse: incorrect type in argument 3
(different base types) @@     expected unsigned short [usertype] handle @@
got restricted __le16 [usertype] monitor_handle @@
net/bluetooth/mgmt.c:5400:59: sparse:     expected unsigned short [usertype] handle
net/bluetooth/mgmt.c:5400:59: sparse:     got restricted __le16 [usertype] monitor_handle

Fixes: e6ed54e86aae ("Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506060347.ux2O1p7L-lkp@intel.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoBluetooth: ISO: Fix not using bc_sid as advertisement SID
Luiz Augusto von Dentz [Fri, 6 Jun 2025 18:32:49 +0000 (14:32 -0400)]
Bluetooth: ISO: Fix not using bc_sid as advertisement SID

Currently bc_sid is being ignore when acting as Broadcast Source role,
so this fix it by passing the bc_sid and then use it when programming
the PA:

< HCI Command: LE Set Exte.. (0x08|0x0036) plen 25
        Handle: 0x01
        Properties: 0x0000
        Min advertising interval: 140.000 msec (0x00e0)
        Max advertising interval: 140.000 msec (0x00e0)
        Channel map: 37, 38, 39 (0x07)
        Own address type: Random (0x01)
        Peer address type: Public (0x00)
        Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
        Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00)
        TX power: Host has no preference (0x7f)
        Primary PHY: LE 1M (0x01)
        Secondary max skip: 0x00
        Secondary PHY: LE 2M (0x02)
        SID: 0x01
        Scan request notifications: Disabled (0x00)

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoBluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
Luiz Augusto von Dentz [Mon, 9 Jun 2025 18:53:53 +0000 (14:53 -0400)]
Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets

BT_SK_PA_SYNC is only valid for Broadcast Sinks which means socket used
for Broadcast Sources wouldn't be able to use the likes of getpeername
to read out the sockaddr_iso_bc fields which may have been update (e.g.
bc_sid).

Fixes: 0a766a0affb5 ("Bluetooth: ISO: Fix getpeername not returning sockaddr_iso_bc fields")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoBluetooth: eir: Fix possible crashes on eir_create_adv_data
Luiz Augusto von Dentz [Tue, 10 Jun 2025 14:14:35 +0000 (10:14 -0400)]
Bluetooth: eir: Fix possible crashes on eir_create_adv_data

eir_create_adv_data may attempt to add EIR_FLAGS and EIR_TX_POWER
without checking if that would fit.

Link: https://github.com/bluez/bluez/issues/1117#issuecomment-2958244066
Fixes: 01ce70b0a274 ("Bluetooth: eir: Move EIR/Adv Data functions to its own file")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoBluetooth: hci_sync: Fix broadcast/PA when using an existing instance
Luiz Augusto von Dentz [Thu, 5 Jun 2025 15:15:16 +0000 (11:15 -0400)]
Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance

When using and existing adv_info instance for broadcast source it
needs to be updated to periodic first before it can be reused, also in
case the existing instance already have data hci_set_adv_instance_data
cannot be used directly since it would overwrite the existing data so
this reappend the original data after the Broadcast ID, if one was
generated.

Example:

bluetoothctl># Add PBP to EA so it can be later referenced as the BIS ID
bluetoothctl> advertise.service 0x1856 0x00 0x00
bluetoothctl> advertise on
...
< HCI Command: LE Set Extended Advertising Data (0x08|0x0037) plen 13
        Handle: 0x01
        Operation: Complete extended advertising data (0x03)
        Fragment preference: Minimize fragmentation (0x01)
        Data length: 0x09
        Service Data: Public Broadcast Announcement (0x1856)
          Data[2]: 0000
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
...
bluetoothctl># Attempt to acquire Broadcast Source transport
bluetoothctl>transport.acquire /org/bluez/hci0/pac_bcast0/fd0
...
< HCI Command: LE Set Extended Advertising Data (0x08|0x0037) plen 255
        Handle: 0x01
        Operation: Complete extended advertising data (0x03)
        Fragment preference: Minimize fragmentation (0x01)
        Data length: 0x0e
        Service Data: Broadcast Audio Announcement (0x1852)
        Broadcast ID: 11371620 (0xad8464)
        Service Data: Public Broadcast Announcement (0x1856)
          Data[2]: 0000
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported

Link: https://github.com/bluez/bluez/issues/1117
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoBluetooth: Fix NULL pointer deference on eir_get_service_data
Luiz Augusto von Dentz [Thu, 5 Jun 2025 15:14:25 +0000 (11:14 -0400)]
Bluetooth: Fix NULL pointer deference on eir_get_service_data

The len parameter is considered optional so it can be NULL so it cannot
be used for skipping to next entry of EIR_SERVICE_DATA.

Fixes: 8f9ae5b3ae80 ("Bluetooth: eir: Add helpers for managing service data")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
4 months agoKEYS: Invert FINAL_PUT bit
Herbert Xu [Wed, 28 May 2025 12:18:17 +0000 (13:18 +0100)]
KEYS: Invert FINAL_PUT bit

Invert the FINAL_PUT bit so that test_bit_acquire and clear_bit_unlock
can be used instead of smp_mb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agonet/mdiobus: Fix potential out-of-bounds clause 45 read/write access
Jakub Raczynski [Mon, 9 Jun 2025 15:31:47 +0000 (17:31 +0200)]
net/mdiobus: Fix potential out-of-bounds clause 45 read/write access

When using publicly available tools like 'mdio-tools' to read/write data
from/to network interface and its PHY via C45 (clause 45) mdiobus,
there is no verification of parameters passed to the ioctl and
it accepts any mdio address.
Currently there is support for 32 addresses in kernel via PHY_MAX_ADDR define,
but it is possible to pass higher value than that via ioctl.
While read/write operation should generally fail in this case,
mdiobus provides stats array, where wrong address may allow out-of-bounds
read/write.

Fix that by adding address verification before C45 read/write operation.
While this excludes this access from any statistics, it improves security of
read/write operation.

Fixes: 4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reported-by: Wenjing Shan <wenjing.shan@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agonet/mdiobus: Fix potential out-of-bounds read/write access
Jakub Raczynski [Mon, 9 Jun 2025 15:31:46 +0000 (17:31 +0200)]
net/mdiobus: Fix potential out-of-bounds read/write access

When using publicly available tools like 'mdio-tools' to read/write data
from/to network interface and its PHY via mdiobus, there is no verification of
parameters passed to the ioctl and it accepts any mdio address.
Currently there is support for 32 addresses in kernel via PHY_MAX_ADDR define,
but it is possible to pass higher value than that via ioctl.
While read/write operation should generally fail in this case,
mdiobus provides stats array, where wrong address may allow out-of-bounds
read/write.

Fix that by adding address verification before read/write operation.
While this excludes this access from any statistics, it improves security of
read/write operation.

Fixes: 080bb352fad00 ("net: phy: Maintain MDIO device and bus statistics")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reported-by: Wenjing Shan <wenjing.shan@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agowifi: cfg80211: use kfree_sensitive() for connkeys cleanup
Zilin Guan [Fri, 23 May 2025 11:01:56 +0000 (11:01 +0000)]
wifi: cfg80211: use kfree_sensitive() for connkeys cleanup

The nl80211_parse_connkeys() function currently uses kfree() to release
the 'result' structure in error handling paths. However, if an error
occurs due to result->def being less than 0, the 'result' structure may
contain sensitive information.

To prevent potential leakage of sensitive data, replace kfree() with
kfree_sensitive() when freeing 'result'. This change aligns with the
approach used in its caller, nl80211_join_ibss(), enhancing the overall
security of the wireless subsystem.

Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20250523110156.4017111-1-zilin@seu.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 months agoMerge tag 'ath-current-20250608' of git://git.kernel.org/pub/scm/linux/kernel/git...
Johannes Berg [Wed, 11 Jun 2025 09:28:35 +0000 (11:28 +0200)]
Merge tag 'ath-current-20250608' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git updates for v6.16-rc2

Fix a handful of both build and stability issues across multiple drivers.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 months agowifi: iwlwifi: fix merge damage related to iwl_pci_resume
Emmanuel Grumbach [Tue, 3 Jun 2025 09:17:54 +0000 (12:17 +0300)]
wifi: iwlwifi: fix merge damage related to iwl_pci_resume

The changes I made in
wifi: iwlwifi: don't warn if the NIC is gone in resume
conflicted with a major rework done in this area.
The merge de-facto removed my changes.
Re-add them.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20250603091754.70182-1-emmanuel.grumbach@intel.com
Fixes: 06c4b2036818 ("Merge tag 'iwlwifi-next-2025-05-15' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 months agoRevert "wifi: mwifiex: Fix HT40 bandwidth issue."
Francesco Dolcini [Thu, 5 Jun 2025 13:03:02 +0000 (15:03 +0200)]
Revert "wifi: mwifiex: Fix HT40 bandwidth issue."

This reverts commit 4fcfcbe45734 ("wifi: mwifiex: Fix HT40 bandwidth
issue.")

That commit introduces a regression, when HT40 mode is enabled,
received packets are lost, this was experience with W8997 with both
SDIO-UART and SDIO-SDIO variants. From an initial investigation the
issue solves on its own after some time, but it's not clear what is
the reason. Given that this was just a performance optimization, let's
revert it till we have a better understanding of the issue and a proper
fix.

Cc: Jeff Chen <jeff.chen_1@nxp.com>
Cc: stable@vger.kernel.org
Fixes: 4fcfcbe45734 ("wifi: mwifiex: Fix HT40 bandwidth issue.")
Closes: https://lore.kernel.org/all/20250603203337.GA109929@francesco-nb/
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://patch.msgid.link/20250605130302.55555-1-francesco@dolcini.it
[fix commit reference format]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 months agomacsec: MACsec SCI assignment for ES = 0
Carlos Fernandez [Mon, 9 Jun 2025 07:26:26 +0000 (09:26 +0200)]
macsec: MACsec SCI assignment for ES = 0

According to 802.1AE standard, when ES and SC flags in TCI are zero,
used SCI should be the current active SC_RX. Current code uses the
header MAC address. Without this patch, when ES flag is 0 (using a
bridge or switch), header MAC will not fit the SCI and MACSec frames
will be discarted.

In order to test this issue, MACsec link should be stablished between
two interfaces, setting SC and ES flags to zero and a port identifier
different than one. For example, using ip macsec tools:

ip link add link $ETH0 macsec0 type macsec port 11 send_sci off
end_station off
ip macsec add macsec0 tx sa 0 pn 2 on key 01 $ETH1_KEY
ip macsec add macsec0 rx port 11 address $ETH1_MAC
ip macsec add macsec0 rx port 11 address $ETH1_MAC sa 0 pn 2 on key 02
ip link set dev macsec0 up

ip link add link $ETH1 macsec1 type macsec port 11 send_sci off
end_station off
ip macsec add macsec1 tx sa 0 pn 2 on key 01 $ETH0_KEY
ip macsec add macsec1 rx port 11 address $ETH0_MAC
ip macsec add macsec1 rx port 11 address $ETH0_MAC sa 0 pn 2 on key 02
ip link set dev macsec1 up

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Co-developed-by: Andreu Montiel <Andreu.Montiel@technica-engineering.de>
Signed-off-by: Andreu Montiel <Andreu.Montiel@technica-engineering.de>
Signed-off-by: Carlos Fernandez <carlos.fernandez@technica-engineering.de>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agonet: airoha: Enable RX queues 16-31
Lorenzo Bianconi [Mon, 9 Jun 2025 20:40:35 +0000 (22:40 +0200)]
net: airoha: Enable RX queues 16-31

Fix RX_DONE_INT_MASK definition in order to enable RX queues 16-31.

Fixes: f252493e18353 ("net: airoha: Enable multiple IRQ lines support in airoha_eth driver.")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250609-aioha-fix-rx-queue-mask-v1-1-f33706a06fa2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetconsole: fix appending sysdata when sysdata_fields == SYSDATA_RELEASE
Gustavo Luiz Duarte [Mon, 9 Jun 2025 18:24:20 +0000 (11:24 -0700)]
netconsole: fix appending sysdata when sysdata_fields == SYSDATA_RELEASE

Before appending sysdata, prepare_extradata() checks if any feature is
enabled in sysdata_fields (and exits early if none is enabled).

When SYSDATA_RELEASE was introduced, we missed adding it to the list of
features being checked against sysdata_fields in prepare_extradata().
The result was that, if only SYSDATA_RELEASE is enabled in
sysdata_fields, we incorreclty exit early and fail to append the
release.

Instead of checking specific bits in sysdata_fields, check if
sysdata_fields has ALL bit zeroed and exit early if true. This fixes
case when only SYSDATA_RELEASE enabled and makes the code more general /
less error prone in future feature implementation.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Fixes: cfcc9239e78a ("netconsole: append release to sysdata")
Link: https://patch.msgid.link/20250609-netconsole-fix-v1-1-17543611ae31@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: Fix TOCTOU issue in sk_is_readable()
Michal Luczaj [Mon, 9 Jun 2025 17:08:03 +0000 (19:08 +0200)]
net: Fix TOCTOU issue in sk_is_readable()

sk->sk_prot->sock_is_readable is a valid function pointer when sk resides
in a sockmap. After the last sk_psock_put() (which usually happens when
socket is removed from sockmap), sk->sk_prot gets restored and
sk->sk_prot->sock_is_readable becomes NULL.

This makes sk_is_readable() racy, if the value of sk->sk_prot is reloaded
after the initial check. Which in turn may lead to a null pointer
dereference.

Ensure the function pointer does not turn NULL after the check.

Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250609-skisreadable-toctou-v1-1-d0dfb2d62c37@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: usb: r8152: Add device ID for TP-Link UE200
Lucas Sanchez Sagrado [Mon, 9 Jun 2025 14:55:36 +0000 (16:55 +0200)]
net: usb: r8152: Add device ID for TP-Link UE200

The TP-Link UE200 is a RTL8152B based USB 2.0 Fast Ethernet adapter. This
patch adds its device ID. It has been tested on Ubuntu 22.04.5.

Signed-off-by: Lucas Sanchez Sagrado <lucsansag@gmail.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/20250609145536.26648-1-lucsansag@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMAINTAINERS: Add myself as bpf networking reviewer
Stanislav Fomichev [Tue, 10 Jun 2025 17:54:37 +0000 (10:54 -0700)]
MAINTAINERS: Add myself as bpf networking reviewer

I've been focusing on networking BPF bits lately, add myself as a
reviewer.

Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250610175442.2138504-1-stfomichev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agoRevert "mm/damon/Kconfig: enable CONFIG_DAMON by default"
Linus Torvalds [Tue, 10 Jun 2025 17:22:15 +0000 (10:22 -0700)]
Revert "mm/damon/Kconfig: enable CONFIG_DAMON by default"

This reverts commit 28615e6eed152f2fda5486680090b74aeed7b554.

No, we don't make random features default to being on.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoe1000: Move cancel_work_sync to avoid deadlock
Joe Damato [Tue, 3 Jun 2025 16:34:01 +0000 (16:34 +0000)]
e1000: Move cancel_work_sync to avoid deadlock

Previously, e1000_down called cancel_work_sync for the e1000 reset task
(via e1000_down_and_stop), which takes RTNL.

As reported by users and syzbot, a deadlock is possible in the following
scenario:

CPU 0:
  - RTNL is held
  - e1000_close
  - e1000_down
  - cancel_work_sync (cancel / wait for e1000_reset_task())

CPU 1:
  - process_one_work
  - e1000_reset_task
  - take RTNL

To remedy this, avoid calling cancel_work_sync from e1000_down
(e1000_reset_task does nothing if the device is down anyway). Instead,
call cancel_work_sync for e1000_reset_task when the device is being
removed.

Fixes: e400c7444d84 ("e1000: Hold RTNL when e1000_down can be called")
Reported-by: syzbot+846bb38dc67fe62cc733@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/683837bf.a00a0220.52848.0003.GAE@google.com/
Reported-by: John <john.cs.hey@gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=OEsn4y_2LvkO3UtDWurKcGPnZ_NPSXK=FbgygNXL37Sw@mail.gmail.com/
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agoice/ptp: fix crosstimestamp reporting
Anton Nadezhdin [Tue, 20 May 2025 08:42:16 +0000 (10:42 +0200)]
ice/ptp: fix crosstimestamp reporting

Set use_nsecs=true as timestamp is reported in ns. Lack of this result
in smaller timestamp error window which cause error during phc2sys
execution on E825 NICs:
phc2sys[1768.256]: ioctl PTP_SYS_OFFSET_PRECISE: Invalid argument

This problem was introduced in the cited commit which omitted setting
use_nsecs to true when converting the ice driver to use
convert_base_to_cs().

Testing hints (ethX is PF netdev):
phc2sys -s ethX -c CLOCK_REALTIME  -O 37 -m
phc2sys[1769.256]: CLOCK_REALTIME phc offset -5 s0 freq      -0 delay    0

Fixes: d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()")
Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agoiavf: fix reset_task for early reset event
Ahmed Zaki [Thu, 24 Apr 2025 13:50:13 +0000 (15:50 +0200)]
iavf: fix reset_task for early reset event

If a reset event is received from the PF early in the init cycle, the
state machine hangs for about 25 seconds.

Reproducer:
  echo 1 > /sys/class/net/$PF0/device/sriov_numvfs
  ip link set dev $PF0 vf 0 mac $NEW_MAC

The log shows:
  [792.620416] ice 0000:5e:00.0: Enabling 1 VFs
  [792.738812] iavf 0000:5e:01.0: enabling device (0000 -> 0002)
  [792.744182] ice 0000:5e:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF
  [792.839964] ice 0000:5e:00.0: Setting MAC 52:54:00:00:00:11 on VF 0. VF driver will be reinitialized
  [813.389684] iavf 0000:5e:01.0: Failed to communicate with PF; waiting before retry
  [818.635918] iavf 0000:5e:01.0: Hardware came out of reset. Attempting reinit.
  [818.766273] iavf 0000:5e:01.0: Multiqueue Enabled: Queue pair count = 16

Fix it by scheduling the reset task and making the reset task capable of
resetting early in the init cycle.

Fixes: ef8693eb90ae3 ("i40evf: refactor reset handling")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agoi40e: retry VFLR handling if there is ongoing VF reset
Robert Malz [Tue, 20 May 2025 08:31:52 +0000 (10:31 +0200)]
i40e: retry VFLR handling if there is ongoing VF reset

When a VFLR interrupt is received during a VF reset initiated from a
different source, the VFLR may be not fully handled. This can
leave the VF in an undefined state.
To address this, set the I40E_VFLR_EVENT_PENDING bit again during VFLR
handling if the reset is not yet complete. This ensures the driver
will properly complete the VF reset in such scenarios.

Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agoi40e: return false from i40e_reset_vf if reset is in progress
Robert Malz [Tue, 20 May 2025 08:31:51 +0000 (10:31 +0200)]
i40e: return false from i40e_reset_vf if reset is in progress

The function i40e_vc_reset_vf attempts, up to 20 times, to handle a
VF reset request, using the return value of i40e_reset_vf as an indicator
of whether the reset was successfully triggered. Currently, i40e_reset_vf
always returns true, which causes new reset requests to be ignored if a
different VF reset is already in progress.

This patch updates the return value of i40e_reset_vf to reflect when
another VF reset is in progress, allowing the caller to properly use
the retry mechanism.

Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 months agotools/resolve_btfids: Fix build when cross compiling kernel with clang.
Suleiman Souhlal [Fri, 6 Jun 2025 07:45:38 +0000 (16:45 +0900)]
tools/resolve_btfids: Fix build when cross compiling kernel with clang.

When cross compiling the kernel with clang, we need to override
CLANG_CROSS_FLAGS when preparing the step libraries.

Prior to commit d1d096312176 ("tools: fix annoying "mkdir -p ..." logs
when building tools in parallel"), MAKEFLAGS would have been set to a
value that wouldn't set a value for CLANG_CROSS_FLAGS, hiding the
fact that we weren't properly overriding it.

Fixes: 56a2df7615fa ("tools/resolve_btfids: Compile resolve_btfids as host program")
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20250606074538.1608546-1-suleiman@google.com
4 months agopinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
Chen-Yu Tsai [Sat, 7 Jun 2025 13:52:03 +0000 (21:52 +0800)]
pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin

In prepare_function_table() when the pinctrl function table IRQ entries
are generated, the pin bank is calculated from the absolute pin number;
however the IRQ bank mux array is indexed from the first pin bank of the
controller. For R_PIO controllers, this means the absolute pin bank is
way off from the relative pin bank used for array indexing.

Correct this by taking into account the pin base of the controller.

Fixes: f5e2cd34b12f ("pinctrl: sunxi: allow reading mux values from DT")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Link: https://lore.kernel.org/20250607135203.2085226-1-wens@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
4 months agoMerge tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Tue, 10 Jun 2025 03:06:58 +0000 (20:06 -0700)]
Merge tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

 - a couple of fixes for out of bounds issues in memtrace and vas

Thanks to Ritesh Harjani (IBM), Haren Myneni, and Jonathan Greental

* tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()
  powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmap

4 months agopowerpc/vas: Return -EINVAL if the offset is non-zero in mmap()
Haren Myneni [Tue, 10 Jun 2025 02:12:27 +0000 (07:42 +0530)]
powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()

The user space calls mmap() to map VAS window paste address
and the kernel returns the complete mapped page for each
window. So return -EINVAL if non-zero is passed for offset
parameter to mmap().

See Documentation/arch/powerpc/vas-api.rst for mmap()
restrictions.

Co-developed-by: Jonathan Greental <yonatan02greental@gmail.com>
Signed-off-by: Jonathan Greental <yonatan02greental@gmail.com>
Reported-by: Jonathan Greental <yonatan02greental@gmail.com>
Fixes: dda44eb29c23 ("powerpc/vas: Add VAS user space API")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610021227.361980-2-maddy@linux.ibm.com