]> www.infradead.org Git - users/hch/misc.git/log
users/hch/misc.git
6 weeks agonet: sfp: convert sfp quirks to modify struct sfp_module_support
Russell King (Oracle) [Tue, 16 Sep 2025 21:46:46 +0000 (22:46 +0100)]
net: sfp: convert sfp quirks to modify struct sfp_module_support

In order to provide extensible module support properties, arrange for
the SFP quirks to modify any member of the sfp_module_support struct,
rather than just the ethtool link modes and interfaces.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uydVe-000000061WK-3KwI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: sfp: pre-parse the module support
Russell King (Oracle) [Tue, 16 Sep 2025 21:46:41 +0000 (22:46 +0100)]
net: sfp: pre-parse the module support

Pre-parse the module support on insert rather than when the upstream
requests the data. This will allow more flexible and extensible
parsing.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uydVZ-000000061WE-2pXD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: add phy_interface_copy()
Russell King (Oracle) [Tue, 16 Sep 2025 21:46:36 +0000 (22:46 +0100)]
net: phy: add phy_interface_copy()

Add a helper for copying PHY interface bitmasks. This will be used by
the SFP bus code, which will then be moved to phylink in the subsequent
patches.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uydVU-000000061W8-2IDT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'mptcp-pm-netlink-announce-server-side-flag'
Jakub Kicinski [Mon, 22 Sep 2025 18:51:27 +0000 (11:51 -0700)]
Merge branch 'mptcp-pm-netlink-announce-server-side-flag'

Matthieu Baerts says:

====================
mptcp: pm: netlink: announce server-side flag

Now that the 'flags' attribute is used, it seems interesting to add one
flag for 'server-side', a boolean value.

Here are a few patches related to the 'server-side' attribute:

- Patch 1: only announce this attribute on the server side.

- Patch 2: announce the 'server-side' flag when this is the case.

- Patch 3: deprecate the 'server-side' attribute.

- Patch 4: use the 'server-side' flag in the selftests.

- Patches 5, 6: small cleanups when working on code around.
====================

Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-0-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: remove unused returned value of check_data_fin
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:09:03 +0000 (14:09 +0200)]
mptcp: remove unused returned value of check_data_fin

When working on a fix modifying mptcp_check_data_fin(), I noticed the
returned value was no longer used.

It looks like it was used for 3 days, between commit 7ed90803a213
("mptcp: send explicit ack on delayed ack_seq incr") and commit
ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling").

This returned value can be safely removed.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-6-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: use _BITUL() instead of (1 << x)
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:09:02 +0000 (14:09 +0200)]
mptcp: use _BITUL() instead of (1 << x)

Simply to use the proper way to declare bits, and to align with all
other flags declared in this file.

No functional changes intended.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-5-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: mptcp: pm: get server-side flag
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:09:01 +0000 (14:09 +0200)]
selftests: mptcp: pm: get server-side flag

server-side info linked to the MPTCP connect/established events can now
come from the flags, in addition to the dedicated attribute.

The attribute is now deprecated -- in favour of the new flag, and will
be removed later on.

Print this info only once.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-4-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: pm: netlink: deprecate server-side attribute
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:09:00 +0000 (14:09 +0200)]
mptcp: pm: netlink: deprecate server-side attribute

Now that such info is in the 'flags' attribute, it is time to deprecate
the dedicated 'server-side' attribute.

It will be removed in a few versions.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-3-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: pm: netlink: announce server-side flag
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:08:59 +0000 (14:08 +0200)]
mptcp: pm: netlink: announce server-side flag

Now that the 'flags' attribute is used, it seems interesting to add one
flag for 'server-side', a boolean value.

This is duplicating the info from the dedicated 'server-side' attribute,
but it will be deprecated in the next commit, and removed in a few
versions.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-2-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: pm: netlink: only add server-side attr when true
Matthieu Baerts (NGI0) [Fri, 19 Sep 2025 12:08:58 +0000 (14:08 +0200)]
mptcp: pm: netlink: only add server-side attr when true

This attribute is a boolean. No need to add it to set it to 'false'.

Indeed, the default value when this attribute is not set is naturally
'false'. A few bytes can then be saved by not adding this attribute if
the connection is not on the server side.

This prepares the future deprecation of its attribute, in favour of a
new flag.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: spacemit: Make stats_lock softirq-safe
Vivian Wang [Fri, 19 Sep 2025 12:04:33 +0000 (20:04 +0800)]
net: spacemit: Make stats_lock softirq-safe

While most of the statistics functions (emac_get_stats64() and such) are
called with softirqs enabled, emac_stats_timer() is, as its name
suggests, also called from a timer, i.e. called in softirq context.

All of these take stats_lock. Therefore, make stats_lock softirq-safe by
changing spin_lock() into spin_lock_bh() for the functions that get
statistics.

Also, instead of directly calling emac_stats_timer() in emac_up() and
emac_resume(), set the timer to trigger instead, so that
emac_stats_timer() is only called from the timer. It will keep using
spin_lock().

This fixes a lockdep warning, and potential deadlock when stats_timer is
triggered in the middle of getting statistics.

Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/a52c0cf5-0444-41aa-b061-a0a1d72b02fe@samsung.com/
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20250919-k1-ethernet-fix-lock-v1-1-c8b700aa4954@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-enetc-improve-the-interface-for-obtaining-phc_index'
Jakub Kicinski [Mon, 22 Sep 2025 18:49:18 +0000 (11:49 -0700)]
Merge branch 'net-enetc-improve-the-interface-for-obtaining-phc_index'

Wei Fang says:

====================
net: enetc: improve the interface for obtaining phc_index

The first patch is to fix the issue that a sleeping function is called
in the context of rcu_read_lock(). The second patch is to use the generic
API instead of the custom API to get phc_index. In addition, the second
patch depends on the first patch to work.

v1: https://lore.kernel.org/20250918074454.1742328-1-wei.fang@nxp.com
====================

Link: https://patch.msgid.link/20250919084509.1846513-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: enetc: use generic interfaces to get phc_index for ENETC v1
Wei Fang [Fri, 19 Sep 2025 08:45:09 +0000 (16:45 +0800)]
net: enetc: use generic interfaces to get phc_index for ENETC v1

The commit 61f132ca8c46 ("ptp: add helpers to get the phc_index by
of_node or dev") has added two generic interfaces to get the phc_index
of the PTP clock. This eliminates the need for PTP device drivers to
provide custom APIs for consumers to retrieve the phc_index. This has
already been implemented for ENETC v4 and is also applicable to ENETC
v1. Therefore, the global variable enetc_phc_index is removed from the
driver. ENETC v1 now uses the same interface as v4 to get phc_index.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250919084509.1846513-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: enetc: fix sleeping function called from rcu_read_lock() context
Wei Fang [Fri, 19 Sep 2025 08:45:08 +0000 (16:45 +0800)]
net: enetc: fix sleeping function called from rcu_read_lock() context

The rcu_read_lock() has been introduced in __ethtool_get_ts_info() since
the commit 4c61d809cf60 ("net: ethtool: Fix suspicious rcu_dereference
usage"). Therefore, the device drivers cannot use any sleeping functions
when implementing the callback of ethtool_ops::get_ts_info(). Currently,
pci_get_slot() is used in enetc_get_ts_info(), but it calls down_read()
which might sleep, so this is a potential issue. Therefore, to fix this
issue, pci_get_domain_bus_and_slot() is used to replace pci_get_slot()
in enetc_get_ts_info().

Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Closes: https://lore.kernel.org/netdev/20250918124823.t3xlzn7w2glzkhnx@skbuf/
Fixes: f5b9a1cde0a2 ("net: enetc: add PTP synchronization support for ENETC v4")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250919084509.1846513-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'tcp-clean-up-inet_hash-and-inet_unhash'
Jakub Kicinski [Mon, 22 Sep 2025 18:38:47 +0000 (11:38 -0700)]
Merge branch 'tcp-clean-up-inet_hash-and-inet_unhash'

Kuniyuki Iwashima says:

====================
tcp: Clean up inet_hash() and inet_unhash().

While reviewing the ehash fix series from Xuanqiang Luo [0],
I noticed that inet_twsk_hashdance_schedule() checks the
retval of __sk_nulls_del_node_init_rcu(), which looks confusing.

The test exists from the pre-git era:

  $ git blame -L:tcp_tw_hashdance net/ipv4/tcp_minisocks.c e48c414ee61f4~

Patch 3 is to clarify that the retval check is unnecessary in
inet_twsk_hashdance_schedule(), but I'll delegate its removal
to the Xuanqiang's series.

Patch 1 & 2 are minor cleanups.

[0]: https://lore.kernel.org/netdev/20250916103054.719584-4-xuanqiang.luo@linux.dev/
====================

Link: https://patch.msgid.link/20250919083706.1863217-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: Remove redundant sk_unhashed() in inet_unhash().
Kuniyuki Iwashima [Fri, 19 Sep 2025 08:35:30 +0000 (08:35 +0000)]
tcp: Remove redundant sk_unhashed() in inet_unhash().

inet_unhash() checks sk_unhashed() twice at the entry and after locking
ehash/lhash bucket.

The former was somehow added redundantly by commit 4f9bf2a2f5aa ("tcp:
Don't acquire inet_listen_hashbucket::lock with disabled BH.").

inet_unhash() is called for the full socket from 4 places, and it is
always under lock_sock() or the socket is not yet published to other
threads:

  1. __sk_prot_rehash()
     -> called from inet_sk_reselect_saddr(), which has
        lockdep_sock_is_held()

  2. sk_common_release()
     -> called when inet_create() or inet6_create() fail, then the
        socket is not yet published

  3. tcp_set_state()
     -> calls tcp_call_bpf_2arg(), and tcp_call_bpf() has
        sock_owned_by_me()

  4. inet_ctl_sock_create()
     -> creates a kernel socket and unhashes it immediately, but TCP
        socket is not hashed in sock_create_kern() (only SOCK_RAW is)

So we do not need to check sk_unhashed() twice before/after ehash/lhash
lock in inet_unhash().

Let's remove the 2nd one.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250919083706.1863217-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: Remove inet6_hash().
Kuniyuki Iwashima [Fri, 19 Sep 2025 08:35:29 +0000 (08:35 +0000)]
tcp: Remove inet6_hash().

inet_hash() and inet6_hash() are exactly the same.

Also, we do not need to export inet6_hash().

Let's consolidate the two into __inet_hash() and rename it to inet_hash().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250919083706.1863217-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: Remove osk from __inet_hash() arg.
Kuniyuki Iwashima [Fri, 19 Sep 2025 08:35:28 +0000 (08:35 +0000)]
tcp: Remove osk from __inet_hash() arg.

__inet_hash() is called from inet_hash() and inet6_hash with osk NULL.

Let's remove the 2nd arg from __inet_hash().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250919083706.1863217-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: forwarding: Reorder (ar)ping arguments to obey POSIX getopt
David Yang [Fri, 19 Sep 2025 05:35:33 +0000 (13:35 +0800)]
selftests: forwarding: Reorder (ar)ping arguments to obey POSIX getopt

Quoted from musl wiki:

  GNU getopt permutes argv to pull options to the front, ahead of
  non-option arguments. musl and the POSIX standard getopt stop
  processing options at the first non-option argument with no
  permutation.

Thus these scripts stop working on musl since non-option arguments for
tools using getopt() (in this case, (ar)ping) do not always come last.
Fix it by reordering arguments.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250919053538.1106753-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-netpoll-remove-dead-code-and-speed-up-rtnl-locked-region'
Jakub Kicinski [Sat, 20 Sep 2025 00:50:19 +0000 (17:50 -0700)]
Merge branch 'net-netpoll-remove-dead-code-and-speed-up-rtnl-locked-region'

Breno Leitao says:

====================
net: netpoll: remove dead code and speed up rtnl-locked region

This patchset introduces two minor modernizations to the netpoll
infrastructure:

The first patch removes the unused netpoll pointer from the netpoll_info
structure. This member is redundant and its presence does not benefit
multi-instance setups, as reported by Jay Vosburgh. Eliminating it cleans up
the structure and removes unnecessary code.

The second patch updates the netpoll resource cleanup routine to use
synchronize_net() instead of synchronize_rcu(). As __netpoll_free() is always
called under the RTNL lock, using synchronize_net() leverages the more
efficient synchronize_rcu_expedited() in these contexts, reducing time spent in
critical sections and improving performance.

Both changes simplify maintenance and enhance efficiency without altering
netpoll behavior.
====================

Link: https://patch.msgid.link/20250918-netpoll_jv-v1-0-67d50eeb2c26@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: netpoll: use synchronize_net() instead of synchronize_rcu()
Breno Leitao [Thu, 18 Sep 2025 12:25:58 +0000 (05:25 -0700)]
net: netpoll: use synchronize_net() instead of synchronize_rcu()

Replace synchronize_rcu() with synchronize_net() in __netpoll_free().

synchronize_net() is RTNL-aware and will use the more efficient
synchronize_rcu_expedited() when called under RTNL lock, avoiding
the potentially expensive synchronize_rcu() in RTNL critical sections.

Since __netpoll_free() is called with RTNL held (as indicated by
ASSERT_RTNL()), this change improves performance by reducing the
time spent in the RTNL critical section.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250918-netpoll_jv-v1-2-67d50eeb2c26@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: netpoll: remove unused netpoll pointer from netpoll_info
Breno Leitao [Thu, 18 Sep 2025 12:25:57 +0000 (05:25 -0700)]
net: netpoll: remove unused netpoll pointer from netpoll_info

The netpoll_info structure contains an useless pointer back to its
associated netpoll. This field is never used, and the assignment in
__netpoll_setup() is does not comtemplate multiple instances, as
reported by Jay[1].

Drop both the member and its initialization to simplify the structure.

Link: https://lore.kernel.org/all/2930648.1757463506@famine/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250918-netpoll_jv-v1-1-67d50eeb2c26@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-ipv4-some-drop-reason-cleanup-and-improvements'
Jakub Kicinski [Sat, 20 Sep 2025 00:35:54 +0000 (17:35 -0700)]
Merge branch 'net-ipv4-some-drop-reason-cleanup-and-improvements'

Antoine Tenart says:

====================
net: ipv4: some drop reason cleanup and improvements

A few patches that were laying around cleaning up and improving drop
reasons in net/ipv4.
====================

Link: https://patch.msgid.link/20250915091958.15382-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ipv4: convert ip_rcv_options to drop reasons
Antoine Tenart [Mon, 15 Sep 2025 09:19:56 +0000 (11:19 +0200)]
net: ipv4: convert ip_rcv_options to drop reasons

This converts the only path not returning drop reasons in
ip_rcv_finish_core.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ipv4: simplify drop reason handling in ip_rcv_finish_core
Antoine Tenart [Mon, 15 Sep 2025 09:19:55 +0000 (11:19 +0200)]
net: ipv4: simplify drop reason handling in ip_rcv_finish_core

Instead of setting the drop reason to SKB_DROP_REASON_NOT_SPECIFIED
early and having to reset it each time it is overridden by a function
returned value, just set the drop reason to the expected value before
returning from ip_rcv_finish_core.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-3-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ipv4: make udp_v4_early_demux explicitly return drop reason
Antoine Tenart [Mon, 15 Sep 2025 09:19:54 +0000 (11:19 +0200)]
net: ipv4: make udp_v4_early_demux explicitly return drop reason

udp_v4_early_demux already returns drop reasons as it either returns 0
or ip_mc_validate_source, which itself returns drop reasons. Its return
value is also already used as a drop reason itself.

Makes this explicit by making it return drop reasons.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agortnetlink: add needed_{head,tail}room attributes
Alasdair McWilliam [Wed, 17 Sep 2025 09:55:42 +0000 (10:55 +0100)]
rtnetlink: add needed_{head,tail}room attributes

Various network interface types make use of needed_{head,tail}room values
to efficiently reserve buffer space for additional encapsulation headers,
such as VXLAN, Geneve, IPSec, etc. However, it is not currently possible
to query these values in a generic way.

Introduce ability to query the needed_{head,tail}room values of a network
device via rtnetlink, such that applications that may wish to use these
values can do so.

For example, Cilium agent iterates over present devices based on user config
(direct routing, vxlan, geneve, wireguard etc.) and in future will configure
netkit in order to expose the needed_{head,tail}room into K8s pods. See
b9ed315d3c4c ("netkit: Allow for configuring needed_{head,tail}room").

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alasdair McWilliam <alasdair@mcwilliam.dev>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250917095543.14039-1-alasdair@mcwilliam.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-stmmac-remove-mac_interface'
Jakub Kicinski [Sat, 20 Sep 2025 00:19:47 +0000 (17:19 -0700)]
Merge branch 'net-stmmac-remove-mac_interface'

Russell King says:

====================
net: stmmac: remove mac_interface

The dwmac core supports a range of interfaces, but when it comes to
SerDes interfaces, the core itself does not include the SerDes block.
Consequently, it has to provide an interface suitable to interface such
a block to, and uses TBI for this.

The driver also uses "PCS" for RGMII, even though the dwmac PCS block
is not present for this interface type - it was a convenice for the
code structure as RGMII includes inband signalling of the PHY state,
much like Cisco SGMII does at a high level.

As such, the code refers to RGMII and SGMII modes for the PCS, and
there used to be STMMAC_PCS_TBI and STMMAC_PCS_RTBI constants as well
but these were never set, although they were used in the code.

The selection of the PCS mode was from mac_interface. Thus, it seems
that the original intention was for mac_interface to describe the
interface mode used within the dwmac core, and phy_interface to
describe the external world-accessible interface (e.g. which would
connect to a PHY or SFP cage.)

It appears that many glue drivers misinterpret these. A good exmple
is socfpga. This supports SGMII and 1000BASE-X, but does not include
the dwmac PCS, relying on the Lynx PCS instead. However, it makes use
of mac_interface to configure the dwmac core to its GMII/MII mode.

So, when operating in either of these modes, the dwmac is configured
for GMII mode to communicate with the Lynx PCS which then provides
the SGMII or 1000BASE-X interface mode to the external world.

Given that phy_interface is the external world interface, and
mac_interface is the dwmac core interface, selecting the interface
mode based on mac_interface being 1000BASEX makes no sense.

Moreover, mac_interface is initialised by the stmmac core code. If
the "mac-mode" property is set in DT, this will be used. Otherwise,
it will reflect the "phy-mode" property - meaning that it defaults
to phy_interface. As no in-kernel DT makes reference to a "mac-mode"
property, we can assume that for all in-kernel platforms, these two
interface variables are the same. The exception are those platform
glues which I reviwed and suggested they use phy_interface, setting
mac_interface to PHY_INTERFACE_MODE_NA.

The conclusion to all of this is that mac_interface serves no useful
purpose, and causes confusion as the platform glue code doesn't seem
to know which one should be used.

Thus, let's get rid of mac_interface, which makes all this code more
understandable. It also helps to untangle some of the questions such
as:
- should this be using the interface passed from phylink
- should we set the range of phylink supported interfaces to be
  more than one
- when we get phylink PCS support for the dwmac PCS, should we be
  selecting it based on mac_interface or phy_interface, and how
  should we populate the PCS' supported_interface bitmap.

Having converted socfpga to use phy_interface, this turns out to
feel like "the right way" to do this - convert the external world
"phy_interface" to whatever phy_intf_sel value that the dwmac core
needs to achieve the connection to whatever hardware blocks are
integrated inside the SoC to achieve the requested external world
interface.

As an illustration why - consider that in the case of socfpga, it
_could_ have been implemented such that the dwmac PCS was used for
SGMII, and the Lynx PCS for 1000BASE-X, or vice versa. Only the
platform glue would know which it is.

I will also note that several cores provide their currently configured
interface mode via the ACTPHYIF field of Hardware Feature 0, and thus
can be read back in the platform-independent parts of the driver to
decide whether the internal PCS or the RGMII (or actually SMII) "PCS"
should be used.

This is hot-off-the-press, and has only been build tested. As I have
none of these platforms, this series has not been run-tested, so
please test on your hardware.
====================

Link: https://patch.msgid.link/aMrPpc8oRxqGtVPJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: remove mac_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:47 +0000 (16:12 +0100)]
net: stmmac: remove mac_interface

mac_interface has served little purpose, and has only caused confusion.
Now that we have cleaned up all platform glue drivers which should not
have been using mac_interface, there are no users remaining. Remove
mac_interface.

This results in the special dwmac specific "mac-mode" DT property
becoming redundant, and an in case, no DTS files in the kernel make use
of this property. Add a warning if the property is set, and it is
different from the "phy-mode".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Link: https://patch.msgid.link/E1uytpv-00000006H2x-196h@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: thead: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:42 +0000 (16:12 +0100)]
net: stmmac: thead: convert to use phy_interface

dwmac-thead supports either MII or RGMII interface modes only.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-thead to use phy_interface when determining the
interface mode rather than mac_interface.

Also convert the error prints to use phy_modes() so that we get a
meaningful string rather than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpq-00000006H2q-0ajY@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: sun8i: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:37 +0000 (16:12 +0100)]
net: stmmac: sun8i: convert to use phy_interface

dwmac-sun8i supports MII, RMII and RGMII interface modes only. It
is unclear whether the dwmac core interface is different from the
one presented to the outside world.

However, as none of the DTS files set "mac-mode", mac_interface will
be identical to phy_interface.

Convert dwmac-sun8i to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Link: https://patch.msgid.link/E1uytpl-00000006H2k-08pH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: stm32: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:31 +0000 (16:12 +0100)]
net: stmmac: stm32: convert to use phy_interface

dwmac-stm32 supports MII, RMII, GMII and RGMII interface modes,
selecting the dwmac core interface mode via bits 23:21 of the
SYSCFG register. The bit combinations are identical to the
dwmac phy_intf_sel_i signals.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-stm32 to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpf-00000006H2c-3hiU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: starfive: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:26 +0000 (16:12 +0100)]
net: stmmac: starfive: convert to use phy_interface

dwmac-starfive uses RMII or RGMII interface modes without any PCS,
and selects the dwmac core accordingly using a register field with
the same bit encoding as the core's phy_intf_sel_i signals.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-starfive to use phy_interface when determining the
interface mode rather than mac_interface. Also convert the error
prints to use phy_modes() so that we get a meaningful string rather
than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpa-00000006H2X-3GWx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: socfpga: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:21 +0000 (16:12 +0100)]
net: stmmac: socfpga: convert to use phy_interface

dwmac-socfpga uses MII, RMII, GMII, RGMII, SGMII and 1000BASE-X
interface modes, and supports the Lynx PCS. The Lynx PCS will only be
used for SGMII and 1000BASE-X modes, with the MAC programmed to use
GMII or MII mode to talk to the PCS. This suggests that the Synopsys
optional dwmac PCS is not present.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-socfpga to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpV-00000006H2R-2nA6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: ingenic: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:16 +0000 (16:12 +0100)]
net: stmmac: ingenic: convert to use phy_interface

dwmac-ingenic uses only MII, RMII, GMII or RGMII interface modes, none
of which require any kind of conversion between the MAC and external
world. Thus, mac_interface and phy_interface will be the same.

Convert dwmac-ingenic to use phy_interface when determining the
interface mode that the dwmac core should be configured to at reset,
rather than mac_interface.

Also convert the error prints to use phy_modes() and terminate with a
newline so that we get a human readable string rather than a number for
the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpQ-00000006H2L-2Jzb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: imx: convert to use phy_interface
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:11 +0000 (16:12 +0100)]
net: stmmac: imx: convert to use phy_interface

Checking the IMX8MP documentation, there is no requirement for a
separate mac_interface mode definition. As mac_interface and
phy_interface will be the same, use phy_interface internally rather
than mac_interface.

Also convert the error prints to use phy_modes() so that we get a
meaningful string rather than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpL-00000006H2F-1o6b@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: use phy_interface in stmmac_check_pcs_mode()
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:06 +0000 (16:12 +0100)]
net: stmmac: use phy_interface in stmmac_check_pcs_mode()

In the majority, if not all cases, mac_interface and phy_interface
are the same with the exception of some drivers that I have suggested
only use phy_interface and set mac_interface to PHY_INTERFACE_MODE_NA.

The only two that currently set mac_interface to PHY_INTERFACE_MODE_NA
are dwmac-loongson and dwmac-lpc18xx, neither of which use RGMII nor
SGMII.

In order to phase out the use of mac_interface, we need to have a path
for existing drivers so they can update to only using phy_interface
without causing regressions.

Therefore, in order to keep the "pcs" code working, we need to choose
the STMMAC integrated PCS mode based on phy_interface if mac_interface
is PHY_INTERFACE_MODE_NA.

This will allow more drivers to set mac_interface to
PHY_INTERFACE_MODE_NA without risking regressions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpG-00000006H29-1Ltk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: rework mac_interface and phy_interface documentation
Russell King (Oracle) [Wed, 17 Sep 2025 15:12:01 +0000 (16:12 +0100)]
net: stmmac: rework mac_interface and phy_interface documentation

Based on new research, it has come to light that the comment that I
added in a014c35556b9 ("net: stmmac: clarify difference between
"interface" and "phy_interface"") is not fully correct.

Update the comment to properly describe the difference between the two.

All of the DTS files in the kernel tree do not mention the "mac-mode"
property, which results in mac_interface and phy_interface being the
same. Also, none of the platform glue drivers set mac_interface to
anything but PHY_INTERFACE_MODE_NA. This means that for all the
platforms known to mainline, mac_interface is either the same as
phy_interface, or it is PHY_INTERFACE_MODE_NA.

Thus, updating the definition for mac_interface in stmmac.h has no
material effect on current uses known to mainline, but the change opens
the door to cleaning up all uses.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpB-00000006H23-0pRi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: Remove dead code from total_vfs setter
Vlad Dumitrescu [Thu, 18 Sep 2025 22:05:07 +0000 (15:05 -0700)]
net/mlx5: Remove dead code from total_vfs setter

The mlx5_devlink_total_vfs_set function branches based on per_pf_support
twice. Remove the second branch as the first one exits the function when
per_pf_support is false.

Accidentally added as part of commit a4c49611cf4f ("net/mlx5: Implement
devlink total_vfs parameter").

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-rdma/aMQWenzpdjhAX4fm@stanley.mountain/
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/a6142a60-1948-439a-b0ae-ff1df26a37f8@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: clarify checksum behavior of psp_dev_rcv()
Daniel Zahka [Thu, 18 Sep 2025 21:27:20 +0000 (14:27 -0700)]
psp: clarify checksum behavior of psp_dev_rcv()

psp_dev_rcv() decapsulates psp headers from a received frame. This
will make any csum complete computed by the device inaccurate. Rather
than attempt to patch up skb->csum in psp_dev_rcv() just make it clear
to callers what they can expect regarding checksum complete.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918212723.17495-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: Fix typo in kdoc for struct psp_dev_caps.assoc_drv_spc.
Kuniyuki Iwashima [Thu, 18 Sep 2025 19:25:35 +0000 (19:25 +0000)]
psp: Fix typo in kdoc for struct psp_dev_caps.assoc_drv_spc.

assoc_drv_spc is the size of psp_assoc.drv_data[].

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250918192539.1587586-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: micrel: use %pe in print format
Jakub Kicinski [Thu, 18 Sep 2025 18:31:19 +0000 (11:31 -0700)]
net: phy: micrel: use %pe in print format

New cocci check complains:

  drivers/net/phy/micrel.c:4308:6-13: WARNING: Consider using %pe to print PTR_ERR()
  drivers/net/phy/micrel.c:5742:6-13: WARNING: Consider using %pe to print PTR_ERR()

Link: https://lore.kernel.org/1758192227-701925-1-git-send-email-tariqt@nvidia.com
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250918183119.2396019-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'address-miscellaneous-issues-with-psp_sk_get_assoc_rcu'
Jakub Kicinski [Sat, 20 Sep 2025 00:01:22 +0000 (17:01 -0700)]
Merge branch 'address-miscellaneous-issues-with-psp_sk_get_assoc_rcu'

Daniel Zahka says:

====================
address miscellaneous issues with psp_sk_get_assoc_rcu()

There were a few minor issues with psp_sk_get_assoc_rcu() identified
by Eric in his review of the initial psp series. This series addresses
them.
====================

Link: https://patch.msgid.link/20250918155205.2197603-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: don't use flags for checking sk_state
Daniel Zahka [Thu, 18 Sep 2025 15:52:04 +0000 (08:52 -0700)]
psp: don't use flags for checking sk_state

Using flags to check sk_state only makes sense to check for a subset
of states in parallel e.g. sk_fullsock(). We are not doing that
here. Compare for individual states directly.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: fix preemptive inet_twsk() cast in psp_sk_get_assoc_rcu()
Daniel Zahka [Thu, 18 Sep 2025 15:52:03 +0000 (08:52 -0700)]
psp: fix preemptive inet_twsk() cast in psp_sk_get_assoc_rcu()

It is weird to cast to a timewait_sock before checking sk_state, even
if the use is after such a check. Remove the tw local variable, and
use inet_twsk() directly in the timewait branch.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: make struct sock argument const in psp_sk_get_assoc_rcu()
Daniel Zahka [Thu, 18 Sep 2025 15:52:02 +0000 (08:52 -0700)]
psp: make struct sock argument const in psp_sk_get_assoc_rcu()

This function does not need a mutable reference to its argument.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-2-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: prefer sk_skb_reason_drop()
Eric Dumazet [Thu, 18 Sep 2025 13:20:07 +0000 (13:20 +0000)]
tcp: prefer sk_skb_reason_drop()

Replace two calls to kfree_skb_reason() with sk_skb_reason_drop().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918132007.325299-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoptp_ocp: make ptp_ocp driver compatible with PTP_EXTTS_REQUEST2
Vadim Fedorenko [Thu, 18 Sep 2025 13:11:46 +0000 (13:11 +0000)]
ptp_ocp: make ptp_ocp driver compatible with PTP_EXTTS_REQUEST2

Originally ptp_ocp driver was not strictly checking flags for external
timestamper and was always activating rising edge timestamping as it's
the only supported mode. Recent changes to ptp made it incompatible with
PTP_EXTTS_REQUEST2 ioctl. Adjust ptp_clock_info to provide supported
mode and be compatible with new infra.

While at here remove explicit check of periodic output flags from the
driver and provide supported flags for ptp core to check.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250918131146.651468-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ti: icssm-prueth: unwind cleanly in probe()
Dan Carpenter [Thu, 18 Sep 2025 09:48:26 +0000 (12:48 +0300)]
net: ti: icssm-prueth: unwind cleanly in probe()

This error handling triggers a Smatch warning:

    drivers/net/ethernet/ti/icssm/icssm_prueth.c:1574 icssm_prueth_probe()
    warn: 'prueth->pru1' is an error pointer or valid

The warning is harmless because the pru_rproc_put() function has an
IS_ERR_OR_NULL() check built in.  However, there is a small bug if
syscon_regmap_lookup_by_phandle() fails.  In that case we should call
of_node_put() on eth0_node and eth1_node.

It's a little bit easier to re-write this code to only free things which
we know have been allocated successfully.

Fixes: 511f6c1ae093 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/aMvVagz8aBRxMvFn@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-mlx5e-support-rss-for-ipsec-offload'
Jakub Kicinski [Fri, 19 Sep 2025 23:48:39 +0000 (16:48 -0700)]
Merge branch 'net-mlx5e-support-rss-for-ipsec-offload'

Tariq Toukan says:

====================
net/mlx5e: Support RSS for IPSec offload

The series by Jianbo uses a new firmware feature to identify the inner
protocol of decrypted packets, adding new flow groups and steering rules
to redirect them for proper L4-based RSS. This ensures traffic is spread
across multiple CPU cores.
====================

Link: https://patch.msgid.link/1758179963-649455-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: Add flow rules for the decrypted ESP packets
Jianbo Liu [Thu, 18 Sep 2025 07:19:23 +0000 (10:19 +0300)]
net/mlx5e: Add flow rules for the decrypted ESP packets

The previous commit introduced two new flow groups to enable L4 RSS
for decrypted IPsec traffic. This commit implements the logic to
populate these groups with the necessary steering rules.

The rules are created dynamically whenever the first IPSec offload
rule is configured via the xfrm subsystem and the decryption tables
for RX are created. Each rule matches a specific decrypted traffic
type based on its ip version (or ethertype) and outer/inner
l4_type_ext, directing it to the appropriate L4 RSS-enabled TIR.

The lifecycle of these steering rules is tied directly to the RX
tables. They are deleted when the RX tables are destroyed.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: Add flow groups for the packets decrypted by crypto offload
Jianbo Liu [Thu, 18 Sep 2025 07:19:22 +0000 (10:19 +0300)]
net/mlx5e: Add flow groups for the packets decrypted by crypto offload

When using IPsec crypto offload, the hardware decrypts the packet
payload but preserves the ESP header. This prevents the standard RSS
mechanism from accessing the inner L4 (TCP/UDP) headers. As a result,
the RSS hash is calculated only on the outer L3 IP headers, causing
all traffic for a given IPsec tunnel to be directed to a single queue,
leading to poor traffic distribution.

Newer firmware introduces the ability to match on l4_type_ext, which
exposes the L4 protocol type following an ESP header. This allows the
driver to create steering rules that can identify the inner protocols
of decrypted packets.

This commit leverages this new capability to improve traffic
distribution. It adds two new flow groups to steer decrypted packets
to dedicated TIRs that was configured to perform RSS on the inner L4
headers.

These groups are inserted after the standard L4 group and before the
group that handles undecrypted ESP packets added in this series. The
first new group matches decrypted packets based on the outer IP
version (or ethertype) and l4_type_ext. The second new group matches
decrypted tunneled packets based on the inner IP version and
l4_type_ext. Eight new traffic types are also defined to support this
functionality.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: Recirculate decrypted packets into TTC table
Jianbo Liu [Thu, 18 Sep 2025 07:19:21 +0000 (10:19 +0300)]
net/mlx5e: Recirculate decrypted packets into TTC table

In the commit 5e466345291a ("net/mlx5e: IPsec: Add IPsec steering in
local NIC RX"), the decrypted packets are handled in RX error flow
table. There is only one rule in the table, which forwards packets to
the default ESP TIR.

This patch updates the design to allow RSS after decryption. For ESP
traffic, SPI and IP addresses are the fields selected for RSS hash,
and it's common that only one SPI is configured in RX direction, so
RSS can't work properly as all the packets are hashed to one key in
this case. To take advantage of RSS and improve performance, the
decrypted packets need to be forwarded back to TTC table, where RSS
can work based on the decrypted packet types.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: Change TTC rules to match on undecrypted ESP packets
Jianbo Liu [Thu, 18 Sep 2025 07:19:20 +0000 (10:19 +0300)]
net/mlx5: Change TTC rules to match on undecrypted ESP packets

The TTC (Traffic Type Classifier) table classifies the traffic and
steers packet to TIRs, where RSS works based on the hash calculated
from the selected packet fields. For AH/ESP packets, SPI and IP
addresses are the fields used to calculate the hash value for RSS. So,
it's hard to distribute packets to different receiving queues as there
is usually only one SPI in that direction.

IPSec hardware offloads, crypto offload and full (packet) offload were
introduced later. For crypto offload, hardware does encryption,
decryption and authentication, kernel does the others. Kernel always
sends/receives formatted ESP packets with plaintext data instead of
the ciphertext data, all other fields are unmodified. For full
offload, hardware will take care of almost everything, kernel just
sends/receives packets without any IPSec headers.

Currently, all packets with ESP protocols are forwarded to IPSec
offload tables if IPSec rules are configured. In a downstream patch,
the decrypted packets will be recirculated to TTC table, in order to
use RSS, which does the hash on L4 fields after IPSec headers are
stripped by full offload. So those packets handled by crypto offload
must filtered out, as they still have the ESP headers, but apparently
no need to be decrypted again. To do that, ipsec_next_header is added
for the packet matching, as it is valid only after passing through
IPSec decryption.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agowan: framer: pef2256: use %pe in print format
Jakub Kicinski [Thu, 18 Sep 2025 13:46:37 +0000 (06:46 -0700)]
wan: framer: pef2256: use %pe in print format

New cocci check complains:

  drivers/net/wan/framer/pef2256/pef2256.c:733:3-10: WARNING: Consider using %pe to print PTR_ERR()

Link: https://lore.kernel.org/1758192227-701925-1-git-send-email-tariqt@nvidia.com
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://patch.msgid.link/20250918134637.2226614-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: airoha: Fix PPE_IP_PROTO_CHK register definitions
Lorenzo Bianconi [Thu, 18 Sep 2025 06:59:41 +0000 (08:59 +0200)]
net: airoha: Fix PPE_IP_PROTO_CHK register definitions

Fix typo in PPE_IP_PROTO_CHK_IPV4_MASK and PPE_IP_PROTO_CHK_IPV6_MASK
register mask definitions. This is not a real problem since this
register is not actually used in the current codebase.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agor8169: set EEE speed down ratio to 1
ChunHao Lin [Thu, 18 Sep 2025 02:34:25 +0000 (10:34 +0800)]
r8169: set EEE speed down ratio to 1

EEE speed down means speed down MAC MCU clock. It is not from spec.
It is kind of Realtek specific power saving feature. But enable it
may cause some issues, like packet drop or interrupt loss. Different
hardware may have different issues.

EEE speed down ratio (mac ocp 0xe056[7:4]) is used to set EEE speed
down rate. The larger this value is, the more power can save. But it
actually save less power then we expected. And, as mentioned above,
will impact compatibility. So set it to 1 (mac ocp 0xe056[7:4] = 0)
, which means not to speed down, to improve compatibility.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250918023425.3463-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: dsa: dsa_loop: remove duplicated definition of NUM_FIXED_PHYS
Heiner Kallweit [Thu, 18 Sep 2025 05:54:00 +0000 (07:54 +0200)]
net: dsa: dsa_loop: remove duplicated definition of NUM_FIXED_PHYS

Remove duplicated definition of NUM_FIXED_PHYS. This was a leftover from
41357bc7b94b ("net: dsa: dsa_loop: remove usage of mdio_board_info").

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/67a3b7df-c967-4431-86b6-a836dc46a4ef@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agomptcp: reset blackhole on success with non-loopback ifaces
Matthieu Baerts (NGI0) [Thu, 18 Sep 2025 08:50:18 +0000 (10:50 +0200)]
mptcp: reset blackhole on success with non-loopback ifaces

When a first MPTCP connection gets successfully established after a
blackhole period, 'active_disable_times' was supposed to be reset when
this connection was done via any non-loopback interfaces.

Unfortunately, the opposite condition was checked: only reset when the
connection was established via a loopback interface. Fixing this by
simply looking at the opposite.

This is similar to what is done with TCP FastOpen, see
tcp_fastopen_active_disable_ofo_check().

This patch is a follow-up of a previous discussion linked to commit
893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in
mptcp_active_enable()."), see [1].

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v1-1-bf5818326639@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agohinic3: Fix NULL vs IS_ERR() check in hinic3_alloc_rxqs_res()
Dan Carpenter [Thu, 18 Sep 2025 09:45:47 +0000 (12:45 +0300)]
hinic3: Fix NULL vs IS_ERR() check in hinic3_alloc_rxqs_res()

The page_pool_create() function never returns NULL, it returns
error pointers.  Update the check to match.

Fixes: 73f37a7e1993 ("hinic3: Queue pair resource initialization")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/aMvUywhgbmO1kH3Z@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: do not use sk_dst_get() in psp_dev_get_for_sock()
Eric Dumazet [Thu, 18 Sep 2025 11:52:38 +0000 (11:52 +0000)]
psp: do not use sk_dst_get() in psp_dev_get_for_sock()

Use __sk_dst_get() and dst_dev_rcu(), because dst->dev could
be changed under us.

Fixes: 6b46ca260e22 ("net: psp: add socket security association code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918115238.237475-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: sparx5/lan969x: Add support for ethtool pause parameters
Daniel Machon [Wed, 17 Sep 2025 11:49:43 +0000 (13:49 +0200)]
net: sparx5/lan969x: Add support for ethtool pause parameters

Implement get_pauseparam() and set_pauseparam() ethtool operations for
Sparx5 ports.  This allows users to query and configure IEEE 802.3x
pause frame settings via:

ethtool -a ethX
ethtool -A ethX rx on|off tx on|off autoneg on|off

The driver delegates pause parameter handling to phylink through
phylink_ethtool_get_pauseparam() and phylink_ethtool_set_pauseparam().

The underlying configuration of pause frame generation and reception is
already implemented in the driver; this patch only wires it up to the
standard ethtool interface, making the feature accessible to userspace.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250917-802-3x-pause-v1-1-3d1565a68a96@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: microchip: sparx5: make it selectable for ARCH_LAN969X
Robert Marko [Wed, 17 Sep 2025 11:00:24 +0000 (13:00 +0200)]
net: ethernet: microchip: sparx5: make it selectable for ARCH_LAN969X

LAN969x switchdev support depends on the SparX-5 core,so make it selectable
for ARCH_LAN969X.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20250917110106.55219-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: micrel: Add Fast link failure support for lan8842
Horatiu Vultur [Wed, 17 Sep 2025 10:46:30 +0000 (12:46 +0200)]
net: phy: micrel: Add Fast link failure support for lan8842

Add support for fast link failure for lan8842, when this is enabled the
PHY will detect link down immediately (~1ms). The disadvantage of this
is that also small instability might be reported as link down.
Therefore add this feature as a tunable configuration and the user will
know when to enable or not. By default it is not enabled.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250917104630.3931969-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'mlx5-next-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Thu, 18 Sep 2025 22:47:34 +0000 (15:47 -0700)]
Merge tag 'mlx5-next-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-09-17

This series by Carolina contains cleanups significantly touching shared
mlx5 net and rdma headers.

* tag 'mlx5-next-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Prevent WQE metadata conflicts between timestamping and offloads
  net/mlx5: Refactor MACsec WQE metadata shifts
  net/mlx5: Remove VLAN insertion fields from WQE Ether segment
====================

Link: https://patch.msgid.link/1757574619-604874-1-git-send-email-tariqt@nvidia.com
Link: https://patch.msgid.link/1758104780-642426-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: clear link parameters on admin link down
Oleksij Rempel [Wed, 17 Sep 2025 09:47:51 +0000 (11:47 +0200)]
net: phy: clear link parameters on admin link down

When a PHY is halted (e.g. `ip link set dev lan2 down`), several
fields in struct phy_device may still reflect the last active
connection. This leads to ethtool showing stale values even though
the link is down.

Reset selected fields in _phy_state_machine() when transitioning
to PHY_HALTED and the link was previously up:

- speed/duplex -> UNKNOWN, but only in autoneg mode (in forced mode
  these fields carry configuration, not status)
- master_slave_state -> UNKNOWN if previously supported
- mdix -> INVALID (state only, same meaning as "unknown")
- lp_advertising -> always cleared

The cleanup is skipped if the PHY is in PHY_ERROR state, so the
last values remain available for diagnostics.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250917094751.2101285-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets
Vishnu Singh [Wed, 17 Sep 2025 04:14:55 +0000 (09:44 +0530)]
net: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets

CPTS module of CPSW supports hardware timestamping of PTPv1 packets.Update
the "hwtstamp_rx_filters" of CPSW driver to enable timestamping of received
PTPv1 packets. Also update the advertised capability to include PTPv1.

Signed-off-by: Vishnu Singh <v-singh1@ti.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20250917041455.1815579-1-v-singh1@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 18 Sep 2025 18:23:29 +0000 (11:23 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc7).

No conflicts.

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
  9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX")
  7601a0a46216 ("net/mlx5e: Add a miss level for ipsec crypto offload")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 18 Sep 2025 17:22:02 +0000 (10:22 -0700)]
Merge tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless. No known regressions at this point.

  Current release - fix to a fix:

   - eth: Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

   - wifi: iwlwifi: pcie: fix byte count table for 7000/8000 devices

   - net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU

  Previous releases - regressions:

   - bonding: set random address only when slaves already exist

   - rxrpc: fix untrusted unsigned subtract

   - eth:
       - ice: fix Rx page leak on multi-buffer frames
       - mlx5: don't return mlx5_link_info table when speed is unknown

  Previous releases - always broken:

   - tls: make sure to abort the stream if headers are bogus

   - tcp: fix null-deref when using TCP-AO with TCP_REPAIR

   - dpll: fix skipping last entry in clock quality level reporting

   - eth: qed: don't collect too many protection override GRC elements,
     fix memory corruption"

* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
  cnic: Fix use-after-free bugs in cnic_delete_task
  devlink rate: Remove unnecessary 'static' from a couple places
  MAINTAINERS: update sundance entry
  net: liquidio: fix overflow in octeon_init_instr_queue()
  net: clear sk->sk_ino in sk_set_socket(sk, NULL)
  Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
  selftests: tls: test skb copy under mem pressure and OOB
  tls: make sure to abort the stream if headers are bogus
  selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
  tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
  octeon_ep: fix VF MAC address lifecycle handling
  selftests: bonding: add vlan over bond testing
  bonding: don't set oif to bond dev when getting NS target destination
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  net/mlx5e: Add a miss level for ipsec crypto offload
  net/mlx5e: Harden uplink netdev access against device unbind
  MAINTAINERS: make the DPLL entry cover drivers
  doc/netlink: Fix typos in operation attributes
  igc: don't fail igc_probe() on LED setup error
  ...

7 weeks agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 18 Sep 2025 16:42:55 +0000 (09:42 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "These are mostly Oliver's Arm changes: lock ordering fixes for the
  vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
  VMs.

  Arm:

   - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
     visiting from an MMU notifier

   - Fixes to the TLB match process and TLB invalidation range for
     managing the VCNR pseudo-TLB

   - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
     values in PMSCR_EL1

   - Fix save/restore of host MDCR_EL2 to account for eagerly
     programming at vcpu_load() on VHE systems

   - Correct lock ordering when dealing with VGIC LPIs, avoiding
     scenarios where an xarray's spinlock was nested with a *raw*
     spinlock

   - Permit stage-2 read permission aborts which are possible in the
     case of NV depending on the guest hypervisor's stage-2 translation

   - Call raw_spin_unlock() instead of the internal spinlock API

   - Fix parameter ordering when assigning VBAR_EL1

   - Reverted a couple of fixes for RCU stalls when destroying a stage-2
     page table.

     There appears to be some nasty refcounting / UAF issues lurking in
     those patches and the band-aid we tried to apply didn't hold.

  s390:

   - mm fixes, including userfaultfd bug fix

  x86:

   - Sync the vTPR from the local APIC to the VMCB even when AVIC is
     active.

     This fixes a bug where host updates to the vTPR, e.g. via
     KVM_SET_LAPIC or emulation of a guest access, are lost and result
     in interrupt delivery issues in the guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
  Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
  Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
  KVM: arm64: vgic: fix incorrect spinlock API usage
  KVM: arm64: Remove stage 2 read fault check
  KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
  KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
  KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
  KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  KVM: arm64: Spin off release helper from vgic_put_irq()
  KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  KVM: arm64: vgic: Drop stale comment on IRQ active state
  KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
  KVM: arm64: Initialize PMSCR_EL1 when in VHE
  KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
  KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
  KVM: s390: Fix incorrect usage of mmu_notifier_register()
  KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
  KVM: arm64: Mark freed S2 MMUs as invalid

7 weeks agoMerge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Sep 2025 16:22:34 +0000 (09:22 -0700)]
Merge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and new HW support:

   - amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

   - amd/pmf: Support new ACPI ID AMDI0108

   - asus-wmi: Re-add extra keys to ignore_key_wlan quirk

   - oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"

* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
  platform/x86/amd/pmf: Support new ACPI ID AMDI0108
  platform/x86: oxpec: Add support for AOKZOE A1X
  platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
  platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

7 weeks agoMerge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml...
Linus Torvalds [Thu, 18 Sep 2025 16:18:27 +0000 (09:18 -0700)]
Merge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML fixes from Johannes Berg:
 "A few fixes for UML, which I'd meant to send earlier but then forgot.

  All of them are pretty long-standing issues that are either not really
  happening (the UAF), in rarely used code (the FD buffer issue), or an
  issue only for some host configurations (the executable stack):

   - mark stack not executable to work on more modern systems with
     selinux

   - fix use-after-free in a virtio error path

   - fix stack buffer overflow in external unix socket FD receive
     function"

* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Fix FD copy size in os_rcv_fd_msg()
  um: virtio_uml: Fix use-after-free after put_device in probe
  um: Don't mark stack executable

7 weeks agoocteontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
Duoming Zhou [Wed, 17 Sep 2025 06:38:53 +0000 (14:38 +0800)]
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()

The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.

A typical race condition is illustrated below:

CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF

This is confirmed by a KASAN report:

BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0x55/0x70
 print_report+0xcf/0x610
 ? __run_timer_base.part.0+0x7d7/0x8c0
 kasan_report+0xb8/0xf0
 ? __run_timer_base.part.0+0x7d7/0x8c0
 __run_timer_base.part.0+0x7d7/0x8c0
 ? __pfx___run_timer_base.part.0+0x10/0x10
 ? __pfx_read_tsc+0x10/0x10
 ? ktime_get+0x60/0x140
 ? lapic_next_event+0x11/0x20
 ? clockevents_program_event+0x1d4/0x2a0
 run_timer_softirq+0xd1/0x190
 handle_softirqs+0x16a/0x550
 irq_exit_rcu+0xaf/0xe0
 sysvec_apic_timer_interrupt+0x70/0x80
 </IRQ>
...
Allocated by task 1:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7f/0x90
 otx2_ptp_init+0xb1/0x860
 otx2_probe+0x4eb/0xc30
 local_pci_probe+0xdc/0x190
 pci_device_probe+0x2fe/0x470
 really_probe+0x1ca/0x5c0
 __driver_probe_device+0x248/0x310
 driver_probe_device+0x44/0x120
 __driver_attach+0xd2/0x310
 bus_for_each_dev+0xed/0x170
 bus_add_driver+0x208/0x500
 driver_register+0x132/0x460
 do_one_initcall+0x89/0x300
 kernel_init_freeable+0x40d/0x720
 kernel_init+0x1a/0x150
 ret_from_fork+0x10c/0x1a0
 ret_from_fork_asm+0x1a/0x30

Freed by task 136:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3a/0x60
 __kasan_slab_free+0x3f/0x50
 kfree+0x137/0x370
 otx2_ptp_destroy+0x38/0x80
 otx2_remove+0x10d/0x4c0
 pci_device_remove+0xa6/0x1d0
 device_release_driver_internal+0xf8/0x210
 pci_stop_bus_device+0x105/0x150
 pci_stop_and_remove_bus_device_locked+0x15/0x30
 remove_store+0xcc/0xe0
 kernfs_fop_write_iter+0x2c3/0x440
 vfs_write+0x871/0xd70
 ksys_write+0xee/0x1c0
 do_syscall_64+0xac/0x280
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.

This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.

Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agocnic: Fix use-after-free bugs in cnic_delete_task
Duoming Zhou [Wed, 17 Sep 2025 05:46:02 +0000 (13:46 +0800)]
cnic: Fix use-after-free bugs in cnic_delete_task

The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().

A typical race condition is illustrated below:

CPU 0 (cleanup)              | CPU 1 (delayed work callback)
cnic_netdev_event()          |
  cnic_stop_hw()             | cnic_delete_task()
    cnic_cm_stop_bnx2x_hw()  | ...
      cancel_delayed_work()  | /* the queue_delayed_work()
      flush_workqueue()      |    executes after flush_workqueue()*/
                             | queue_delayed_work()
  cnic_free_dev(dev)//free   | cnic_delete_task() //new instance
                             |   dev = cp->dev; //use

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.

This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.

Fixes: fdf24086f475 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodevlink rate: Remove unnecessary 'static' from a couple places
Cosmin Ratiu [Thu, 18 Sep 2025 10:15:06 +0000 (13:15 +0300)]
devlink rate: Remove unnecessary 'static' from a couple places

devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.

This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.

Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.

Fixes: a8ecb93ef03d ("devlink: Introduce rate nodes")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMAINTAINERS: update sundance entry
Denis Kirjanov [Thu, 18 Sep 2025 09:15:56 +0000 (12:15 +0300)]
MAINTAINERS: update sundance entry

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: liquidio: fix overflow in octeon_init_instr_queue()
Alexey Nepomnyashih [Wed, 17 Sep 2025 15:30:58 +0000 (15:30 +0000)]
net: liquidio: fix overflow in octeon_init_instr_queue()

The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.

Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: clear sk->sk_ino in sk_set_socket(sk, NULL)
Eric Dumazet [Wed, 17 Sep 2025 13:53:37 +0000 (13:53 +0000)]
net: clear sk->sk_ino in sk_set_socket(sk, NULL)

Andrei Vagin reported that blamed commit broke CRIU.

Indeed, while we want to keep sk_uid unchanged when a socket
is cloned, we want to clear sk->sk_ino.

Otherwise, sock_diag might report multiple sockets sharing
the same inode number.

Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL),
called both from sock_orphan() and sk_clone_lock().

Fixes: 5d6b58c932ec ("net: lockless sock_i_ino()")
Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/
Closes: https://github.com/checkpoint-restore/criu/issues/2744
Reported-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20250917135337.1736101-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoRevert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
Tariq Toukan [Wed, 17 Sep 2025 13:48:54 +0000 (16:48 +0300)]
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).

Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20250910170011.70528106@kernel.org
Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-ethtool-add-dedicated-grxrings-driver-callbacks'
Jakub Kicinski [Thu, 18 Sep 2025 14:07:41 +0000 (07:07 -0700)]
Merge branch 'net-ethtool-add-dedicated-grxrings-driver-callbacks'

Breno Leitao says:

====================
net: ethtool: add dedicated GRXRINGS driver callbacks

This patchset introduces a new dedicated ethtool_ops callback,
.get_rx_ring_count, which enables drivers to provide the number of RX
rings directly, improving efficiency and clarity in RX ring queries and
RSS configuration.

Number of drivers implements .get_rxnfc callback just to report the ring
count, so, having a proper callback makes sense and simplify .get_rxnfc
(in some cases remove it completely).

This has been suggested by Jakub, and follow the same idea as RXFH
driver callbacks [1].

This also port virtio_net to this new callback. Once there is consensus
on this approach, I can start moving the drivers to this new callback.

Link: https://lore.kernel.org/all/20250611145949.2674086-1-kuba@kernel.org/
v3: https://lore.kernel.org/20250915-gxrings-v3-0-bfd717dbcaad@debian.org
v2: https://lore.kernel.org/20250912-gxrings-v2-0-3c7a60bbeebf@debian.org
v1: https://lore.kernel.org/20250909-gxrings-v1-0-634282f06a54@debian.org
RFC: https://lore.kernel.org/20250905-gxrings-v1-0-984fc471f28f@debian.org
====================

Link: https://patch.msgid.link/20250917-gxrings-v4-0-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: virtio_net: add get_rxrings ethtool callback for RX ring queries
Breno Leitao [Wed, 17 Sep 2025 09:58:15 +0000 (02:58 -0700)]
net: virtio_net: add get_rxrings ethtool callback for RX ring queries

Replace the existing virtnet_get_rxnfc callback with a dedicated
virtnet_get_rxrings implementation to provide the number of RX rings
directly via the new ethtool_ops get_rx_ring_count pointer.

This simplifies the RX ring count retrieval and aligns virtio_net with
the new ethtool API for querying RX ring parameters.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-8-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: use the new helper in rss_set_prep_indir()
Breno Leitao [Wed, 17 Sep 2025 09:58:14 +0000 (02:58 -0700)]
net: ethtool: use the new helper in rss_set_prep_indir()

Refactor rss_set_prep_indir() to utilize the new
ethtool_get_rx_ring_count() helper for determining the number of RX
rings, replacing the direct use of get_rxnfc with ETHTOOL_GRXRINGS.

This ensures compatibility with both legacy and new ethtool_ops
interfaces by transparently multiplexing between them.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-7-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: update set_rxfh_indir to use ethtool_get_rx_ring_count helper
Breno Leitao [Wed, 17 Sep 2025 09:58:13 +0000 (02:58 -0700)]
net: ethtool: update set_rxfh_indir to use ethtool_get_rx_ring_count helper

Modify ethtool_set_rxfh() to use the new ethtool_get_rx_ring_count()
helper function for retrieving the number of RX rings instead of
directly calling get_rxnfc with ETHTOOL_GRXRINGS.

This way, we can leverage the new helper if it is available in ethtool_ops.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-6-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: update set_rxfh to use ethtool_get_rx_ring_count helper
Breno Leitao [Wed, 17 Sep 2025 09:58:12 +0000 (02:58 -0700)]
net: ethtool: update set_rxfh to use ethtool_get_rx_ring_count helper

Modify ethtool_set_rxfh() to use the new ethtool_get_rx_ring_count()
helper function for retrieving the number of RX rings instead of
directly calling get_rxnfc with ETHTOOL_GRXRINGS.

This way, we can leverage the new helper if it is available in ethtool_ops.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-5-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: add get_rx_ring_count callback to optimize RX ring queries
Breno Leitao [Wed, 17 Sep 2025 09:58:11 +0000 (02:58 -0700)]
net: ethtool: add get_rx_ring_count callback to optimize RX ring queries

Add a new optional get_rx_ring_count callback in ethtool_ops to allow
drivers to provide the number of RX rings directly without going through
the full get_rxnfc flow classification interface.

Create ethtool_get_rx_ring_count() to use .get_rx_ring_count if
available, falling back to get_rxnfc() otherwise. It needs to be
non-static, given it will be called by other ethtool functions laters,
as those calling get_rxfh().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-4-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: remove the duplicated handling from ethtool_get_rxrings
Breno Leitao [Wed, 17 Sep 2025 09:58:10 +0000 (02:58 -0700)]
net: ethtool: remove the duplicated handling from ethtool_get_rxrings

ethtool_get_rxrings() was a copy of ethtool_get_rxnfc(). Clean the code
that will never be executed for GRXRINGS specifically.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-3-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: add support for ETHTOOL_GRXRINGS ioctl
Breno Leitao [Wed, 17 Sep 2025 09:58:09 +0000 (02:58 -0700)]
net: ethtool: add support for ETHTOOL_GRXRINGS ioctl

This patch adds handling for the ETHTOOL_GRXRINGS ioctl command in the
ethtool ioctl dispatcher. It introduces a new helper function
ethtool_get_rxrings() that calls the driver's get_rxnfc() callback with
appropriate parameters to retrieve the number of RX rings supported
by the device.

By explicitly handling ETHTOOL_GRXRINGS, userspace queries through
ethtool can now obtain RX ring information in a structured manner.

In this patch, ethtool_get_rxrings() is a simply copy of
ethtool_get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-2-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethtool: pass the num of RX rings directly to ethtool_copy_validate_indir
Breno Leitao [Wed, 17 Sep 2025 09:58:08 +0000 (02:58 -0700)]
net: ethtool: pass the num of RX rings directly to ethtool_copy_validate_indir

Modify ethtool_copy_validate_indir() and callers to validate indirection
table entries against the number of RX rings as an integer instead of
accessing rx_rings->data.

This will be useful in the future, given that struct ethtool_rxnfc might
not exist for native GRXRINGS call.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250917-gxrings-v4-1-dae520e2e1cb@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopsp: rename our psp_dev_destroy()
Eric Dumazet [Thu, 18 Sep 2025 11:35:46 +0000 (11:35 +0000)]
psp: rename our psp_dev_destroy()

psp_dev_destroy() was already used in drivers/crypto/ccp/psp-dev.c

Use psp_dev_free() instead, to avoid a link error when
CRYPTO_DEV_SP_CCP=y

Fixes: 00c94ca2b99e ("psp: base PSP device support")
Closes: https://lore.kernel.org/netdev/CANn89i+ZdBDEV6TE=Nw5gn9ycTzWw4mZOpPuCswgwEsrgOyNnw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918113546.177946-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'bnxt_en-updates-for-net-next'
Paolo Abeni [Thu, 18 Sep 2025 11:09:44 +0000 (13:09 +0200)]
Merge branch 'bnxt_en-updates-for-net-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next

This series includes some code clean-ups and optimizations.  New features
include 2 new backing store memory types to collect FW logs for core
dumps, dynamic SRIOV resource allocations for RoCE, and ethtool tunable
for PFC watchdog.

v2: Drop patch #4.  The patch makes the code different from the original
bnxt_hwrm_func_backing_store_cfg_v2() that allows instance_bmap to have
bits that are not contiguous.  It is safer to keep the original code.

v1: https://lore.kernel.org/netdev/20250915030505.1803478-1-michael.chan@broadcom.com/
====================

Link: https://patch.msgid.link/20250917040839.1924698-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Implement ethtool .set_tunable() for ETHTOOL_PFC_PREVENTION_TOUT
Michael Chan [Wed, 17 Sep 2025 04:08:39 +0000 (21:08 -0700)]
bnxt_en: Implement ethtool .set_tunable() for ETHTOOL_PFC_PREVENTION_TOUT

Support the setting of the tunable if it is supported by firmware.
The supported range is 0 to the maximum msec value reported by
firmware.  PFC_STORM_PREVENTION_AUTO is also supported and 0 means it
is disabled.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-11-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Implement ethtool .get_tunable() for ETHTOOL_PFC_PREVENTION_TOUT
Michael Chan [Wed, 17 Sep 2025 04:08:38 +0000 (21:08 -0700)]
bnxt_en: Implement ethtool .get_tunable() for ETHTOOL_PFC_PREVENTION_TOUT

Return the current PFC watchdog timeout value if it is supported.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-10-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Use VLAN_ETH_HLEN when possible
Kalesh AP [Wed, 17 Sep 2025 04:08:37 +0000 (21:08 -0700)]
bnxt_en: Use VLAN_ETH_HLEN when possible

Replace ETH_HLEN + VLAN_HLEN with VLAN_ETH_HLEN.  Cosmetic change and
no functional changes intended.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-9-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Support for RoCE resources dynamically shared within VFs.
Anantha Prabhu [Wed, 17 Sep 2025 04:08:36 +0000 (21:08 -0700)]
bnxt_en: Support for RoCE resources dynamically shared within VFs.

Add support for dynamic RoCE SRIOV resource configuration.  Instead of
statically dividing the RoCE resources by the number of VFs, provide
the maximum resources and let the FW dynamically dsitribute to the VFs
on the fly.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-8-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Add err_qpc backing store handling
Kashyap Desai [Wed, 17 Sep 2025 04:08:35 +0000 (21:08 -0700)]
bnxt_en: Add err_qpc backing store handling

New backing store component err_qpc is added to the existing host
logging interface for FW traces.

Allocate the backing store memory if this memory type is supported.
Copy this memory when collecting the coredump.

Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-7-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Add fw log trace support for 5731X/5741X chips
Shruti Parab [Wed, 17 Sep 2025 04:08:34 +0000 (21:08 -0700)]
bnxt_en: Add fw log trace support for 5731X/5741X chips

These older chips now support the fw log traces via backing store
qcaps_v2. No other backing store memory types are supported besides
the fw trace types.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-6-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Improve bnxt_backing_store_cfg_v2()
Michael Chan [Wed, 17 Sep 2025 04:08:33 +0000 (21:08 -0700)]
bnxt_en: Improve bnxt_backing_store_cfg_v2()

Improve the logic that determines the last_type in this function.
The different context memory types are configured in a loop.  The
last_type signals the last context memory type to be configured
which requires the ALL_DONE flag to be set for the FW.

The existing logic makes some assumptions that TIM is the last_type
when RDMA is enabled or FTQM is the last_type when only L2 is
enabled.  Improve it to just search for the last_type so that we
don't need to make these assumptions that won't necessary be true
for future devices.

Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-5-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Optimize bnxt_sriov_disable()
Kalesh AP [Wed, 17 Sep 2025 04:08:32 +0000 (21:08 -0700)]
bnxt_en: Optimize bnxt_sriov_disable()

bnxt_sriov_disable() is invoked from 2 places:

1. When the user deletes the VFs.
2. During the unload of the PF driver instance.

Inside bnxt_sriov_disable(), driver invokes
bnxt_restore_pf_fw_resources() which in turn causes a close/open_nic().
There is no harm doing this in the unload path, although it is inefficient
and unnecessary.

Optimize the function to make it more efficient.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-4-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Remove unnecessary VF check in bnxt_hwrm_nvm_req()
Kalesh AP [Wed, 17 Sep 2025 04:08:31 +0000 (21:08 -0700)]
bnxt_en: Remove unnecessary VF check in bnxt_hwrm_nvm_req()

The driver registers the supported configuration parameters with the
devlink stack only on the PF using devlink_params_register().
Hence there is no need for a VF check inside bnxt_hwrm_nvm_req().

Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-3-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Drop redundant if block in bnxt_dl_flash_update()
Kalesh AP [Wed, 17 Sep 2025 04:08:30 +0000 (21:08 -0700)]
bnxt_en: Drop redundant if block in bnxt_dl_flash_update()

The devlink stack has sanity checks and it invokes flash_update()
only if it is supported by the driver.  The VF driver does not
advertise the support for flash_update in struct devlink_ops.
This makes if condition inside bnxt_dl_flash_update() redundant.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250917040839.1924698-2-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>