]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
4 months agonetmem: fix netmem comments
Mina Almasry [Sun, 15 Jun 2025 20:35:09 +0000 (20:35 +0000)]
netmem: fix netmem comments

Trivial fix to a couple of outdated netmem comments. No code changes,
just more accurately describing current code.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftest: Add selftest for multicast address notifications
Yuyang Huang [Sat, 14 Jun 2025 05:35:22 +0000 (14:35 +0900)]
selftest: Add selftest for multicast address notifications

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.

The test relies on the iproute2 version to be 6.13+.

Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net
TEST_PROGS=rtnetlink_notification.sh \
TEST_GEN_PROGS="" run_tests

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20250614053522.623820-1-yuyanghuang@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-dsa-b53-fix-bcm5325-support'
Jakub Kicinski [Wed, 18 Jun 2025 00:52:32 +0000 (17:52 -0700)]
Merge branch 'net-dsa-b53-fix-bcm5325-support'

Álvaro Fernández Rojas says:

====================
net: dsa: b53: fix BCM5325 support

These patches get the BCM5325 switch working with b53.

The existing brcm legacy tag only works with BCM63xx switches.
We need to add a new legacy tag for BCM5325 and BCM5365 switches, which
require including the FCS and length.

I'm not really sure that everything here is correct since I don't work for
Broadcom and all this is based on the public datasheet available for the
BCM5325 and my own experiments with a Huawei HG556a (BCM6358).

Both sets of patches have been merged due to the change requested by Jonas
about BRCM_HDR register access depending on legacy tags.
====================

Link: https://patch.msgid.link/20250614080000.1884236-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: ensure BCM5325 PHYs are enabled
Álvaro Fernández Rojas [Sat, 14 Jun 2025 08:00:00 +0000 (10:00 +0200)]
net: dsa: b53: ensure BCM5325 PHYs are enabled

According to the datasheet, BCM5325 uses B53_PD_MODE_CTRL_25 register to
disable clocking to individual PHYs.
Only ports 1-4 can be enabled or disabled and the datasheet is explicit
about not toggling BIT(0) since it disables the PLL power and the switch.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-15-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: fix b53_imp_vlan_setup for BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:59 +0000 (09:59 +0200)]
net: dsa: b53: fix b53_imp_vlan_setup for BCM5325

CPU port should be B53_CPU_PORT instead of B53_CPU_PORT_25 for
B53_PVLAN_PORT_MASK register.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-14-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: fix unicast/multicast flooding on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:58 +0000 (09:59 +0200)]
net: dsa: b53: fix unicast/multicast flooding on BCM5325

BCM5325 doesn't implement UC_FLOOD_MASK, MC_FLOOD_MASK and IPMC_FLOOD_MASK
registers.
This has to be handled differently with other pages and registers.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-13-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: prevent GMII_PORT_OVERRIDE_CTRL access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:57 +0000 (09:59 +0200)]
net: dsa: b53: prevent GMII_PORT_OVERRIDE_CTRL access on BCM5325

BCM5325 doesn't implement GMII_PORT_OVERRIDE_CTRL register so we should
avoid reading or writing it.
PORT_OVERRIDE_RX_FLOW and PORT_OVERRIDE_TX_FLOW aren't defined on BCM5325
and we should use PORT_OVERRIDE_LP_FLOW_25 instead.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-12-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: prevent BRCM_HDR access on older devices
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:56 +0000 (09:59 +0200)]
net: dsa: b53: prevent BRCM_HDR access on older devices

Older switches don't implement BRCM_HDR register so we should avoid
reading or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-11-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: prevent DIS_LEARNING access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:55 +0000 (09:59 +0200)]
net: dsa: b53: prevent DIS_LEARNING access on BCM5325

BCM5325 doesn't implement DIS_LEARNING register so we should avoid reading
or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-10-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: fix IP_MULTICAST_CTRL on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:54 +0000 (09:59 +0200)]
net: dsa: b53: fix IP_MULTICAST_CTRL on BCM5325

BCM5325 doesn't implement B53_UC_FWD_EN, B53_MC_FWD_EN or B53_IPMC_FWD_EN.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-9-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: prevent SWITCH_CTRL access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:53 +0000 (09:59 +0200)]
net: dsa: b53: prevent SWITCH_CTRL access on BCM5325

BCM5325 doesn't implement SWITCH_CTRL register so we should avoid reading
or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-8-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: prevent FAST_AGE access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:52 +0000 (09:59 +0200)]
net: dsa: b53: prevent FAST_AGE access on BCM5325

BCM5325 doesn't implement FAST_AGE registers so we should avoid reading or
writing them.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-7-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: add support for FDB operations on 5325/5365
Florian Fainelli [Sat, 14 Jun 2025 07:59:51 +0000 (09:59 +0200)]
net: dsa: b53: add support for FDB operations on 5325/5365

BCM5325 and BCM5365 are part of a much older generation of switches which,
due to their limited number of ports and VLAN entries (up to 256) allowed
a single 64-bit register to hold a full ARL entry.
This requires a little bit of massaging when reading, writing and
converting ARL entries in both directions.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-6-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: detect BCM5325 variants
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:50 +0000 (09:59 +0200)]
net: dsa: b53: detect BCM5325 variants

We need to be able to differentiate the BCM5325 variants because:
- BCM5325M switches lack the ARLIO_PAGE->VLAN_ID_IDX register.
- BCM5325E have less 512 ARL buckets instead of 1024.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-5-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: b53: support legacy FCS tags
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:49 +0000 (09:59 +0200)]
net: dsa: b53: support legacy FCS tags

Commit 46c5176c586c ("net: dsa: b53: support legacy tags") introduced
support for legacy tags, but it turns out that BCM5325 and BCM5365
switches require the original FCS value and length, so they have to be
treated differently.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-4-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: tag_brcm: add support for legacy FCS tags
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:48 +0000 (09:59 +0200)]
net: dsa: tag_brcm: add support for legacy FCS tags

Add support for legacy Broadcom FCS tags, which are similar to
DSA_TAG_PROTO_BRCM_LEGACY.
BCM5325 and BCM5365 switches require including the original FCS value and
length, as opposed to BCM63xx switches.
Adding the original FCS value and length to DSA_TAG_PROTO_BRCM_LEGACY would
impact performance of BCM63xx switches, so it's better to create a new tag.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-3-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: tag_brcm: legacy: reorganize functions
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:47 +0000 (09:59 +0200)]
net: dsa: tag_brcm: legacy: reorganize functions

Move brcm_leg_tag_rcv() definition to top.
This function is going to be shared between two different tags.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-2-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'nte-stmmac-visconti-cleanups'
Jakub Kicinski [Tue, 17 Jun 2025 23:25:27 +0000 (16:25 -0700)]
Merge branch 'nte-stmmac-visconti-cleanups'

Russell King says:

====================
net: stmmac: visconti: cleanups

A short series of cleanups to the visconti dwmac glue.
====================

Link: https://patch.msgid.link/aFCHJWXSLbUoogi6@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: visconti: make phy_intf_sel local
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:32 +0000 (22:06 +0100)]
net: stmmac: visconti: make phy_intf_sel local

There is little need to have phy_intf_sel as a member of struct
visconti_eth when we have the PHY interface mode available from
phylink in visconti_eth_set_clk_tx_rate(). Without multiple
interface support, phylink is fixed to supporting only
plat->phy_interface, so we can be sure that "interface" passed
into this function is the same as plat->phy_interface.

Make phy_intf_sel local to visconti_eth_init_hw() and clean up.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH2G-004UyY-GD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: visconti: clean up code formatting
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:27 +0000 (22:06 +0100)]
net: stmmac: visconti: clean up code formatting

Ensure that code is wrapped prior to column 80, and shorten the
needlessly long "clk_sel_val" to just "clk_sel".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH2B-004UyS-Ch@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: visconti: reorganise visconti_eth_set_clk_tx_rate()
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:22 +0000 (22:06 +0100)]
net: stmmac: visconti: reorganise visconti_eth_set_clk_tx_rate()

Rather than testing dwmac->phy_intf_sel several times for the same
values in this function, group the code together. The only part
which was common was stopping the internal clock before programming
the clock setting.

This further improves the readability of this function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH26-004UyM-9G@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: visconti: re-arrange speed decode
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:17 +0000 (22:06 +0100)]
net: stmmac: visconti: re-arrange speed decode

Re-arrange the speed decode in visconti_eth_set_clk_tx_rate() to be
more readable by first checking to see if we're using RGMII or RMII
and then decoding the speed, rather than decoding the speed and then
testing the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH21-004UyG-50@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'link-napi-instances-to-queues-and-irqs'
Jakub Kicinski [Tue, 17 Jun 2025 23:24:13 +0000 (16:24 -0700)]
Merge branch 'link-napi-instances-to-queues-and-irqs'

Justin Lai says:

====================
Link NAPI instances to queues and IRQs

This patch series introduces netdev-genl support to rtase, enabling
user-space applications to query the relationships between IRQs,
queues, and NAPI instances.
====================

Link: https://patch.msgid.link/20250616032226.7318-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agortase: Link queues to NAPI instances
Justin Lai [Mon, 16 Jun 2025 03:22:26 +0000 (11:22 +0800)]
rtase: Link queues to NAPI instances

Link queues to NAPI instances with netif_queue_set_napi. This
information can be queried with the netdev-genl API.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250616032226.7318-3-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agortase: Link IRQs to NAPI instances
Justin Lai [Mon, 16 Jun 2025 03:22:25 +0000 (11:22 +0800)]
rtase: Link IRQs to NAPI instances

Link IRQs to NAPI instances with netif_napi_set_irq. This
information can be queried with the netdev-genl API.

Also add support for persistent NAPI configuration using
netif_napi_add_config().

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250616032226.7318-2-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: nettest: Fix typo in log and error messages for clarity
Alok Tiwari [Sun, 15 Jun 2025 08:48:12 +0000 (01:48 -0700)]
selftests: nettest: Fix typo in log and error messages for clarity

This patch corrects several logging and error message in nettest.c:
- Corrects function name in log messages "setsockopt" -> "getsockopt".
- Closes missing parentheses in "setsockopt(IPV6_FREEBIND)".
- Replaces misleading error text ("Invalid port") with the correct
  description ("Invalid prefix length").
- remove Redundant wording like "status from status" and clarifies
  context in IPC error messages.

These changes improve readability and aid in debugging test output.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250615084822.1344759-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'tcp-remove-obsolete-rfc3517-rfc6675-code'
Jakub Kicinski [Tue, 17 Jun 2025 23:19:04 +0000 (16:19 -0700)]
Merge branch 'tcp-remove-obsolete-rfc3517-rfc6675-code'

Neal Cardwell says:

====================
tcp: remove obsolete RFC3517/RFC6675 code

RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

 commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This will allow a substantial simplification of the
Linux TCP code base, and removes 12 bytes of state in every tcp_sock
for 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits:

(1) Removes the core RFC3517/RFC6675 logic.

(2) Removes the RFC3517/RFC6675 hint state and the first layer of logic that
    updates that state.

(3) Removes the emptied-out tcp_clear_retrans_hints_partial() helper function
    and all of its call sites.
====================

Link: https://patch.msgid.link/20250615001435.2390793-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()
Neal Cardwell [Sun, 15 Jun 2025 00:14:35 +0000 (20:14 -0400)]
tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()

Now that we have removed the RFC3517/RFC6675 hints,
tcp_clear_retrans_hints_partial() is empty, and can be removed.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-4-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
Neal Cardwell [Sun, 15 Jun 2025 00:14:34 +0000 (20:14 -0400)]
tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint

Now that obsolete RFC3517/RFC6675 TCP loss detection has been removed,
we can remove the somewhat complex and intrusive code to maintain its
hint state: lost_skb_hint and lost_cnt_hint.

This commit makes tcp_clear_retrans_hints_partial() empty. We will
remove tcp_clear_retrans_hints_partial() and its call sites in the
next commit.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
Neal Cardwell [Sun, 15 Jun 2025 00:14:33 +0000 (20:14 -0400)]
tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code

RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

 commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This will allow a substantial simplification of the
Linux TCP code base, and removes 12 bytes of state in every tcp_sock
for 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits. The following 2 commits remove bookkeeping state and
code that is no longer needed after this removal of RFC3517/RFC6675
loss recovery.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: bcmgenet: update PHY power down
Doug Berger [Sat, 14 Jun 2025 02:58:16 +0000 (19:58 -0700)]
net: bcmgenet: update PHY power down

The disable sequence in bcmgenet_phy_power_set() is updated to
match the inverse sequence and timing (and spacing) of the
enable sequence. This ensures that LEDs driven by the GENET IP
are disabled when the GPHY is powered down.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250614025817.3808354-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobnxt_en: Improve comment wording and error return code
Alok Tiwari [Sun, 15 Jun 2025 15:40:40 +0000 (08:40 -0700)]
bnxt_en: Improve comment wording and error return code

Improved wording and grammar in several comments for clarity.
  "the must belongs" -> "it must belong"
  "mininum" -> "minimum"
  "fileds" -> "fields"

Replaced return -1 with -EINVAL in hwrm_ring_alloc_send_msg()
to return a proper error code.

These changes enhance code readability and consistent error handling.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250615154051.1365631-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: liquidio: Remove unused validate_cn23xx_pf_config_info()
Dr. David Alan Gilbert [Sat, 14 Jun 2025 23:49:41 +0000 (00:49 +0100)]
net: liquidio: Remove unused validate_cn23xx_pf_config_info()

[Note, I'm wondering if actually this is a case of a missing call;
the other similar function is called in __verify_octeon_config_info(),
but I don't have or know the hardware.]

validate_cn23xx_pf_config_info() was added in 2016 by
commit 72c0091293c0 ("liquidio: CN23XX device init and sriov config")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250614234941.61769-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-stmmac-rk-more-cleanups'
Jakub Kicinski [Tue, 17 Jun 2025 22:30:15 +0000 (15:30 -0700)]
Merge branch 'net-stmmac-rk-more-cleanups'

Russell King says:

====================
net: stmmac: rk: more cleanups

Another couple of cleanups removing pointless code.
====================

Link: https://patch.msgid.link/aE_u8mCkUXEWTzJe@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: rk: remove unnecessary clk_mac
Russell King (Oracle) [Mon, 16 Jun 2025 10:16:01 +0000 (11:16 +0100)]
net: stmmac: rk: remove unnecessary clk_mac

The stmmac platform code already gets the "stmmaceth" clock, so there
is no need for drivers to get it. Use the stored pointer in struct
plat_stmmacenet_data instead of getting and storing our own pointer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6sj-004Ku5-HR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: rk: use device rather than platform device in rk_priv_data
Russell King (Oracle) [Mon, 16 Jun 2025 10:15:56 +0000 (11:15 +0100)]
net: stmmac: rk: use device rather than platform device in rk_priv_data

All the code in dwmac-rk uses &bsp_priv->pdev->dev, nothing uses
bsp_priv->pdev directly. Store the struct device rather than the
struct platform_device in struct rk_priv_data, and simplifying the
code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6se-004Ktz-Dx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: rk: fix code formmating issue
Russell King (Oracle) [Mon, 16 Jun 2025 10:15:51 +0000 (11:15 +0100)]
net: stmmac: rk: fix code formmating issue

Fix a code formatting issue introduced in the previous series, no
space after , before "int".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6sZ-004Ktt-9y@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'io_uring-cmd-for-tx-timestamps'
Jakub Kicinski [Tue, 17 Jun 2025 22:24:29 +0000 (15:24 -0700)]
Merge branch 'io_uring-cmd-for-tx-timestamps'

Pavel Begunkov says:

====================
io_uring cmd for tx timestamps (part)

Apply the networking helpers for the io_uring timestamp API.
====================

Link: https://patch.msgid.link/cover.1750065793.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: timestamp: add helper returning skb's tx tstamp
Pavel Begunkov [Mon, 16 Jun 2025 09:46:25 +0000 (10:46 +0100)]
net: timestamp: add helper returning skb's tx tstamp

Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
associated with an error queue skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/702357dd8936ef4c0d3864441e853bfe3224a677.1750065793.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'vsock-test-improve-transport_uaf-test'
Jakub Kicinski [Tue, 17 Jun 2025 21:50:37 +0000 (14:50 -0700)]
Merge branch 'vsock-test-improve-transport_uaf-test'

Michal Luczaj says:

====================
vsock/test: Improve transport_uaf test

Increase the coverage of a test implemented in commit 301a62dfb0d0
("vsock/test: Add test for UAF due to socket unbinding"). Take this
opportunity to factor out some utility code, drop a redundant sync between
client and server, and introduce a /proc/kallsyms harvesting logic for
auto-detecting registered vsock transports.

v2: https://lore.kernel.org/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@rbox.co
v1: https://lore.kernel.org/20250523-vsock-test-inc-cov-v1-1-fa3507941bbd@rbox.co
====================

Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-0-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovsock/test: Cover more CIDs in transport_uaf test
Michal Luczaj [Wed, 11 Jun 2025 19:56:52 +0000 (21:56 +0200)]
vsock/test: Cover more CIDs in transport_uaf test

Increase the coverage of test for UAF due to socket unbinding, and losing
transport in general. It's a follow up to commit 301a62dfb0d0 ("vsock/test:
Add test for UAF due to socket unbinding") and discussion in [1].

The idea remains the same: take an unconnected stream socket with a
transport assigned and then attempt to switch the transport by trying (and
failing) to connect to some other CID. Now do this iterating over all the
well known CIDs (plus one).

While at it, drop the redundant synchronization between client and server.

Some single-transport setups can't be tested effectively; a warning is
issued. Depending on transports available, a variety of splats are possible
on unpatched machines. After reverting commit 78dafe1cf3af ("vsock: Orphan
socket after transport release") and commit fcdd2242c023 ("vsock: Keep the
binding until socket destruction"):

BUG: KASAN: slab-use-after-free in __vsock_bind+0x61f/0x720
Read of size 4 at addr ffff88811ff46b54 by task vsock_test/1475
Call Trace:
 dump_stack_lvl+0x68/0x90
 print_report+0x170/0x53d
 kasan_report+0xc2/0x180
 __vsock_bind+0x61f/0x720
 vsock_connect+0x727/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

WARNING: CPU: 0 PID: 1475 at net/vmw_vsock/virtio_transport_common.c:37 virtio_transport_send_pkt_info+0xb2b/0x1160
Call Trace:
 virtio_transport_connect+0x90/0xb0
 vsock_connect+0x782/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:sock_has_perm+0xa7/0x2a0
Call Trace:
 selinux_socket_connect_helper.isra.0+0xbc/0x450
 selinux_socket_connect+0x3b/0x70
 security_socket_connect+0x31/0xd0
 __sys_connect_file+0x79/0x1f0
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 1518 at lib/refcount.c:25 refcount_warn_saturate+0xdd/0x140
RIP: 0010:refcount_warn_saturate+0xdd/0x140
Call Trace:
 __vsock_bind+0x65e/0x720
 vsock_connect+0x727/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 1475 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x140
RIP: 0010:refcount_warn_saturate+0x12b/0x140
Call Trace:
 vsock_remove_bound+0x18f/0x280
 __vsock_release+0x371/0x480
 vsock_release+0x88/0x120
 __sock_release+0xaa/0x260
 sock_close+0x14/0x20
 __fput+0x35a/0xaa0
 task_work_run+0xff/0x1c0
 do_exit+0x849/0x24c0
 make_task_dead+0xf3/0x110
 rewind_stack_and_make_dead+0x16/0x20

[1]: https://lore.kernel.org/netdev/CAGxU2F5zhfWymY8u0hrKksW8PumXAYz-9_qRmW==92oAx1BX3g@mail.gmail.com/

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-3-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovsock/test: Introduce get_transports()
Michal Luczaj [Wed, 11 Jun 2025 19:56:51 +0000 (21:56 +0200)]
vsock/test: Introduce get_transports()

Return a bitmap of registered vsock transports. As guesstimated by grepping
/proc/kallsyms (CONFIG_KALLSYMS=y) for known symbols of type `struct
vsock_transport`, or `struct virtio_transport` in case the vsock_transport
is embedded within.

Note that the way `enum transport` and `transport_ksyms[]` are defined
triggers checkpatch.pl:

util.h:11: ERROR: Macros with complex values should be enclosed in parentheses
util.h:20: ERROR: Macros with complex values should be enclosed in parentheses
util.h:20: WARNING: Argument 'symbol' is not used in function-like macro
util.h:28: WARNING: Argument 'name' is not used in function-like macro

While commit 15d4734c7a58 ("checkpatch: qualify do-while-0 advice")
suggests it is known that the ERRORs heuristics are insufficient, I can not
find many other places where preprocessor is used in this
checkpatch-unhappy fashion. Notable exception being bcachefs, e.g.
fs/bcachefs/alloc_background_format.h. WARNINGs regarding unused macro
arguments seem more common, e.g. __ASM_SEL in arch/x86/include/asm/asm.h.

In other words, this might be unnecessarily complex. The same can be
achieved by just telling human to keep the order:

enum transport {
TRANSPORT_LOOPBACK = BIT(0),
TRANSPORT_VIRTIO = BIT(1),
TRANSPORT_VHOST = BIT(2),
TRANSPORT_VMCI = BIT(3),
TRANSPORT_HYPERV = BIT(4),
TRANSPORT_NUM = 5,
};

 #define KSYM_ENTRY(sym) "d " sym "_transport"

/* Keep `enum transport` order */
static const char * const transport_ksyms[] = {
KSYM_ENTRY("loopback"),
KSYM_ENTRY("virtio"),
KSYM_ENTRY("vhost"),
KSYM_ENTRY("vmci"),
KSYM_ENTRY("vhs"),
};

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Tested-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-2-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovsock/test: Introduce vsock_bind_try() helper
Michal Luczaj [Wed, 11 Jun 2025 19:56:50 +0000 (21:56 +0200)]
vsock/test: Introduce vsock_bind_try() helper

Create a socket and bind() it. If binding failed, gracefully return an
error code while preserving `errno`.

Base vsock_bind() on top of it.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-1-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux
Jakub Kicinski [Tue, 17 Jun 2025 21:43:58 +0000 (14:43 -0700)]
Merge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux

Shradha Gupta says:

====================
Allow dyn MSI-X vector allocation of MANA

In this patchset we want to enable the MANA driver to be able to
allocate MSI-X vectors in PCI dynamically.

The first patch exports pci_msix_prepare_desc() in PCI to be able to
correctly prepare descriptors for dynamically added MSI-X vectors.

The second patch adds the support of dynamic vector allocation in
pci-hyperv PCI controller by enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN
flag and using the pci_msix_prepare_desc() exported in first patch.

The third patch adds a detailed description of the irq_setup(), to
help understand the function design better.

The fourth patch is a preparation patch for mana changes to support
dynamic IRQ allocation. It contains changes in irq_setup() to allow
skipping first sibling CPU sets, in case certain IRQs are already
affinitized to them.

The fifth patch has the changes in MANA driver to be able to allocate
MSI-X vectors dynamically. If the support does not exist it defaults to
older behavior.

* 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux:
  net: mana: Allocate MSI-X vectors dynamically
  net: mana: Allow irq_setup() to skip cpus for affinity
  net: mana: explain irq_setup() algorithm
  PCI: hv: Allow dynamic MSI-X vector allocation
  PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
====================

Link: https://patch.msgid.link/1749650984-9193-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: Add c45_phy_ids sysfs directory entry
Yajun Deng [Fri, 13 Jun 2025 13:19:03 +0000 (21:19 +0800)]
net: phy: Add c45_phy_ids sysfs directory entry

The phy_id field only shows the PHY ID of the C22 device, and the C45
device did not store its PHY ID in this field.

Add a new phy_mmd_group, and export the mmd<n>_device_id for the C45
device. These files are invisible to the C22 device.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250613131903.2961-1-yajun.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoMerge branch 'intel-next-queue-1GbE'
Paolo Abeni [Tue, 17 Jun 2025 12:58:47 +0000 (14:58 +0200)]
Merge branch 'intel-next-queue-1GbE'

Tony Nguyen says:

====================
Faizal Rahim says:

MAC Merge support for frame preemption was previously added for igc:
https://lore.kernel.org/netdev/20250418163822.3519810-1-anthony.l.nguyen@intel.com/

This series builds on that work and adds support for:
- Harmonizing taprio and mqprio queue priority behavior, based on past
  discussions and suggestions:
  https://lore.kernel.org/all/20250214102206.25dqgut5tbak2rkz@skbuf/
- Enabling preemptible queue support for both taprio and mqprio, with
  priority harmonization as a prerequisite.

Patch organization:
- Patches 1-3: Preparation work for patches 6 and 7
- Patches 4-5: Queue priority harmonization
- Patches 6-7: Add preemptible queue support
====================

Link: https://patch.msgid.link/20250611180314.2059166-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch
Wei Fang [Fri, 13 Jun 2025 09:36:05 +0000 (17:36 +0800)]
net: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch

Both PF and VF have rx-vlan-offload enabled, however, the PCVLANR1/2
registers are resources controlled by PF, so VF cannot access these
two registers. Fortunately, the hardware provides SICVLANR1/2 registers
for each SI to reflect the value of PCVLANR1/2 registers. Therefore,
use SICVLANR1/2 instead of PCVLANR1/2. Note that this is not an issue
in actual use, because the current driver does not support custom TPID,
the driver will not access these two registers in actual use, so this
modification is just an optimization.

In addition, since ENETC_RXBD_FLAG_TPID is defined as GENMASK(1, 0),
the possible values are only 0, 1, 2, 3, so the default branch will
never be true, so remove the default branch.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250613093605.39277-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: mana: Allocate MSI-X vectors dynamically
Shradha Gupta [Wed, 11 Jun 2025 14:11:13 +0000 (07:11 -0700)]
net: mana: Allocate MSI-X vectors dynamically

Currently, the MANA driver allocates MSI-X vectors statically based on
MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
up allocating more vectors than it needs. This is because, by this time
we do not have a HW channel and do not know how many IRQs should be
allocated.

To avoid this, we allocate 1 MSI-X vector during the creation of HWC and
after getting the value supported by hardware, dynamically add the
remaining MSI-X vectors.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
4 months agonet: mana: Allow irq_setup() to skip cpus for affinity
Shradha Gupta [Wed, 11 Jun 2025 14:10:42 +0000 (07:10 -0700)]
net: mana: Allow irq_setup() to skip cpus for affinity

In order to prepare the MANA driver to allocate the MSI-X IRQs
dynamically, we need to enhance irq_setup() to allow skipping
affinitizing IRQs to the first CPU sibling group.

This would be for cases when the number of IRQs is less than or equal
to the number of online CPUs. In such cases for dynamically added IRQs
the first CPU sibling group would already be affinitized with HWC IRQ.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
4 months agonet: mana: explain irq_setup() algorithm
Yury Norov [Wed, 11 Jun 2025 14:10:29 +0000 (07:10 -0700)]
net: mana: explain irq_setup() algorithm

Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs")
added the irq_setup() function that distributes IRQs on CPUs according
to a tricky heuristic. The corresponding commit message explains the
heuristic.

Duplicate it in the source code to make available for readers without
digging git in history. Also, add more detailed explanation about how
the heuristics is implemented.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
4 months agoPCI: hv: Allow dynamic MSI-X vector allocation
Shradha Gupta [Wed, 11 Jun 2025 14:10:15 +0000 (07:10 -0700)]
PCI: hv: Allow dynamic MSI-X vector allocation

Allow dynamic MSI-X vector allocation for pci_hyperv PCI controller
by adding support for the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and using
pci_msix_prepare_desc() to prepare the MSI-X descriptors.

Feature support added for both x86 and ARM64

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
4 months agoPCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
Shradha Gupta [Wed, 11 Jun 2025 14:10:01 +0000 (07:10 -0700)]
PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations

For supporting dynamic MSI-X vector allocation by PCI controllers, enabling
the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
to prepare the MSI descriptor is also needed.

Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic
MSI-X vector allocation.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
4 months agoeth: gianfar: migrate to new RXFH callbacks
Jakub Kicinski [Fri, 13 Jun 2025 17:27:51 +0000 (10:27 -0700)]
eth: gianfar: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Uniquely, this driver supports only the SET operation. It does not
support GET at all. The SET callback also always returns 0, even
tho it checks a bunch of conditions, and if my quick reading is
right, expects the user to insert filtering rules for given flow
type first? Long story short it seems too convoluted to easily
add the GET as part of the conversion.

Link: https://patch.msgid.link/20250613172751.3754732-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-phy-remove-phy_driver_is_genphy-and-phy_driver_is_genphy_10g'
Jakub Kicinski [Tue, 17 Jun 2025 01:15:19 +0000 (18:15 -0700)]
Merge branch 'net-phy-remove-phy_driver_is_genphy-and-phy_driver_is_genphy_10g'

Heiner Kallweit says:

====================
net: phy: remove phy_driver_is_genphy and phy_driver_is_genphy_10g

Replace phy_driver_is_genphy() and phy_driver_is_genphy_10g()
with a new flag in struct phy_device.
====================

Link: https://patch.msgid.link/5778e86e-dd54-4388-b824-6132729ad481@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: remove phy_driver_is_genphy_10g
Heiner Kallweit [Sat, 14 Jun 2025 20:32:47 +0000 (22:32 +0200)]
net: phy: remove phy_driver_is_genphy_10g

Remove now unused function phy_driver_is_genphy_10g().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/49b0589a-9604-4ee9-add5-28fbbbe2c2f3@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: improve phy_driver_is_genphy
Heiner Kallweit [Sat, 14 Jun 2025 20:31:57 +0000 (22:31 +0200)]
net: phy: improve phy_driver_is_genphy

Use new flag phydev->is_genphy_driven to simplify this function.
Note that this includes a minor functional change:
Now this function returns true if ANY of the genphy drivers
is bound to the PHY device.

We have only one user in DSA driver mt7530, and there the
functional change doesn't matter.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/c9ac3a7d-262a-425d-9153-97fe3ca6280a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: add flag is_genphy_driven to struct phy_device
Heiner Kallweit [Sat, 14 Jun 2025 20:30:43 +0000 (22:30 +0200)]
net: phy: add flag is_genphy_driven to struct phy_device

In order to get rid of phy_driver_is_genphy() and
phy_driver_is_genphy_10g(), as first step add and use a flag
phydev->is_genphy_driven.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/3f3ad6dc-402e-4915-8d5a-2306b6d5562b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'eth-intel-migrate-to-new-rxfh-callbacks'
Jakub Kicinski [Tue, 17 Jun 2025 01:14:54 +0000 (18:14 -0700)]
Merge branch 'eth-intel-migrate-to-new-rxfh-callbacks'

Jakub Kicinski says:

====================
eth: intel: migrate to new RXFH callbacks

Migrate Intel drivers to the recently added dedicated .get_rxfh_fields
and .set_rxfh_fields ethtool callbacks.

Note that I'm deleting all the boilerplate kdoc from the affected
functions in the more recent drivers. If the maintainers feel strongly
I can respin and add it back, but it really feels useless and undue
burden for refactoring. No other vendor does this.

v1: https://lore.kernel.org/20250613010111.3548291-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250614180907.4167714-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: iavf: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:07 +0000 (11:09 -0700)]
eth: iavf: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: ice: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:06 +0000 (11:09 -0700)]
eth: ice: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: i40e: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:05 +0000 (11:09 -0700)]
eth: i40e: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: fm10k: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:04 +0000 (11:09 -0700)]
eth: fm10k: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
.get callback moves out of the switch and set_rxnfc disappears
as ETHTOOL_SRXFH as the only functionality.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: ixgbe: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:03 +0000 (11:09 -0700)]
eth: ixgbe: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: igc: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:02 +0000 (11:09 -0700)]
eth: igc: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: igb: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:09:01 +0000 (11:09 -0700)]
eth: igb: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'eth-migrate-to-new-rxfh-callbacks-get-only-drivers'
Jakub Kicinski [Tue, 17 Jun 2025 01:14:28 +0000 (18:14 -0700)]
Merge branch 'eth-migrate-to-new-rxfh-callbacks-get-only-drivers'

Jakub Kicinski says:

====================
eth: migrate to new RXFH callbacks (get-only drivers)

Migrate the drivers which only implement ETHTOOL_GRXFH to
the recently added dedicated .get_rxfh_fields ethtool callback.

v1: https://lore.kernel.org/20250613005409.3544529-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250614180638.4166766-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: enetc: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:06:38 +0000 (11:06 -0700)]
eth: enetc: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is trivial.

Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250614180638.4166766-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: e1000e: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:06:37 +0000 (11:06 -0700)]
eth: e1000e: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed and it's the only
get_rxnfc sub-command the driver supports. So convert the get_rxnfc
handler into a get_rxfh_fields handler.

Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250614180638.4166766-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: lan743x: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:06:36 +0000 (11:06 -0700)]
eth: lan743x: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is purely factoring out the handling into a helper.

Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250614180638.4166766-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: cxgb4: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:06:35 +0000 (11:06 -0700)]
eth: cxgb4: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is purely factoring out the handling into a helper.

Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250614180638.4166766-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: cisco: migrate to new RXFH callbacks
Jakub Kicinski [Sat, 14 Jun 2025 18:06:34 +0000 (11:06 -0700)]
eth: cisco: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is trivial.

Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250614180638.4166766-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'cn20k-silicon-with-mbox-support'
Jakub Kicinski [Tue, 17 Jun 2025 00:37:54 +0000 (17:37 -0700)]
Merge branch 'cn20k-silicon-with-mbox-support'

Subbaraya Sundeep says:

====================
CN20K silicon with mbox support

CN20K is the next generation silicon in the Octeon series with various
improvements and new features.

Along with other changes the mailbox communication mechanism between RVU
(Resource virtualization Unit) SRIOV PFs/VFs with Admin function (AF) has
also gone through some changes.

Some of those changes are
- Separate IRQs for mbox request and response/ack.
- Configurable mbox size, default being 64KB.
- Ability for VFs to communicate with RVU AF instead of going through
  parent SRIOV PF.

Due to more memory requirement due to configurable mbox size, mbox memory
will now have to be allocated by
- AF (PF0) for communicating with other PFs and all VFs in the system.
- PF for communicating with it's child VFs.

On previous silicons mbox memory was reserved and configured by firmware.

This patch series add basic mbox support for AF (PF0) <=> PFs and
PF <=> VFs. AF <=> VFs communication and variable mbox size support will
come in later.

Patch #1 Supported co-existance of bit encoding PFs and VFs in 16-bit
         hardware pcifunc format between CN20K silicon and older octeon
         series. Also exported PF,VF masks and shifts present in mailbox
         module to all other modules.

Patch #2 Added basic mbox operation APIs and structures to support both
         CN20K and previous version of silicons.

Patch #3 This patch adds support for basic mbox infrastructure
         implementation for CN20K silicon in AF perspective. There are
         few updates w.r.t MBOX ACK interrupt and offsets in CN20k.

Patch #4 Added mbox implementation between NIC PF and AF for CN20K.

Patch #5 Added mbox communication support between AF and AF's VFs.

Patch #6 This patch adds support for MBOX communication between NIC PF and
         its VFs.
====================

Link: https://patch.msgid.link/1749639716-13868-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2-pf: CN20K mbox implementation between PF-VF
Sai Krishna [Wed, 11 Jun 2025 11:01:56 +0000 (16:31 +0530)]
octeontx2-pf: CN20K mbox implementation between PF-VF

This patch implements the CN20k MBOX communication between PF and
it's VFs. CN20K silicon got extra interrupt of MBOX response for trigger
interrupt. Also few of the CSR offsets got changed in CN20K against
prior series of silicons.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-7-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2-af: CN20K mbox implementation for AF's VF
Sai Krishna [Wed, 11 Jun 2025 11:01:55 +0000 (16:31 +0530)]
octeontx2-af: CN20K mbox implementation for AF's VF

This patch implements the CN20k MBOX communication between AF and
AF's VFs. This implementation uses separate trigger interrupts
for request, response messages against using trigger message data in CN10K.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-6-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2-pf: CN20K mbox REQ/ACK implementation for NIC PF
Sai Krishna [Wed, 11 Jun 2025 11:01:54 +0000 (16:31 +0530)]
octeontx2-pf: CN20K mbox REQ/ACK implementation for NIC PF

This implementation uses separate trigger interrupts for request,
response messages against using trigger message data in CN10K.
This patch adds support for basic mbox implementation for CN20K
from NIC PF side.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-5-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2-af: CN20k mbox to support AF REQ/ACK functionality
Sai Krishna [Wed, 11 Jun 2025 11:01:53 +0000 (16:31 +0530)]
octeontx2-af: CN20k mbox to support AF REQ/ACK functionality

This implementation uses separate trigger interrupts for request,
response MBOX messages against using trigger message data in CN10K.
This patch adds support for basic mbox implementation for CN20K
from AF side.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-4-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2-af: CN20k basic mbox operations and structures
Sai Krishna [Wed, 11 Jun 2025 11:01:52 +0000 (16:31 +0530)]
octeontx2-af: CN20k basic mbox operations and structures

This patch adds basic mbox operation APIs and structures to add support
for mbox module on CN20k silicon. There are few CSR offsets, interrupts
changed between CN20k and prior Octeon series of devices.

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-3-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoocteontx2: Set appropriate PF, VF masks and shifts based on silicon
Subbaraya Sundeep [Wed, 11 Jun 2025 11:01:51 +0000 (16:31 +0530)]
octeontx2: Set appropriate PF, VF masks and shifts based on silicon

Number of RVU PFs on CN20K silicon have increased to 96 from maximum
of 32 that were supported on earlier silicons. Every RVU PF and VF is
identified by HW using a 16bit PF_FUNC value. Due to the change in
Max number of PFs in CN20K, the bit encoding of this PF_FUNC has changed.

This patch handles the change by using helper functions(using silicon
check) to use PF,VF masks and shifts to support both new silicon CN20K,
OcteonTx series. These helper functions are used in different modules.

Also moved the NIX AF register offset macros to other files which
will be posted in coming patches.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://patch.msgid.link/1749639716-13868-2-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'seg6-allow-end-x-behavior-to-accept-an-oif'
Jakub Kicinski [Mon, 16 Jun 2025 22:31:19 +0000 (15:31 -0700)]
Merge branch 'seg6-allow-end-x-behavior-to-accept-an-oif'

Ido Schimmel says:

====================
seg6: Allow End.X behavior to accept an oif

Patches #1-#3 gradually extend the End.X behavior to accept an output
interface as an optional argument. This is needed for cases where user
space wishes to specify an IPv6 link-local address as the nexthop
address.

Patch #4 adds test cases to the existing End.X selftest to cover the new
functionality.
====================

Link: https://patch.msgid.link/20250612122323.584113-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: seg6: Add test cases for End.X with link-local nexthop
Ido Schimmel [Thu, 12 Jun 2025 12:23:23 +0000 (15:23 +0300)]
selftests: seg6: Add test cases for End.X with link-local nexthop

In the current test topology, all the routers are connected to each
other via dedicated links with addresses of the form fcf0:0:x:y::/64.

The test configures rt-3 with an adjacency with rt-4 and rt-4 with an
adjacency with rt-1:

 # ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
 fcbb:0:300::/48  encap seg6local action End.X nh6 fcf0:0:3:4::4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
 # ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
 fcbb:0:400::/48  encap seg6local action End.X nh6 fcf0:0:1:4::1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium

The routes are used when pinging hs-2 from hs-1 and vice-versa.

Extend the test to also cover End.X behavior with an IPv6 link-local
nexthop address and an output interface. Configure every router
interface with an IPv6 link-local address of the form fe80::x:y/64 and
before re-running the ping tests, replace the previous End.X routes with
routes that use the new IPv6 link-local addresses:

 # ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
 fcbb:0:300::/48  encap seg6local action End.X nh6 fe80::4:3 oif veth-rt-3-4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
 # ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
 fcbb:0:400::/48  encap seg6local action End.X nh6 fe80::1:4 oif veth-rt-4-1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium

The new test cases fail without the previous patch ("seg6: Allow End.X
behavior to accept an oif"):

 # ./srv6_end_x_next_csid_l3vpn_test.sh
 [...]
 ################################################################################
 TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
 ################################################################################

     TEST: IPv6 Hosts connectivity: hs-1 -> hs-2                         [FAIL]

     TEST: IPv6 Hosts connectivity: hs-2 -> hs-1                         [FAIL]

 ################################################################################
 TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
 ################################################################################

     TEST: IPv4 Hosts connectivity: hs-1 -> hs-2                         [FAIL]

     TEST: IPv4 Hosts connectivity: hs-2 -> hs-1                         [FAIL]

 Tests passed:  40
 Tests failed:   4

And pass with it:

 # ./srv6_end_x_next_csid_l3vpn_test.sh
 [...]
 ################################################################################
 TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
 ################################################################################

     TEST: IPv6 Hosts connectivity: hs-1 -> hs-2                         [ OK ]

     TEST: IPv6 Hosts connectivity: hs-2 -> hs-1                         [ OK ]

 ################################################################################
 TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
 ################################################################################

     TEST: IPv4 Hosts connectivity: hs-1 -> hs-2                         [ OK ]

     TEST: IPv4 Hosts connectivity: hs-2 -> hs-1                         [ OK ]

 Tests passed:  44
 Tests failed:   0

Without the previous patch, rt-3 and rt-4 resolve the wrong routes for
the link-local nexthops, with the output interface being the input
interface:

 # perf script
 [...]
 ping    1067 [001]    37.554486: fib6:fib6_table_lookup: table 254 oif 0 iif 11 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-3-1 gw :: err 0
 [...]
 ping    1069 [002]    41.573360: fib6:fib6_table_lookup: table 254 oif 0 iif 12 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-4-2 gw :: err 0

But the correct routes are resolved with the patch:

 # perf script
 [...]
 ping    1066 [006]    30.672355: fib6:fib6_table_lookup: table 254 oif 13 iif 1 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0x85941 tos 0 scope 0 flags 6 ==> dev veth-rt-3-4 gw :: err 0
 [...]
 ping    1066 [006]    30.672411: fib6:fib6_table_lookup: table 254 oif 11 iif 1 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0x91de0 tos 0 scope 0 flags 6 ==> dev veth-rt-4-1 gw :: err 0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoseg6: Allow End.X behavior to accept an oif
Ido Schimmel [Thu, 12 Jun 2025 12:23:22 +0000 (15:23 +0300)]
seg6: Allow End.X behavior to accept an oif

Extend the End.X behavior to accept an output interface as an optional
attribute and make use of it when resolving a route. This is needed when
user space wants to use a link-local address as the nexthop address.

Before:

 # ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
 # ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
 $ ip -6 route show
 2001:db8:1::/64  encap seg6local action End.X nh6 fe80::1 dev sr6 metric 1024 pref medium
 2001:db8:2::/64  encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium

After:

 # ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
 # ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
 $ ip -6 route show
 2001:db8:1::/64  encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 metric 1024 pref medium
 2001:db8:2::/64  encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium

Note that the oif attribute is not dumped to user space when it was not
specified (as an oif of 0) since each entry keeps track of the optional
attributes that it parsed during configuration (see struct
seg6_local_lwt::parsed_optattrs).

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoseg6: Call seg6_lookup_any_nexthop() from End.X behavior
Ido Schimmel [Thu, 12 Jun 2025 12:23:21 +0000 (15:23 +0300)]
seg6: Call seg6_lookup_any_nexthop() from End.X behavior

seg6_lookup_nexthop() is a wrapper around seg6_lookup_any_nexthop().
Change End.X behavior to invoke seg6_lookup_any_nexthop() directly so
that we would not need to expose the new output interface argument
outside of the seg6local module.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoseg6: Extend seg6_lookup_any_nexthop() with an oif argument
Ido Schimmel [Thu, 12 Jun 2025 12:23:20 +0000 (15:23 +0300)]
seg6: Extend seg6_lookup_any_nexthop() with an oif argument

seg6_lookup_any_nexthop() is called by the different endpoint behaviors
(e.g., End, End.X) to resolve an IPv6 route. Extend the function with an
output interface argument so that it could be used to resolve a route
with a certain output interface. This will be used by subsequent patches
that will extend the End.X behavior with an output interface as an
optional argument.

ip6_route_input_lookup() cannot be used when an output interface is
specified as it ignores this parameter. Similarly, calling
ip6_pol_route() when a table ID was not specified (e.g., End.X behavior)
is wrong.

Therefore, when an output interface is specified without a table ID,
resolve the route using ip6_route_output() which will take the output
interface into account.

Note that no endpoint behavior currently passes both a table ID and an
output interface, so the oif argument passed to ip6_pol_route() is
always zero and there are no functional changes in this regard.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'gve-add-rx-hw-timestamping-support'
Jakub Kicinski [Mon, 16 Jun 2025 22:27:27 +0000 (15:27 -0700)]
Merge branch 'gve-add-rx-hw-timestamping-support'

Ziwei Xiao says:

====================
gve: Add Rx HW timestamping support

This patch series add the support of Rx HW timestamping, which sends
adminq commands periodically to the device for clock synchronization with
the NIC.

The ability to read the PHC from user space will be added in the
future patch series when adding the actual PTP support. For this patch
series, it's adding the initial ptp to utilize the ptp_schedule_worker
to schedule the work of syncing the NIC clock.
====================

Link: https://patch.msgid.link/20250614000754.164827-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Advertise support for rx hardware timestamping
John Fraker [Sat, 14 Jun 2025 00:07:54 +0000 (00:07 +0000)]
gve: Advertise support for rx hardware timestamping

Expand the get_ts_info ethtool handler with the new gve_get_ts_info
which advertises support for rx hardware timestamping.

With this patch, the driver now fully supports rx hardware timestamping.

Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-9-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Implement ndo_hwtstamp_get/set for RX timestamping
John Fraker [Sat, 14 Jun 2025 00:07:53 +0000 (00:07 +0000)]
gve: Implement ndo_hwtstamp_get/set for RX timestamping

Implement ndo_hwtstamp_get/set to enable hardware RX timestamping,
providing support for SIOC[SG]HWTSTAMP IOCTLs. Included with this support
is the small change necessary to read the rx timestamp out of the rx
descriptor, now that timestamps start being enabled. The gve clock is
only used for hardware timestamps, so started when timestamps are
requested and stopped when not needed.

This version only supports RX hardware timestamping with the rx filter
HWTSTAMP_FILTER_ALL. If the user attempts to configure a more
restrictive filter, the filter will be set to HWTSTAMP_FILTER_ALL in the
returned structure.

Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-8-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add rx hardware timestamp expansion
John Fraker [Sat, 14 Jun 2025 00:07:52 +0000 (00:07 +0000)]
gve: Add rx hardware timestamp expansion

Allow the rx path to recover the high 32 bits of the full 64 bit rx
timestamp.

Use the low 32 bits of the last synced nic time and the 32 bits of the
timestamp provided in the rx descriptor to generate a difference, which
is then applied to the last synced nic time to reconstruct the complete
64-bit timestamp.

This scheme remains accurate as long as no more than ~2 seconds have
passed between the last read of the nic clock and the timestamping
application of the received packet.

Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-7-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add support to query the nic clock
Kevin Yang [Sat, 14 Jun 2025 00:07:51 +0000 (00:07 +0000)]
gve: Add support to query the nic clock

Query the nic clock and store the results. The timestamp delivered
in descriptors has a wraparound time of ~4 seconds so 250ms is chosen
as the sync cadence to provide a balance between performance, and
drift potential when we do start associating host time and nic time.

Leverage PTP's aux_work to query the nic clock periodically.

Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add adminq lock for queues creation and destruction
Ziwei Xiao [Sat, 14 Jun 2025 00:07:50 +0000 (00:07 +0000)]
gve: Add adminq lock for queues creation and destruction

Adminq commands for queues creation and destruction were not
consistently protected by the driver's adminq_lock. This was previously
benign as these operations were always initiated from contexts holding
kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided
serialization.

Upcoming PTP aux_work will issue adminq commands directly from the
driver to read the NIC clock, without such kernel lock protection.
To prevent race conditions with this new PTP work, this patch ensures
the adminq_lock is held during queues creation and destruction.

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add initial PTP device support
Harshitha Ramamurthy [Sat, 14 Jun 2025 00:07:49 +0000 (00:07 +0000)]
gve: Add initial PTP device support

If the device supports reading of the nic clock, add support
to initialize and register the PTP clock.

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-4-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add adminq command to report nic timestamp
John Fraker [Sat, 14 Jun 2025 00:07:48 +0000 (00:07 +0000)]
gve: Add adminq command to report nic timestamp

Add an adminq command to read NIC's hardware clock. The driver
allocates dma memory and passes that dma memory address to the device.
The device then writes the clock to the given address.

Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agogve: Add device option for nic clock synchronization
John Fraker [Sat, 14 Jun 2025 00:07:47 +0000 (00:07 +0000)]
gve: Add device option for nic clock synchronization

Add the device option and negotiation with the device for clock
synchronization with the nic. This option is necessary before the driver
will advertise support for hardware timestamping or other related
features.

Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-2-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: mana: Add handler for hardware servicing events
Haiyang Zhang [Fri, 13 Jun 2025 17:00:34 +0000 (10:00 -0700)]
net: mana: Add handler for hardware servicing events

To collaborate with hardware servicing events, upon receiving the special
EQE notification from the HW channel, remove the devices on this bus.
Then, after a waiting period based on the device specs, rescan the parent
bus to recover the devices.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1749834034-18498-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'netpoll-untangle-netconsole-and-netpoll'
Jakub Kicinski [Mon, 16 Jun 2025 22:18:35 +0000 (15:18 -0700)]
Merge branch 'netpoll-untangle-netconsole-and-netpoll'

Breno Leitao says:

====================
netpoll: Untangle netconsole and netpoll

Initially netpoll and netconsole were created together, and some
functions are in the wrong file. Seperate netconsole-only functions
in netconsole, avoiding exports.

1. Expose netpoll logging macros in the public header to enable consistent
   log formatting across netpoll consumers.

2. Relocate netconsole-specific functions from netpoll to the netconsole
   module where they are actually used, reducing unnecessary coupling.

3. Remove unnecessary function exports

4. Rename netpoll parsing functions in netconsole to better reflect their
   specific usage.

5. Create a test to check that cmdline works fine. This was in my todo
   list since [1], this was a good time to add it here to make sure this
   patchset doesn't regress.

PS: The code was split in a way that it is easy to review. When copying
the functions from netpoll to netconsole, I do not change than other
than adding `static`. This will make checkpatch unhappy, but, further
patches will address the issues. It is done this way to make it easy for
reviewers.

Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/
v2: https://lore.kernel.org/20250611-rework-v2-0-ab1d92b458ca@debian.org
v1: https://lore.kernel.org/20250610-rework-v1-0-7cfde283f246@debian.org
====================

Link: https://patch.msgid.link/20250613-rework-v3-0-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: add netconsole test for cmdline configuration
Breno Leitao [Fri, 13 Jun 2025 11:31:37 +0000 (04:31 -0700)]
selftests: net: add netconsole test for cmdline configuration

Add a new selftest to verify netconsole module loading with command
line arguments. This test exercises the init_netconsole() path and
validates proper parsing of the netconsole= parameter format.

The test:
- Loads netconsole module with cmdline configuration instead of
  dynamic reconfiguration
- Validates message transmission through the configured target
- Adds helper functions for cmdline string generation and module
  validation

This complements existing netconsole selftests by covering the
module initialization code path that processes boot-time parameters.
This test is useful to test issues like the one described in [1].

Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-8-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: Refactor cleanup logic in lib_netcons.sh
Breno Leitao [Fri, 13 Jun 2025 11:31:36 +0000 (04:31 -0700)]
selftests: net: Refactor cleanup logic in lib_netcons.sh

Extract the network device and namespace cleanup logic from the
cleanup() function into a new do_cleanup() helper in lib_netcons.sh.

The do_cleanup() function only unconfigure the network and
printk, while cleanup() cleans the netconsole targets plus the network
and printk.

This refactoring let this code to be reused in cases netconsole dynamic
is not being used, as in the upcoming patch.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-7-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetconsole: improve code style in parser function
Breno Leitao [Fri, 13 Jun 2025 11:31:35 +0000 (04:31 -0700)]
netconsole: improve code style in parser function

Split assignment from conditional checks and use preferred null pointer
check style (!delim instead of == NULL) in netconsole_parser_cmdline().
This improves code readability and follows kernel coding style
conventions.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-6-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetconsole: rename functions to better reflect their purpose
Breno Leitao [Fri, 13 Jun 2025 11:31:34 +0000 (04:31 -0700)]
netconsole: rename functions to better reflect their purpose

Rename netpoll_parse_options() to netconsole_parser_cmdline() and
netpoll_print_options() to netconsole_print_banner() to better
describe what these functions actually do within the netconsole
context.

Also fix minor code style issues including variable declaration
ordering and spacing.

These functions are specific to netconsole functionality rather
than general netpoll operations, so the new names better reflect
their actual purpose.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-5-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetpoll: move netpoll_print_options to netconsole
Breno Leitao [Fri, 13 Jun 2025 11:31:33 +0000 (04:31 -0700)]
netpoll: move netpoll_print_options to netconsole

Move netpoll_print_options() from net/core/netpoll.c to
drivers/net/netconsole.c and make it static. This function is only used
by netconsole, so there's no need to export it or keep it in the public
netpoll API.

This reduces the netpoll API surface and improves code locality
by keeping netconsole-specific functionality within the netconsole
driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-4-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetpoll: relocate netconsole-specific functions to netconsole module
Breno Leitao [Fri, 13 Jun 2025 11:31:32 +0000 (04:31 -0700)]
netpoll: relocate netconsole-specific functions to netconsole module

Move netpoll_parse_ip_addr() and netpoll_parse_options() from the generic
netpoll module to the netconsole module where they are actually used.

These functions were originally placed in netpoll but are only consumed by
netconsole. This refactoring improves code organization by:

 - Removing unnecessary exported symbols from netpoll
 - Making netpoll_parse_options() static (no longer needs global visibility)
 - Reducing coupling between netpoll and netconsole modules

The functions remain functionally identical - this is purely a code
reorganization to better reflect their actual usage patterns. Here are
the changes:

 1) Move both functions from netpoll to netconsole
 2) Add static to netpoll_parse_options()
 3) Removed the EXPORT_SYMBOL()

PS: This diff does not change the function format, so, it is easy to
review, but, checkpatch will not be happy. A follow-up patch will
address the current issues reported by checkpatch.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-3-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>