www.infradead.org Git - users/hch/misc.git/log

virtio_net: Split struct virtio_net_rss_config

struct virtio_net_rss_config was less useful in actual code because of a
flexible array placed in the middle. Add new structures that split it
into two to avoid having a flexible array in the middle.
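
For illustration only, the split can be sketched in userspace C as below.
The struct and helper names (rss_cfg_hdr, rss_cfg_trailer, rss_cfg_size,
rss_cfg_alloc, rss_cfg_trailer_of) and the field layout are assumptions
made for this sketch, not the definitions added by the patch. The point is
that each part ends in its own flexible array, so both are legal C.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header part: the indirection table is the LAST member,
 * so it can be a proper C flexible array member. */
struct rss_cfg_hdr {
	uint32_t hash_types;
	uint16_t indirection_table_mask;
	uint16_t unclassified_queue;
	uint16_t indirection_table[];
};

/* Hypothetical trailer part: the fields that used to follow the table. */
struct rss_cfg_trailer {
	uint16_t max_tx_vq;
	uint8_t hash_key_length;
	uint8_t hash_key_data[];
};

/* Total size of header + table + trailer + key, using offsetof() so that
 * trailing struct padding is not counted. */
static size_t rss_cfg_size(size_t table_len, size_t key_len)
{
	return offsetof(struct rss_cfg_hdr, indirection_table) +
	       table_len * sizeof(uint16_t) +
	       offsetof(struct rss_cfg_trailer, hash_key_data) + key_len;
}

/* Allocate one zeroed buffer that holds both parts back to back. */
static struct rss_cfg_hdr *rss_cfg_alloc(size_t table_len, size_t key_len)
{
	return calloc(1, rss_cfg_size(table_len, key_len));
}

/* The trailer starts right after the variable-length table. */
static struct rss_cfg_trailer *rss_cfg_trailer_of(struct rss_cfg_hdr *hdr,
						  size_t table_len)
{
	return (struct rss_cfg_trailer *)&hdr->indirection_table[table_len];
}
```

With a single struct, the fields after the table could not be reached by
name; with two structs, the trailer is located by pointer arithmetic once
the table length is known.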

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-1-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: mcs: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@

- flush_workqueue(E);
  destroy_workqueue(E);

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250324080854.408188-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "udp_tunnel: GRO optimizations"

Revert "udp_tunnel: use static call for GRO hooks when possible"
This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702.

Revert "udp_tunnel: create a fastpath GRO lookup."
This reverts commit 8d4880db378350f8ed8969feea13bdc164564fc1.

There are multiple small issues with the series. In the interest
of unblocking the merge window let's opt for a revert.

Link: https://lore.kernel.org/cover.1742557254.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-sfp-add-single-byte-smbus-sfp-access'

Maxime Chevallier says:

====================
net: phy: sfp: Add single-byte SMBus SFP access

This is V4 for the single-byte SMBus support for SFP cages as well as
embedded PHYs accessed over mdio-i2c.

v3: https://lore.kernel.org/20250314162319.516163-1-maxime.chevallier@bootlin.com
v2: https://lore.kernel.org/20250225112043.419189-1-maxime.chevallier@bootlin.com
v1: https://lore.kernel.org/20250223172848.1098621-1-maxime.chevallier@bootlin.com
====================

Link: https://patch.msgid.link/20250322075745.120831-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdio-i2c: Add support for single-byte SMBus operations

PHYs that are within copper SFP modules have their MDIO bus accessible
through address 0x56 (usually) on the i2c bus. The MDIO-I2C bridge is
designed for 16-bit accesses, but we can also perform 8-bit accesses by
reading/writing the high and low bytes sequentially.

This commit adds support for this type of access, thus supporting
SMBus controllers such as the one in the VSC8552.

This was only tested on Copper SFP modules that embed a Marvell 88e1111
PHY.
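
As a rough userspace sketch of the idea (not the driver code), a 16-bit
register value can be assembled from two sequential byte reads. The
callback type byte_read_fn, the helper mdio_read16_bytewise() and the
fake_bridge mock are hypothetical names, and the "high byte first"
ordering is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical byte-wide bus accessor: 0 on success, negative on error. */
typedef int (*byte_read_fn)(void *ctx, uint8_t reg, uint8_t *val);

/* Assemble one 16-bit value from two sequential 8-bit reads.
 * Assumption: the bridge hands out the high byte first, then the low byte. */
static int mdio_read16_bytewise(byte_read_fn read_byte, void *ctx,
				uint8_t reg, uint16_t *out)
{
	uint8_t hi, lo;
	int ret;

	ret = read_byte(ctx, reg, &hi);
	if (ret < 0)
		return ret;
	ret = read_byte(ctx, reg, &lo);
	if (ret < 0)
		return ret;

	*out = (uint16_t)((hi << 8) | lo);
	return 0;
}

/* Fake bridge, for illustration only: returns the bytes of `value` in order. */
struct fake_bridge {
	uint16_t value;
	int reads;
};

static int fake_read_byte(void *ctx, uint8_t reg, uint8_t *val)
{
	struct fake_bridge *b = ctx;

	(void)reg; /* the mock serves a single register */
	*val = (b->reads++ % 2 == 0) ? (uint8_t)(b->value >> 8)
				     : (uint8_t)(b->value & 0xff);
	return 0;
}
```

Note that the two reads are separate bus transactions, so the register
may change between them; the sketch does nothing to guard against that.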

Tested-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: sfp: Add support for SMBus module access

The SFP module's eeprom and internals are accessible through an i2c bus.

It is possible that the SFP might be connected to an SMBus-only
controller, such as the one found in some PHY devices in the VSC85xx
family.

Introduce a set of sfp read/write ops that are going to be used if the
i2c bus is only capable of doing smbus byte accesses.

As single-byte SMBus transactions go against SFF-8472 and break the
atomicity of diagnostics data access, hwmon is disabled in the case
of SMBus access.

Moreover, as this may cause other instabilities, print a warning at
probe time to indicate that the setup may be unreliable because of the
hardware design.

As hwmon may be disabled for both broken EEPROM and SMBus, the warnings
are updated accordingly.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-03-24

1) Prevent setting high order sequence number bits input in
   non-ESN mode. From Leon Romanovsky.

2) Support PMTU handling in tunnel mode for packet offload.
   From Leon Romanovsky.

3) Make xfrm_state_lookup_byaddr lockless.
   From Florian Westphal.

4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
   From Dan Carpenter.

* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
  xfrm: state: make xfrm_state_lookup_byaddr lockless
  xfrm: check for PMTU in tunnel mode for packet offload
  xfrm: provide common xdo_dev_offload_ok callback implementation
  xfrm: rely on XFRM offload
  xfrm: simplify SA initialization routine
  xfrm: delay initialization of offload path till its actually requested
  xfrm: prevent high SEQ input in non-ESN mode
====================

Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: qcom,ipa: Correct indentation and style in DTS example

DTS example in the bindings should be indented with 2 or 4 spaces and
aligned with the opening '- |', so correct any differences such as 3 spaces
or mixtures of 2 and 4 spaces in one binding.

No functional changes here, but saves some comments during reviews of
new patches built on existing code.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Alex Elder <elder@riscstar.com>
Link: https://patch.msgid.link/20250324125222.82057-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Use kvmalloc in xt_hashlimit, from Denis Kirjanov.

2) Tighten nf_conntrack sysctl accepted values for nf_conntrack_max
   and nf_ct_expect_max, from Nicolas Bouchinet.

3) Avoid lookup in nft_fib if socket is available, from Florian Westphal.

4) Initialize struct lsm_context in nfnetlink_queue to avoid
   hypothetical ENOMEM errors, Chenyuan Yang.

5) Use strscpy() instead of _pad when initializing xtables table name,
   kzalloc is already used to initialize the table memory area.
   From Thorsten Blum.

6) Missing socket lookup by conntrack information for IPv6 traffic
   in nft_socket, there is a similar chunk in IPv4, this was never
   added when IPv6 NAT was introduced. From Maxim Mikityanskiy.

7) Fix clang issues with nf_tables CONFIG_MITIGATION_RETPOLINE,
   from WangYuli.

* tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Only use nf_skip_indirect_calls() when MITIGATION_RETPOLINE
  netfilter: socket: Lookup orig tuple for IPv6 SNAT
  netfilter: xtables: Use strscpy() instead of strscpy_pad()
  netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error
  netfilter: fib: avoid lookup if socket is available
  netfilter: conntrack: Bound nf_conntrack sysctl writes
  netfilter: xt_hashlimit: replace vmalloc calls with kvmalloc
====================

Link: https://patch.msgid.link/20250323100922.59983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: au1000_eth: Mark au1000_ReleaseDB() static

This fixes the following build warning:
```
drivers/net/ethernet/amd/au1000_eth.c:574:6: warning: no previous prototype for 'au1000_ReleaseDB' [-Wmissing-prototypes]
  574 | void au1000_ReleaseDB(struct au1000_private *aup, struct db_dest *pDB)
      |      ^~~~~~~~~~~~~~~~
```

Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250323190450.111241-1-johan.korsnes@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rfs: hash function change

RFS is using two kinds of hash tables.

First one is controlled by /proc/sys/net/core/rps_sock_flow_entries = 2^N
and using the N low order bits of the l4 hash is good enough.

Then each RX queue has its own hash table, controlled by
/sys/class/net/eth1/queues/rx-$q/rps_flow_cnt = 2^X

Current hash function, using the X low order bits is suboptimal,
because RSS is usually using Func(hash) = (hash % power_of_two);

For example, with 32 RX queues, 6 low order bits have no entropy
for a given queue.

Switch this hash function to hash_32(hash, log) to increase
chances to use all possible slots and reduce collisions.
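
A small userspace sketch of the effect, under stated assumptions: RSS is
modelled as hash % nr_queues with a power-of-two queue count, and
slots_used() and low_bits() are helper names invented for this example.
hash_32() here mirrors the kernel's generic multiplicative helper, which
keeps the top bits of hash * GOLDEN_RATIO_32.

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Multiplicative hash keeping the TOP `bits` bits of the product.
 * Requires 1 <= bits <= 32. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Old-style index: the low-order `bits` bits of the flow hash. */
static uint32_t low_bits(uint32_t val, unsigned int bits)
{
	return val & ((1u << bits) - 1);
}

/* Count the distinct table slots used by `n` flows that all landed on RX
 * queue `queue`, assuming RSS picked the queue as hash % nr_queues. */
static unsigned int slots_used(uint32_t queue, uint32_t nr_queues,
			       unsigned int n, unsigned int bits,
			       int use_hash_32)
{
	uint64_t seen = 0; /* bitmap of used slots, so bits must be <= 6 */
	unsigned int count = 0;

	for (unsigned int k = 0; k < n; k++) {
		uint32_t hash = queue + k * nr_queues;
		uint32_t slot = use_hash_32 ? hash_32(hash, bits)
					    : low_bits(hash, bits);

		if (!(seen & (1ull << slot))) {
			seen |= 1ull << slot;
			count++;
		}
	}
	return count;
}
```

In this model, 64 flows on one of 32 queues all collide in a single slot
of a 16-entry table when indexed by low-order bits, while the
multiplicative hash spreads them over several slots.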

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321171309.634100-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
More features for 6.15, major changes:
* cfg80211/mac80211: fix and enable link reconfiguration
* rtw88: support RTL8814AE/RTL8814AU
* mt7996: preparations for MLO
* ath12k: continued work on MLO
* iwlwifi: add new iwlmld sub-driver/op-mode for
   some current and future devices
* wfx: wowlan support

* tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (311 commits)
  wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()
  wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
  wifi: mt76: mt792x: re-register CHANCTX_STA_CSA only for the mt7921 series
  wifi: mt76: mt7996: Update mt7996_tx to MLO support
  wifi: mt76: mt7996: rework mt7996_ampdu_action to support MLO
  wifi: mt76: mt7996: rework set/get_tsf callabcks to support MLO
  wifi: mt76: mt7996: set vif default link_id adding/removing vif links
  wifi: mt76: mt7996: rework mt7996_mcu_beacon_inband_discov to support MLO
  wifi: mt76: mt7996: rework mt7996_mcu_add_obss_spr to support MLO
  wifi: mt76: mt7996: rework mt7996_net_fill_forward_path to support MLO
  wifi: mt76: mt7996: rework mt7996_update_mu_group to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_poll to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO
  wifi: mt76: mt7996: remove mt7996_mac_enable_rtscts()
  wifi: mt76: mt7996: rework mt7996_sta_hw_queue_read to support MLO
  wifi: mt76: mt7996: rework mt7996_set_hw_key to support MLO
  wifi: mt76: mt7996: Add mt7996_sta_link to mt7996_mcu_add_bss_info signature
  wifi: mt76: mt7996: rework mt7996_sta_set_4addr and mt7996_sta_set_decap_offload to support MLO
  wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO
  wifi: mt76: mt7996: Rely on wcid_to_sta in mt7996_mac_add_txs_skb()
  ...
====================

Link: https://patch.msgid.link/20250320131106.33266-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-dwmac-rk-add-gmac-support-for-rk3528'

Jonas Karlman says:

====================
net: stmmac: dwmac-rk: Add GMAC support for RK3528

The Rockchip RK3528 has two Ethernet controllers, one 100/10 MAC to be
used with the integrated PHY and a second 1000/100/10 MAC to be used
with an external Ethernet PHY.

This series adds initial support for the Ethernet controllers found
in RK3528 and initial support to power up/down the integrated PHY.

v2: https://lore.kernel.org/20250309232622.1498084-1-jonas@kwiboo.se
v1: https://lore.kernel.org/20250306221402.1704196-1-jonas@kwiboo.se
====================

Link: https://patch.msgid.link/20250319214415.3086027-1-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add initial support for RK3528 integrated PHY

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.

Add operations to powerup/down the integrated PHY found in RK3528.
Use helpers that can be used by other GMAC variants in the future.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-6-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add integrated_phy_powerdown operation

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.

Add a new integrated_phy_powerdown operation and change the call chain
for integrated_phy_powerup to prepare support for the integrated PHY
found in these newer SoCs.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-5-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Move integrated_phy_powerup/down functions

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these SoCs.

Move the rk_gmac_integrated_phy_powerup/down functions to the top of the
file to prepare for them to be called directly by a GMAC variant
specific powerup/down operation.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-4-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add GMAC support for RK3528

Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.

Add initial support for the RK3528 GMAC variant.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://patch.msgid.link/20250319214415.3086027-3-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: rockchip-dwmac: Add compatible string for RK3528

Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.

Add compatible string for the RK3528 variant.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250319214415.3086027-2-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-improve-stmmac-resume-rx-clocking'

Russell King says:

====================
net: improve stmmac resume rx clocking

stmmac has had a long history of problems with resuming, illustrated by
reset failure due to the receive clock not running.

Several attempts have been made over the years to address this
issue, such as moving phylink_start() (now phylink_resume()) super
early in stmmac_resume() in commit 90702dcd19c0 ("net: stmmac: fix MAC
not working when system resume back with WoL active.") However, this
has the downside that stmmac_mac_link_up() can be (and demonstrably is)
called before or during the driver initialisation in another thread.
This can cause issues as packets could begin to be queued, and the
transmit/receive enable bits will be set before any initialisation has
been done.

Another attempt is used by dwmac-socfpga.c in commit 2d871aa07136 ("net:
stmmac: add platform init/exit for Altera's ARM socfpga") which
pre-dates the above commit.

Neither of these two approaches considers the effect of EEE with a PHY
that supports receive clock-stop and has that feature enabled (which
the stmmac driver does enable). If the link is up, then there is the
possibility for the receive path to be in low-power mode, and the PHY
may stop its receive clock.

This series addresses these issues by (each is not necessarily a
separate patch):

1) introducing phylink_prepare_resume(), which can be used by MAC
   drivers to ensure that the PHY is resumed prior to doing any
   re-initialisation work. This call is added to stmmac_resume().

2) moving phylink_resume() after all re-initialisation has completed,
   thereby ensuring that the hardware is ready to be enabled for
   packet reception/transmission.

3) with (1) and (2) addressed, socfpga no longer needs a private
   work-around, so it is removed.

4) introducing phylink functions to block/unblock the receive clock-
   stop at the PHY. As these require PHY access over the MDIO bus,
   they can sleep, so are not suitable for atomic access.

5) the stmmac hardware requires the receive clock to be running for
   reset to complete. Depending on synthesis options, this requirement
   may also extend to writing various registers as well, e.g. setting
   the MAC address, writing some of the vlan registers, etc. Full
   details are in the databook.

   We add blocking/unblocking of the PHY receive clock-stop around
   parts of the main stmmac driver where we have a context in which we
   can sleep. These are wrapped with the new phylink functions.

   However, depending on synthesis options, there could be other
   places where the net core calls the driver with a BH-disabled
   context where we can't sleep, and thus can't block the PHY from
   disabling its receive clock. These are documented with FIXME
   comments.

Given the last paragraph above, I am wondering whether a better
approach would be to ensure that receive clock-stop is always disabled
at the PHY with stmmac. From what I can see, implementations do not
document to this level of detail, which makes it difficult to tell
which registers require the receive clock to be running to behave
correctly.

This patch series has been tested on the Tegra194 Jetson Xavier NX
board kindly donated by NVidia, with two additional patches that are
pending in patchwork - the first is required to have EEE's LPI mode
passed through to the MAC on this platform to allow testing under
PHY clock-stop scenarios. The second is a bug fix for PHYLIB and
makes "eee off" functional, but should not affect this series.

All patches on top of net-next commit f749448ce9f1 ("Merge branch
'net-mlx5-hw-steering-cleanups'")

https://patchwork.kernel.org/project/netdevbpf/patch/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk/
https://patchwork.kernel.org/project/netdevbpf/patch/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk/
====================

Link: https://patch.msgid.link/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: block PHY RXC clock-stop

The DesignWare core requires the receive clock to be running during
certain operations. Ensure that we block PHY RXC clock-stop during
these operations.

This is a best-efforts change - not everywhere can be covered by this
because of net's core locking, which means we can't access the MDIO
bus to configure the PHY to disable RXC clock-stop in certain areas.
These are marked with FIXME comments.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: add functions to block/unblock rx clock stop

Some MACs require the PHY receive clock to be running to complete setup
actions. This may fail if the PHY has negotiated EEE, the MAC supports
receive clock stop, and the link has entered LPI state. Provide a pair
of APIs that MAC drivers can use to temporarily block the PHY disabling
the receive clock.
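
One way to picture such a pair is a nesting counter, sketched below in
userspace C. This is only a model: struct rx_clk_model and the functions
rx_clk_stop_block()/rx_clk_stop_unblock() are hypothetical names, and the
reference-counting behaviour is an assumption of the sketch, not a
description of the phylink implementation.

```c
#include <stdbool.h>

/* Hypothetical model of a PHY whose receive clock-stop can be blocked. */
struct rx_clk_model {
	unsigned int block_count; /* outstanding block requests */
	bool clk_stop_enabled;    /* what the PHY is currently programmed to do */
};

/* Take a block reference; the first one disables clock-stop at the PHY. */
static void rx_clk_stop_block(struct rx_clk_model *m)
{
	if (m->block_count++ == 0)
		m->clk_stop_enabled = false;
}

/* Drop a block reference; the last one re-enables clock-stop.
 * An unbalanced unblock is ignored in this model. */
static void rx_clk_stop_unblock(struct rx_clk_model *m)
{
	if (m->block_count > 0 && --m->block_count == 0)
		m->clk_stop_enabled = true;
}
```

A caller would bracket the register accesses that need the receive clock
with a block/unblock pair; nested pairs keep the clock running until the
outermost unblock.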

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: socfpga: remove phy_resume() call

As the previous commit addressed DWGMAC resuming with a PHY in
suspended state, there is now no need for socfpga to work around
this. Remove this code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: address non-LPI resume failures properly

The Synopsys Designware GMAC core databook requires all clocks to be
active in order to complete software reset, which we perform during
resume.

However, IEEE 802.3 allows a PHY to stop its clocks when placed in
low-power mode, which happens when the system is suspended and WoL
is not enabled.

As an attempt to work around this, commit 36d18b5664ef ("net: stmmac:
start phylink instance before stmmac_hw_setup()") started phylink
early, but this has the side effect that the mac_link_up() method may
be called before or during the initialisation of GMAC hardware.

We also have the socfpga glue driver calling phy_resume() directly,
again as an attempt to work around this.

In a previous commit, phylink_prepare_resume() has been introduced
to give MAC drivers a way to ensure that the PHY is resumed prior to
their initialisation of their MAC hardware. This commit adds the call,
and moves the phylink_resume() call back to where it should be before
the aforementioned commit.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: add phylink_prepare_resume()

When the system is suspended, the PHY may be placed in low-power mode
by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the
behaviour of the PHY in this state is implementation specific, and
the PHY is not required to meet the RX_CLK and TX_CLK requirements.
Essentially, this means that a PHY may stop the clocks that it is
generating while in power down state.

However, MACs exist which require the clocks from the PHY to be running
in order to properly resume. phylink_prepare_resume() provides them
with a way to clear the Power down bit early.

Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the
transmit and receive clocks meet the requirements after clearing the
power down bit.

Add a resume preparation function, which will ensure that the receive
clock from the PHY is appropriately configured while resuming.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sfc-devlink-flash-for-x4'

Edward Cree says:

====================
sfc: devlink flash for X4

Updates to support devlink flash on X4 NICs.
Patch #2 is needed for NVRAM_PARTITION_TYPE_AUTO, and patch #1 is
needed because the latest MCDI headers from firmware no longer
include MDIO read/write commands.

v1: https://lore.kernel.org/cover.1742223233.git.ecree.xilinx@gmail.com
====================

Link: https://patch.msgid.link/cover.1742493016.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: support X4 devlink flash

Unlike X2 and EF100, we do not attempt to parse the firmware file to
find an image within it; we simply hand the entire file to the MC,
which is responsible for understanding any container formats we might
use and validating that the firmware file is applicable to this NIC.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: update MCDI protocol headers

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: rip out MDIO support

Unlike Siena, no EF10 board ever had an external PHY, and consequently
MDIO handling isn't even built into the firmware. Since Siena has
been split out into its own driver, the MDIO code can be deleted from
the sfc driver.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: reorganize IP MIB values (II)

Commit 14a196807482 ("net: reorganize IP MIB values") changed
MIB values to group hot fields together.

Since then 5 new fields have been added without caring about
data locality.

This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS,
IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS
to the hot portion of per-cpu data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: avoid atomic operations on sk->sk_rmem_alloc

TCP uses generic skb_set_owner_r() and sock_rfree()
for received packets, with socket lock being owned.

Switch to private versions, avoiding two atomic operations
per packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250320121604.3342831-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'nexthop-convert-rtm_-new-del-nexthop-to-per-netns-rtnl'

Kuniyuki Iwashima says:

====================
nexthop: Convert RTM_{NEW,DEL}NEXTHOP to per-netns RTNL.

Patches 1 - 5 move some validation for RTM_NEWNEXTHOP so that it can be
called without RTNL.

Patches 6 & 7 convert RTM_NEWNEXTHOP and RTM_DELNEXTHOP to per-netns RTNL.

Note that RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET are not touched in
this series.

rtm_get_nexthop() can be easily converted to RCU, but rtm_dump_nexthop()
needs more work due to the left-to-right rbtree walk, which looks prone
to node deletion and tree rotation without a retry mechanism.

v1: https://lore.kernel.org/netdev/20250318233240.53946-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20250319230743.65267-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Convert RTM_DELNEXTHOP to per-netns RTNL.

In rtm_del_nexthop(), only nexthop_find_by_id() and remove_nexthop()
require RTNL as they touch net->nexthop.rb_root.

Let's move RTNL down as rtnl_net_lock() before nexthop_find_by_id().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Convert RTM_NEWNEXTHOP to per-netns RTNL.

If we pass false to the rtnl_held param of lwtunnel_valid_encap_type(),
we can move RTNL down before rtm_to_nh_config_rtnl().

Let's use rtnl_net_lock() in rtm_new_nexthop().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Remove redundant group len check in nexthop_create_group().

The number of NHA_GROUP entries is guaranteed to be non-zero in
nh_check_attr_group().

Let's remove the redundant check in nexthop_create_group().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Check NLM_F_REPLACE and NHA_ID in rtm_new_nexthop().

nexthop_add() checks if NLM_F_REPLACE is specified without
non-zero NHA_ID, which does not require RTNL.

Let's move the check to rtm_new_nexthop().
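
The check itself only looks at the message flags and the parsed ID, which
is why it needs no lock. A minimal userspace sketch follows; the function
name check_replace_has_id() is hypothetical, and NLM_F_REPLACE is defined
locally with the value used by the netlink uapi header.

```c
#include <errno.h>
#include <stdint.h>

#define NLM_F_REPLACE 0x100 /* same value as in the netlink uapi header */

/* Hypothetical early check: a replace request must name the nexthop it
 * replaces, so NLM_F_REPLACE together with a zero ID is rejected. */
static int check_replace_has_id(uint16_t nlmsg_flags, uint32_t nh_id)
{
	if ((nlmsg_flags & NLM_F_REPLACE) && nh_id == 0)
		return -EINVAL;
	return 0;
}
```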

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Move NHA_OIF validation to rtm_to_nh_config_rtnl().

NHA_OIF needs to look up a device by __dev_get_by_index(),
which requires RTNL.

Let's move NHA_OIF validation to rtm_to_nh_config_rtnl().

Note that the preceding checks made the original !cfg->nh_fdb
check redundant.

  NHA_FDB is set           -> NHA_OIF cannot be set
  NHA_FDB is set but false -> NHA_OIF must be set
  NHA_FDB is not set       -> NHA_OIF must be set

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Split nh_check_attr_group().

We will push RTNL down to rtm_new_nexthop(), and then we
want to move non-RTNL operations out of the scope.

nh_check_attr_group() validates NHA_GROUP attributes, and
nexthop_find_by_id() and some validation require RTNL.

Let's factorise such parts as nh_check_attr_group_rtnl()
and call it from rtm_to_nh_config_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Move nlmsg_parse() in rtm_to_nh_config() to rtm_new_nexthop().

We will split rtm_to_nh_config() into non-RTNL and RTNL parts,
and then the latter also needs tb.

As a prep, let's move nlmsg_parse() to rtm_new_nexthop().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250319230743.65267-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix _DEVADD() and _DEVUPD() macros

ip6_rcv_core() is using:

	__IP6_ADD_STATS(net, idev,
			IPSTATS_MIB_NOECTPKTS +
				(ipv6_get_dsfield(hdr) & INET_ECN_MASK),
			max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));

This is currently evaluating both expressions twice.

Fix _DEVADD() and _DEVUPD() macros to evaluate their arguments once.
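
The general problem can be reproduced in a few lines of userspace C. The
macros ADD_BOTH_NAIVE and ADD_BOTH_ONCE and the helper counted() are
hypothetical stand-ins written for this sketch; they are not the kernel
macros, only the same pattern of updating a per-device and a per-net
counter from one invocation.

```c
#include <stddef.h>

/* Naive form: `field` and `val` each appear twice in the expansion, so
 * the argument expressions are evaluated twice when dev is non-NULL. */
#define ADD_BOTH_NAIVE(dev, net, field, val)		\
	do {						\
		if ((dev) != NULL)			\
			(dev)[(field)] += (val);	\
		(net)[(field)] += (val);		\
	} while (0)

/* Fixed form: capture each argument in a local before using it. */
#define ADD_BOTH_ONCE(dev, net, field, val)		\
	do {						\
		unsigned long *dev_ = (dev);		\
		size_t field_ = (field);		\
		unsigned long val_ = (val);		\
		if (dev_ != NULL)			\
			dev_[field_] += val_;		\
		(net)[field_] += val_;			\
	} while (0)

/* Number of times counted() has run. */
static unsigned int calls;

/* Stand-in for an argument expression whose evaluation we want to count. */
static unsigned long counted(unsigned long v)
{
	calls++;
	return v;
}
```

Passing counted(3) as the value runs it twice through the naive macro and
once through the fixed one, while both counters end up with the same sum.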

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250319212516.2385451-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-misc-enhancements-2025-03-19'

Tariq Toukan says:

====================
mlx5 misc enhancements 2025-03-19

This series introduces multiple small misc enhancements
from the team to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/1742392983-153050-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, Don't offload CT commit if it's the last action

For CT action with commit argument, it's usually followed by the
forward action, either to the output netdev or next chain. The default
behavior for software is to drop by setting action attribute to
TC_ACT_SHOT instead of TC_ACT_PIPE if it's the last action. But the driver
can't handle this, so block the offload in that case.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: CT: Filter legacy rules that are unrelated to nic

In nic mode CT setup where we do hairpin between the two
nics, both nics register to the same flow table (per zone),
and try to offload all rules on it.

Instead, filter the rules that originated from the relevant nic
(so only one side is offloaded for each nic).

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Update pfnum retrieval for devlink port attributes

Align mlx5 driver usage of 'pfnum' with the documentation clarification
introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes
description").

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: fw reset, check bridge accessibility at earlier stage

Currently, mlx5_is_reset_now_capable() checks whether the pci bridge is
accessible only on bridge hot plug capability check. If the pci bridge
is not accessible, reset now will fail regardless of bridge hotplug
capability. Move this check to function mlx5_is_reset_now_capable()
which, in such case, aborts the reset and does so in the request phase
instead of the reset now phase.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lag, use port selection tables when available

As queue affinity is being deprecated and will no longer be supported
in the future, always check for the presence of the port selection
namespace. When available, leverage it to distribute traffic
across the physical ports via steering, ensuring compatibility with
future NICs.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TX, Utilize WQ fragments edge for multi-packet WQEs

For simplicity reasons, the driver avoids crossing work queue fragment
boundaries within the same TX WQE (Work-Queue Element). Until today, as
the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not
known in advance, the driver pre-prepared contiguous memory for the
largest possible WQE. For this, when getting too close to the fragment
edge, having no room for the largest WQE possible, the driver was
filling the fragment remainder with NOP descriptors, aligning the next
descriptor to the beginning of the next fragment.

Generating and handling these NOPs wastes resources, like: CPU cycles,
work-queue entries fetched to the device, and PCI bandwidth.

In this patch, we replace this NOPs filling mechanism in the TX MPWQE
flow. Instead, we utilize the remaining entries of the fragment with a
TX MPWQE. If this room turns out to be too small, we simply open an
additional descriptor starting at the beginning of the next fragment.
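
A simplified model of the two strategies, for illustration only. The
struct, the function names and the unit ("entries") are assumptions made
for this sketch and do not mirror the driver's data structures:

```c
#include <stdint.h>

/* Hypothetical model of a fragmented work queue. */
struct wq_model {
	uint32_t frag_entries; /* entries per fragment */
	uint32_t max_wqe;      /* largest possible WQE, in entries */
};

/* Entries left between producer index 'pi' and the fragment edge. */
static uint32_t room_to_edge(const struct wq_model *wq, uint32_t pi)
{
	return wq->frag_entries - (pi % wq->frag_entries);
}

/* Old scheme: if the largest WQE does not fit, the remainder of the
 * fragment is filled with NOPs. Returns the number of NOP entries. */
static uint32_t nops_before_open(const struct wq_model *wq, uint32_t pi)
{
	uint32_t room = room_to_edge(wq, pi);

	return room < wq->max_wqe ? room : 0;
}

/* New scheme: open the MPWQE in place, capped by the room that is left.
 * No NOPs are generated; a further descriptor can be opened at the
 * start of the next fragment if this one fills up. */
static uint32_t mpwqe_capacity(const struct wq_model *wq, uint32_t pi)
{
	uint32_t room = room_to_edge(wq, pi);

	return room < wq->max_wqe ? room : wq->max_wqe;
}
```

With 64 entries per fragment and a 16-entry maximum WQE, a producer
index of 60 costs 4 NOP entries in the old scheme, while the new scheme
opens a 4-entry MPWQE there instead.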

Performance benchmark:
uperf test, single server against 3 clients.
TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write".
Bottleneck is in inbound PCI bandwidth (device POV).

+---------------+------------+------------+--------+
|               | Before     | After      |        |
+---------------+------------+------------+--------+
| BW            | 117.4 Gbps | 121.1 Gbps | +3.1%  |
+---------------+------------+------------+--------+
| tx_packets    | 15 M/sec   | 15.5 M/sec | +3.3%  |
+---------------+------------+------------+--------+
| tx_nops       | 3  M/sec   | 0          | -100%  |
+---------------+------------+------------+--------+

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dql: Fix dql->limit value when reset.

Executing dql_reset after setting a non-zero value for limit_min can
lead to an unreasonable situation where dql->limit is less than
dql->limit_min.

For instance, after setting
/sys/class/net/eth*/queues/tx-0/byte_queue_limits/limit_min,
an ifconfig down/up operation might cause the ethernet driver to call
netdev_tx_reset_queue, which in turn invokes dql_reset.

In this case, dql->limit is reset to 0 while dql->limit_min remains
non-zero, which is unexpected. The limit should always be
greater than or equal to limit_min.
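
A minimal sketch of the invariant, assuming a reset that restores the
limit to its configured floor. The struct and function below are a
hypothetical model with field names taken from this message, not the
kernel's struct dql:

```c
/* Hypothetical model of the relevant dql state. */
struct dql_model {
	unsigned int limit;      /* current byte limit */
	unsigned int limit_min;  /* floor configured via sysfs */
	unsigned int num_queued; /* bytes queued so far */
};

/* Reset the queue state while keeping limit >= limit_min. */
static void dql_model_reset(struct dql_model *q)
{
	q->num_queued = 0;
	q->limit = q->limit_min; /* resetting to 0 would violate the floor */
}
```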

Signed-off-by: Jing Su <jingsusu@didiglobal.com>
Link: https://patch.msgid.link/Z9qHD1s/NEuQBdgH@pilot-ThinkCentre-M930t-N000
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-mixed-select-polling-mode-for-tcp-ao-tests'

Dmitry Safonov says:

====================
selftests/net: Mixed select()+polling mode for TCP-AO tests

Should fix flaky tcp-ao/connect-deny-ipv6 test.

v1: https://lore.kernel.org/20250312-tcp-ao-selftests-polling-v1-0-72a642b855d5@gmail.com
====================

Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-0-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Drop timeout argument from test_client_verify()

It's always TEST_TIMEOUT_SEC, with an unjustified exception in the rst
test, where the timeout is paranoia-long rather than based on
requirements.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Delete timeout from test_connect_socket()

Unused: it's always either the default timeout or asynchronous
connect().

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print the testing side in unsigned-md5

As both client and server print the same test name on failure or pass,
add "[server]" so that it's more obvious from a log which side printed
"ok" or "not ok".

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Add mixed select()+polling mode to TCP-AO tests

Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. TEST_TIMEOUT_SEC is used
for operations that are expected to succeed in order for a test to pass.
It is usually not consumed and exists only to avoid an indefinite test
run if the operation doesn't complete.
TEST_RETRANSMIT_SEC exists for the tests that check operations that are
expected to fail/timeout. It is shorter as it is fully consumed, with an
expectation that if the operation didn't succeed during that period, it
will time out, and the related test that expects the timeout passes. The
actual operation failure is then cross-verified by other means like
counters checks.

The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
initial TCP timeout. So, in case the initial segment gets lost (quite
unlikely on local veth interface between two net namespaces, yet happens
in slow VMs), the retransmission never happens and as a result, the test
is not actually testing the functionality, which in the end fails
counters checks.

As I want tcp_ao selftests to be fast and finishing in a reasonable
amount of time on manual run, I didn't consider increasing
TEST_RETRANSMIT_SEC.

Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
to make the initial TCP timeout shorter. But as it's not a socket bpf
attached thing, but sock_ops (attaches to cgroups), the selftests would
have to use libbpf, which I wanted to avoid if not absolutely required.

Instead, use a mixed select() and counters polling mode with the longer
TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
actually not only allows losing segments and succeeding after
the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
the tests expecting timeout/failure pass faster.
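
A minimal sketch of such a mixed wait loop, for illustration only. The
names wait_fd_or_cond(), counter_nonzero() and the result enum are
hypothetical and are not the helpers used by the selftests. The idea is
to call select() with a short slice and check a counter-based stop
condition between slices, up to an overall timeout:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

enum wait_result { WAIT_FD_READY, WAIT_COND_MET, WAIT_TIMED_OUT, WAIT_ERROR };

typedef bool (*stop_cond_t)(void *arg);

/* Example stop condition: a counter the test expects to become non-zero. */
static bool counter_nonzero(void *arg)
{
	return *(const unsigned long *)arg != 0;
}

/* Milliseconds elapsed since 'start' on the monotonic clock. */
static long long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000LL +
	       (now.tv_nsec - start->tv_nsec) / 1000000;
}

/* Wait until 'fd' is readable, 'cond' reports true, or 'timeout_sec'
 * expires. 'slice_usec' must be below one second. */
static enum wait_result wait_fd_or_cond(int fd, stop_cond_t cond, void *arg,
					int timeout_sec, long slice_usec)
{
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		struct timeval tv = { .tv_sec = 0, .tv_usec = slice_usec };
		fd_set rfds;
		int ret;

		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		ret = select(fd + 1, &rfds, NULL, NULL, &tv);
		if (ret > 0)
			return WAIT_FD_READY;
		if (ret < 0 && errno != EINTR)
			return WAIT_ERROR;
		/* Slice expired: poll the counters before sleeping again. */
		if (cond && cond(arg))
			return WAIT_COND_MET;
		if (elapsed_ms(&start) >= timeout_sec * 1000LL)
			return WAIT_TIMED_OUT;
	}
}
```

A test that expects failure can return as soon as the relevant counter
moves, while a test that expects success still has the whole long
timeout available if a segment is lost and retransmitted.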

The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
"wrong snd id", which checks for no key on SYN-ACK for which there is no
counter in the kernel (see tcp_make_synack()). Yet it can be sped up
by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).

Fixes: ed9d09b309b1 ("selftests/net: Add a test for TCP-AO keys matching")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Fetch and check TCP-MD5 counters

There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests
that can benefit from checking the related counters, not only from
validating operations timeouts.

It also prepares the code for introduction of mixed select()+poll mode,
see the follow-up patches.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Provide tcp-ao counters comparison helper

Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and
test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they
are asserts, rather than just compare functions.

Provide test_cmp_counters() helper, that's going to be used to compare
ao_info and netns counters as a stop condition for polling the sockets.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print TCP flags in more common format

Before:
># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

After:
># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

For the history, I think the initial format was to emphasize the absence
of flags as well as their presence (!R meant no RST flag). But looking
again, it's just unreadable and hard to understand.
Make it the standard/expected one.
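
A rough sketch of the new style, for illustration only. The struct and
function are hypothetical and are not the selftest's code. It assumes
the same F, S, R, P, '.' ordering that the old format used, and that
'.' stands for ACK as in tcpdump output:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag set; not the selftest's own representation. */
struct tcp_flags_model { bool fin, syn, rst, psh, ack; };

/* Write only the flags that are set into 'buf' and return 'buf'.
 * 'len' must be at least 1; output is truncated to fit. */
static const char *format_tcp_flags(const struct tcp_flags_model *f,
				    char *buf, size_t len)
{
	size_t n = 0;

	if (f->fin && n + 1 < len)
		buf[n++] = 'F';
	if (f->syn && n + 1 < len)
		buf[n++] = 'S';
	if (f->rst && n + 1 < len)
		buf[n++] = 'R';
	if (f->psh && n + 1 < len)
		buf[n++] = 'P';
	if (f->ack && n + 1 < len)
		buf[n++] = '.';
	buf[n] = '\0';
	return buf;
}
```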

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: devlink: add missing board-serial-number

Add a missing attribute of board serial number.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250320085947.103419-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-add-missing-metadata-support-for-some-xdp-drvs'

Lorenzo Bianconi says:

====================
net: xdp: Add missing metadata support for some xdp drvs

Introduce missing metadata support for some xdp drivers setting metadata
size building the skb from xdp_buff.
Please note most of the drivers are just compile tested.

v1: https://lore.kernel.org/20250311-mvneta-xdp-meta-v1-0-36cf1c99790e@kernel.org
====================

Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-0-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: cpsw: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in cpsw/cpsw_new
drivers. ti cpsw and cpsw_new drivers set xdp headroom at least to
CPSW_HEADROOM_NA:

CPSW_HEADROOM_NA max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-7-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mana driver.
mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is
large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mediatek: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mediatek driver.
mtk_eth_soc driver sets xdp headroom to XDP_PACKET_HEADROOM so the
headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-5-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: octeontx2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in octeontx2 driver.
octeontx2 driver sets xdp headroom to OTX2_HEAD_ROOM

OTX2_HEAD_ROOM OTX2_ALIGN
OTX2_ALIGN 128

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-4-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netsec: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in netsec driver.
netsec driver sets xdp headroom to NETSEC_RXBUF_HEADROOM:

NETSEC_RXBUF_HEADROOM max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-3-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvpp2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvpp2 driver.
mvpp2 driver sets xdp headroom to:

MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM

where

MVPP2_MH_SIZE 2
MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compile tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-2-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvneta driver.
mvneta sets xdp headroom to:

MVNETA_MH_SIZE + MVNETA_SKB_HEADROOM

where

MVNETA_MH_SIZE 2
MVNETA_SKB_HEADROOM max(NET_SKB_PAD, XDP_PACKET_HEADROOM)

so the headroom is large enough to contain xdp_frame and xdp metadata.
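
A user-space model of the size bookkeeping, for illustration only. The
two structs and skb_model_from_xdp() are hypothetical stand-ins for
struct xdp_buff, struct sk_buff and the driver code. The sketch assumes
the metadata area sits directly in front of the packet data, and that a
data_meta pointer beyond data means metadata is not in use:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pointers kept in an XDP buffer. */
struct xdp_buff_model {
	uint8_t *data_hard_start; /* start of the headroom */
	uint8_t *data_meta;       /* start of metadata, ends at 'data' */
	uint8_t *data;            /* start of packet data */
	uint8_t *data_end;        /* end of packet data */
};

/* Hypothetical stand-in for the sizes recorded in the skb. */
struct skb_model {
	size_t headroom;
	size_t len;
	size_t meta_len;
};

/* Derive headroom, packet length and metadata length from the buffer. */
static void skb_model_from_xdp(struct skb_model *skb,
			       const struct xdp_buff_model *xdp)
{
	skb->headroom = (size_t)(xdp->data - xdp->data_hard_start);
	skb->len = (size_t)(xdp->data_end - xdp->data);
	/* Treat data_meta > data as "no metadata". */
	if (xdp->data_meta > xdp->data)
		skb->meta_len = 0;
	else
		skb->meta_len = (size_t)(xdp->data - xdp->data_meta);
}
```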

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-1-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tulip: avoid unused variable warning

There is an effort to achieve W=1 kernel builds without warnings.
As part of that effort Helge Deller highlighted the following warnings
in the tulip driver when compiling with W=1 and CONFIG_TULIP_MWI=n:

.../tulip_core.c: In function ‘tulip_init_one’:
.../tulip_core.c:1309:22: warning: variable ‘force_csr0’ set but not used

This patch addresses that problem using IS_ENABLED(). This approach has
the added benefit of reducing conditionally compiled code, and thus
increasing compile coverage, e.g. for allmodconfig builds which enable
CONFIG_TULIP_MWI.
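
A simplified user-space illustration of the pattern. DEMO_IS_ENABLED()
is a hypothetical stand-in that only handles options which are either
undefined or defined to 1. The kernel's IS_ENABLED() also covers
modular (=m) options. CONFIG_DEMO_ON, CONFIG_DEMO_OFF and
apply_override() are invented for this sketch:

```c
/* Expands to 1 if 'option' is defined to 1, and to 0 if it is undefined. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_ENABLED_3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_ENABLED_2(val) DEMO_IS_ENABLED_3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_ENABLED_2(option)

#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately left undefined. */

/* 'override' is referenced in every configuration, so the compiler sees
 * it as used. The disabled branch is still parsed and type-checked
 * before the optimizer drops it, unlike code hidden behind #ifdef. */
static int apply_override(int value, int override)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_OFF) && override)
		value = override;
	return value;
}
```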

Compile tested only.
No run-time effect intended.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318-tulip-w1-v3-1-a813fadd164d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'af_unix-clean-up-headers'

Kuniyuki Iwashima says:

====================
af_unix: Clean up headers.

AF_UNIX files include many unnecessary headers (netdevice.h and
rtnetlink.h, etc), and this series cleans them up.

Note that there are still some headers included indirectly and
modifying them triggers rebuild, which seems mostly inevitable. [0]

  $ python3 include_graph.py net/unix/garbage.c linux/rtnetlink.h linux/netdevice.h
  ...
  include/net/af_unix.h
  | include/linux/net.h
  | | include/linux/once.h
  | | include/linux/sockptr.h
  | | include/uapi/linux/net.h
  | include/net/sock.h
  | | include/linux/netdevice.h   <---
  ...
  | | include/net/dst.h
  | | | include/linux/rtnetlink.h <---

[0]: https://gist.github.com/q2ven/9c5897f11a493145829029c0bfb364d0
====================

Link: https://patch.msgid.link/20250318034934.86708-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Clean up #include under net/unix/.

net/unix/*.c include many unnecessary header files (rtnetlink.h,
netdevice.h, etc).

Let's clean them up.

af_unix.c:

  +uapi/linux/sockios.h   : Only exists under include/uapi
  +uapi/linux/termios.h   : Only exists under include/uapi

  -linux/freezer.h        : No longer use freezable_schedule_timeout()
  -linux/in.h             : No ipv4_is_XXX() etc
  -linux/module.h         : No longer support CONFIG_UNIX=m
  -linux/netdevice.h      : No dev used
  -linux/rtnetlink.h      : Not part of rtnetlink API
  -linux/signal.h         : signal_pending() is defined in sched/signal.h
  -linux/stat.h           : No struct stat used
  -net/checksum.h         : CHECKSUM_UNNECESSARY is defined in skbuff.h

diag.c:

  +linux/dcache.h         : struct dentry in sk_diag_dump_vfs()
  +linux/user_namespace.h : struct user_namespace in sk_diag_dump_uid()
  +uapi/linux/unix_diag.h : Only exists under include/uapi/

garbage.c:

  +linux/list.h           : struct unix_{vertex,edge}, etc
  +linux/workqueue.h      : DECLARE_WORK(unix_gc_work, ...)

  -linux/file.h           : No fget() etc
  -linux/kernel.h         : No cond_resched() etc
  -linux/netdevice.h      : No dev used
  -linux/proc_fs.h        : No procfs provided
  -linux/string.h         : No memcpy(), kmemdup(), etc

sysctl_net_unix.c:

  +linux/string.h         : kmemdup()
  +net/net_namespace.h    : struct net, net_eq()

  -linux/mm.h             : slab.h is enough

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250318034934.86708-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Explicitly include headers for non-pointer struct fields.

include/net/af_unix.h indirectly includes some definitions for structs.

Let's include such headers explicitly.

  linux/atomic.h   : scm_stat.nr_fds
  linux/net.h      : unix_sock.peer_wq
  linux/path.h     : unix_sock.path
  linux/spinlock.h : unix_sock.lock
  linux/wait.h     : unix_sock.peer_wake
  uapi/linux/un.h  : unix_address.name[]

linux/socket.h is removed as the structs there are not used directly,
and linux/un.h is clarified with uapi as un.h only exists under
include/uapi.

While at it, duplicate headers are removed from .c files.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250318034934.86708-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Move internal definitions to net/unix/.

net/af_unix.h is included by core and some LSMs, but most definitions
need not be.

Let's move struct unix_{vertex,edge} to net/unix/garbage.c and other
definitions to net/unix/af_unix.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Sort headers.

This is a prep patch to make the following changes cleaner.

No functional change intended.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-tcp_rto_min_us-and-tcp_delack_max_us-for-set-getsockopt'

Jason Xing says:

====================
support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt

Add set/getsockopt supports for TCP_RTO_MIN_US and TCP_DELACK_MAX_US.
====================

Link: https://patch.msgid.link/20250317120314.41404-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_DELACK_MAX_US for set/getsockopt use

Support adjusting/reading delayed ack max for socket level by using
set/getsockopt().

This option aligns with TCP_BPF_DELACK_MAX usage. Since the bpf option
was implemented before this patch, we need a standalone new option for
pure tcp set/getsockopt() use.

Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt()
happens to write one value to icsk_delack_max while icsk_delack_max is
being read.
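
A simplified user-space model of the access pattern, for illustration
only. The two macros are hypothetical reductions of the kernel's
READ_ONCE()/WRITE_ONCE() for plain scalar fields, and struct icsk_model
is invented for this sketch. They rely on the GNU __typeof__ extension:

```c
/* Volatile accesses make the compiler emit exactly one load or store
 * and keep it from fusing or re-reading the access. They are not
 * memory barriers. */
#define READ_ONCE_MODEL(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_MODEL(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the connection socket state. */
struct icsk_model {
	unsigned int delack_max;
};

/* Writer side, as a setsockopt() handler would do. */
static void model_set_delack_max(struct icsk_model *icsk, unsigned int val)
{
	WRITE_ONCE_MODEL(icsk->delack_max, val);
}

/* Reader side, which may run concurrently with the writer. */
static unsigned int model_get_delack_max(struct icsk_model *icsk)
{
	return READ_ONCE_MODEL(icsk->delack_max);
}
```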

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_RTO_MIN_US for set/getsockopt use

Support adjusting/reading RTO MIN for socket level by using set/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Since the bpf
option was implemented before this patch, we need a standalone new
option for pure tcp set/getsockopt() use.

When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. If the application
then calls setsockopt() with the TCP_RTO_MIN_US option and a valid
value, icsk_rto_min is overridden, stored in jiffies.

This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.
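
A hedged user-space sketch of how an application might use the option.
The helper names are hypothetical. The fallback value 45 for
TCP_RTO_MIN_US is an assumption for toolchains whose headers predate
the option and should be checked against include/uapi/linux/tcp.h.
Kernels without this patch are expected to reject the call:

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_RTO_MIN_US
#define TCP_RTO_MIN_US 45 /* assumed value, verify against uapi tcp.h */
#endif

/* Set the per-socket minimum RTO in microseconds.
 * Returns 0 on success or a negative errno value. */
static int set_rto_min_us(int fd, int usecs)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US,
		       &usecs, sizeof(usecs)) < 0)
		return -errno;
	return 0;
}

/* Read the per-socket minimum RTO back, in microseconds.
 * Returns 0 on success or a negative errno value. */
static int get_rto_min_us(int fd, int *usecs)
{
	socklen_t len = sizeof(*usecs);

	if (getsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US, usecs, &len) < 0)
		return -errno;
	return 0;
}
```

A caller would create a TCP socket, call set_rto_min_us(fd, 100000) and
treat a negative return as "not supported or value rejected".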

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-add-vxlan-to-the-same-hardware-domain-as-physical-bridge-ports'

Petr Machata says:

====================
mlxsw: Add VXLAN to the same hardware domain as physical bridge ports

Amit Cohen writes:

Packets which are trapped to CPU for forwarding in software data path
are handled according to driver marking of skb->offload_{,l3}_fwd_mark.
Packets which are marked as L2-forwarded in hardware, will not be flooded
by the bridge to bridge ports which are in the same hardware domain as the
ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some
scenarios this can result in remote VTEPs receiving duplicate packets.

To solve this packet duplication, add VXLAN bridge ports to the same
hardware domain as other bridge ports.

One complication is ARP suppression which requires the local VTEP to avoid
flooding ARP packets to remote VTEPs if the local VTEP is able to reply on
behalf of remote hosts. This is currently implemented by having the device
flood ARP packets in hardware and trapping them during VXLAN encapsulation,
but marking them with skb->offload_fwd_mark=1 so that the bridge will not
re-flood them to physical bridge ports.

The above scheme will break when VXLAN bridge ports are added to the same
hardware domain as physical bridge ports as ARP packets that cannot be
suppressed by the bridge will not be able to egress the VXLAN bridge ports
due to hardware domain filtering. This is solved by trapping ARP packets
when they enter the device and not marking them as being forwarded in
hardware.

Patch set overview:
Patch #1 sets hardware to trap ARP packets at layer 2
Patches #2-#4 are preparations for setting hardware domain of VXLAN
Patch #5 sets hardware domain of VXLAN
Patch #6 extends VXLAN flood test to verify that this set solves the
packet duplication
====================

Link: https://patch.msgid.link/cover.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: vxlan_bridge: Test flood with unresolved FDB entry

Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge ports

When hardware floods packets to bridge ports, but flooding to VXLAN bridge
port fails during encapsulation to one of the remote VTEPs, the packets are
trapped to CPU. In such case, the packets are marked with
skb->offload_fwd_mark, which means that packet was L2-forwarded in
hardware. Software data path repeats flooding, but packets which are
marked with skb->offload_fwd_mark will not be flooded by the bridge to
bridge ports which are in the same hardware domain as the ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some scenarios
(as mentioned above) this can result in remote VTEPs receiving duplicate
packets. The packets are first flooded by hardware and after an
encapsulation failure, they are flooded again to all remote VTEPs by
software.

Solve this by adding VXLAN bridge ports to the same hardware domain as
physical bridge ports, so then nbp_switchdev_allowed_egress() will return
false also for VXLAN, and packets will not be sent twice from VXLAN device.
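
A reduced model of the egress decision described above, for
illustration only. The structs and allowed_egress_model() are
hypothetical, and the real nbp_switchdev_allowed_egress() takes more
state into account. The sketch keeps only the rule that a packet
already L2-forwarded in hardware is not sent again to ports in the
ingress hardware domain:

```c
#include <stdbool.h>

/* Hypothetical per-packet state. */
struct pkt_model {
	bool offload_fwd_mark; /* already L2-forwarded by hardware */
	int src_hwdom;         /* hardware domain of the ingress port */
};

/* Hypothetical per-bridge-port state. */
struct port_model {
	int hwdom; /* hardware domain of this bridge port */
};

/* True if software may still forward the packet to port 'p'. */
static bool allowed_egress_model(const struct port_model *p,
				 const struct pkt_model *pkt)
{
	return !pkt->offload_fwd_mark || pkt->src_hwdom != p->hwdom;
}
```

In this model a VXLAN port left outside the physical ports' domain
still receives the marked packet from software, which is the duplicate.
Once it shares the domain, the marked packet is filtered, and an
unmarked packet is still forwarded.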

switchdev_bridge_port_offload() should get vxlan_dev not as const, so
some changes are required. Call switchdev API from
mlxsw_sp_bridge_vxlan_{join,leave}() which handle offload configurations.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/
Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Move mlxsw_sp_bridge_vxlan_join()

The next patch will call __mlxsw_sp_bridge_vxlan_leave() from
mlxsw_sp_bridge_vxlan_join() as part of the error flow. Move
mlxsw_sp_bridge_vxlan_join() so that it can call
__mlxsw_sp_bridge_vxlan_leave().

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Add an internal API for VXLAN leave

There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.

This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.

Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().

No functional changes intended.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum: Call mlxsw_sp_bridge_vxlan_{join, leave}() for VLAN-aware bridge

mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the
bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a
VLAN, but at this point no VLANs are configured on the VxLAN device. This
means that we can call the APIs, but there is no point to do that, as they
do not configure anything in such cases.

Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.

Align mlxsw_sp_bridge_vxlan_leave() with mlxsw_sp_bridge_vxlan_join() so
that it is called only when the VXLAN device is up: move the check so that
it is done before calling mlxsw_sp_bridge_vxlan_{join,leave}(). This does
not change the existing behavior, as there is a similar check inside
mlxsw_sp_bridge_vxlan_leave().

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Trap ARP packets at layer 2 instead of layer 3

Next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software when
they were already forwarded by hardware.

ARP packets are not flooded by hardware to VXLAN, so software should handle
such flooding. Once the hardware domain of the VXLAN device is changed, ARP
packets which are trapped and marked with offload_fwd_mark will not be
flooded to VXLAN in software either, which will break VXLAN traffic.

To prevent this breakage, trap ARP packets at layer 2 and don't mark them
as L2-forwarded in hardware. Flooding of ARP packets is then done only in
software, and VXLAN will send ARP packets.

Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are
trapped when they enter the device.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce per netns packet chains

Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.

Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.

While at it, drop the global ptype_all list: no in-kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).

Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing to the per netns lists).
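
A rough userspace model of the idea, not the kernel implementation.
Every name below (struct tap, struct net_model, tap_add(), deliver()) is
hypothetical. Each namespace owns its own tap chain, so delivering a
packet in one namespace never walks taps registered in another.

```c
#include <stddef.h>

/* Hypothetical model of a tap that is not bound to any device. */
struct tap {
	struct tap *next;
	unsigned long seen;	/* packets delivered to this tap */
};

/* Hypothetical stand-in for a network namespace. */
struct net_model {
	struct tap *ptype_all;	/* per-namespace chain, replaces a global list */
};

/* Link a tap into the chain of the namespace that owns it. */
static void tap_add(struct net_model *net, struct tap *t)
{
	t->next = net->ptype_all;
	net->ptype_all = t;
}

/* Deliver one packet inside 'net'; returns the number of taps visited. */
static unsigned int deliver(struct net_model *net)
{
	unsigned int visited = 0;
	struct tap *t;

	for (t = net->ptype_all; t; t = t->next) {
		t->seen++;
		visited++;
	}
	return visited;
}
```

In this model a tap added to one namespace adds no work to deliver() in
any other namespace, which is the isolation the patch is after.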

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tty: caif: removed unused function debugfs_tx()

Remove debugfs_tx() which was added when the caif driver was added in
commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
but it has never been used.

Flagged by LLVM 19.1.7 W=1 builds.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: Drop unused of_gpio.h

of_gpio.h is deprecated. Since no of_gpio_x API is used, drop the
unused of_gpio.h. While here, also drop gpio.h and gpio/consumer.h from
drivers that do not use them.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: transition to the faux device interface

The net fixed phy driver does not require the creation of a platform
device. Originally, this approach was chosen for simplicity when the
driver was first implemented.

With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the device to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will get rid of the fake platform device.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-cleanups-2025-03-19'

Tariq Toukan says:

====================
mlx5 cleanups 2025-03-19

This series contains small cleanups to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Always select CONFIG_PAGE_POOL_STATS

Always select PAGE_POOL_STATS in the mlx5 Eth driver.
Clean up the corresponding #ifdefs.

Page pool stats are essential to monitor and analyze RX performance.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use right API to free bitmap memory

Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes the following memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove NULL check before dev_{put, hold}

Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.
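
A small userspace sketch of why the check is redundant, assuming (as the
warning states) that the put/hold helpers accept NULL themselves. The
types and functions here (struct dev_model, model_hold(), model_put(),
release_checked(), release_plain()) are hypothetical stand-ins, not the
kernel helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted net device. */
struct dev_model {
	int refcnt;
};

/* Like the helpers named in the warning, these tolerate NULL. */
static void model_hold(struct dev_model *dev)
{
	if (dev)
		dev->refcnt++;
}

static void model_put(struct dev_model *dev)
{
	if (dev)
		dev->refcnt--;
}

/* Before: the caller repeats a check the helper already makes. */
static void release_checked(struct dev_model *dev)
{
	if (dev)
		model_put(dev);
}

/* After: call the helper directly; the behavior is the same. */
static void release_plain(struct dev_model *dev)
{
	model_put(dev);
}
```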

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: Remove unused function pointer from phylink structure

From what I can tell the .get_fixed_state pointer in the phylink structure
hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate
phylink_fixed_state_cb()"). Since I can't find any users for it we might
as well just drop the pointer.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Eliminate redundant assignment

The assignment of zero to udph->check is unnecessary as it is
immediately overwritten in the subsequent line. Remove the redundant
assignment.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Call xpcs_config_eee_mult_fact() only when xpcs is present

Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs.

Don't call xpcs_config_eee_mult_fact() in this case, as this causes a
crash at init:

Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write

[...]

Call trace:
  xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c
  stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244
  stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc
  socfpga_dwmac_probe from platform_probe+0x5c/0xb0
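
The shape of the fix can be sketched in userspace as a guard around the
xpcs-only step. This is an illustration under assumptions, not the
stmmac code: struct xpcs_model, struct priv_model,
xpcs_model_config_mult_fact() and pcs_setup_model() are hypothetical
names.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real stmmac or xpcs types. */
struct xpcs_model {
	int mult_fact;
};

struct priv_model {
	struct xpcs_model *xpcs;	/* NULL when another PCS is used */
};

/* Dereferences 'xpcs', so it must never be called with NULL. */
static void xpcs_model_config_mult_fact(struct xpcs_model *xpcs, int fact)
{
	xpcs->mult_fact = fact;
}

/* Returns 1 if the factor was programmed, 0 if the step was skipped. */
static int pcs_setup_model(struct priv_model *priv, int fact)
{
	if (!priv->xpcs)
		return 0;	/* no xpcs present: skip the xpcs-only step */

	xpcs_model_config_mult_fact(priv->xpcs, fact);
	return 1;
}
```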

Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_ctx: Don't assume indirection table is present

The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs/kcm: Fix typo "BFP"

'BFP' should be 'BPF'.

Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8169-enable-more-devices-aspm-support'

ChunHao Lin says:

====================
r8169: enable more devices ASPM support

This series of patches enables ASPM support on more devices.
It also fixes an issue where RTL8126 cannot enter the L1 substate when
ASPM is enabled.
====================

Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable RTL8126 ZRX-DC timeout

Disable it because it does not meet the ZRX-DC specification. If it is
enabled, the device will exit the L1 substate every 100ms. Disabling it
saves more power in the L1 substate.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: enable RTL8168H/RTL8168EP/RTL8168FP ASPM support

This patch enables RTL8168H/RTL8168EP/RTL8168FP ASPM support on
the platforms that have been tested with ASPM enabled.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: strparser: Fix a typo

The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.

Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.

This typo was caught by AI during code review.

Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: fix the path of example code and example commands for device memory TCP

This updates the old path and fixes the description of unavailable options.

Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp/dccp: Remove inet_connection_sock_af_ops.addr2sockaddr().

inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all
in the git era.

$ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1)

Let's remove it.

Note that there was a 4 bytes hole after sockaddr_len and now it's
6 bytes, so the binary layout is not changed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: net: update proc_net_pktgen (add more imix_weights test cases)

Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>