www.infradead.org Git - users/hch/misc.git/log
7 weeks ago tc-tests: Update tc police action tests for tc buffer size rounding fixes.
Jonathan Lennox [Wed, 12 Mar 2025 17:48:04 +0000 (17:48 +0000)]
tc-tests: Update tc police action tests for tc buffer size rounding fixes.

Before tc's recent change to fix rounding errors, several tests which
specified a burst size of "1m" would translate back to being 1048574
bytes (2b less than 1Mb).  sprint_size prints this as "1024Kb".

With the tc fix, the burst size is instead correctly reported as
1048576 bytes (precisely 1Mb), which sprint_size prints as "1Mb".

This updates the expected output in the tests' matchPattern values
to accept either the old or the new output.

Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Link: https://patch.msgid.link/20250312174804.313107-1-jonathan.lennox@8x8.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
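
To illustrate the tc-tests change above, here is a minimal Python sketch. It is not iproute2's sprint_size code: it assumes a formatter that divides by 1024 until the value falls below 1024 and then rounds for display, which reproduces the two outputs quoted in the commit message. `BURST_PATTERN` is a hypothetical stand-in for a matchPattern fragment that accepts either output.

```python
import re


def sprint_size_model(size_bytes):
    """Simplified model of a size formatter (an assumption, not iproute2 code)."""
    suffixes = ["", "K", "M", "G"]
    value = float(size_bytes)
    index = 0
    # Scale down by 1024 while the value still needs a larger unit.
    while value >= 1024 and index < len(suffixes) - 1:
        value /= 1024
        index += 1
    # Rounding happens only at display time, so 1023.998 shows as "1024".
    return f"{value:.0f}{suffixes[index]}b"


# Hypothetical pattern fragment accepting the old and the new output.
BURST_PATTERN = re.compile(r"burst (1024Kb|1Mb)")

old_output = "burst " + sprint_size_model(1048574)
new_output = "burst " + sprint_size_model(1048576)
```

Under this model a burst that is 2 bytes short of 1Mb stays in the "Kb" range and is rounded up to 1024 when printed, which is why the tests need to accept both spellings.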
7 weeks ago net: stmmac: dwmac-rk: Provide FIFO sizes for DWMAC 1000
Chen-Yu Tsai [Wed, 12 Mar 2025 16:34:26 +0000 (00:34 +0800)]
net: stmmac: dwmac-rk: Provide FIFO sizes for DWMAC 1000

The DWMAC 1000 DMA capabilities register does not provide actual
FIFO sizes, nor does the driver really care. If they are not
provided via some other means, the driver will work fine, only
disallowing changing the MTU setting.

Provide the FIFO sizes through the driver's platform data to enable
MTU changes. The FIFO sizes are confirmed to be the same across RK3288,
RK3328, RK3399 and PX30, based on their respective manuals. It is
likely that Rockchip synthesized their DWMAC 1000 with the same
parameters on all their chips that have it.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250312163426.2178314-1-wens@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'net-mlx5-hw-steering-cleanups'
Paolo Abeni [Wed, 19 Mar 2025 17:17:17 +0000 (18:17 +0100)]
Merge branch 'net-mlx5-hw-steering-cleanups'

Tariq Toukan says:

====================
net/mlx5: HW Steering cleanups

This short series by Yevgeny contains several small HW Steering cleanups:

- Patch 1: removing unused FW commands
- Patch 2: using list_move() instead of list_del/add
- Patch 3: printing the unsupported combination of match fields
====================

Link: https://patch.msgid.link/1741780194-137519-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: HWS, log the unsupported mask in definer
Yevgeny Kliteynik [Wed, 12 Mar 2025 11:49:54 +0000 (13:49 +0200)]
net/mlx5: HWS, log the unsupported mask in definer

If a user requested to match on an unsupported combination of fields,
print the unsupported combination in the error message.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: HWS, use list_move() instead of del/add
Yevgeny Kliteynik [Wed, 12 Mar 2025 11:49:53 +0000 (13:49 +0200)]
net/mlx5: HWS, use list_move() instead of del/add

Wherever applicable, use list_move function instead of list_del + list_add.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
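
The list_move() cleanup above can be pictured with a small Python model of a circular doubly linked list. This is a sketch under stated assumptions, not the kernel's list.h: the class `ListHead` and the helper `names()` are hypothetical, and `list_del()` here re-initialises the removed entry rather than poisoning its pointers.

```python
class ListHead:
    """Node of a circular doubly linked list; an empty head points at itself."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert new directly after head."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink entry from whatever list it is on."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = entry.next = entry


def list_move(entry, head):
    """Single helper replacing the open-coded list_del() + list_add() pair."""
    list_del(entry)
    list_add(entry, head)


def names(head):
    """Collect entry names in list order (illustration helper)."""
    out = []
    node = head.next
    while node is not head:
        out.append(node.name)
        node = node.next
    return out


src, dst = ListHead("src"), ListHead("dst")
x, y = ListHead("x"), ListHead("y")
list_add(x, src)
list_add(y, src)
list_move(x, dst)
```

The behavior is the same as the two-call sequence; the single helper only states the intent more directly.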
7 weeks ago net/mlx5: HWS, remove unused code for alias flow tables
Yevgeny Kliteynik [Wed, 12 Mar 2025 11:49:52 +0000 (13:49 +0200)]
net/mlx5: HWS, remove unused code for alias flow tables

Alias flow tables are not in use by HWS - remove the unused code.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'net-stmmac-deprecate-snps-en-tx-lpi-clockgating-property'
Paolo Abeni [Wed, 19 Mar 2025 17:06:48 +0000 (18:06 +0100)]
Merge branch 'net-stmmac-deprecate-snps-en-tx-lpi-clockgating-property'

Russell King says:

====================
net: stmmac: deprecate "snps,en-tx-lpi-clockgating" property

This series deprecates the "snps,en-tx-lpi-clockgating" property for
stmmac.

MII Transmit clock gating, where the MAC hardware supports gating this
clock, is a function of the connected PHY capabilities, which it
reports through its status register.

GMAC versions that support transmit clock gating twiddle the LPITCSE
bit accordingly in the LPI control/status register, which is handled
by the GMAC core specific code.

So, "snps,en-tx-lpi-clockgating" is not something that is a GMAC property,
but is a work-around for phylib not providing an interface to determine
whether the PHY allows the transmit clock to be disabled.

This series converts the two SoCs that make use of this property (which,
I hasten to add, is set in the SoC code) to use the PHY capability bit
instead of a DT property, then removes the DT property from the .dtsi,
deprecates it in the snps,dwmac binding, and finally in the stmmac code.

I am expecting some discussion on how to merge this, as I think the
order in which these changes are made is important - we don't want to
deprecate the old way until the new code has landed.
====================

Link: https://patch.msgid.link/Z9FVHEf3uUqtKzyt@shell.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: deprecate "snps,en-tx-lpi-clockgating" property
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:56 +0000 (09:34 +0000)]
net: stmmac: deprecate "snps,en-tx-lpi-clockgating" property

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

Therefore, snps,en-tx-lpi-clockgating is technically incorrect, and
this commit adds a warning should a DT be encountered with the property
present.

However, we keep backwards compatibility.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/E1tsIUK-005vGk-H7@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago dt-bindings: deprecate "snps,en-tx-lpi-clockgating" property
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:51 +0000 (09:34 +0000)]
dt-bindings: deprecate "snps,en-tx-lpi-clockgating" property

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

Therefore, snps,en-tx-lpi-clockgating is technically incorrect, so this
commit deprecates the property in the binding.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIUF-005vGd-C5@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago ARM: dts: stm32: remove "snps,en-tx-lpi-clockgating" property
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:46 +0000 (09:34 +0000)]
ARM: dts: stm32: remove "snps,en-tx-lpi-clockgating" property

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

As commit "net: stmmac: stm32: use PHY capability for TX clock stop"
adds the flag to use the PHY capability, remove the DT property that is
now unnecessary.

Cc: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIUA-005vGX-8A@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago riscv: dts: starfive: remove "snps,en-tx-lpi-clockgating" property
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:41 +0000 (09:34 +0000)]
riscv: dts: starfive: remove "snps,en-tx-lpi-clockgating" property

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

As commit "net: stmmac: starfive: use PHY capability for TX clock stop"
adds the flag to use the PHY capability, remove the DT property that is
now unnecessary.

Cc: Samin Guo <samin.guo@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIU5-005vGR-4c@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: stm32: use PHY capability for TX clock stop
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:36 +0000 (09:34 +0000)]
net: stmmac: stm32: use PHY capability for TX clock stop

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

Add the flag to allow the stmmac core to use the PHY capability.

Cc: Christophe Roullier <christophe.roullier@st.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIU0-005vGL-17@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: starfive: use PHY capability for TX clock stop
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:30 +0000 (09:34 +0000)]
net: stmmac: starfive: use PHY capability for TX clock stop

Whether the MII transmit clock can be stopped is primarily a property
of the PHY (there is a capability bit that should be checked first.)
Whether the MAC is capable of stopping the transmit clock is a separate
issue, but this is already handled by the core DesignWare MAC code.

Add the flag to allow the stmmac core to use the PHY capability.

Cc: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsITu-005vGF-TM@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: allow platforms to use PHY tx clock stop capability
Russell King (Oracle) [Wed, 12 Mar 2025 09:34:25 +0000 (09:34 +0000)]
net: stmmac: allow platforms to use PHY tx clock stop capability

Allow platform glue to instruct stmmac to make use of the PHY transmit
clock stop capability when deciding whether to allow the transmit clock
from the DWMAC core to be stopped.

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsITp-005vG9-Px@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
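
The decision described in the TX clock stop commits above can be sketched as a small Python model. This is only an illustration under assumptions: the function name and its three boolean parameters are hypothetical, and the real stmmac code may combine these inputs differently. It assumes that a platform opting in relies on the PHY capability bit, while other platforms keep the legacy DT property behavior for backwards compatibility.

```python
def tx_clock_may_stop(use_phy_capability, phy_allows_clock_stop, legacy_dt_property):
    """Hypothetical model of whether the MII transmit clock may be stopped.

    use_phy_capability:    platform glue asked the core to trust the PHY
    phy_allows_clock_stop: capability bit reported by the PHY
    legacy_dt_property:    deprecated "snps,en-tx-lpi-clockgating" is present
    """
    if use_phy_capability:
        # New path: the PHY decides, the DT property is not consulted.
        return phy_allows_clock_stop
    # Old path kept for backwards compatibility with existing device trees.
    return legacy_dt_property


converted_platform = tx_clock_may_stop(True, True, False)
phy_refuses = tx_clock_may_stop(True, False, True)
old_device_tree = tx_clock_may_stop(False, False, True)
```

The second case shows the point of the series in this model: once the platform uses the PHY capability, a stale DT property no longer forces clock gating on a PHY that does not allow it.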
7 weeks ago Merge tag 'ieee802154-for-net-next-2025-03-10' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Wed, 19 Mar 2025 16:46:33 +0000 (17:46 +0100)]
Merge tag 'ieee802154-for-net-next-2025-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2025-03-10

An update from ieee802154 for your *net-next* tree:

Andy Shevchenko reworked the ca8210 driver to use the gpiod API and fixed
a few problems of the driver along the way.

* tag 'ieee802154-for-net-next-2025-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
  dt-bindings: ieee802154: ca8210: Update polarity of the reset pin
  ieee802154: ca8210: Switch to using gpiod API
  ieee802154: ca8210: Get platform data via dev_get_platdata()
  ieee802154: ca8210: Use proper setters and getters for bitwise types
====================

Link: https://patch.msgid.link/20250310185752.2683890-1-stefan@datenfreihafen.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'net-stmmac-remove-unnecessary-of_get_phy_mode-calls'
Paolo Abeni [Tue, 18 Mar 2025 14:53:18 +0000 (15:53 +0100)]
Merge branch 'net-stmmac-remove-unnecessary-of_get_phy_mode-calls'

Russell King says:

====================
net: stmmac: remove unnecessary of_get_phy_mode() calls

This series removes unnecessary of_get_phy_mode() calls from the stmmac
glue drivers. stmmac_probe_config_dt() / devm_stmmac_probe_config_dt()
already gets the interface mode using device_get_phy_mode() and stores
it in plat_dat->phy_interface.

Therefore, glue drivers using of_get_phy_mode() are just duplicating
the work that has already been done.

This series adjusts the glue drivers to remove their usage of
of_get_phy_mode().
====================

Link: https://patch.msgid.link/Z9FQjQZb0IMaQJ9H@shell.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
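
The duplication this series removes can be sketched in Python as follows. All names are hypothetical stand-ins: `Firmware.get_phy_mode()` plays the role of the firmware query, `probe_config()` plays the role of devm_stmmac_probe_config_dt(), and `glue_probe()` is an imagined glue driver. The sketch only shows that the glue code can read the stored value instead of querying firmware a second time.

```python
from dataclasses import dataclass


class Firmware:
    """Hypothetical firmware node that counts how often the PHY mode is read."""

    def __init__(self, phy_mode):
        self.phy_mode = phy_mode
        self.lookups = 0

    def get_phy_mode(self):
        self.lookups += 1
        return self.phy_mode


@dataclass
class PlatData:
    phy_interface: str


def probe_config(fw):
    """Stand-in for the common probe helper: reads the mode once and stores it."""
    return PlatData(phy_interface=fw.get_phy_mode())


def glue_probe(fw):
    """Glue driver after the cleanup: reuse plat_dat.phy_interface."""
    plat_dat = probe_config(fw)
    glue_phy_mode = plat_dat.phy_interface  # no second firmware lookup
    return glue_phy_mode


fw = Firmware("rgmii-id")
mode = glue_probe(fw)
```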
7 weeks ago net: stmmac: sunxi: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:21:07 +0000 (09:21 +0000)]
net: stmmac: sunxi: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Set gmac->interface from plat_dat->phy_interface.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGx-005v0F-Ev@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: sun8i: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:21:02 +0000 (09:21 +0000)]
net: stmmac: sun8i: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

sun8i was using of_get_phy_mode() to set plat_dat->mac_interface, which
defaults to plat_dat->phy_interface when the mac-mode DT property is
not present. As nothing in arch/*/boot/dts sets the mac-mode property,
it is highly likely that these two will be identical, and thus there
is no need for this glue driver to set plat_dat->mac_interface.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGs-005v09-CD@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: sti: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:57 +0000 (09:20 +0000)]
net: stmmac: sti: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Pass plat_dat into sti_dwmac_parse_data(), and set dwmac->interface
from plat_dat->phy_interface.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGn-005v02-7G@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: rk: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:52 +0000 (09:20 +0000)]
net: stmmac: rk: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Set bsp_priv->phy_iface from plat->phy_interface.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGi-005uzx-3p@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: meson8b: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:47 +0000 (09:20 +0000)]
net: stmmac: meson8b: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Set dwmac->phy_mode from plat_dat->phy_interface.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGd-005uzr-0C@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: ipq806x: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:41 +0000 (09:20 +0000)]
net: stmmac: ipq806x: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Pass plat_dat into ipq806x_gmac_of_parse(), and set gmac->phy_mode from
plat_dat->phy_interface.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGX-005uzl-TQ@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: anarion: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:36 +0000 (09:20 +0000)]
net: stmmac: anarion: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Rearrange the initialisation order so we can pass plat_dat into
anarion_config_dt(), thereby providing plat_dat->phy_interface as
necessary there.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGS-005uzf-QE@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: mediatek: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:31 +0000 (09:20 +0000)]
net: stmmac: mediatek: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it in platform code.

Initialise priv_plat->phy_mode from plat->phy_interface
in mediatek_dwmac_common_data().

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGN-005uzZ-NG@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: stmmac: qcom-ethqos: remove of_get_phy_mode()
Russell King (Oracle) [Wed, 12 Mar 2025 09:20:26 +0000 (09:20 +0000)]
net: stmmac: qcom-ethqos: remove of_get_phy_mode()

devm_stmmac_probe_config_dt() already gets the PHY mode from firmware,
which is stored in plat_dat->phy_interface. Therefore, we don't need to
get it a second time in qcom_ethqos_probe(). Use
plat_dat->phy_interface to initialise ethqos->phy_mode.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tsIGI-005uzT-KB@rmk-PC.armlinux.org.uk
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago tcp: cache RTAX_QUICKACK metric in a hot cache line
Eric Dumazet [Wed, 12 Mar 2025 08:39:07 +0000 (08:39 +0000)]
tcp: cache RTAX_QUICKACK metric in a hot cache line

tcp_in_quickack_mode() is called from input path for small packets.

It calls __sk_dst_get() which reads sk->sk_dst_cache which has been
put in sock_read_tx group (for good reasons).

Then dst_metric(dst, RTAX_QUICKACK) also incurs extra cache line misses.

Cache RTAX_QUICKACK in icsk->icsk_ack.dst_quick_ack to no longer pull
these cache lines in cases where a delayed ACK is scheduled.

After this patch, the TCP receive path no longer accesses the
sock_read_tx group.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250312083907.1931644-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'inet-frags-fully-use-rcu'
Paolo Abeni [Tue, 18 Mar 2025 12:18:37 +0000 (13:18 +0100)]
Merge branch 'inet-frags-fully-use-rcu'

Eric Dumazet says:

====================
inet: frags: fully use RCU

While inet reassembly uses RCU, it is acquiring/releasing
a refcount on struct inet_frag_queue in fast path,
for no good reason.

This was mentioned in one patch changelog seven years ago :/

This series is removing these refcount changes, by extending
RCU sections.
====================

Link: https://patch.msgid.link/20250312082250.1803501-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago inet: frags: save a pair of atomic operations in reassembly
Eric Dumazet [Wed, 12 Mar 2025 08:22:50 +0000 (08:22 +0000)]
inet: frags: save a pair of atomic operations in reassembly

As mentioned in commit 648700f76b03 ("inet: frags:
use rhashtables for reassembly units"):

  A followup patch will even remove the refcount hold/release
  left from prior implementation and save a couple of atomic
  operations.

This patch implements this idea, seven years later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250312082250.1803501-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago inet: frags: change inet_frag_kill() to defer refcount updates
Eric Dumazet [Wed, 12 Mar 2025 08:22:49 +0000 (08:22 +0000)]
inet: frags: change inet_frag_kill() to defer refcount updates

In the following patch, we no longer assume inet_frag_kill()
callers own a reference.

Consuming two refcounts from inet_frag_kill() would lead to a use-after-free (UAF).

Propagate the pointer to the refs that will be consumed later
by the final inet_frag_putn() call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250312082250.1803501-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
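
The deferral described in the inet_frag_kill() commit above can be sketched in Python. This is a hypothetical model, not the kernel code: `Queue`, `frag_kill()` and `frag_putn()` are illustrative names, a one-element list stands in for the `int *refs` pointer, and the assumption that exactly two references (timer and hash table) become droppable is made only for the example.

```python
class Queue:
    """Hypothetical fragment queue holding two references: timer and hash table."""

    def __init__(self):
        self.refcnt = 2
        self.complete = False
        self.destroyed = False


def frag_putn(q, refs):
    """Drop refs references at once; destroy the queue when none remain."""
    if refs == 0:
        return
    q.refcnt -= refs
    if q.refcnt == 0:
        q.destroyed = True


def frag_kill(q, refs):
    """Mark the queue dead, but only record the references to drop later."""
    if q.complete:
        return
    q.complete = True
    refs[0] += 1  # reference that was held by the timer
    refs[0] += 1  # reference that was held by the hash table


q = Queue()
refs = [0]
frag_kill(q, refs)
# The caller holds no reference of its own, yet can still touch q safely
# here because nothing has been released.
still_alive_after_kill = not q.destroyed
frag_putn(q, refs[0])  # final release once the caller is done with q
```

Had `frag_kill()` dropped both references itself, the queue in this model would already be destroyed while the caller was still using it, which is the use-after-free the commit message warns about.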
7 weeks ago ipv4: frags: remove ipq_put()
Eric Dumazet [Wed, 12 Mar 2025 08:22:48 +0000 (08:22 +0000)]
ipv4: frags: remove ipq_put()

Replace ipq_put() with inet_frag_putn()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250312082250.1803501-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago inet: frags: add inet_frag_putn() helper
Eric Dumazet [Wed, 12 Mar 2025 08:22:47 +0000 (08:22 +0000)]
inet: frags: add inet_frag_putn() helper

inet_frag_putn() can release multiple references
in one step.

Use it in inet_frags_free_cb().

Replace inet_frag_put(X) with inet_frag_putn(X, 1)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250312082250.1803501-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
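
A minimal Python sketch of the helper's idea follows. It is an assumption-laden model rather than the kernel implementation: `FragQueue`, `frag_putn()` and `frag_put()` are hypothetical names, and a plain integer replaces the kernel's atomic refcount.

```python
class FragQueue:
    """Hypothetical reference-counted object."""

    def __init__(self, refcnt):
        self.refcnt = refcnt
        self.destroyed = False


def frag_putn(q, refs):
    """Release refs references in one step; destroy the queue at zero."""
    if refs == 0:
        return
    assert 0 < refs <= q.refcnt, "refcount underflow"
    q.refcnt -= refs
    if q.refcnt == 0:
        q.destroyed = True


def frag_put(q):
    """The old single-reference release, expressed through the new helper."""
    frag_putn(q, 1)


q = FragQueue(3)
frag_put(q)       # same effect as frag_putn(q, 1)
frag_putn(q, 2)   # releases the remaining two references together
```

In the kernel the benefit of releasing several references together is fewer atomic operations; the plain integer used here cannot show that cost, only the bookkeeping.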
7 weeks ago net: skbuff: Remove unused skb_add_data()
Yue Haibing [Wed, 12 Mar 2025 06:34:50 +0000 (14:34 +0800)]
net: skbuff: Remove unused skb_add_data()

Since commit a4ea4c477619 ("rxrpc: Don't use a ring buffer for call Tx
queue") this function is not used anymore.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312063450.183652-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge tag 'linux-can-next-for-6.15-20250314' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Tue, 18 Mar 2025 11:39:57 +0000 (12:39 +0100)]
Merge tag 'linux-can-next-for-6.15-20250314' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-03-14

This is a pull request of 4 patches for net-next/main.

The first 2 patches by Dimitri Fedrau add CAN transceiver support
to the flexcan driver.

Frank Li's patch adds i.MX94 support to the flexcan device tree
bindings.

The last patch is by Davide Caratti and adds a protocol counter for
AF_CAN sockets.

linux-can-next-for-6.15-20250314

* tag 'linux-can-next-for-6.15-20250314' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: add protocol counter for AF_CAN sockets
  dt-bindings: can: fsl,flexcan: add i.MX94 support
  can: flexcan: add transceiver capabilities
  dt-bindings: can: fsl,flexcan: add transceiver capabilities
====================

Link: https://patch.msgid.link/20250314132327.2905693-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge
Paolo Abeni [Tue, 18 Mar 2025 11:10:20 +0000 (12:10 +0100)]
Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - drop batadv_priv_debug_log struct, by Sven Eckelmann

 - adopt netdev_hold() / netdev_put(), by Eric Dumazet

 - add support for jumbo frames, by Sven Eckelmann

 - use consistent name for mesh interface, by Sven Eckelmann

 - cleanup B.A.T.M.A.N. IV OGM aggregation handling,
   by Sven Eckelmann (4 patches)

 - add missing newlines for log macros, by Sven Eckelmann

* tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge:
  batman-adv: add missing newlines for log macros
  batman-adv: Limit aggregation size to outgoing MTU
  batman-adv: Use actual packet count for aggregated packets
  batman-adv: Switch to bitmap helper for aggregation handling
  batman-adv: Limit number of aggregated packets directly
  batman-adv: Use consistent name for mesh interface
  batman-adv: Add support for jumbo frames
  batman-adv: adopt netdev_hold() / netdev_put()
  batman-adv: Drop batadv_priv_debug_log struct
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'udp_tunnel-gro-optimizations'
Paolo Abeni [Tue, 18 Mar 2025 10:40:33 +0000 (11:40 +0100)]
Merge branch 'udp_tunnel-gro-optimizations'

Paolo Abeni says:

====================
udp_tunnel: GRO optimizations

The UDP tunnel GRO stage is a source of measurable overhead for workloads
based on UDP-encapsulated traffic: each incoming packet requires a full
UDP socket lookup and an indirect call.

In the most common setups a single UDP tunnel device is used. In such
a case we can optimize both the lookup and the indirect call.

Patch 1 tracks per netns the active UDP tunnels and replaces the socket
lookup with a single destination port comparison when possible.

Patch 2 tracks the different types of UDP tunnels and replaces the
indirect call with a static one when there is a single UDP tunnel type
active.

I measure ~5% performance improvement in TCP over UDP tunnel stream
tests on top of this series.

v3: https://lore.kernel.org/netdev/cover.1741632298.git.pabeni@redhat.com/
v2: https://lore.kernel.org/netdev/cover.1741338765.git.pabeni@redhat.com/
v1: https://lore.kernel.org/netdev/cover.1741275846.git.pabeni@redhat.com/
====================

Link: https://patch.msgid.link/cover.1741718157.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago udp_tunnel: use static call for GRO hooks when possible
Paolo Abeni [Tue, 11 Mar 2025 20:42:29 +0000 (21:42 +0100)]
udp_tunnel: use static call for GRO hooks when possible

It's quite common to have a single UDP tunnel type active in the
whole system. In such a case we can replace the indirect call for
the UDP tunnel GRO callback with a static call.

Add the related accounting in the control path and switch to static
call when possible. To keep the code simple use a static array for
the registered tunnel types, and size such array based on the kernel
config.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/6fd1f9c7651151493ecab174e7b8386a1534170d.1741718157.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
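
The accounting described above can be modelled in Python. This is a sketch under assumptions, not the kernel's static_call machinery: `GroDispatch` and its methods are hypothetical, a dictionary replaces the static array of registered tunnel types, and the `direct` attribute stands in for the static call target.

```python
class GroDispatch:
    """Tracks active tunnel types and exposes a direct target when only one exists."""

    def __init__(self):
        self.type_counts = {}  # GRO callback -> number of active tunnels
        self.direct = None     # stand-in for the static call target

    def _update(self):
        # A direct call is possible only with exactly one active tunnel type.
        if len(self.type_counts) == 1:
            self.direct = next(iter(self.type_counts))
        else:
            self.direct = None

    def add_tunnel(self, gro_receive):
        self.type_counts[gro_receive] = self.type_counts.get(gro_receive, 0) + 1
        self._update()

    def del_tunnel(self, gro_receive):
        self.type_counts[gro_receive] -= 1
        if self.type_counts[gro_receive] == 0:
            del self.type_counts[gro_receive]
        self._update()

    def receive(self, sock_gro_receive, packet):
        if self.direct is not None:
            return self.direct(packet)       # fast path: known single target
        return sock_gro_receive(packet)      # slow path: per-socket indirect call


def vxlan_gro(packet):
    return ("vxlan", packet)


def geneve_gro(packet):
    return ("geneve", packet)


dispatch = GroDispatch()
dispatch.add_tunnel(vxlan_gro)
single_type_result = dispatch.receive(vxlan_gro, "pkt")
dispatch.add_tunnel(geneve_gro)
mixed_result = dispatch.receive(geneve_gro, "pkt")
```

All bookkeeping happens in the control path (`add_tunnel`/`del_tunnel`), so the per-packet path only checks whether a direct target is set.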
7 weeks ago udp_tunnel: create a fastpath GRO lookup.
Paolo Abeni [Tue, 11 Mar 2025 20:42:28 +0000 (21:42 +0100)]
udp_tunnel: create a fastpath GRO lookup.

Most UDP tunnels bind a socket to a local port, with ANY address, no
peer and no interface index specified.
Additionally it's quite common to have a single tunnel device per
namespace.

Track in each namespace the UDP tunnel sockets matching the above.
When only a single one is present, store a reference in the netns.

When such reference is not NULL, UDP tunnel GRO lookup just needs to
match the incoming packet destination port vs the socket local port.

The tunnel socket never sets the reuse[port] flag[s]. When bound to no
address and interface, no other socket can exist in the same netns
matching the specified local port.

Matching packets with non-local destination addresses will be
aggregated, and eventually segmented as needed - no behavior changes
intended.

Note that the UDP tunnel socket reference is stored into struct
netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep
all the fastpath-related netns fields in the same struct and allow
cacheline-based optimization. Currently both the IPv4 and IPv6 socket
pointers share the same cacheline as the `udp_table` field.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/4d5c319c4471161829f50cb8436841de81a5edae.1741718157.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
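
The fastpath lookup above can be sketched as a Python model. Everything here is hypothetical: the `Netns` class, its fields, and the port numbers are illustrative, a dictionary stands in for the full UDP socket lookup, and falling back to the full lookup on a port mismatch is an assumption of this sketch rather than a statement about the kernel code.

```python
class Netns:
    """Hypothetical per-namespace state for UDP tunnel GRO lookup."""

    def __init__(self):
        self.sockets = {}        # local port -> socket name (full lookup table)
        self.tunnel_ports = []   # ports of wildcard-bound tunnel sockets
        self.fast_port = None    # set only while exactly one such socket exists
        self.full_lookups = 0

    def add_tunnel(self, port, name):
        self.sockets[port] = name
        self.tunnel_ports.append(port)
        if len(self.tunnel_ports) == 1:
            self.fast_port = self.tunnel_ports[0]
        else:
            self.fast_port = None

    def gro_lookup(self, dport):
        # Fast path: a single destination port comparison.
        if self.fast_port is not None and dport == self.fast_port:
            return self.sockets[dport]
        # Slow path: full socket lookup (assumed fallback).
        self.full_lookups += 1
        return self.sockets.get(dport)


ns = Netns()
ns.add_tunnel(4789, "vxlan0")
fast_hit = ns.gro_lookup(4789)
lookups_after_fast_hit = ns.full_lookups
ns.add_tunnel(6081, "geneve0")
slow_hit = ns.gro_lookup(6081)
```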
7 weeks ago qed: remove cast to pointers passed to kfree
Chen Ni [Tue, 11 Mar 2025 07:06:24 +0000 (15:06 +0800)]
qed: remove cast to pointers passed to kfree

Remove unnecessary casts to pointer types passed to kfree.
Issue detected by coccinelle:
@@
type t1;
expression *e;
@@

-kfree((t1 *)e);
+kfree(e);

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250311070624.1037787-1-nichen@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago Merge branch 'mlx5-support-setting-a-parent-for-a-devlink-rate-node'
Paolo Abeni [Tue, 18 Mar 2025 09:37:19 +0000 (10:37 +0100)]
Merge branch 'mlx5-support-setting-a-parent-for-a-devlink-rate-node'

Tariq Toukan says:

====================
mlx5: Support setting a parent for a devlink rate node

This series by Carolina adds mlx5 support for the setting of a parent to
devlink rate nodes.

By introducing a hierarchical level to scheduling nodes, these changes
allow for more granular control over bandwidth allocation and isolation
of Virtual Functions.

Function renaming for parent setting on leafs:
- net/mlx5: Rename devlink rate parent set function for leaf nodes

Add support for hierarchy level tracking:
- net/mlx5: Introduce hierarchy level tracking on scheduling nodes
- net/mlx5: Preserve rate settings when creating a rate node

Support setting parent for rate nodes:
- net/mlx5: Add support for setting parent of nodes
====================

Link: https://patch.msgid.link/1741642016-44918-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: Add support for setting parent of nodes
Carolina Jubran [Mon, 10 Mar 2025 21:26:56 +0000 (23:26 +0200)]
net/mlx5: Add support for setting parent of nodes

Introduce `mlx5_esw_devlink_rate_node_parent_set()` to allow assigning
a parent to scheduling nodes.
Implement `mlx5_esw_qos_node_update_parent()` and
`mlx5_esw_qos_node_validate_set_parent()` to enforce constraints on
node reassignment.

Don't allow reassignment of nodes with active rate objects.

Update `esw_qos_node_set_parent()` to handle cases where
the parent is NULL. A NULL parent indicates that the scheduling element
is attached to the root scheduling element, and since only rate nodes
can be connected to the root, this update is now necessary.
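
A minimal sketch of the validate-then-update split, not the driver code.
struct sched_node, the active_children counter (an assumed stand-in for
"active rate objects") and the error codes are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical scheduling node. */
struct sched_node {
	struct sched_node *parent;	/* NULL: attached to the root element */
	unsigned int active_children;	/* assumed meaning of "active rate objects" */
};

/* Check the constraints before touching any state. */
static int node_validate_set_parent(const struct sched_node *node,
				    const struct sched_node *parent)
{
	assert(node);
	if (node->active_children)
		return -EBUSY;		/* refuse to move a node that is in use */
	if (parent == node)
		return -EINVAL;		/* a node cannot be its own parent */
	return 0;
}

/* Reassign the parent; a NULL parent means "attach to the root". */
static int node_update_parent(struct sched_node *node,
			      struct sched_node *parent)
{
	int err = node_validate_set_parent(node, parent);

	if (err)
		return err;
	node->parent = parent;
	return 0;
}
```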

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-5-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: Preserve rate settings when creating a rate node
Carolina Jubran [Mon, 10 Mar 2025 21:26:55 +0000 (23:26 +0200)]
net/mlx5: Preserve rate settings when creating a rate node

Modify `esw_qos_create_node_sched_elem()` to receive max_rate and
bw_share values while maintaining the previous configuration.

This change is essential for the upcoming patch that will modify rate
nodes and requires the existing settings to be preserved unless
explicitly changed.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: Introduce hierarchy level tracking on scheduling nodes
Carolina Jubran [Mon, 10 Mar 2025 21:26:54 +0000 (23:26 +0200)]
net/mlx5: Introduce hierarchy level tracking on scheduling nodes

Add a `level` field to `mlx5_esw_sched_node` to track the hierarchy
depth of each scheduling node. This allows enforcement of the
scheduling depth constraints based on `log_esw_max_sched_depth`.

Modify `esw_qos_node_set_parent()` and `__esw_qos_alloc_node()` to
correctly assign hierarchy levels. Ensure that nodes inherit their
parent’s level incrementally.
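
The level bookkeeping can be sketched as follows. This is a simplified
userspace illustration: struct level_node, the base level of 1 for nodes
directly under the root, and the plain max_depth parameter are all
assumptions, and how the limit is derived from `log_esw_max_sched_depth`
is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical scheduling node carrying its hierarchy depth. */
struct level_node {
	struct level_node *parent;
	unsigned int level;
};

/*
 * Attach node under parent (NULL means the root) and derive its level
 * from the parent's. Fails if the resulting depth exceeds max_depth.
 */
static int level_node_set_parent(struct level_node *node,
				 struct level_node *parent,
				 unsigned int max_depth)
{
	/* Assumption: nodes directly under the root sit at level 1. */
	unsigned int level = parent ? parent->level + 1 : 1;

	assert(node);
	if (level > max_depth)
		return -EINVAL;

	node->parent = parent;
	node->level = level;
	return 0;
}
```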

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: Rename devlink rate parent set function for leaf nodes
Carolina Jubran [Mon, 10 Mar 2025 21:26:53 +0000 (23:26 +0200)]
net/mlx5: Rename devlink rate parent set function for leaf nodes

Rename `mlx5_esw_devlink_rate_parent_set()` to
`mlx5_esw_devlink_rate_leaf_parent_set()` to distinguish setting a
parent for leafs from nodes, which is not yet supported.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'bnxt_en-driver-update'
Paolo Abeni [Tue, 18 Mar 2025 09:25:24 +0000 (10:25 +0100)]
Merge branch 'bnxt_en-driver-update'

Michael Chan says:

====================
bnxt_en: Driver update

This patchset contains these updates to the driver:

1. New ethtool coredump type for FW to include cached context for live dump.
2. Support ENABLE_ROCE devlink generic parameter.
3. Support capability change flag from FW.
4. FW interface update.
5. Support .set_module_eeprom_by_page().
====================

Link: https://patch.msgid.link/20250310183129.3154117-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: add .set_module_eeprom_by_page() support
Damodharam Ammepalli [Mon, 10 Mar 2025 18:31:29 +0000 (11:31 -0700)]
bnxt_en: add .set_module_eeprom_by_page() support

Add support for the .set_module_eeprom_by_page() callback,
which implements a generic solution for module EEPROM access.
This implementation also supports CMIS 5.0.3 compliant
EEPROM FW download.

Sample Usage:
ethtool --flash-module-firmware enp177s0np0 file dummy.bin

Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-8-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Refactor bnxt_get_module_eeprom_by_page()
Michael Chan [Mon, 10 Mar 2025 18:31:28 +0000 (11:31 -0700)]
bnxt_en: Refactor bnxt_get_module_eeprom_by_page()

In preparation for adding .set_module_eeprom_by_page(), extract the
common error checking done in bnxt_get_module_eeprom_by_page() into
a new common function that can be re-used for
.set_module_eeprom_by_page().

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-7-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Update firmware interface to 1.10.3.97
Michael Chan [Mon, 10 Mar 2025 18:31:27 +0000 (11:31 -0700)]
bnxt_en: Update firmware interface to 1.10.3.97

The main changes are adding i2c write for module eeprom and a new v2
PCIe statistics structure.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-6-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Query FW parameters when the CAPS_CHANGE bit is set
shantiprasad shettar [Mon, 10 Mar 2025 18:31:26 +0000 (11:31 -0700)]
bnxt_en: Query FW parameters when the CAPS_CHANGE bit is set

Newer FW can set the CAPS_CHANGE flag during ifup if some capabilities
or configurations have changed.  For example, the CoS queue
configurations may have changed.  Support this new flag by treating it
almost like a FW reset.  The driver will essentially rediscover all
features and capabilities, reconfigure all backing store context memory,
reset everything to default, and reserve all resources.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: shantiprasad shettar <shantiprasad.shettar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-5-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Add devlink support for ENABLE_ROCE nvm parameter
Pavan Chebbi [Mon, 10 Mar 2025 18:31:25 +0000 (11:31 -0700)]
bnxt_en: Add devlink support for ENABLE_ROCE nvm parameter

Add set/show support for the ENABLE_ROCE NVM parameter to
enable/disable RoCE for a PF.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Refactor bnxt_hwrm_nvm_req()
Michael Chan [Mon, 10 Mar 2025 18:31:24 +0000 (11:31 -0700)]
bnxt_en: Refactor bnxt_hwrm_nvm_req()

bnxt_hwrm_nvm_req() first searches the nvm_params[] array for the
NVM parameter to set or get.  The array entry contains all the
NVM information about that parameter.  The information is then used
to send the FW message to set or get the parameter.

Refactor it to only do the array search in bnxt_hwrm_nvm_req() and
pass the array entry to the new function __bnxt_hwrm_nvm_req() to
send the FW message.  The next patch will be able to use the new
function.
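
The shape of this refactor, a table search in the outer function and the
actual work in an inner helper that takes the entry, can be sketched in
plain C. The table contents, struct nvm_param and both function names
are invented for the example; the real helper sends a FW message, while
here it only reports the entry's width.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table entry describing one NVM parameter. */
struct nvm_param {
	int id;
	unsigned int offset;
	unsigned int num_bits;
};

/* Made-up contents, for illustration only. */
static const struct nvm_param nvm_params_sketch[] = {
	{ .id = 1, .offset = 401, .num_bits = 1 },
	{ .id = 2, .offset = 108, .num_bits = 8 },
};

/* Inner helper: acts on an entry the caller has already found. */
static int nvm_req_entry(const struct nvm_param *p, unsigned int *bits_out)
{
	assert(p && bits_out);
	*bits_out = p->num_bits;	/* placeholder for "send the FW message" */
	return 0;
}

/* Outer function: only searches the array, then delegates. */
static int nvm_req(int id, unsigned int *bits_out)
{
	size_t n = sizeof(nvm_params_sketch) / sizeof(nvm_params_sketch[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (nvm_params_sketch[i].id == id)
			return nvm_req_entry(&nvm_params_sketch[i], bits_out);
	return -EINVAL;
}
```

A caller that already holds an entry can call the inner helper directly
and skip the search.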

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-3-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agobnxt_en: Add support for a new ethtool dump flag 3
Vasuthevan Maheswaran [Mon, 10 Mar 2025 18:31:23 +0000 (11:31 -0700)]
bnxt_en: Add support for a new ethtool dump flag 3

When doing a live coredump with ethtool -w, the context data cached
in the NIC is not dumped by the FW by default.  The reason is that
retrieving this cached context data with traffic running may cause
problems.  Add a new dump flag 3 to allow the option to include this
cached context data which may be useful in some debug scenarios.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Vasuthevan Maheswaran <vasuthevan.maheswaran@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-2-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'intel-wired-lan-driver-updates-2025-03-10-ice-ixgbe'
Paolo Abeni [Tue, 18 Mar 2025 09:15:52 +0000 (10:15 +0100)]
Merge branch 'intel-wired-lan-driver-updates-2025-03-10-ice-ixgbe'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-03-10 (ice, ixgbe)

For ice:

Paul adds generic checksum support for E830 devices.

Karol refactors PTP code related to E825C; simplifying PHY register info
struct, utilizing GENMASK, removing unused defines, etc.

For ixgbe:

Piotr adds PTP support for E610 devices.

Jedrzej adds reporting when overheating is detected on E610 devices.

The following are changes since commit 8ef890df4031121a94407c84659125cbccd3fdbe:
  net: move misc netdev_lock flavors to a separate header
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE
====================

Link: https://patch.msgid.link/20250310174502.3708121-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoixgbe: add support for thermal sensor event reception
Jedrzej Jagielski [Mon, 10 Mar 2025 17:44:59 +0000 (10:44 -0700)]
ixgbe: add support for thermal sensor event reception

Unlike earlier devices driven by ixgbe, E610 NICs are notified of
overheating through a FW ACI event.

When the temperature threshold is exceeded, the FW suspends all
traffic and sends an overtemp event to the driver. The driver then
logs an appropriate message and disables the adapter instance.
The card remains in that state until the platform is rebooted.

This approach works around the fact that the current version of the
E610 FW doesn't support reading thermal sensor data from SW. It at
least tells the user that an overtemp event has occurred, instead of
the interface disappearing from the OS without any notice.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Jeremiah Lokan <jeremiahx.j.lokan@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-7-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoixgbe: add PTP support for E610 device
Piotr Kwapulinski [Mon, 10 Mar 2025 17:44:58 +0000 (10:44 -0700)]
ixgbe: add PTP support for E610 device

Add PTP support for E610 adapter. The E610 is based on X550 and adds
firmware managed link, enhanced security capabilities and support for
updated server manageability. It does not introduce any new PTP features
compared to X550.

Reviewed-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-6-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoice: E825C PHY register cleanup
Karol Kolacinski [Mon, 10 Mar 2025 17:44:57 +0000 (10:44 -0700)]
ice: E825C PHY register cleanup

Minor PTP register refactor, including logical grouping of the E825C
1-step timestamping registers. Remove unused register definitions
(PHY_REG_GPCS_BITSLIP, PHY_REG_REVISION).
Also, apply the preferred GENMASK macro (instead of ICE_M) to the
register field definitions affected by this patch.
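
For readers unfamiliar with GENMASK(h, l): it builds a mask with bits h
down to l set. A 32-bit userspace equivalent, with a made-up field
layout (GENMASK_U32, SKETCH_REG_FIELD_M and the two helpers are not
kernel names):

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits h..l (inclusive) set, for 0 <= l <= h <= 31. */
#define GENMASK_U32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Hypothetical register field occupying bits 7..4. */
#define SKETCH_REG_FIELD_M	GENMASK_U32(7, 4)

/* Extract a field: mask it, then divide by the mask's lowest set bit. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	assert(mask != 0);
	return (reg & mask) / (mask & (~mask + 1));
}

/* Place a value into a field position. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	assert(mask != 0);
	return (val * (mask & (~mask + 1))) & mask;
}
```

Stating the high and low bit positions keeps the definition close to how
register fields are usually written in datasheets.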

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-5-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoice: Refactor E825C PHY registers info struct
Karol Kolacinski [Mon, 10 Mar 2025 17:44:56 +0000 (10:44 -0700)]
ice: Refactor E825C PHY registers info struct

Simplify ice_phy_reg_info_eth56g struct definition to include base
address for the very first quad. Use base address info and 'step'
value to determine address for specific PHY quad.
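
The base-plus-step addressing can be sketched like this. The struct
layout, the quad count and the function name are assumptions made for
the example, not the driver's definitions:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_QUADS 4	/* assumed number of PHY quads */

/* Hypothetical simplified register info: one base plus a stride. */
struct phy_reg_info_sketch {
	uint32_t base_addr;	/* address block of the very first quad */
	uint32_t step;		/* distance between consecutive quads */
};

/* Compute the address of a register at 'offset' within a given quad. */
static uint32_t phy_quad_addr(const struct phy_reg_info_sketch *info,
			      unsigned int quad, uint32_t offset)
{
	assert(info && quad < SKETCH_NUM_QUADS);
	return info->base_addr + quad * info->step + offset;
}
```

Storing one base and one step replaces a per-quad table of addresses.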

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-4-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoice: rename ice_ptp_init_phc_eth56g function
Karol Kolacinski [Mon, 10 Mar 2025 17:44:55 +0000 (10:44 -0700)]
ice: rename ice_ptp_init_phc_eth56g function

Rename the ice_ptp_init_phc_eth56g() function to
ice_ptp_init_phc_e825(), to be consistent with the naming pattern
for other devices.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-3-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoice: Add E830 checksum offload support
Paul Greenwalt [Mon, 10 Mar 2025 17:44:54 +0000 (10:44 -0700)]
ice: Add E830 checksum offload support

E830 supports raw receive and generic transmit checksum offloads.

Raw receive checksum support is provided by hardware calculating the
checksum over the whole packet, regardless of type. The calculated
checksum is provided to driver in the Rx flex descriptor. Then the driver
assigns the checksum to skb->csum and sets skb->ip_summed to
CHECKSUM_COMPLETE.

Generic transmit checksum support is provided by hardware calculating the
checksum given two offsets: the start offset to begin checksum calculation,
and the offset to insert the calculated checksum in the packet. Support is
advertised to the stack using NETIF_F_HW_CSUM feature.

E830 has the following limitations when both generic transmit checksum
offload and TCP Segmentation Offload (TSO) are enabled:

1. Inner packet header modification is not supported. This restriction
   includes the inability to alter TCP flags, such as the push flag. As a
   result, this limitation can impact the receiver's ability to coalesce
   packets, potentially degrading network throughput.
2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents
   support of Maximum Transmission Unit (MTU) greater than 1063 bytes.

Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually
exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not
enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the
defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and
NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity
of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().

When NETIF_F_HW_CSUM is requested the provided skb->csum_start and
skb->csum_offset are passed to hardware in the Tx context descriptor
generic checksum (GCS) parameters. Hardware calculates the 1's complement
from skb->csum_start to the end of the packet, and inserts the result in
the packet at skb->csum_offset.
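
As a software illustration of what a generic transmit checksum engine
computes, here is a small sketch. It is not the hardware or driver
logic; the function names are invented. Following the kernel's sk_buff
convention, csum_offset is taken relative to csum_start, and whatever
the checksum field already holds (for example a pseudo-header seed) is
included in the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit ones' complement sum of big-endian words in buf. */
static uint16_t ones_complement_sum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Sum from csum_start to the end of the packet and store the
 * complemented result at csum_start + csum_offset (big-endian).
 * csum_offset is assumed to be even.
 */
static void generic_tx_csum(uint8_t *pkt, size_t len,
			    size_t csum_start, size_t csum_offset)
{
	size_t pos = csum_start + csum_offset;
	uint16_t csum;

	assert(csum_start <= len && pos + 2 <= len);
	csum = (uint16_t)~ones_complement_sum(pkt + csum_start,
					      len - csum_start);
	pkt[pos] = (uint8_t)(csum >> 8);
	pkt[pos + 1] = (uint8_t)(csum & 0xFF);
}
```

When the field is zero before the call, summing the same range again
afterwards folds to 0xFFFF, which is the usual receiver-side check.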

Co-developed-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-2-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'net-phy-rework-linkmodes-handling-in-a-dedicated-file'
Paolo Abeni [Tue, 18 Mar 2025 08:03:17 +0000 (09:03 +0100)]
Merge branch 'net-phy-rework-linkmodes-handling-in-a-dedicated-file'

Maxime Chevallier says:

====================
net: phy: Rework linkmodes handling in a dedicated file

This is V5 of the phy_caps series. In a nutshell, this series reworks the way
we maintain the list of speed/duplex capabilities for each linkmode so that we
no longer have multiple definitions of these associations.

That will help make sure that when people add new linkmodes in
include/uapi/linux/ethtool.h, they don't have to update phylib and phylink as
well, making the process more straightforward and less error-prone.

It also generalises the phy_caps interface to be able to lookup linkmodes
from phy_interface_t, which is needed for the multi-port work I've been working
on for a while.

This V5 addresses Russell's and Paolo's reviews, namely:

 - Error out when encountering an unknown SPEED_XXX setting

   It prints an error and fails to initialize phylib. I've tested by
   introducing a dummy 1.6T speed, I guess it's only a matter of time
   before that actually happens :)

 - Deal more gracefully with the fixed-link settings, keeping some level of
   compatibility with what we had before by making sure we report a
   single BaseT mode like before.

V1 : https://lore.kernel.org/netdev/20250222142727.894124-1-maxime.chevallier@bootlin.com/
V2 : https://lore.kernel.org/netdev/20250226100929.1646454-1-maxime.chevallier@bootlin.com/
V3 : https://lore.kernel.org/netdev/20250228145540.2209551-1-maxime.chevallier@bootlin.com/
V4 : https://lore.kernel.org/netdev/20250303090321.805785-1-maxime.chevallier@bootlin.com/
====================

Link: https://patch.msgid.link/20250307173611.129125-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phylink: Use phy_caps to get an interface's capabilities and modes
Maxime Chevallier [Fri, 7 Mar 2025 17:36:10 +0000 (18:36 +0100)]
net: phylink: Use phy_caps to get an interface's capabilities and modes

Phylink has internal code to get the MAC capabilities of a given PHY
interface (i.e. which speeds and duplex settings are supported).

Extract that into phy_caps, but use the link_capa for conversion. Add an
internal phylink helper for the link caps -> mac caps conversion, and
use this in phylink_caps_to_linkmodes().

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-14-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phylink: Convert capabilities to linkmodes using phy_caps
Maxime Chevallier [Fri, 7 Mar 2025 17:36:09 +0000 (18:36 +0100)]
net: phylink: Convert capabilities to linkmodes using phy_caps

phylink_caps_to_linkmodes() is used to derive a list of linkmodes that
can be conceivably exposed using a given set of speeds and duplex
through phylink's MAC capabilities.

This list can be derived from the link_caps array in phy_caps, provided
we convert the MAC capabilities into a LINK_CAPA bitmask first.

Introduce an internal phylink helper phylink_caps_to_link_caps() to
convert from MAC capabilities into phy_caps, then phy_caps_linkmodes()
to do the link_caps -> linkmodes conversion.

This avoids having to update phylink for every new linkmode.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-13-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phylink: Add a mapping between MAC_CAPS and LINK_CAPS
Maxime Chevallier [Fri, 7 Mar 2025 17:36:08 +0000 (18:36 +0100)]
net: phylink: Add a mapping between MAC_CAPS and LINK_CAPS

phylink allows MAC drivers to report their capabilities in terms of speed,
duplex and pause support. This is done through a dedicated set of enum
values in the form of the MAC_ capabilities. They are very close to what
the LINK_CAPA_xxx values can express, with the difference that LINK_CAPA
values don't carry any information about Pause/Asym Pause support.

To prepare converting phylink to using the phy_caps, add the mapping
between MAC capabilities and phy_caps. While doing so, we move the
phylink_caps_params array up a bit to simplify future commits.
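
A table-driven mapping of this kind might look like the sketch below.
The SK_* names and bit values are invented for illustration and do not
match the kernel's MAC_* or LINK_CAPA_* definitions; the point is only
that the pause bits have no counterpart on the link-caps side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MAC capability bits (pause bits included). */
#define SK_MAC_SYM_PAUSE	(1u << 0)
#define SK_MAC_ASYM_PAUSE	(1u << 1)
#define SK_MAC_10HD		(1u << 2)
#define SK_MAC_10FD		(1u << 3)
#define SK_MAC_100FD		(1u << 4)
#define SK_MAC_1000FD		(1u << 5)

/* Hypothetical link capability indices: speed/duplex only. */
enum sk_link_capa {
	SK_LINK_CAPA_10HD,
	SK_LINK_CAPA_10FD,
	SK_LINK_CAPA_100FD,
	SK_LINK_CAPA_1000FD,
	SK_LINK_CAPA_COUNT,
};

struct caps_map {
	uint32_t mac_cap;
	enum sk_link_capa link_capa;
};

static const struct caps_map sketch_caps_map[] = {
	{ SK_MAC_10HD,   SK_LINK_CAPA_10HD },
	{ SK_MAC_10FD,   SK_LINK_CAPA_10FD },
	{ SK_MAC_100FD,  SK_LINK_CAPA_100FD },
	{ SK_MAC_1000FD, SK_LINK_CAPA_1000FD },
};

/* Convert a MAC capability mask into a link capability bitmask. */
static uint32_t mac_caps_to_link_caps(uint32_t mac_caps)
{
	size_t n = sizeof(sketch_caps_map) / sizeof(sketch_caps_map[0]);
	uint32_t link_caps = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(sketch_caps_map[i].link_capa < SK_LINK_CAPA_COUNT);
		if (mac_caps & sketch_caps_map[i].mac_cap)
			link_caps |= 1u << sketch_caps_map[i].link_capa;
	}
	return link_caps;	/* pause bits are dropped: no equivalent */
}
```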

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-12-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: drop phy_settings and the associated lookup helpers
Maxime Chevallier [Fri, 7 Mar 2025 17:36:07 +0000 (18:36 +0100)]
net: phy: drop phy_settings and the associated lookup helpers

The phy_settings array is no longer relevant as it has now been replaced
by the link_caps array and associated phy_caps helpers.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-11-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phylink: Use phy_caps_lookup for fixed-link configuration
Maxime Chevallier [Fri, 7 Mar 2025 17:36:06 +0000 (18:36 +0100)]
net: phylink: Use phy_caps_lookup for fixed-link configuration

When phylink creates a fixed-link configuration, it finds a matching
linkmode to set as the advertised, lp_advertising and supported modes
based on the speed and duplex of the fixed link.

Use the newly introduced phy_caps_lookup to get these modes instead of
phy_lookup_settings(). This has the side effect that the matched
settings and configured linkmodes may now contain several linkmodes (the
intersection of supported linkmodes from the phylink settings and the
linkmodes that match speed/duplex) instead of the one from
phy_lookup_settings().

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-10-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_device: Use link_capabilities lookup for PHY aneg config
Maxime Chevallier [Fri, 7 Mar 2025 17:36:05 +0000 (18:36 +0100)]
net: phy: phy_device: Use link_capabilities lookup for PHY aneg config

When configuring PHY advertising with autoneg disabled, we looked for an
exact linkmode to advertise and configure for the requested Speed and
Duplex, especially at or over 1G.

Using phy_caps_lookup allows us to build a list of the supported
linkmodes at that speed that we can advertise instead of the first mode
that matches.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-9-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_caps: Allow looking-up link caps based on speed and duplex
Maxime Chevallier [Fri, 7 Mar 2025 17:36:04 +0000 (18:36 +0100)]
net: phy: phy_caps: Allow looking-up link caps based on speed and duplex

As the link_caps array is efficient for <speed,duplex> lookups,
implement a function for speed/duplex lookups that matches a given
mask. This replicates to some extent the phy_lookup_settings()
behaviour, matching full link_capabilities instead of a single linkmode.

phy.c's phy_sanitize_settings() and phylink's
phylink_ethtool_ksettings_set() perform such a lookup using the
phy_settings table, but are only interested in the actual speed/duplex
that were matched, rather than the individual linkmode.

Similar to phy_lookup_settings(), the newly introduced phy_caps_lookup()
will run through the link_caps[] array by descending speed/duplex order.

If the link_capabilities for a given <speed/duplex> tuple intersects the
passed linkmodes, we consider that a match.

Similar to phy_lookup_settings(), we also allow passing an 'exact'
boolean, allowing non-exact match. Here, we MUST always match the
linkmodes mask, but we allow matching on lower speed settings.
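
The lookup rules above can be sketched in userspace C. This is a
simplified illustration, not the phy_caps_lookup() implementation: the
table, the 32-bit linkmode mask and all names are assumptions, and any
extra fallback behaviour of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical <speed, duplex> entry with a bitmask of its linkmodes. */
struct link_caps_sketch {
	int speed;
	bool full_duplex;
	uint32_t linkmodes;
};

/* Sorted by ascending speed/duplex; linkmode bits are made up. */
static const struct link_caps_sketch sketch_link_caps[] = {
	{ 10,    false, 1u << 0 },
	{ 10,    true,  1u << 1 },
	{ 100,   false, 1u << 2 },
	{ 100,   true,  1u << 3 },
	{ 1000,  true,  (1u << 4) | (1u << 5) },
	{ 10000, true,  1u << 6 },
};

/*
 * Walk the table in descending order. An entry is a candidate only if
 * its linkmodes intersect 'supported'. With exact == false, the fastest
 * candidate not above the requested speed is returned as a fallback.
 */
static const struct link_caps_sketch *
caps_lookup(int speed, bool full_duplex, uint32_t supported, bool exact)
{
	const struct link_caps_sketch *fallback = NULL;
	size_t i = sizeof(sketch_link_caps) / sizeof(sketch_link_caps[0]);

	assert(speed > 0);
	while (i--) {
		const struct link_caps_sketch *lc = &sketch_link_caps[i];

		if (!(lc->linkmodes & supported))
			continue;	/* the linkmodes mask must always match */
		if (lc->speed == speed && lc->full_duplex == full_duplex)
			return lc;
		if (!exact && !fallback && lc->speed <= speed)
			fallback = lc;
	}
	return fallback;
}
```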

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-8-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_caps: Implement link_capabilities lookup by linkmode
Maxime Chevallier [Fri, 7 Mar 2025 17:36:03 +0000 (18:36 +0100)]
net: phy: phy_caps: Implement link_capabilities lookup by linkmode

On several occasions, phylib needs to lookup a set of matching speed and
duplex against a given linkmode set. Instead of relying on the
phy_settings array and thus iterate over the whole linkmodes list, use
the link_capabilities array to lookup these matches, as we aren't
interested in the actual link setting that matches but rather the speed
and duplex for that setting.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-7-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_caps: Introduce phy_caps_valid
Maxime Chevallier [Fri, 7 Mar 2025 17:36:02 +0000 (18:36 +0100)]
net: phy: phy_caps: Introduce phy_caps_valid

With the link_capabilities array, it's trivial to validate a given mask
against a <speed, duplex> tuple. Create a helper for that purpose, and
use it to replace a phy_settings lookup in phy_check_valid().

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-6-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_caps: Move __set_linkmode_max_speed to phy_caps
Maxime Chevallier [Fri, 7 Mar 2025 17:36:01 +0000 (18:36 +0100)]
net: phy: phy_caps: Move __set_linkmode_max_speed to phy_caps

Convert the __set_linkmode_max_speed to use the link_capabilities array.
This makes it easy to clamp the linkmodes to a given max speed.
Introduce a new helper phy_caps_linkmode_max_speed to replace the
previous one that used phy_settings.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-5-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: phy_caps: Move phy_speeds to phy_caps
Maxime Chevallier [Fri, 7 Mar 2025 17:36:00 +0000 (18:36 +0100)]
net: phy: phy_caps: Move phy_speeds to phy_caps

Use the newly introduced link_capabilities array to derive the list of
possible speeds when given a combination of linkmodes. As
link_capabilities is indexed by speed, we don't have to iterate the
whole phy_settings array.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-4-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: Use an internal, searchable storage for the linkmodes
Maxime Chevallier [Fri, 7 Mar 2025 17:35:59 +0000 (18:35 +0100)]
net: phy: Use an internal, searchable storage for the linkmodes

The canonical definition for all the link modes is in linux/ethtool.h,
which is complemented by the link_mode_params array stored in
net/ethtool/common.c. That array contains all the metadata about each
of these modes, including the Speed and Duplex information.

Phylib and phylink need that information as well for internal
management of the link, which was done by duplicating that information
in locally-stored arrays and lookup functions. This makes it easy for
developers adding new modes to forget to modify phylib and phylink
accordingly.

However, the link_mode_params array in net/ethtool/common.c is fairly
inefficient to search through, as it isn't sorted in any manner. Phylib
and phylink perform a lot of lookup operations, mostly to filter modes
by speed and/or duplex.

We therefore introduce the link_caps private array in phy_caps.c, which
indexes linkmodes in a more efficient manner. Each element associates a
<speed, duplex> tuple with a bitfield of all the linkmodes that run at
that speed/duplex.

We end up with an array that's fairly short, easily addressable and
optimised for the typical use-cases of phylib/phylink.
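
To make the idea concrete, here is a small sketch of building such an
index from an unsorted per-linkmode table. Every name and table entry is
hypothetical; the real arrays are far larger and use kernel linkmode
bitmaps rather than a 32-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sk_duplex { SK_HALF, SK_FULL };

/* Unsorted per-linkmode metadata; the array index is the linkmode bit. */
struct mode_param {
	int speed;
	enum sk_duplex duplex;
};

static const struct mode_param sketch_mode_params[] = {
	{ 1000, SK_FULL },	/* bit 0 */
	{ 10,   SK_HALF },	/* bit 1 */
	{ 100,  SK_FULL },	/* bit 2 */
	{ 1000, SK_FULL },	/* bit 3: a second 1000/Full mode */
	{ 10,   SK_FULL },	/* bit 4 */
};

enum { SK_CAPA_10HD, SK_CAPA_10FD, SK_CAPA_100FD, SK_CAPA_1000FD, SK_CAPA_COUNT };

/* One entry per <speed, duplex> tuple, sorted, with a linkmode bitmask. */
struct caps_entry {
	int speed;
	enum sk_duplex duplex;
	uint32_t linkmodes;
};

static struct caps_entry sketch_caps[SK_CAPA_COUNT] = {
	[SK_CAPA_10HD]   = { 10,   SK_HALF, 0 },
	[SK_CAPA_10FD]   = { 10,   SK_FULL, 0 },
	[SK_CAPA_100FD]  = { 100,  SK_FULL, 0 },
	[SK_CAPA_1000FD] = { 1000, SK_FULL, 0 },
};

static int caps_index(int speed, enum sk_duplex duplex)
{
	int i;

	for (i = 0; i < SK_CAPA_COUNT; i++)
		if (sketch_caps[i].speed == speed && sketch_caps[i].duplex == duplex)
			return i;
	return -1;
}

/* Fill the index once at init; fail on an unknown speed/duplex. */
static int caps_init(void)
{
	size_t n = sizeof(sketch_mode_params) / sizeof(sketch_mode_params[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		int idx = caps_index(sketch_mode_params[i].speed,
				     sketch_mode_params[i].duplex);

		assert(i < 32);
		if (idx < 0)
			return -1;
		sketch_caps[idx].linkmodes |= 1u << i;
	}
	return 0;
}
```

After initialisation, filtering modes by speed/duplex is a single index
into a short array instead of a scan over every linkmode.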

That array is initialized at the same time as phylib. As the
link_mode_params array is part of the net stack, which phylink depends
on, it should always be accessible from phylib.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-3-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: ethtool: Export the link_mode_params definitions
Maxime Chevallier [Fri, 7 Mar 2025 17:35:58 +0000 (18:35 +0100)]
net: ethtool: Export the link_mode_params definitions

link_mode_params contains a lookup table of all 802.3 link modes that
are currently supported with structured data about each mode's speed,
duplex, number of lanes and mediums.

As a preparation for a port representation, export that table for the
rest of the net stack to use.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250307173611.129125-2-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'net-stmmac-avoid-unnecessary-work-in-stmmac_release-stmmac_dvr_remove'
Paolo Abeni [Mon, 17 Mar 2025 20:36:22 +0000 (21:36 +0100)]
Merge branch 'net-stmmac-avoid-unnecessary-work-in-stmmac_release-stmmac_dvr_remove'

Russell King says:

====================
net: stmmac: avoid unnecessary work in stmmac_release()/stmmac_dvr_remove()

This small series is a subset of an RFC I sent earlier. These two
patches remove code that is unnecessary and/or wrong in these paths.
Details in each commit.
====================

Link: https://patch.msgid.link/Z87bpDd7QYYVU0ML@shell.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: stmmac: remove unnecessary stmmac_mac_set() in stmmac_release()
Russell King (Oracle) [Mon, 10 Mar 2025 12:31:30 +0000 (12:31 +0000)]
net: stmmac: remove unnecessary stmmac_mac_set() in stmmac_release()

stmmac_release() calls phylink_stop() and then goes on to call
stmmac_mac_set(, false). However, phylink_stop() will call
stmmac_mac_link_down() before returning, which will do this work.
Remove this unnecessary call.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Furong Xu <0x1207@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1trcI6-005rn8-GV@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: stmmac: remove redundant racy tear-down in stmmac_dvr_remove()
Russell King (Oracle) [Mon, 10 Mar 2025 12:31:25 +0000 (12:31 +0000)]
net: stmmac: remove redundant racy tear-down in stmmac_dvr_remove()

While the network device is registered, it is published to userspace,
and thus userspace can change its state. This means calling
functions such as stmmac_stop_all_dma() and stmmac_mac_set() are
racy.

Moreover, unregister_netdev() will unpublish the network device, and
then if appropriate call the .ndo_stop() method, which is
stmmac_release(). This will first call phylink_stop() which will
synchronously take the link down, resulting in stmmac_mac_link_down()
and stmmac_mac_set(, false) being called.

stmmac_release() will also call stmmac_stop_all_dma().

Consequently, neither of these two functions needs to be called prior
to unregister_netdev() as that will safely call paths that will
result in this work being done if necessary.

Remove these redundant racy calls.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Furong Xu <0x1207@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1trcI1-005rn2-CZ@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phylink: expand on .pcs_config() method documentation
Russell King (Oracle) [Mon, 10 Mar 2025 11:10:52 +0000 (11:10 +0000)]
net: phylink: expand on .pcs_config() method documentation

Expand on the requirements of the .pcs_config() method documentation,
specifically mentioning that it should cause minimal disruption to
an established link, and that it should return a positive non-zero
value when requiring the .pcs_an_restart() method to be called.
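
The return convention described above can be illustrated with a toy
model. Everything below (struct toy_pcs, the toy_* functions and the
advertisement field) is hypothetical and is not the phylink API; it only
shows a config step that avoids touching unchanged state and uses a
positive return value to ask the caller for an autonegotiation restart.

```c
#include <assert.h>

/* Hypothetical PCS model, not the kernel's struct phylink_pcs. */
struct toy_pcs {
	unsigned int advert;	/* currently programmed advertisement */
	int an_restarts;	/* number of AN restarts performed */
};

/*
 * Returns a negative value on error, 0 if nothing that affects
 * negotiation changed, and a positive value if AN must be restarted.
 * State is only rewritten when it differs, so an established link
 * is disturbed as little as possible.
 */
int toy_pcs_config(struct toy_pcs *pcs, unsigned int advert)
{
	assert(pcs);

	if (advert == 0)
		return -1;	/* stand-in for a real -Exxx error code */
	if (pcs->advert == advert)
		return 0;	/* no change, leave the link alone */

	pcs->advert = advert;
	return 1;		/* caller should restart AN */
}

void toy_pcs_an_restart(struct toy_pcs *pcs)
{
	pcs->an_restarts++;
}

/* Hypothetical caller that honours the positive return value. */
int toy_major_config(struct toy_pcs *pcs, unsigned int advert)
{
	int ret = toy_pcs_config(pcs, advert);

	if (ret < 0)
		return ret;
	if (ret > 0)
		toy_pcs_an_restart(pcs);
	return 0;
}
```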

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1trb24-005oVq-Is@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agocdc_ether|r8152: ThinkPad Hybrid USB-C/A Dock quirk
Philipp Hahn [Mon, 10 Mar 2025 10:17:35 +0000 (11:17 +0100)]
cdc_ether|r8152: ThinkPad Hybrid USB-C/A Dock quirk

Lenovo ThinkPad Hybrid USB-C with USB-A Dock (17ef:a359) is affected by
the same problem as the Lenovo Powered USB-C Travel Hub (17ef:721e):
Both are based on the Realtek RTL8153B chip and used to use the cdc_ether
driver. However, using this driver, with the system suspended the device
constantly sends pause-frames as soon as the receive buffer fills up.
This causes issues with other devices, where some Ethernet switches stop
forwarding packets altogether.

Using the Realtek driver (r8152) fixes this issue. Pause frames are no
longer sent while the host system is suspended.

Cc: Leon Schuermann <leon@is.currently.online>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Oliver Neukum <oliver@neukum.org> (maintainer:USB CDC ETHERNET DRIVER)
Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
Link: https://git.kernel.org/netdev/net/c/cb82a54904a9
Link: https://git.kernel.org/netdev/net/c/2284bbd0cf39
Link: https://www.lenovo.com/de/de/p/accessories-and-software/docking/docking-usb-docks/40af0135eu
Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/484336aad52d14ccf061b535bc19ef6396ef5120.1741601523.git.p.hahn@avm.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agostmmac: intel: Fix warning message for return value in intel_tsn_lane_is_available()
Choong Yong Liang [Mon, 10 Mar 2025 05:08:35 +0000 (13:08 +0800)]
stmmac: intel: Fix warning message for return value in intel_tsn_lane_is_available()

Fix the warning "warn: missing error code? 'ret'" in the
intel_tsn_lane_is_available() function.

The function now returns 0 to indicate that a TSN lane was found and
returns -EINVAL when it is not found.
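
The convention described above (0 when a lane is found, -EINVAL
otherwise) can be sketched in plain C. The function name, the table
layout and the parameters are hypothetical and are not the driver's
code; the point is only that the not-found path sets its error code
explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical lookup: returns 0 if `lane` is present in `lanes`
 * and -EINVAL otherwise. The error code on the not-found path is
 * explicit and does not depend on the result of an earlier call.
 */
int toy_lane_is_available(const int *lanes, size_t n, int lane)
{
	assert(lanes || n == 0);

	for (size_t i = 0; i < n; i++)
		if (lanes[i] == lane)
			return 0;

	return -EINVAL;
}
```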

Fixes: a42f6b3f1cc1 ("net: stmmac: configure SerDes according to the interface mode")
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250310050835.808870-1-yong.liang.choong@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'net-phy-clean-up-phy-package-mmd-access-functions'
Paolo Abeni [Mon, 17 Mar 2025 18:07:56 +0000 (19:07 +0100)]
Merge branch 'net-phy-clean-up-phy-package-mmd-access-functions'

Heiner Kallweit says:

====================
net: phy: clean up PHY package MMD access functions

Move declarations of the functions with users to phylib.h, and remove
unused functions.
====================

Link: https://patch.msgid.link/b624fcb7-b493-461a-a0b5-9ca7e9d767bc@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: remove unused functions phy_package_[read|write]_mmd
Heiner Kallweit [Sun, 9 Mar 2025 20:05:08 +0000 (21:05 +0100)]
net: phy: remove unused functions phy_package_[read|write]_mmd

These functions have never had a user, so remove them.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5792e2cd-6f0a-4f7d-a5ef-b932f94d82f3@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: phy: move PHY package MMD access function declarations from phy.h to phylib.h
Heiner Kallweit [Sun, 9 Mar 2025 20:04:14 +0000 (21:04 +0100)]
net: phy: move PHY package MMD access function declarations from phy.h to phylib.h

These functions are used by PHY drivers only, therefore move their
declaration to phylib.h.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/406c8a20-b62e-4ee3-b174-b566724a0876@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'mlx5-support-hws-flow-meter-sampler-actions-in-fs-core'
Paolo Abeni [Mon, 17 Mar 2025 17:57:19 +0000 (18:57 +0100)]
Merge branch 'mlx5-support-hws-flow-meter-sampler-actions-in-fs-core'

Tariq Toukan says:

====================
mlx5: Support HWS flow meter/sampler actions in FS core

This series by Moshe adds support for flow meter and flow sampler HW
Steering actions in FS core level. As these actions can be shared by
multiple rules, these patches use refcounts to manage the HWS actions
sharing in FS core level.
====================

Link: https://patch.msgid.link/1741543663-22123-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: fs, add support for dest flow sampler HWS action
Moshe Shemesh [Sun, 9 Mar 2025 18:07:43 +0000 (20:07 +0200)]
net/mlx5: fs, add support for dest flow sampler HWS action

Add support for HW Steering action of flow sampler destination. For each
flow sampler created, cache the HWS action using the sampler id as the
key. Hold a refcount for each rule using the cached action.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: fs, add support for flow meters HWS action
Moshe Shemesh [Sun, 9 Mar 2025 18:07:42 +0000 (20:07 +0200)]
net/mlx5: fs, add support for flow meters HWS action

Add support for HW Steering action of flow meter range. A flow meter
range can use one HWS action for the whole range. Thus, share a cached
HWS action among rules that use the same flow meter object range. Hold
a refcount for each rule using the cached action.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet/mlx5: fs, add API for sharing HWS action by refcount
Moshe Shemesh [Sun, 9 Mar 2025 18:07:41 +0000 (20:07 +0200)]
net/mlx5: fs, add API for sharing HWS action by refcount

Counter HWS actions are shared using a refcount, so that an action is
created on demand by a flow steering rule and destroyed only when no
rules are using it. The method is extensible to other HWS action types,
such as flow meter and sampler actions, in the downstream patches.

Add an API to facilitate the reuse of get/put logic for HWS actions
shared by refcount.
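
A minimal userspace sketch of the get/put idea, for illustration only.
The structures and function names are hypothetical and are not the mlx5
API, and the locking that concurrent rule insertion would need is left
out: the action is created by the first user and destroyed when the
last user drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a HWS action. */
struct toy_action { int id; };

struct toy_shared_action {
	struct toy_action *act;	/* NULL while no rule uses it */
	unsigned int refcnt;	/* number of rules using the action */
};

int toy_live_actions;		/* created minus destroyed, for the demo */

/* Take a reference, creating the action on first use. */
struct toy_action *toy_shared_get(struct toy_shared_action *s, int id)
{
	if (s->refcnt == 0) {
		s->act = malloc(sizeof(*s->act));
		if (!s->act)
			return NULL;
		s->act->id = id;
		toy_live_actions++;
	}
	s->refcnt++;
	return s->act;
}

/* Drop a reference, destroying the action when the last user is gone. */
void toy_shared_put(struct toy_shared_action *s)
{
	assert(s->refcnt > 0);

	if (--s->refcnt == 0) {
		free(s->act);
		s->act = NULL;
		toy_live_actions--;
	}
}
```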

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'tcp-accecn'
David S. Miller [Mon, 17 Mar 2025 13:57:14 +0000 (13:57 +0000)]
Merge branch 'tcp-accecn'

Chia-Yu Chang says:

====================
AccECN protocol preparation patch series

Please find the v7

v7 (03-Mar-2025)
- Move 2 new patches added in v6 to the next AccECN patch series

v6 (27-Dec-2024)
- Avoid removing the potential CA_ACK_WIN_UPDATE in ack_ev_flags of patch #1 (Eric Dumazet <edumazet@google.com>)
- Add reviewed-by tag in patches #2, #3, #4, #5, #6, #7, #8, #12, #14
- The following 2 new patches are added after patch #9 (Patch that adds SKB_GSO_TCP_ACCECN)
  * New patch #10 to replace existing SKB_GSO_TCP_ECN with SKB_GSO_TCP_ACCECN in the driver to avoid CWR flag corruption
  * New patch #11 adds AccECN for virtio by adding new negotiation flag (VIRTIO_NET_F_HOST/GUEST_ACCECN) in feature handshake and translating Accurate ECN GSO flag between virtio_net_hdr (VIRTIO_NET_HDR_GSO_ACCECN) and skb header (SKB_GSO_TCP_ACCECN)
- Add detailed changelog and comments in #13 (Eric Dumazet <edumazet@google.com>)
- Move patch #14 to the next AccECN patch series (Eric Dumazet <edumazet@google.com>)

v5 (5-Nov-2024)
- Add helper function "tcp_flags_ntohs" to preserve last 2 bytes of TCP flags of patch #4 (Paolo Abeni <pabeni@redhat.com>)
- Fix reverse Xmas tree order of patches #4, #11 (Paolo Abeni <pabeni@redhat.com>)
- Rename variable "delta" as "timestamp_delta" of patch #2 for clarity
- Remove patch #14 in this series (Paolo Abeni <pabeni@redhat.com>, Joel Granados <joel.granados@kernel.org>)

v4 (21-Oct-2024)
- Fix line length warning of patches #2, #4, #8, #10, #11, #14
- Fix spaces preferred around '|' (ctx:VxV) warning of patch #7
- Add missing CC'ed of patches #4, #12, #14

v3 (19-Oct-2024)
- Fix build error in v2

v2 (18-Oct-2024)
- Fix warning caused by NETIF_F_GSO_ACCECN_BIT in patch #9 (Jakub Kicinski <kuba@kernel.org>)

The full patch series can be found in
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: Pass flags to __tcp_send_ack
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:52 +0000 (23:38 +0100)]
tcp: Pass flags to __tcp_send_ack

Accurate ECN needs to send custom flags to handle IP-ECN
field reflection during handshake.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: add new TCP_TW_ACK_OOW state and allow ECN bits in TOS
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:51 +0000 (23:38 +0100)]
tcp: add new TCP_TW_ACK_OOW state and allow ECN bits in TOS

ECN bits in TOS are always cleared when sending ACKs in TW. Clearing
them is problematic for TCP flows that used Accurate ECN because ECN bits
decide which service queue the packet is placed into (L4S vs Classic).
Effectively, TW ACKs are always downgraded from L4S to Classic queue
which might impact, e.g., delay the ACK will experience on the path
compared with the other packets of the flow.

Change the TW ACK sending code to differentiate:
- In tcp_v4_send_reset(), commit ba9e04a7ddf4f ("ip: fix tos reflection
  in ack and reset packets") cleans ECN bits for TW reset and this is
  not affected.
- In tcp_v4_timewait_ack(), ECN bits for all TW ACKs are cleaned. But now
  only ECN bits of ACKs for oow data or paws_reject are cleaned, and ECN
  bits of other ACKs will not be cleaned.
- In tcp_v4_reqsk_send_ack(), commit 66b13d99d96a1 ("ipv4: tcp: fix TOS
  value in ACK messages sent from TIME_WAIT") did not clean ECN bits of
  ACKs for oow data or paws_reject. But now the ECN bits are cleaned for
  these ACKs.
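
The distinction in the list above can be sketched as a small helper.
The enum and function names are hypothetical and are not the kernel's
code; the only fact relied on is that the ECN field occupies the low two
bits of the TOS byte. ACKs sent for out-of-window data (or a PAWS
reject) get the ECN bits cleared, other TIME_WAIT ACKs keep them.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ECN_MASK 0x03u	/* low two bits of the TOS byte */

/* Hypothetical reason why a TIME_WAIT ACK is being sent. */
enum toy_tw_ack { TOY_TW_ACK, TOY_TW_ACK_OOW };

/* Return the TOS value to use for the outgoing ACK. */
uint8_t toy_tw_ack_tos(uint8_t tos, enum toy_tw_ack reason)
{
	assert(reason == TOY_TW_ACK || reason == TOY_TW_ACK_OOW);

	if (reason == TOY_TW_ACK_OOW)
		return tos & (uint8_t)~TOY_ECN_MASK;	/* keep DSCP only */

	return tos;					/* keep DSCP and ECN */
}
```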

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: AccECN support to tcp_add_backlog
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:50 +0000 (23:38 +0100)]
tcp: AccECN support to tcp_add_backlog

AE flag needs to be preserved for AccECN.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agogro: prevent ACE field corruption & better AccECN handling
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:49 +0000 (23:38 +0100)]
gro: prevent ACE field corruption & better AccECN handling

There are important differences in how the CWR field behaves
in RFC3168 and AccECN. With AccECN, CWR flag is part of the
ACE counter and its changes are important so adjust the flags
changed mask accordingly.

Also, if CWR is there, set the Accurate ECN GSO flag to avoid
corrupting CWR flag somewhere.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agogso: AccECN support
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:48 +0000 (23:38 +0100)]
gso: AccECN support

Handling the CWR flag differs between RFC 3168 ECN and AccECN.
With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
starting from 2nd segment which is incompatible with how AccECN handles
the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
With AccECN, CWR flag (or more accurately, the ACE field that also
includes ECE & AE flags) changes only when new packet(s) with CE
mark arrives so the flag should not be changed within a super-skb.
The new skb/feature flags are necessary to prevent such TSO engines
corrupting AccECN ACE counters by clearing the CWR flag (if the
CWR handling feature cannot be turned off).

If NIC is completely unaware of RFC3168 ECN (doesn't support
NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
with AccECN on such a NIC. This should be evaluated on a per-NIC basis
(not done in this patch series for any NICs).

For the cases, where TSO cannot keep its hands off the CWR flag,
a GSO fallback is provided by this patch.
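
To make the difference concrete, here is a simplified model of how the
TCP flags of a super-segment are copied to the resulting segments. The
names are hypothetical and the handling that real segmentation code
applies to other flags (such as FIN and PSH) is deliberately left out:
in the RFC 3168 mode CWR survives only on the first segment, in the
AccECN mode every segment keeps it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TCP_CWR 0x80u	/* CWR bit in the TCP flags byte */

enum toy_gso_ecn { TOY_GSO_RFC3168, TOY_GSO_ACCECN };

/* Fill out[0..nsegs-1] with the flags each segment would carry. */
void toy_segment_flags(uint16_t flags, enum toy_gso_ecn mode,
		       uint16_t *out, size_t nsegs)
{
	assert(out || nsegs == 0);

	for (size_t i = 0; i < nsegs; i++) {
		out[i] = flags;
		/* RFC 3168 style: CWR is cleared from the 2nd segment on. */
		if (mode == TOY_GSO_RFC3168 && i > 0)
			out[i] &= (uint16_t)~TOY_TCP_CWR;
	}
}
```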

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: helpers for ECN mode handling
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:47 +0000 (23:38 +0100)]
tcp: helpers for ECN mode handling

Create helpers for TCP ECN modes. No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: rework {__,}tcp_ecn_check_ce() -> tcp_data_ecn_check()
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:46 +0000 (23:38 +0100)]
tcp: rework {__,}tcp_ecn_check_ce() -> tcp_data_ecn_check()

Rename tcp_ecn_check_ce to tcp_data_ecn_check as it is
called only for data segments, not for ACKs (with AccECN,
also ACKs may get ECN bits).

The extra "layer" in tcp_ecn_check_ce() function just
checks for ECN being enabled, that can be moved into
tcp_ecn_field_check rather than having the __ variant.

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: extend TCP flags to allow AE bit/ACE field
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:44 +0000 (23:38 +0100)]
tcp: extend TCP flags to allow AE bit/ACE field

With AccECN, there's one additional TCP flag to be used (AE)
and ACE field that overloads the definition of AE, CWR, and
ECE flags. As tcp_flags was previously only 1 byte, the
byte-order stuff needs to be added to its handling.
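
For illustration, the layout can be sketched in userspace C. The macro
and function names below are chosen for this example and may not match
the kernel's definitions. Bytes 12 and 13 of the TCP header form a
16-bit big-endian word whose low nine bits are the flags, so once AE
(bit 8) is included the flags no longer fit in one byte and the word
has to be assembled in the right byte order.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TCP_ECE 0x040u
#define TOY_TCP_CWR 0x080u
#define TOY_TCP_AE  0x100u
#define TOY_TCP_ACE (TOY_TCP_AE | TOY_TCP_CWR | TOY_TCP_ECE)

/* `hdr` points at bytes 12 and 13 of a TCP header (network byte order). */
uint16_t toy_tcp_flags(const uint8_t *hdr)
{
	uint16_t word;

	assert(hdr);
	word = (uint16_t)((hdr[0] << 8) | hdr[1]);

	return word & 0x01ffu;	/* drop data offset and reserved bits */
}

/* 3-bit ACE field: AE is the most significant bit, ECE the least. */
unsigned int toy_tcp_ace(uint16_t flags)
{
	return (flags & TOY_TCP_ACE) >> 6;
}
```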

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: use BIT() macro in include/net/tcp.h
Chia-Yu Chang [Wed, 5 Mar 2025 22:38:43 +0000 (23:38 +0100)]
tcp: use BIT() macro in include/net/tcp.h

Use BIT() macro for TCP flags field and TCP congestion control
flags that will be used by the congestion control algorithm.

No functional changes.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Ilpo Järvinen <ij@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: create FLAG_TS_PROGRESS
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:42 +0000 (23:38 +0100)]
tcp: create FLAG_TS_PROGRESS

Whenever timestamp advances, it declares progress which
can be used by the other parts of the stack to decide that
the ACK is the most recent one seen so far.

AccECN will use this flag when deciding whether to use the
ACK to update AccECN state or not.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agotcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
Ilpo Järvinen [Wed, 5 Mar 2025 22:38:41 +0000 (23:38 +0100)]
tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()

- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
  out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
  the compiler decide

Accurate ECN's heuristics do not know if there is going
to be an ACE field based CE counter increase or not until after
the rtx queue has been processed. Only then is the number of ACKed
bytes/pkts available. As CE or not affects presence of
FLAG_ECE, that information for tcp_in_ack_event is not yet
available in the old location of the call to tcp_in_ack_event().

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agonet/smc: use the correct ndev to find pnetid by pnetid table
Guangguan Wang [Tue, 4 Mar 2025 12:43:04 +0000 (20:43 +0800)]
net/smc: use the correct ndev to find pnetid by pnetid table

When using smc_pnet in SMC, the pnetid is only searched for in the
base_ndev of the netdev hierarchy (both the HW PNETID and the
user-defined sw pnetid). This may not work for some scenarios when
using SMC in containers in a cloud environment.
Containers can choose between different container networks, such as
directly using the host network, or the virtual networks IPVLAN, veth,
etc. Different container networks have different netdev hierarchies.
Examples of netdev hierarchies are shown below. (eth0 and eth1 in the
host below are the netdevs directly related to the physical device).
            _______________________________
           |   _________________           |
           |  |POD              |          |
           |  |                 |          |
           |  | eth0_________   |          |
           |  |____|         |__|          |
           |       |         |             |
           |       |         |             |
           |   eth1|base_ndev| eth0_______ |
           |       |         |    | RDMA  ||
           | host  |_________|    |_______||
           ---------------------------------
     netdev hierarchy if directly using host network
           ________________________________
           |   _________________           |
           |  |POD  __________  |          |
           |  |    |upper_ndev| |          |
           |  |eth0|__________| |          |
           |  |_______|_________|          |
           |          |lower netdev        |
           |        __|______              |
           |   eth1|         | eth0_______ |
           |       |base_ndev|    | RDMA  ||
           | host  |_________|    |_______||
           ---------------------------------
            netdev hierarchy if using IPVLAN
            _______________________________
           |   _____________________       |
           |  |POD        _________ |      |
           |  |          |base_ndev||      |
           |  |eth0(veth)|_________||      |
           |  |____________|________|      |
           |               |pairs          |
           |        _______|_              |
           |       |         | eth0_______ |
           |   veth|base_ndev|    | RDMA  ||
           |       |_________|    |_______||
           |        _________              |
           |   eth1|base_ndev|             |
           | host  |_________|             |
           ---------------------------------
             netdev hierarchy if using veth
For various reasons, eth1 in the host is not an RDMA-attached netdevice,
so a pnetid is needed to map eth1 (in the host) to the RDMA device so
that the POD can do SMC-R. eth1 (in the host) is managed by the CNI
plugin (such as Terway, a network management plugin for container
environments), and in a cloud environment the eth (in the host) can be
dynamically inserted by the CNI when a POD is created and dynamically
removed by the CNI when the POD is destroyed and no POD is related to
the eth (in the host) anymore. It is therefore hard to configure the
pnetid on eth1 (in the host), but it is easy to configure the pnetid on
the netdevice that can be seen in the POD. When doing SMC-R, both the
container directly using the host network and the container using the
veth network can successfully match the RDMA device, because the netdev
configured with the pnetid is a base_ndev. But the container using
IPVLAN cannot match the RDMA device and a 0x03030000 fallback happens,
because the netdev configured with the pnetid is not a base_ndev.
Additionally, configuring the pnetid on eth1 (in the host) does not work
either for matching the RDMA device when using the veth network and
doing SMC-R in the POD.

To resolve the problems listed above, this patch extends the search to
the user-defined sw pnetid of the clc handshake ndev when no pnetid can
be found in the base_ndev; the base_ndev takes precedence over the ndev
for backward compatibility. This patch also unifies the pnetid setup for
the different container network choices listed above (configure the
user-defined sw pnetid on the netdevice that can be seen in the POD).
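
The lookup order can be sketched as follows. The table layout, the
device names and the functions are hypothetical and are not the SMC
code; the sketch only shows that the base_ndev is consulted first and
that the clc handshake ndev is used as a fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-defined pnetid table entry. */
struct toy_pnet_entry {
	const char *ndev;	/* netdevice name */
	const char *pnetid;	/* pnetid configured for it */
};

/* Return the pnetid configured for `ndev`, or NULL if there is none. */
const char *toy_pnet_find(const struct toy_pnet_entry *tbl, size_t n,
			  const char *ndev)
{
	assert(ndev);

	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].ndev, ndev) == 0)
			return tbl[i].pnetid;
	return NULL;
}

/* base_ndev takes precedence; fall back to the clc handshake ndev. */
const char *toy_pnet_resolve(const struct toy_pnet_entry *tbl, size_t n,
			     const char *base_ndev, const char *clc_ndev)
{
	const char *id = toy_pnet_find(tbl, n, base_ndev);

	if (!id)
		id = toy_pnet_find(tbl, n, clc_ndev);
	return id;
}
```

With IPVLAN, the netdevice visible in the POD is not the base_ndev, so
only the fallback step finds the pnetid configured on it.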

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agocan: add protocol counter for AF_CAN sockets
Davide Caratti [Fri, 14 Mar 2025 11:39:49 +0000 (12:39 +0100)]
can: add protocol counter for AF_CAN sockets

The third column in the output of the following command:

| # grep CAN /proc/net/protocols

is systematically '0': use sock_prot_inuse_add() to account for the number
of sockets for each protocol on top of AF_CAN family.
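
The accounting pattern itself is simple and can be modelled in
userspace. The names below are hypothetical and this is not the
kernel's sock_prot_inuse_add() implementation: the counter for a
protocol is incremented when a socket is set up and decremented when
it is torn down.

```c
#include <assert.h>

/* Hypothetical protocols on top of one address family. */
enum { TOY_PROTO_RAW, TOY_PROTO_BCM, TOY_PROTO_COUNT };

int toy_inuse[TOY_PROTO_COUNT];	/* sockets in use, per protocol */

void toy_prot_inuse_add(int proto, int delta)
{
	assert(proto >= 0 && proto < TOY_PROTO_COUNT);
	toy_inuse[proto] += delta;
}

/* Account for a newly created socket of the given protocol. */
int toy_sock_init(int proto)
{
	toy_prot_inuse_add(proto, 1);
	return 0;
}

/* Undo the accounting when the socket goes away. */
void toy_sock_destruct(int proto)
{
	toy_prot_inuse_add(proto, -1);
}
```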

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/9db5d0e6c11b232ad895885616f1258882a32f61.1741952160.git.dcaratti@redhat.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
7 weeks agoselftests: drv-net: fix merge conflicts resolution
Matthieu Baerts (NGI0) [Fri, 14 Mar 2025 08:35:51 +0000 (09:35 +0100)]
selftests: drv-net: fix merge conflicts resolution

After the recent merge between net-next and net, I got some conflicts on
my side because the merge resolution was different from Stephen's one
[1] I applied on my side in the MPTCP tree.

It looks like the code that is now in net-next is using the old way to
retrieve the local and remote addresses. This patch is now using the new
way, like what was in Stephen's email [1].

Also, in get_interface_info(), there were no conflicts in this area,
because that was new code from 'net', but a small adaptation was needed
there as well to get the remote address.

Fixes: 941defcea7e1 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Link: https://lore.kernel.org/20250311115758.17a1d414@canb.auug.org.au
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250314-net-next-drv-net-ping-fix-merge-v1-1-0d5c19daf707@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>