]> www.infradead.org Git - users/hch/misc.git/log
users/hch/misc.git
7 weeks agonet: pcs: rzn1-miic: Add missing include files
Lad Prabhakar [Wed, 10 Sep 2025 20:41:24 +0000 (21:41 +0100)]
net: pcs: rzn1-miic: Add missing include files

The pcs-rzn1-miic driver makes use of ARRAY_SIZE(), BIT() and GENMASK()
macros but does not explicitly include the headers where they are
defined. Add the missing <linux/array_size.h> and <linux/bits.h>
includes.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/20250910204132.319975-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: pcs: rzn1-miic: Drop trailing comma from of_device_id table
Lad Prabhakar [Wed, 10 Sep 2025 20:41:23 +0000 (21:41 +0100)]
net: pcs: rzn1-miic: Drop trailing comma from of_device_id table

Remove the trailing comma after the sentinel entry in the
of_device_id match table.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/20250910204132.319975-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodt-bindings: net: pcs: renesas,rzn1-miic: Add RZ/T2H and RZ/N2H support
Lad Prabhakar [Wed, 10 Sep 2025 20:41:22 +0000 (21:41 +0100)]
dt-bindings: net: pcs: renesas,rzn1-miic: Add RZ/T2H and RZ/N2H support

Add device tree binding support for RZ/T2H and RZ/N2H SoCs to the
existing RZ/N1 MIIC converter binding. These SoCs share similar MIIC
functionality but have architectural differences that require schema
updates.

Add new compatible strings "renesas,r9a09g077-miic" for RZ/T2H and
"renesas,r9a09g087-miic" for RZ/N2H, with the latter falling back to
the RZ/T2H variant. The new SoCs require reset support with two reset
lines for converter register reset and converter reset, which are not
present on RZ/N1.

Update port configurations to accommodate the different architectures.
RZ/N1 supports 5 ports numbered 1-5 with complex input mappings
covering indices 0-13, while RZ/T2H and RZ/N2H support 4 ports
numbered 0-3 with simplified input mappings covering indices 0-8.
Extend the switch port configuration property to support value 0 for
the new SoCs.

Add a new dt-bindings header file with media interface connection
matrix constants that map GMAC, ESC, and ETHSW ports to numeric
identifiers for use with RZ/T2H and RZ/N2H device trees.

Update DT schema validation to ensure proper port numbering and input
mappings per SoC variant.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250910204132.319975-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'accecn-protocol-patch-series'
Jakub Kicinski [Mon, 15 Sep 2025 23:26:40 +0000 (16:26 -0700)]
Merge branch 'accecn-protocol-patch-series'

TCP preparations for AccECN support

Just code reshuffling, no functional changes.

Link: https://patch.msgid.link/20250911110642.87529-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: ecn functions in separated include file
Chia-Yu Chang [Thu, 11 Sep 2025 11:06:32 +0000 (13:06 +0200)]
tcp: ecn functions in separated include file

The following patches will modify ECN helpers and add AccECN herlpers,
and this patch moves the existing ones into a separated include file.

No functional changes.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911110642.87529-5-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: reorganize tcp_sock_write_txrx group for variables later
Chia-Yu Chang [Thu, 11 Sep 2025 11:06:31 +0000 (13:06 +0200)]
tcp: reorganize tcp_sock_write_txrx group for variables later

Use the first 3-byte hole at the beginning of the tcp_sock_write_txrx
group for 'noneagle'/'rate_app_limited' to fill in the existing hole
in later patches. Therefore, the group size of tcp_sock_write_txrx is
reduced from 92 + 4 to 91 + 4. In addition, the group size of
tcp_sock_write_rx is changed to 96 to fit in the pahole outcome.
Below are the trimmed pahole outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2521     0 */
    /* XXX 3 bytes hole, try to pack */

    [...]
    struct tcp_options_received rx_opt;              /*  2588    24 */
    u8                         nonagle:4;            /*  2612: 0  1 */
    u8                         rate_app_limited:1;   /*  2612: 4  1 */
    /* XXX 3 bits hole, try to pack */

    __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2613     0 */
    /* XXX 3 bytes hole, try to pack */

    __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /*  2616     0 */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];     /*  2712     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 161 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2521     0 */
    u8                         nonagle:4;            /*  2521: 0  1 */
    u8                         rate_app_limited:1;   /*  2521: 4  1 */
    /* XXX 3 bits hole, try to pack */
    /* XXX 2 bytes hole, try to pack */

    [...]
    struct tcp_options_received rx_opt;              /*  2588    24 */

    __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2612     0 */
    /* XXX 4 bytes hole, try to pack */

    __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /*  2616     0 */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];     /*  2712     0 */

    [...]
    /* size: 3200, cachelines: 50, members: 161 */
}

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911110642.87529-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: fast path functions later
Ilpo Järvinen [Thu, 11 Sep 2025 11:06:30 +0000 (13:06 +0200)]
tcp: fast path functions later

The following patch will use tcp_ecn_mode_accecn(),
TCP_ACCECN_CEP_INIT_OFFSET, TCP_ACCECN_CEP_ACE_MASK in
__tcp_fast_path_on() to make new flag for AccECN.

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911110642.87529-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: reorganize SYN ECN code
Ilpo Järvinen [Thu, 11 Sep 2025 11:06:29 +0000 (13:06 +0200)]
tcp: reorganize SYN ECN code

Prepare for AccECN that needs to have access here on IP ECN
field value which is only available after INET_ECN_xmit().

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911110642.87529-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: clear EEE runtime state in PHY_HALTED/PHY_ERROR
Oleksij Rempel [Fri, 12 Sep 2025 13:20:00 +0000 (15:20 +0200)]
net: phy: clear EEE runtime state in PHY_HALTED/PHY_ERROR

Clear EEE runtime flags when the PHY transitions to HALTED or ERROR
and the state machine drops the link. This avoids stale EEE state being
reported via ethtool after the PHY is stopped or hits an error.

This change intentionally only clears software runtime flags and avoids
MDIO accesses in HALTED/ERROR. A follow-up patch will address other
link state variables.

Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250912132000.1598234-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests/tc-testing: Adapt tc police action tests for Gb rounding changes
Victor Nogueira [Fri, 12 Sep 2025 15:46:16 +0000 (12:46 -0300)]
selftests/tc-testing: Adapt tc police action tests for Gb rounding changes

For the tc police action, iproute2 rounds up mtu and burst sizes to a
higher order representation. For example, if the user specifies the default
mtu for a police action instance (4294967295 bytes), iproute2 will output
it as 4096Mb when this action instance is dumped. After Jay's changes [1],
iproute2 will round up to Gb, so 4096Mb becomes 4Gb. With that in mind,
fix police's tc test output so that it works both with the current
iproute2 version and Jay's.

[1] https://lore.kernel.org/netdev/20250907014216.2691844-1-jay.vosburgh@canonical.com/

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Cong Wang <cwang@multikernel.io>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://patch.msgid.link/20250912154616.67489-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: fix typo in pci_irq.c comment
Alok Tiwari [Fri, 12 Sep 2025 13:50:44 +0000 (06:50 -0700)]
net/mlx5: fix typo in pci_irq.c comment

Fix a typo in a comment in pci_irq.c:
 "ssigned" → "assigned"

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250912135050.3921116-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'dpll-zl3073x-add-support-for-devlink-flash'
Jakub Kicinski [Mon, 15 Sep 2025 15:08:41 +0000 (08:08 -0700)]
Merge branch 'dpll-zl3073x-add-support-for-devlink-flash'

Ivan Vecera says:

====================
dpll: zl3073x: Add support for devlink flash

Add functionality for accessing device hardware registers, loading
firmware bundles, and accessing the device's internal flash memory,
and use it to implement the devlink flash functionality.
====================

Link: https://patch.msgid.link/20250909091532.11790-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodpll: zl3073x: Implement devlink flash callback
Ivan Vecera [Tue, 9 Sep 2025 09:15:32 +0000 (11:15 +0200)]
dpll: zl3073x: Implement devlink flash callback

Use the introduced functionality to read firmware files and flash their
contents into the device's internal flash memory to implement the devlink
flash update callback.

Sample output on EDS2 development board:
 # devlink -j dev info i2c/1-0070 | jq '.[][]["versions"]["running"]'
 {
   "fw": "6026"
 }
 # devlink dev flash i2c/1-0070 file firmware_fw2.hex
 [utility] Prepare flash mode
 [utility] Downloading image 100%
 [utility] Flash mode enabled
 [firmware1-part1] Downloading image 100%
 [firmware1-part1] Flashing image
 [firmware1-part2] Downloading image 100%
 [firmware1-part2] Flashing image
 [firmware1] Flashing done
 [firmware2] Downloading image 100%
 [firmware2] Flashing image 100%
 [firmware2] Flashing done
 [utility] Leaving flash mode
 Flashing done
 # devlink -j dev info i2c/1-0070 | jq '.[][]["versions"]["running"]'
 {
   "fw": "7006"
 }

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20250909091532.11790-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodpll: zl3073x: Refactor DPLL initialization
Ivan Vecera [Tue, 9 Sep 2025 09:15:31 +0000 (11:15 +0200)]
dpll: zl3073x: Refactor DPLL initialization

Refactor DPLL initialization and move DPLL (de)registration, monitoring
control, fetching device invariant parameters and phase offset
measurement block setup to separate functions.

Use these new functions during device probe and teardown functions and
during changes to the clock_id devlink parameter.

These functions will also be used in the next patch implementing devlink
flash, where this functionality is likewise required.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20250909091532.11790-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodpll: zl3073x: Add firmware loading functionality
Ivan Vecera [Tue, 9 Sep 2025 09:15:30 +0000 (11:15 +0200)]
dpll: zl3073x: Add firmware loading functionality

Add functionality for loading firmware files provided by the vendor
to be flashed into the device's internal flash memory. The firmware
consists of several components, such as the firmware executable itself,
chip-specific customizations, and configuration files.

The firmware file contains at least a flash utility, which is executed
on the device side, and one or more flashable components. Each component
has its own specific properties, such as the address where it should be
loaded during flashing, one or more destination flash pages, and
the flashing method that should be used.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20250909091532.11790-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodpll: zl3073x: Add low-level flash functions
Ivan Vecera [Tue, 9 Sep 2025 09:15:29 +0000 (11:15 +0200)]
dpll: zl3073x: Add low-level flash functions

To implement the devlink device flash functionality, the driver needs
to access both the device memory and the internal flash memory. The flash
memory is accessed using a device-specific program (called the flash
utility). This flash utility must be downloaded by the driver into
the device memory and then executed by the device CPU. Once running,
the flash utility provides a flash API to access the flash memory itself.

During this operation, the normal functionality provided by the standard
firmware is not available. Therefore, the driver must ensure that DPLL
callbacks and monitoring functions are not executed during the flash
operation.

Add all necessary functions for downloading the utility to device memory,
entering and exiting flash mode, and performing flash operations.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20250909091532.11790-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodpll: zl3073x: Add functions to access hardware registers
Ivan Vecera [Tue, 9 Sep 2025 09:15:28 +0000 (11:15 +0200)]
dpll: zl3073x: Add functions to access hardware registers

Besides the device host registers that are directly accessible, there
are also hardware registers that can be accessed indirectly via specific
host registers.

Add register definitions for accessing hardware registers and provide
helper functions for working with them. Additionally, extend the number
of pages in the regmap configuration to 256, as the host registers used
for accessing hardware registers are located on page 255.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20250909091532.11790-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoamd-xgbe: Add PPS periodic output support
Raju Rangoju [Tue, 9 Sep 2025 11:31:43 +0000 (17:01 +0530)]
amd-xgbe: Add PPS periodic output support

Add support for hardware PPS (Pulse Per Second) output to the
AMD XGBE driver. The implementation enables flexible periodic
output mode, exposing it via the PTP per_out interface.

The driver supports configuring PPS output using the standard
PTP subsystem, allowing precise periodic signal generation for
time synchronization applications.

The feature has been verified using the testptp tool and
oscilloscope.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250909113143.1364477-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-fec-add-the-jumbo-frame-support'
Jakub Kicinski [Sun, 14 Sep 2025 21:20:03 +0000 (14:20 -0700)]
Merge branch 'net-fec-add-the-jumbo-frame-support'

Shenwei Wang says:

====================
net: fec: add the Jumbo frame support
====================

Link: https://patch.msgid.link/20250910185211.721341-1-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: enable the Jumbo frame support for i.MX8QM
Shenwei Wang [Wed, 10 Sep 2025 18:52:11 +0000 (13:52 -0500)]
net: fec: enable the Jumbo frame support for i.MX8QM

Certain i.MX SoCs, such as i.MX8QM and i.MX8QXP, feature enhanced
FEC hardware that supports Ethernet Jumbo frames with packet sizes
up to 16K bytes.

When Jumbo frames are supported, the TX FIFO may not be large enough
to hold an entire frame. To handle this, the FIFO is configured to
operate in cut-through mode when the frame size exceeds
(PKT_MAXBUF_SIZE - ETH_HLEN - ETH_FCS_LEN), which allows transmission
to begin once the FIFO reaches a certain threshold.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-7-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: add change_mtu to support dynamic buffer allocation
Shenwei Wang [Wed, 10 Sep 2025 18:52:10 +0000 (13:52 -0500)]
net: fec: add change_mtu to support dynamic buffer allocation

Add a fec_change_mtu() handler to recalculate the pagepool_order based on
the new_mtu value. And update the rx_frame_size accordingly when
pagepool_order changes.

MTU changes are only allowed when the adater is not running.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-6-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: add rx_frame_size to support configurable RX length
Shenwei Wang [Wed, 10 Sep 2025 18:52:09 +0000 (13:52 -0500)]
net: fec: add rx_frame_size to support configurable RX length

Add a new rx_frame_size member in the fec_enet_private structure to
track the RX buffer size. On the Jumbo frame enabled system, the value
will be recalculated whenever the MTU is updated, allowing the driver
to allocate RX buffer efficiently.

Configure the TRUNC_FL (Frame Truncation Length) based on the smaller
value between max_buf_size and the rx_frame_size to maintain consistent
RX error behavior, regardless of whether Jumbo frames are enabled.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-5-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: update MAX_FL based on the current MTU
Shenwei Wang [Wed, 10 Sep 2025 18:52:08 +0000 (13:52 -0500)]
net: fec: update MAX_FL based on the current MTU

Configure the MAX_FL (Maximum Frame Length) register according to the
current MTU value, which ensures that packets exceeding the configured MTU
trigger an RX error.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-4-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: add pagepool_order to support variable page size
Shenwei Wang [Wed, 10 Sep 2025 18:52:07 +0000 (13:52 -0500)]
net: fec: add pagepool_order to support variable page size

Add a new pagepool_order member in the fec_enet_private struct
to allow dynamic configuration of page size for an instance. This
change clears the hardcoded page size assumptions.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-3-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: use a member variable for maximum buffer size
Shenwei Wang [Wed, 10 Sep 2025 18:52:06 +0000 (13:52 -0500)]
net: fec: use a member variable for maximum buffer size

Refactor code to support Jumbo frame functionality by adding a member
variable in the fec_enet_private structure to store PKT_MAXBUF_SIZE.

Remove the OPT_FRAME_SIZE and define a new macro OPT_ARCH_HAS_MAX_FL to
indicate architectures that support configurable maximum frame length.
And update the MAX_FL register value to max_buf_size when
OPT_ARCH_HAS_MAX_FL is defined as 1.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20250910185211.721341-2-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoionic: use int type for err in ionic_get_module_eeprom_by_page
Alok Tiwari [Fri, 12 Sep 2025 14:14:24 +0000 (07:14 -0700)]
ionic: use int type for err in ionic_get_module_eeprom_by_page

The variable 'err' is declared as u32, but it is used to store
negative error codes such as -EINVAL.

Changing the type of 'err' to int ensures proper representation of
negative error codes and aligns with standard kernel error handling
conventions.

Also, there is no need to initialize 'err' since it is always set
before being used.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Shannon Nelson <sln@onemain.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20250912141426.3922545-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Sun, 14 Sep 2025 20:13:12 +0000 (13:13 -0700)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Fwlog support in ixgbe

Michal Swiatkowski says:

Firmware logging is a feature that allow user to dump firmware log using
debugfs interface. It is supported on device that can handle specific
firmware ops. It is true for ice and ixgbe driver.

Prepare code from ice driver to be moved to the library code and reuse
it in ixgbe driver.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbe: fwlog support for e610
  ice, libie: move fwlog code to libie
  ice: reregister fwlog after driver reinit
  ice: prepare for moving file to libie
  ice: move debugfs code to fwlog
  libie, ice: move fwlog admin queue to libie
  ice: drop driver specific structure from fwlog code
  ice: check for PF number outside the fwlog code
  ice: move out debugfs init from fwlog
  ice: allow calling custom send function in fwlog
  ice: add pdev into fwlog structure and use it for logging
  ice: introduce ice_fwlog structure
  ice: drop ice_pf_fwlog_update_module()
  ice: move get_fwlog_data() to fwlog file
  ice: make fwlog functions static
====================

Link: https://patch.msgid.link/20250911210525.345110-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'pru-icssm-ethernet-driver'
Jakub Kicinski [Sun, 14 Sep 2025 20:00:56 +0000 (13:00 -0700)]
Merge branch 'pru-icssm-ethernet-driver'

Parvathi Pudi says:

====================
PRU-ICSSM Ethernet Driver

The Programmable Real-Time Unit Industrial Communication Sub-system (PRU-ICSS)
is available on the TI SOCs in two flavors: Gigabit ICSS (ICSSG) and the older
Megabit ICSS (ICSSM).

Support for ICSSG Dual-EMAC mode has already been mainlined [1] and the
fundamental components/drivers such as PRUSS driver, Remoteproc driver,
PRU-ICSS INTC, and PRU-ICSS IEP drivers are already available in the mainline
Linux kernel. The current set of patch series builds on top of these components
and introduces changes to support the Dual-EMAC using ICSSM on the TI AM57xx,
AM437x and AM335x devices.

AM335x, AM437x and AM57xx devices may have either one or two PRU-ICSS instances
with two 32-bit RISC PRU cores. Each PRU core has (a) dedicated Ethernet interface
(MII, MDIO), timers, capture modules, and serial communication interfaces, and
(b) dedicated data and instruction RAM as well as shared RAM for inter PRU
communication within the PRU-ICSS.

These patches add support for basic RX and TX  functionality over PRU Ethernet
ports in Dual-EMAC mode.

Further, note that these are the initial set of patches for a single instance of
PRU-ICSS Ethernet.  Additional features such as Ethtool support, VLAN Filtering,
Multicast Filtering, Promiscuous mode, Storm prevention, Interrupt coalescing,
Linux PTP (ptp4l) Ordinary clock and Switch mode support for AM335x, AM437x
and AM57x along with support for a second instance of  PRU-ICSS on AM57x
will be posted subsequently.

The patches presented in this series have gone through the patch verification
tools and no warnings or errors are reported. Sample test logs obtained from AM33x,
AM43x and AM57x verifying the functionality on Linux next kernel are available here:

[Interface up Testing](https://gist.github.com/ParvathiPudi/59ca0087dc7bed0f83a3b0e6db27d39c)

[Ping Testing](https://gist.github.com/ParvathiPudi/bcd39aa7006f6176d8c5b71a23d0928b)

[Iperf Testing](https://gist.github.com/ParvathiPudi/9bb4fa42410fbc757a93f65ecb45e4f3)

[1] https://lore.kernel.org/all/20230106121046.886863-1-danishanwar@ti.com/
[2] https://lore.kernel.org/all/20250108125937.10604-1-basharath@couthit.com/
====================

Link: https://patch.msgid.link/20250912104741.528721-1-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMAINTAINERS: Add entries for ICSSM Ethernet driver
Parvathi Pudi [Fri, 12 Sep 2025 12:53:01 +0000 (18:23 +0530)]
MAINTAINERS: Add entries for ICSSM Ethernet driver

Add an entry to MAINTAINERS file for the ICSSM Ethernet driver with
appropriate maintainer information and mailing list.

Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/20250912125421.530286-7-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ti: icssm-prueth: Adds IEP support for PRUETH on AM33x, AM43x and AM57x SOCs
Parvathi Pudi [Fri, 12 Sep 2025 11:53:29 +0000 (17:23 +0530)]
net: ti: icssm-prueth: Adds IEP support for PRUETH on AM33x, AM43x and AM57x SOCs

Added API hooks for IEP module (legacy 32-bit model) to support
timestamping requests from application.

Reviewed-by: Mohan Reddy Putluru <pmohan@couthit.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/20250912115443.529856-6-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ti: icssm-prueth: Adds link detection, RX and TX support.
Roger Quadros [Fri, 12 Sep 2025 11:53:28 +0000 (17:23 +0530)]
net: ti: icssm-prueth: Adds link detection, RX and TX support.

Changes corresponding to link configuration such as speed and duplexity.
IRQ and handler initializations are performed for packet reception.Firmware
receives the packet from the wire and stores it into OCMC queue. Next, it
notifies the CPU via interrupt. Upon receiving the interrupt CPU will
service the IRQ and packet will be processed by pushing the newly allocated
SKB to upper layers.

When the user application want to transmit a packet, it will invoke
sys_send() which will in turn invoke the PRUETH driver, then it will write
the packet into OCMC queues. PRU firmware will pick up the packet and
transmit it on to the wire.

Reviewed-by: Mohan Reddy Putluru <pmohan@couthit.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/20250912115443.529856-5-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ti: icssm-prueth: Adds PRUETH HW and SW configuration
Roger Quadros [Fri, 12 Sep 2025 11:53:27 +0000 (17:23 +0530)]
net: ti: icssm-prueth: Adds PRUETH HW and SW configuration

Updates for MII_RT hardware peripheral configuration such as RX and TX
configuration for PRU0 and PRU1, frame sizes, and MUX config.

Updates for PRU-ICSS firmware register configuration and DRAM, SRAM and
OCMC memory initialization, which will be used in the runtime for packet
reception and transmission.

DUAL-EMAC memory allocation for software queues and its supporting
components such as the buffer descriptors and queue descriptors. These
software queues are placed in OCMC memory and are shared with CPU by
PRU-ICSS for packet receive and transmit.

All declarations and macros are being used from common header file
for various protocols.

Reviewed-by: Mohan Reddy Putluru <pmohan@couthit.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/20250912115443.529856-4-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ti: icssm-prueth: Adds ICSSM Ethernet driver
Roger Quadros [Fri, 12 Sep 2025 10:44:51 +0000 (16:14 +0530)]
net: ti: icssm-prueth: Adds ICSSM Ethernet driver

Updates Kernel configuration to enable PRUETH driver and its dependencies
along with makefile changes to add the new PRUETH driver.

Changes includes init and deinit of ICSSM PRU Ethernet driver including
net dev registration and firmware loading for DUAL-MAC mode running on
PRU-ICSS2 instance.

Changes also includes link handling, PRU booting, default firmware loading
and PRU stopping using existing remoteproc driver APIs.

Reviewed-by: Mohan Reddy Putluru <pmohan@couthit.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/20250912104741.528721-3-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodt-bindings: net: ti: Adds DUAL-EMAC mode support on PRU-ICSS2 for AM57xx, AM43xx...
Parvathi Pudi [Fri, 12 Sep 2025 10:44:50 +0000 (16:14 +0530)]
dt-bindings: net: ti: Adds DUAL-EMAC mode support on PRU-ICSS2 for AM57xx, AM43xx and AM33xx SOCs

Documentation update for the newly added "pruss2_eth" device tree
node and its dependencies along with compatibility for PRU-ICSS
Industrial Ethernet Peripheral (IEP), PRU-ICSS Enhanced Capture
(eCAP) peripheral and using YAML binding document for AM57xx SoCs.

Reviewed-by: Mohan Reddy Putluru <pmohan@couthit.com>
Co-developed-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250912104741.528721-2-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: phy: broadcom: Convert to PHY_ID_MATCH_MODEL macro
Christian Marangi [Thu, 11 Sep 2025 13:08:33 +0000 (15:08 +0200)]
net: phy: broadcom: Convert to PHY_ID_MATCH_MODEL macro

Convert the pattern phy_id phy_id_mask to the generic PHY_ID_MATCH_MODEL
macro to drop hardcoding magic mask.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250911130840.23569-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: phy: broadcom: Convert to phy_id_compare_model()
Christian Marangi [Thu, 11 Sep 2025 13:08:32 +0000 (15:08 +0200)]
net: phy: broadcom: Convert to phy_id_compare_model()

Convert driver to phy_id_compare_model() helper instead of the custom
BRCM_PHY_MODEL macro.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250911130840.23569-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: phy: introduce phy_id_compare_model() PHY ID helper
Christian Marangi [Thu, 11 Sep 2025 13:08:31 +0000 (15:08 +0200)]
net: phy: introduce phy_id_compare_model() PHY ID helper

Similar to phy_id_compare_vendor(), introduce the equivalent
phy_id_compare_model() helper for the generic PHY ID Model mask.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250911130840.23569-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-stmmac-timestamping-ptp-cleanups'
Jakub Kicinski [Sun, 14 Sep 2025 18:59:21 +0000 (11:59 -0700)]
Merge branch 'net-stmmac-timestamping-ptp-cleanups'

Russell King says:

====================
net: stmmac: timestamping/ptp cleanups

This series cleans up the hardware timestamping / PTP initialisation
and cleanup code in the stmmac driver. Several key points in no
particular order:

1. Golden rule: unregister first, then release resources.
   stmmac_release_ptp didn't do this.

2. Avoid leaking resources - __stmmac_open() failure leaves the
   timestamping support initialised, but stops its clock. Also
   violates (1).

3. Avoid double-release of resources - stmmac_open() followed by
   stmmac_xdp_open() failing results in the PTP clock prepare and
   enable counts being released, and if the interface is then
   brought down, they are incorrectly released again. As XDP
   doesn't gain any additional prepare/enables on the PTP clock,
   remove this incorrect cleanup.

4. Changing the MTU of the interface is disruptive to PTP, and
   remains so as long as. This is not fixed by this series (too
   invasive at the moment.)

5. Avoid exporting functions that aren't used...

6. Avoid unnecessary runtime PM state manipulations (no point
   manipulating this when MTU changes).

7. Make the PTP/timestamping initialisation more readable - no
   point calling functions in the same file from one callsite
   that return error codes from one location in the called function,
   to only have the sole callee print messages depending on that
   return code. Also simplifying the mess in stmmac_hw_setup().
   Also placing support checks in a better location. Also getting
   rid of the "ptp_register" boolean through this restructuring.

v1 was tested by Gatien CHEVALLIER - thanks.
====================

Link: https://patch.msgid.link/aMKtV6O0WqlmJFN4@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: move timestamping/ptp init to stmmac_hw_setup() caller
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:28 +0000 (12:10 +0100)]
net: stmmac: move timestamping/ptp init to stmmac_hw_setup() caller

Move the call to stmmac_init_timestamping() or stmmac_setup_ptp() out
of stmmac_hw_setup() to its caller after stmmac_hw_setup() has
successfully completed. This slightly changes the ordering during
setup, but should be safe to do.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: move PTP support check into stmmac_init_timestamping()
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:23 +0000 (12:10 +0100)]
net: stmmac: move PTP support check into stmmac_init_timestamping()

Move the PTP support check from stmmac_init_tstamp_counter() into
stmmac_init_timestamping() as it makes more sense to be there.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: add stmmac_setup_ptp()
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:18 +0000 (12:10 +0100)]
net: stmmac: add stmmac_setup_ptp()

Add a function to setup PTP, which will enable the clock, initialise
the timestamping, and register with the PTP clock subsystem. Call this
when we want to register the PTP clock in stmmac_hw_setup(), otherwise
just call the

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: rename stmmac_init_ptp()
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:13 +0000 (12:10 +0100)]
net: stmmac: rename stmmac_init_ptp()

Changes to the stmmac driver to fix various issues with PTP have made
stmmac_init_ptp() less about initialising the entire PTP block, and
now primarily deals with the packet timestamping support. The exception
to this is ptp_clk_freq_config(), which is an odditiy. It remains
as stmmac_init_ptp() is used both at .ndo_open() time and in the
resume paths.

However, restructuring this code to make it more easily readable makes
the continued use of "init_ptp" confusing.

In preparation to cleaning up the (re-)initialisation of timestamping,
rename the existing stmmac_init_ptp() to stmmac_init_timestamping()
which better reflects its functionality.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: move stmmac_init_ptp() messages into function
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:07 +0000 (12:10 +0100)]
net: stmmac: move stmmac_init_ptp() messages into function

Move the stmmac_init_ptp() messages from stmmac_hw_setup() to
stmmac_init_ptp(), which will allow further cleanups.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: add __stmmac_release() to complement __stmmac_open()
Russell King (Oracle) [Thu, 11 Sep 2025 11:10:02 +0000 (12:10 +0100)]
net: stmmac: add __stmmac_release() to complement __stmmac_open()

Rename stmmac_release() to __stmmac_release(), providing a new
stmmac_release() method. Update stmmac_change_mtu() to use
__stmmac_release(). Move the runtime PM handling into stmmac_open()
and stmmac_release().

This avoids stmmac_change_mtu() needlessly fiddling with the runtime
PM state, and will allow future changes to remove code from
__stmmac_open() and __stmmac_release() that should only happen when
the net device is administratively brought up or down.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: unexport stmmac_init_tstamp_counter()
Russell King (Oracle) [Thu, 11 Sep 2025 11:09:57 +0000 (12:09 +0100)]
net: stmmac: unexport stmmac_init_tstamp_counter()

Nothing outside of stmmac_main.c makes use of
stmmac_init_tstamp_counter(), so there's no point exporting it for
modules, or even having it non-static. Remove the export and make
it static.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: fix stmmac_xdp_open() clk_ptp_ref error cleanup
Russell King (Oracle) [Thu, 11 Sep 2025 11:09:52 +0000 (12:09 +0100)]
net: stmmac: fix stmmac_xdp_open() clk_ptp_ref error cleanup

Neither stmmac_xdp_release() nor the normal paths of stmmac_xdp_open()
touch clk_ptp_ref, so stmmac_xdp_open() should not be doing anything
with this clock. However, in its error path, it calls
stmmac_hw_teardown() which disables and unprepares this clock, which
can lead to the clock state becoming unbalanced when the netdev is
taken administratively down.

Remove this call to stmmac_hw_teardown(), and as this is the last user
of this function, remove the function as well.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: fix PTP error cleanup in __stmmac_open()
Russell King (Oracle) [Thu, 11 Sep 2025 11:09:47 +0000 (12:09 +0100)]
net: stmmac: fix PTP error cleanup in __stmmac_open()

The cleanup function for stmmac_setup_ptp() is stmmac_release_ptp()
which entirely undoes the effects of stmmac_setup_ptp() by
unregistering the PTP device and then stopping the PTP clock,
whereas stmmac_hw_teardown() will only stop the PTP clock while
leaving the PTP device registered.

This can lead to a kernel oops - if __stmmac_open() fails after
registering the PTP clock, the PTP device will remain registered,
and if the module is removed, subsequent PTP device accesses will
lead to a kernel oops.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: disable PTP clock after unregistering PTP
Russell King (Oracle) [Thu, 11 Sep 2025 11:09:42 +0000 (12:09 +0100)]
net: stmmac: disable PTP clock after unregistering PTP

Follow the principle of unpublish from userspace and then teardown
resources.

Disable the PTP clock only after unregistering with the PTP subsystem,
which ensures that we only stop the clock that ticks the timesource
after we have removed the PTP device.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: ptp: improve handling of aux_ts_lock lifetime
Russell King (Oracle) [Thu, 11 Sep 2025 11:09:37 +0000 (12:09 +0100)]
net: stmmac: ptp: improve handling of aux_ts_lock lifetime

The aux_ts_lock mutex is only required while the PTP clock has been
successfully registered.

stmmac_ptp_register() does not return any errors (as we don't wish to
prevent the netdev being opened if PTP fails), stmmac_ptp_unregister()
was coded to allow it to be called irrespective of whether PTP was
successfully registered or not.

Arrange for the aux_ts_lock mutex to be destroyed if the PTP clock
is not functional during stmmac_ptp_register(), and only destroy it
in stmmac_ptp_unregister() if we had a PTP clock registered.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-dsa-mv88e6xxx-remove-redundant-ptp-timestamping-code'
Jakub Kicinski [Sun, 14 Sep 2025 18:56:26 +0000 (11:56 -0700)]
Merge branch 'net-dsa-mv88e6xxx-remove-redundant-ptp-timestamping-code'

Russell King says:

====================
net: dsa: mv88e6xxx: remove redundant ptp/timestamping code

mv88e6xxx as accumulated some unused data structures and code over the
years. This series removes it and simplifies the code. See the patches
for each change.
====================

Link: https://patch.msgid.link/aMKoYyN18FHFCa1q@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: dsa: mv88e6xxx: remove unused support for PPS event capture
Russell King (Oracle) [Thu, 11 Sep 2025 10:46:43 +0000 (11:46 +0100)]
net: dsa: mv88e6xxx: remove unused support for PPS event capture

mv88e6352_config_eventcap() is documented as handling both EXTTS and
PPS capture modes, but nothing ever calls it for PPS capture. Remove
the unused PPS capture mode support, and the now unused
MV88E6XXX_TAI_EVENT_STATUS_CAP_TRIG definition.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uwep9-00000004ikJ-2FeF@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: dsa: mv88e6xxx: remove chip->evcap_config
Russell King (Oracle) [Thu, 11 Sep 2025 10:46:38 +0000 (11:46 +0100)]
net: dsa: mv88e6xxx: remove chip->evcap_config

evcap_config is only read and written in mv88e6352_config_eventcap(),
so it makes little sense to store it in the global chip struct. Make
it a local variable instead.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uwep4-00000004ikD-1ZEh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: dsa: mv88e6xxx: remove chip->trig_config
Russell King (Oracle) [Thu, 11 Sep 2025 10:46:33 +0000 (11:46 +0100)]
net: dsa: mv88e6xxx: remove chip->trig_config

chip->trig_config is never written, and thus takes the value zero.
Remove this struct member and its single reader.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uweoz-00000004ik7-13Fl@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: dsa: mv88e6xxx: remove mv88e6250_ptp_ops
Russell King (Oracle) [Thu, 11 Sep 2025 10:46:28 +0000 (11:46 +0100)]
net: dsa: mv88e6xxx: remove mv88e6250_ptp_ops

mv88e6250_ptp_ops and mv88e6352_ptp_ops are identical since commit
7e3c18097a70 ("net: dsa: mv88e6xxx: read cycle counter period from
hardware"). Remove the unnecessary duplication.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uweou-00000004ik1-0aiX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet/smc: Remove unused argument from 2 SMC functions
Mahanta Jambigi [Wed, 10 Sep 2025 06:31:25 +0000 (08:31 +0200)]
net/smc: Remove unused argument from 2 SMC functions

The smc argument is not used in both smc_connect_ism_vlan_setup() &
smc_connect_ism_vlan_cleanup(). Hence removing it.

Signed-off-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20250910063125.2112577-1-mjambigi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet/cls_cgroup: Fix task_get_classid() during qdisc run
Yafang Shao [Tue, 2 Sep 2025 06:29:33 +0000 (14:29 +0800)]
net/cls_cgroup: Fix task_get_classid() during qdisc run

During recent testing with the netem qdisc to inject delays into TCP
traffic, we observed that our CLS BPF program failed to function correctly
due to incorrect classid retrieval from task_get_classid(). The issue
manifests in the following call stack:

        bpf_get_cgroup_classid+5
        cls_bpf_classify+507
        __tcf_classify+90
        tcf_classify+217
        __dev_queue_xmit+798
        bond_dev_queue_xmit+43
        __bond_start_xmit+211
        bond_start_xmit+70
        dev_hard_start_xmit+142
        sch_direct_xmit+161
        __qdisc_run+102             <<<<< Issue location
        __dev_xmit_skb+1015
        __dev_queue_xmit+637
        neigh_hh_output+159
        ip_finish_output2+461
        __ip_finish_output+183
        ip_finish_output+41
        ip_output+120
        ip_local_out+94
        __ip_queue_xmit+394
        ip_queue_xmit+21
        __tcp_transmit_skb+2169
        tcp_write_xmit+959
        __tcp_push_pending_frames+55
        tcp_push+264
        tcp_sendmsg_locked+661
        tcp_sendmsg+45
        inet_sendmsg+67
        sock_sendmsg+98
        sock_write_iter+147
        vfs_write+786
        ksys_write+181
        __x64_sys_write+25
        do_syscall_64+56
        entry_SYSCALL_64_after_hwframe+100

The problem occurs when multiple tasks share a single qdisc. In such cases,
__qdisc_run() may transmit skbs created by different tasks. Consequently,
task_get_classid() retrieves an incorrect classid since it references the
current task's context rather than the skb's originating task.

Given that dev_queue_xmit() always executes with bh disabled, we can use
softirq_count() instead to obtain the correct classid.

The simple steps to reproduce this issue:
1. Add network delay to the network interface:
  such as: tc qdisc add dev bond0 root netem delay 1.5ms
2. Build two distinct net_cls cgroups, each with a network-intensive task
3. Initiate parallel TCP streams from both tasks to external servers.

Under this specific condition, the issue reliably occurs. The kernel
eventually dequeues an SKB that originated from Task-A while executing in
the context of Task-B.

It is worth noting that it will change the established behavior for a
slightly different scenario:

  <sock S is created by task A>
  <class ID for task A is changed>
  <skb is created by sock S xmit and classified>

prior to this patch the skb will be classified with the 'new' task A
classid, now with the old/original one. The bpf_get_cgroup_classid_curr()
function is a more appropriate choice for this case.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250902062933.30087-1-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: mana: Reduce waiting time if HWC not responding
Haiyang Zhang [Wed, 10 Sep 2025 20:57:21 +0000 (13:57 -0700)]
net: mana: Reduce waiting time if HWC not responding

If HW Channel (HWC) is not responding, reduce the waiting time, so further
steps will fail quickly.
This will prevent getting stuck for a long time (30 minutes or more), for
example, during unloading while HWC is not responding.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1757537841-5063-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: use NUMA drop counters for softnet_data.dropped
Eric Dumazet [Tue, 9 Sep 2025 12:19:42 +0000 (12:19 +0000)]
net: use NUMA drop counters for softnet_data.dropped

Hosts under DOS attack can suffer from false sharing
in enqueue_to_backlog() : atomic_inc(&sd->dropped).

This is because sd->dropped can be touched from many cpus,
possibly residing on different NUMA nodes.

Generalize the sk_drop_counters infrastucture
added in commit c51613fa276f ("net: add sk->sk_drop_counters")
and use it to replace softnet_data.dropped
with NUMA friendly softnet_data.drop_counters.

This adds 64 bytes per cpu, maybe more in the future
if we increase the number of counters (currently 2)
per 'struct numa_drop_counters'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250909121942.1202585-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'add-gmac-support-for-renesas-rz-t2h-n2h-socs'
Jakub Kicinski [Sun, 14 Sep 2025 18:33:51 +0000 (11:33 -0700)]
Merge branch 'add-gmac-support-for-renesas-rz-t2h-n2h-socs'

Lad Prabhakar says:

====================
Add GMAC support for Renesas RZ/{T2H, N2H} SoCs

This series adds support for the Ethernet MAC (GMAC) IP present on
the Renesas RZ/T2H and RZ/N2H SoCs.

While these SoCs use the same Synopsys DesignWare MAC IP (version 5.20) as
the existing RZ/V2H(P), the hardware is synthesized with different options
that require driver and binding updates:
- 8 RX/TX queue pairs instead of 4 (requiring 19 interrupts vs 11)
- Different clock requirements (3 clocks vs 7)
- Different reset handling (2 named resets vs 1 unnamed)
- Split header feature enabled
- GMAC connected through a MIIC PCS on RZ/T2H

The series first updates the generic dwmac binding to accommodate the
higher interrupt count, then extends the Renesas-specific binding with
a to document both SoCs.

The driver changes prepare for multi-SoC support by introducing OF match
data for per-SoC configuration, then add RZ/T2H support including PCS
integration through the existing RZN1 MIIC driver.

Note this patch series is dependent on the PCS driver [0]
(not a build dependency).
[0] https://lore.kernel.org/all/20250904114204.4148520-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
====================

Link: https://patch.msgid.link/20250908105901.3198975-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: dwmac-renesas-gbeth: Add support for RZ/T2H SoC
Lad Prabhakar [Mon, 8 Sep 2025 10:59:01 +0000 (11:59 +0100)]
net: stmmac: dwmac-renesas-gbeth: Add support for RZ/T2H SoC

Extend the Renesas GBETH stmmac glue driver to support the RZ/T2H SoC,
where the GMAC is connected through a MIIC PCS. Introduce a new
`has_pcs` flag in `struct renesas_gbeth_of_data` to indicate when PCS
handling is required.

When enabled, the driver parses the `pcs-handle` phandle, creates a PCS
instance with `miic_create()`, and wires it into phylink. Proper cleanup
is done with `miic_destroy()`. New init/exit/select hooks are added to
`plat_stmmacenet_data` for PCS integration.

Update Kconfig to select `PCS_RZN1_MIIC` when building the Renesas GBETH
driver so the PCS support is always available.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20250908105901.3198975-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: stmmac: dwmac-renesas-gbeth: Use OF data for configuration
Lad Prabhakar [Mon, 8 Sep 2025 10:59:00 +0000 (11:59 +0100)]
net: stmmac: dwmac-renesas-gbeth: Use OF data for configuration

Prepare for adding RZ/T2H SoC support by making the driver configuration
selectable via OF match data. While the RZ/V2H(P) and RZ/T2H use the same
version of the Synopsys DesignWare MAC (version 5.20), the hardware is
synthesized with different options. To accommodate these differences,
introduce a struct holding per-SoC configuration such as clock list,
number of clocks, TX clock rate control, and STMMAC flags, and retrieve
it from the device tree match entry during probe.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20250908105901.3198975-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/T2H and RZ/N2H SoCs
Lad Prabhakar [Mon, 8 Sep 2025 10:58:59 +0000 (11:58 +0100)]
dt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/T2H and RZ/N2H SoCs

Add device tree binding support for the Gigabit Ethernet MAC (GMAC) IP
on Renesas RZ/T2H and RZ/N2H SoCs. While these SoCs use the same
Synopsys DesignWare MAC version 5.20 as RZ/V2H, they are synthesized
with different hardware configurations.

Add new compatible strings "renesas,r9a09g077-gbeth" for RZ/T2H and
"renesas,r9a09g087-gbeth" for RZ/N2H, with the latter using RZ/T2H as
fallback since they share identical GMAC IP.

Update the schema to handle hardware differences between SoC variants.
RZ/T2H requires only 3 clocks compared to 7 on RZ/V2H, supports 8 RX/TX
queue pairs instead of 4, and needs 2 reset controls with reset-names
property versus a single unnamed reset. RZ/T2H also has the split header
feature enabled which is disabled on RZ/V2H.

Add support for an optional pcs-handle property to connect the GMAC to
the MIIC PCS converter on RZ/T2H. Use conditional schema validation to
enforce the correct clock, reset, and interrupt configurations per SoC
variant.

Extend the base snps,dwmac.yaml schema to accommodate the increased
interrupt count, supporting up to 19 interrupts and extending the
rx-queue and tx-queue interrupt name patterns to cover queues 0-7.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250908105901.3198975-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: phy: micrel: Update Kconfig help text
Jonas Rebmann [Thu, 11 Sep 2025 08:29:03 +0000 (10:29 +0200)]
net: phy: micrel: Update Kconfig help text

This driver by now supports 17 different Microchip (formerly known as
Micrel) chips: KSZ9021, KSZ9031, KSZ9131, KSZ8001, KS8737, KSZ8021,
KSZ8031, KSZ8041, KSZ8051, KSZ8061, KSZ8081, KSZ8873MLL, KSZ886X,
KSZ9477, LAN8814, LAN8804 and LAN8841.

Support for the VSC8201 was removed in commit 51f932c4870f ("micrel phy
driver - updated(1)")

Update the help text to reflect that, list families instead of models to
ease future maintenance.

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250911-micrel-kconfig-v2-1-e8f295059050@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfi...
Jakub Kicinski [Sat, 13 Sep 2025 00:06:25 +0000 (17:06 -0700)]
Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH
   error.
2) Support fetching the current bridge ethernet address.
   This allows a more flexible approach to packet redirection
   on bridges without need to use hardcoded addresses. From
   Fernando Fernandez Mancera.
3) Zap a few no-longer needed conditionals from ipvs packet path
   and convert to READ/WRITE_ONCE to avoid KCSAN warnings.
   From Zhang Tengfei.
4) Remove a no-longer-used macro argument in ipset, from Zhen Ni.

* tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_reject: don't reply to icmp error messages
  ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable
  netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support
  netfilter: ipset: Remove unused htable_bits in macro ahash_region
  selftest:net: fixed spelling mistakes
====================

Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodt-bindings: net: Drop duplicate brcm,bcm7445-switch-v4.0.txt
Rob Herring (Arm) [Tue, 9 Sep 2025 14:23:38 +0000 (09:23 -0500)]
dt-bindings: net: Drop duplicate brcm,bcm7445-switch-v4.0.txt

The brcm,bcm7445-switch-v4.0.txt binding is already covered by
dsa/brcm,sf2.yaml. The listed deprecated properties aren't used anywhere
either.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250909142339.3219200-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: mvneta: add support for hardware timestamps
Russell King [Wed, 10 Sep 2025 12:50:46 +0000 (13:50 +0100)]
net: mvneta: add support for hardware timestamps

Add support for hardware timestamps in (e.g.) the PHY by calling
skb_tx_timestamp() as close as reasonably possible to the point that
the hardware is instructed to send the queued packets.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uwKHe-00000004glk-3nkJ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoudp_tunnel: use netdev_warn() instead of netdev_WARN()
Alok Tiwari [Wed, 10 Sep 2025 19:50:26 +0000 (12:50 -0700)]
udp_tunnel: use netdev_warn() instead of netdev_WARN()

netdev_WARN() uses WARN/WARN_ON to print a backtrace along with
file and line information. In this case, udp_tunnel_nic_register()
returning an error is just a failed operation, not a kernel bug.

udp_tunnel_nic_register() can fail due to a memory allocation
failure (kzalloc() or udp_tunnel_nic_alloc()).
This is a normal runtime error and not a kernel bug.

Replace netdev_WARN() with netdev_warn() accordingly.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910195031.3784748-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'tcp-destroy-tcp-ao-tcp-md5-keys-in-sk_destruct'
Jakub Kicinski [Fri, 12 Sep 2025 02:05:59 +0000 (19:05 -0700)]
Merge branch 'tcp-destroy-tcp-ao-tcp-md5-keys-in-sk_destruct'

Dmitry Safonov says:

====================
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

On one side a minor/cosmetic issue, especially nowadays when
TCP-AO/TCP-MD5 signature verification failures aren't logged to dmesg.

Yet, I think worth addressing for two reasons:
- unsigned RST gets ignored by the peer and the connection is alive for
  longer (keep-alive interval)
- netstat counters increase and trace events report that trusted BGP peer
  is sending unsigned/incorrectly signed segments, which can ring alarm
  on monitoring.
====================

Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-0-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: Free TCP-AO/TCP-MD5 info/keys without RCU
Dmitry Safonov [Tue, 9 Sep 2025 01:18:51 +0000 (02:18 +0100)]
tcp: Free TCP-AO/TCP-MD5 info/keys without RCU

Now that the destruction of info/keys is delayed until the socket
destructor, it's safe to use kfree() without an RCU callback.
The socket is in TCP_CLOSE state either because it never left it,
or it's already closed and the refcounter is zero. In any way,
no one can discover it anymore, it's safe to release memory
straight away.

Similar thing was possible for twsk already.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-2-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
Dmitry Safonov [Tue, 9 Sep 2025 01:18:50 +0000 (02:18 +0100)]
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

Currently there are a couple of minor issues with destroying the keys
tcp_v4_destroy_sock():

1. The socket is yet in TCP bind buckets, making it reachable for
   incoming segments [on another CPU core], potentially available to send
   late FIN/ACK/RST replies.

2. There is at least one code path, where tcp_done() is called before
   sending RST [kudos to Bob for investigation]. This is a case of
   a server, that finished sending its data and just called close().

   The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
   __tcp_close())

   tcp_v4_do_rcv()/tcp_v6_do_rcv()
     tcp_rcv_state_process()            /* LINUX_MIB_TCPABORTONDATA */
       tcp_reset()
         tcp_done_with_error()
           tcp_done()
             inet_csk_destroy_sock()    /* Destroys AO/MD5 keys */
     /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
   tcp_v4_send_reset()                  /* Sends an unsigned RST segment */

   tcpdump:
> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
>     1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0

  Note the signed RSTs later in the dump - those are sent by the server
  when the fin-wait socket gets removed from hash buckets, by
  the listener socket.

Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
socket can yet send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).

This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once RST get sent and socket gets actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP are known to use keep alive(s).

While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.

Investigated-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-1-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'bridge-allow-keeping-local-fdb-entries-only-on-vlan-0'
Jakub Kicinski [Fri, 12 Sep 2025 02:02:53 +0000 (19:02 -0700)]
Merge branch 'bridge-allow-keeping-local-fdb-entries-only-on-vlan-0'

Petr Machata says:

====================
bridge: Allow keeping local FDB entries only on VLAN 0

The bridge FDB contains one local entry per port per VLAN, for the MAC of
the port in question, and likewise for the bridge itself. This allows
bridge to locally receive and punt "up" any packets whose destination MAC
address matches that of one of the bridge interfaces or of the bridge
itself.

The number of these local "service" FDB entries grows linearly with number
of bridge-global VLAN memberships, but that in turn will tend to grow
quadratically with number of ports and per-port VLAN memberships. While
that does not cause issues during forwarding lookups, it does make dumps
impractically slow.

As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB
that just contains these 400K local entries, takes 6.5s. That's _without_
considering iproute2 formatting overhead, this is just how long it takes to
walk the FDB (repeatedly), serialize it into netlink messages, and parse
the messages back in userspace.

This is to illustrate that with growing number of ports and VLANs, the time
required to dump this repetitive information blows up. Arguably 4K VLANs
per interface is not a very realistic configuration, but then modern
switches can instead have several hundred interfaces, and we have fielded
requests for >1K VLAN memberships per port among customers.

FDB entries are currently all kept on a single linked list, and then
dumping uses this linked list to walk all entries and dump them in order.
When the message buffer is full, the iteration is cut short, and later
restarted. Of course, to restart the iteration, it's first necessary to
walk the already-dumped front part of the list before starting dumping
again. So one possibility is to organize the FDB entries in different
structure more amenable to walk restarts.

One option is to walk directly the hash table. The advantage is that no
auxiliary structure needs to be introduced. With a rough sketch of this
approach, the above scenario gets dumped in not quite 3 s, saving over 50 %
of time. However hash table iteration requires maintaining an active cursor
that must be collected when the dump is aborted. It looks like that would
require changes in the NDO protocol to allow to run this cleanup. Moreover,
on hash table resize the iteration is simply restarted. FDB dumps are
currently not guaranteed to correspond to any one particular state: entries
can be missed, or be duplicated. But with hash table iteration we would get
that plus the much less graceful resize behavior, where swaths of FDB are
duplicated.

Another option is to maintain the FDB entries in a red-black tree. We have
a PoC of this approach on hand, and the above scenario is dumped in about
2.5 s. Still not as snappy as we'd like it, but better than the hash table.
However the savings come at the expense of a more expensive insertion, and
require locking during dumps, which blocks insertion.

The upside of these approaches is that they provide benefits whatever the
FDB contents. But it does not seem like either of these is workable.
However we intend to clean up the RB tree PoC and present it for
consideration later on in case the trade-offs are considered acceptable.

Yet another option might be to use in-kernel FDB filtering, and to filter
the local entries when dumping. Unfortunately, this does not help all that
much either, because the linked-list walk still needs to happen. Also, with
the obvious filtering interface built around ndm_flags / ndm_state
filtering, one can't just exclude pure local entries in one query. One
needs to dump all non-local entries first, and then to get permanent
entries in another run filter local & added_by_user. I.e. one needs to pay
the iteration overhead twice, and then integrate the result in userspace.
To get significant savings, one would need a very specific knob like "dump,
but skip/only include local entries". But if we are adding a local-specific
knobs, maybe let's have an option to just not duplicate them in the first
place.

All this FDB duplication is there merely to make things snappy during
forwarding. But high-radix switches with thousands of VLANs typically do
not process much traffic in the SW datapath at all, but rather offload vast
majority of it. So we could exchange some of the runtime performance for a
neater FDB.

To that end, in this patchset, introduce a new bridge option,
BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries
installed only on VLAN 0, instead of duplicating them across all VLANs.
Then to maintain the local termination behavior, on FDB miss, the bridge
does a second lookup on VLAN 0.

Enabling this option changes the bridge behavior in expected ways. Since
the entries are only kept on VLAN 0, FDB get, flush and dump will not
perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
forwarding on all VLANs.

This patchset is loosely based on a privately circulated patch by Nikolay
Aleksandrov.

The patchset progresses as follows:

- Patch #1 introduces a bridge option to enable the above feature. Then
  patches #2 to #5 gradually patch the bridge to do the right thing when
  the option is enabled. Finally patch #6 adds the UAPI knob and the code
  for when the feature is enabled or disabled.
- Patches #7, #8 and #9 contain fixes and improvements to selftest
  libraries
- Patch #10 contains a new selftest
====================

Link: https://patch.msgid.link/cover.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0
Petr Machata [Thu, 4 Sep 2025 17:07:27 +0000 (19:07 +0200)]
selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0

Add a selftest to check the operation of this newly-introduced bridge
option.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/62294f96884ab5d341648eef21243fa099a2dee5.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: net: lib.sh: Don't defer failed commands
Petr Machata [Thu, 4 Sep 2025 17:07:26 +0000 (19:07 +0200)]
selftests: net: lib.sh: Don't defer failed commands

Usually the autodefer helpers in lib.sh are expected to be run in context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.

Instead, only schedule deferred cleanup when the positive command succeeds.

This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/af10a5bb82ea11ead978cf903550089e006d7e70.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: defer: Introduce DEFER_PAUSE_ON_FAIL
Petr Machata [Thu, 4 Sep 2025 17:07:25 +0000 (19:07 +0200)]
selftests: defer: Introduce DEFER_PAUSE_ON_FAIL

The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2a07d24568ede6c42e4701657fa0b738e490fe59.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: defer: Allow spaces in arguments of deferred commands
Petr Machata [Thu, 4 Sep 2025 17:07:24 +0000 (19:07 +0200)]
selftests: defer: Allow spaces in arguments of deferred commands

Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.

Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/6c2523139a6f99103889c9c9fedcdc66a75441f4.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
Petr Machata [Thu, 4 Sep 2025 17:07:23 +0000 (19:07 +0200)]
net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0

The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0.
When enabled, it has local FDB entries installed only on VLAN 0, instead of
duplicating them across all VLANs.

In this patch, add the corresponding UAPI toggle, and the code for turning
the feature on and off.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/ea99bfb10f687fa58091e6e1c2f8acc33f47ca45.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
Petr Machata [Thu, 4 Sep 2025 17:07:22 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

Thus when a VLAN is added for a port or the bridge itself, a local FDB
entry with the corresponding address should not be added when in the VLAN-0
mode.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/bb13ba01d58ed6d5d700e012c519d38ee6806d22.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs
Petr Machata [Thu, 4 Sep 2025 17:07:21 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
bridge itself should not be created per-VLAN, but instead only on VLAN 0.
When the bridge address changes, the local FDB entries need to be updated,
which is done in br_fdb_change_mac_address().

Bail out early when in VLAN-0 mode, so that the per-VLAN FDB entries are
not created. The per-VLAN walk is only done afterwards.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/0bd432cf91921ef7c4ed0e129de1d1cd358c716b.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs
Petr Machata [Thu, 4 Sep 2025 17:07:20 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for member
ports should not be created per-VLAN, but instead only on VLAN 0. When the
member port address changes, the local FDB entries need to be updated,
which is done in br_fdb_changeaddr().

Under the VLAN-0 mode, only one local FDB entry will ever be added for a
port's address, and that on VLAN 0. Thus bail out of the delete loop early.
For the same reason, also skip adding the per-VLAN entries.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/0cf9d41836d2a245b0ce07e1a16ee05ca506cbe9.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
Petr Machata [Thu, 4 Sep 2025 17:07:19 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

That means that br_handle_frame_finish() needs to make two lookups: the
primary lookup on an appropriate VLAN, and when that misses, a lookup on
VLAN 0.

Have the second lookup only accept local MAC addresses. Turning this into a
generic second-lookup feature is not the goal.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/8087475009dce360fb68d873b1ed9c80827da302.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
Petr Machata [Thu, 4 Sep 2025 17:07:18 +0000 (19:07 +0200)]
net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0

The following patches will gradually introduce the ability of the bridge
to look up local FDB entries on VLAN 0 instead of using the VLAN indicated
by a packet.

In this patch, just introduce the option itself, with which the feature
will be linked.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/ab85e33ef41ed19a3deaef0ff7da26830da30642.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: devmem: expose tcp_recvmsg_locked errors
Stanislav Fomichev [Wed, 10 Sep 2025 16:24:29 +0000 (09:24 -0700)]
net: devmem: expose tcp_recvmsg_locked errors

tcp_recvmsg_dmabuf can export the following errors:
- EFAULT when linear copy fails
- ETOOSMALL when cmsg put fails
- ENODEV if one of the frags is readable
- ENOMEM on xarray failures

But they are all ignored and replaced by EFAULT in the caller
(tcp_recvmsg_locked). Expose real error to the userspace to
add more transparency on what specifically fails.

In non-devmem case (skb_copy_datagram_msg) doing `if (!copied)
copied=-EFAULT` is ok because skb_copy_datagram_msg can return only EFAULT.

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250910162429.4127997-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'wireguard-fixes-for-6-17-rc6'
Jakub Kicinski [Fri, 12 Sep 2025 01:52:28 +0000 (18:52 -0700)]
Merge branch 'wireguard-fixes-for-6-17-rc6'

Jason A. Donenfeld says:

====================
wireguard fixes for 6.17-rc6

Please find three small fixes to wireguard:

1) A general simplification to the way wireguard chooses the next
   available cpu, by making use of cpumask_nth(), and covering an edge
   case.

2) A cleanup to the selftests kconfig.

3) A fix to the selftests kconfig so that it actually runs again.
====================

Link: https://patch.msgid.link/20250910013644.4153708-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowireguard: selftests: select CONFIG_IP_NF_IPTABLES_LEGACY
Jason A. Donenfeld [Wed, 10 Sep 2025 01:36:44 +0000 (03:36 +0200)]
wireguard: selftests: select CONFIG_IP_NF_IPTABLES_LEGACY

This is required on recent kernels, where it is now off by default.
While we're here, fix some stray =m's that were supposed to be =y.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
David Hildenbrand [Wed, 10 Sep 2025 01:36:43 +0000 (03:36 +0200)]
wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config

It's no longer user-selectable (and the default was already "y"), so
let's just drop it.

It was never really relevant to the wireguard selftests either way.

Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()
Yury Norov (NVIDIA) [Wed, 10 Sep 2025 01:36:42 +0000 (03:36 +0200)]
wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()

The function gets number of online CPUS, and uses it to search for
Nth cpu in cpu_online_mask.

If id == num_online_cpus() - 1, and one CPU gets offlined between
calling num_online_cpus() -> cpumask_nth(), there's a chance for
cpumask_nth() to find nothing and return >= nr_cpu_ids.

The caller code in __queue_work() tries to avoid that by checking the
returned CPU against WORK_CPU_UNBOUND, which is NR_CPUS. It's not the
same as '>= nr_cpu_ids'. On a typical Ubuntu desktop, NR_CPUS is 8192,
while nr_cpu_ids is the actual number of possible CPUs, say 8.

The non-existing cpu may later be passed to rcu_dereference() and
corrupt the logic. Fix it by switching from 'if' to 'while'.

Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowireguard: queueing: simplify wg_cpumask_next_online()
Yury Norov [NVIDIA] [Wed, 10 Sep 2025 01:36:41 +0000 (03:36 +0200)]
wireguard: queueing: simplify wg_cpumask_next_online()

wg_cpumask_choose_online() opencodes cpumask_nth(). Use it and make the
function significantly simpler. While there, fix opencoded cpu_online()
too.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agogeneve: Avoid -Wflex-array-member-not-at-end warning
Gustavo A. R. Silva [Tue, 9 Sep 2025 15:42:39 +0000 (17:42 +0200)]
geneve: Avoid -Wflex-array-member-not-at-end warning

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration to the end of the corresponding
structure. Notice that `struct ip_tunnel_info` is a flexible
structure, this is a structure that contains a flexible-array
member.

Fix the following warning:

drivers/net/geneve.c:56:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/aMBK78xT2fUnpwE5@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoipv6: udp: fix typos in comments
Alok Tiwari [Tue, 9 Sep 2025 12:26:07 +0000 (05:26 -0700)]
ipv6: udp: fix typos in comments

Correct typos in ipv6/udp.c comments:
"execeeds" -> "exceeds"
"tacking care" -> "taking care"
"measureable" -> "measurable"

No functional changes.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250909122611.3711859-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-af_packet-optimize-retire-operation'
Jakub Kicinski [Fri, 12 Sep 2025 01:40:10 +0000 (18:40 -0700)]
Merge branch 'net-af_packet-optimize-retire-operation'

Xin Zhao says:

====================
net: af_packet: optimize retire operation

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
====================

Link: https://patch.msgid.link/20250908104549.204412-1-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: af_packet: Use hrtimer to do the retire operation
Xin Zhao [Mon, 8 Sep 2025 10:45:49 +0000 (18:45 +0800)]
net: af_packet: Use hrtimer to do the retire operation

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.

Delete delete_blk_timer field, because hrtimer_cancel will check and wait
until the timer callback return and ensure never enter callback again.

Simplify the logic related to setting timeout, only update the hrtimer
expire time within the hrtimer callback, no longer update the expire time
in prb_open_block which is called by tpacket_rcv or timer callback.
Reasons why NOT update hrtimer in prb_open_block:
1) It will increase complexity to distinguish the two caller scenario.
2) hrtimer_cancel and hrtimer_start need to be called if you want to update
TMO of an already enqueued hrtimer, leading to complex shutdown logic.

One side effect of NOT update hrtimer when called by tpacket_rcv is that
a newly opened block triggered by tpacket_rcv may be retired earlier than
expected. On the other hand, if timeout is updated in prb_open_block, the
frequent reception of network packets that leads to prb_open_block being
called may cause hrtimer to be removed and enqueued repeatedly.

The retire hrtimer expiration is unconditional and periodic. If there are
numerous packet sockets on the system, please set an appropriate timeout
to avoid frequent enqueueing of hrtimers.

Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/all/20250831100822.1238795-1-jackzxcui1989@163.com/
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Link: https://patch.msgid.link/20250908104549.204412-3-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: af_packet: remove last_kactive_blk_num field
Xin Zhao [Mon, 8 Sep 2025 10:45:48 +0000 (18:45 +0800)]
net: af_packet: remove last_kactive_blk_num field

kactive_blk_num (K) is only incremented on block close.
In timer callback prb_retire_rx_blk_timer_expired, except delete_blk_timer
is true, last_kactive_blk_num (L) is set to match kactive_blk_num (K) in
all cases. L is also set to match K in prb_open_block.
The only case K not equal to L is when scheduled by tpacket_rcv
and K is just incremented on block close but no new block could be opened,
so that it does not call prb_open_block in prb_dispatch_next_block.
This patch modifies the prb_retire_rx_blk_timer_expired function by simply
removing the check for L == K. This patch just provides another checkpoint
to thaw the might-be-frozen block in any case. It doesn't have any effect
because __packet_lookup_frame_in_block() has the same logic and does it
again without this patch when detecting the ring is frozen. The patch only
advances checking the status of the ring.

Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/all/20250831100822.1238795-1-jackzxcui1989@163.com/
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Link: https://patch.msgid.link/20250908104549.204412-2-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodt-bindings: net: Convert APM XGene MDIO to DT schema
Rob Herring (Arm) [Mon, 8 Sep 2025 23:10:14 +0000 (18:10 -0500)]
dt-bindings: net: Convert APM XGene MDIO to DT schema

Convert the APM XGene MDIO bus binding to DT schema format. It's a
straight-forward conversion.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250908231016.2070305-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodt-bindings: net: Convert apm,xgene-enet to DT schema
Rob Herring (Arm) [Mon, 8 Sep 2025 23:10:13 +0000 (18:10 -0500)]
dt-bindings: net: Convert apm,xgene-enet to DT schema

Convert the APM XGene Ethernet binding to DT schema format.

Add the missing apm,xgene2-sgenet and apm,xgene2-xgenet compatibles.
Drop "reg-names" as required. Add support for up to 16 interrupts.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250908231016.2070305-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-ethernet-renesas-rcar_gen4_ptp-simplify-register-layout'
Jakub Kicinski [Fri, 12 Sep 2025 01:34:38 +0000 (18:34 -0700)]
Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-simplify-register-layout'

Niklas Söderlund says:

====================
net: ethernet: renesas: rcar_gen4_ptp: Simplify register layout

The daughter driver rcar_gen4_ptp used by both rswitch and rtsn where
upstreamed with support for possible different memory layouts on
different users. With all Gen4 boards upstream no such setup is
documented.

There are other issues related to how the rcar_gen4_ptp driver is shared
between multiple useres that needs to be cleaned up. But that will be a
larger work. So before that get some simple fixes done.

Patch 1/3 and 2/3 removes the support to allow different register
layouts on different SoCs by looking up offsets at runtime with a much
simpler interface. The new interface computes the offsets at compile
time.

While patch 3/3 is a drive-by patch taking a spurs comment and making a
lockdep check of it.

There is no intentional functional change in this series just cleaning
up in preparation of larger works to follow.
====================

Link: https://patch.msgid.link/20250908154426.3062861-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ethernet: renesas: rcar_gen4_ptp: Use lockdep to verify internal usage
Niklas Söderlund [Mon, 8 Sep 2025 15:44:26 +0000 (17:44 +0200)]
net: ethernet: renesas: rcar_gen4_ptp: Use lockdep to verify internal usage

Instead of a having a comment that the lock must be held when calling
the internal helper add a lockdep check to enforce it.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ethernet: renesas: rcar_gen4_ptp: Hide register layout
Niklas Söderlund [Mon, 8 Sep 2025 15:44:25 +0000 (17:44 +0200)]
net: ethernet: renesas: rcar_gen4_ptp: Hide register layout

With the support for multiple register layout removed all support
structures can be removed from the header file. Covert to a simpler
structure using defines for the register offsets.

There is no functional change, only switching from looking up offsets at
runtime to compile time.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ethernet: renesas: rcar_gen4_ptp: Remove different memory layout
Niklas Söderlund [Mon, 8 Sep 2025 15:44:24 +0000 (17:44 +0200)]
net: ethernet: renesas: rcar_gen4_ptp: Remove different memory layout

When upstreaming the Gen4 PTP support for R-Car S4 the possibility for
different memory layouts on other Gen4 SoCs was build in. It turns out
this is not needed and instead needlessly makes the driver harder to
read, remove the support code that would have allowed different memory
layouts.

This change only deals with the public functions used by other drivers,
follow up work will clean up the rcar_gen4_ptp internals.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: 8139too: Make 8139TOO_PIO depend on !NO_IOPORT_MAP
Daniel Palmer [Sun, 7 Sep 2025 06:43:49 +0000 (15:43 +0900)]
eth: 8139too: Make 8139TOO_PIO depend on !NO_IOPORT_MAP

When 8139too is probing and 8139TOO_PIO=y it will call pci_iomap_range()
and from there __pci_ioport_map() for the PCI IO space.
If HAS_IOPORT_MAP=n and NO_GENERIC_PCI_IOPORT_MAP=n, like it is on my
m68k config, __pci_ioport_map() becomes NULL, pci_iomap_range() will
always fail and the driver will complain it couldn't map the PIO space
and return an error.

NO_IOPORT_MAP seems to cover the case where what 8139too is trying
to do cannot ever work so make 8139TOO_PIO depend on being it false
and avoid creating an unusable driver.

Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20250907064349.3427600-1-daniel@thingy.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: Replace sleep with slowwait
David Ahern [Wed, 10 Sep 2025 02:58:28 +0000 (20:58 -0600)]
selftests: Replace sleep with slowwait

Replace the sleep in kill_procs with slowwait.

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910025828.38900-2-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>