www.infradead.org Git - users/jedix/linux-maple.git/log

net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus

In case of an AQR105-based device, create a software node for the mdio
function, with a child node for the Aquantia AQR105 PHY, providing a
firmware-name (and a bit more, which may be used for future checks) to
allow the PHY to load a MAC specific firmware from the file system.

The name of the PHY software node follows the naming convention suggested
in the patch for the mdiobus_scan function (in the same patch series).

Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-5-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: add essential functions to aqr105 driver

This patch makes functions that were provided for aqr107 applicable to
aqr105, or replaces generic functions with specific ones. Since the aqr105
was introduced before NBASE-T was defined (or 802.3bz), there are a number
of vendor specific registers involved in the definition of the
advertisement, in auto-negotiation and in the setting of the speed. The
functions have been written following the downstream driver for TN4010
cards with aqr105 PHY, and use code from aqr107 functions wherever it
seemed to make sense.

Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-4-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: search for firmware-name in fwnode

Allow the firmware name of an Aquantia PHY alternatively be provided by the
property "firmware-name" of a swnode. This software node may be provided by
the MAC or MDIO driver.

Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-3-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: add probe function to aqr105 for firmware loading

Re-use the AQR107 probe function to load the firmware on the AQR105 (and
to probe the HWMON).

Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-2-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Add swnode support to mdiobus_scan

This patch will allow to use a swnode/fwnode defined for a phy_device. The
MDIO bus (mii_bus) needs to contain nodes for the PHY devices, named
"ethernet-phy@i", with i being the MDIO address (0 .. PHY_MAX_ADDR - 1).

The fwnode is only attached to the phy_device if there isn't already an
fwnode attached.

fwnode_get_named_child_node will increase the usage counter of the fwnode.
However, no new code is needed to decrease the counter again, since this is
already implemented in the phy_device_release function.

Signed-off-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322-tn9510-v3a-v7-1-672a9a3d8628@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'basic-xdp-support-for-dqo-rda-queue-format'

Joshua Washington says:

====================
Basic XDP Support for DQO RDA Queue Format

This patch series updates the GVE XDP infrastructure and introduces
XDP_PASS and XDP_DROP support for the DQO RDA queue format.

The infrastructure changes of note include an allocation path refactor
for XDP queues, and a unification of RX buffer sizes across queue
formats.

This patch series will be followed by more patch series to introduce
XDP_TX and XDP_REDIRECT support, as well as zero-copy and multi-buffer
support.
====================

Link: https://patch.msgid.link/20250321002910.1343422-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: add XDP DROP and PASS support for DQ

This patch adds support for running XDP programs on DQ, along with
rudimentary processing for XDP_DROP and XDP_PASS. These actions require
very limited driver functionality when it comes to processing an XDP
buffer, so currently if the XDP action is not XDP_PASS, the packet is
dropped and stats are updated.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaliginedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy<hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-7-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: update XDP allocation path support RX buffer posting

In order to support installing an XDP program on DQ, RX buffers need to
be reposted using 4K buffers, which is larger than the default packet
buffer size of 2K. This is needed to accommodate the extra head and tail
that accompanies the data portion of an XDP buffer. Continuing to use 2K
buffers would mean that the packet buffer size for the NIC would have to
be restricted to 2048 - 320 - 256 = 1472B. However, this is problematic
for two reasons: first, 1472 is not a packet buffer size accepted by
GVE; second, at least 1474B of buffer space is needed to accommodate an
MTU of 1460, which is the default on GCP. As such, we allocate 4K
buffers, and post a 2K section of those 4K buffers (offset relative to
the XDP headroom) to the NIC for DMA to avoid a potential extra copy.
Because the GQ-QPL datapath requires copies regardless, this change was
not needed to support XDP in that case.

To capture this subtlety, a new field, packet_buffer_truesize, has been
added to the rx ring struct to represent size of the allocated buffer,
while packet_buffer_size has been left to represent the portion of the
buffer posted to the NIC.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-6-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: merge packet buffer size fields

The data_buffer_size_dqo field in gve_priv and the packet_buffer_size
field in gve_rx_ring theoretically have the same meaning, but they are
defined in two different places and used in two separate contexts. There
is no good reason for this, so this change merges those fields into the
packet_buffer_size field in the RX ring.

This change also introduces a packet_buffer_size field to struct
gve_rx_queue_config to account for cases where queues are not allocated,
such as when the interface is down.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-5-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: update GQ RX to use buf_size

Commit ebdfae0d377b ("gve: adopt page pool for DQ RDA mode") introduced
a buf_size field to the gve_rx_slot_page_info struct, which can be used
in the datapath to take the place of the packet_buffer_size field, as it
will already be hot in the cache due to its extensive use. Using the
buf_size field in the datapath frees up the packet_buffer_size field in
the GQ-specific RX cacheline to be generalized for GQ and DQ (in the
next patch), as there is currently no common packet buffer size field
between the two queue formats.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-4-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: introduce config-based allocation for XDP

An earlier patch series[1] introduced RX/TX ring allocation configuration
structs which contained metadata used to allocate and configure new RX
and TX rings. This led to a much cleaner and safer allocation pattern
wherein queue resources were not deallocated until new queue resources
were successfully allocated.

Migrate the XDP allocation path to use the same pattern to allow for the
existence of a single allocation path instead of relying on XDP-specific
allocation methods. These extra allocation methods result in the
duplication of many existing behaviors while being prone to error when
configuration changes unrelated to XDP occur.

Link: https://lore.kernel.org/netdev/20240122182632.1102721-1-shailend@google.com/
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-3-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics

These statistics pollute the hotpath and do not have any real-world use
or meaning.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-2-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: force link down on major_config failure

If we fail to configure the MAC or PCS according to the desired mode,
do not allow the network link to come up until we have successfully
configured the MAC and PCS. This improves phylink's behaviour when an
error occurs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1twkqO-0006FI-Gm@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: intel: interface switching support for RPL-P platform

Based on the patch series [1], the enablement of interface switching for
RPL-P will use the same handling as ADL-N.

Link: https://patchwork.kernel.org/project/netdevbpf/cover/20250227121522.1802832-1-yong.liang.choong@linux.intel.com/
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Link: https://patch.msgid.link/20250324062742.462771-1-yong.liang.choong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'stmmac-several-pci-related-improvements'

Philipp Stanner says:

====================
stmmac: Several PCI-related improvements
====================

Link: https://patch.msgid.link/20250324092928.9482-2-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: Replace deprecated PCI functions

The PCI functions
- pcim_iomap_regions() and
- pcim_iomap_table()
have been deprecated.

Replace them with their successor function, pcim_iomap_region().

Make variable declaration order at closeby places comply with reverse
christmas tree order.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Henry Chen <chenx97@aosc.io>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250324092928.9482-6-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: Remove pcim_* functions for driver detach

Functions prefixed with "pcim_" are managed devres functions which
perform automatic cleanup once the driver unloads. It is, thus, not
necessary to call any cleanup functions in remove() callbacks.

Remove the pcim_ cleanup function calls in the remove() callbacks.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Tested-by: Henry Chen <chenx97@aosc.io>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250324092928.9482-5-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: loongson: Remove surplus loop

loongson_dwmac_probe() contains a loop which doesn't have an effect,
because it tries to call pcim_iomap_regions() with the same parameters
several times. The break statement at the loop's end furthermore ensures
that the loop only runs once anyways.

Remove the surplus loop.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Henry Chen <chenx97@aosc.io>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250324092928.9482-4-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-dccp-remove-16-bytes-from-icsk'

Eric Dumazet says:

====================
tcp/dccp: remove 16 bytes from icsk

icsk->icsk_timeout and icsk->icsk_ack.timeout can be removed.

They mirror existing fields in icsk->icsk_retransmit_timer and
icsk->icsk_retransmit_timer.
====================

Link: https://patch.msgid.link/20250324203607.703850-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp/dccp: remove icsk->icsk_ack.timeout

icsk->icsk_ack.timeout can be replaced by icsk->csk_delack_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp/dccp: remove icsk->icsk_timeout

icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-skip-taking-rtnl_lock-for-queue-get'

Jakub Kicinski says:

====================
net: skip taking rtnl_lock for queue GET (prep)

Skip taking rtnl_lock for queue GET ops on devices which opt
into running all ops under the instance lock. In preparating
for performing queue ops without rtnl lock clarify the protection
of queue-related fields.

v1: https://lore.kernel.org/20250312223507.805719-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250324224537.248800-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: protect rxq->mp_params with the instance lock

Ensure that all accesses to mp_params are under the netdev
instance lock. The only change we need is to move
dev_memory_provider_uninstall() under the lock.

Appropriately swap the asserts.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: designate queue -> napi linking as "ops protected"

netdev netlink is the only reader of netdev_{,rx_}queue->napi,
and it already holds netdev->lock. Switch protection of
the writes to netdev->lock to "ops protected".

The expectation will be now that accessing queue->napi
will require netdev->lock for "ops locked" drivers, and
rtnl_lock for all other drivers.

Current "ops locked" drivers don't require any changes.
gve and netdevsim use _locked() helpers right next to
netif_queue_set_napi() so they must be holding the instance
lock. iavf doesn't call it. bnxt is a bit messy but all paths
seem locked.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: designate queue counts as "double ops protected" by instance lock

Drivers which opt into instance lock protection of ops should
only call set_real_num_*_queues() under the instance lock.
This means that queue counts are double protected (writes
are under both rtnl_lock and instance lock, readers under
either).

Some readers may still be under the rtnl_lock, however, so for
now we need double protection of writers.

OTOH queue API paths are only under the protection of the instance
lock, so we need to validate that the instance is actually locking
ops, otherwise the input checks we do against queue count are racy.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: explain "protection types" for the instance lock

Try to define some terminology for which fields are protected
by which lock and how. Some fields are protected by both rtnl_lock
and instance lock which is hard to talk about without having
a "key phrase" to refer to a particular protection scheme.

"ops protected" fields are defined later in the series, one by one.

Add ASSERT_RTNL() to netdev_ops_assert_locked() for drivers
not other instance protection of ops. Hopefully it's not too
confusion that netdev_lock_ops() does not match the lock which
netdev_ops_assert_locked() will assert, exactly. The noun "ops"
is in a different place in the name, so I think it's acceptable...

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: constify dev pointer in misc instance lock helpers

lockdep asserts and predicates can operate on const pointers.
In the future this will let us add asserts in functions
which operate on const pointers like dev_get_min_mp_channel_count().

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n

Since commit a953be53ce40 ("net-sysfs: add support for device-specific
rx queue sysfs attributes"), so for at least a decade now it is safe
to call net_rx_queue_update_kobjects() when SYSFS=n. That function
does its own ifdef-inery and will return 0. Remove the unnecessary
stub for netif_set_real_num_rx_queues().

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()

A recent commit added taking the netdev instance lock
in netdev_nl_bind_rx_doit(), but didn't remove it in
net_devmem_unbind_dmabuf() which it calls from an error path.
Always expect the callers of net_devmem_unbind_dmabuf() to
hold the lock. This is consistent with net_devmem_bind_dmabuf().

(Not so) coincidentally this also protects mp_param with the instance
lock, which the rest of this series needs.

Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'virtio_net-fixes-and-improvements'

Akihiko Odaki says:

====================
virtio_net: Fixes and improvements

Jason Wang recently proposed an improvement to struct
virtio_net_rss_config:
https://lore.kernel.org/CACGkMEud0Ki8p=z299Q7b4qEDONpYDzbVqhHxCNVk_vo-KdP9A@mail.gmail.com

This patch series implements it and also fixes a few minor bugs I found
when writing patches.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
v1: https://lore.kernel.org/20250318-virtio-v1-0-344caf336ddd@daynix.com
====================

Link: https://patch.msgid.link/20250321-virtio-v2-0-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: Allocate rss_hdr with devres

virtnet_probe() lacks the code to free rss_hdr in its error path.
Allocate rss_hdr with devres so that it will be automatically freed.

Fixes: 86a48a00efdf ("virtio_net: Support dynamic rss indirection table size")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-4-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: Use new RSS config structs

The new RSS configuration structures allow easily constructing data for
VIRTIO_NET_CTRL_MQ_RSS_CONFIG as they strictly follow the order of data
for the command.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-3-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: Fix endian with virtio_net_ctrl_rss

Mark the fields of struct virtio_net_ctrl_rss as little endian as
they are in struct virtio_net_rss_config, which it follows.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-2-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: Split struct virtio_net_rss_config

struct virtio_net_rss_config was less useful in actual code because of a
flexible array placed in the middle. Add new structures that split it
into two to avoid having a flexible array in the middle.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-1-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: mcs: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@

- flush_workqueue(E);
destroy_workqueue(E);

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250324080854.408188-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "udp_tunnel: GRO optimizations"

Revert "udp_tunnel: use static call for GRO hooks when possible"
This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702.

Revert "udp_tunnel: create a fastpath GRO lookup."
This reverts commit 8d4880db378350f8ed8969feea13bdc164564fc1.

There are multiple small issues with the series. In the interest
of unblocking the merge window let's opt for a revert.

Link: https://lore.kernel.org/cover.1742557254.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-sfp-add-single-byte-smbus-sfp-access'

Maxime Chevallier says:

====================
net: phy: sfp: Add single-byte SMBus SFP access

This is V4 for the single-byte SMBus support for SFP cages as well as
embedded PHYs accessed over mdio-i2c.

v3: https://lore.kernel.org/20250314162319.516163-1-maxime.chevallier@bootlin.com
v2: https://lore.kernel.org/20250225112043.419189-1-maxime.chevallier@bootlin.com
v1: https://lore.kernel.org/20250223172848.1098621-1-maxime.chevallier@bootlin.com
====================

Link: https://patch.msgid.link/20250322075745.120831-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdio-i2c: Add support for single-byte SMBus operations

PHYs that are within copper SFP modules have their MDIO bus accessible
through address 0x56 (usually) on the i2c bus. The MDIO-I2C bridge is
desgned for 16 bits accesses, but we can also perform 8bits accesses by
reading/writing the high and low bytes sequentially.

This commit adds support for this type of accesses, thus supporting
smbus controllers such as the one in the VSC8552.

This was only tested on Copper SFP modules that embed a Marvell 88e1111
PHY.

Tested-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: sfp: Add support for SMBus module access

The SFP module's eeprom and internals are accessible through an i2c bus.

It is possible that the SFP might be connected to an SMBus-only
controller, such as the one found in some PHY devices in the VSC85xx
family.

Introduce a set of sfp read/write ops that are going to be used if the
i2c bus is only capable of doing smbus byte accesses.

As Single-byte SMBus transaction go against SFF-8472 and breaks the
atomicity for diagnostics data access, hwmon is disabled in the case
of SMBus access.

Moreover, as this may cause other instabilities, print a warning at
probe time to indicate that the setup may be unreliable because of the
hardware design.

As hwmon may be disabled for both broken EEPROM and smbus, the warnings
are udpated accordingly.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-03-24

1) Prevent setting high order sequence number bits input in
   non-ESN mode. From Leon Romanovsky.

2) Support PMTU handling in tunnel mode for packet offload.
   From Leon Romanovsky.

3) Make xfrm_state_lookup_byaddr lockless.
   From Florian Westphal.

4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
   From Dan Carpenter.

* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
  xfrm: state: make xfrm_state_lookup_byaddr lockless
  xfrm: check for PMTU in tunnel mode for packet offload
  xfrm: provide common xdo_dev_offload_ok callback implementation
  xfrm: rely on XFRM offload
  xfrm: simplify SA initialization routine
  xfrm: delay initialization of offload path till its actually requested
  xfrm: prevent high SEQ input in non-ESN mode
====================

Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: qcom,ipa: Correct indentation and style in DTS example

DTS example in the bindings should be indented with 2- or 4-spaces and
aligned with opening '- |', so correct any differences like 3-spaces or
mixtures 2- and 4-spaces in one binding.

No functional changes here, but saves some comments during reviews of
new patches built on existing code.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Alex Elder <elder@riscstar.com>
Link: https://patch.msgid.link/20250324125222.82057-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Use kvmalloc in xt_hashlimit, from Denis Kirjanov.

2) Tighten nf_conntrack sysctl accepted values for nf_conntrack_max
   and nf_ct_expect_max, from Nicolas Bouchinet.

3) Avoid lookup in nft_fib if socket is available, from Florian Westphal.

4) Initialize struct lsm_context in nfnetlink_queue to avoid
   hypothetical ENOMEM errors, Chenyuan Yang.

5) Use strscpy() instead of _pad when initializing xtables table name,
   kzalloc is already used to initialized the table memory area.
   From Thorsten Blum.

6) Missing socket lookup by conntrack information for IPv6 traffic
   in nft_socket, there is a similar chunk in IPv4, this was never
   added when IPv6 NAT was introduced. From Maxim Mikityanskiy.

7) Fix clang issues with nf_tables CONFIG_MITIGATION_RETPOLINE,
   from WangYuli.

* tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Only use nf_skip_indirect_calls() when MITIGATION_RETPOLINE
  netfilter: socket: Lookup orig tuple for IPv6 SNAT
  netfilter: xtables: Use strscpy() instead of strscpy_pad()
  netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error
  netfilter: fib: avoid lookup if socket is available
  netfilter: conntrack: Bound nf_conntrack sysctl writes
  netfilter: xt_hashlimit: replace vmalloc calls with kvmalloc
====================

Link: https://patch.msgid.link/20250323100922.59983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: au1000_eth: Mark au1000_ReleaseDB() static

This fixes the following build warning:
```
drivers/net/ethernet/amd/au1000_eth.c:574:6: warning: no previous prototype for 'au1000_ReleaseDB' [-Wmissing-prototypes]
574 | void au1000_ReleaseDB(struct au1000_private *aup, struct db_dest *pDB)
| ^~~~~~~~~~~~~~~~
```

Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250323190450.111241-1-johan.korsnes@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rfs: hash function change

RFS is using two kinds of hash tables.

First one is controlled by /proc/sys/net/core/rps_sock_flow_entries = 2^N
and using the N low order bits of the l4 hash is good enough.

Then each RX queue has its own hash table, controlled by
/sys/class/net/eth1/queues/rx-$q/rps_flow_cnt = 2^X

Current hash function, using the X low order bits is suboptimal,
because RSS is usually using Func(hash) = (hash % power_of_two);

For example, with 32 RX queues, 6 low order bits have no entropy
for a given queue.

Switch this hash function to hash_32(hash, log) to increase
chances to use all possible slots and reduce collisions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321171309.634100-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
More features for 6.15, major changes:
* cfg80211/mac80211: fix and enable link reconfiguration
* rtw88: support RTL8814AE/RTL8814AU
* mt7996: preparations for MLO
* ath12k: continued work on MLO
* iwlwifi: add new iwlmld sub-driver/op-mode for
   some current and future devices
* wfx: wowlan support

* tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (311 commits)
  wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()
  wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
  wifi: mt76: mt792x: re-register CHANCTX_STA_CSA only for the mt7921 series
  wifi: mt76: mt7996: Update mt7996_tx to MLO support
  wifi: mt76: mt7996: rework mt7996_ampdu_action to support MLO
  wifi: mt76: mt7996: rework set/get_tsf callabcks to support MLO
  wifi: mt76: mt7996: set vif default link_id adding/removing vif links
  wifi: mt76: mt7996: rework mt7996_mcu_beacon_inband_discov to support MLO
  wifi: mt76: mt7996: rework mt7996_mcu_add_obss_spr to support MLO
  wifi: mt76: mt7996: rework mt7996_net_fill_forward_path to support MLO
  wifi: mt76: mt7996: rework mt7996_update_mu_group to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_poll to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO
  wifi: mt76: mt7996: remove mt7996_mac_enable_rtscts()
  wifi: mt76: mt7996: rework mt7996_sta_hw_queue_read to support MLO
  wifi: mt76: mt7996: rework mt7996_set_hw_key to support MLO
  wifi: mt76: mt7996: Add mt7996_sta_link to mt7996_mcu_add_bss_info signature
  wifi: mt76: mt7996: rework mt7996_sta_set_4addr and mt7996_sta_set_decap_offload to support MLO
  wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO
  wifi: mt76: mt7996: Rely on wcid_to_sta in mt7996_mac_add_txs_skb()
  ...
====================

Link: https://patch.msgid.link/20250320131106.33266-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-dwmac-rk-add-gmac-support-for-rk3528'

Jonas Karlman says:

====================
net: stmmac: dwmac-rk: Add GMAC support for RK3528

The Rockchip RK3528 has two Ethernet controllers, one 100/10 MAC to be
used with the integrated PHY and a second 1000/100/10 MAC to be used
with an external Ethernet PHY.

This series add initial support for the Ethernet controllers found
in RK3528 and initial support to power up/down the integrated PHY.

v2: https://lore.kernel.org/20250309232622.1498084-1-jonas@kwiboo.se
v1: https://lore.kernel.org/20250306221402.1704196-1-jonas@kwiboo.se
====================

Link: https://patch.msgid.link/20250319214415.3086027-1-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add initial support for RK3528 integrated PHY

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.

Add operations to powerup/down the integrated PHY found in RK3528.
Use helpers that can be used by other GMAC variants in the future.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-6-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add integrated_phy_powerdown operation

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.

Add a new integrated_phy_powerdown operation and change the call chain
for integrated_phy_powerup to prepare support for the integrated PHY
found in these newer SoCs.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-5-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Move integrated_phy_powerup/down functions

Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these SoCs.

Move the rk_gmac_integrated_phy_powerup/down functions to top of the
file to prepare for them to be called directly by a GMAC variant
specific powerup/down operation.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-4-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-rk: Add GMAC support for RK3528

Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.

Add initial support for the RK3528 GMAC variant.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://patch.msgid.link/20250319214415.3086027-3-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: rockchip-dwmac: Add compatible string for RK3528

Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.

Add compatible string for the RK3528 variant.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250319214415.3086027-2-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-improve-stmmac-resume-rx-clocking'

Russell King says:

====================
net: improve stmmac resume rx clocking

stmmac has had a long history of problems with resuming, illustrated by
reset failure due to the receive clock not running.

Several attempts have been attempted over the years to address this
issue, such as moving phylink_start() (now phylink_resume()) super
early in stmmac_resume() in commit 90702dcd19c0 ("net: stmmac: fix MAC
not working when system resume back with WoL a ctive.") However, this
has the downside that stmmac_mac_link_up() can (and demonstrably is)
called before or during the driver initialisation in another thread.
This can cause issues as packets could begin to be queued, and the
transmit/receive enable bits will be set before any initialisation has
been done.

Another attempt is used by dwmac-socfpga.c in commit 2d871aa07136 ("net:
stmmac: add platform init/exit for Altera's ARM socfpga") which
pre-dates the above commit.

Neither of these two approaches consider the effect of EEE with a PHY
that supports receive clock-stop and has that feature enabled (which
the stmmac driver does enable). If the link is up, then there is the
possibility for the receive path to be in low-power mode, and the PHY
may stop its receive clock.

This series addresses these issues by (each is not necessarily a
separate patch):

1) introducing phylink_prepare_resume(), which can be used by MAC
   drivers to ensure that the PHY is resumed prior to doing any
   re-initialisation work. This call is added to stmmac_resume().

2) moving phylink_resume() after all re-initialisation has completed,
   thereby ensuring that the hardware is ready to be enabled for
   packet reception/transmission.

3) with (1) and (2) addressed, the need for socfpga to have a private
   work-around is no longer necessary, so it is removed.

4) introducing phylink functions to block/unblock the receive clock-
   stop at the PHY. As these require PHY access over the MDIO bus,
   they can sleep, so are not suitable for atomic access.

5) the stmmac hardware requires the receive clock to be running for
   reset to complete. Depending on synthesis options, this requirement
   may also extend to writing various registers as well, e.g. setting
   the MAC address, writing some of the vlan registers, etc. Full
   details are in the databook.

   We add blocking/unblocking of the PHY receive clock-stop around
   parts of the main stmmac driver where we have a context that we
   can sleep. These are wrapped with the new phylink functions.

   However, depending on synthesis options, there could be other
   places where the net core calls the driver with a BH-disabled
   context where we can't sleep, and thus can't block the PHY from
   disabling its receive clock. These are documented with FIXME
   comments.

Given the last paragraph above, I am wondering whether a better
approach would be to ensure that receive clock-stop is always disabled
at the PHY with stmmac. From what I can see, implementations do not
document to this level of detail, which makes it difficult to tell
which registers require the receive clock to be running to behave
correctly.

This patch series has been tested on the Tegra194 Jetson Xavier NX
board kindly donated by NVidia, with two additional patches that are
pending in patchwork - the first is required to have EEE's LPI mode
passed through to the MAC on this platform to allow testing under
PHY clock-stop scenarios. The second is a bug fix for PHYLIB and
makes "eee off" functional, but should not affect this series.

All patches on top of net-next commit f749448ce9f1 ("Merge branch
'net-mlx5-hw-steering-cleanups'")

https://patchwork.kernel.org/project/netdevbpf/patch/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk/
https://patchwork.kernel.org/project/netdevbpf/patch/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk/
====================

Link: https://patch.msgid.link/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: block PHY RXC clock-stop

The DesignWare core requires the receive clock to be running during
certain operations. Ensure that we block PHY RXC clock-stop during
these operations.

This is a best-efforts change - not everywhere can be covered by this
because of net's core locking, which means we can't access the MDIO
bus to configure the PHY to disable RXC clock-stop in certain areas.
These are marked with FIXME comments.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: add functions to block/unblock rx clock stop

Some MACs require the PHY receive clock to be running to complete setup
actions. This may fail if the PHY has negotiated EEE, the MAC supports
receive clock stop, and the link has entered LPI state. Provide a pair
of APIs that MAC drivers can use to temporarily block the PHY disabling
the receive clock.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: socfpga: remove phy_resume() call

As the previous commit addressed DWGMAC resuming with a PHY in
suspended state, there is now no need for socfpga to work around
this. Remove this code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: address non-LPI resume failures properly

The Synopsys Designware GMAC core databook requires all clocks to be
active in order to complete software reset, which we perform during
resume.

However, IEEE 802.3 allows a PHY to stop its clocks when placed in
low-power mode, which happens when the system is suspended and WoL
is not enabled.

As an attempt to work around this, commit 36d18b5664ef ("net: stmmac:
start phylink instance before stmmac_hw_setup()") started phylink
early, but this has the side effect that the mac_link_up() method may
be called before or during the initialisation of GMAC hardware.

We also have the socfpga glue driver directly calling phy_resume()
also as an attempt to work around this.

In a previous commit, phylink_prepare_resume() has been introduced
to give MAC drivers a way to ensure that the PHY is resumed prior to
their initialisation of their MAC hardware. This commit adds the call,
and moves the phylink_resume() call back to where it should be before
the aforementioned commit.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: add phylink_prepare_resume()

When the system is suspended, the PHY may be placed in low-power mode
by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the
behaviour of the PHY in this state is implementation specific, and
the PHY is not required to meet the RX_CLK and TX_CLK requirements.
Essentially, this means that a PHY may stop the clocks that it is
generating while in power down state.

However, MACs exist which require the clocks from the PHY to be running
in order to properly resume. phylink_prepare_resume() provides them
with a way to clear the Power down bit early.

Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the
transmit and receive clocks meet the requirements after clearing the
power down bit.

Add a resume preparation function, which will ensure that the receive
clock from the PHY is appropriately configured while resuming.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sfc-devlink-flash-for-x4'

Edward Cree says:

====================
sfc: devlink flash for X4

Updates to support devlink flash on X4 NICs.
Patch #2 is needed for NVRAM_PARTITION_TYPE_AUTO, and patch #1 is
needed because the latest MCDI headers from firmware no longer
include MDIO read/write commands.

v1: https://lore.kernel.org/cover.1742223233.git.ecree.xilinx@gmail.com
====================

Link: https://patch.msgid.link/cover.1742493016.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: support X4 devlink flash

Unlike X2 and EF100, we do not attempt to parse the firmware file to
find an image within it; we simply hand the entire file to the MC,
which is responsible for understanding any container formats we might
use and validating that the firmware file is applicable to this NIC.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: update MCDI protocol headers

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: rip out MDIO support

Unlike Siena, no EF10 board ever had an external PHY, and consequently
MDIO handling isn't even built into the firmware. Since Siena has
been split out into its own driver, the MDIO code can be deleted from
the sfc driver.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: reorganize IP MIB values (II)

Commit 14a196807482 ("net: reorganize IP MIB values") changed
MIB values to group hot fields together.

Since then 5 new fields have been added without caring about
data locality.

This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS,
IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS
to the hot portion of per-cpu data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: avoid atomic operations on sk->sk_rmem_alloc

TCP uses generic skb_set_owner_r() and sock_rfree()
for received packets, with socket lock being owned.

Switch to private versions, avoiding two atomic operations
per packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250320121604.3342831-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'nexthop-convert-rtm_-new-del-nexthop-to-per-netns-rtnl'

Kuniyuki Iwashima says:

====================
nexthop: Convert RTM_{NEW,DEL}NEXTHOP to per-netns RTNL.

Patch 1 - 5 move some validation for RTM_NEWNEXTHOP so that it can be
called without RTNL.

Patch 6 & 7 converts RTM_NEWNEXTHOP and RTM_DELNEXTHOP to per-netns RTNL.

Note that RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET are not touched in
this series.

rtm_get_nexthop() can be easily converted to RCU, but rtm_dump_nexthop()
needs more work due to the left-to-right rbtree walk, which looks prone
to node deletion and tree rotation without a retry mechanism.

v1: https://lore.kernel.org/netdev/20250318233240.53946-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20250319230743.65267-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Convert RTM_DELNEXTHOP to per-netns RTNL.

In rtm_del_nexthop(), only nexthop_find_by_id() and remove_nexthop()
require RTNL as they touch net->nexthop.rb_root.

Let's move RTNL down as rtnl_net_lock() before nexthop_find_by_id().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Convert RTM_NEWNEXTHOP to per-netns RTNL.

If we pass false to the rtnl_held param of lwtunnel_valid_encap_type(),
we can move RTNL down before rtm_to_nh_config_rtnl().

Let's use rtnl_net_lock() in rtm_new_nexthop().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Remove redundant group len check in nexthop_create_group().

The number of NHA_GROUP entries is guaranteed to be non-zero in
nh_check_attr_group().

Let's remove the redundant check in nexthop_create_group().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Check NLM_F_REPLACE and NHA_ID in rtm_new_nexthop().

nexthop_add() checks if NLM_F_REPLACE is specified without
non-zero NHA_ID, which does not require RTNL.

Let's move the check to rtm_new_nexthop().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Move NHA_OIF validation to rtm_to_nh_config_rtnl().

NHA_OIF needs to look up a device by __dev_get_by_index(),
which requires RTNL.

Let's move NHA_OIF validation to rtm_to_nh_config_rtnl().

Note that the proceeding checks made the original !cfg->nh_fdb
check redundant.

  NHA_FDB is set           -> NHA_OIF cannot be set
  NHA_FDB is set but false -> NHA_OIF must be set
  NHA_FDB is not set       -> NHA_OIF must be set

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Split nh_check_attr_group().

We will push RTNL down to rtm_new_nexthop(), and then we
want to move non-RTNL operations out of the scope.

nh_check_attr_group() validates NHA_GROUP attributes, and
nexthop_find_by_id() and some validation requires RTNL.

Let's factorise such parts as nh_check_attr_group_rtnl()
and call it from rtm_to_nh_config_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Move nlmsg_parse() in rtm_to_nh_config() to rtm_new_nexthop().

We will split rtm_to_nh_config() into non-RTNL and RTNL parts,
and then the latter also needs tb.

As a prep, let's move nlmsg_parse() to rtm_new_nexthop().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250319230743.65267-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix _DEVADD() and _DEVUPD() macros

ip6_rcv_core() is using:

__IP6_ADD_STATS(net, idev,
IPSTATS_MIB_NOECTPKTS +
(ipv6_get_dsfield(hdr) & INET_ECN_MASK),
max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));

This is currently evaluating both expressions twice.

Fix _DEVADD() and _DEVUPD() macros to evaluate their arguments once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250319212516.2385451-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-misc-enhancements-2025-03-19'

Tariq Toukan says:

====================
mlx5 misc enhancements 2025-03-19

This series introduces multiple small misc enhancements
from the team to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/1742392983-153050-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, Don't offload CT commit if it's the last action

For CT action with commit argument, it's usually followed by the
forward action, either to the output netdev or next chain. The default
behavior for software is to drop by setting action attribute to
TC_ACT_SHOT instead of TC_ACT_PIPE if it's the last action. But driver
can't handle it, so block the offload for such case.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: CT: Filter legacy rules that are unrelated to nic

In nic mode CT setup where we do hairpin between the two
nics, both nics register to the same flow table (per zone),
and try to offload all rules on it.

Instead, filter the rules that originated from the relevant nic
(so only one side is offloaded for each nic).

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Update pfnum retrieval for devlink port attributes

Align mlx5 driver usage of 'pfnum' with the documentation clarification
introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes
description").

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: fw reset, check bridge accessibility at earlier stage

Currently, mlx5_is_reset_now_capable() checks whether the pci bridge is
accessible only on bridge hot plug capability check. If the pci bridge
is not accessible, reset now will fail regardless of bridge hotplug
capability. Move this check to function mlx5_is_reset_now_capable()
which, in such case, aborts the reset and does so in the request phase
instead of the reset now phase.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lag, use port selection tables when available

As queue affinity is being deprecated and will no longer be supported
in the future, Always check for the presence of the port selection
namespace. When available, leverage it to distribute traffic
across the physical ports via steering, ensuring compatibility with
future NICs.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TX, Utilize WQ fragments edge for multi-packet WQEs

For simplicity reasons, the driver avoids crossing work queue fragment
boundaries within the same TX WQE (Work-Queue Element). Until today, as
the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not
known in advance, the driver pre-prepared contiguous memory for the
largest possible WQE. For this, when getting too close to the fragment
edge, having no room for the largest WQE possible, the driver was
filling the fragment remainder with NOP descriptors, aligning the next
descriptor to the beginning of the next fragment.

Generating and handling these NOPs wastes resources, like: CPU cycles,
work-queue entries fetched to the device, and PCI bandwidth.

In this patch, we replace this NOPs filling mechanism in the TX MPWQE
flow. Instead, we utilize the remaining entries of the fragment with a
TX MPWQE. If this room turns out to be too small, we simply open an
additional descriptor starting at the beginning of the next fragment.

Performance benchmark:
uperf test, single server against 3 clients.
TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write".
Bottleneck is in inbound PCI bandwidth (device POV).

+---------------+------------+------------+--------+
|               | Before     | After      |        |
+---------------+------------+------------+--------+
| BW            | 117.4 Gbps | 121.1 Gbps | +3.1%  |
+---------------+------------+------------+--------+
| tx_packets    | 15 M/sec   | 15.5 M/sec | +3.3%  |
+---------------+------------+------------+--------+
| tx_nops       | 3  M/sec   | 0          | -100%  |
+---------------+------------+------------+--------+

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dql: Fix dql->limit value when reset.

Executing dql_reset after setting a non-zero value for limit_min can
lead to an unreasonable situation where dql->limit is less than
dql->limit_min.

For instance, after setting
/sys/class/net/eth*/queues/tx-0/byte_queue_limits/limit_min,
an ifconfig down/up operation might cause the ethernet driver to call
netdev_tx_reset_queue, which in turn invokes dql_reset.

In this case, dql->limit is reset to 0 while dql->limit_min remains
non-zero value, which is unexpected. The limit should always be
greater than or equal to limit_min.

Signed-off-by: Jing Su <jingsusu@didiglobal.com>
Link: https://patch.msgid.link/Z9qHD1s/NEuQBdgH@pilot-ThinkCentre-M930t-N000
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-mixed-select-polling-mode-for-tcp-ao-tests'

Dmitry Safonov via says:

====================
selftests/net: Mixed select()+polling mode for TCP-AO tests

Should fix flaky tcp-ao/connect-deny-ipv6 test.

v1: https://lore.kernel.org/20250312-tcp-ao-selftests-polling-v1-0-72a642b855d5@gmail.com
====================

Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-0-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Drop timeout argument from test_client_verify()

It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test,
that is more paranoia-long timeout rather than based on requirements.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Delete timeout from test_connect_socket()

Unused: it's always either the default timeout or asynchronous
connect().

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print the testing side in unsigned-md5

As both client and server print the same test name on failure or pass,
add "[server]" so that it's more obvious from a log which side printed
"ok" or "not ok".

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Add mixed select()+polling mode to TCP-AO tests

Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one,
TEST_RETRANSMIT_SEC is used for operations that are expected to succeed
in order for a test to pass. It is usually not consumed and exists only
to avoid indefinite test run if the operation didn't complete.
The second one, TEST_RETRANSMIT_SEC exists for the tests that checking
operations, that are expected to fail/timeout. It is shorter as it is
fully consumed, with an expectation that if operation didn't succeed
during that period, it will timeout. And the related test that expects
the timeout is passing. The actual operation failure is then
cross-verified by other means like counters checks.

The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
initial TCP timeout. So, in case the initial segment gets lost (quite
unlikely on local veth interface between two net namespaces, yet happens
in slow VMs), the retransmission never happens and as a result, the test
is not actually testing the functionality. Which in the end fails
counters checks.

As I want tcp_ao selftests to be fast and finishing in a reasonable
amount of time on manual run, I didn't consider increasing
TEST_RETRANSMIT_SEC.

Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
to make the initial TCP timeout shorter. But as it's not a socket bpf
attached thing, but sock_ops (attaches to cgroups), the selftests would
have to use libbpf, which I wanted to avoid if not absolutely required.

Instead, use a mixed select() and counters polling mode with the longer
TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
actually not only allows losing segments and succeeding after
the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
the tests expecting timeout/failure pass faster.

The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
"wrong snd id", which checks for no key on SYN-ACK for which there is no
counter in the kernel (see tcp_make_synack()). Yet it can be speed up
by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).

Fixes: ed9d09b309b1 ("selftests/net: Add a test for TCP-AO keys matching")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Fetch and check TCP-MD5 counters

There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests
that can benefit from checking the related counters, not only from
validating operations timeouts.

It also prepares the code for introduction of mixed select()+poll mode,
see the follow-up patches.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Provide tcp-ao counters comparison helper

Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and
test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they
are asserts, rather than just compare functions.

Provide test_cmp_counters() helper, that's going to be used to compare
ao_info and netns counters as a stop condition for polling the sockets.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print TCP flags in more common format

Before:
># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

After:
># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

For the history, I think the initial format was to emphasize the absence
of flags as well as their presence (!R meant no RST flag). But looking
again, it's just unreadable and hard to understand.
Make it the standard/expected one.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: devlink: add missing board-serial-number

Add a missing attribute of board serial number.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250320085947.103419-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-add-missing-metadata-support-for-some-xdp-drvs'

Lorenzo Bianconi says:

====================
net: xdp: Add missing metadata support for some xdp drvs

Introduce missing metadata support for some xdp drivers setting metadata
size building the skb from xdp_buff.
Please note most of the drivers are just compile tested.

v1: https://lore.kernel.org/20250311-mvneta-xdp-meta-v1-0-36cf1c99790e@kernel.org
====================

Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-0-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: cpsw: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in cpsw/cpsw_new
drivers. ti cpsw and cpsw_new drivers set xdp headroom at least to
CPSW_HEADROOM_NA:

CPSW_HEADROOM_NA max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-7-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mana driver.
mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is
large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mediatek: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mediatek driver.
mtk_eth_soc driver sets xdp headroom to XDP_PACKET_HEADROOM so the
headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-5-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: octeontx2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in octeontx2 driver.
octeontx2 driver sets xdp headroom to OTX2_HEAD_ROOM

OTX2_HEAD_ROOM OTX2_ALIGN
OTX2_ALIGN 128

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-4-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netsec: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in netsec driver.
netsec driver sets xdp headroom to NETSEC_RXBUF_HEADROOM:

NETSEC_RXBUF_HEADROOM max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-3-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvpp2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvpp2 driver
mvpp2 driver sets xdp headroom to:

MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM

where

MVPP2_MH_SIZE 2
MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-2-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvneta driver
mvneta sets xdp headroom to:

MVNETA_MH_SIZE + MVNETA_SKB_HEADROOM

where

MVNETA_MH_SIZE 2
MVNETA_SKB_HEADROOM max(NET_SKB_PAD, XDP_PACKET_HEADROOM)

so the headroom is large enough to contain xdp_frame and xdp metadata.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-1-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tulip: avoid unused variable warning

There is an effort to achieve W=1 kernel builds without warnings.
As part of that effort Helge Deller highlighted the following warnings
in the tulip driver when compiling with W=1 and CONFIG_TULIP_MWI=n:

.../tulip_core.c: In function ‘tulip_init_one’:
.../tulip_core.c:1309:22: warning: variable ‘force_csr0’ set but not used

This patch addresses that problem using IS_ENABLED(). This approach has
the added benefit of reducing conditionally compiled code. And thus
increasing compile coverage. E.g. for allmodconfig builds which enable
CONFIG_TULIP_MWI.

Compile tested only.
No run-time effect intended.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318-tulip-w1-v3-1-a813fadd164d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'af_unix-clean-up-headers'

Kuniyuki Iwashima says:

====================
af_unix: Clean up headers.

AF_UNIX files include many unnecessary headers (netdevice.h and
rtnetlink.h, etc), and this series cleans them up.

Note that there are still some headers included indirectly and
modifying them triggers rebuild, which seems mostly inevitable. [0]

  $ python3 include_graph.py net/unix/garbage.c linux/rtnetlink.h linux/netdevice.h
  ...
  include/net/af_unix.h
  | include/linux/net.h
  | | include/linux/once.h
  | | include/linux/sockptr.h
  | | include/uapi/linux/net.h
  | include/net/sock.h
  | | include/linux/netdevice.h   <---
  ...
  | | include/net/dst.h
  | | | include/linux/rtnetlink.h <---

[0]: https://gist.github.com/q2ven/9c5897f11a493145829029c0bfb364d0
====================

Link: https://patch.msgid.link/20250318034934.86708-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Clean up #include under net/unix/.

net/unix/*.c include many unnecessary header files (rtnetlink.h,
netdevice.h, etc).

Let's clean them up.

af_unix.c:

  +uapi/linux/sockios.h   : Only exist under include/uapi
  +uapi/linux/termios.h   : Only exist under include/uapi

  -linux/freezer.h        : No longer use freezable_schedule_timeout()
  -linux/in.h             : No ipv4_is_XXX() etc
  -linux/module.h         : No longer support CONFIG_UNIX=m
  -linux/netdevice.h      : No dev used
  -linux/rtnetlink.h      : Not part of rtnetlink API
  -linux/signal.h         : signal_pending() is defined in sched/signal.h
  -linux/stat.h           : No struct stat used
  -net/checksum.h         : CHECKSUM_UNNECESSARY is defined in skbuff.h

diag.c:

  +linux/dcache.h         : struct dentry in sk_diag_dump_vfs()
  +linux/user_namespace.h : struct user_namespace in sk_diag_dump_uid()
  +uapi/linux/unix_diag.h : Only exist under include/uapi/

garbage.c:

  +linux/list.h           : struct unix_{vertex,edge}, etc
  +linux/workqueue.h      : DECLARE_WORK(unix_gc_work, ...)

  -linux/file.h           : No fget() etc
  -linux/kernel.h         : No cond_resched() etc
  -linux/netdevice.h      : No dev used
  -linux/proc_fs.h        : No procfs provided
  -linux/string.h         : No memcpy(), kmemdup(), etc

sysctl_net_unix.c:

  +linux/string.h         : kmemdup()
  +net/net_namespace.h    : struct net, net_eq()

  -linux/mm.h             : slab.h is enough

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250318034934.86708-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>