www.infradead.org Git - users/willy/xarray.git/log
2 months ago net: When removing nexthops, don't call synchronize_net if it is not necessary
Christoph Paasch [Sat, 16 Aug 2025 23:12:49 +0000 (16:12 -0700)]
net: When removing nexthops, don't call synchronize_net if it is not necessary

When removing a nexthop, commit
90f33bffa382 ("nexthops: don't modify published nexthop groups") added a
call to synchronize_rcu() (later changed to _net()) to make sure
everyone sees the new nexthop-group before the rtnl-lock is released.

When one wants to delete a large number of groups and nexthops, it is
fastest to first flush the groups (ip nexthop flush groups) and then
flush the nexthops themselves (ip -6 nexthop flush), because that way the
groups don't need to be rebalanced.

However, `ip -6 nexthop flush` will still take a long time if there is
a very large number of nexthops because of the call to
synchronize_net(). Now, if there are no more groups, there is no point
in calling synchronize_net(). So, let's skip that entirely by checking
if nh->grp_list is empty.
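
As a rough illustration of the idea, and not the kernel code itself, the
sketch below models removal with a hypothetical remove_nexthop() helper
that only pays for a grace-period wait when the nexthop still belongs to
a group. The names Nexthop, remove_nexthop and the sync callback are
assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Nexthop:
    nh_id: int
    # Groups that still reference this nexthop (stand-in for nh->grp_list).
    grp_list: list = field(default_factory=list)


def remove_nexthop(nh, sync):
    """Remove nh; wait for readers only if a group had to be updated."""
    if not nh.grp_list:
        return  # no group was republished, so there is nothing to wait for
    nh.grp_list.clear()  # stand-in for removing nh from every group
    sync()  # stand-in for synchronize_net()


sync_calls = []
nexthops = [Nexthop(i) for i in range(5)]
nexthops[0].grp_list.append("group-1")  # only one nexthop is still grouped
for nh in nexthops:
    remove_nexthop(nh, lambda: sync_calls.append(1))
print(len(sync_calls))
```

With the groups flushed first, every grp_list is empty and the wait is
skipped for each nexthop.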

This gives us a nice speedup:

BEFORE:
=======

$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 2097152 nexthops

real 1m45.345s
user 0m0.001s
sys 0m0.005s

$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 4194304 nexthops

real 3m10.430s
user 0m0.002s
sys 0m0.004s

AFTER:
======

$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 2097152 nexthops

real 0m17.545s
user 0m0.003s
sys 0m0.003s

$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 4194304 nexthops

real 0m35.823s
user 0m0.002s
sys 0m0.004s
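
For reference, the speedup implied by the wall-clock times above can be
worked out with a few lines of Python. Only the "real" values from the
runs above are used; the helper name to_seconds is made up for this
sketch.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.YYYs reading into seconds."""
    return minutes * 60 + seconds


# nexthop count -> (real time before, real time after), taken from the runs above
runs = {
    2097152: (to_seconds(1, 45.345), to_seconds(0, 17.545)),
    4194304: (to_seconds(3, 10.430), to_seconds(0, 35.823)),
}

speedups = {n: before / after for n, (before, after) in runs.items()}
for n, s in speedups.items():
    print(f"{n} nexthops: {s:.1f}x faster")
```

That is roughly a 6x improvement for 2M nexthops and a bit over 5x for 4M.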

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250816-nexthop_dump-v2-2-491da3462118@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: Make nexthop-dumps scale linearly with the number of nexthops
Christoph Paasch [Sat, 16 Aug 2025 23:12:48 +0000 (16:12 -0700)]
net: Make nexthop-dumps scale linearly with the number of nexthops

When we have a (very) large number of nexthops, they do not fit within a
single message. rtm_dump_walk_nexthops() thus will be called repeatedly
and ctx->idx is used to avoid dumping the same nexthops again.

Today we avoid dumping the same nexthops again by walking the entire
nexthop rb-tree from the left-most node until we find a node whose id
is >= s_idx. That does not scale well.

Instead, descend directly through the tree to the nexthop that should be
dumped next (the first one whose nh_id >= s_idx). This allows us to find
the relevant node in O(log(n)).
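
The descent can be sketched in Python on a plain balanced binary search
tree, used here only as a stand-in for the kernel's rb-tree. Node,
build_balanced and first_at_or_after are hypothetical names for this
illustration, not the functions touched by the patch.

```python
class Node:
    def __init__(self, nh_id):
        self.nh_id = nh_id
        self.left = None
        self.right = None


def build_balanced(sorted_ids):
    """Build a balanced BST from sorted ids (stand-in for the rb-tree)."""
    if not sorted_ids:
        return None
    mid = len(sorted_ids) // 2
    node = Node(sorted_ids[mid])
    node.left = build_balanced(sorted_ids[:mid])
    node.right = build_balanced(sorted_ids[mid + 1:])
    return node


def first_at_or_after(root, s_idx):
    """Return (node, steps) for the node with the smallest nh_id >= s_idx."""
    best, steps, node = None, 0, root
    while node is not None:
        steps += 1
        if node.nh_id >= s_idx:
            best = node        # candidate; a smaller match may be on the left
            node = node.left
        else:
            node = node.right  # this node and its left subtree are too small
    return best, steps


root = build_balanced(list(range(1, 1025)))
node, steps = first_at_or_after(root, 700)
print(node.nh_id, steps)
```

Each resumed dump then costs one root-to-leaf descent instead of a walk
over every node that was already dumped.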

We have quite a nice improvement with this:

Before:
=======

--> ~1M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
1050624

real 0m21.080s
user 0m0.666s
sys 0m20.384s

--> ~2M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
2101248

real 1m51.649s
user 0m1.540s
sys 1m49.908s

After:
======

--> ~1M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
1050624

real 0m1.157s
user 0m0.926s
sys 0m0.259s

--> ~2M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
2101248

real 0m2.763s
user 0m2.042s
sys 0m0.776s
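
A quick check of how the "real" times above grow when the number of
nexthops doubles; this uses only the figures reported above, and the
helper names are made up for the sketch.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.YYYs reading into seconds."""
    return minutes * 60 + seconds


# nexthop count -> real time, taken from the runs above
before = {1050624: to_seconds(0, 21.080), 2101248: to_seconds(1, 51.649)}
after = {1050624: to_seconds(0, 1.157), 2101248: to_seconds(0, 2.763)}


def growth(times):
    """Ratio of the larger run's time to the smaller run's time."""
    small, large = sorted(times)
    return times[large] / times[small]


print(f"before: {growth(before):.1f}x, after: {growth(after):.1f}x")
```

Doubling the nexthops cost about 5.3x the time before the change and
about 2.4x after it, which is much closer to the 2x a linear dump would
give.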

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250816-nexthop_dump-v2-1-491da3462118@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago selftests: drv-net: ncdevmem: make configure_channels() support combined channels
Jakub Kicinski [Fri, 15 Aug 2025 23:15:13 +0000 (16:15 -0700)]
selftests: drv-net: ncdevmem: make configure_channels() support combined channels

ncdevmem tests that the kernel correctly rejects attempts
to deactivate queues with MPs bound.

Make the configure_channels() test support combined channels.
Currently it tries to set the queue counts to rx N tx N-1,
which only makes sense for devices which have IRQs per ring
type. Most modern devices use combined IRQs/channels with
both Rx and Tx queues. Since the math is total Rx == combined+Rx,
setting Rx when combined is non-zero will increase the total
queue count, not decrease it as the test intends.
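
The channel math can be illustrated with a small Python sketch. The
device configuration below is hypothetical, and the last step only shows
one way the total could be lowered; it is not a description of what the
updated test does.

```python
def total_rx(combined, rx):
    """Total Rx queues as described above: combined + Rx-only."""
    return combined + rx


# Hypothetical device that exposes only combined channels.
current = {"combined": 2, "rx": 0, "tx": 0}

before = total_rx(current["combined"], current["rx"])
# Old test behaviour: set the Rx-only count without touching combined.
after_old = total_rx(current["combined"], 1)
# Lowering the combined count is what actually shrinks the total.
after_shrunk = total_rx(current["combined"] - 1, current["rx"])
print(before, after_old, after_shrunk)
```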

Note that the test would previously also try to set the Tx
ring count to Rx - 1, for some reason, which would be 0
if the device has only 2 queues configured.

With this change (device with 2 queues):
  setting channel count rx:1 tx:1
  YNL set channels: Kernel error: 'requested channel counts are too low for existing memory provider setting (2)'

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250815231513.381652-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago selftests: drv-net: tso: increase the retransmit threshold
Jakub Kicinski [Fri, 15 Aug 2025 22:41:00 +0000 (15:41 -0700)]
selftests: drv-net: tso: increase the retransmit threshold

We see quite a few flakes during the TSO test against virtualized
devices in NIPA. There are often 10-30 retransmissions during the
test, sometimes as many as 100. Set the retransmission threshold
at 1/4 of the wire frame target.
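
A minimal sketch of such a check, assuming integer division and an
inclusive comparison (neither is stated above). The function names and
the example target of 1000 frames are hypothetical.

```python
def retransmit_threshold(wire_frame_target):
    """Allow retransmissions up to a quarter of the wire frame target."""
    return wire_frame_target // 4


def tso_run_ok(retransmits, wire_frame_target):
    """True when the observed retransmissions stay within the threshold."""
    return retransmits <= retransmit_threshold(wire_frame_target)


target = 1000  # hypothetical wire frame target
print(retransmit_threshold(target), tso_run_ok(100, target), tso_run_ok(300, target))
```

With a target in that range, the 10-30 and occasional 100 retransmissions
mentioned above no longer fail the run.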

Link: https://patch.msgid.link/20250815224100.363438-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy
Chaoyi Chen [Fri, 15 Aug 2025 02:35:15 +0000 (10:35 +0800)]
net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy

For an external PHY, clk_phy should be optional, but some external PHYs
need the clock input from clk_phy. This patch adds support for setting
clk_phy for an external PHY.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Link: https://patch.msgid.link/20250815023515.114-1-kernel@airkyi.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago selftests: drv-net: test the napi init state
Jakub Kicinski [Fri, 15 Aug 2025 01:33:14 +0000 (18:33 -0700)]
selftests: drv-net: test the napi init state

Test that the threaded state (in the persistent NAPI config) gets updated
even when the NAPI with the given ID is not allocated at the time.

This test is validating commit ccba9f6baa90 ("net: update NAPI threaded
config even for disabled NAPIs").

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250815013314.2237512-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: mana: Use page pool fragments for RX buffers instead of full pages to improve...
Dipayaan Roy [Thu, 14 Aug 2025 14:04:10 +0000 (07:04 -0700)]
net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.

This patch enhances RX buffer handling in the mana driver by allocating
pages from a page pool and slicing them into MTU-sized fragments, rather
than dedicating a full page per packet. This approach is especially
beneficial on systems with large base page sizes like 64KB.

Key improvements:

- Proper integration of page pool for RX buffer allocations.
- MTU-sized buffer slicing to improve memory utilization.
- Reduced overall per-Rx-queue memory footprint.
- Automatic fallback to full-page buffers when:
   * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
   * The XDP path is active, to avoid complexities with fragment reuse.

Testing on VMs with 64KB pages shows around 200% throughput improvement.
Memory efficiency is significantly improved due to reduced wastage in page
allocations. For example, we are now able to fit 35 RX buffers in a single
64KB page for an MTU of 1500, instead of 1 RX buffer per page previously.
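
The fallback rule and the buffers-per-page arithmetic can be sketched as
follows. The function names are hypothetical, and FRAG_SIZE is an assumed
per-buffer size (MTU plus headroom and padding) chosen only because it
reproduces the 35-per-page figure; the real value is not given above.

```python
def use_fragments(mtu, page_size, xdp_active):
    """Fall back to full pages for jumbo MTUs or when XDP is active."""
    if xdp_active:
        return False
    if mtu > page_size // 2:
        return False
    return True


def buffers_per_page(page_size, frag_size):
    """How many fixed-size fragments fit in one page."""
    return page_size // frag_size


PAGE_64K = 64 * 1024
FRAG_SIZE = 1856  # assumption for MTU 1500; not taken from the driver

print(use_fragments(1500, PAGE_64K, xdp_active=False))
print(buffers_per_page(PAGE_64K, FRAG_SIZE))
```

Any per-buffer size from 1821 to 1872 bytes gives the same count of 35 on
a 64KB page.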

Tested:

- iperf3, iperf2, and nttcp benchmarks.
- Jumbo frames with MTU 9000.
- Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for
  testing the XDP path in driver.
- Memory leak detection (kmemleak).
- Driver load/unload, reboot, and stress scenarios.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: airoha: Add wlan flowtable TX offload
Lorenzo Bianconi [Thu, 14 Aug 2025 07:51:16 +0000 (09:51 +0200)]
net: airoha: Add wlan flowtable TX offload

Introduce support to offload the traffic received on the ethernet NIC
and forwarded to the wireless one using HW Packet Processor Engine (PPE)
capabilities.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250814-airoha-en7581-wlan-tx-offload-v1-1-72e0a312003e@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago Merge branch 'net-macb-add-taprio-traffic-scheduling-support'
Paolo Abeni [Tue, 19 Aug 2025 10:13:05 +0000 (12:13 +0200)]
Merge branch 'net-macb-add-taprio-traffic-scheduling-support'

Vineeth Karumanchi says:

====================
net: macb: Add TAPRIO traffic scheduling support

Implement Time-Aware Traffic Scheduling (TAPRIO) offload support
for Cadence MACB/GEM ethernet controllers to enable IEEE 802.1Qbv
compliant time-sensitive networking (TSN) capabilities.

Key features implemented:
- Complete TAPRIO qdisc offload infrastructure with TC_SETUP_QDISC_TAPRIO
- Hardware-accelerated time-based gate control for multiple queues
- Enhanced Scheduled Traffic (ENST) register configuration and management
- Gate state scheduling with configurable start times, on/off intervals
- Support for cycle-time based traffic scheduling with validation
- Hardware capability detection via MACB_CAPS_QBV flag
- Robust error handling and parameter validation
- Queue-specific timing register programming
  (ENST_START_TIME, ENST_ON_TIME, ENST_OFF_TIME)

Changes include:
- Add enst_ns_to_hw_units(): Converts nanoseconds to hardware units
- Add enst_max_hw_interval(): Returns max interval for given speed
- Add macb_taprio_setup_replace() for TAPRIO configuration
- Add macb_taprio_destroy() for cleanup and reset
- Add macb_setup_tc() as TC offload entry point
- Enable NETIF_F_HW_TC feature for QBV-capable hardware
- Add ENST register offsets to queue configuration

The implementation validates timing constraints against hardware limits,
supports per-queue gate mask configuration, and provides comprehensive
logging for debugging and monitoring. Hardware registers are programmed
atomically with proper locking to ensure consistent state.

Tested on Xilinx Versal platforms with QBV-capable MACB controllers.

Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
====================

Link: https://patch.msgid.link/20250814071058.3062453-1-vineeth.karumanchi@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: macb: Add capability-based QBV detection and Versal support
Vineeth Karumanchi [Thu, 14 Aug 2025 07:10:58 +0000 (12:40 +0530)]
net: macb: Add capability-based QBV detection and Versal support

The 'exclude_qbv' bit in the designcfg_debug1 register varies across
MACB/GEM IP revisions, making direct probing unreliable for detecting
QBV support. This patch introduces a capability-based approach for
consistent QBV feature identification across the IP family.

Platform support updates:
- Establish foundation for QBV detection in TAPRIO implementation
- Enable MACB_CAPS_QBV for Xilinx Versal platform configuration
- Fix capability line wrapping, ensuring code stays within 80 columns

Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Link: https://patch.msgid.link/20250814071058.3062453-3-vineeth.karumanchi@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: macb: Add TAPRIO traffic scheduling support
Vineeth Karumanchi [Thu, 14 Aug 2025 07:10:57 +0000 (12:40 +0530)]
net: macb: Add TAPRIO traffic scheduling support

Implement Time-Aware Traffic Scheduling (TAPRIO) offload support
for Cadence MACB/GEM ethernet controllers to enable IEEE 802.1Qbv
compliant time-sensitive networking (TSN) capabilities.

Key Features:

1. Enhanced Scheduled Traffic (ENST) Register Management
   - Per-queue ENST registers: ENST_START_TIME, ENST_ON_TIME, ENST_OFF_TIME
   - Centralized control via ENST_CONTROL for gate enable/disable
   - Infrastructure enhancements:
     * Extended macb_queue structure with ENST timing control registers
     * Mapped ENST register offsets into queue management framework
     * Introduced macb_queue_enst_config for per-entry TC configuration
   - Timing conversion utility:
     * enst_ns_to_hw_units(): Converts nanoseconds to hardware units
     * Timing values are programmed as hardware units based on link speed
     * Conversion formula: time_bytes = time_ns / divisor
     * Speed-specific divisors: 1Gbps=8, 100Mbps=80, 10Mbps=800
   - Hardware limit utility:
     * enst_max_hw_interval(): Returns max interval for given speed

2. TAPRIO Configuration via "tc qdisc replace"
   - macb_taprio_setup_replace(): Configures TAPRIO hardware offload
   - Parameter validation checks performed:
     * TC entry limit validation against available hardware queues
     * Base time non-negativity enforcement
     * Speed-adaptive timing constraint verification
     * Cycle time vs. total gate time consistency checks
     * Single-queue gate mask enforcement per scheduling entry
   - Programming sequence:
     * GEM doesn't support changing ENST registers if ENST is enabled,
       hence disable ENST before programming
     * Atomic timing register configuration (START_TIME, ON_TIME, OFF_TIME)
     * Enable queues via ENST_CONTROL

3. TAPRIO Cleanup via "tc qdisc destroy"
   - macb_taprio_destroy(): Safely removes TAPRIO configuration
   - Restores default queue behavior
   - Cleanup steps:
     * Reset TC state
     * Disable ENST
     * Clear timing registers
     * Ensure atomic updates with locking

4. Traffic Control Offload Infrastructure
   - macb_setup_taprio(): TAPRIO command dispatcher
     * Verifies hardware support
     * Handles runtime suspend state
   - macb_setup_tc(): TC_SETUP_QDISC_TAPRIO entry point
   - Supports REPLACE and DESTROY operations
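
The timing conversion from item 1 can be sketched in Python using the
formula and divisors listed there. This is an illustration only: the
dictionary name, the use of integer division and the error handling are
assumptions, and the real enst_ns_to_hw_units() is kernel C code.

```python
# Link speed in Mbps -> divisor, as listed in item 1 above.
ENST_DIVISOR = {1000: 8, 100: 80, 10: 800}


def enst_ns_to_hw_units(time_ns, speed_mbps):
    """Convert nanoseconds to hardware units: time_bytes = time_ns / divisor."""
    try:
        divisor = ENST_DIVISOR[speed_mbps]
    except KeyError:
        raise ValueError(f"unsupported link speed: {speed_mbps} Mbps")
    return time_ns // divisor  # integer division is an assumption


for speed in (1000, 100, 10):
    print(speed, enst_ns_to_hw_units(8000, speed))
```

The divisors match the time one byte takes on the wire: 8 ns at 1Gbps,
80 ns at 100Mbps and 800 ns at 10Mbps.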

Tested on Xilinx Versal platforms with QBV-capable MACB controllers.

Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Link: https://patch.msgid.link/20250814071058.3062453-2-vineeth.karumanchi@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago Merge branch 'eth-fbnic-add-xdp-support-for-fbnic'
Paolo Abeni [Tue, 19 Aug 2025 08:51:20 +0000 (10:51 +0200)]
Merge branch 'eth-fbnic-add-xdp-support-for-fbnic'

Mohsin Bashir says:

====================
eth: fbnic: Add XDP support for fbnic

This patch series introduces basic XDP support for fbnic. To enable this,
it also includes preparatory changes such as making the HDS threshold
configurable via ethtool, updating headroom for fbnic, tracking
frag state in shinfo, and prefetching the first cacheline of data.

V3: https://lore.kernel.org/netdev/20250812220150.161848-1-mohsin.bashr@gmail.com/
V2: https://lore.kernel.org/netdev/20250811211338.857992-1-mohsin.bashr@gmail.com/
V1: https://lore.kernel.org/netdev/20250723145926.4120434-1-mohsin.bashr@gmail.com/
====================

Link: https://patch.msgid.link/20250813221319.3367670-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Report XDP stats via ethtool
Mohsin Bashir [Wed, 13 Aug 2025 22:13:19 +0000 (15:13 -0700)]
eth: fbnic: Report XDP stats via ethtool

Add support to collect XDP stats via ethtool API. We record
packets and bytes sent, and packets dropped on the XDP_TX path.

ethtool -S eth0 | grep xdp | grep -v "0"
     xdp_tx_queue_13_packets: 2
     xdp_tx_queue_13_bytes: 16126

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-10-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Collect packet statistics for XDP
Mohsin Bashir [Wed, 13 Aug 2025 22:13:18 +0000 (15:13 -0700)]
eth: fbnic: Collect packet statistics for XDP

Add support for XDP statistics collection and reporting via rtnl_link
and netdev_queue API.

For XDP programs without frags support, fbnic requires MTU to be less
than the HDS threshold. If an over-sized frame is received, the frame
is dropped and recorded as rx_length_errors reported via ip stats to
highlight that this is an error.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-9-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Add support for XDP_TX action
Mohsin Bashir [Wed, 13 Aug 2025 22:13:17 +0000 (15:13 -0700)]
eth: fbnic: Add support for XDP_TX action

Add support for XDP_TX action and cleaning the associated work.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-8-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Add support for XDP queues
Mohsin Bashir [Wed, 13 Aug 2025 22:13:16 +0000 (15:13 -0700)]
eth: fbnic: Add support for XDP queues

Add support for allocating XDP_TX queues and configuring ring support.
FBNIC has been designed with XDP support in mind. Each Tx queue has 2
submission queues and one completion queue, with the expectation that
one of the submission queues will be used by the stack, and the other
by XDP. XDP queues are populated by XDP_TX and start from index 128
in the TX queue array.
The support for XDP_TX is added in the next patch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-7-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Add XDP pass, drop, abort support
Mohsin Bashir [Wed, 13 Aug 2025 22:13:15 +0000 (15:13 -0700)]
eth: fbnic: Add XDP pass, drop, abort support

Add basic support for attaching an XDP program to the device and support
for PASS/DROP/ABORT actions. In fbnic, buffers are always mapped as
DMA_BIDIRECTIONAL.

The BPF program pointer can be read either on a per-packet basis or on a
per-NAPI poll basis. Both approaches are functionally equivalent in the
current code. Stick to per-packet as it limits the number of arguments we
need to pass around.

On the XDP hot path, check that packets with fragments are only allowed
when multi-buffer support is enabled for the XDP program. Ideally, this
check should not be necessary because ndo_bpf verifies that for XDP
programs without multi-buffer support, MTU is less than the hds_thresh.
However, the MTU currently does not enforce the receive size; enforcing
it would require cleaning up the data path and bouncing the link. For
practical reasons, prioritize the ability to enter and exit BPF mode with
different MTU sizes without requiring a full reconfig.

Testing:

Hook a simple XDP program that passes all the packets destined for a
specific port

iperf3 -c 192.168.1.10 -P 5 -p 12345
Connecting to host 192.168.1.10, port 12345
[  5] local 192.168.1.9 port 46702 connected to 192.168.1.10 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
- - - - - - - - - - - - - - - - - - - - - - - - -
[SUM]   1.00-2.00   sec  3.86 GBytes  33.2 Gbits/sec    0

XDP_DROP:
Hook an XDP program that drops packets destined for a specific port

 iperf3 -c 192.168.1.10 -P 5 -p 12345
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec    0       sender
[SUM]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec            receiver
iperf3: interrupt - the client has terminated

XDP with HDS:

- Validate XDP attachment failure when HDS is low
   ~] ethtool -G eth0 hds-thresh 512
   ~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp
   ~] Error: fbnic: MTU too high, or HDS threshold is too low for single
      buffer XDP.

- Validate successful XDP attachment when HDS threshold is appropriate
  ~] ethtool -G eth0 hds-thresh 1536
  ~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp

- Validate when the XDP program is attached, changing HDS thresh to a
  lower value fails
  ~] ethtool -G eth0 hds-thresh 512
  ~] netlink error: fbnic: Use higher HDS threshold or multi-buf capable
     program

- Validate HDS thresh does not matter when xdp frags support is
  available
  ~] ethtool -G eth0 hds-thresh 512
  ~] sudo ip link set eth0 xdpdrv obj xdp_pass_mb_12345.o sec xdp.frags
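
The attach rules exercised by the four cases above can be modelled
roughly as follows. This is a sketch, not the driver logic: the function
names are hypothetical, the MTU of 1500 is an assumption (the MTU is not
stated above), and the real check may also account for header overhead.

```python
def can_attach(mtu, hds_thresh, prog_has_frags):
    """Single-buffer programs need the MTU below the HDS threshold."""
    if prog_has_frags:
        return True  # multi-buf programs can handle split packets
    return mtu < hds_thresh


def can_set_hds_thresh(mtu, new_thresh, prog_attached, prog_has_frags):
    """Changing the threshold must not invalidate an attached program."""
    if not prog_attached:
        return True
    return can_attach(mtu, new_thresh, prog_has_frags)


MTU = 1500  # assumed; not given in the test output above

print(can_attach(MTU, 512, prog_has_frags=False))
print(can_attach(MTU, 1536, prog_has_frags=False))
print(can_set_hds_thresh(MTU, 512, prog_attached=True, prog_has_frags=False))
print(can_attach(MTU, 512, prog_has_frags=True))
```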

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Prefetch packet headers on Rx
Mohsin Bashir [Wed, 13 Aug 2025 22:13:14 +0000 (15:13 -0700)]
eth: fbnic: Prefetch packet headers on Rx

Issue a prefetch for the start of the buffer on Rx to try to avoid a
cache miss on packet headers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Use shinfo to track frags state on Rx
Mohsin Bashir [Wed, 13 Aug 2025 22:13:13 +0000 (15:13 -0700)]
eth: fbnic: Use shinfo to track frags state on Rx

Remove local fields that track frags state and instead store this
information directly in the shinfo struct. This change is necessary
because the current implementation can lead to inaccuracies in certain
scenarios, such as when using XDP multi-buff support. Specifically, the
XDP program may update nr_frags without updating the local variables,
resulting in an inconsistent state.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Update Headroom
Mohsin Bashir [Wed, 13 Aug 2025 22:13:12 +0000 (15:13 -0700)]
eth: fbnic: Update Headroom

Fbnic currently reserves a minimum of 64B headroom, but this is
insufficient for inserting additional headers (e.g., IPv6) via XDP, as
only 24 bytes are available for adjustment. To address this limitation,
increase the headroom to a larger value while ensuring better page use.
Although the resulting headroom (192B) is smaller than the recommended
value (256B), forcing the headroom to 256B would require aligning to
256B (as opposed to the current 128B), which can push the max headroom
to 511B.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: Add support for HDS configuration
Mohsin Bashir [Wed, 13 Aug 2025 22:13:11 +0000 (15:13 -0700)]
eth: fbnic: Add support for HDS configuration

Add support for configuring the header data split threshold.
For fbnic, the TCP data split support is enabled all the time.

Fbnic supports a maximum buffer size of 4KB. However, the reservation
for the headroom, tailroom, and padding reduces the max header size
accordingly.

  ethtool_hds -g eth0
  Ring parameters for eth0:
  Pre-set maximums:
  ...
  HDS thresh: 3584
  Current hardware settings:
  ...
  HDS thresh: 1536
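
The gap between the 4KB buffer and the reported maximum threshold is easy
to work out; the snippet below only restates the two numbers above. How
the difference is divided between headroom, tailroom and padding is not
given here.

```python
MAX_BUFFER = 4 * 1024   # "maximum buffer size of 4KB"
MAX_HDS_THRESH = 3584   # pre-set maximum reported by ethtool above

# Bytes set aside for headroom, tailroom and padding combined.
reserved = MAX_BUFFER - MAX_HDS_THRESH
print(reserved)
```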

Verify hds tests in ksft-net-drv are passing

  ksft-net-drv]# ./drivers/net/hds.py
  TAP version 13
  1..13
  ok 1 hds.get_hds
  ok 2 hds.get_hds_thresh
  ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by ...
  ...
  ...
  ok 12 hds.ioctl_set_xdp
  ok 13 hds.ioctl_enabled_set_xdp
  # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:1 error:0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-2-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago Merge branch 'net-stmmac-eee-and-wol-cleanups'
Jakub Kicinski [Tue, 19 Aug 2025 01:10:14 +0000 (18:10 -0700)]
Merge branch 'net-stmmac-eee-and-wol-cleanups'

Russell King says:

====================
net: stmmac: EEE and WoL cleanups

This series contains cleanup patches for the EEE and WoL code in
stmmac, prompted by issues raised during the last three weeks.
====================

Link: https://patch.msgid.link/aJ8avIp8DBAckgMc@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: explain the phylink_speed_down() call in stmmac_release()
Russell King (Oracle) [Fri, 15 Aug 2025 11:32:21 +0000 (12:32 +0100)]
net: stmmac: explain the phylink_speed_down() call in stmmac_release()

The call to phylink_speed_down() looks odd on the face of it. Add a
comment to explain why this call is there. phylink_speed_up() is
always called in __stmmac_open(), and already has a comment.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1umsfV-008vKv-1O@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: add helpers to indicate WoL enable status
Russell King (Oracle) [Fri, 15 Aug 2025 11:32:15 +0000 (12:32 +0100)]
net: stmmac: add helpers to indicate WoL enable status

Add two helpers to abstract the WoL enable status at the PHY and MAC to
make the code easier to read.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1umsfP-008vKp-U1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: use core wake IRQ support
Russell King (Oracle) [Fri, 15 Aug 2025 11:32:10 +0000 (12:32 +0100)]
net: stmmac: use core wake IRQ support

The PM core provides management of wake IRQs alongside setting the
device wake enable state. In order to use this, we need to register
the interrupt used to wake up the system using devm_pm_set_wake_irq()
or dev_pm_set_wake_irq(). The core will then enable or disable IRQ
wake state on this interrupt as appropriate, depending on the
device_set_wakeup_enable() state. device_set_wakeup_enable() does not
care about having balanced enable/disable calls.

Make use of this functionality, rather than explicitly managing the
IRQ enable state in the set_wol() ethtool op. This removes the IRQ
wake state management from stmmac.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1umsfK-008vKj-Pw@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: remove unnecessary "stmmac: wakeup enable" print
Russell King (Oracle) [Fri, 15 Aug 2025 11:32:05 +0000 (12:32 +0100)]
net: stmmac: remove unnecessary "stmmac: wakeup enable" print

Printing "stmmac: wakeup enable" to the kernel log isn't useful - it
doesn't identify the adapter, and is effectively nothing more than a
debugging print. This information can be discovered by looking at
/sys/device.../power/wakeup as the device_set_wakeup_enable() call
updates this sysfs file.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1umsfF-008vKc-Kt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: remove redundant WoL option validation
Russell King (Oracle) [Fri, 15 Aug 2025 11:32:00 +0000 (12:32 +0100)]
net: stmmac: remove redundant WoL option validation

The core ethtool API validates the WoL options passed from userspace
against the support which the driver reports from its get_wol() method,
returning EINVAL if an unsupported mode is requested.

Therefore, there is no need for stmmac to implement its own validation.
Remove this unnecessary code.

See ethnl_set_wol() in net/ethtool/wol.c and ethtool_set_wol() in
net/ethtool/ioctl.c.
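
The core check amounts to a mask test, sketched below in Python. The
core_set_wol() helper and the driver callback are hypothetical stand-ins
for the kernel functions named above, and the WAKE_* bit values are
reproduced for illustration only.

```python
import errno

# Wake-on-LAN option bits, mirroring the WAKE_* flags of the ethtool UAPI.
WAKE_PHY = 1 << 0
WAKE_UCAST = 1 << 1
WAKE_MCAST = 1 << 2
WAKE_BCAST = 1 << 3
WAKE_ARP = 1 << 4
WAKE_MAGIC = 1 << 5


def core_set_wol(requested, supported, driver_set_wol):
    """Reject unsupported modes before the driver is ever called."""
    if requested & ~supported:
        return -errno.EINVAL
    return driver_set_wol(requested)


driver_calls = []


def driver_set_wol(wolopts):
    """Toy driver op: no validation of its own, just record the request."""
    driver_calls.append(wolopts)
    return 0


supported = WAKE_MAGIC | WAKE_UCAST
print(core_set_wol(WAKE_ARP, supported, driver_set_wol))
print(core_set_wol(WAKE_MAGIC, supported, driver_set_wol))
```

Since the driver op only ever sees requests within its supported mask, a
second check inside the driver cannot trigger.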

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1umsfA-008vKW-H1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: remove write-only mac->pmt
Russell King (Oracle) [Fri, 15 Aug 2025 11:31:55 +0000 (12:31 +0100)]
net: stmmac: remove write-only mac->pmt

mac_device_info->pmt is only ever written, nothing reads it. Remove
this struct member.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1umsf5-008vKQ-DT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: stmmac: remove unnecessary checks in ethtool eee ops
Russell King (Oracle) [Fri, 15 Aug 2025 11:31:50 +0000 (12:31 +0100)]
net: stmmac: remove unnecessary checks in ethtool eee ops

Phylink will check whether the MAC supports the LPI methods in
struct phylink_mac_ops, and return -EOPNOTSUPP if the LPI capabilities
are not provided. stmmac doesn't provide LPI capabilities if
priv->dma_cap.eee is not set.

Therefore, checking the state of priv->dma_cap.eee in the ethtool ops
and returning -EOPNOTSUPP is redundant - let phylink handle this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1umsf0-008vKK-A3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago amd-xgbe: Configure and retrieve 'tx-usecs' for Tx coalescing
Vishal Badole [Sat, 16 Aug 2025 14:19:41 +0000 (19:49 +0530)]
amd-xgbe: Configure and retrieve 'tx-usecs' for Tx coalescing

Ethtool has advanced with additional configurable options, but the
current driver does not support tx-usecs configuration using Ethtool.

Add support to configure and retrieve 'tx-usecs' using ethtool, which
specifies the wait time before servicing an interrupt for Tx coalescing.

Signed-off-by: Vishal Badole <Vishal.Badole@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20250816141941.126054-1-Vishal.Badole@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago Merge branch 'net-use-vmalloc_array-to-simplify-code'
Jakub Kicinski [Tue, 19 Aug 2025 00:49:54 +0000 (17:49 -0700)]
Merge branch 'net-use-vmalloc_array-to-simplify-code'

Qianfeng Rong says:

====================
net: use vmalloc_array() to simplify code

Remove array_size() calls and replace vmalloc() with vmalloc_array() to
simplify the code and maintain consistency with existing kmalloc_array()
usage.
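
As a toy model of why the two forms are interchangeable, the Python
sketch below imitates a saturating array_size() and an overflow-checked
vmalloc_array(). The allocator, its TOY_LIMIT cap and the 64-bit SIZE_MAX
are assumptions for illustration; the real helpers are kernel C code.

```python
SIZE_MAX = 2**64 - 1   # assumes a 64-bit size_t
TOY_LIMIT = 1 << 20    # arbitrary cap so the toy allocator can fail


def vmalloc(nbytes):
    """Toy allocator: None on failure, a zeroed buffer otherwise."""
    if nbytes > TOY_LIMIT:
        return None
    return bytearray(nbytes)


def array_size(n, size):
    """Saturating multiply: SIZE_MAX when n * size does not fit in size_t."""
    total = n * size
    return SIZE_MAX if total > SIZE_MAX else total


def vmalloc_array(n, size):
    """Overflow-checked array allocation in a single call."""
    total = n * size
    if total > SIZE_MAX:
        return None
    return vmalloc(total)


old_style = vmalloc(array_size(4, 8))
new_style = vmalloc_array(4, 8)
print(len(old_style), len(new_style))
```

Both forms fail cleanly when the multiplication would overflow; the
single call simply states the intent more directly.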

vmalloc_array() is also optimized better, resulting in fewer instructions
being used [1].

[1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/
====================

Link: https://patch.msgid.link/20250816090659.117699-1-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago ppp: use vmalloc_array() to simplify code
Qianfeng Rong [Sat, 16 Aug 2025 09:06:54 +0000 (17:06 +0800)]
ppp: use vmalloc_array() to simplify code

Remove array_size() calls and replace vmalloc() with vmalloc_array() in
bsd_alloc().

vmalloc_array() is also optimized better, resulting in fewer instructions
being used.
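
As a rough userspace illustration of how the two patterns treat a size
overflow, the sketch below models both allocation styles. It is not kernel
code: model_array_size(), model_alloc_old() and model_alloc_array() are
hypothetical stand-ins, malloc() takes the place of vmalloc(), and
__builtin_mul_overflow() (a GCC/Clang builtin) takes the place of the
kernel's overflow helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for array_size(): saturate to SIZE_MAX on overflow. */
static size_t model_array_size(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Old pattern, modelled on vmalloc(array_size(n, size)). */
static void *model_alloc_old(size_t n, size_t size)
{
	/* A saturated size is passed on; the allocator has to reject it. */
	return malloc(model_array_size(n, size));
}

/* New pattern, modelled on vmalloc_array(n, size). */
static void *model_alloc_array(size_t n, size_t size)
{
	size_t bytes;

	/* Overflow is caught up front, before the allocator is called. */
	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

In this model the array variant needs no separate size helper at the call
site, which is the simplification the patch is after.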

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Link: https://patch.msgid.link/20250816090659.117699-4-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonfp: flower: use vmalloc_array() to simplify code
Qianfeng Rong [Sat, 16 Aug 2025 09:06:53 +0000 (17:06 +0800)]
nfp: flower: use vmalloc_array() to simplify code

Remove array_size() calls and replace vmalloc() with vmalloc_array() in
nfp_flower_metadata_init().  vmalloc_array() is also optimized better,
resulting in fewer instructions being used.

Place 'NFP_FL_STATS_ELEM_RS' with the sizeof() parameter as the second
argument to vmalloc_array() to avoid -Wcalloc-transposed-args compilation
warnings.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Link: https://patch.msgid.link/20250816090659.117699-3-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoeth: intel: use vmalloc_array() to simplify code
Qianfeng Rong [Sat, 16 Aug 2025 09:06:52 +0000 (17:06 +0800)]
eth: intel: use vmalloc_array() to simplify code

Remove array_size() calls and replace vmalloc() with vmalloc_array() to
simplify the code and maintain consistency with existing kmalloc_array()
usage.

vmalloc_array() is also optimized better, resulting in fewer instructions
being used.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Link: https://patch.msgid.link/20250816090659.117699-2-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodocs: netdev: refine the clean-up patch examples
Jakub Kicinski [Fri, 15 Aug 2025 16:52:42 +0000 (09:52 -0700)]
docs: netdev: refine the clean-up patch examples

We discourage sending trivial patches to clean up checkpatch warnings.
There are other tools which lead to patches of similarly low value
like some coccicheck warnings. The warnings are useful for new code
but fixing them in the existing code base is a waste of review time.

Broaden the example given in the doc a little bit.

Link: https://patch.msgid.link/20250815165242.124240-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoppp: mppe: Use SHA-1 library instead of crypto_shash
Eric Biggers [Fri, 15 Aug 2025 02:07:05 +0000 (19:07 -0700)]
ppp: mppe: Use SHA-1 library instead of crypto_shash

Now that a SHA-1 library API is available, use it instead of
crypto_shash.  This is simpler and faster.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250815020705.23055-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoeth: nfp: Remove u64_stats_update_begin()/end() for stats fetch
Li RongQing [Fri, 15 Aug 2025 01:56:19 +0000 (09:56 +0800)]
eth: nfp: Remove u64_stats_update_begin()/end() for stats fetch

This code path only fetches the stats, so u64_stats_update_begin()/end()
should not be used. The fetcher of the stats runs in the same context
as the updater of the stats, so no protection is needed.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://patch.msgid.link/20250815015619.2713-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonfc: s3fwrn5: Use SHA-1 library instead of crypto_shash
Eric Biggers [Fri, 15 Aug 2025 02:23:29 +0000 (19:23 -0700)]
nfc: s3fwrn5: Use SHA-1 library instead of crypto_shash

Now that a SHA-1 library API is available, use it instead of
crypto_shash.  This is simpler and faster.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250815022329.28672-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-dsa-move-ks8995-phy-driver-to-dsa'
Jakub Kicinski [Tue, 19 Aug 2025 00:26:12 +0000 (17:26 -0700)]
Merge branch 'net-dsa-move-ks8995-phy-driver-to-dsa'

Linus Walleij says:

====================
net: dsa: Move ks8995 "phy" driver to DSA

After we concluded that the KS8995 is a DSA switch, see
commit a0f29a07b654a50ebc9b070ef6dcb3219c4de867
it is time to move the driver to its right place under
DSA.

Full support for the custom tagging is still to be developed, but we
can make sure the driver does the job it did as a "phy":
acting as a switch with individually represented ports.

This patch series achieves that first step so the
current device tree bindings produce working set-ups
and paves the way for custom tagging.
====================

Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-0-75c359ede3a5@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: ks8995: Add basic switch set-up
Linus Walleij [Wed, 13 Aug 2025 21:43:06 +0000 (23:43 +0200)]
net: dsa: ks8995: Add basic switch set-up

We start to extend the KS8995 driver by simply registering it
as a DSA device and implementing a few switch callbacks for
STP set-up and such to begin with.

No special tags or other advanced stuff: we use DSA_TAG_NONE
and rely on the default set-up in the switch with the special
DSA tags turned off. This makes the switch wire up properly
to all its PHYs and simple bridge traffic works properly.

After this the bridge DT bindings are respected, ports and their
PHYs get connected to the switch and react appropriately through
the phylib when cables are plugged in etc.

Tested like this in a hacky OpenWrt image:

Bring up conduit interface manually:
ixp4xx_eth c8009000.ethernet eth0: eth0: link up,
  speed 100 Mb/s, full duplex

spi-ks8995 spi0.0: enable port 0
spi-ks8995 spi0.0: set KS8995_REG_PC2 for port 0 to 06
spi-ks8995 spi0.0 lan1: configuring for phy/mii link mode
spi-ks8995 spi0.0 lan1: Link is Up - 100Mbps/Full - flow control rx/tx

PING 169.254.1.1 (169.254.1.1): 56 data bytes
64 bytes from 169.254.1.1: seq=0 ttl=64 time=1.629 ms
64 bytes from 169.254.1.1: seq=1 ttl=64 time=0.951 ms

I also tested SSH from the device to the host and it works fine.

It also works fine to ping the device from the host and to SSH
into the device from the host.

This brings the ks8995 driver to a reasonable state where it can
be used from the current device tree bindings and the existing
device trees in the kernel.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-4-75c359ede3a5@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: ks8995: Delete sysfs register access
Linus Walleij [Wed, 13 Aug 2025 21:43:05 +0000 (23:43 +0200)]
net: dsa: ks8995: Delete sysfs register access

There is some sysfs file to read and write registers randomly
in the ks8995 driver.

The contemporary way to achieve the same thing is to implement
regmap abstractions, if needed. Delete this and implement
regmap later if we want to be able to inspect individual registers.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-3-75c359ede3a5@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: ks8995: Add proper RESET delay
Linus Walleij [Wed, 13 Aug 2025 21:43:04 +0000 (23:43 +0200)]
net: dsa: ks8995: Add proper RESET delay

According to the datasheet we need to wait 100us before accessing
any registers in the KS8995 after a reset de-assertion.

Add this delay, if and only if we obtained a GPIO descriptor,
otherwise it is just a pointless delay.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-2-75c359ede3a5@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: Move KS8995 to the DSA subsystem
Linus Walleij [Wed, 13 Aug 2025 21:43:03 +0000 (23:43 +0200)]
net: dsa: Move KS8995 to the DSA subsystem

By reading the datasheets for the KS8995 it is obvious that this
is a 100 Mbit DSA switch.

Let us start the refactoring by moving it to the DSA subsystem to
preserve development history.

Verified that the chip still probes the same after this patch
provided CONFIG_HAVE_NET_DSA, CONFIG_NET_DSA and CONFIG_DSA_KS8995
are selected.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-1-75c359ede3a5@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodt-bindings: net: realtek,rtl82xx: document wakeup-source property
Russell King (Oracle) [Wed, 13 Aug 2025 11:21:19 +0000 (12:21 +0100)]
dt-bindings: net: realtek,rtl82xx: document wakeup-source property

The RTL8211F PHY has two modes for a single INTB/PMEB pin:

1. INTB mode, where it signals interrupts to the CPU, which can
   include wake-on-LAN events.
2. PMEB mode, where it only signals a wake-on-LAN event, which
   may either be a latched logic low until software manually
   clears the WoL state, or pulsed mode.

In the case of (1), there is no way to know whether the interrupt to
which the PHY is connected is capable of waking the system. In the
case of (2), there would be no interrupt property in the PHY's DT
description, and thus there is nothing to describe whether the pin is
even wired to anything such as a power management controller.

There is a "wakeup-source" property which can be applied to any device
- see Documentation/devicetree/bindings/power/wakeup-source.txt

Case 1 above matches example 2 in this document, and case 2 above
matches example 3. Therefore, it seems reasonable to make use of this
existing specification, albeit it hasn't been converted to YAML.

Document the wakeup-source property in the device description, which
will indicate that the PHY has been wired up in such a way that it
can wake the system from a low power state.

We will use this in a rewrite of the existing broken Wake-on-Lan code
that was merged during the 6.16 merge window to support case 1. Case 2
can be added to the driver later without needing to further alter the
DT description. To be clear, the existing Wake-on-Lan code that was
recently merged has multiple functional issues.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/E1um9Xj-008kBx-72@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: realtek: fix RTL8211F wake-on-lan support
Russell King (Oracle) [Wed, 13 Aug 2025 10:04:45 +0000 (11:04 +0100)]
net: phy: realtek: fix RTL8211F wake-on-lan support

Implement Wake-on-Lan for RTL8211F correctly. The existing
implementation has multiple issues:

1. It assumes that Wake-on-Lan can always be used, whether or not the
   interrupt is wired, and whether or not the interrupt is capable of
   waking the system. This breaks the ability for MAC drivers to detect
   whether the PHY WoL is functional.
2. switching the interrupt pin in the .set_wol() method to PMEB mode
   immediately silences link-state interrupts, which breaks phylib
   when interrupts are being used rather than polling mode.
3. the code claiming to "reset WOL status" was doing nothing of the
   sort. Bit 15 in page 0xd8a register 17 controls WoL reset, and
   needs to be pulsed low to reset the WoL state. This bit was always
   written as '1', resulting in no reset.
4. not resetting WoL state results in the PMEB pin remaining asserted,
   which in turn leads to an interrupt storm. Only resetting the WoL
   state in .set_wol() is not sufficient.
5. PMEB mode does not allow software detection of the wake-up event as
   there is no status bit to indicate we received the WoL packet.
6. across reboots of at least the Jetson Xavier NX system, the WoL
   configuration is preserved.
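
Issue 3 in the list above can be pictured with a small register model. This
is only a sketch under stated assumptions: struct model_phy and the model_*
helpers are hypothetical, the register is a plain variable rather than a
paged MDIO access, and the model assumes the WoL state is reset on a
1 -> 0 transition of bit 15.

```c
#include <stdint.h>

#define MODEL_WOL_RESET	(1u << 15)	/* bit 15, as described in issue 3 */

/* Hypothetical model of the PHY register holding the WoL reset bit. */
struct model_phy {
	uint16_t reg;		/* current register contents */
	unsigned int resets;	/* WoL resets observed by the model */
};

/* Assumption: the WoL state is reset when bit 15 goes from 1 to 0. */
static void model_write(struct model_phy *phy, uint16_t val)
{
	if ((phy->reg & MODEL_WOL_RESET) && !(val & MODEL_WOL_RESET))
		phy->resets++;
	phy->reg = val;
}

/* The broken pattern: the bit is always written as '1', so no pulse. */
static void model_wol_reset_broken(struct model_phy *phy)
{
	model_write(phy, (uint16_t)(phy->reg | MODEL_WOL_RESET));
}

/* Pulse the bit low, then return it to '1'. */
static void model_wol_reset(struct model_phy *phy)
{
	model_write(phy, (uint16_t)(phy->reg & ~MODEL_WOL_RESET));
	model_write(phy, (uint16_t)(phy->reg | MODEL_WOL_RESET));
}
```

Starting from a register with bit 15 set, the broken helper leaves the
reset counter untouched, while the pulsed variant increments it once.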

Fix all of these issues by essentially rewriting the support. We:
1. clear the WoL event enable register at probe time.
2. detect whether we can support wake-up by having a valid interrupt,
   and the "wakeup-source" property in DT. If we can, then we mark
   the MDIO device as wakeup capable, and associate the interrupt
   with the wakeup source.
3. arrange for the get_wol() and set_wol() implementations to handle
   the case where the MDIO device has not been marked as wakeup
   capable (thereby returning no WoL support, and refusing to enable
   WoL support.)
4. avoid switching to PMEB mode, instead using INTB mode with the
   interrupt enable, reconfiguring the interrupt enables at suspend
   time, and restoring their original state at resume time (we track
   the state of the interrupt enable register in .config_intr().)
5. move WoL reset from .set_wol() to the suspend function to ensure
   that WoL state is cleared prior to suspend. This is necessary
   after the PME interrupt has been enabled as a second WoL packet
   will not re-raise a previously cleared PME interrupt.
6. when a PME interrupt (for wakeup) is asserted, pass this to the
   PM wakeup so it knows which device woke the system.

This fixes WoL support in the Realtek RTL8211F driver when used on the
nVidia Jetson Xavier NX platform, and needs to be applied before stmmac
patches which allow these platforms to forward the ethtool WoL commands
to the Realtek PHY.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1um8Ld-008jxD-Mc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Sat, 16 Aug 2025 01:13:41 +0000 (18:13 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: implement SRIOV VF Active-Active LAG

Dave Ertman says:

Implement support for SRIOV VFs over an Active-Active link aggregate.
The same restrictions apply as are in place for the support of
Active-Backup bonds.

- the two interfaces must be on the same NIC
- the FW LLDP engine needs to be disabled
- the DDP package that supports VF LAG must be loaded on device
- the two interfaces must have the same QoS config
- only the first interface added to the bond will have VF support
- the interface with VFs must be in switchdev mode

With the additional requirement of
- the version of the FW on the NIC needs to have VF Active/Active support

The balancing of traffic between the two interfaces is done on a queue
basis. Taking the queues allocated to all of the VFs as a whole, one
half of them will be distributed to each interface. When a link goes
down, then the queues allocated to the down interface will migrate to
the active port. When the down port comes back up, then the same
queues as were originally assigned there will be moved back.
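
The queue migration described above might be modelled as follows. This is a
hypothetical sketch, not the ice implementation: the even/odd split used to
give each port "one half" of the queues is an assumption, as are the names
model_home_port() and model_active_port().

```c
#include <stdbool.h>

#define MODEL_NUM_PORTS 2

/* Assumption: queue q has "home" port q % 2, so each port gets one half. */
static int model_home_port(unsigned int q)
{
	return (int)(q % MODEL_NUM_PORTS);
}

/*
 * Port currently servicing queue q: its home port while that link is up,
 * otherwise the other port. Returns -1 when both links are down.
 */
static int model_active_port(unsigned int q,
			     const bool link_up[MODEL_NUM_PORTS])
{
	int home = model_home_port(q);
	int other = 1 - home;

	if (link_up[home])
		return home;
	if (link_up[other])
		return other;
	return -1;
}
```

Because the home port is a fixed function of the queue, a queue that
migrated while its link was down returns to the same port once the link
comes back up.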

Patch 1 cleans up void pointer casts
Patch 2 utilizes bool over u8 when appropriate
Patch 3 adds a driver prefix to a LAG define
Patch 4 pre-move a function to reduce delta in implementation patch
Patch 5 cleanup variable initialization in declaration block
Patch 6 cleanup capability parsing for LAG feature
Patch 7 is the implementation of the new functionality

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Implement support for SRIOV VFs across Active/Active bonds
  ice: cleanup capabilities evaluation
  ice: Cleanup variable initialization in LAG code
  ice: move LAG function in code to prepare for Active-Active
  ice: Add driver specific prefix to LAG defines
  ice: replace u8 elements with bool where appropriate
  ice: Remove casts on void pointers in LAG code
====================

Link: https://patch.msgid.link/20250814230855.128068-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Space: Replace memset(0) + strscpy() with strscpy_pad()
Thorsten Blum [Thu, 14 Aug 2025 18:05:14 +0000 (20:05 +0200)]
net: Space: Replace memset(0) + strscpy() with strscpy_pad()

Replace memset(0) followed by strscpy() with strscpy_pad() to improve
netdev_boot_setup_add(). This avoids zeroing the memory before copying
the string and ensures the destination buffer is only written to once,
simplifying the code and improving efficiency.
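
A minimal userspace sketch of the single-write behaviour is shown below.
model_strscpy_pad() is a hypothetical stand-in written for illustration; it
mimics the semantics described above (copy, terminate, zero-fill the tail,
-E2BIG on truncation) but is not the kernel's implementation, and it returns
long where the kernel helper returns ssize_t.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of strscpy_pad(): copy at most count - 1 bytes,
 * always NUL-terminate, and zero-fill the rest of the buffer.
 * Returns the copied length, or -E2BIG if src had to be truncated.
 */
static long model_strscpy_pad(char *dst, const char *src, size_t count)
{
	size_t len = 0;

	if (count == 0)
		return -E2BIG;

	/* Length of src, capped at count - 1 so the terminator always fits. */
	while (len < count - 1 && src[len] != '\0')
		len++;

	memcpy(dst, src, len);
	/* One pass writes the terminator and the padding together. */
	memset(dst + len, 0, count - len);

	return src[len] == '\0' ? (long)len : -E2BIG;
}
```

Every byte of the destination is written exactly once here, whereas a
memset() of the whole buffer followed by a copy writes the string bytes
twice.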

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250814180514.251000-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-mlx5-support-disabling-host-pfs'
Jakub Kicinski [Fri, 15 Aug 2025 19:29:09 +0000 (12:29 -0700)]
Merge branch 'net-mlx5-support-disabling-host-pfs'

Tariq Toukan says:

====================
net/mlx5: Support disabling host PFs

This small series by Daniel adds support for disabling host PFs.
If device is capable and configured, the driver won't access vports of
disabled host functions.
====================

Link: https://patch.msgid.link/1755112796-467444-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/mlx5: Support disabling host PFs
Daniel Jurgens [Wed, 13 Aug 2025 19:19:56 +0000 (22:19 +0300)]
net/mlx5: Support disabling host PFs

Some devices support disabling the physical function on the host. When
this is configured the vports for the host functions do not exist.

This patch checks if host functions are enabled before trying to access
their vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755112796-467444-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/mlx5: Query to see if host PF is disabled
Daniel Jurgens [Wed, 13 Aug 2025 19:19:55 +0000 (22:19 +0300)]
net/mlx5: Query to see if host PF is disabled

The host PF can be disabled, query firmware to check if the host PF of
this function exists.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755112796-467444-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: fixed: remove usage of a faux device
Heiner Kallweit [Wed, 13 Aug 2025 19:18:03 +0000 (21:18 +0200)]
net: phy: fixed: remove usage of a faux device

A struct mii_bus doesn't need a parent, so we can simplify the code and
remove using a faux device. Only difference is the following in sysfs
under /sys/class/mdio_bus:

old: fixed-0 -> '../../devices/faux/Fixed MDIO bus/mdio_bus/fixed-0'
new: fixed-0 -> ../../devices/virtual/mdio_bus/fixed-0

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/e9426bb9-f228-4b99-bc09-a80a958b5a93@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: bridge: remove unused argument of br_multicast_query_expired()
Wang Liang [Thu, 14 Aug 2025 04:23:55 +0000 (12:23 +0800)]
net: bridge: remove unused argument of br_multicast_query_expired()

Since commit 67b746f94ff3 ("net: bridge: mcast: make sure querier
port/address updates are consistent"), the argument 'querier' is unused,
just get rid of it.

Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250814042355.1720755-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-dsa-b53-mmap-add-bcm63268-gphy-power-control'
Jakub Kicinski [Fri, 15 Aug 2025 00:54:02 +0000 (17:54 -0700)]
Merge branch 'net-dsa-b53-mmap-add-bcm63268-gphy-power-control'

Kyle Hendry says:

====================
net: dsa: b53: mmap: Add bcm63268 GPHY power control

The gpio controller on the bcm63268 has a register for
controlling the gigabit phy power. These patches disable
low power mode when enabling the gphy port.

This is based on an earlier patch series here:
https://lore.kernel.org/20250306053105.41677-1-kylehendrydev@gmail.com

I have created a new series since many of the changes
were included in the ephy control patches:
https://lore.kernel.org/20250724035300.20497-1-kylehendrydev@gmail.com
====================

Link: https://patch.msgid.link/20250814002530.5866-1-kylehendrydev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: b53: mmap: Implement bcm63268 gphy power control
Kyle Hendry [Thu, 14 Aug 2025 00:25:28 +0000 (17:25 -0700)]
net: dsa: b53: mmap: Implement bcm63268 gphy power control

Add check for gphy in enable/disable phy calls and set power bits
in gphy control register.

Signed-off-by: Kyle Hendry <kylehendrydev@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250814002530.5866-3-kylehendrydev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: b53: mmap: Add gphy port to phy info for bcm63268
Kyle Hendry [Thu, 14 Aug 2025 00:25:27 +0000 (17:25 -0700)]
net: dsa: b53: mmap: Add gphy port to phy info for bcm63268

Add gphy mask to bcm63xx phy info struct and add data for bcm63268

Signed-off-by: Kyle Hendry <kylehendrydev@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250814002530.5866-2-kylehendrydev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agosfc: replace min/max nesting with clamp()
Xichao Zhao [Tue, 12 Aug 2025 06:50:26 +0000 (14:50 +0800)]
sfc: replace min/max nesting with clamp()

The clamp() macro explicitly expresses the intent of constraining
a value within bounds. Therefore, replacing min(max(a, b), c) with
clamp(val, lo, hi) can improve code readability.
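
For illustration, the equivalence can be sketched in plain C. The model_*
functions below are hypothetical integer-only helpers, not the kernel's
type-checked min(), max() and clamp() macros.

```c
#include <assert.h>

static int model_min(int a, int b)
{
	return a < b ? a : b;
}

static int model_max(int a, int b)
{
	return a > b ? a : b;
}

/* Nested form: the reader has to work out which bound is which. */
static int model_clamp_nested(int val, int lo, int hi)
{
	return model_min(model_max(val, lo), hi);
}

/* Explicit form: the lower and upper bounds are named in the signature. */
static int model_clamp(int val, int lo, int hi)
{
	assert(lo <= hi);	/* the two forms only agree for lo <= hi */

	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```

Both helpers return the same result whenever lo <= hi; the second one
simply states the intent directly.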

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250812065026.620115-1-zhao.xichao@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'bridge-redirect-to-backup-port-when-port-is-administratively-down'
Jakub Kicinski [Fri, 15 Aug 2025 00:45:39 +0000 (17:45 -0700)]
Merge branch 'bridge-redirect-to-backup-port-when-port-is-administratively-down'

Ido Schimmel says:

====================
bridge: Redirect to backup port when port is administratively down

Patch #1 amends the bridge to redirect to the backup port when the
primary port is administratively down and not only when it does not have
a carrier. See the commit message for more details.

Patch #2 extends the bridge backup port selftest to cover this case.
====================

Link: https://patch.msgid.link/20250812080213.325298-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: net: Test bridge backup port when port is administratively down
Ido Schimmel [Tue, 12 Aug 2025 08:02:13 +0000 (11:02 +0300)]
selftests: net: Test bridge backup port when port is administratively down

Test that packets are redirected to the backup port when the primary
port is administratively down.

With the previous patch:

 # ./test_bridge_backup_port.sh
 [...]
 TEST: swp1 administratively down                                    [ OK ]
 TEST: No forwarding out of swp1                                     [ OK ]
 TEST: Forwarding out of vx0                                         [ OK ]
 TEST: swp1 administratively up                                      [ OK ]
 TEST: Forwarding out of swp1                                        [ OK ]
 TEST: No forwarding out of vx0                                      [ OK ]
 [...]
 Tests passed:  89
 Tests failed:   0

Without the previous patch:

 # ./test_bridge_backup_port.sh
 [...]
 TEST: swp1 administratively down                                    [ OK ]
 TEST: No forwarding out of swp1                                     [ OK ]
 TEST: Forwarding out of vx0                                         [FAIL]
 TEST: swp1 administratively up                                      [ OK ]
 TEST: Forwarding out of swp1                                        [ OK ]
 [...]
 Tests passed:  85
 Tests failed:   4

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812080213.325298-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobridge: Redirect to backup port when port is administratively down
Ido Schimmel [Tue, 12 Aug 2025 08:02:12 +0000 (11:02 +0300)]
bridge: Redirect to backup port when port is administratively down

If a backup port is configured for a bridge port, the bridge will
redirect known unicast traffic towards the backup port when the primary
port is administratively up but without a carrier. This is useful, for
example, in MLAG configurations where a system is connected to two
switches and there is a peer link between both switches. The peer link
serves as the backup port in case one of the switches loses its
connection to the multi-homed system.

In order to avoid flooding when the primary port loses its carrier, the
bridge does not flush dynamic FDB entries pointing to the port upon STP
disablement, if the port has a backup port.

The above means that known unicast traffic destined to the primary port
will be blackholed when the port is put administratively down, until the
FDB entries pointing to it are aged-out.

Given that the current behavior is quite weird and unlikely to be
depended on by anyone, amend the bridge to redirect to the backup port
also when the primary port is administratively down and not only when it
does not have a carrier.

The change is motivated by a report from a user who expected traffic to
be redirected to the backup port when the primary port was put
administratively down while debugging a network issue.
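
The change in the redirect decision can be sketched with a small model.
struct model_port and both helpers are hypothetical and follow the
description in this commit message rather than the bridge source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge port. */
struct model_port {
	bool admin_up;				/* administratively up */
	bool carrier;				/* link has a carrier */
	const struct model_port *backup;	/* NULL if none configured */
};

/* Behaviour described as current: redirect only when up without carrier. */
static const struct model_port *
model_egress_old(const struct model_port *p)
{
	if (p->backup && p->admin_up && !p->carrier)
		return p->backup;
	return p;
}

/* Amended behaviour: also redirect when the port is administratively down. */
static const struct model_port *
model_egress_new(const struct model_port *p)
{
	if (p->backup && (!p->admin_up || !p->carrier))
		return p->backup;
	return p;
}
```

With the primary port administratively down, the old helper still selects
the primary port (the blackhole case), while the new one selects the
backup.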

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: drv-net: wait for carrier
Jakub Kicinski [Tue, 12 Aug 2025 14:20:54 +0000 (07:20 -0700)]
selftests: drv-net: wait for carrier

On fast machines the tests run in quick succession so even
when tests clean up after themselves the carrier may need
some time to come back.

Specifically in NIPA when ping.py runs right after netpoll_basic.py
the first ping command fails.

Since the context manager callbacks are now common NetDrvEpEnv
gets an ip link up call as well.

Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250812142054.750282-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: mscc: report and configure in-band auto-negotiation for SGMII/QSGMII
Vladimir Oltean [Wed, 13 Aug 2025 07:44:54 +0000 (10:44 +0300)]
net: phy: mscc: report and configure in-band auto-negotiation for SGMII/QSGMII

The following Vitesse/Microsemi/Microchip PHYs, among those supported by
this driver, have the host interface configurable as SGMII or QSGMII:
- VSC8504
- VSC8514
- VSC8552
- VSC8562
- VSC8572
- VSC8574
- VSC8575
- VSC8582
- VSC8584

All these PHYs are documented to have bit 7 of "MAC SerDes PCS Control"
as "MAC SerDes ANEG enable".

Out of these, I could test the VSC8514 quad PHY in QSGMII. This works
both with the in-band autoneg on and off, on the NXP LS1028A-RDB and
T1040-RDB boards.

Notably, the bit is sticky (survives soft resets), so giving Linux the
tools to read and modify this setting makes it robust to changes made
to it by previous boot layers (U-Boot).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250813074454.63224-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: realtek: remove unnecessary file, dentry, inode declarations
Vladimir Oltean [Wed, 13 Aug 2025 18:10:23 +0000 (21:10 +0300)]
net: dsa: realtek: remove unnecessary file, dentry, inode declarations

These are present since commit d8652956cf37 ("net: dsa: realtek-smi: Add
Realtek SMI driver") and never needed. Apparently the driver was not
cleaned up sufficiently for submission.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20250813181023.808528-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/sched: Use TC_RTAB_SIZE instead of magic number
Yue Haibing [Wed, 13 Aug 2025 12:55:26 +0000 (20:55 +0800)]
net/sched: Use TC_RTAB_SIZE instead of magic number

Replace magic number with TC_RTAB_SIZE to make it more informative.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250813125526.853895-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoptp: ptp_clockmatrix: Remove redundant semicolons
Liao Yuanhong [Wed, 13 Aug 2025 09:50:24 +0000 (17:50 +0800)]
ptp: ptp_clockmatrix: Remove redundant semicolons

Remove unnecessary semicolons.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Link: https://patch.msgid.link/20250813095024.559085-1-liaoyuanhong@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'devlink-port-attr-cleanup'
Jakub Kicinski [Fri, 15 Aug 2025 00:35:23 +0000 (17:35 -0700)]
Merge branch 'devlink-port-attr-cleanup'

Parav Pandit says:

====================
devlink port attr cleanup

patch-1 removes the return 0 check at several places and simplifies
        the callers
patch-2 constifies the attributes and moves the checks early
====================

Link: https://patch.msgid.link/20250813094417.7269-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodevlink/port: Check attributes early and constify
Parav Pandit [Wed, 13 Aug 2025 09:44:17 +0000 (12:44 +0300)]
devlink/port: Check attributes early and constify

Constify the devlink port attributes to indicate they are read only
and do not depend on anything else. Therefore, validate them early
before setting them in the devlink port.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/20250813094417.7269-3-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodevlink/port: Simplify return checks
Parav Pandit [Wed, 13 Aug 2025 09:44:16 +0000 (12:44 +0300)]
devlink/port: Simplify return checks

Drop always returning 0 from the helper routine and simplify
its callers.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/20250813094417.7269-2-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonfc: pn533: Delete an unnecessary check
Dan Carpenter [Wed, 13 Aug 2025 05:51:22 +0000 (08:51 +0300)]
nfc: pn533: Delete an unnecessary check

The "rc" variable is set like this:

	if (IS_ERR(resp)) {
		rc = PTR_ERR(resp);

We know that "rc" is negative so there is no need to check.
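
The reasoning can be checked with a userspace model of the error-pointer
convention. The model_* helpers below are hypothetical re-creations for
illustration (they assume error codes occupy the top 4095 values of the
address space); they are not the kernel's err.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(intptr_t err)
{
	return (void *)err;
}

/* True only for pointers in the last MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Decode the errno value again. */
static intptr_t model_ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}
```

Any pointer for which model_is_err() is true decodes to a value between
-4095 and -1, so a later test for a negative value on that path can never
fail.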

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/aJwn2ox5g9WsD2Vx@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: realtek: convert RTL8226-CG to c45 only
Markus Stockhausen [Wed, 13 Aug 2025 05:44:07 +0000 (01:44 -0400)]
net: phy: realtek: convert RTL8226-CG to c45 only

Short: Convert the RTL8226-CG to c45 so it can be used in its
Realtek based ecosystems.

Long: The RTL8226-CG can be mainly found on devices of the
Realtek Otto switch platform. Devices like the Zyxel XGS1210-12
are based on it. These implement a hardware based phy polling
in the background to update SoC status registers.

The hardware provides 4 smi busses to which phys are attached.
For each bus one can decide if it is polled in c45 or c22 mode.
See https://svanheule.net/realtek/longan/register/smi_glb_ctrl
With this setting the register access will be limited by the
hardware. This is very complex (including caching and special
c45-over-c22 handling). But basically it boils down to "enable
protocol x and SoC will disable register access via protocol y".

Mainline already gained support for the rtl9300 mdio driver
in commit 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver").

It covers the basic features, but a lot of effort is still needed
to understand the hardware properly. So it runs a simple setup by
selecting the proper bus mode during startup.

	/* Put the interfaces into C45 mode if required */
	glb_ctrl_mask = GENMASK(19, 16);
	for (i = 0; i < MAX_SMI_BUSSES; i++)
		if (priv->smi_bus_is_c45[i])
			glb_ctrl_val |= GLB_CTRL_INTF_SEL(i);
	...
	err = regmap_update_bits(regmap, SMI_GLB_CTRL,
				 glb_ctrl_mask, glb_ctrl_val);
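
As a rough userspace illustration of the masking above: the sketch
below assumes that GLB_CTRL_INTF_SEL(i) selects one bit per bus
starting at bit 16, which matches the GENMASK(19, 16) mask but is an
assumption here. update_bits() is a hypothetical stand-in for the
read-modify-write that regmap_update_bits() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SMI_BUSSES 4
#define BIT(n) (UINT32_C(1) << (n))
#define GENMASK(h, l) ((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))
/* Assumption: one mode-select bit per bus, starting at bit 16. */
#define GLB_CTRL_INTF_SEL(intf) BIT(16 + (intf))

/* Collect the c45 select bits for all buses flagged as c45. */
static uint32_t glb_ctrl_c45_bits(const bool smi_bus_is_c45[MAX_SMI_BUSSES])
{
	uint32_t val = 0;

	for (int i = 0; i < MAX_SMI_BUSSES; i++)
		if (smi_bus_is_c45[i])
			val |= GLB_CTRL_INTF_SEL(i);
	return val;
}

/* Hypothetical stand-in for a masked register update. */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

Only the four mode bits change; all other register bits keep their
previous value.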

To avoid complex coding later on, it limits access by only
providing either c22 or c45:

	bus->name = "Realtek Switch MDIO Bus";
	if (priv->smi_bus_is_c45[mdio_bus]) {
		bus->read_c45 = rtl9300_mdio_read_c45;
		bus->write_c45 = rtl9300_mdio_write_c45;
	} else {
		bus->read = rtl9300_mdio_read_c22;
		bus->write = rtl9300_mdio_write_c22;
	}

Because of these limitations the existing RTL8226 phy driver
is not working at all on Realtek switches. Convert the driver
to c45-only.
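
The consequence for a PHY driver can be sketched with a toy model. All
toy_* names below are hypothetical and the accessors are reduced to
stubs; the point is only that a bus set up for one protocol has no
accessor for the other, so a c22-based driver cannot talk to a PHY on
a c45 bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy bus: a protocol is available only if its accessor is set. */
struct toy_bus {
	int (*read)(int addr, int reg);			/* c22 */
	int (*read_c45)(int addr, int devad, int reg);	/* c45 */
};

static int toy_read_c22(int addr, int reg)
{
	(void)addr; (void)reg;
	return 0x22;
}

static int toy_read_c45(int addr, int devad, int reg)
{
	(void)addr; (void)devad; (void)reg;
	return 0x45;
}

/* Provide either c22 or c45 accessors, never both. */
static void toy_bus_setup(struct toy_bus *bus, bool is_c45)
{
	bus->read = NULL;
	bus->read_c45 = NULL;
	if (is_c45)
		bus->read_c45 = toy_read_c45;
	else
		bus->read = toy_read_c22;
}

static bool toy_c22_driver_usable(const struct toy_bus *bus)
{
	return bus->read != NULL;
}

static bool toy_c45_driver_usable(const struct toy_bus *bus)
{
	return bus->read_c45 != NULL;
}
```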

Luckily the RTL8226 seems to support proper MDIO_PMA_EXTABLE
flags. So standard function genphy_c45_pma_read_abilities() can
call genphy_c45_pma_read_ext_abilities() and 10/100/1000 are
populated correctly. Thus the conversion is straightforward.

Outputs before (REMARK: for this, a "hacked" bus was used that
toggles the mode for each c22/c45 access. That is slow and
produces unstable data in the SoC status registers):

Settings for lan9:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 24
        Transceiver: external
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Link detected: no

Outputs with this commit:

Settings for lan9:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 24
        Transceiver: external
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Link detected: no

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20250813054407.1108285-1-markus.stockhausen@gmx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: phy: motorcomm: Add support for PHY LEDs on YT8521
Jijie Shao [Wed, 13 Aug 2025 12:45:42 +0000 (20:45 +0800)]
net: phy: motorcomm: Add support for PHY LEDs on YT8521

Add minimal LED controller driver supporting
the most common uses with the 'netdev' trigger.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250813124542.3450447-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago Merge tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Jakub Kicinski [Fri, 15 Aug 2025 00:26:37 +0000 (17:26 -0700)]
Merge tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs

Mauro Carvalho Chehab says:

====================
add a generic yaml parser integrated with Netlink specs generation

- A YAML parser Sphinx plugin, integrated with Netlink YAML doc
  parser.

The patch content is identical to my v10 submission:
https://lore.kernel.org/cover.1753718185.git.mchehab+huawei@kernel.org

* tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs:
  sphinx: parser_yaml.py: fix line numbers information
  docs: parser_yaml.py: fix backward compatibility with old docutils
  docs: parser_yaml.py: add support for line numbers from the parser
  tools: netlink_yml_parser.py: add line numbers to parsed data
  MAINTAINERS: add netlink_yml_parser.py to linux-doc
  docs: netlink: remove obsolete .gitignore from unused directory
  tools: ynl_gen_rst.py: drop support for generating index files
  docs: uapi: netlink: update netlink specs link
  docs: use parser_yaml extension to handle Netlink specs
  docs: sphinx: add a parser for yaml files for Netlink specs
  tools: ynl_gen_rst.py: cleanup coding style
  docs: netlink: index.rst: add a netlink index file
  tools: ynl_gen_rst.py: Split library from command line tool
  docs: netlink: netlink-raw.rst: use :ref: instead of :doc:
====================

Link: https://patch.msgid.link/20250812113329.356c93c2@foz.lan
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago ice: Implement support for SRIOV VFs across Active/Active bonds
Dave Ertman [Mon, 16 Jun 2025 11:03:23 +0000 (13:03 +0200)]
ice: Implement support for SRIOV VFs across Active/Active bonds

This patch implements the software flows to handle SRIOV VF
communication across an Active/Active link aggregate.  The same
restrictions apply as are in place for the support of Active/Backup
bonds.

- the two interfaces must be on the same NIC
- the FW LLDP engine needs to be disabled
- the DDP package that supports VF LAG must be loaded on device
- the two interfaces must have the same QoS config
- only the first interface added to the bond will have VF support
- the interface with VFs must be in switchdev mode

With the additional requirement of
- the version of the FW on the NIC needs to have VF Active/Active support
This requirement is indicated in the capabilities struct associated
with the NVM loaded on the NIC.

The balancing of traffic between the two interfaces is done on a queue
basis.  Taking the queues allocated to all of the VFs as a whole, one
half of them will be distributed to each interface.  When a link goes
down, then the queues allocated to the down interface will migrate to
the active port.  When the down port comes back up, then the same
queues as were originally assigned there will be moved back.
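
The balancing described above can be modelled roughly as follows.
This is a toy sketch, not the ice implementation: the toy_* names are
hypothetical, and which half of the queues lands on which port is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_QUEUES 64

struct toy_lag {
	int nqueues;
	int home[TOY_MAX_QUEUES];	/* port a queue was originally assigned to */
	int current[TOY_MAX_QUEUES];	/* port currently serving the queue */
};

/* Split all VF queues as a whole: one half per port (ports 0 and 1). */
static void toy_lag_init(struct toy_lag *lag, int nqueues)
{
	lag->nqueues = nqueues;
	for (int q = 0; q < nqueues; q++) {
		lag->home[q] = (q < nqueues / 2) ? 0 : 1;
		lag->current[q] = lag->home[q];
	}
}

/*
 * Link down: queues homed on 'port' migrate to the other port.
 * Link up: exactly those queues move back to their home port.
 */
static void toy_lag_link_event(struct toy_lag *lag, int port, bool up)
{
	for (int q = 0; q < lag->nqueues; q++) {
		if (lag->home[q] != port)
			continue;
		lag->current[q] = up ? port : 1 - port;
	}
}

static int toy_lag_count(const struct toy_lag *lag, int port)
{
	int n = 0;

	for (int q = 0; q < lag->nqueues; q++)
		if (lag->current[q] == port)
			n++;
	return n;
}
```

Keeping the original assignment separate from the current one is what
lets the same queues return when the port comes back up.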

Co-developed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Jul 2025 17:56:56 +0000 (10:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc2).

No conflicts.

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
  d7a276a5768f ("net: stmmac: rk: convert to suspend()/resume() methods")
  de1e963ad064 ("net: stmmac: rk: put the PHY clock on remove")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago ice: cleanup capabilities evaluation
Dave Ertman [Mon, 16 Jun 2025 11:03:21 +0000 (13:03 +0200)]
ice: cleanup capabilities evaluation

When evaluating the capabilities field, neither the ICE_AQC_BIT_ROCEV2_LAG
nor the ICE_AQC_BIT_SRIOV_LAG define used the BIT() operator; each simply
used a hex value that set the correct bits.  While not inaccurate, this
method is misleading, and when it is expanded in the following
implementation it becomes even more confusing.

Switch to using the BIT() operator to clarify what is being checked.
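
A small sketch of the difference in readability. The TOY_CAP_* names
and their bit positions are hypothetical and are not the values used
by the ice driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Hex style: the reader has to decode which bit each value sets. */
#define TOY_CAP_ROCEV2_LAG_HEX	0x01
#define TOY_CAP_SRIOV_LAG_HEX	0x02

/* BIT() style: the bit position is stated directly. */
#define TOY_CAP_ROCEV2_LAG	BIT(0)
#define TOY_CAP_SRIOV_LAG	BIT(1)
#define TOY_CAP_SRIOV_AA_LAG	BIT(2)	/* a later addition stays obvious */

static bool toy_cap_set(uint32_t caps, uint32_t flag)
{
	return (caps & flag) != 0;
}
```

Both styles produce the same values; BIT() just makes it clear which
bit is being checked when more flags are added.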

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago ice: Cleanup variable initialization in LAG code
Dave Ertman [Mon, 16 Jun 2025 11:03:20 +0000 (13:03 +0200)]
ice: Cleanup variable initialization in LAG code

In preparation for implementing SRIOV Active-Active LAG support,
clean up several unneeded variable initializations in declaration
blocks.

Also move a couple of variable initializations that belong there
into the declaration block.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago ice: move LAG function in code to prepare for Active-Active
Dave Ertman [Mon, 16 Jun 2025 11:03:19 +0000 (13:03 +0200)]
ice: move LAG function in code to prepare for Active-Active

In the SRIOV LAG Active-Active patch, the contents of the functions
ice_lag_cfg_pf_fltr and ice_lag_config_eswitch are moved to earlier
locations in the source file.  Also, ice_lag_cfg_pf_fltr is renamed,
and its flow is changed.

To reduce the delta in the larger patch, move the original functions
to their new location so that only functional changes are needed in
the larger patch.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago ice: Add driver specific prefix to LAG defines
Dave Ertman [Mon, 16 Jun 2025 11:03:18 +0000 (13:03 +0200)]
ice: Add driver specific prefix to LAG defines

A define in the LAG code is missing a driver specific prefix.
Add a prefix to the define.

Also shorten a define's name and move it to a more logical place.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago ice: replace u8 elements with bool where appropriate
Dave Ertman [Mon, 16 Jun 2025 11:03:17 +0000 (13:03 +0200)]
ice: replace u8 elements with bool where appropriate

In preparation for the new LAG functionality implementation, there are
a couple of existing LAG elements of the capabilities struct that should
be bool instead of u8.  Since we are adding a new element to this struct
that should also be a bool, fix the existing LAG u8 in this patch and
eliminate !! operators where possible.
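
Why a bool element makes the !! operator unnecessary can be shown
with a short sketch. struct toy_caps, TOY_CAP_LAG and its bit position
are hypothetical; bit 8 was chosen so that the truncation hazard of a
u8 is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))
#define TOY_CAP_LAG BIT(8)	/* hypothetical flag above bit 7 */

struct toy_caps {
	uint8_t lag_u8_raw;	/* u8 without !!: masked value is truncated */
	uint8_t lag_u8;		/* u8 with !!: normalized to 0 or 1 */
	bool lag;		/* bool: conversion yields 0 or 1 by itself */
};

static void toy_parse_caps(struct toy_caps *caps, uint32_t number)
{
	caps->lag_u8_raw = (uint8_t)(number & TOY_CAP_LAG);
	caps->lag_u8 = !!(number & TOY_CAP_LAG);
	caps->lag = number & TOY_CAP_LAG;
}
```

Assigning any nonzero value to a bool stores 1, so the explicit
normalization is only needed for integer-typed elements.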

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago ice: Remove casts on void pointers in LAG code
Dave Ertman [Mon, 16 Jun 2025 11:03:16 +0000 (13:03 +0200)]
ice: Remove casts on void pointers in LAG code

This series will be touching the LAG code in the ice driver. To
avoid moving or propagating casts on void pointers, clean them up
first.

This also allows for moving the variable initialization into the
variable declaration.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months ago Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 14 Aug 2025 14:14:30 +0000 (07:14 -0700)]
Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Netfilter and IPsec.

  Current release - regressions:

   - netfilter: nft_set_pipapo:
      - don't return bogus extension pointer
      - fix null deref for empty set

  Current release - new code bugs:

   - core: prevent deadlocks when enabling NAPIs with mixed kthread
     config

   - eth: netdevsim: Fix wild pointer access in nsim_queue_free().

  Previous releases - regressions:

   - page_pool: allow enabling recycling late, fix false positive
     warning

   - sched: ets: use old 'nbands' while purging unused classes

   - xfrm:
      - restore GSO for SW crypto
      - bring back device check in validate_xmit_xfrm

   - tls: handle data disappearing from under the TLS ULP

   - ptp: prevent possible ABBA deadlock in ptp_clock_freerun()

   - eth:
      - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
      - hv_netvsc: fix panic during namespace deletion with VF

  Previous releases - always broken:

   - netfilter: fix refcount leak on table dump

   - vsock: do not allow binding to VMADDR_PORT_ANY

   - sctp: linearize cloned gso packets in sctp_rcv

   - eth:
      - hibmcge: fix the division by zero issue
      - microchip: fix KSZ8863 reset problem"

* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  net: usb: asix_devices: add phy_mask for ax88772 mdio bus
  net: kcm: Fix race condition in kcm_unattach()
  selftests: net/forwarding: test purge of active DWRR classes
  net/sched: ets: use old 'nbands' while purging unused classes
  bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
  netdevsim: Fix wild pointer access in nsim_queue_free().
  net: mctp: Fix bad kfree_skb in bind lookup test
  netfilter: nf_tables: reject duplicate device on updates
  ipvs: Fix estimator kthreads preferred affinity
  netfilter: nft_set_pipapo: fix null deref for empty set
  selftests: tls: test TCP stealing data from under the TLS socket
  tls: handle data disappearing from under the TLS ULP
  ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
  ixgbe: prevent from unwanted interface name changes
  devlink: let driver opt out of automatic phys_port_name generation
  net: prevent deadlocks when enabling NAPIs with mixed kthread config
  net: update NAPI threaded config even for disabled NAPIs
  selftests: drv-net: don't assume device has only 2 queues
  docs: Fix name for net.ipv4.udp_child_hash_entries
  riscv: dts: thead: Add APB clocks for TH1520 GMACs
  ...

2 months ago Merge branch 'net-ethtool-support-including-flow-label-in-the-flow-hash-for-rss'
Paolo Abeni [Thu, 14 Aug 2025 09:40:21 +0000 (11:40 +0200)]
Merge branch 'net-ethtool-support-including-flow-label-in-the-flow-hash-for-rss'

Jakub Kicinski says:

====================
net: ethtool: support including Flow Label in the flow hash for RSS

Add support for using IPv6 Flow Label in Rx hash computation
and therefore RSS queue selection.

v3: https://lore.kernel.org/20250724015101.186608-1-kuba@kernel.org
v2:  https://lore.kernel.org/20250722014915.3365370-1-kuba@kernel.org
RFC: https://lore.kernel.org/20250609173442.1745856-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250811234212.580748-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago selftests: drv-net: add test for RSS on flow label
Jakub Kicinski [Mon, 11 Aug 2025 23:42:12 +0000 (16:42 -0700)]
selftests: drv-net: add test for RSS on flow label

Add a simple test for checking that RSS on flow label works,
and that it's rejected for IPv4 flows.

 # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
 TAP version 13
 1..2
 ok 1 rss_flow_label.test_rss_flow_label
 ok 2 rss_flow_label.test_rss_flow_label_6only
 # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: bnxt: support RSS on IPv6 Flow Label
Jakub Kicinski [Mon, 11 Aug 2025 23:42:11 +0000 (16:42 -0700)]
eth: bnxt: support RSS on IPv6 Flow Label

It appears that the bnxt FW API has the relevant bit for Flow Label
hashing. Plumb in the support. Obey the capability bit.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250811234212.580748-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago eth: fbnic: support RSS on IPv6 Flow Label
Jakub Kicinski [Mon, 11 Aug 2025 23:42:10 +0000 (16:42 -0700)]
eth: fbnic: support RSS on IPv6 Flow Label

Support IPv6 Flow Label hashing. Use both inner and outer IPv6
header's Flow Label if both headers are detected. Flow Label
is unlike normal header fields: by enabling it the user accepts
an unstable hash and possible reordering. Because of that
I think it's reasonable to hash over all Flow Labels we can
find, even though we don't hash over all L3 addresses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250811234212.580748-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: ethtool: support including Flow Label in the flow hash for RSS
Jakub Kicinski [Mon, 11 Aug 2025 23:42:09 +0000 (16:42 -0700)]
net: ethtool: support including Flow Label in the flow hash for RSS

Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:

  [ ] RSS include flow label in the hash (configurable)

https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling

RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.

Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it, it is far
preferable to potentially persistent overload of a subset of
queues.

It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path through host queues.
Users can disable NETIF_F_RXHASH if they require a stable
flow hash.
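
To illustrate why the hash changes with the Flow Label, here is a
minimal userspace Toeplitz sketch. The hash input layout (struct
toy_flow) is a hypothetical simplification, not what any device
actually hashes, and toy_rx_hash() is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toeplitz hash: for every set input bit, XOR in the 32-bit key window
 * starting at that bit position. Requires key_len >= 4; key bits past
 * the end of the key are treated as zero.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t next_key_bit = 32;
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window by one key bit. */
			window <<= 1;
			if (next_key_bit < key_len * 8 &&
			    (key[next_key_bit / 8] & (0x80u >> (next_key_bit % 8))))
				window |= 1;
			next_key_bit++;
		}
	}
	return hash;
}

/* Hypothetical, shortened hash input. */
struct toy_flow {
	uint8_t addrs[8];	/* stand-in for the address fields */
	uint8_t flow_label[3];	/* 20-bit Flow Label, right-aligned */
};

static uint32_t toy_rx_hash(const uint8_t *key, size_t key_len,
			    const struct toy_flow *f, int use_flow_label)
{
	uint8_t buf[sizeof(f->addrs) + sizeof(f->flow_label)];
	size_t len = sizeof(f->addrs);

	memcpy(buf, f->addrs, sizeof(f->addrs));
	if (use_flow_label) {
		memcpy(buf + len, f->flow_label, sizeof(f->flow_label));
		len += sizeof(f->flow_label);
	}
	return toeplitz_hash(key, key_len, buf, len);
}
```

With the Flow Label excluded, two packets of the same flow hash
identically regardless of the label. With it included, a label change
feeds different bits into the same computation, so both the queue
selection and the reported rx hash can change.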

The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.

Willem notes [1] that Flow Label is defined as identifying the flow
and therefore including both the flow label _and_ the L4 header
fields is not generally necessary. But it should not hurt so
it's not explicitly prevented if the driver supports hashing
on both at the same time.

Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: usb: asix_devices: add phy_mask for ax88772 mdio bus
Xu Yang [Mon, 11 Aug 2025 09:29:31 +0000 (17:29 +0800)]
net: usb: asix_devices: add phy_mask for ax88772 mdio bus

Without setting phy_mask for the ax88772 mdio bus, the current driver may
create up to 32 mdio phy devices, with phy addresses ranging from 0x00 to
0x1f. DLink DUB-E100 H/W Ver B1 is such a device. However, only one main phy
device will bind to the net phy driver. This causes an issue during system
suspend/resume, since phy_polling_mode() in phy_state_machine() directly
dereferences members of phydev->drv for non-main phy devices, and a NULL
pointer dereference occurs. Since only the external phy or the internal phy
is necessary, add phy_mask for the ax88772 mdio bus to work around the
issue.
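
The effect of a phy_mask can be sketched in userspace. A set bit means
the corresponding address is skipped when the bus is scanned. The
toy_* helpers and the two example addresses are hypothetical and are
not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PHY_MAX_ADDR 32
#define BIT(n) (UINT32_C(1) << (n))

/* Count the addresses a bus scan would probe for a given phy_mask. */
static int toy_scan_count(uint32_t phy_mask)
{
	int n = 0;

	for (int addr = 0; addr < PHY_MAX_ADDR; addr++)
		if (!(phy_mask & BIT(addr)))
			n++;
	return n;
}

/* Mask everything except the external and the internal phy address. */
static uint32_t toy_phy_mask(unsigned int ext_addr, unsigned int int_addr)
{
	return ~(BIT(ext_addr & 0x1f) | BIT(int_addr & 0x1f));
}
```

With no mask all 32 addresses are probed; with the mask only the two
addresses of interest remain, so no unbound extra phy devices are
created.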

Closes: https://lore.kernel.org/netdev/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: e532a096be0e ("net: usb: asix: ax88772: add phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250811092931.860333-1-xu.yang_2@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago net: cadence: macb: convert from round_rate() to determine_rate()
Brian Masney [Sun, 10 Aug 2025 22:24:14 +0000 (18:24 -0400)]
net: cadence: macb: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate().

Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250810-net-round-rate-v1-1-dbb237c9fe5c@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago Merge tag 'probes-fixes-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 14 Aug 2025 03:23:32 +0000 (20:23 -0700)]
Merge tag 'probes-fixes-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fix from Masami Hiramatsu:

 - MAINTAINERS: Remove bouncing kprobes maintainer

* tag 'probes-fixes-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  MAINTAINERS: Remove bouncing kprobes maintainer

2 months ago MAINTAINERS: Remove bouncing kprobes maintainer
Dave Hansen [Thu, 14 Aug 2025 02:38:58 +0000 (11:38 +0900)]
MAINTAINERS: Remove bouncing kprobes maintainer

The kprobes MAINTAINERS entry includes anil.s.keshavamurthy@intel.com.
That address is bouncing. Remove it.

This still leaves three other listed maintainers.

Link: https://lore.kernel.org/all/20250808180124.7DDE2ECD@davehans-spike.ostc.intel.com/
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months ago Merge branch 'net-don-t-use-pk-through-printk-or-tracepoints'
Jakub Kicinski [Thu, 14 Aug 2025 01:26:18 +0000 (18:26 -0700)]
Merge branch 'net-don-t-use-pk-through-printk-or-tracepoints'

Thomas Weißschuh says:

====================
net: Don't use %pK through printk or tracepoints

In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
====================

Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-0-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net/mlx5: Don't use %pK through tracepoints
Thomas Weißschuh [Mon, 11 Aug 2025 09:43:19 +0000 (11:43 +0200)]
net/mlx5: Don't use %pK through tracepoints

In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through tracepoints. They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-2-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago ice: Don't use %pK through printk or tracepoints
Thomas Weißschuh [Mon, 11 Aug 2025 09:43:18 +0000 (11:43 +0200)]
ice: Don't use %pK through printk or tracepoints

In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-1-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net: kcm: Fix race condition in kcm_unattach()
Sven Stegemann [Tue, 12 Aug 2025 19:18:03 +0000 (21:18 +0200)]
net: kcm: Fix race condition in kcm_unattach()

syzbot found a race condition when kcm_unattach(psock)
and kcm_release(kcm) are executed at the same time.

kcm_unattach() is missing a check of the flag
kcm->tx_stopped before calling queue_work().

If the kcm has a reserved psock, kcm_unattach() might get executed
between cancel_work_sync() and unreserve_psock() in kcm_release(),
requeuing kcm->tx_work right before kcm gets freed in kcm_done().

Remove kcm->tx_stopped and replace it by the less
error-prone disable_work_sync().
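
The difference between cancelling and disabling a work item can be
shown with a single-threaded toy model. The toy_* names are
hypothetical, and the model ignores the waiting that the real
cancel_work_sync() and disable_work_sync() do for a running instance.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_work {
	int disable_depth;	/* > 0: queueing is refused */
	bool pending;		/* queued but not yet run */
};

/* Returns false if the work is disabled or already pending. */
static bool toy_queue_work(struct toy_work *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Cancel only: a later queue attempt succeeds again. */
static bool toy_cancel_work_sync(struct toy_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Disable and cancel: later queue attempts are rejected. */
static bool toy_disable_work_sync(struct toy_work *w)
{
	w->disable_depth++;
	return toy_cancel_work_sync(w);
}
```

After a plain cancel, a late queue attempt from another path can make
the work pending again just before the object is freed. After a
disable, that attempt is refused, so no separate "stopped" flag has to
be checked by every caller.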

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+e62c9db591c30e174662@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e62c9db591c30e174662
Reported-by: syzbot+d199b52665b6c3069b94@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d199b52665b6c3069b94
Reported-by: syzbot+be6b1fdfeae512726b4e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=be6b1fdfeae512726b4e
Signed-off-by: Sven Stegemann <sven@stegemann.de>
Link: https://patch.msgid.link/20250812191810.27777-1-sven@stegemann.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago Merge branch 'ets-use-old-nbands-while-purging-unused-classes'
Jakub Kicinski [Thu, 14 Aug 2025 01:11:56 +0000 (18:11 -0700)]
Merge branch 'ets-use-old-nbands-while-purging-unused-classes'

Davide Caratti says:

====================
ets: use old 'nbands' while purging unused classes

- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
====================

Link: https://patch.msgid.link/cover.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago selftests: net/forwarding: test purge of active DWRR classes
Davide Caratti [Tue, 12 Aug 2025 16:40:30 +0000 (18:40 +0200)]
selftests: net/forwarding: test purge of active DWRR classes

Extend sch_ets.sh to add a reproducer for problematic list deletions when
active DWRR classes are purged by ets_qdisc_change() [1] [2].

[1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/f3b9bacc73145f265c19ab80785933da5b7cbdec.1754581577.git.dcaratti@redhat.com/

Suggested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/489497cb781af7389011ca1591fb702a7391f5e7.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago net/sched: ets: use old 'nbands' while purging unused classes
Davide Caratti [Tue, 12 Aug 2025 16:40:29 +0000 (18:40 +0200)]
net/sched: ets: use old 'nbands' while purging unused classes

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem has been present
since my first attempts to fix ets_qdisc_change(), but it surfaced again
after the recent qdisc len accounting fixes. Fix it by purging idle DWRR
queues before assigning a new value of 'q->nbands', so that all purge
operations find a consistent configuration:

 - old 'q->nbands' because it's needed by ets_class_find()
 - old 'q->nstrict' because it's needed by ets_class_is_strict()
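
The ordering issue can be reproduced with a toy model. This is not the
sch_ets code: the toy_* names are hypothetical, and the class lookup is
reduced to a bounds check against 'nbands', which stands in for the
role the message gives to ets_class_find().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BANDS 16

struct toy_class {
	bool active;
};

struct toy_ets {
	unsigned int nbands;
	struct toy_class classes[TOY_MAX_BANDS];
};

/* Lookup succeeds only for bands below the current q->nbands. */
static struct toy_class *toy_class_find(struct toy_ets *q, unsigned int band)
{
	if (band >= q->nbands)
		return NULL;
	return &q->classes[band];
}

/* Purge bands [from, to); count lookups that returned NULL. */
static int toy_purge(struct toy_ets *q, unsigned int from, unsigned int to)
{
	int failed = 0;

	for (unsigned int i = from; i < to; i++) {
		struct toy_class *cl = toy_class_find(q, i);

		if (!cl) {
			failed++;	/* the real code would dereference NULL */
			continue;
		}
		cl->active = false;
	}
	return failed;
}

/* Buggy order: nbands is updated before the purge. */
static int toy_change_buggy(struct toy_ets *q, unsigned int new_nbands)
{
	unsigned int old = q->nbands;

	q->nbands = new_nbands;
	return toy_purge(q, new_nbands, old);
}

/* Fixed order: purge while nbands still has the old value. */
static int toy_change_fixed(struct toy_ets *q, unsigned int new_nbands)
{
	unsigned int old = q->nbands;
	int failed = toy_purge(q, new_nbands, old);

	q->nbands = new_nbands;
	return failed;
}
```

In the buggy order every class being purged is already out of range
for the lookup; in the fixed order all lookups succeed.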

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
 RSP: 0018:ffffba186009f400 EFLAGS: 00010202
 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
 FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f2155114084
 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
 R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

 [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Cc: stable@vger.kernel.org
Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago  Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Thu, 14 Aug 2025 00:31:46 +0000 (17:31 -0700)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
ixgbe: bypass devlink phys_port_name generation

Jedrzej adds an option to skip phys_port_name generation and opts
ixgbe into it, as some configurations rely on pre-devlink naming
that could otherwise end up broken.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ixgbe: prevent from unwanted interface name changes
  devlink: let driver opt out of automatic phys_port_name generation
====================

Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago  selftests: netconsole: Validate interface selection by MAC address
Andre Carvalho [Tue, 12 Aug 2025 19:38:23 +0000 (20:38 +0100)]
selftests: netconsole: Validate interface selection by MAC address

Extend the existing netconsole cmdline selftest to also validate that
interface selection can be performed via MAC address.

The test now validates that netconsole works with both interface name
and MAC address, improving test coverage.

Suggested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Link: https://patch.msgid.link/20250812-netcons-cmdline-selftest-v2-1-8099fb7afa9e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago  bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
David Wei [Tue, 12 Aug 2025 18:29:07 +0000 (11:29 -0700)]
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE

The data page pool always fills the HW rx ring with pages. On arm64 with
64K pages, this will waste _at least_ 32K of memory per entry in the rx
ring.

Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This
makes the data page pool the same as the header pool.
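
The arithmetic behind the waste figure can be sketched as follows. This is a
user-space illustration, not driver code: the helper names are hypothetical,
and the 32K rx buffer size used in the usage note is an assumption inferred
from the "at least 32K" figure above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left unused when one whole page backs a single rx ring entry. */
static size_t toy_wasted_per_entry(size_t page_size, size_t rx_buf_size)
{
	assert(rx_buf_size > 0);
	return page_size > rx_buf_size ? page_size - rx_buf_size : 0;
}

/* Ring entries that one page can back once it is split into fragments. */
static size_t toy_frags_per_page(size_t page_size, size_t rx_buf_size)
{
	assert(rx_buf_size > 0);
	return page_size > rx_buf_size ? page_size / rx_buf_size : 1;
}
```

With a 64K page and an assumed 32K rx buffer, toy_wasted_per_entry() gives
32768 bytes per entry, and toy_frags_per_page() shows that the same page could
back 2 entries instead of 1. With 4K pages and a 4K buffer nothing is wasted.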

Tested with iperf3 with a small (64 entries) rx ring to encourage buffer
circulation.

Fixes: cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months ago  netdevsim: Fix wild pointer access in nsim_queue_free().
Kuniyuki Iwashima [Tue, 12 Aug 2025 16:21:26 +0000 (16:21 +0000)]
netdevsim: Fix wild pointer access in nsim_queue_free().

syzbot reported the splat below. [0]

When nsim_queue_uninit() is called from nsim_init_netdevsim(),
register_netdevice() has not been called, thus dev->dstats has
not been allocated.

Let's not call dev_dstats_rx_dropped_add() in such a case.
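
The idea can be sketched in user space as below. This is not the actual patch:
the toy_* names are hypothetical, and a plain NULL pointer stands in for the
per-CPU stats area that only exists once registration has allocated it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device drop statistics. */
struct toy_stats {
	unsigned long rx_drops;
};

struct toy_dev {
	struct toy_stats *stats;	/* NULL until the device is registered */
};

/*
 * Account packets still pending on a queue that is being freed, but only
 * if the stats area exists. Returns 1 if drops were accounted, 0 otherwise.
 */
static int toy_queue_free(struct toy_dev *dev, unsigned long pending)
{
	assert(dev != NULL);
	if (pending && dev->stats) {
		dev->stats->rx_drops += pending;
		return 1;
	}
	return 0;
}
```

On the early error path the device has no stats yet, so the accounting step is
skipped and the rest of the queue teardown can proceed.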

[0]
BUG: unable to handle page fault for address: ffff88809782c020
 PF: supervisor write access in kernel mode
 PF: error_code(0x0002) - not-present page
PGD 1b401067 P4D 1b401067 PUD 0
Oops: Oops: 0002 [#1] SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 8476 Comm: syz.1.251 Not tainted 6.16.0-syzkaller-06699-ge8d780dcd957 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:local_add arch/x86/include/asm/local.h:33 [inline]
RIP: 0010:u64_stats_add include/linux/u64_stats_sync.h:89 [inline]
RIP: 0010:dev_dstats_rx_dropped_add include/linux/netdevice.h:3027 [inline]
RIP: 0010:nsim_queue_free+0xba/0x120 drivers/net/netdevsim/netdev.c:714
Code: 07 77 6c 4a 8d 3c ed 20 7e f1 8d 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 46 4a 03 1c ed 20 7e f1 8d <4c> 01 63 20 be 00 02 00 00 48 8d 3d 00 00 00 00 e8 61 2f 58 fa 48
RSP: 0018:ffffc900044af150 EFLAGS: 00010286
RAX: dffffc0000000000 RBX: ffff88809782c000 RCX: 00000000000079c3
RDX: 1ffffffff1be2fc7 RSI: ffffffff8c15f380 RDI: ffffffff8df17e38
RBP: ffff88805f59d000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000003 R14: ffff88806ceb3d00 R15: ffffed100dfd308e
FS:  0000000000000000(0000) GS:ffff88809782c000(0063) knlGS:00000000f505db40
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffff88809782c020 CR3: 000000006fc6a000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 nsim_queue_uninit drivers/net/netdevsim/netdev.c:993 [inline]
 nsim_init_netdevsim drivers/net/netdevsim/netdev.c:1049 [inline]
 nsim_create+0xd0a/0x1260 drivers/net/netdevsim/netdev.c:1101
 __nsim_dev_port_add+0x435/0x7d0 drivers/net/netdevsim/dev.c:1438
 nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1494 [inline]
 nsim_dev_reload_create drivers/net/netdevsim/dev.c:1546 [inline]
 nsim_dev_reload_up+0x5b8/0x860 drivers/net/netdevsim/dev.c:1003
 devlink_reload+0x322/0x7c0 net/devlink/dev.c:474
 devlink_nl_reload_doit+0xe31/0x1410 net/devlink/dev.c:584
 genl_family_rcv_msg_doit+0x206/0x2f0 net/netlink/genetlink.c:1115
 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
 genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x155/0x420 net/netlink/af_netlink.c:2552
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
 netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896
 sock_sendmsg_nosec net/socket.c:714 [inline]
 __sock_sendmsg net/socket.c:729 [inline]
 ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614
 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668
 __sys_sendmsg+0x16d/0x220 net/socket.c:2700
 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
 __do_fast_syscall_32+0x7c/0x3a0 arch/x86/entry/syscall_32.c:306
 do_fast_syscall_32+0x32/0x80 arch/x86/entry/syscall_32.c:331
 entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf708e579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f505d55c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000000080000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: ffff88809782c020

Fixes: 2a68a22304f9 ("netdevsim: account dropped packet length in stats on queue free")
Reported-by: syzbot+8aa80c6232008f7b957d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/688bb9ca.a00a0220.26d0e1.0050.GAE@google.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250812162130.4129322-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>