Ido Schimmel [Tue, 12 Aug 2025 08:02:13 +0000 (11:02 +0300)]
selftests: net: Test bridge backup port when port is administratively down
Test that packets are redirected to the backup port when the primary
port is administratively down.
With the previous patch:
# ./test_bridge_backup_port.sh
[...]
TEST: swp1 administratively down [ OK ]
TEST: No forwarding out of swp1 [ OK ]
TEST: Forwarding out of vx0 [ OK ]
TEST: swp1 administratively up [ OK ]
TEST: Forwarding out of swp1 [ OK ]
TEST: No forwarding out of vx0 [ OK ]
[...]
Tests passed: 89
Tests failed: 0
Without the previous patch:
# ./test_bridge_backup_port.sh
[...]
TEST: swp1 administratively down [ OK ]
TEST: No forwarding out of swp1 [ OK ]
TEST: Forwarding out of vx0 [FAIL]
TEST: swp1 administratively up [ OK ]
TEST: Forwarding out of swp1 [ OK ]
[...]
Tests passed: 85
Tests failed: 4
Ido Schimmel [Tue, 12 Aug 2025 08:02:12 +0000 (11:02 +0300)]
bridge: Redirect to backup port when port is administratively down
If a backup port is configured for a bridge port, the bridge will
redirect known unicast traffic towards the backup port when the primary
port is administratively up but without a carrier. This is useful, for
example, in MLAG configurations where a system is connected to two
switches and there is a peer link between both switches. The peer link
serves as the backup port in case one of the switches loses its
connection to the multi-homed system.
In order to avoid flooding when the primary port loses its carrier, the
bridge does not flush dynamic FDB entries pointing to the port upon STP
disablement, if the port has a backup port.
The above means that known unicast traffic destined to the primary port
will be blackholed when the port is put administratively down, until the
FDB entries pointing to it are aged-out.
Given that the current behavior is quite weird and unlikely to be
depended on by anyone, amend the bridge to redirect to the backup port
also when the primary port is administratively down and not only when it
does not have a carrier.
The change is motivated by a report from a user who expected traffic to
be redirected to the backup port when the primary port was put
administratively down while debugging a network issue.
Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 13 Aug 2025 07:44:54 +0000 (10:44 +0300)]
net: phy: mscc: report and configure in-band auto-negotiation for SGMII/QSGMII
The following Vitesse/Microsemi/Microchip PHYs, among those supported by
this driver, have the host interface configurable as SGMII or QSGMII:
- VSC8504
- VSC8514
- VSC8552
- VSC8562
- VSC8572
- VSC8574
- VSC8575
- VSC8582
- VSC8584
All these PHYs are documented to have bit 7 of "MAC SerDes PCS Control"
as "MAC SerDes ANEG enable".
Out of these, I could test the VSC8514 quad PHY in QSGMII. This works
both with the in-band autoneg on and off, on the NXP LS1028A-RDB and
T1040-RDB boards.
Notably, the bit is sticky (survives soft resets), so giving Linux the
tools to read and modify this settings makes it robust to changes made
to it by previous boot layers (U-Boot).
These are present since commit d8652956cf37 ("net: dsa: realtek-smi: Add
Realtek SMI driver") and never needed. Apparently the driver was not
cleaned up sufficiently for submission.
Jakub Kicinski [Fri, 15 Aug 2025 00:35:23 +0000 (17:35 -0700)]
Merge branch 'devlink-port-attr-cleanup'
Parav Pandit says:
====================
devlink port attr cleanup
patch-1 removes the return 0 check at several places and simplfies
patch-2 constifies the attributes and moves the checks early
caller
====================
Parav Pandit [Wed, 13 Aug 2025 09:44:17 +0000 (12:44 +0300)]
devlink/port: Check attributes early and constify
Constify the devlink port attributes to indicate they are read only
and does not depend on anything else. Therefore, validate it early
before setting in the devlink port.
Dan Carpenter [Wed, 13 Aug 2025 05:51:22 +0000 (08:51 +0300)]
nfc: pn533: Delete an unnecessary check
The "rc" variable is set like this:
if (IS_ERR(resp)) {
rc = PTR_ERR(resp);
We know that "rc" is negative so there is no need to check.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/aJwn2ox5g9WsD2Vx@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Markus Stockhausen [Wed, 13 Aug 2025 05:44:07 +0000 (01:44 -0400)]
net: phy: realtek: convert RTL8226-CG to c45 only
Short: Convert the RTL8226-CG to c45 so it can be used in its
Realtek based ecosystems.
Long: The RTL8226-CG can be mainly found on devices of the
Realtek Otto switch platform. Devices like the Zyxel XGS1210-12
are based on it. These implement a hardware based phy polling
in the background to update SoC status registers.
The hardware provides 4 smi busses where phys are attached to.
For each bus one can decide if it is polled in c45 or c22 mode.
See https://svanheule.net/realtek/longan/register/smi_glb_ctrl
With this setting the register access will be limited by the
hardware. This is very complex (including caching and special
c45-over-c22 handling). But basically it boils down to "enable
protocol x and SoC will disable register access via protocol y".
Mainline already gained support for the rtl9300 mdio driver
in commit 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver").
It covers the basic features, but a lot effort is still needed
to understand hardware properly. So it runs a simple setup by
selecting the proper bus mode during startup.
/* Put the interfaces into C45 mode if required */
glb_ctrl_mask = GENMASK(19, 16);
for (i = 0; i < MAX_SMI_BUSSES; i++)
if (priv->smi_bus_is_c45[i])
glb_ctrl_val |= GLB_CTRL_INTF_SEL(i);
...
err = regmap_update_bits(regmap, SMI_GLB_CTRL,
glb_ctrl_mask, glb_ctrl_val);
To avoid complex coding later on, it limits access by only
providing either c22 or c45:
Because of these limitations the existing RTL8226 phy driver
is not working at all on Realtek switches. Convert the driver
to c45-only.
Luckily the RTL8226 seems to support proper MDIO_PMA_EXTABLE
flags. So standard function genphy_c45_pma_read_abilities() can
call genphy_c45_pma_read_ext_abilities() and 10/100/1000 is
populated right. Thus conversion is straight forward.
Outputs before - REMARK: For this a "hacked" bus was used that
toggles the mode for each c22/c45 access. But that is slow and
produces unstable data in the SoC status registers).
Settings for lan9:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 24
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Link detected: no
Outputs with this commit:
Settings for lan9:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 24
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Link detected: no
Jakub Kicinski [Fri, 15 Aug 2025 00:26:37 +0000 (17:26 -0700)]
Merge tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs
Mauro Carvalho Chehab says:
====================
add a generic yaml parser integrated with Netlink specs generation
- An YAML parser Sphinx plugin, integrated with Netlink YAML doc
parser.
The patch content is identical to my v10 submission:
https://lore.kernel.org/cover.1753718185.git.mchehab+huawei@kernel.org
* tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs:
sphinx: parser_yaml.py: fix line numbers information
docs: parser_yaml.py: fix backward compatibility with old docutils
docs: parser_yaml.py: add support for line numbers from the parser
tools: netlink_yml_parser.py: add line numbers to parsed data
MAINTAINERS: add netlink_yml_parser.py to linux-doc
docs: netlink: remove obsolete .gitignore from unused directory
tools: ynl_gen_rst.py: drop support for generating index files
docs: uapi: netlink: update netlink specs link
docs: use parser_yaml extension to handle Netlink specs
docs: sphinx: add a parser for yaml files for Netlink specs
tools: ynl_gen_rst.py: cleanup coding style
docs: netlink: index.rst: add a netlink index file
tools: ynl_gen_rst.py: Split library from command line tool
docs: netlink: netlink-raw.rst: use :ref: instead of :doc:
====================
Cross-merge networking fixes after downstream PR (net-6.17-rc2).
No conflicts.
Adjacent changes:
drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c d7a276a5768f ("net: stmmac: rk: convert to suspend()/resume() methods") de1e963ad064 ("net: stmmac: rk: put the PHY clock on remove")
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth:
- bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth:
- hibmcge: fix the division by zero issue
- microchip: fix KSZ8863 reset problem"
* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
net: kcm: Fix race condition in kcm_unattach()
selftests: net/forwarding: test purge of active DWRR classes
net/sched: ets: use old 'nbands' while purging unused classes
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
netdevsim: Fix wild pointer access in nsim_queue_free().
net: mctp: Fix bad kfree_skb in bind lookup test
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
selftests: tls: test TCP stealing data from under the TLS socket
tls: handle data disappearing from under the TLS ULP
ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
net: prevent deadlocks when enabling NAPIs with mixed kthread config
net: update NAPI threaded config even for disabled NAPIs
selftests: drv-net: don't assume device has only 2 queues
docs: Fix name for net.ipv4.udp_child_hash_entries
riscv: dts: thead: Add APB clocks for TH1520 GMACs
...
Jakub Kicinski [Mon, 11 Aug 2025 23:42:12 +0000 (16:42 -0700)]
selftests: drv-net: add test for RSS on flow label
Add a simple test for checking that RSS on flow label works,
and that its rejected for IPv4 flows.
# ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
TAP version 13
1..2
ok 1 rss_flow_label.test_rss_flow_label
ok 2 rss_flow_label.test_rss_flow_label_6only
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Mon, 11 Aug 2025 23:42:10 +0000 (16:42 -0700)]
eth: fbnic: support RSS on IPv6 Flow Label
Support IPv6 Flow Label hashing. Use both inner and outer IPv6
header's Flow Label if both headers are detected. Flow Label
is unlike normal header fields, by enabling it user accepts
the unstable hash and possible reordering. Because of that
I think it's reasonable to hash over all Flow Labels we can
find, even tho we don't hash over all L3 addresses.
Jakub Kicinski [Mon, 11 Aug 2025 23:42:09 +0000 (16:42 -0700)]
net: ethtool: support including Flow Label in the flow hash for RSS
Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:
[ ] RSS include flow label in the hash (configurable)
RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.
Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it - it is far
preferable to potentially persistent overload of subset of
queues.
It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path thru host queues.
User can disable NETIF_F_RXHASH if they require a stable
flow hash.
The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.
Willem notes [1] that Flow Label is defined as identifying the flow
and therefore including both the flow label _and_ the L4 header
fields is not generally necessary. But it should not hurt so
it's not explicitly prevented if the driver supports hashing
on both at the same time.
Xu Yang [Mon, 11 Aug 2025 09:29:31 +0000 (17:29 +0800)]
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
Without setting phy_mask for ax88772 mdio bus, current driver may create
at most 32 mdio phy devices with phy address range from 0x00 ~ 0x1f.
DLink DUB-E100 H/W Ver B1 is such a device. However, only one main phy
device will bind to net phy driver. This is creating issue during system
suspend/resume since phy_polling_mode() in phy_state_machine() will
directly deference member of phydev->drv for non-main phy devices. Then
NULL pointer dereference issue will occur. Due to only external phy or
internal phy is necessary, add phy_mask for ax88772 mdio bus to workarnoud
the issue.
====================
net: Don't use %pK through printk or tracepoints
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
====================
Thomas Weißschuh [Mon, 11 Aug 2025 09:43:19 +0000 (11:43 +0200)]
net/mlx5: Don't use %pK through tracepoints
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through tracepoints. They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-2-2e2fdc7d3f2c@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Weißschuh [Mon, 11 Aug 2025 09:43:18 +0000 (11:43 +0200)]
ice: Don't use %pK through printk or tracepoints
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-1-2e2fdc7d3f2c@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Stegemann [Tue, 12 Aug 2025 19:18:03 +0000 (21:18 +0200)]
net: kcm: Fix race condition in kcm_unattach()
syzbot found a race condition when kcm_unattach(psock)
and kcm_release(kcm) are executed at the same time.
kcm_unattach() is missing a check of the flag
kcm->tx_stopped before calling queue_work().
If the kcm has a reserved psock, kcm_unattach() might get executed
between cancel_work_sync() and unreserve_psock() in kcm_release(),
requeuing kcm->tx_work right before kcm gets freed in kcm_done().
Remove kcm->tx_stopped and replace it by the less
error-prone disable_work_sync().
====================
ets: use old 'nbands' while purging unused classes
- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
====================
Davide Caratti [Tue, 12 Aug 2025 16:40:29 +0000 (18:40 +0200)]
net/sched: ets: use old 'nbands' while purging unused classes
Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:
- old 'q->nbands' because it's needed by ets_class_find()
- old 'q->nstrict' because it's needed by ets_class_is_strict()
Jedrzej adds option to skip phys_port_name generation and opts
ixgbe into it as some configurations rely on pre-devlink naming
which could end up broken as a result.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
====================
David Wei [Tue, 12 Aug 2025 18:29:07 +0000 (11:29 -0700)]
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
The data page pool always fills the HW rx ring with pages. On arm64 with
64K pages, this will waste _at least_ 32K of memory per entry in the rx
ring.
Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This
makes the data page pool the same as the header pool.
Tested with iperf3 with a small (64 entries) rx ring to encourage buffer
circulation.
Fixes: cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Waqar Hameed [Tue, 12 Aug 2025 12:13:58 +0000 (14:13 +0200)]
net: enetc: Remove error print for devm_add_action_or_reset()
When `devm_add_action_or_reset()` fails, it is due to a failed memory
allocation and will thus return `-ENOMEM`. `dev_err_probe()` doesn't do
anything when error is `-ENOMEM`. Therefore, remove the useless call to
`dev_err_probe()` when `devm_add_action_or_reset()` fails, and just
return the value instead.
Miguel García [Tue, 12 Aug 2025 08:22:44 +0000 (10:22 +0200)]
tun: replace strcpy with strscpy for ifr_name
Replace the strcpy() calls that copy the device name into ifr->ifr_name
with strscpy() to avoid potential overflows and guarantee NULL termination.
Destination is ifr->ifr_name (size IFNAMSIZ).
Tested in QEMU (BusyBox rootfs):
- Created TUN devices via TUNSETIFF helper
- Set addresses and brought links up
- Verified long interface names are safely truncated (IFNAMSIZ-1)
Lorenzo Bianconi [Tue, 12 Aug 2025 04:57:23 +0000 (06:57 +0200)]
net: mediatek: wed: Introduce MT7992 WED support to MT7988 SoC
Introduce the second WDMA RX ring in WED driver for MT7988 SoC since the
Mediatek MT7992 WiFi chipset supports two separated WDMA rings.
Add missing MT7988 configurations to properly support WED for MT7992 in
MT76 driver.
Wang Liang [Tue, 12 Aug 2025 01:59:29 +0000 (09:59 +0800)]
vsock: use sizeof(struct sockaddr_storage) instead of magic value
Previous commit 230b183921ec ("net: Use standard structures for generic
socket address structures.") use 'struct sockaddr_storage address;'
to replace 'char address[MAX_SOCK_ADDR];'.
The macro MAX_SOCK_ADDR is removed by commit 01893c82b4e6 ("net: Remove
MAX_SOCK_ADDR constant").
The comment in vsock_getname() is outdated, use sizeof(struct
sockaddr_storage) instead of magic value 128.
Tiezhu Yang [Mon, 11 Aug 2025 07:35:06 +0000 (15:35 +0800)]
net: stmmac: Return early if invalid in loongson_dwmac_fix_reset()
If the MAC controller does not connect to any PHY interface, there is a
missing clock, then the DMA reset fails.
For this case, the DMA_BUS_MODE_SFT_RESET bit is 1 before software reset,
just print an error message which gives a hint the PHY clock is missing,
and then return -EINVAL immediately to avoid waiting for the timeout when
the DMA reset fails in loongson_dwmac_fix_reset().
With this patch, for the normal end user, the computer start faster with
reducing boot time for 2 seconds on the specified mainboard.
Tiezhu Yang [Mon, 11 Aug 2025 07:35:05 +0000 (15:35 +0800)]
net: stmmac: Change first parameter of fix_soc_reset()
In order to use netdev_err() to print message in the callback function of
fix_soc_reset(), change fix_soc_reset() to have "struct stmmac_priv *" as
its first parameter.
This is preparation for later patch, no functionality change.
Tiezhu Yang [Mon, 11 Aug 2025 07:35:04 +0000 (15:35 +0800)]
net: stmmac: Check stmmac_hw_setup() in stmmac_resume()
stmmac_hw_setup() may return 0 on success and an appropriate negative
integer as defined in errno.h file on failure, just check it and then
return early if failed in stmmac_resume().
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://patch.msgid.link/20250811073506.27513-2-yangtiezhu@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 13 Aug 2025 21:51:51 +0000 (14:51 -0700)]
Merge tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for *net*:
1) I managed to add a null dereference crash in nft_set_pipapo
in the current development cycle, was not caught by CI
because the avx2 implementation is fine, but selftest
splats when run on non-avx2 host.
2) Fix the ipvs estimater kthread affinity, was incorrect
since 6.14. From Frederic Weisbecker.
3) nf_tables should not allow to add a device to a flowtable
or netdev chain more than once -- reject this.
From Pablo Neira Ayuso. This has been broken for long time,
blamed commit dates from v5.8.
* tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
====================
Linus Torvalds [Wed, 13 Aug 2025 18:29:27 +0000 (11:29 -0700)]
Merge tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Align FSDAX enablement among multiple devices
- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
CRYPTO{,_DEFLATE}=y even if EROFS=m
- Fix atomic context detection to properly launch kworkers on demand
- Fix block count statistics for 48-bit addressing support
* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix block count report when 48-bit layout is on
erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
erofs: Do not select tristate symbols from bool symbols
erofs: Fallback to normal access if DAX is not supported on extra device
Linus Torvalds [Wed, 13 Aug 2025 17:23:28 +0000 (10:23 -0700)]
Merge tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU fix from Neeraj Upadhyay:
"Fix a regression introduced by commit b41642c87716 ("rcu: Fix
rcu_read_unlock() deadloop due to IRQ work") which results in boot
hang as reported by kernel test bot at [1].
This issue happens because RCU re-initializes the deferred QS IRQ work
everytime it is queued. With commit b41642c87716, the IRQ work
re-initialization can happen while it is already queued. This results
in IRQ work being requeued to itself. When IRQ work finally fires, as
it is requeued to itself, it is repeatedly executed and results in
hang.
Fix this with initializing the IRQ work only once before the CPU
boots"
Linus Torvalds [Wed, 13 Aug 2025 15:28:33 +0000 (08:28 -0700)]
Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes. 5 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels.
10 of these fixes are for MM"
* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: proc_maps_open allow proc_mem_open to return NULL
mm/mremap: avoid expensive folio lookup on mremap folio pte batch
userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
mm: pass page directly instead of using folio_page
selftests/proc: fix string literal warning in proc-maps-race.c
fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
mm/smaps: fix race between smaps_hugetlb_range and migration
mm: fix the race between collapse and PT_RECLAIM under per-vma lock
mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
MAINTAINERS: add Masami as a reviewer of hung task detector
mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
kasan/test: fix protection against compiler elision
Pablo Neira Ayuso [Wed, 13 Aug 2025 00:38:50 +0000 (02:38 +0200)]
netfilter: nf_tables: reject duplicate device on updates
A chain/flowtable update with duplicated devices in the same batch is
possible. Unfortunately, netdev event path only removes the first
device that is found, leaving unregistered the hook of the duplicated
device.
Check if a duplicated device exists in the transaction batch, bail out
with EEXIST in such case.
WARNING is hit when unregistering the hook:
[49042.221275] WARNING: CPU: 4 PID: 8425 at net/netfilter/core.c:340 nf_hook_entry_head+0xaa/0x150
[49042.221375] CPU: 4 UID: 0 PID: 8425 Comm: nft Tainted: G S 6.16.0+ #170 PREEMPT(full)
[...]
[49042.221382] RIP: 0010:nf_hook_entry_head+0xaa/0x150
Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
The estimator kthreads' affinity are defined by sysctl overwritten
preferences and applied through a plain call to the scheduler's affinity
API.
However since the introduction of managed kthreads preferred affinity,
such a practice shortcuts the kthreads core code which eventually
overwrites the target to the default unbound affinity.
Fix this with using the appropriate kthread's API.
Fixes: d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
Jakub Kicinski [Thu, 7 Aug 2025 23:29:06 +0000 (16:29 -0700)]
tls: handle data disappearing from under the TLS ULP
TLS expects that it owns the receive queue of the TCP socket.
This cannot be guaranteed in case the reader of the TCP socket
entered before the TLS ULP was installed, or uses some non-standard
read API (eg. zerocopy ones). Replace the WARN_ON() and a buggy
early exit (which leaves anchor pointing to a freed skb) with real
error handling. Wipe the parsing state and tell the reader to retry.
We already reload the anchor every time we (re)acquire the socket lock,
so the only condition we need to avoid is an out of bounds read
(not having enough bytes in the socket for previously parsed record len).
If some data was read from under TLS but there's enough in the queue
we'll reload and decrypt what is most likely not a valid TLS record.
Leading to some undefined behavior from TLS perspective (corrupting
a stream? missing an alert? missing an attack?) but no kernel crash
should take place.
====================
net: airoha: Introduce NPU callbacks for wlan offloading
Similar to wired traffic, EN7581 SoC allows to offload traffic to/from
the MT76 wireless NIC configuring the NPU module via the Netfilter
flowtable. This series introduces the necessary NPU callback used by
the MT7996 driver in order to enable the offloading.
MT76 support has been posted as RFC in [0] in order to show how the
APIs are consumed.
====================
Lorenzo Bianconi [Mon, 11 Aug 2025 15:31:40 +0000 (17:31 +0200)]
net: airoha: npu: Read NPU wlan interrupt lines from the DTS
Read all NPU wlan IRQ lines from the NPU device-tree node.
NPU module fires wlan irq lines when the traffic to/from the WiFi NIC is
not hw accelerated (these interrupts will be consumed by the MT76 driver
in subsequent patches).
This is a preliminary patch to enable wlan flowtable offload for EN7581
SoC.
Introduce callbacks used by the MT76 driver to configure NPU SoC
interrupts. This is a preliminary patch to enable wlan flowtable
offload for EN7581 SoC with MT76 driver.
Introduce wlan_send_msg() and wlan_get_msg() NPU wlan callbacks used
by the wlan driver (MT76) to initialize NPU module registers in order to
offload wireless-wired traffic.
This is a preliminary patch to enable wlan flowtable offload for EN7581
SoC with MT76 driver.
Introduce wlan_init_reserved_memory callback used by MT76 driver during
NPU wlan offloading setup.
This is a preliminary patch to enable wlan flowtable offload for EN7581
SoC with MT76 driver.
Lorenzo Bianconi [Mon, 11 Aug 2025 15:31:36 +0000 (17:31 +0200)]
dt-bindings: net: airoha: npu: Add memory regions used for wlan offload
Document memory regions used by Airoha EN7581 NPU for wlan traffic
offloading. The brand new added memory regions do not introduce any
backward compatibility issues since they will be used just to offload
traffic to/from the MT76 wireless NIC and the MT76 probing will not fail
if these memory regions are not provide, it will just disable offloading
via the NPU module.
A few tweaks to the devmem test to make it more "NIPA-compatible".
We still need a fix to make sure that the test sets hds threshold
to 0. Taehee is presumably already/still working on that:
https://lore.kernel.org/20250702104249.1665034-1-ap420073@gmail.com
so I'm not including my version.
# ./tools/testing/selftests/drivers/net/hw/devmem.py
TAP version 13
1..3
ok 1 devmem.check_rx
ok 2 devmem.check_tx
ok 3 devmem.check_tx_chunks
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
====================
Jakub Kicinski [Mon, 11 Aug 2025 23:13:34 +0000 (16:13 -0700)]
selftests: drv-net: devmem: flip the direction of Tx tests
The Device Under Test should always be the local system.
While the Rx test gets this right the Tx test is sending
from remote to local. So Tx of DMABUF memory happens on remote.
These tests never run in NIPA since we don't have a compatible
device so we haven't caught this.
Jakub Kicinski [Mon, 11 Aug 2025 23:13:33 +0000 (16:13 -0700)]
selftests: net: terminate bkg() commands on exception
There is a number of:
with bkg("socat ..LISTEN..", exit_wait=True)
uses in the tests. If whatever is supposed to send the traffic
fails we will get stuck in the bkg(). Try to kill the process
in case of exception, to avoid the long wait.
A specific example where this happens is the devmem Tx tests.
Jakub Kicinski [Mon, 11 Aug 2025 23:13:31 +0000 (16:13 -0700)]
selftests: drv-net: devmem: remove sudo from system() calls
The general expectations for network HW selftests is that they
will be run as root. sudo doesn't seem to work on NIPA VMs.
While it's probably something solvable in the setup I think we should
remove the sudos. devmem is the only networking test using sudo.
Jakub Kicinski [Mon, 11 Aug 2025 23:13:30 +0000 (16:13 -0700)]
selftests: drv-net: add configs for zerocopy Rx
Looks like neither IO_URING nor UDMABUF are enabled even tho
iou-zcrx.py and devmem.py (respectively) need those.
IO_URING gets enabled by default but UDMABUF is missing.
This series improves the stmmac suspend/resume architecture by
providing a couple of method hooks in struct plat_stmmacenet_data which
are called by core code, and thus are available for any of the
platform glue drivers, whether using a platform or PCI device.
As these methods are called by core code, we can also provide a simple
PM ops structure also in the core code for converted glue drivers to
use.
The remainder of the patches convert the various drivers.
====================
Russell King (Oracle) [Mon, 11 Aug 2025 18:51:24 +0000 (19:51 +0100)]
net: stmmac: mediatek: convert to resume() method
Convert mediatek to use the resume() platform method rather than the
init() platform method as mediatek_dwmac_init() is only called from
the resume paths.
This will ensure that in a future commit, mediatek_dwmac_init() won't
be called when probing the main part of the stmmac driver.
Russell King (Oracle) [Mon, 11 Aug 2025 18:51:19 +0000 (19:51 +0100)]
net: stmmac: stm32: convert to suspend()/resume() methods
Convert stm32 to use the new suspend() and resume() methods rather
than implementing these in custom wrappers around the main driver's
suspend/resume methods. This allows this driver to use the stmmac
simple PM ops structure.
Russell King (Oracle) [Mon, 11 Aug 2025 18:51:14 +0000 (19:51 +0100)]
net: stmmac: rk: convert to suspend()/resume() methods
Convert rk to use the new suspend() and resume() methods rather than
implementing these in custom wrappers around the main driver's
suspend/resume methods. This allows this driver to use the simmac
simple PM ops structure.
We can further simplify the driver as there is no need to track whether
the device was suspended, we only need to check whether the device is
wakeup capable in the resume method. This is because the resume method
will only be called after the suspend method.
Russell King (Oracle) [Mon, 11 Aug 2025 18:51:09 +0000 (19:51 +0100)]
net: stmmac: pci: convert to suspend()/resume() methods
Convert pci to use the new suspend() and resume() methods rather
than implementing these in custom wrappers around the main driver's
suspend/resume methods.
Russell King (Oracle) [Mon, 11 Aug 2025 18:51:04 +0000 (19:51 +0100)]
net: stmmac: loongson: convert to suspend()/resume() methods
Convert loongson to use the new suspend() and resume() methods rather
than implementing these in custom wrappers around the main driver's
suspend/resume methods. This allows this driver to use the stmmac
simple PM ops structure.
Russell King (Oracle) [Mon, 11 Aug 2025 18:50:58 +0000 (19:50 +0100)]
net: stmmac: intel: convert to suspend()/resume() methods
Convert intel to use the new suspend() and resume() methods rather
than implementing these in custom wrappers around the main driver's
suspend/resume methods. This allows this driver to use the stmmac
simple PM ops structure.
Russell King (Oracle) [Mon, 11 Aug 2025 18:50:53 +0000 (19:50 +0100)]
net: stmmac: platform: legacy hooks for suspend()/resume() methods
Add legacy hooks for the suspend() and resume() methods to forward
these calls to the init() and exit() methods when the platform code
hasn't populated the two former methods. This allows us to get rid
of stmmac_pltfr_suspend() and stmmac_pltfr_resume().
Russell King (Oracle) [Mon, 11 Aug 2025 18:50:43 +0000 (19:50 +0100)]
net: stmmac: add suspend()/resume() platform ops
Add suspend/resume platform operations, which, when populated, override
the init/exit platform operations when we suspend and resume. These
suspend()/resume() methods are called by core code, and thus are
designed to support any struct device, not just platform devices. This
allows them to be used by the PCI drivers we have.
Kuniyuki Iwashima [Mon, 11 Aug 2025 21:53:06 +0000 (21:53 +0000)]
selftest: af_unix: Silence -Wflex-array-member-not-at-end warning for scm_rights.c.
scm_rights.c has no problem in functionality, but when compiled with
-Wflex-array-member-not-at-end, it shows this warning:
scm_rights.c: In function ‘__send_fd’:
scm_rights.c:275:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
275 | struct cmsghdr cmsghdr;
| ^~~~~~~
Kuniyuki Iwashima [Mon, 11 Aug 2025 21:53:05 +0000 (21:53 +0000)]
selftest: af_unix: Silence -Wflex-array-member-not-at-end warning for scm_inq.c.
scm_inq.c has no problem in functionality, but when compiled with
-Wflex-array-member-not-at-end, it shows this warning:
scm_inq.c:15:24: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
15 | struct cmsghdr cmsghdr;
| ^~~~~~~
====================
netconsole: reuse netpoll_parse_ip_addr in configfs helpers
This patchset refactors the IP address parsing logic in the netconsole
driver to eliminate code duplication and improve maintainability. The
changes centralize IPv4 and IPv6 address parsing into a single function
(netpoll_parse_ip_addr). For that, it needs to teach
netpoll_parse_ip_addr() to handle strings with newlines, which is the
type of string coming from configfs.
Background
The netconsole driver currently has duplicate IP address parsing logic
in both local_ip_store() and remote_ip_store() functions. This
duplication increases the risk of inconsistencies and makes the code
harder to maintain.
Benefits
* Reduced code duplication: ~40 lines of duplicate parsing logic
eliminated
* Improved robustness: Centralized parsing reduces the chance
of inconsistencies
* Easier to maintain: Code follow more the netdev way
Breno Leitao [Mon, 11 Aug 2025 18:13:27 +0000 (11:13 -0700)]
netconsole: use netpoll_parse_ip_addr in local_ip_store
Replace manual IP address parsing with a call to netpoll_parse_ip_addr
in local_ip_store(), simplifying the code and reducing the chance of
errors.
Also, remove the pr_err() if the user enters an invalid value in
configfs entries. pr_err() is not the best way to alert user that the
configuration is invalid.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Mon, 11 Aug 2025 18:13:26 +0000 (11:13 -0700)]
netconsole: add support for strings with new line in netpoll_parse_ip_addr
The current IP address parsing logic fails when the input string
contains a trailing newline character. This can occur when IP
addresses are provided through configfs, which contains newlines in
a const buffer.
Teach netpoll_parse_ip_addr() how to ignore newlines at the end of the
IPs. Also, simplify the code by:
* No need to check for separators. Try to parse ipv4, if it fails try
ipv6 similarly to ceph_pton()
* If ipv6 is not supported, don't call in6_pton() at all.
Breno Leitao [Mon, 11 Aug 2025 18:13:25 +0000 (11:13 -0700)]
netconsole: move netpoll_parse_ip_addr() earlier for reuse
Move netpoll_parse_ip_addr() earlier in the file to be reused in
other functions, such as local_ip_store(). This avoids duplicate
address parsing logic and centralizes validation for both IPv4
and IPv6 string input.
In light of a81649a4efd3 ("net: mdio: mdio-bcm-unimac: Correct rate
fallback logic"), it became clear that the warning should be specific to
the MDIO controller instance, and there should be further information
provided to indicate what is wrong, whether the requested clock
frequency or the rate calculation. Clarify the message accordingly.
Since ptp virtual clock is registered only under ptp physical clock, both
ptp_clock and posix_clock must be physical clocks for ptp_vclock_in_use()
to lock &ptp->n_vclocks_mux and check ptp->n_vclocks.
However, when unregistering vclocks in n_vclocks_store(), the locking
ptp->n_vclocks_mux is a physical clock lock, but clk->rwsem of
ptp_clock_unregister() called through device_for_each_child_reverse()
is a virtual clock lock.
Therefore, clk->rwsem used in CPU0 and clk->rwsem used in CPU1 are
different locks, but in lockdep, a false positive occurs because the
possibility of deadlock is determined through lock-class.
To solve this, lock subclass annotation must be added to the posix_clock
rwsem of the vclock.
Reported-by: syzbot+7cfb66a237c4a5fb22ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7cfb66a237c4a5fb22ad Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250728062649.469882-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 11 Aug 2025 11:12:11 +0000 (12:12 +0100)]
net: stmmac: make variable data a u32
Make data a u32 instead of an unsigned long, this way it is
explicitly the same width as the operations performed on it
and the same width as a writel store, and it cleans up sign
extention warnings when 64 bit static analysis is performed
on the code.
Thorsten Blum [Mon, 11 Aug 2025 09:34:40 +0000 (11:34 +0200)]
caif: Replace memset(0) + strscpy() with strscpy_pad()
Replace memset(0) followed by strscpy() with strscpy_pad() to improve
cfctrl_linkup_request(). This avoids zeroing the memory before copying
the string and ensures the destination buffer is only written to once,
simplifying the code and improving efficiency.
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made
GFP_NOWAIT implicitly include __GFP_NOWARN.
Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g.,
`GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these
redundant flags across subsystems.