]> www.infradead.org Git - linux.git/log
linux.git
5 months agonet: dpaa2-eth: add ndo_hwtstamp_get() implementation
Vladimir Oltean [Thu, 8 May 2025 13:41:02 +0000 (16:41 +0300)]
net: dpaa2-eth: add ndo_hwtstamp_get() implementation

Allow SIOCGHWTSTAMP to retrieve the current timestamping settings on
DPNIs. This reflects what has been done in dpaa2_eth_hwtstamp_set().

Tested with hwstamp_ctl -i endpmac17. Before the change, it prints
"Device driver does not have support for non-destructive SIOCGHWTSTAMP."
After the change, it retrieves the previously set configuration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # LX2160A
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508134102.1747075-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa2-eth: convert to ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 13:41:01 +0000 (16:41 +0300)]
net: dpaa2-eth: convert to ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the DPAA2 Ethernet driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

This driver only responds to SIOCSHWTSTAMP (not SIOCGHWTSTAMP) so
convert just that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # LX2160A
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508134102.1747075-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'dpaa_eth-conversion-to-ndo_hwtstamp_get-and-ndo_hwtstamp_set'
Jakub Kicinski [Fri, 9 May 2025 23:40:24 +0000 (16:40 -0700)]
Merge branch 'dpaa_eth-conversion-to-ndo_hwtstamp_get-and-ndo_hwtstamp_set'

Vladimir Oltean says:

====================
dpaa_eth conversion to ndo_hwtstamp_get() and ndo_hwtstamp_set()

This is part of the effort to finalize the conversion of drivers to the
dedicated hardware timestamping API.

In the case of the DPAA1 Ethernet driver, a bit more care is needed,
because dpaa_ioctl() looks a bit strange. It handles the "set" IOCTL but
not the "get", and even the phylink_mii_ioctl() portion could do with
some cleanup.
====================

Link: https://patch.msgid.link/20250508124753.1492742-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: simplify dpaa_ioctl()
Vladimir Oltean [Thu, 8 May 2025 12:47:53 +0000 (15:47 +0300)]
net: dpaa_eth: simplify dpaa_ioctl()

phylink_mii_ioctl() handles multiple ioctls in addition to just
SIOCGMIIREG: SIOCGMIIPHY, SIOCSMIIREG. Don't filter these out.

Also, phylink can handle the case where net_dev->phydev is NULL (like
optical SFP module, fixed-link). Be like other drivers and let phylink
do so without any driver-side call filtering.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: add ndo_hwtstamp_get() implementation
Vladimir Oltean [Thu, 8 May 2025 12:47:52 +0000 (15:47 +0300)]
net: dpaa_eth: add ndo_hwtstamp_get() implementation

Allow SIOCGHWTSTAMP to retrieve the current timestamping settings
on DPAA1 interfaces. This reflects what has been done in
dpaa_hwtstamp_set().

Tested with hwstamp_ctl -i fm1-mac9.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: convert to ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 12:47:51 +0000 (15:47 +0300)]
net: dpaa_eth: convert to ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the DPAA1 driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

This driver only responds to SIOCSHWTSTAMP (not SIOCGHWTSTAMP) so
convert just that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 09:52:36 +0000 (12:52 +0300)]
net: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert DSA to the new API, so that the ndo_eth_ioctl() path can
be removed completely.

Move the ds->ops->port_hwtstamp_get() and ds->ops->port_hwtstamp_set()
calls from dsa_user_ioctl() to dsa_user_hwtstamp_get() and
dsa_user_hwtstamp_set().

Due to the fact that the underlying ifreq type changes to
kernel_hwtstamp_config, the drivers and the Ocelot switchdev front-end,
all hooked up directly or indirectly, must also be converted all at once.

The conversion also updates the comment from dsa_port_supports_hwtstamp(),
which is no longer true because kernel_hwtstamp_config is kernel memory
and does not need copy_to_user(). I've deliberated whether it is
necessary to also update "err != -EOPNOTSUPP" to a more general "!err",
but all drivers now either return 0 or -EOPNOTSUPP.

The existing logic from the ocelot_ioctl() function, to avoid
configuring timestamping if the PHY supports the operation, is obsoleted
by more advanced core logic in dev_set_hwtstamp_phylib().

This is only a partial preparation for proper PHY timestamping support.
None of these switch driver currently sets up PTP traps for PHY
timestamping, so setting dev->see_all_hwtstamp_requests is not yet
necessary and the conversion is relatively trivial.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # felix, sja1105, mv88e6xxx
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508095236.887789-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl: handle broken pipe gracefully in CLI
Donald Hunter [Thu, 8 May 2025 11:21:02 +0000 (12:21 +0100)]
tools: ynl: handle broken pipe gracefully in CLI

When sending YNL CLI output into a pipe, closing the pipe causes a
BrokenPipeError. E.g. running the following and quitting less:

./tools/net/ynl/pyynl/cli.py --family rt-link --dump getlink | less
Traceback (most recent call last):
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 160, in <module>
    main()
    ~~~~^^
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 142, in main
    output(reply)
    ~~~~~~^^^^^^^
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 97, in output
    pprint.PrettyPrinter().pprint(msg)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
[...]
BrokenPipeError: [Errno 32] Broken pipe

Consolidate the try block for ops and notifications, and gracefully
handle the BrokenPipeError by adding an exception handler to the
consolidated try block.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250508112102.63539-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is requested
Gal Pressman [Thu, 8 May 2025 10:30:34 +0000 (13:30 +0300)]
ethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is requested

Symmetric RSS hash requires that:
* No other fields besides IP src/dst and/or L4 src/dst are set
* If src is set, dst must also be set

This restriction was only enforced when RXNFC was configured after
symmetric hash was enabled. In the opposite order of operations (RXNFC
then symmetric enablement) the check was not performed.

Perform the sanity check on set_rxfh as well, by iterating over all flow
types hash fields and making sure they are all symmetric.

Introduce a function that returns whether a flow type is hashable (not
spec only) and needs to be iterated over. To make sure that no one
forgets to update the list of hashable flow types when adding new flow
types, a static assert is added to draw the developer's attention.

The conversion of uapi #defines to enum is not ideal, but as Jakub
mentioned [1], we have precedent for that.

[1] https://lore.kernel.org/netdev/20250324073509.6571ade3@kernel.org/

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508103034.885536-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: thunder: make tx software timestamp independent
Jason Xing [Thu, 8 May 2025 03:44:33 +0000 (11:44 +0800)]
net: thunder: make tx software timestamp independent

skb_tx_timestamp() is used for tx software timestamp enabled by
SOF_TIMESTAMPING_TX_SOFTWARE while SKBTX_HW_TSTAMP is used for
SOF_TIMESTAMPING_TX_HARDWARE. As it clearly shows they are different
timestamps in two dimensions, it's not appropriate to group these two
together in the if-statement.

This patch completes three things:
1. make the software one standalone. Users are able to set both
timestamps together with SOF_TIMESTAMPING_OPT_TX_SWHW flag.
2. make the software one generated after the hardware timestamp logic to
avoid generating sw and hw timestamps at one time without
SOF_TIMESTAMPING_OPT_TX_SWHW being set.
3. move the software timestamp call as close to the door bell.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250508034433.14408-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'add-more-features-for-enetc-v4-round-2'
Jakub Kicinski [Fri, 9 May 2025 02:43:54 +0000 (19:43 -0700)]
Merge branch 'add-more-features-for-enetc-v4-round-2'

Wei Fang says:

====================
Add more features for ENETC v4 - round 2

This patch set adds the following features.
1. Compared with ENETC v1, the formats of tables and command BD of ENETC
v4 have changed significantly, and the two are not compatible. Therefore,
in order to support the NETC Table Management Protocol (NTMP) v2.0, we
introduced the netc-lib driver and added support for MAC address filter
table and RSS table.
2. Add MAC filter and VLAN filter support for i.MX95 ENETC PF.
3. Add RSS support for i.MX95 ENETC PF.
4. Add loopback support for i.MX95 ENETC PF.

Link: https://lore.kernel.org/20250103060610.2233908-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250113082245.2332775-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250304072201.1332603-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250311053830.1516523-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250411095752.3072696-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250428105657.3283130-1-wei.fang@nxp.com/
====================

Link: https://patch.msgid.link/20250506080735.3444381-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add loopback support for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:35 +0000 (16:07 +0800)]
net: enetc: add loopback support for i.MX95 ENETC PF

Add internal loopback support for i.MX95 ENETC PF, the default loopback
mode is MAC level loopback, the MAC Tx data is looped back onto the Rx.
The MAC interface runs at a fixed 1:8 ratio of NETC clock in MAC-level
loopback mode, with no dependency on Tx clock.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-15-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add VLAN filtering support for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:34 +0000 (16:07 +0800)]
net: enetc: add VLAN filtering support for i.MX95 ENETC PF

Since the offsets of the VLAN hash filter registers of ENETC v4 are
different from ENETC v1. Therefore, enetc_set_si_vlan_ht_filter() is
added to set the correct VLAN hash filter based on the SI ID and ENETC
revision, so that ENETC v4 PF driver can reuse enetc_vlan_rx_add_vid()
and enetc_vlan_rx_del_vid(). In addition, the VLAN promiscuous mode will
be enabled if VLAN filtering is disabled, which means that PF qualifies
for reception of all VLAN tags.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-14-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: move generic VLAN hash filter functions to enetc_pf_common.c
Wei Fang [Tue, 6 May 2025 08:07:33 +0000 (16:07 +0800)]
net: enetc: move generic VLAN hash filter functions to enetc_pf_common.c

The VLAN hash filters of ENETC v1 and v4 are basically the same, the only
difference is that the offset of the VLAN hash filter registers has been
changed in ENETC v4. So some functions like enetc_vlan_rx_add_vid() and
enetc_vlan_rx_del_vid() only need to be slightly modified to be reused
by ENETC v4. Currently, we just move these functions from enetc_pf.c to
enetc_pf_common.c. Appropriate modifications will be made for ENETC4 in
a subsequent patch.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-13-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: extract enetc_refresh_vlan_ht_filter()
Wei Fang [Tue, 6 May 2025 08:07:32 +0000 (16:07 +0800)]
net: enetc: extract enetc_refresh_vlan_ht_filter()

Extract the common function enetc_refresh_vlan_ht_filter() from
enetc_sync_vlan_ht_filter() so that it can be reused by the ENETC
v4 PF and VF drivers in the future.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-12-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: enable RSS feature by default
Wei Fang [Tue, 6 May 2025 08:07:31 +0000 (16:07 +0800)]
net: enetc: enable RSS feature by default

Receive side scaling (RSS) is a network driver technology that enables
the efficient distribution of network receive processing across multiple
CPUs in multiprocessor systems. Therefore, it is better to enable RSS by
default so that the CPU load can be balanced and network performance can
be improved when then network is enabled.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-11-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: change enetc_set_rss() to void type
Wei Fang [Tue, 6 May 2025 08:07:30 +0000 (16:07 +0800)]
net: enetc: change enetc_set_rss() to void type

Actually enetc_set_rss() does not need a return value, so change its
type to void.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-10-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add RSS support for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:29 +0000 (16:07 +0800)]
net: enetc: add RSS support for i.MX95 ENETC PF

Compared with LS1028A, there are two main differences: first, i.MX95
ENETC uses NTMP 2.0 to manage the RSS table, and second, the offset
of the RSS Key registers is different. Some modifications have been
made in the previous patches based on these differences to ensure that
the relevant interfaces are compatible with i.MX95. So it's time to
add RSS support to i.MX95 ENETC PF.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-9-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: make enetc_set_rss_key() reusable
Wei Fang [Tue, 6 May 2025 08:07:28 +0000 (16:07 +0800)]
net: enetc: make enetc_set_rss_key() reusable

Since the offset of the RSS key registers of i.MX95 ENETC is different
from that of LS1028A, so add enetc_get_rss_key_base() to get the base
offset for the different chips, so that enetc_set_rss_key() can be
reused for this trivial thing.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-8-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add set/get_rss_table() hooks to enetc_si_ops
Wei Fang [Tue, 6 May 2025 08:07:27 +0000 (16:07 +0800)]
net: enetc: add set/get_rss_table() hooks to enetc_si_ops

Since i.MX95 ENETC (v4) uses NTMP 2.0 to manage the RSS table, which is
different from LS1028A ENETC (v1). In order to reuse some functions
related to the RSS table, so add .get_rss_table() and .set_rss_table()
hooks to enetc_si_ops.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-7-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add debugfs interface to dump MAC filter
Wei Fang [Tue, 6 May 2025 08:07:26 +0000 (16:07 +0800)]
net: enetc: add debugfs interface to dump MAC filter

ENETC's MAC filter consists of hash MAC filter and exact MAC filter.
Hash MAC filter is a 64-bit entry hash table consisting of two 32-bit
registers. Exact MAC filter is implemented by configuring MAC address
filter table through command BD ring. The table is stored in ENETC's
internal memory and needs to be read through command BD ring. In order
to facilitate debugging, added a debugfs interface to get the relevant
information about MAC filter.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-6-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add MAC filtering for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:25 +0000 (16:07 +0800)]
net: enetc: add MAC filtering for i.MX95 ENETC PF

The i.MX95 ENETC supports both MAC hash filter and MAC exact filter. MAC
hash filter is implenented through a 64-bit hash table to match against
the hashed addresses, PF and VFs each have two MAC hash tables, one is
for unicast and the other one is for multicast. But MAC exact filter is
shared between SIs (PF and VFs), each table entry contains a MAC address
that may be unicast or multicast and the entry also contains an SI bitmap
field that indicates for which SIs the entry is valid.

For i.MX95 ENETC, MAC exact filter only has 4 entries. According to the
observation of the system default network configuration, the MAC filter
will be configured with multiple multicast addresses, so MAC exact filter
does not have enough entries to implement multicast filtering. Therefore,
the current MAC exact filter is only used for unicast filtering. If the
number of unicast addresses exceeds 4, then MAC hash filter is used.

Note that both MAC hash filter and MAC exact filter can only be accessed
by PF, VFs can notify PF to set its corresponding MAC filter through the
mailbox mechanism of ENETC. But currently MAC filter is only added for
i.MX95 ENETC PF. The MAC filter support of ENETC VFs will be supported in
subsequent patches.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: move generic MAC filtering interfaces to enetc-core
Wei Fang [Tue, 6 May 2025 08:07:24 +0000 (16:07 +0800)]
net: enetc: move generic MAC filtering interfaces to enetc-core

Although only ENETC PF can access the MAC address filter table, the table
entries can specify MAC address filtering for one or more SIs based on
SI_BITMAP, which means that the table also supports MAC address filtering
for VFs.

Currently, only the ENETC v1 PF driver supports MAC address filtering. In
order to add the MAC address filtering support for the ENETC v4 PF driver
and VF driver in the future, the relevant generic interfaces are moved to
the enetc-core driver. This lays the basis for i.MX95 ENETC PF and VFs to
support MAC address filtering.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add command BD ring support for i.MX95 ENETC
Wei Fang [Tue, 6 May 2025 08:07:23 +0000 (16:07 +0800)]
net: enetc: add command BD ring support for i.MX95 ENETC

The command BD ring is used to configure functionality where the
underlying resources may be shared between different entities or being
too large to configure using direct registers (such as lookup tables).

Because the command BD and table formats of i.MX95 and LS1028A are very
different, the software processing logic is also different. So add
enetc4_setup_cbdr() and enetc4_teardown_cbdr() for ENETC v4 drivers.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add initial netc-lib driver to support NTMP
Wei Fang [Tue, 6 May 2025 08:07:22 +0000 (16:07 +0800)]
net: enetc: add initial netc-lib driver to support NTMP

Some NETC functionality is controlled using control messages sent to the
hardware using BD ring interface with 32B descriptor similar to transmit
BD ring used on ENETC. This BD ring interface is referred to as command
BD ring. It is used to configure functionality where the underlying
resources may be shared between different entities or being too large to
configure using direct registers. Therefore, a messaging protocol called
NETC Table Management Protocol (NTMP) is provided for exchanging
configuration and management information between the software and the
hardware using the command BD ring interface.

For the management protocol of LS1028A has been retroactively named NTMP
1.0, and its implementation is in enetc_cbdr.c and enetc_qos.c. However,
NTMP of i.MX95 has been upgraded to version 2.0, which is incompatible
with LS1028A, because the message formats have been changed. Therefore,
add the netc-lib driver to support NTMP 2.0 to operate various tables.
Note that, only MAC address filter table and RSS table are supported at
the moment. More tables will be supported in subsequent patches.

It is worth mentioning that the purpose of the netc-lib driver is to
provide some NTMP-based generic interfaces for ENETC and NETC Switch
drivers. Currently, it only supports the configurations of some tables.
Interfaces such as tc flower and debugfs will be added in the future.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net-drv: remove the nic_performance and nic_link_layer tests
Jakub Kicinski [Wed, 7 May 2025 14:01:07 +0000 (07:01 -0700)]
selftests: net-drv: remove the nic_performance and nic_link_layer tests

Revert fbbf93556f0c ("selftests: nic_performance: Add selftest for performance of NIC driver")
Revert c087dc54394b ("selftests: nic_link_layer: Add selftest case for speed and duplex states")
Revert 6116075e18f7 ("selftests: nic_link_layer: Add link layer selftest for NIC driver")

These tests don't clean up after themselves, don't use the disruptive
annotations, don't get included in make install etc. etc. The tests
were added before we have any "HW" runner, so the issues were missed.
Our CI doesn't have any way of excluding broken tests, remove these
for now to stop the random pollution of results due to broken env.
We can always add them back once / if fixed.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250507140109.929801-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: fix conntrack stress test failures on debug kernels
Florian Westphal [Wed, 7 May 2025 07:49:55 +0000 (09:49 +0200)]
selftests: netfilter: fix conntrack stress test failures on debug kernels

Jakub reports test failures on debug kernel:
FAIL: proc inconsistency after uniq filter for ...

This is because entries are expiring while validation is happening.

Increase the timeout of ctnetlink injected entries and the
icmp (ping) timeout to 1h to avoid this.

To reduce run-time, add less entries via ctnetlink when KSFT_MACHINE_SLOW
is set.

also log of a failed run had:
 PASS: dump in netns had same entry count (-C 0, -L 0, -p 0, /proc 0)

... i.e. all entries already expired: add a check and set failure if
this happens.

While at it, include a diff when there were duplicate entries and add
netns name to error messages (it tells if icmp or ctnetlink failed).

Fixes: d33f889fd80c ("selftests: netfilter: add conntrack stress test")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250506061125.1a244d12@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250507075000.5819-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 8 May 2025 15:56:12 +0000 (08:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.15-rc6).

No conflicts.

Adjacent changes:

net/core/dev.c:
  08e9f2d584c4 ("net: Lock netdevices during dev_shutdown")
  a82dc19db136 ("net: avoid potential race between netdev_get_by_index_lock() and netns switch")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 8 May 2025 15:33:56 +0000 (08:33 -0700)]
Merge tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN, WiFi and netfilter.

  We have still a comple of regressions open due to the recent
  drivers locking refactor. The patches are in-flight, but not
  ready yet.

  Current release - regressions:

   - core: lock netdevices during dev_shutdown

   - sch_htb: make htb_deactivate() idempotent

   - eth: virtio-net: don't re-enable refill work too early

  Current release - new code bugs:

   - eth: icssg-prueth: fix kernel panic during concurrent Tx queue
     access

  Previous releases - regressions:

   - gre: fix again IPv6 link-local address generation.

   - eth: b53: fix learning on VLAN unaware bridges

  Previous releases - always broken:

   - wifi: fix out-of-bounds access during multi-link element
     defragmentation

   - can:
       - initialize spin lock on device probe
       - fix order of unregistration calls

   - openvswitch: fix unsafe attribute parsing in output_userspace()

   - eth:
       - virtio-net: fix total qstat values
       - mtk_eth_soc: reset all TX queues on DMA free
       - fbnic: firmware IPC mailbox fixes"

* tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
  virtio-net: fix total qstat values
  net: export a helper for adding up queue stats
  fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready
  fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context
  fbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready
  fbnic: Cleanup handling of completions
  fbnic: Actually flush_tx instead of stalling out
  fbnic: Add additional handling of IRQs
  fbnic: Gate AXI read/write enabling on FW mailbox
  fbnic: Fix initialization of mailbox descriptor rings
  net: dsa: b53: do not set learning and unicast/multicast on up
  net: dsa: b53: fix learning on VLAN unaware bridges
  net: dsa: b53: fix toggling vlan_filtering
  net: dsa: b53: do not program vlans when vlan filtering is off
  net: dsa: b53: do not allow to configure VLAN 0
  net: dsa: b53: always rejoin default untagged VLAN on bridge leave
  net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave
  net: dsa: b53: fix flushing old pvid VLAN on pvid change
  net: dsa: b53: fix clearing PVID of a port
  net: dsa: b53: keep CPU port always tagged again
  ...

5 months agoMerge tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 8 May 2025 15:29:13 +0000 (08:29 -0700)]
Merge tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix potential use-after-free bug and missing error handling in PCI
   code

 - Fix dcssblk build error

 - Fix last breaking event handling in case of stack corruption to allow
   for better error reporting

 - Update defconfigs

* tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
  s390/pci: Fix missing check for zpci_create_device() error return
  s390: Update defconfigs
  s390/dcssblk: Fix build error with CONFIG_DAX=m and CONFIG_DCSSBLK=y
  s390/entry: Fix last breaking event handling in case of stack corruption
  s390/configs: Enable options required for TC flow offload
  s390/configs: Enable VDPA on Nvidia ConnectX-6 network card

5 months agoMerge tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 8 May 2025 15:22:35 +0000 (08:22 -0700)]
Merge tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix UAF closing file table (e.g. in tree disconnect)

 - Fix potential out of bounds write

 - Fix potential memory leak parsing lease state in open

 - Fix oops in rename with empty target

* tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Fix UAF in __close_file_table_ids
  ksmbd: prevent out-of-bounds stream writes by validating *pos
  ksmbd: fix memory leak in parse_lease_state()
  ksmbd: prevent rename with empty string

5 months agoMerge branch 'virtio-net-fix-total-qstat-values'
Paolo Abeni [Thu, 8 May 2025 09:56:13 +0000 (11:56 +0200)]
Merge branch 'virtio-net-fix-total-qstat-values'

Jakub Kicinski says:

====================
virtio-net: fix total qstat values

Another small fix discovered after we enabled virtio multi-queue
in netdev CI. The queue stat test fails:

  # Exception| Exception: Qstats are lower, fetched later
  not ok 3 stats.pkt_byte_sum

The queue stats from disabled queues are supposed to be reported
in the "base" stats.
====================

Link: https://patch.msgid.link/20250507003221.823267-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agovirtio-net: fix total qstat values
Jakub Kicinski [Wed, 7 May 2025 00:32:21 +0000 (17:32 -0700)]
virtio-net: fix total qstat values

NIPA tests report that the interface statistics reported
via qstat are lower than those reported via ip link.
Looks like this is because some tests flip the queue
count up and down, and we end up with some of the traffic
accounted on disabled queues.

Add up counters from disabled queues.

Fixes: d888f04c09bb ("virtio-net: support queue stat")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250507003221.823267-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: export a helper for adding up queue stats
Jakub Kicinski [Wed, 7 May 2025 00:32:20 +0000 (17:32 -0700)]
net: export a helper for adding up queue stats

Older drivers and drivers with lower queue counts often have a static
array of queues, rather than allocating structs for each queue on demand.
Add a helper for adding up qstats from a queue range. Expectation is
that driver will pass a queue range [netdev->real_num_*x_queues, MAX).
It was tempting to always use num_*x_queues as the end, but virtio
seems to clamp its queue count after allocating the netdev. And this
way we can trivaly reuse the helper for [0, real_..).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250507003221.823267-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'fbnic-fw-ipc-mailbox-fixes'
Paolo Abeni [Thu, 8 May 2025 09:33:32 +0000 (11:33 +0200)]
Merge branch 'fbnic-fw-ipc-mailbox-fixes'

Alexander Duyck says:

====================
fbnic: FW IPC Mailbox fixes

This series is meant to address a number of issues that have been found in
the FW IPC mailbox over the past several months.

The main issues addressed are:
1. Resolve a potential race between host and FW during initialization that
can cause the FW to only have the lower 32b of an address.
2. Block the FW from issuing DMA requests after we have closed the mailbox
and before we have started issuing requests on it.
3. Fix races in the IRQ handlers that can cause the IRQ to unmask itself if
it is being processed while we are trying to disable it.
4. Cleanup the Tx flush logic so that we actually lock down the Tx path
before we start flushing it instead of letting it free run while we are
shutting it down.
5. Fix several memory leaks that could occur if we failed initialization.
6. Cleanup the mailbox completion if we are flushing Tx since we are no
longer processing Rx.
7. Move several allocations out of a potential IRQ/atomic context.

There have been a few optimizations we also picked up since then. Rather
than split them out I just folded them into these diffs. They mostly
address minor issues such as how long it takes to initialize and/or fail so
I thought they could probably go in with the rest of the patches. They
consist of:
1. Do not sleep more than 20ms waiting on FW to respond as the 200ms value
likely originated from simulation/emulation testing.
2. Use jiffies to determine timeout instead of sleep * attempts for better
accuracy.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://patch.msgid.link/174654659243.499179.11194817277075480209.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready
Alexander Duyck [Tue, 6 May 2025 16:00:25 +0000 (09:00 -0700)]
fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready

We had originally thought to have the mailbox go to ready in the background
while we were doing other things. One issue with this though is that we
can't disable it by clearing the ready state without also blocking
interrupts or calls to mbx_poll as it will just pop back to life during an
interrupt.

In order to prevent that from happening we can pull the code for toggling
to ready out of the interrupt path and instead place it in the
fbnic_mbx_poll_tx_ready path so that it becomes the only spot where the
Rx/Tx can toggle to the ready state. By doing this we can prevent races
where we disable the DMA and/or free buffers only to have an interrupt fire
and undo what we have done.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654722518.499179.11612865740376848478.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context
Alexander Duyck [Tue, 6 May 2025 16:00:18 +0000 (09:00 -0700)]
fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context

This change pulls the call to fbnic_fw_xmit_cap_msg out of
fbnic_mbx_init_desc_ring and instead places it in the polling function for
getting the Tx ready. Doing that we can avoid the potential issue with an
interrupt coming in later from the firmware that causes it to get fired in
interrupt context.

Fixes: 20d2e88cc746 ("eth: fbnic: Add initial messaging to notify FW of our presence")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654721876.499179.9839651602256668493.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready
Alexander Duyck [Tue, 6 May 2025 16:00:12 +0000 (09:00 -0700)]
fbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready

There were a couple different issues found in fbnic_mbx_poll_tx_ready.
Among them were the fact that we were sleeping much longer than we actually
needed to as the actual FW could respond in under 20ms. The other issue was
that we would just keep polling the mailbox even if the device itself had
gone away.

To address the responsiveness issues we can decrease the sleeps to 20ms and
use a jiffies based timeout value rather than just counting the number of
times we slept and then polled.

To address the hardware going away we can move the check for the firmware
BAR being present from where it was and place it inside the loop after the
mailbox descriptor ring is initialized and before we sleep so that we just
abort and return an error if the device went away during initialization.

With these two changes we see a significant improvement in boot times for
the driver.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654721224.499179.2698616208976624755.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Cleanup handling of completions
Alexander Duyck [Tue, 6 May 2025 16:00:05 +0000 (09:00 -0700)]
fbnic: Cleanup handling of completions

There was an issue in that if we were to shutdown we could be left with
a completion in flight as the mailbox went away. To address that I have
added an fbnic_mbx_evict_all_cmpl function that is meant to essentially
create a "broken pipe" type response so that all callers will receive an
error indicating that the connection has been broken as a result of us
shutting down the mailbox.

Fixes: 378e5cc1c6c6 ("eth: fbnic: hwmon: Add completion infrastructure for firmware requests")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654720578.499179.380252598204530873.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Actually flush_tx instead of stalling out
Alexander Duyck [Tue, 6 May 2025 15:59:59 +0000 (08:59 -0700)]
fbnic: Actually flush_tx instead of stalling out

The fbnic_mbx_flush_tx function had a number of issues.

First, we were waiting 200ms for the firmware to process the packets. We
can drop this to 20ms and in almost all cases this should be more than
enough time. So by changing this we can significantly reduce shutdown time.

Second, we were not making sure that the Tx path was actually shut off. As
such we could still have packets added while we were flushing the mailbox.
To prevent that we can now clear the ready flag for the Tx side and it
should stay down since the interrupt is disabled.

Third, we kept re-reading the tail due to the second issue. The tail should
not move after we have started the flush so we can just read it once while
we are holding the mailbox Tx lock. By doing that we are guaranteed that
the value should be consistent.

Fourth, we were keeping a count of descriptors cleaned due to the second
and third issues called out. That count is not a valid reason to be exiting
the cleanup, and with the tail only being read once we shouldn't see any
cases where the tail moves after the disable so the tracking of count can
be dropped.

Fifth, we were using attempts * sleep time to determine how long we would
wait in our polling loop to flush out the Tx. This can be very imprecise.
In order to tighten up the timing we are shifting over to using a jiffies
value of jiffies + 10 * HZ + 1 to determine the jiffies value we should
stop polling at as this should be accurate within once sleep cycle for the
total amount of time spent polling.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654719929.499179.16406653096197423749.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Add additional handling of IRQs
Alexander Duyck [Tue, 6 May 2025 15:59:52 +0000 (08:59 -0700)]
fbnic: Add additional handling of IRQs

We have two issues that need to be addressed in our IRQ handling.

One is the fact that we can end up double-freeing IRQs in the event of an
exception handling error such as a PCIe reset/recovery that fails. To
prevent that from becoming an issue we can use the msix_vector values to
indicate that we have successfully requested/freed the IRQ by only setting
or clearing them when we have completed the given action.

The other issue is that we have several potential races in our IRQ path due
to us manipulating the mask before the vector has been truly disabled. In
order to handle that in the case of the FW mailbox we need to not
auto-enable the IRQ and instead will be enabling/disabling it separately.
In the case of the PCS vector we can mitigate this by unmapping it and
synchronizing the IRQ before we clear the mask.

The general order of operations after this change is now to request the
interrupt, poll the FW mailbox to ready, and then enable the interrupt. For
the shutdown we do the reverse where we disable the interrupt, flush any
pending Tx, and then free the IRQ. I am renaming the enable/disable to
request/free to be equivilent with the IRQ calls being used. We may see
additions in the future to enable/disable the IRQs versus request/free them
for certain use cases.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Fixes: 69684376eed5 ("eth: fbnic: Add link detection")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654719271.499179.3634535105127848325.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Gate AXI read/write enabling on FW mailbox
Alexander Duyck [Tue, 6 May 2025 15:59:46 +0000 (08:59 -0700)]
fbnic: Gate AXI read/write enabling on FW mailbox

In order to prevent the device from throwing spurious writes and/or reads
at us we need to gate the AXI fabric interface to the PCIe until such time
as we know the FW is in a known good state.

To accomplish this we use the mailbox as a mechanism for us to recognize
that the FW has acknowledged our presence and is no longer sending any
stale message data to us.

We start in fbnic_mbx_init by calling fbnic_mbx_reset_desc_ring function,
disabling the DMA in both directions, and then invalidating all the
descriptors in each ring.

We then poll the mailbox in fbnic_mbx_poll_tx_ready and when the interrupt
is set by the FW we pick it up and mark the mailboxes as ready, while also
enabling the DMA.

Once we have completed all the transactions and need to shut down we call
into fbnic_mbx_clean which will in turn call fbnic_mbx_reset_desc_ring for
each ring and shut down the DMA and once again invalidate the descriptors.

Fixes: 3646153161f1 ("eth: fbnic: Add register init to set PCIe/Ethernet device config")
Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654718623.499179.7445197308109347982.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agofbnic: Fix initialization of mailbox descriptor rings
Alexander Duyck [Tue, 6 May 2025 15:59:39 +0000 (08:59 -0700)]
fbnic: Fix initialization of mailbox descriptor rings

Address to issues with the FW mailbox descriptor initialization.

We need to reverse the order of accesses when we invalidate an entry versus
writing an entry. When writing an entry we write upper and then lower as
the lower 32b contain the valid bit that makes the entire address valid.
However for invalidation we should write it in the reverse order so that
the upper is marked invalid before we update it.

Without this change we may see FW attempt to access pages with the upper
32b of the address set to 0 which will likely result in DMAR faults due to
write access failures on mailbox shutdown.

Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654717972.499179.8083789731819297034.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'net-dsa-b53-accumulated-fixes'
Jakub Kicinski [Thu, 8 May 2025 02:30:37 +0000 (19:30 -0700)]
Merge branch 'net-dsa-b53-accumulated-fixes'

Jonas Gorski says:

====================
net: dsa: b53: accumulated fixes

This patchset aims at fixing most issues observed while running the
vlan_unaware_bridge, vlan_aware_bridge and local_termination selftests.

Most tests succeed with these patches on BCM53115, connected to a
BCM6368.

It took me a while to figure out that a lot of tests will fail if all
ports have the same MAC address, as the switches drop any frames with
DA == SA. Luckily BCM63XX boards often have enough MACs allocated for
all ports, so I just needed to assign them.

The still failing tests are:

FDB learning, both vlan aware aware and unaware:

This is expected, as b53 currently does not implement changing the
ageing time, and both the bridge code and DSA ignore that, so the
learned entries don't age out as expected.

ping and ping6 in vlan unaware:

These fail because of the now fixed learning, the switch trying to
forward packet ingressing on one of the standalone ports to the learned
port of the mac address when the packets ingressed on the bridged port.

The port VLAN masks only prevent forwarding to other ports, but the ARL
lookup will still happen, and the packet gets dropped because the port
isn't allowed to forward there.

I have a fix/workaround for that, but as it is a bit more controversial
and makes use of an unrelated feature, I decided to hold off from that
and post it later.

This wasn't noticed so far, because learning was never working in VLAN
unaware mode, so the traffic was always broadcast (which sidesteps the
issue).

Finally some of the multicast tests from local_termination fail, where
the reception worked except it shouldn't. This doesn't seem to me as a
super serious issue, so I didn't attempt to debug/fix these yet.

I'm not super confident I didn't break sf2 along the way, but I did
compile test and tried to find ways it cause issues (I failed to find
any). I hope Florian will tell me.
====================

Link: https://patch.msgid.link/20250429201710.330937-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: do not set learning and unicast/multicast on up
Jonas Gorski [Tue, 29 Apr 2025 20:17:10 +0000 (22:17 +0200)]
net: dsa: b53: do not set learning and unicast/multicast on up

When a port gets set up, b53 disables learning and enables the port for
flooding. This can undo any bridge configuration on the port.

E.g. the following flow would disable learning on a port:

$ ip link add br0 type bridge
$ ip link set sw1p1 master br0 <- enables learning for sw1p1
$ ip link set br0 up
$ ip link set sw1p1 up <- disables learning again

Fix this by populating dsa_switch_ops::port_setup(), and set up initial
config there.

Fixes: f9b3827ee66c ("net: dsa: b53: Support setting learning on port")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-12-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: fix learning on VLAN unaware bridges
Jonas Gorski [Tue, 29 Apr 2025 20:17:09 +0000 (22:17 +0200)]
net: dsa: b53: fix learning on VLAN unaware bridges

When VLAN filtering is off, we configure the switch to forward, but not
learn on VLAN table misses. This effectively disables learning while not
filtering.

Fix this by switching to forward and learn. Setting the learning disable
register will still control whether learning actually happens.

Fixes: dad8d7c6452b ("net: dsa: b53: Properly account for VLAN filtering")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-11-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: fix toggling vlan_filtering
Jonas Gorski [Tue, 29 Apr 2025 20:17:08 +0000 (22:17 +0200)]
net: dsa: b53: fix toggling vlan_filtering

To allow runtime switching between vlan aware and vlan non-aware mode,
we need to properly keep track of any bridge VLAN configuration.
Likewise, we need to know when we actually switch between both modes, to
not have to rewrite the full VLAN table every time we update the VLANs.

So keep track of the current vlan_filtering mode, and on changes, apply
the appropriate VLAN configuration.

Fixes: 0ee2af4ebbe3 ("net: dsa: set configure_vlan_while_not_filtering to true by default")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-10-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: do not program vlans when vlan filtering is off
Jonas Gorski [Tue, 29 Apr 2025 20:17:07 +0000 (22:17 +0200)]
net: dsa: b53: do not program vlans when vlan filtering is off

Documentation/networking/switchdev.rst says:

- with VLAN filtering turned off: the bridge is strictly VLAN unaware and its
  data path will process all Ethernet frames as if they are VLAN-untagged.
  The bridge VLAN database can still be modified, but the modifications should
  have no effect while VLAN filtering is turned off.

This breaks if we immediately apply the VLAN configuration, so skip
writing it when vlan_filtering is off.

Fixes: 0ee2af4ebbe3 ("net: dsa: set configure_vlan_while_not_filtering to true by default")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-9-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: do not allow to configure VLAN 0
Jonas Gorski [Tue, 29 Apr 2025 20:17:06 +0000 (22:17 +0200)]
net: dsa: b53: do not allow to configure VLAN 0

Since we cannot set forwarding destinations per VLAN, we should not have
a VLAN 0 configured, as it would allow untagged traffic to work across
ports on VLAN aware bridges regardless if a PVID untagged VLAN exists.

So remove the VLAN 0 on join, an re-add it on leave. But only do so if
we have a VLAN aware bridge, as without it, untagged traffic would
become tagged with VID 0 on a VLAN unaware bridge.

Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-8-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: always rejoin default untagged VLAN on bridge leave
Jonas Gorski [Tue, 29 Apr 2025 20:17:05 +0000 (22:17 +0200)]
net: dsa: b53: always rejoin default untagged VLAN on bridge leave

While JOIN_ALL_VLAN allows to join all VLANs, we still need to keep the
default VLAN enabled so that untagged traffic stays untagged.

So rejoin the default VLAN even for switches with JOIN_ALL_VLAN support.

Fixes: 48aea33a77ab ("net: dsa: b53: Add JOIN_ALL_VLAN support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-7-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: fix VLAN ID for untagged vlan on bridge leave
Jonas Gorski [Tue, 29 Apr 2025 20:17:04 +0000 (22:17 +0200)]
net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave

The untagged default VLAN is added to the default vlan, which may be
one, but we modify the VLAN 0 entry on bridge leave.

Fix this to use the correct VLAN entry for the default pvid.

Fixes: fea83353177a ("net: dsa: b53: Fix default VLAN ID")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-6-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: fix flushing old pvid VLAN on pvid change
Jonas Gorski [Tue, 29 Apr 2025 20:17:03 +0000 (22:17 +0200)]
net: dsa: b53: fix flushing old pvid VLAN on pvid change

Presumably the intention here was to flush the VLAN of the old pvid, not
the added VLAN again, which we already flushed before.

Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-5-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: fix clearing PVID of a port
Jonas Gorski [Tue, 29 Apr 2025 20:17:02 +0000 (22:17 +0200)]
net: dsa: b53: fix clearing PVID of a port

Currently the PVID of ports are only set when adding/updating VLANs with
PVID set or removing VLANs, but not when clearing the PVID flag of a
VLAN.

E.g. the following flow

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master bridge
$ bridge vlan add dev sw1p1 vid 10 pvid untagged
$ bridge vlan add dev sw1p1 vid 10 untagged

Would keep the PVID set as 10, despite the flag being cleared. Fix this
by checking if we need to unset the PVID on vlan updates.

Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-4-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: keep CPU port always tagged again
Jonas Gorski [Tue, 29 Apr 2025 20:17:01 +0000 (22:17 +0200)]
net: dsa: b53: keep CPU port always tagged again

The Broadcom management header does not carry the original VLAN tag
state information, just the ingress port, so for untagged frames we do
not know from which VLAN they originated.

Therefore keep the CPU port always tagged except for VLAN 0.

Fixes the following setup:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master br0
$ bridge vlan add dev br0 pvid untagged self
$ ip link add sw1p2.10 link sw1p2 type vlan id 10

Where VID 10 would stay untagged on the CPU port.

Fixes: 2c32a3d3c233 ("net: dsa: b53: Do not force CPU to be always tagged")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: b53: allow leaky reserved multicast
Jonas Gorski [Tue, 29 Apr 2025 20:17:00 +0000 (22:17 +0200)]
net: dsa: b53: allow leaky reserved multicast

Allow reserved multicast to ignore VLAN membership so STP and other
management protocols work without a PVID VLAN configured when using a
vlan aware bridge.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250429201710.330937-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ibmveth: Refactored veth_pool_store for better maintainability
Dave Marquardt [Tue, 6 May 2025 16:00:04 +0000 (11:00 -0500)]
net: ibmveth: Refactored veth_pool_store for better maintainability

Make veth_pool_store detect requested pool changes, close device if
necessary, update pool, and reopen device.

Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250506160004.328347-1-davemarq@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'netlink-specs-remove-phantom-structs'
Jakub Kicinski [Thu, 8 May 2025 01:21:52 +0000 (18:21 -0700)]
Merge branch 'netlink-specs-remove-phantom-structs'

Jakub Kicinski says:

====================
netlink: specs: remove phantom structs

rt-netlink and nl80211 have a few structs which may be helpful for Python
decoding of binary attrs, but which don't actually exist in the C uAPI.
This prevents us from using struct pointers for binary types in C.

We could support this situation better in the codegen, or add these
structs to uAPI. That said Johannes suggested we remove the WiFi
structs for now, and the rt-link ones are semi-broken.
Drop the struct definitions, for now, if someone has a need to use
such structs in Python (as opposed to them being defined for completeness)
we can revist.

v1: https://lore.kernel.org/20250505170215.253672-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250506194101.696272-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetlink: specs: rt-link: remove implicit structs from devconf
Jakub Kicinski [Tue, 6 May 2025 19:41:00 +0000 (12:41 -0700)]
netlink: specs: rt-link: remove implicit structs from devconf

devconf is even odder than SNMP. On input it reports an array of u32s
which seem to be indexed by the enum values - 1. On output kernel
expects a nest where each attr has the enum type as the nla type.

sub-type: u32 is probably best we can do right now.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250506194101.696272-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetlink: specs: remove implicit structs for SNMP counters
Jakub Kicinski [Tue, 6 May 2025 19:40:59 +0000 (12:40 -0700)]
netlink: specs: remove implicit structs for SNMP counters

uAPI doesn't define structs for the SNMP counters, just enums to index
them as arrays. Switch to the same representation in the spec. C codegen
will soon need all the struct types to actually exist.

Note that the existing definition was broken, anyway, as the first
member should be the number of counters reported.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250506194101.696272-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetlink: specs: ovs: correct struct names
Jakub Kicinski [Tue, 6 May 2025 19:40:58 +0000 (12:40 -0700)]
netlink: specs: ovs: correct struct names

C codegen will soon support using struct types for binary attrs.
Correct the struct names in OvS specs.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250506194101.696272-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetlink: specs: nl80211: drop structs which are not uAPI
Jakub Kicinski [Tue, 6 May 2025 19:40:57 +0000 (12:40 -0700)]
netlink: specs: nl80211: drop structs which are not uAPI

C codegen will soon use structs for binary types. A handful of structs
in WiFi carry information elements from the wire, defined by the standard.
The structs are not part of uAPI, so we can't use them in C directly.
We could add them to the uAPI or add some annotation to tell the codegen
to output a local version to the user header. The former seems arbitrary
since we don't expose structs for most of the standard. The latter seems
like a lot of work for a rare occurrence. Drop the struct info for now.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/004030652d592b379e730be2f0344bebc4a03475.camel@sipsolutions.net
Link: https://patch.msgid.link/20250506194101.696272-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'tools-ynl-gen-split-presence-metadata'
Jakub Kicinski [Thu, 8 May 2025 01:21:28 +0000 (18:21 -0700)]
Merge branch 'tools-ynl-gen-split-presence-metadata'

Jakub Kicinski says:

====================
tools: ynl-gen: split presence metadata

The presence metadata indicates whether given attribute was/should be
added to the Netlink message. We have 3 types of such metadata:
 - bit presence for simple values like integers,
 - len presence for variable size attrs, like binary and strings,
 - count for arrays.

Previously this information was spread around with first two types
living in a dedicated sub-struct called _present. The counts resided
directly in the main struct with an n_ prefix.

Reshuffle these an uniformly store them in dedicated sub-structs.
The immediate motivation is that current scheme causes name collisions
for TC.
====================

Link: https://patch.msgid.link/20250505165208.248049-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl-gen: move the count into a presence struct too
Jakub Kicinski [Mon, 5 May 2025 16:52:07 +0000 (09:52 -0700)]
tools: ynl-gen: move the count into a presence struct too

While we reshuffle the presence members, move the counts as well.
Previously array count members would have been place directly in
the struct, so:

  struct family_op_req {
      struct {
            u32 a:1;
            u32 b:1;
      } _present;
      struct {
            u32 bin;
      } _len;

      u32 a;
      u64 b;
      const unsigned char *bin;
      u32 n_multi;                 << count
      u32 *multi;                  << objects
  };

Since len has been moved to its own presence struct move the count
as well:

  struct family_op_req {
      struct {
            u32 a:1;
            u32 b:1;
      } _present;
      struct {
            u32 bin;
      } _len;
      struct {
            u32 multi;             << count
      } _count;

      u32 a;
      u64 b;
      const unsigned char *bin;
      u32 *multi;                  << objects
  };

This improves the consistency and allows us to remove some hacks
in the codegen. Unlike for len there is no known name collision
with the existing scheme.

Link: https://patch.msgid.link/20250505165208.248049-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl-gen: split presence metadata
Jakub Kicinski [Mon, 5 May 2025 16:52:06 +0000 (09:52 -0700)]
tools: ynl-gen: split presence metadata

Each YNL struct contains the data and a sub-struct indicating which
fields are valid. Something like:

  struct family_op_req {
      struct {
            u32 a:1;
            u32 b:1;
    u32 bin_len;
      } _present;

      u32 a;
      u64 b;
      const unsigned char *bin;
  };

Note that the bin object 'bin' has a length stored, and that length
has a _len suffix added to the field name. This breaks if there
is a explicit field called bin_len, which is the case for some
TC actions. Move the length fields out of the _present struct,
create a new struct called _len:

  struct family_op_req {
      struct {
            u32 a:1;
            u32 b:1;
      } _present;
      struct {
    u32 bin;
      } _len;

      u32 a;
      u64 b;
      const unsigned char *bin;
  };

This should prevent name collisions and help with the packing
of the struct.

Unfortunately this is a breaking change, but hopefully the migration
isn't too painful.

Link: https://patch.msgid.link/20250505165208.248049-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl-gen: rename basic presence from 'bit' to 'present'
Jakub Kicinski [Mon, 5 May 2025 16:52:05 +0000 (09:52 -0700)]
tools: ynl-gen: rename basic presence from 'bit' to 'present'

Internal change to the code gen. Rename how we indicate a type
has a single bit presence from using a 'bit' string to 'present'.
This is a noop in terms of generated code but will make next
breaking change easier.

Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250505165208.248049-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'bug-fixes-from-xdp-patch-series'
Jakub Kicinski [Thu, 8 May 2025 01:19:16 +0000 (18:19 -0700)]
Merge branch 'bug-fixes-from-xdp-patch-series'

Meghana Malladi says:

====================
Bug fixes from XDP patch series

This patch series fixes the bugs introduced while adding
xdp support in the icssg driver, and were reproduced while
running xdp-trafficgen to generate xdp traffic on icssg interfaces.

v1: https://lore.kernel.org/all/20250428120459.244525-1-m-malladi@ti.com/
====================

Link: https://patch.msgid.link/20250506110546.4065715-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ti: icssg-prueth: Report BQL before sending XDP packets
Meghana Malladi [Tue, 6 May 2025 11:05:46 +0000 (16:35 +0530)]
net: ti: icssg-prueth: Report BQL before sending XDP packets

When sending out any kind of traffic, it is essential that the driver
keeps reporting BQL of the number of bytes that have been sent so that
BQL can track the amount of data in the queue and prevents it from
overflowing. If BQL is not reported, the driver may continue sending
packets even when the queue is full, leading to packet loss, congestion
and decreased network performance. Currently this is missing in
emac_xmit_xdp_frame() and this patch fixes it.

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250506110546.4065715-4-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ti: icssg-prueth: Fix kernel panic during concurrent Tx queue access
Meghana Malladi [Tue, 6 May 2025 11:05:45 +0000 (16:35 +0530)]
net: ti: icssg-prueth: Fix kernel panic during concurrent Tx queue access

Add __netif_tx_lock() to ensure that only one packet is being
transmitted at a time to avoid race conditions in the netif_txq
struct and prevent packet data corruption. Failing to do so causes
kernel panic with the following error:

[ 2184.746764] ------------[ cut here ]------------
[ 2184.751412] kernel BUG at lib/dynamic_queue_limits.c:99!
[ 2184.756728] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP

logs: https://gist.github.com/MeghanaMalladiTI/9c7aa5fc3b7fb03f87c74aad487956e9

The lock is acquired before calling emac_xmit_xdp_frame() and released after the
call returns. This ensures that the TX queue is protected from concurrent access
during the transmission of XDP frames.

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250506110546.4065715-3-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ti: icssg-prueth: Set XDP feature flags for ndev
Meghana Malladi [Tue, 6 May 2025 11:05:44 +0000 (16:35 +0530)]
net: ti: icssg-prueth: Set XDP feature flags for ndev

xdp_features demonstrates what all XDP capabilities are supported
on a given network device. The driver needs to set these xdp_features
flag to let the network stack know what XDP features a given driver
is supporting. These flags need to be set for a given ndev irrespective
of any XDP program being loaded or not.

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250506110546.4065715-2-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agobpf: Clarify handling of mark and tstamp by redirect_peer
Paul Chaignon [Mon, 5 May 2025 19:58:39 +0000 (21:58 +0200)]
bpf: Clarify handling of mark and tstamp by redirect_peer

When switching network namespaces with the bpf_redirect_peer helper, the
skb->mark and skb->tstamp fields are not zeroed out like they can be on
a typical netns switch. This patch clarifies that in the helper
description.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ccc86af26d43c5c0b776bcba2601b7479c0d46d0.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agobpf: Scrub packet on bpf_redirect_peer
Paul Chaignon [Mon, 5 May 2025 19:58:04 +0000 (21:58 +0200)]
bpf: Scrub packet on bpf_redirect_peer

When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.

As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.

    NETNS       MARK  IFACE  TUPLE                                FUNC
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
                             .active_extensions = (__u8)2,
    [...]
    4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
                             .active_extensions = (__u8)2,
    [...]
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
                             .active_extensions = (__u8)2,

In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.

This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices except skb->mark and skb->tstamp are not zeroed.

Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/1728ead5e0fe45e7a6542c36bd4e3ca07a73b7d6.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: airoha: Add missing field to ppe_mbox_data struct
Lorenzo Bianconi [Tue, 6 May 2025 16:56:47 +0000 (18:56 +0200)]
net: airoha: Add missing field to ppe_mbox_data struct

The official Airoha EN7581 firmware requires adding max_packet field in
ppe_mbox_data struct while the unofficial one used to develop the Airoha
EN7581 flowtable support does not require this field.
This patch does not introduce any real backwards compatible issue since
EN7581 fw is not publicly available in linux-firmware or other
repositories (e.g. OpenWrt) yet and the official fw version will use this
new layout. For this reason this change needs to be backported.
Moreover, make explicit the padding added by the compiler introducing
the rsv array in init_info struct.
At the same time use u32 instead of int for init_info and set_info
struct definitions in ppe_mbox_data struct.

Fixes: 23290c7bc190d ("net: airoha: Introduce Airoha NPU support")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250506-airoha-en7581-fix-ppe_mbox_data-v5-1-29cabed6864d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'nf-25-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 8 May 2025 00:57:03 +0000 (17:57 -0700)]
Merge tag 'nf-25-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contain Netfilter/IPVS fixes for net:

1) Fix KMSAN uninit-value in do_output_route4, reported by syzbot.
   Patch from Julian Anastasov.

2) ipset hashtable set type breaks up the hashtable into regions of
   2^10 buckets. Fix the macro that determines the hashtable lock
   region to protect concurrent updates. From Jozsef Kadlecsik.

* tag 'nf-25-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: ipset: fix region locking in hash types
  ipvs: fix uninit-value for saddr in do_output_route4
====================

Link: https://patch.msgid.link/20250507221952.86505-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoopenvswitch: Fix unsafe attribute parsing in output_userspace()
Eelco Chaudron [Tue, 6 May 2025 14:28:54 +0000 (16:28 +0200)]
openvswitch: Fix unsafe attribute parsing in output_userspace()

This patch replaces the manual Netlink attribute iteration in
output_userspace() with nla_for_each_nested(), which ensures that only
well-formed attributes are processed.

Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/0bd65949df61591d9171c0dc13e42cea8941da10.1746541734.git.echaudro@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetfilter: ipset: fix region locking in hash types
Jozsef Kadlecsik [Wed, 7 May 2025 15:01:59 +0000 (17:01 +0200)]
netfilter: ipset: fix region locking in hash types

Region locking introduced in v5.6-rc4 contained three macros to handle
the region locks: ahash_bucket_start(), ahash_bucket_end() which gave
back the start and end hash bucket values belonging to a given region
lock and ahash_region() which should give back the region lock belonging
to a given hash bucket. The latter was incorrect which can lead to a
race condition between the garbage collector and adding new elements
when a hash type of set is defined with timeouts.

Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
Reported-by: Kota Toda <kota.toda@gmo-cybersecurity.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 months agoipvs: fix uninit-value for saddr in do_output_route4
Julian Anastasov [Fri, 2 May 2025 22:01:18 +0000 (01:01 +0300)]
ipvs: fix uninit-value for saddr in do_output_route4

syzbot reports for uninit-value for the saddr argument [1].
commit 4754957f04f5 ("ipvs: do not use random local source address for
tunnels") already implies that the input value of saddr
should be ignored but the code is still reading it which can prevent
to connect the route. Fix it by changing the argument to ret_saddr.

[1]
BUG: KMSAN: uninit-value in do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147
 do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147
 __ip_vs_get_out_rt+0x403/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:330
 ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136
 ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063
 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
 nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626
 nf_hook include/linux/netfilter.h:269 [inline]
 __ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118
 ip_local_out net/ipv4/ip_output.c:127 [inline]
 ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501
 udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195
 udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483
 inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x267/0x380 net/socket.c:727
 ____sys_sendmsg+0x91b/0xda0 net/socket.c:2566
 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620
 __sys_sendmmsg+0x41d/0x880 net/socket.c:2702
 __compat_sys_sendmmsg net/compat.c:360 [inline]
 __do_compat_sys_sendmmsg net/compat.c:367 [inline]
 __se_compat_sys_sendmmsg net/compat.c:364 [inline]
 __ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364
 ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346
 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
 __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306
 do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331
 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369
 entry_SYSENTER_compat_after_hwframe+0x84/0x8e

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4167 [inline]
 slab_alloc_node mm/slub.c:4210 [inline]
 __kmalloc_cache_noprof+0x8fa/0xe00 mm/slub.c:4367
 kmalloc_noprof include/linux/slab.h:905 [inline]
 ip_vs_dest_dst_alloc net/netfilter/ipvs/ip_vs_xmit.c:61 [inline]
 __ip_vs_get_out_rt+0x35d/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:323
 ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136
 ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063
 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
 nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626
 nf_hook include/linux/netfilter.h:269 [inline]
 __ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118
 ip_local_out net/ipv4/ip_output.c:127 [inline]
 ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501
 udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195
 udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483
 inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x267/0x380 net/socket.c:727
 ____sys_sendmsg+0x91b/0xda0 net/socket.c:2566
 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620
 __sys_sendmmsg+0x41d/0x880 net/socket.c:2702
 __compat_sys_sendmmsg net/compat.c:360 [inline]
 __do_compat_sys_sendmmsg net/compat.c:367 [inline]
 __se_compat_sys_sendmmsg net/compat.c:364 [inline]
 __ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364
 ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346
 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
 __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306
 do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331
 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369
 entry_SYSENTER_compat_after_hwframe+0x84/0x8e

CPU: 0 UID: 0 PID: 22408 Comm: syz.4.5165 Not tainted 6.15.0-rc3-syzkaller-00019-gbc3372351d0c #0 PREEMPT(undef)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025

Reported-by: syzbot+04b9a82855c8aed20860@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68138dfa.050a0220.14dd7d.0017.GAE@google.com/
Fixes: 4754957f04f5 ("ipvs: do not use random local source address for tunnels")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 months agoMerge tag 'erofs-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 7 May 2025 17:19:47 +0000 (10:19 -0700)]
Merge tag 'erofs-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Add a new reviewer, Hongbo Li, for better community development

 - Fix an I/O hang out of file-backed mounts

 - Address a rare data corruption caused by concurrent I/Os on the same
   deduplicated compressed data

 - Minor cleanup

* tag 'erofs-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: ensure the extra temporary copy is valid for shortened bvecs
  erofs: remove unused enum type
  fs/erofs/fileio: call erofs_onlinefolio_split() after bio_add_folio()
  MAINTAINERS: erofs: add myself as reviewer

5 months agoMerge tag 'media/v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 7 May 2025 14:00:15 +0000 (07:00 -0700)]
Merge tag 'media/v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "Some Kconfig dependency fixes"

* tag 'media/v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: cec: tda9950: add back i2c dependency
  media: i2c: lt6911uxe: add two selects to Kconfig
  media: platform: synopsys: VIDEO_SYNOPSYS_HDMIRX should depend on ARCH_ROCKCHIP
  media: i2c: lt6911uxe: Fix Kconfig dependencies:
  media: vivid: fix FB dependency

5 months agoMerge branch 'lan78xx-phylink-prep'
David S. Miller [Wed, 7 May 2025 11:57:06 +0000 (12:57 +0100)]
Merge branch 'lan78xx-phylink-prep'

Oleksij Rempel says:

====================
lan78xx: preparation for PHYLINK conversion

This patch series contains the first part of the LAN78xx driver
refactoring in preparation for converting the driver to use the PHYLINK
framework.

The goal of this initial part is to reduce the size and complexity of
the final PHYLINK conversion by introducing incremental cleanups and
logical separation of concerns, such as:

- Improving error handling in the PHY initialization path
- Refactoring PHY detection and MAC-side configuration
- Moving LED DT configuration to a dedicated helper
- Separating USB link power and flow control setup from the main probe logic
- Extracting PHY interrupt acknowledgment logic

Each patch is self-contained and moves non-PHYLINK-specific logic out of
the way, setting the stage for the actual conversion in a follow-up
patch series.

changes v8 (as split from full v7 00/12 series):
- Split the original series to make review easier
- This part includes only preparation patches; actual PHYLINK
  integration will follow
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: Extract flow control configuration to helper
Oleksij Rempel [Mon, 5 May 2025 08:43:41 +0000 (10:43 +0200)]
net: usb: lan78xx: Extract flow control configuration to helper

Move flow control register configuration from
lan78xx_update_flowcontrol() into a new helper function
lan78xx_configure_flowcontrol(). This separates hardware-specific
programming from policy logic and simplifies the upcoming phylink
integration.

The values used in this initial version of
lan78xx_configure_flowcontrol() are taken over as-is from the original
implementation to avoid functional changes. While they may not be
optimal for all USB and link speed combinations, they are known to work
reliably. Optimization of pause time and thresholds based on runtime
conditions can be done in a separate follow-up patch.

The forward declaration of lan78xx_configure_flowcontrol() will also be
removed later during the phylink conversion.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: Refactor USB link power configuration into helper
Oleksij Rempel [Mon, 5 May 2025 08:43:40 +0000 (10:43 +0200)]
net: usb: lan78xx: Refactor USB link power configuration into helper

Move the USB link power configuration logic from lan78xx_link_reset()
to a new helper function lan78xx_configure_usb(). This simplifies the
main link reset path and isolates USB-specific logic.

The new function handles U1/U2 enablement based on Ethernet link speed,
but only for SuperSpeed-capable devices (LAN7800 and LAN7801). LAN7850,
a High-Speed-only device, is explicitly excluded. A warning is logged
if SuperSpeed is reported unexpectedly for LAN7850.

Add a forward declaration for lan78xx_configure_usb() as preparation for
the upcoming phylink conversion, where it will also be used from the
mac_link_up() callback.

Open questions remain:

- Why is the 1000 Mbps configuration split into two steps (U2 disable,
  then U1 enable), unlike the single-step config used for 10/100 Mbps?

- U1/U2 behavior appears to depend on proper EEPROM configuration.
  There are known devices in the field without EEPROM. Should the driver
  enforce safe defaults in such cases?

Due to lack of USB subsystem expertise, no changes were made to this logic
beyond structural refactoring.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: Extract PHY interrupt acknowledgment to helper
Oleksij Rempel [Mon, 5 May 2025 08:43:39 +0000 (10:43 +0200)]
net: usb: lan78xx: Extract PHY interrupt acknowledgment to helper

Move the PHY interrupt acknowledgment logic from lan78xx_link_reset()
to a new helper function lan78xx_phy_int_ack(). This simplifies the
code and prepares for reusing the acknowledgment logic independently
from the full link reset process, such as when using phylink.

No functional change intended.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: move LED DT configuration to helper
Oleksij Rempel [Mon, 5 May 2025 08:43:38 +0000 (10:43 +0200)]
net: usb: lan78xx: move LED DT configuration to helper

Extract the LED enable logic based on the "microchip,led-modes"
property into a new helper function lan78xx_configure_leds_from_dt().

This simplifies lan78xx_phy_init() and improves modularity.
No functional changes intended.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: refactor PHY init to separate detection and MAC configuration
Oleksij Rempel [Mon, 5 May 2025 08:43:37 +0000 (10:43 +0200)]
net: usb: lan78xx: refactor PHY init to separate detection and MAC configuration

Split out PHY detection into lan78xx_get_phy() and MAC-side setup into
lan78xx_mac_prepare_for_phy(), making the main lan78xx_phy_init() cleaner
and easier to follow.

This improves separation of concerns and prepares the code for a future
transition to phylink. Fixed PHY registration and interface selection
are now handled in lan78xx_get_phy(), while MAC-side delay configuration
is done in lan78xx_mac_prepare_for_phy().

The fixed PHY fallback is preserved for setups like EVB-KSZ9897-1,
where LAN7801 connects directly to a KSZ switch without a standard PHY
or device tree support.

No functional changes intended.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: remove explicit check for missing PHY driver
Oleksij Rempel [Mon, 5 May 2025 08:43:36 +0000 (10:43 +0200)]
net: usb: lan78xx: remove explicit check for missing PHY driver

RGMII timing correctness relies on the PHY providing internal delays.
This is typically ensured via PHY driver, strap pins, or PCB layout.

Explicitly checking for a PHY driver here is unnecessary and non-standard.
This logic applies to all MACs, not just LAN78xx, and should be left to
phylib, phylink, or platform configuration.

Drop the check and rely on standard subsystem behavior.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: usb: lan78xx: Improve error handling in PHY initialization
Oleksij Rempel [Mon, 5 May 2025 08:43:35 +0000 (10:43 +0200)]
net: usb: lan78xx: Improve error handling in PHY initialization

Ensure that return values from `lan78xx_write_reg()`,
`lan78xx_read_reg()`, and `phy_find_first()` are properly checked and
propagated. Use `ERR_PTR(ret)` for error reporting in
`lan7801_phy_init()` and replace `-EIO` with `-ENODEV` where appropriate
to provide more accurate error codes.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agos390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
Niklas Schnelle [Wed, 30 Apr 2025 13:26:19 +0000 (15:26 +0200)]
s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs

With commit bcb5d6c76903 ("s390/pci: introduce lock to synchronize state
of zpci_dev's") the code to ignore power off of a PF that has child VFs
was changed from a direct return to a goto to the unlock and
pci_dev_put() section. The change however left the existing pci_dev_put()
untouched resulting in a doubple put. This can subsequently cause a use
after free if the struct pci_dev is released in an unexpected state.
Fix this by removing the extra pci_dev_put().

Cc: stable@vger.kernel.org
Fixes: bcb5d6c76903 ("s390/pci: introduce lock to synchronize state of zpci_dev's")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
5 months agos390/pci: Fix missing check for zpci_create_device() error return
Niklas Schnelle [Wed, 30 Apr 2025 13:26:18 +0000 (15:26 +0200)]
s390/pci: Fix missing check for zpci_create_device() error return

The zpci_create_device() function returns an error pointer that needs to
be checked before dereferencing it as a struct zpci_dev pointer. Add the
missing check in __clp_add() where it was missed when adding the
scan_list in the fixed commit. Simply not adding the device to the scan
list results in the previous behavior.

Cc: stable@vger.kernel.org
Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
5 months agoMerge tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 7 May 2025 02:06:50 +0000 (19:06 -0700)]
Merge tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Couple of fixes:
 * iwlwifi: add two missing device entries
 * cfg80211: fix a potential out-of-bounds access
 * mac80211: fix format of TID to link mapping action frames

* tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: add support for Killer on MTL
  wifi: mac80211: fix the type of status_code for negotiated TID to Link Mapping
  wifi: cfg80211: fix out-of-bounds access during multi-link element defragmentation
====================

Link: https://patch.msgid.link/20250506203506.158818-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Wed, 7 May 2025 02:04:42 +0000 (19:04 -0700)]
Merge tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
wireless features, notably

 * stack
   - free SKBTX_WIFI_STATUS flag
   - fixes for VLAN multicast in multi-link
   - improve codel parameters (revert some old twiddling)
 * ath12k
   - Enable AHB support for IPQ5332.
   - Add monitor interface support to QCN9274.
   - Add MLO support to WCN7850.
   - Add 802.11d scan offload support to WCN7850.
 * ath11k
   - Restore hibernation support
 * iwlwifi
   - EMLSR on two 5 GHz links
 * mwifiex
   - cleanups/refactoring

along with many other small features/cleanups

* tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (177 commits)
  Revert "wifi: iwlwifi: clean up config macro"
  wifi: iwlwifi: move phy_filters to fw_runtime
  wifi: iwlwifi: pcie: make sure to lock rxq->read
  wifi: iwlwifi: add definitions for iwl_mac_power_cmd version 2
  wifi: iwlwifi: clean up config macro
  wifi: iwlwifi: mld: simplify iwl_mld_rx_fill_status()
  wifi: iwlwifi: mld: rx: simplify channel handling
  wifi: iwlwifi: clean up band in RX metadata
  wifi: iwlwifi: mld: skip unknown FW channel load values
  wifi: iwlwifi: define API for external FSEQ images
  wifi: iwlwifi: mld: allow EMLSR on separated 5 GHz subbands
  wifi: iwlwifi: mld: use cfg80211_chandef_get_width()
  wifi: iwlwifi: mld: fix iwl_mld_emlsr_disallowed_with_link() return
  wifi: iwlwifi: mld: clarify variable type
  wifi: iwlwifi: pcie: add support for the reset handshake in MSI
  wifi: mac80211_hwsim: Prevent tsf from setting if beacon is disabled
  wifi: mac80211: restructure tx profile retrieval for MLO MBSSID
  wifi: nl80211: add link id of transmitted profile for MLO MBSSID
  wifi: ieee80211: Add helpers to fetch EMLSR delay and timeout values
  wifi: mac80211: update ML STA with EML capabilities
  ...

====================

Link: https://patch.msgid.link/20250506174656.119970-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'linux-can-fixes-for-6.15-20250506' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Wed, 7 May 2025 01:56:35 +0000 (18:56 -0700)]
Merge tag 'linux-can-fixes-for-6.15-20250506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-05-06

The first patch is by Antonios Salios and adds a missing
spin_lock_init() to the m_can driver.

The next 3 patches are by me and fix the unregistration order in the
mcp251xfd, rockchip_canfd and m_can driver.

The last patch is by Oliver Hartkopp and fixes RCU and BH
locking/handling in the CAN gw protocol.

* tag 'linux-can-fixes-for-6.15-20250506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: gw: fix RCU/BH usage in cgw_create_job()
  can: mcan: m_can_class_unregister(): fix order of unregistration calls
  can: rockchip_canfd: rkcanfd_remove(): fix order of unregistration calls
  can: mcp251xfd: mcp251xfd_remove(): fix order of unregistration calls
  can: mcp251xfd: fix TDC setting for low data bit rates
  can: m_can: m_can_class_allocate_dev(): initialize spin lock on device probe
====================

Link: https://patch.msgid.link/20250506135939.652543-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: add missing instance lock to dev_set_promiscuity
Stanislav Fomichev [Tue, 6 May 2025 01:19:19 +0000 (18:19 -0700)]
net: add missing instance lock to dev_set_promiscuity

Accidentally spotted while trying to understand what else needs
to be renamed to netif_ prefix. Most of the calls to dev_set_promiscuity
are adjacent to dev_set_allmulti or dev_disable_lro so it should
be safe to add the lock. Note that new netif_set_promiscuity is
currently unused, the locked paths call __dev_set_promiscuity directly.

Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250506011919.2882313-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoerofs: ensure the extra temporary copy is valid for shortened bvecs
Gao Xiang [Tue, 6 May 2025 10:18:50 +0000 (18:18 +0800)]
erofs: ensure the extra temporary copy is valid for shortened bvecs

When compressed data deduplication is enabled, multiple logical extents
may reference the same compressed physical cluster.

The previous commit 94c43de73521 ("erofs: fix wrong primary bvec
selection on deduplicated extents") already avoids using shortened
bvecs.  However, in such cases, the extra temporary buffers also
need to be preserved for later use in z_erofs_fill_other_copies() to
to prevent data corruption.

IOWs, extra temporary buffers have to be retained not only due to
varying start relative offsets (`pageofs_out`, as indicated by
`pcl->multibases`) but also because of shortened bvecs.

android.hardware.graphics.composer@2.1.so : 270696 bytes
   0:        0..  204185 |  204185 :  628019200.. 628084736 |   65536
-> 1:   204185..  225536 |   21351 :  544063488.. 544129024 |   65536
   2:   225536..  270696 |   45160 :          0..         0 |       0

com.android.vndk.v28.apex : 93814897 bytes
...
   364: 53869896..54095257 |  225361 :  543997952.. 544063488 |   65536
-> 365: 54095257..54309344 |  214087 :  544063488.. 544129024 |   65536
   366: 54309344..54514557 |  205213 :  544129024.. 544194560 |   65536
...

Both 204185 and 54095257 have the same start relative offset of 3481,
but the logical page 55 of `android.hardware.graphics.composer@2.1.so`
ranges from 225280 to 229632, forming a shortened bvec [225280, 225536)
that cannot be used for decompressing the range from 54095257 to
54309344 of `com.android.vndk.v28.apex`.

Since `pcl->multibases` is already meaningless, just mark `be->keepxcpy`
on demand for simplicity.

Again, this issue can only lead to data corruption if `-Ededupe` is on.

Fixes: 94c43de73521 ("erofs: fix wrong primary bvec selection on deduplicated extents")
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250506101850.191506-1-hsiangkao@linux.alibaba.com
5 months agonet: Lock netdevices during dev_shutdown
Cosmin Ratiu [Mon, 5 May 2025 19:47:13 +0000 (22:47 +0300)]
net: Lock netdevices during dev_shutdown

__qdisc_destroy() calls into various qdiscs .destroy() op, which in turn
can call .ndo_setup_tc(), which requires the netdev instance lock.

This commit extends the critical section in
unregister_netdevice_many_notify() to cover dev_shutdown() (and
dev_tcx_uninstall() as a side-effect) and acquires the netdev instance
lock in __dev_change_net_namespace() for the other dev_shutdown() call.

This should now guarantee that for all qdisc ops, the netdev instance
lock is held during .ndo_setup_tc().

Fixes: a0527ee2df3f ("net: hold netdev instance lock during qdisc ndo_setup_tc")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250505194713.1723399-1-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoice: use DSN instead of PCI BDF for ice_adapter index
Przemek Kitszel [Mon, 5 May 2025 16:19:38 +0000 (09:19 -0700)]
ice: use DSN instead of PCI BDF for ice_adapter index

Use Device Serial Number instead of PCI bus/device/function for
the index of struct ice_adapter.

Functions on the same physical device should point to the very same
ice_adapter instance, but with two PFs, when at least one of them is
PCI-e passed-through to a VM, it is no longer the case - PFs will get
seemingly random PCI BDF values, and thus indices, what finally leds to
each of them being on their own instance of ice_adapter. That causes them
to don't attempt any synchronization of the PTP HW clock usage, or any
other future resources.

DSN works nicely in place of the index, as it is "immutable" in terms of
virtualization.

Fixes: 0e2bddf9e5f9 ("ice: add ice_adapter for shared data across PFs on the same NIC")
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505161939.2083581-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'devlink-sanitize-variable-typed-attributes'
Jakub Kicinski [Wed, 7 May 2025 01:20:12 +0000 (18:20 -0700)]
Merge branch 'devlink-sanitize-variable-typed-attributes'

Jiri Pirko says:

====================
devlink: sanitize variable typed attributes

This is continuation based on first two patches of
https://lore.kernel.org/20250425214808.507732-1-saeed@kernel.org

Better to take it as a separate patchset, as the rest of the original
patchset is probably settled.

This patchset is taking care of incorrect usage of internal NLA_* values
in uapi, introduces new enum (in patch #2) that shadows NLA_* values and
makes that part of UAPI.

The last two patches removes unnecessary translations with maintaining
clear devlink param driver api.
====================

Link: https://patch.msgid.link/20250505114513.53370-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodevlink: use DEVLINK_VAR_ATTR_TYPE_* instead of NLA_* in fmsg
Jiri Pirko [Mon, 5 May 2025 11:45:13 +0000 (13:45 +0200)]
devlink: use DEVLINK_VAR_ATTR_TYPE_* instead of NLA_* in fmsg

Use newly introduced DEVLINK_VAR_ATTR_TYPE_* enum values instead of
internal NLA_* in fmsg health reporter code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505114513.53370-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodevlink: avoid param type value translations
Jiri Pirko [Mon, 5 May 2025 11:45:12 +0000 (13:45 +0200)]
devlink: avoid param type value translations

Assign DEVLINK_PARAM_TYPE_* enum values to DEVLINK_VAR_ATTR_TYPE_* to
ensure the same values are used internally and in UAPI. Benefit from
that by removing the value translations.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505114513.53370-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodevlink: define enum for attr types of dynamic attributes
Jiri Pirko [Mon, 5 May 2025 11:45:11 +0000 (13:45 +0200)]
devlink: define enum for attr types of dynamic attributes

Devlink param and health reporter fmsg use attributes with dynamic type
which is determined according to a different type. Currently used values
are NLA_*. The problem is, they are not part of UAPI. They may change
which would cause a break.

To make this future safe, introduce a enum that shadows NLA_* values in
it and is part of UAPI.

Also, this allows to possibly carry types that are unrelated to NLA_*
values.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505114513.53370-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl-gen: allow noncontiguous enums
Jiri Pirko [Mon, 5 May 2025 11:45:10 +0000 (13:45 +0200)]
tools: ynl-gen: allow noncontiguous enums

in case the enum has holes, instead of hard stop, generate a validation
callback to check valid enum values.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505114513.53370-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>