]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
3 months agodt-bindings: net: ftgmac100: Add resets property
Jacky Chou [Wed, 9 Jul 2025 07:08:06 +0000 (15:08 +0800)]
dt-bindings: net: ftgmac100: Add resets property

In Aspeed AST2600 design, the MAC internal delay on MAC register cannot
fully reset the RMII interfaces, it may cause the RMII incompletely.
Therefore, we need to add resets property to do SoC-level reset line to
reset the whole MAC function that includes ftgmac, RGMII and RMII.

Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250709070809.2560688-2-jacky_chou@aspeedtech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: mana: fix spelling for mana_gd_deregiser_irq()
Shradha Gupta [Wed, 9 Jul 2025 13:43:00 +0000 (06:43 -0700)]
net: mana: fix spelling for mana_gd_deregiser_irq()

Fix the typo in function name mana_gd_deregiser_irq()

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752068580-27215-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: pse-pd: pd692x0: reduce stack usage in pd692x0_setup_pi_matrix
Arnd Bergmann [Wed, 9 Jul 2025 15:32:04 +0000 (17:32 +0200)]
net: pse-pd: pd692x0: reduce stack usage in pd692x0_setup_pi_matrix

The pd692x0_manager array in this function is really too big to fit on the
stack, though this never triggered a warning until a recent patch made
it slightly bigger:

drivers/net/pse-pd/pd692x0.c: In function 'pd692x0_setup_pi_matrix':
drivers/net/pse-pd/pd692x0.c:1210:1: error: the frame size of 1584 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

Change the function to dynamically allocate the array here.

Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250709153210.1920125-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'ethtool-rss-report-which-fields-are-configured-for-hashing'
Jakub Kicinski [Fri, 11 Jul 2025 00:57:53 +0000 (17:57 -0700)]
Merge branch 'ethtool-rss-report-which-fields-are-configured-for-hashing'

Jakub Kicinski says:

====================
ethtool: rss: report which fields are configured for hashing

Add support for reading flow hash configuration via Netlink ethtool.

  $ ynl --family ethtool --dump rss-get
     [{
        "header": {
            "dev-index": 1,
            "dev-name": "enp1s0"
        },
        "hfunc": 1,
        "hkey": b"...",
        "indir": [0, 1, ...],
        "flow-hash": {
            "ether": {"l2da"},
            "ah-esp4": {"ip-src", "ip-dst"},
            "ah-esp6": {"ip-src", "ip-dst"},
            "ah4": {"ip-src", "ip-dst"},
            "ah6": {"ip-src", "ip-dst"},
            "esp4": {"ip-src", "ip-dst"},
            "esp6": {"ip-src", "ip-dst"},
            "ip4": {"ip-src", "ip-dst"},
            "ip6": {"ip-src", "ip-dst"},
            "sctp4": {"ip-src", "ip-dst"},
            "sctp6": {"ip-src", "ip-dst"},
            "udp4": {"ip-src", "ip-dst"},
            "udp6": {"ip-src", "ip-dst"}
            "tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
            "tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
        },
     }]
====================

Link: https://patch.msgid.link/20250708220640.2738464-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: drv-net: test RSS header field configuration
Jakub Kicinski [Tue, 8 Jul 2025 22:06:40 +0000 (15:06 -0700)]
selftests: drv-net: test RSS header field configuration

Test reading RXFH fields over IOCTL and netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..3
  ok 1 rss_api.test_rxfh_indir_ntf
  ok 2 rss_api.test_rxfh_indir_ctx_ntf
  ok 3 rss_api.test_rxfh_fields
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250708220640.2738464-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoethtool: rss: report which fields are configured for hashing
Jakub Kicinski [Tue, 8 Jul 2025 22:06:39 +0000 (15:06 -0700)]
ethtool: rss: report which fields are configured for hashing

Implement ETHTOOL_GRXFH over Netlink. The number of flow types is
reasonable (around 20) so report all of them at once for simplicity.

Do not maintain the flow ID mapping with ioctl at the uAPI level.
This gives us a chance to clean up the confusion that come from
RxNFC vs RxFH (flow direction vs hashing) in the ioctl.
Try to align with the names used in ethtool CLI, they seem to have
stood the test of time just fine. One annoyance is that we still
call L4 ports the weird names, but I guess they also apply to IPSec
(where they cover the SPI) so it is what it is.

 $ ynl --family ethtool --dump rss-get
 {
    "header": {
"dev-index": 1,
"dev-name": "enp1s0"
    },
    "hfunc": 1,
    "hkey": b"...",
    "indir": [0, 1, ...],
    "flow-hash": {
        "ether": {"l2da"},
"ah-esp4": {"ip-src", "ip-dst"},
        "ah-esp6": {"ip-src", "ip-dst"},
        "ah4": {"ip-src", "ip-dst"},
        "ah6": {"ip-src", "ip-dst"},
        "esp4": {"ip-src", "ip-dst"},
        "esp6": {"ip-src", "ip-dst"},
        "ip4": {"ip-src", "ip-dst"},
        "ip6": {"ip-src", "ip-dst"},
        "sctp4": {"ip-src", "ip-dst"},
        "sctp6": {"ip-src", "ip-dst"},
        "udp4": {"ip-src", "ip-dst"},
        "udp6": {"ip-src", "ip-dst"}
        "tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
        "tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
    },
 }

Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoethtool: mark ETHER_FLOW as usable for Rx hash
Jakub Kicinski [Tue, 8 Jul 2025 22:06:38 +0000 (15:06 -0700)]
ethtool: mark ETHER_FLOW as usable for Rx hash

Looks like some drivers (ena, enetc, fbnic.. there's probably more)
consider ETHER_FLOW to be legitimate target for flow hashing.
I'm not sure how intentional that is from the uAPI perspective
vs just an effect of ethtool IOCTL doing minimal input validation.
But Netlink will do strict validation, so we need to decide whether
we allow this use case or not. I don't see a strong reason against
it, and rejecting it would potentially regress a number of drivers.
So update the comments and flow_type_hashable().

Link: https://patch.msgid.link/20250708220640.2738464-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agotools: ynl: decode enums in auto-ints
Jakub Kicinski [Tue, 8 Jul 2025 22:06:37 +0000 (15:06 -0700)]
tools: ynl: decode enums in auto-ints

Use enum decoding on auto-ints. Looks like we only had enum
auto-ints for input values until now. Upcoming RSS work will
need this to declare an attribute with flags as a uint.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250708220640.2738464-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoethtool: rss: make sure dump takes the rss lock
Jakub Kicinski [Tue, 8 Jul 2025 22:06:36 +0000 (15:06 -0700)]
ethtool: rss: make sure dump takes the rss lock

After commit 040cef30b5e6 ("net: ethtool: move get_rxfh callback
under the rss_lock") we're expected to take rss_lock around get.
Switch dump to using the new prep helper and move the locking into it.

Link: https://patch.msgid.link/20250708220640.2738464-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 11 Jul 2025 00:24:21 +0000 (17:24 -0700)]
Merge tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Quite a bit more work, notably:
 - mt76: firmware recovery improvements, MLO work
 - iwlwifi: use embedded PNVM in (to be released) FW images
            to fix compatibility issues
 - cfg80211/mac80211: extended regulatory info support (6 GHz)
 - cfg80211: use "faux device" for regulatory

* tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (48 commits)
  wifi: mac80211: don't complete management TX on SAE commit
  wifi: cfg80211/mac80211: implement dot11ExtendedRegInfoSupport
  wifi: mac80211: send extended MLD capa/ops if AP has it
  wifi: mac80211: copy first_part into HW scan
  wifi: cfg80211: add a flag for the first part of a scan
  wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code
  wifi: cfg80211: only verify part of Extended MLD Capabilities
  wifi: nl80211: make nl80211_check_scan_flags() type safe
  wifi: cfg80211: hide scan internals
  wifi: mac80211: fix deactivated link CSA
  wifi: mac80211: add mandatory bitrate support for 6 GHz
  wifi: mac80211: remove spurious blank line
  wifi: mac80211: verify state before connection
  wifi: mac80211: avoid weird state in error path
  wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v4
  wifi: iwlwifi: bump minimum API version in BZ
  wifi: iwlwifi: mvm: remove unneeded argument
  wifi: iwlwifi: mvm: remove MLO GTK rekey code
  wifi: iwlwifi: pcie: rename iwl_pci_gen1_2_probe() argument
  wifi: iwlwifi: match discrete/integrated to fix some names
  ...
====================

Link: https://patch.msgid.link/20250710123113.24878-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: replace ND_PRINTK with dynamic debug
Wang Liang [Tue, 8 Jul 2025 03:33:42 +0000 (11:33 +0800)]
net: replace ND_PRINTK with dynamic debug

ND_PRINTK with val > 1 only works when the ND_DEBUG was set in compilation
phase. Replace it with dynamic debug. Convert ND_PRINTK with val <= 1 to
net_{err,warn}_ratelimited, and convert the rest to net_dbg_ratelimited.

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250708033342.1627636-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'riscv-sophgo-add-ethernet-support-for-sg2042'
Jakub Kicinski [Thu, 10 Jul 2025 22:26:57 +0000 (15:26 -0700)]
Merge branch 'riscv-sophgo-add-ethernet-support-for-sg2042'

Inochi Amaoto says:

====================
riscv: sophgo: Add ethernet support for SG2042

The ethernet controller of SG2042 is Synopsys DesignWare IP with
tx clock. Add device id for it.

This patch can only be tested on a SG2042 evb board, as pioneer
does not expose this device.

The user dts patch link:
https://lore.kernel.org/linux-riscv/cover.1751700954.git.rabenda.cn@gmail.com
====================

Link: https://patch.msgid.link/20250708064052.507094-1-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: platform: Add snps,dwmac-5.00a IP compatible string
Inochi Amaoto [Tue, 8 Jul 2025 06:40:51 +0000 (14:40 +0800)]
net: stmmac: platform: Add snps,dwmac-5.00a IP compatible string

Add "snps,dwmac-5.30a" compatible string for 5.00a version that
can avoid to define some platform data in the glue layer.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Link: https://patch.msgid.link/20250708064052.507094-4-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: dwmac-sophgo: Add support for Sophgo SG2042 SoC
Inochi Amaoto [Tue, 8 Jul 2025 06:40:50 +0000 (14:40 +0800)]
net: stmmac: dwmac-sophgo: Add support for Sophgo SG2042 SoC

Adds device id of the ethernet controller on the Sophgo SG2042 SoC.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Link: https://patch.msgid.link/20250708064052.507094-3-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: sophgo,sg2044-dwmac: Add support for Sophgo SG2042 dwmac
Inochi Amaoto [Tue, 8 Jul 2025 06:40:49 +0000 (14:40 +0800)]
dt-bindings: net: sophgo,sg2044-dwmac: Add support for Sophgo SG2042 dwmac

The GMAC IP on SG2042 is a standard Synopsys DesignWare MAC
(version 5.00a) with tx clock.

Add necessary compatible string for this device.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250708064052.507094-2-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'further-mt7988-devicetree-work'
Jakub Kicinski [Thu, 10 Jul 2025 22:02:23 +0000 (15:02 -0700)]
Merge branch 'further-mt7988-devicetree-work'

Frank Wunderlich says:

====================
further mt7988 devicetree work

This series continues mt7988 devicetree work

- Extend cpu frequency scaling with CCI
- GPIO leds
- Basic network-support (ethernet controller + builtin switch + SFP Cages)

depencies (i hope this list is complete and latest patches/series linked):

support interrupt-names is optional again as i re-added the reserved IRQs
(they are not unusable as i thought and can allow features in future)
https://patchwork.kernel.org/project/netdevbpf/patch/20250619132125.78368-2-linux@fw-web.de/

needs change in mtk ethernet driver for the sram to be read from separate node:
https://patchwork.kernel.org/project/netdevbpf/patch/c2b9242229d06af4e468204bcf42daa1535c3a72.1751461762.git.daniel@makrotopia.org/

for SFP-Function (macs currently disabled):

PCS clearance which is a 1.5 year discussion currently ongoing

Daniel asked netdev for a way 2 go:
https://lore.kernel.org/netdev/aEwfME3dYisQtdCj@pidgin.makrotopia.org/

e.g. something like this (one of):
* https://patchwork.kernel.org/project/netdevbpf/patch/20250610233134.3588011-4-sean.anderson@linux.dev/ (v6)
* https://patchwork.kernel.org/project/netdevbpf/patch/20250511201250.3789083-4-ansuelsmth@gmail.com/ (v4)
* https://patchwork.kernel.org/project/netdevbpf/patch/ba4e359584a6b3bc4b3470822c42186d5b0856f9.1721910728.git.daniel@makrotopia.org/

full usxgmii driver:
https://patchwork.kernel.org/project/netdevbpf/patch/07845ec900ba41ff992875dce12c622277592c32.1702352117.git.daniel@makrotopia.org/

first PCS-discussion is here:
https://patchwork.kernel.org/project/netdevbpf/patch/8aa905080bdb6760875d62cb3b2b41258837f80e.1702352117.git.daniel@makrotopia.org/
some more here:
https://lore.kernel.org/netdev/20250511201250.3789083-4-ansuelsmth@gmail.com/

and then dts nodes for sgmiisys+usxgmii+2g5 firmware

when above depencies are solved the mac1/2 can be enabled and 2.5G phy/SFP slots will work.
====================

Link: https://patch.msgid.link/20250709111147.11843-1-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: dsa: mediatek,mt7530: add internal mdio bus
Frank Wunderlich [Wed, 9 Jul 2025 11:09:42 +0000 (13:09 +0200)]
dt-bindings: net: dsa: mediatek,mt7530: add internal mdio bus

Mt7988 buildin switch has own mdio bus where ge-phys are connected.
Add related property for this.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250709111147.11843-7-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: dsa: mediatek,mt7530: add dsa-port definition for mt7988
Frank Wunderlich [Wed, 9 Jul 2025 11:09:41 +0000 (13:09 +0200)]
dt-bindings: net: dsa: mediatek,mt7530: add dsa-port definition for mt7988

Add own dsa-port binding for SoC with internal switch where only phy-mode
'internal' is valid.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250709111147.11843-6-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: mediatek,net: add sram property
Frank Wunderlich [Wed, 9 Jul 2025 11:09:40 +0000 (13:09 +0200)]
dt-bindings: net: mediatek,net: add sram property

Meditak Filogic SoCs (MT798x) have dedicated MMIO-SRAM for dma operations.

MT7981 and MT7986 currently use static offset to ethernet MAC register
which will be changed in separate patch once this way is accepted.

Add "sram" property to map ethernet controller to dedicated mmio-sram node.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250709111147.11843-5-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: mediatek,net: allow irq names
Frank Wunderlich [Wed, 9 Jul 2025 11:09:39 +0000 (13:09 +0200)]
dt-bindings: net: mediatek,net: allow irq names

In preparation for MT7988 and RSS/LRO allow the interrupt-names
property.
In this way driver can request the interrupts by name which is much
more readable in the driver code and SoC's dtsi than relying on a
specific order.

Frame-engine-IRQs (fe0..3):
MT7621, MT7628: 1 FE-IRQ
MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now)
MT7981, MT7986: 4 FE-IRQs (only two used by the driver for now)

RSS/LRO IRQs (pdma0..3) additional only on Filogic (MT798x) with
count of 4. So all IRQ-names (8) for Filogic.

Set boundaries for all compatibles same as irq count.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250709111147.11843-4-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: mediatek,net: allow up to 8 IRQs
Frank Wunderlich [Wed, 9 Jul 2025 11:09:38 +0000 (13:09 +0200)]
dt-bindings: net: mediatek,net: allow up to 8 IRQs

Increase the maximum IRQ count to 8 (4 FE + 4 RSS/LRO).

Frame-engine-IRQs (max 4):
MT7621, MT7628: 1 FE-IRQ
MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now)
MT7981, MT7986, MT7988: 4 FE-IRQs (only two used by the driver for now)

Mediatek Filogic SoCs (mt798x) have 4 additional IRQs for RSS and/or
LRO. So MT798x have 8 IRQs in total.

MT7981 does not have a ethernet-node yet.
MT7986 Ethernet node is updated with RSS/LRO IRQs in this series.
MT7988 Ethernet node is added in this series.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250709111147.11843-3-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: mediatek,net: update mac subnode pattern for mt7988
Frank Wunderlich [Wed, 9 Jul 2025 11:09:37 +0000 (13:09 +0200)]
dt-bindings: net: mediatek,net: update mac subnode pattern for mt7988

MT7888 have 3 Macs and so its nodes have names from mac0 - mac2. Update
pattern to fix this.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250709111147.11843-2-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel
Jakub Kicinski [Thu, 10 Jul 2025 20:32:35 +0000 (13:32 -0700)]
Merge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel

Paolo Abeni says:

====================
virtio: introduce GSO over UDP tunnel

Some virtualized deployments use UDP tunnel pervasively and are impacted
negatively by the lack of GSO support for such kind of traffic in the
virtual NIC driver.

The virtio_net specification recently introduced support for GSO over
UDP tunnel, this series updates the virtio implementation to support
such a feature.

Currently the kernel virtio support limits the feature space to 64,
while the virtio specification allows for a larger number of features.
Specifically the GSO-over-UDP-tunnel-related virtio features use bits
65-69.

The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is set by
a define and could be easily raised in future, as needed.

This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended features space in a later time.

The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.

New helpers are introduced to convert the UDP-tunneled skb metadata to
an extended virtio net header and vice versa. Such helpers are used by
the tun and virtio_net driver to cope with the newly supported offloads.

Tested with basic stream transfer with all the possible permutations of
host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
====================

Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 10 Jul 2025 17:08:47 +0000 (10:08 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.16-rc6).

No conflicts.

Adjacent changes:

Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml
  0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible")
  b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 10 Jul 2025 16:18:53 +0000 (09:18 -0700)]
Merge tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth.

  Current release - regressions:

   - tcp: refine sk_rcvbuf increase for ooo packets

   - bluetooth: fix attempting to send HCI_Disconnect to BIS handle

   - rxrpc: fix over large frame size warning

   - eth: bcmgenet: initialize u64 stats seq counter

  Previous releases - regressions:

   - tcp: correct signedness in skb remaining space calculation

   - sched: abort __tc_modify_qdisc if parent class does not exist

   - vsock: fix transport_{g2h,h2g} TOCTOU

   - rxrpc: fix bug due to prealloc collision

   - tipc: fix use-after-free in tipc_conn_close().

   - bluetooth: fix not marking Broadcast Sink BIS as connected

   - phy: qca808x: fix WoL issue by utilizing at8031_set_wol()

   - eth: am65-cpsw-nuss: fix skb size by accounting for skb_shared_info

  Previous releases - always broken:

   - netlink: fix wraparounds of sk->sk_rmem_alloc.

   - atm: fix infinite recursive call of clip_push().

   - eth:
      - stmmac: fix interrupt handling for level-triggered mode in DWC_XGMAC2
      - rtsn: fix a null pointer dereference in rtsn_probe()"

* tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  net/sched: sch_qfq: Fix null-deref in agg_dequeue
  rxrpc: Fix oops due to non-existence of prealloc backlog struct
  rxrpc: Fix bug due to prealloc collision
  MAINTAINERS: remove myself as netronome maintainer
  selftests/net: packetdrill: add tcp_ooo-before-and-after-accept.pkt
  tcp: refine sk_rcvbuf increase for ooo packets
  net/sched: Abort __tc_modify_qdisc if parent class does not exist
  net: ethernet: ti: am65-cpsw-nuss: Fix skb size by accounting for skb_shared_info
  net: thunderx: avoid direct MTU assignment after WRITE_ONCE()
  selftests/tc-testing: Create test case for UAF scenario with DRR/NETEM/BLACKHOLE chain
  atm: clip: Fix NULL pointer dereference in vcc_sendmsg()
  atm: clip: Fix infinite recursive call of clip_push().
  atm: clip: Fix memory leak of struct clip_vcc.
  atm: clip: Fix potential null-ptr-deref in to_atmarpd().
  net: phy: smsc: Fix link failure in forced mode with Auto-MDIX
  net: phy: smsc: Force predictable MDI-X state on LAN87xx
  net: phy: smsc: Fix Auto-MDIX configuration when disabled by strap
  net: stmmac: Fix interrupt handling for level-triggered mode in DWC_XGMAC2
  rxrpc: Fix over large frame size warning
  net: airoha: Fix an error handling path in airoha_probe()
  ...

3 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 10 Jul 2025 16:06:53 +0000 (09:06 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Many patches, pretty much all of them small, that accumulated while I
  was on vacation.

  ARM:

   - Remove the last leftovers of the ill-fated FPSIMD host state
     mapping at EL2 stage-1

   - Fix unexpected advertisement to the guest of unimplemented S2 base
     granule sizes

   - Gracefully fail initialising pKVM if the interrupt controller isn't
     GICv3

   - Also gracefully fail initialising pKVM if the carveout allocation
     fails

   - Fix the computing of the minimum MMIO range required for the host
     on stage-2 fault

   - Fix the generation of the GICv3 Maintenance Interrupt in nested
     mode

  x86:

   - Reject SEV{-ES} intra-host migration if one or more vCPUs are
     actively being created, so as not to create a non-SEV{-ES} vCPU in
     an SEV{-ES} VM

   - Use a pre-allocated, per-vCPU buffer for handling de-sparsification
     of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
     large" issue

   - Allow out-of-range/invalid Xen event channel ports when configuring
     IRQ routing, to avoid dictating a specific ioctl() ordering to
     userspace

   - Conditionally reschedule when setting memory attributes to avoid
     soft lockups when userspace converts huge swaths of memory to/from
     private

   - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest

   - Add a missing field in struct sev_data_snp_launch_start that
     resulted in the guest-visible workarounds field being filled at the
     wrong offset

   - Skip non-canonical address when processing Hyper-V PV TLB flushes
     to avoid VM-Fail on INVVPID

   - Advertise supported TDX TDVMCALLs to userspace

   - Pass SetupEventNotifyInterrupt arguments to userspace

   - Fix TSC frequency underflow"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: avoid underflow when scaling TSC frequency
  KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
  KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
  KVM: arm64: Don't free hyp pages with pKVM on GICv2
  KVM: arm64: Fix error path in init_hyp_mode()
  KVM: arm64: Adjust range correctly during host stage-2 faults
  KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
  KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
  KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
  Documentation: KVM: Fix unexpected unindent warnings
  KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
  KVM: Allow CPU to reschedule while setting per-page memory attributes
  KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
  KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
  KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
  KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
  KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt

3 months agoMerge branch 'net-dsa-rzn1_a5psw-add-compile_test'
Paolo Abeni [Thu, 10 Jul 2025 13:41:03 +0000 (15:41 +0200)]
Merge branch 'net-dsa-rzn1_a5psw-add-compile_test'

Rosen Penev says:

====================
net: dsa: rzn1_a5psw: add COMPILE_TEST

Allows the various bots to test compilation. Also threw in a small devm
conversion for enabling clocks.
====================

Link: https://patch.msgid.link/20250708014144.2514-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: dsa: rzn1_a5psw: use devm to enable clocks
Rosen Penev [Tue, 8 Jul 2025 01:41:44 +0000 (18:41 -0700)]
net: dsa: rzn1_a5psw: use devm to enable clocks

The remove function has these in the wrong order. The switch should be
unregistered last. Simpler to use devm so that the right thing is done.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20250708014144.2514-3-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: dsa: rzn1_a5psw: add COMPILE_TEST
Rosen Penev [Tue, 8 Jul 2025 01:41:43 +0000 (18:41 -0700)]
net: dsa: rzn1_a5psw: add COMPILE_TEST

There's no architecture specific requirement for it to compile. Allows
the bots to test compilation properly.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250708014144.2514-2-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt
Jason Xing [Fri, 4 Jul 2025 16:01:38 +0000 (00:01 +0800)]
net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt

This patch provides a setsockopt method to let applications leverage to
adjust how many descs to be handled at most in one send syscall. It
mitigates the situation where the default value (32) that is too small
leads to higher frequency of triggering send syscall.

Considering the prosperity/complexity the applications have, there is no
absolutely ideal suggestion fitting all cases. So keep 32 as its default
value like before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
  per-socket granular control.
- Set the range of max_tx_budget as [32, xs->tx->nentries].

The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment user-space stack
fetches the -EAGAIN error number passed from sendto(), it will loop to try
again until all the expected descs from tx ring are sent out to the driver.
Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of
sendto() and higher throughput/PPS.

Here is what I did in production, along with some numbers as follows:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind
this was I counted how many descs are transmitted to the driver at one
time of sendto() based on [1] patch and then I calculated the
possibility of hitting the upper bound. Finally I chose 128 as a
suitable value because 1) it covers most of the cases, 2) a higher
number would not bring evident results. After twisting the parameters,
a stable improvement of around 4% for both PPS and throughput and less
resources consumption were found to be observed by strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet/sched: sch_qfq: Fix null-deref in agg_dequeue
Xiang Mei [Sat, 5 Jul 2025 21:21:43 +0000 (14:21 -0700)]
net/sched: sch_qfq: Fix null-deref in agg_dequeue

To prevent a potential crash in agg_dequeue (net/sched/sch_qfq.c)
when cl->qdisc->ops->peek(cl->qdisc) returns NULL, we check the return
value before using it, similar to the existing approach in sch_hfsc.c.

To avoid code duplication, the following changes are made:

1. Changed qdisc_warn_nonwc(include/net/pkt_sched.h) into a static
inline function.

2. Moved qdisc_peek_len from net/sched/sch_hfsc.c to
include/net/pkt_sched.h so that sch_qfq can reuse it.

3. Applied qdisc_peek_len in agg_dequeue to avoid crashing.

Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250705212143.3982664-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoMerge branch 'net-mlx5-misc-changes-2025-07-09'
Jakub Kicinski [Thu, 10 Jul 2025 02:47:47 +0000 (19:47 -0700)]
Merge branch 'net-mlx5-misc-changes-2025-07-09'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-07-09

This series contains misc enhancements to the mlx5 driver.
====================

Link: https://patch.msgid.link/1752009387-13300-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: RX, Remove unnecessary RQT redirects
Tariq Toukan [Tue, 8 Jul 2025 21:16:27 +0000 (00:16 +0300)]
net/mlx5e: RX, Remove unnecessary RQT redirects

RQTs (Receive Queue Table) should redirect traffic to the channels' RQs
when they're active.  Otherwise, redirect to the designated "drop RQ".

RQTs are created in "inactive" state, pointing to the "drop RQ".
In activate and de-activate flows, do not "deactivate" the rest of RQTs
(beyond the num of channels), as they are already inactive.

This cuts down unnecessary execution of FW commands (MODIFY_RQT), and
improves the latency of open/close channels or configuration change.

Perf:
NIC: Connect-X7.
Configuration: 1 combined channel, max num channels 248.
Measure time for "interface up + interface down".

Before: 0.313 sec
After:  0.057 sec (5.5x faster)

247 MODIFY_RQT commands saved in interface up.
247 MODIFY_RQT commands saved in interface down.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752009387-13300-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5: Warn when write combining is not supported
Maor Gottlieb [Tue, 8 Jul 2025 21:16:26 +0000 (00:16 +0300)]
net/mlx5: Warn when write combining is not supported

Warn if write combining is not supported, as it can impact latency.
Add the warning message to be printed only when the driver actually
run the test and detect unsupported state, rather than when
inheriting parent's result for SFs.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752009387-13300-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Replace recursive VLAN push handling with an iterative loop
Gal Pressman [Tue, 8 Jul 2025 21:16:25 +0000 (00:16 +0300)]
net/mlx5e: Replace recursive VLAN push handling with an iterative loop

mlx5e_tc_act_vlan_add_push_action() uses tail-recursion to walk through
a stack of VLAN devices.

There is no need for a complicated recursion with unnecessary stack
consumption and less obvious code flow, rewrite the function so that it
uses a do while loop instead.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752009387-13300-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: CT: extract a memcmp from a spinlock section
Cosmin Ratiu [Tue, 8 Jul 2025 21:16:24 +0000 (00:16 +0300)]
net/mlx5e: CT: extract a memcmp from a spinlock section

This reduces the time the lock is held and reduces contention.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752009387-13300-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Remove unused VLAN insertion logic in TX path
Carolina Jubran [Tue, 8 Jul 2025 21:16:23 +0000 (00:16 +0300)]
net/mlx5e: Remove unused VLAN insertion logic in TX path

The VLAN insertion capability (`wqe_vlan_insert`) was never enabled on
all mlx5 devices. When VLAN TX offload is advertised but this
capability is not supported, the driver uses inline headers to insert
the VLAN tag.

To support this, the driver used to set the
`MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE` bit to enforce L2 inline mode
when `wqe_vlan_insert` was not supported. Since the capability is
disabled on all devices, this logic was always active, and the SQ flag
has become redundant. L2 inline is enforced unconditionally for
VLAN-tagged packets.

The `skb_vlan_tag_present()` check in the else-if block of
`mlx5e_sq_xmit_wqe()` is never true by this point in the TX flow,
as the VLAN tag has already been inserted by the driver using inline
headers. As a result, this code is never executed.

Remove the redundant SQ state, dead VLAN insertion code block, and
related logic.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752009387-13300-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'rxrpc-miscellaneous-fixes'
Jakub Kicinski [Thu, 10 Jul 2025 02:41:45 +0000 (19:41 -0700)]
Merge branch 'rxrpc-miscellaneous-fixes'

David Howells says:

====================
rxrpc: Miscellaneous fixes

Here are some miscellaneous fixes for rxrpc:

 (1) Fix assertion failure due to preallocation collision.

 (2) Fix oops due to prealloc backlog struct not yet having been allocated
     if no service calls have yet been preallocated.
====================

Link: https://patch.msgid.link/20250708211506.2699012-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agorxrpc: Fix oops due to non-existence of prealloc backlog struct
David Howells [Tue, 8 Jul 2025 21:15:04 +0000 (22:15 +0100)]
rxrpc: Fix oops due to non-existence of prealloc backlog struct

If an AF_RXRPC service socket is opened and bound, but calls are
preallocated, then rxrpc_alloc_incoming_call() will oops because the
rxrpc_backlog struct doesn't get allocated until the first preallocation is
made.

Fix this by returning NULL from rxrpc_alloc_incoming_call() if there is no
backlog struct.  This will cause the incoming call to be aborted.

Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
Suggested-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: LePremierHomme <kwqcheii@proton.me>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Willy Tarreau <w@1wt.eu>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250708211506.2699012-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agorxrpc: Fix bug due to prealloc collision
David Howells [Tue, 8 Jul 2025 21:15:03 +0000 (22:15 +0100)]
rxrpc: Fix bug due to prealloc collision

When userspace is using AF_RXRPC to provide a server, it has to preallocate
incoming calls and assign to them call IDs that will be used to thread
related recvmsg() and sendmsg() together.  The preallocated call IDs will
automatically be attached to calls as they come in until the pool is empty.

To the kernel, the call IDs are just arbitrary numbers, but userspace can
use the call ID to hold a pointer to prepared structs.  In any case, the
user isn't permitted to create two calls with the same call ID (call IDs
become available again when the call ends) and EBADSLT should result from
sendmsg() if an attempt is made to preallocate a call with an in-use call
ID.

However, the cleanup in the error handling will trigger both assertions in
rxrpc_cleanup_call() because the call isn't marked complete and isn't
marked as having been released.

Fix this by setting the call state in rxrpc_service_prealloc_one() and then
marking it as being released before calling the cleanup function.

Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests")
Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: LePremierHomme <kwqcheii@proton.me>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250708211506.2699012-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agovsock/test: fix test for null ptr deref when transport changes
Stefano Garzarella [Tue, 8 Jul 2025 11:17:01 +0000 (13:17 +0200)]
vsock/test: fix test for null ptr deref when transport changes

In test_stream_transport_change_client(), the client sends CONTROL_CONTINUE
on each iteration, even when connect() is unsuccessful. This causes a flood
of control messages in the server that hangs around for more than 10
seconds after the test finishes, triggering several timeouts and causing
subsequent tests to fail. This was discovered in testing a newly proposed
test that failed in this way on the client side:
    ...
    33 - SOCK_STREAM transport change null-ptr-deref...ok
    34 - SOCK_STREAM ioctl(SIOCINQ) functionality...recv timed out

The CONTROL_CONTINUE message is used only to tell to the server to call
accept() to consume successful connections, so that subsequent connect()
will not fail for finding the queue full.

Send CONTROL_CONTINUE message only when the connect() has succeeded, or
found the queue full. Note that the second connect() can also succeed if
the first one was interrupted after sending the request.

Fixes: 3a764d93385c ("vsock/test: Add test for null ptr deref when transport changes")
Cc: leonardi@redhat.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250708111701.129585-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-phy-bcm54811-phy-initialization'
Jakub Kicinski [Thu, 10 Jul 2025 02:32:32 +0000 (19:32 -0700)]
Merge branch 'net-phy-bcm54811-phy-initialization'

 says:

====================
net: phy: bcm54811: PHY initialization

Proper bcm54811 PHY driver initialization for MII-Lite.

The bcm54811 PHY in MLP package must be setup for MII-Lite interface
mode by software. Normally, the PHY to MAC interface is selected in
hardware by setting the bootstrap pins of the PHY. However, MII and
MII-Lite share the same hardware setup and must be distinguished by
software, setting appropriate bit in a configuration register.
The MII-Lite interface mode is non-standard one, defined by Broadcom
for some of their PHYs. The MII-Lite lightness consist in omitting
RXER, TXER, CRS and COL signals of the standard MII interface.
Absence of COL them makes half-duplex links modes impossible but
does not interfere with Broadcom's BroadR-Reach link modes, because
they are full-duplex only.
To do it in a clean way, MII-Lite must be introduced first, including
its limitation to link modes (no half-duplex), because it is a
prerequisite for the patch #3 of this series. The patch #4 does not
depend on MII-Lite directly but both #3 and #4 are necessary for
bcm54811 to work properly without additional configuration steps to be
done - for example in the bootloader, before the kernel starts.

PATCH 1 - Add MII-Lite PHY interface mode as defined by Broadcom for
   their two-wire PHYs. It can be used with most Ethernet controllers
   under certain limitations (no half-duplex link modes etc.).

PATCH 2 - Add MII-Lite PHY interface type

PATCH 3 - Activation of MII-Lite interface mode on Broadcom bcm5481x
   PHYs

PATCH 4 - Initialize the BCM54811 PHY properly so that it conforms
   to the datasheet regarding a reserved bit in the LRE Control
   register, which must be written to zero after every device reset.
   Ignore the LDS capability bit in LRE Status register on bcm54811.
====================

Link: https://patch.msgid.link/20250708090140.61355-1-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: phy: bcm54811: PHY initialization
Kamil Horák - 2N [Tue, 8 Jul 2025 09:01:40 +0000 (11:01 +0200)]
net: phy: bcm54811: PHY initialization

Reset the bit 12 in PHY's LRE Control register upon initialization.
According to the datasheet, this bit must be written to zero after
every device reset.

Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250708090140.61355-5-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: phy: bcm5481x: MII-Lite activation
Kamil Horák - 2N [Tue, 8 Jul 2025 09:01:39 +0000 (11:01 +0200)]
net: phy: bcm5481x: MII-Lite activation

Broadcom PHYs featuring the BroadR-Reach two-wire link mode are usually
capable to operate in simplified MII mode, without TXER, RXER, CRS and
COL signals as defined for the MII. The absence of COL signal makes
half-duplex link modes impossible, however, the BroadR-Reach modes are
all full-duplex only.
Depending on the IC encapsulation, there exist MII-Lite-only PHYs such
as bcm54811 in MLP. The PHY itself is hardware-strapped to select among
multiple RGMII and MII-Lite modes, but the MII-Lite mode must be also
activated by software.

Add MII-Lite activation for bcm5481x PHYs.

Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250708090140.61355-4-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: ethernet-phy: add MII-Lite phy interface type
Kamil Horák - 2N [Tue, 8 Jul 2025 09:01:38 +0000 (11:01 +0200)]
dt-bindings: ethernet-phy: add MII-Lite phy interface type

Some Broadcom PHYs are capable to operate in simplified MII mode,
without TXER, RXER, CRS and COL signals as defined for the MII.
The MII-Lite mode can be used on most Ethernet controllers with full
MII interface by just leaving the input signals (RXER, CRS, COL)
inactive. The absence of COL signal makes half-duplex link modes
impossible but does not interfere with BroadR-Reach link modes on
Broadcom PHYs, because they are all full-duplex only.

Add new interface type "mii-lite" to phy-connection-type enum.

Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250708090140.61355-3-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: phy: MII-Lite PHY interface mode
Kamil Horák - 2N [Tue, 8 Jul 2025 09:01:37 +0000 (11:01 +0200)]
net: phy: MII-Lite PHY interface mode

Some Broadcom PHYs are capable to operate in simplified MII mode,
without TXER, RXER, CRS and COL signals as defined for the MII.
The MII-Lite mode can be used on most Ethernet controllers with full
MII interface by just leaving the input signals (RXER, CRS, COL)
inactive. The absence of COL signal makes half-duplex link modes
impossible but does not interfere with BroadR-Reach link modes on
Broadcom PHYs, because they are all full-duplex only.

Add MII-Lite interface mode, especially for Broadcom two-wire PHYs.

Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250708090140.61355-2-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMAINTAINERS: remove myself as netronome maintainer
Louis Peens [Tue, 8 Jul 2025 08:20:51 +0000 (10:20 +0200)]
MAINTAINERS: remove myself as netronome maintainer

I am moving on from Corigine to different things, for the moment
slightly removed from kernel development. Right now there is nobody I
can in good conscience recommend to take over the maintainer role, but
there are still people available for review, so put the driver state to
'Odd Fixes'.

Additionally add Simon Horman as reviewer - thanks Simon.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: usb: enable the work after stop usbnet by ip down/up
Zqiang [Tue, 8 Jul 2025 08:16:53 +0000 (16:16 +0800)]
net: usb: enable the work after stop usbnet by ip down/up

Oleksij reported that:
The smsc95xx driver fails after one down/up cycle, like this:
 $ nmcli device set enu1u1 managed no
 $ p a a 10.10.10.1/24 dev enu1u1
 $ ping -c 4 10.10.10.3
 $ ip l s dev enu1u1 down
 $ ip l s dev enu1u1 up
 $ ping -c 4 10.10.10.3
The second ping does not reach the host. Networking also fails on other interfaces.

Enable the work by replacing the disable_work_sync() with cancel_work_sync().

[Jun Miao: completely write the commit changelog]

Fixes: 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Jun Miao <jun.miao@intel.com>
Link: https://patch.msgid.link/20250708081653.307815-1-jun.miao@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'vsock-introduce-siocinq-ioctl-support'
Jakub Kicinski [Thu, 10 Jul 2025 02:30:00 +0000 (19:30 -0700)]
Merge branch 'vsock-introduce-siocinq-ioctl-support'

Xuewei Niu says:

====================
vsock: Introduce SIOCINQ ioctl support

Introduce SIOCINQ ioctl support for vsock, indicating the length of unread
bytes.

Similar with SIOCOUTQ ioctl, the information is transport-dependent.

The first patch adds SIOCINQ ioctl support in AF_VSOCK.

Thanks to @dexuan, the second patch is to fix the issue where hyper-v
`hvs_stream_has_data()` doesn't return the readable bytes.

The third patch wraps the ioctl into `ioctl_int()`, which implements a
retry mechanism to prevent immediate failure.

The last one adds two test cases to check the functionality. The changes
have been tested, and the results are as expected.
====================

Link: https://patch.msgid.link/20250708-siocinq-v6-0-3775f9a9e359@antgroup.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agotest/vsock: Add ioctl SIOCINQ tests
Xuewei Niu [Tue, 8 Jul 2025 06:36:14 +0000 (14:36 +0800)]
test/vsock: Add ioctl SIOCINQ tests

Add SIOCINQ ioctl tests for both SOCK_STREAM and SOCK_SEQPACKET.

The client waits for the server to send data, and checks if the SIOCINQ
ioctl value matches the data size. After consuming the data, the client
checks if the SIOCINQ value is 0.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250708-siocinq-v6-4-3775f9a9e359@antgroup.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agotest/vsock: Add retry mechanism to ioctl wrapper
Xuewei Niu [Tue, 8 Jul 2025 06:36:13 +0000 (14:36 +0800)]
test/vsock: Add retry mechanism to ioctl wrapper

Wrap the ioctl in `ioctl_int()`, which takes a pointer to the actual
int value and an expected int value. The function will not return until
either the ioctl returns the expected value or a timeout occurs, thus
avoiding immediate failure.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250708-siocinq-v6-3-3775f9a9e359@antgroup.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agovsock: Add support for SIOCINQ ioctl
Xuewei Niu [Tue, 8 Jul 2025 06:36:12 +0000 (14:36 +0800)]
vsock: Add support for SIOCINQ ioctl

Add support for SIOCINQ ioctl, indicating the length of bytes unread in the
socket. The value is obtained from `vsock_stream_has_data()`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250708-siocinq-v6-2-3775f9a9e359@antgroup.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agohv_sock: Return the readable bytes in hvs_stream_has_data()
Dexuan Cui [Tue, 8 Jul 2025 06:36:11 +0000 (14:36 +0800)]
hv_sock: Return the readable bytes in hvs_stream_has_data()

When hv_sock was originally added, __vsock_stream_recvmsg() and
vsock_stream_has_data() actually only needed to know whether there
is any readable data or not, so hvs_stream_has_data() was written to
return 1 or 0 for simplicity.

However, now hvs_stream_has_data() should return the readable bytes
because vsock_data_ready() -> vsock_stream_has_data() needs to know the
actual bytes rather than a boolean value of 1 or 0.

The SIOCINQ ioctl support also needs hvs_stream_has_data() to return
the readable bytes.

Let hvs_stream_has_data() return the readable bytes of the payload in
the next host-to-guest VMBus hv_sock packet.

Note: there may be multiple incoming hv_sock packets pending in the
VMBus channel's ringbuffer, but so far there is not a VMBus API that
allows us to know all the readable bytes in total without reading and
caching the payload of the multiple packets, so let's just return the
readable bytes of the next single packet. In the future, we'll either
add a VMBus API that allows us to know the total readable bytes without
touching the data in the ringbuffer, or the hv_sock driver needs to
understand the VMBus packet format and parse the packets directly.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://patch.msgid.link/20250708-siocinq-v6-1-3775f9a9e359@antgroup.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoDocumentation: xsk: correct the obsolete references and examples
Jason Xing [Tue, 8 Jul 2025 06:29:07 +0000 (14:29 +0800)]
Documentation: xsk: correct the obsolete references and examples

The modified lines are mainly related to the following commits[1][2]
which remove those tests and examples. Since samples/bpf has been
deprecated, we can refer to more examples that are easily searched
in the various xdp-projects, like the following link:
https://github.com/xdp-project/bpf-examples/tree/main/AF_XDP-example

[1]
commit f36600634282 ("libbpf: move xsk.{c,h} into selftests/bpf")
[2]
commit cfb5a2dbf141 ("bpf, samples: Remove AF_XDP samples")

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250708062907.11557-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoskbuff: Add MSG_MORE flag to optimize tcp large packet transmission
Feng Yang [Tue, 8 Jul 2025 05:40:53 +0000 (13:40 +0800)]
skbuff: Add MSG_MORE flag to optimize tcp large packet transmission

When using sockmap for forwarding, the average latency for different packet sizes
after sending 10,000 packets is as follows:
size    old(us)         new(us)
512     56              55
1472    58              58
1600    106             81
3000    145             105
5000    182             125

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250708054053.39551-1-yangfeng59949@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'converge-on-using-secs_to_jiffies-part-two'
Jakub Kicinski [Thu, 10 Jul 2025 02:25:03 +0000 (19:25 -0700)]
Merge branch 'converge-on-using-secs_to_jiffies-part-two'

Easwar Hariharan says:

====================
Converge on using secs_to_jiffies() part two

This is the second series (part 1*) that converts users of msecs_to_jiffies() that
either use the multiply pattern of either of:
- msecs_to_jiffies(N*1000) or
- msecs_to_jiffies(N*MSEC_PER_SEC)

where N is a constant or an expression, to avoid the multiplication.

The conversion is made with Coccinelle with the secs_to_jiffies() script
in scripts/coccinelle/misc. Attention is paid to what the best change
can be rather than restricting to what the tool provides.

v1: https://lore.kernel.org/20250219-netdev-secs-to-jiffies-part-2-v1-0-c484cc63611b@linux.microsoft.com
====================

Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-0-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ipconfig: convert timeouts to secs_to_jiffies()
Easwar Hariharan [Mon, 7 Jul 2025 22:03:33 +0000 (15:03 -0700)]
net: ipconfig: convert timeouts to secs_to_jiffies()

Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies().  As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.

This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:

@depends on patch@
expression E;
@@

-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)

-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)

While here, manually convert a couple timeouts denominated in seconds

Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-2-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/smc: convert timeouts to secs_to_jiffies()
Easwar Hariharan [Mon, 7 Jul 2025 22:03:32 +0000 (15:03 -0700)]
net/smc: convert timeouts to secs_to_jiffies()

Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies().  As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.

This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:

@depends on patch@
expression E;
@@

-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)

-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)

Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-1-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'tcp-better-memory-control-for-not-yet-accepted-sockets'
Jakub Kicinski [Thu, 10 Jul 2025 02:24:12 +0000 (19:24 -0700)]
Merge branch 'tcp-better-memory-control-for-not-yet-accepted-sockets'

Eric Dumazet says:

====================
tcp: better memory control for not-yet-accepted sockets

Address a possible OOM condition caused by a recent change.

Add a new packetdrill test checking the expected behavior.
====================

Link: https://patch.msgid.link/20250707213900.1543248-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests/net: packetdrill: add tcp_ooo-before-and-after-accept.pkt
Eric Dumazet [Mon, 7 Jul 2025 21:39:00 +0000 (21:39 +0000)]
selftests/net: packetdrill: add tcp_ooo-before-and-after-accept.pkt

Test how new passive flows react to ooo incoming packets.

Their sk_rcvbuf can increase only after accept().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250707213900.1543248-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agotcp: refine sk_rcvbuf increase for ooo packets
Eric Dumazet [Mon, 7 Jul 2025 21:38:59 +0000 (21:38 +0000)]
tcp: refine sk_rcvbuf increase for ooo packets

When a passive flow has not been accepted yet, it is
not wise to increase sk_rcvbuf when receiving ooo packets.

A very busy server might tune down tcp_rmem[1] to better
control how much memory can be used by sockets waiting
in its listeners accept queues.

Fixes: 63ad7dfedfae ("tcp: adjust rcvbuf in presence of reorders")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250707213900.1543248-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/sched: Abort __tc_modify_qdisc if parent class does not exist
Victor Nogueira [Mon, 7 Jul 2025 21:08:01 +0000 (18:08 -0300)]
net/sched: Abort __tc_modify_qdisc if parent class does not exist

Lion's patch [1] revealed an ancient bug in the qdisc API.
Whenever a user creates/modifies a qdisc specifying as a parent another
qdisc, the qdisc API will, during grafting, detect that the user is
not trying to attach to a class and reject. However grafting is
performed after qdisc_create (and thus the qdiscs' init callback) is
executed. In qdiscs that eventually call qdisc_tree_reduce_backlog
during init or change (such as fq, hhf, choke, etc), an issue
arises. For example, executing the following commands:

sudo tc qdisc add dev lo root handle a: htb default 2
sudo tc qdisc add dev lo parent a: handle beef fq

Qdiscs such as fq, hhf, choke, etc unconditionally invoke
qdisc_tree_reduce_backlog() in their control path init() or change() which
then causes a failure to find the child class; however, that does not stop
the unconditional invocation of the assumed child qdisc's qlen_notify with
a null class. All these qdiscs make the assumption that class is non-null.

The solution is ensure that qdisc_leaf() which looks up the parent
class, and is invoked prior to qdisc_create(), should return failure on
not finding the class.
In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the
parentid doesn't correspond to a class, so that we can detect it
earlier on and abort before qdisc_create is called.

[1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Fixes: 5e50da01d0ce ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs")
Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/
Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/
Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/
Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/
Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agogve: make IRQ handlers and page allocation NUMA aware
Bailey Forrest [Mon, 7 Jul 2025 21:01:07 +0000 (14:01 -0700)]
gve: make IRQ handlers and page allocation NUMA aware

All memory in GVE is currently allocated without regard for the NUMA
node of the device. Because access to NUMA-local memory access is
significantly cheaper than access to a remote node, this change attempts
to ensure that page frags used in the RX path, including page pool
frags, are allocated on the NUMA node local to the gVNIC device. Note
that this attempt is best-effort. If necessary, the driver will still
allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor
ring allocations are not updated, as dma_alloc_coherent handles that.

This change also modifies the IRQ affinity setting to only select CPUs
from the node local to the device, preserving the behavior that TX and
RX queues of the same index share CPU affinity.

Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250707210107.2742029-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ethernet: ti: am65-cpsw-nuss: Fix skb size by accounting for skb_shared_info
Chintan Vankar [Mon, 7 Jul 2025 08:52:01 +0000 (14:22 +0530)]
net: ethernet: ti: am65-cpsw-nuss: Fix skb size by accounting for skb_shared_info

While transitioning from netdev_alloc_ip_align() to build_skb(), memory
for the "skb_shared_info" member of an "skb" was not allocated. Fix this
by allocating "PAGE_SIZE" as the skb length, accounting for the packet
length, headroom and tailroom, thereby including the required memory space
for skb_shared_info.

Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://patch.msgid.link/20250707085201.1898818-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: thunderx: avoid direct MTU assignment after WRITE_ONCE()
Alok Tiwari [Sun, 6 Jul 2025 19:43:21 +0000 (12:43 -0700)]
net: thunderx: avoid direct MTU assignment after WRITE_ONCE()

The current logic in nicvf_change_mtu() writes the new MTU to
netdev->mtu using WRITE_ONCE() before verifying if the hardware
update succeeds. However on hardware update failure, it attempts
to revert to the original MTU using a direct assignment
(netdev->mtu = orig_mtu)
which violates the intended of WRITE_ONCE protection introduced in
commit 1eb2cded45b3 ("net: annotate writes on dev->mtu from
ndo_change_mtu()")

Additionally, WRITE_ONCE(netdev->mtu, new_mtu) is unnecessarily
performed even when the device is not running.

Fix this by:
  Only writing netdev->mtu after successfully updating the hardware.
  Skipping hardware update when the device is down, and setting MTU
  directly. Remove unused variable orig_mtu.

This ensures that all writes to netdev->mtu are consistent with
WRITE_ONCE expectations and avoids unintended state corruption
on failure paths.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250706194327.1369390-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests/tc-testing: Create test case for UAF scenario with DRR/NETEM/BLACKHOLE...
Victor Nogueira [Sat, 5 Jul 2025 20:36:38 +0000 (17:36 -0300)]
selftests/tc-testing: Create test case for UAF scenario with DRR/NETEM/BLACKHOLE chain

Create a tdc test for the UAF scenario with DRR/NETEM/BLACKHOLE chain
shared by Lion on his report [1].

[1] https://lore.kernel.org/netdev/45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com/

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250705203638.246350-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoatm: clip: Fix NULL pointer dereference in vcc_sendmsg()
Yue Haibing [Sat, 5 Jul 2025 08:52:28 +0000 (16:52 +0800)]
atm: clip: Fix NULL pointer dereference in vcc_sendmsg()

atmarpd_dev_ops does not implement the send method, which may cause crash
as bellow.

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: Oops: 0010 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5324 Comm: syz.0.0 Not tainted 6.15.0-rc6-syzkaller-00346-g5723cc3450bc #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc9000d3cf778 EFLAGS: 00010246
RAX: 1ffffffff1910dd1 RBX: 00000000000000c0 RCX: dffffc0000000000
RDX: ffffc9000dc82000 RSI: ffff88803e4c4640 RDI: ffff888052cd0000
RBP: ffffc9000d3cf8d0 R08: ffff888052c9143f R09: 1ffff1100a592287
R10: dffffc0000000000 R11: 0000000000000000 R12: 1ffff92001a79f00
R13: ffff888052cd0000 R14: ffff88803e4c4640 R15: ffffffff8c886e88
FS:  00007fbc762566c0(0000) GS:ffff88808d6c2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000041f1b000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 vcc_sendmsg+0xa10/0xc50 net/atm/common.c:644
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x219/0x270 net/socket.c:727
 ____sys_sendmsg+0x52d/0x830 net/socket.c:2566
 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
 __sys_sendmmsg+0x227/0x430 net/socket.c:2709
 __do_sys_sendmmsg net/socket.c:2736 [inline]
 __se_sys_sendmmsg net/socket.c:2733 [inline]
 __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+e34e5e6b5eddb0014def@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/682f82d5.a70a0220.1765ec.0143.GAE@google.com/T
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250705085228.329202-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'add-microchip-zl3073x-support-part-1'
Jakub Kicinski [Thu, 10 Jul 2025 02:08:56 +0000 (19:08 -0700)]
Merge branch 'add-microchip-zl3073x-support-part-1'

Ivan Vecera says:

====================
Add Microchip ZL3073x support (part 1)

Add support for Microchip Azurite DPLL/PTP/SyncE chip family that
provides DPLL and PTP functionality. This series bring first part
that adds the core functionality and basic DPLL support.

The next part of the series will bring additional DPLL functionality
like eSync support, phase offset and frequency offset reporting and
phase adjustments.

Testing was done by myself and by Prathosh Satish on Microchip EDS2
development board with ZL30732 DPLL chip connected over I2C bus.
====================

Link: https://patch.msgid.link/20250704182202.1641943-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Add support to get/set frequency on pins
Ivan Vecera [Fri, 4 Jul 2025 18:22:02 +0000 (20:22 +0200)]
dpll: zl3073x: Add support to get/set frequency on pins

Add support to get/set frequency on pins. The frequency for input
pins (references) is computed in the device according this formula:

 freq = base_freq * multiplier * (nominator / denominator)

where the base_freq comes from the list of supported base frequencies
and other parameters are arbitrary numbers. All these parameters are
16-bit unsigned integers.

The frequency for output pin is determined by the frequency of
synthesizer the output pin is connected to and divisor of the output
to which is the given pin belongs. The resulting frequency of the
P-pin and the N-pin from this output pair depends on the signal
format of this output pair.

The device supports so-called N-divided signal formats where for the
N-pin there is an additional divisor. The frequencies for both pins
from such output pair are computed:

 P-pin-freq = synth_freq / output_div
 N-pin-freq = synth_freq / output_div / n_div

For other signal-format types both P and N pin have the same frequency
based only synth frequency and output divisor.

Implement output pin callbacks to get and set frequency. The frequency
setting for the output non-N-divided signal format is simple as we have
to compute just new output divisor. For N-divided formats it is more
complex because by changing of output divisor we change frequency for
both P and N pins. In this case if we are changing frequency for P-pin
we have to compute also new N-divisor for N-pin to keep its current
frequency. From this and the above it follows that the frequency of
the N-pin cannot be higher than the frequency of the P-pin and the
callback must take this limitation into account.

Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-13-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Implement input pin state setting in automatic mode
Ivan Vecera [Fri, 4 Jul 2025 18:22:01 +0000 (20:22 +0200)]
dpll: zl3073x: Implement input pin state setting in automatic mode

Implement input pin state setting when the DPLL is running in automatic
mode. Unlike manual mode, the DPLL mode switching is not used here and
the implementation uses special priority value (15) to make the given
pin non-selectable.

When the user sets state of the pin as disconnected the driver
internally sets its priority in HW to 15 that prevents the DPLL to
choose this input pin. Conversely, if the pin status is set to
selectable, the driver sets the pin priority in HW to the original saved
value.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-12-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Add support to get/set priority on input pins
Ivan Vecera [Fri, 4 Jul 2025 18:22:00 +0000 (20:22 +0200)]
dpll: zl3073x: Add support to get/set priority on input pins

Add support for getting and setting input pin priority. Implement
required callbacks and set appropriate capability for input pins.
Although the pin priority make sense only if the DPLL is running in
automatic mode we have to expose this capability unconditionally because
input pins (references) are shared between all DPLLs where one of them
can run in automatic mode while the other one not.

Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-11-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Implement input pin selection in manual mode
Ivan Vecera [Fri, 4 Jul 2025 18:21:59 +0000 (20:21 +0200)]
dpll: zl3073x: Implement input pin selection in manual mode

Implement input pin state setting if the DPLL is running in manual mode.
The driver indicates manual mode if the DPLL mode is one of ref-lock,
forced-holdover, freerun.

Use these modes to implement input pin state change between connected
and disconnected states. When the user set the particular pin as
connected the driver marks this input pin as forced reference and
switches the DPLL mode to ref-lock. When the use set the pin as
disconnected the driver switches the DPLL to freerun or forced holdover
mode. The switch to holdover mode is done if the DPLL has holdover
capability (e.g is currently locked with holdover acquired).

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-10-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Register DPLL devices and pins
Ivan Vecera [Fri, 4 Jul 2025 18:21:58 +0000 (20:21 +0200)]
dpll: zl3073x: Register DPLL devices and pins

Enumerate all available DPLL channels and registers a DPLL device for
each of them. Check all input references and outputs and register
DPLL pins for them.

Number of registered DPLL pins depends on configuration of references
and outputs. If the reference or output is configured as differential
one then only one DPLL pin is registered. Both references and outputs
can be also disabled from firmware configuration and in this case
no DPLL pins are registered.

All registrable references are registered to all available DPLL devices
with exception of DPLLs that are configured in NCO (numerically
controlled oscillator) mode. In this mode DPLL channel acts as PHC and
cannot be locked to any reference.

Device outputs are connected to one of synthesizers and each synthesizer
is driven by some DPLL channel. So output pins belonging to given output
are registered to DPLL device that drives associated synthesizer.

Finally add kworker task to monitor async changes on all DPLL channels
and input pins and to notify about them DPLL core. Output pins are not
monitored as their parameters are not changed asynchronously by the
device.

Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-9-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Read DPLL types and pin properties from system firmware
Ivan Vecera [Fri, 4 Jul 2025 18:21:57 +0000 (20:21 +0200)]
dpll: zl3073x: Read DPLL types and pin properties from system firmware

Add support for reading of DPLL types and optional pin properties from
the system firmware (DT, ACPI...).

The DPLL types are stored in property 'dpll-types' as string array and
possible values 'pps' and 'eec' are mapped to DPLL enums DPLL_TYPE_PPS
and DPLL_TYPE_EEC.

The pin properties are stored under 'input-pins' and 'output-pins'
sub-nodes and the following ones are supported:

* reg
    integer that specifies pin index
* label
    string that is used by driver as board label
* connection-type
    string that indicates pin connection type
* supported-frequencies-hz
    array of u64 values what frequencies are supported / allowed for
    given pin with respect to hardware wiring

Do not blindly trust system firmware and filter out frequencies that
cannot be configured/represented in device (input frequencies have to
be factorized by one of the base frequencies and output frequencies have
to divide configured synthesizer frequency).

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-8-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: zl3073x: Fetch invariants during probe
Ivan Vecera [Fri, 4 Jul 2025 18:21:56 +0000 (20:21 +0200)]
dpll: zl3073x: Fetch invariants during probe

Several configuration parameters will remain constant at runtime,
so we can load them during probe to avoid excessive reads from
the hardware.

Read the following parameters from the device during probe and store
them for later use:

* enablement status and frequencies of the synthesizers and their
  associated DPLL channels
* enablement status and type (single-ended or differential) of input pins
* associated synthesizers, signal format, and enablement status of
  outputs

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-7-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodpll: Add basic Microchip ZL3073x support
Ivan Vecera [Fri, 4 Jul 2025 18:21:55 +0000 (20:21 +0200)]
dpll: Add basic Microchip ZL3073x support

Microchip Azurite ZL3073x represents chip family providing DPLL
and optionally PHC (PTP) functionality. The chips can be connected
be connected over I2C or SPI bus.

They have the following characteristics:
* up to 5 separate DPLL units (channels)
* 5 synthesizers
* 10 input pins (references)
* 10 outputs
* 20 output pins (output pin pair shares one output)
* Each reference and output can operate in either differential or
  single-ended mode (differential mode uses 2 pins)
* Each output is connected to one of the synthesizers
* Each synthesizer is driven by one of the DPLL unit

The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-,
32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit
addressing, the range is organized into pages of 128 bytes, with each
page containing a page selector register at address 0x7F.
For reading/writing multi-byte registers, the device supports bulk
transfers.

Add basic functionality to access device registers, probe functionality
both I2C and SPI cases and add devlink support to provide info and
to set clock ID parameter.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodevlink: Add new "clock_id" generic device param
Ivan Vecera [Fri, 4 Jul 2025 18:21:54 +0000 (20:21 +0200)]
devlink: Add new "clock_id" generic device param

Add a new device generic parameter to specify clock ID that should
be used by the device for registering DPLL devices and pins.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodevlink: Add support for u64 parameters
Ivan Vecera [Fri, 4 Jul 2025 18:21:53 +0000 (20:21 +0200)]
devlink: Add support for u64 parameters

Only 8, 16 and 32-bit integers are supported for numeric devlink
parameters. The subsequent patch adds support for DPLL clock ID
that is defined as 64-bit number. Add support for u64 parameter
type.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: dpll: Add support for Microchip Azurite chip family
Ivan Vecera [Fri, 4 Jul 2025 18:21:52 +0000 (20:21 +0200)]
dt-bindings: dpll: Add support for Microchip Azurite chip family

Add DT bindings for Microchip Azurite DPLL chip family. These chips
provide up to 5 independent DPLL channels, 10 differential or
single-ended inputs and 10 differential or 20 single-ended outputs.
They can be connected via I2C or SPI busses.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: dpll: Add DPLL device and pin
Ivan Vecera [Fri, 4 Jul 2025 18:21:51 +0000 (20:21 +0200)]
dt-bindings: dpll: Add DPLL device and pin

Add a common DT schema for DPLL device and its associated pins.
The DPLL (device phase-locked loop) is a device used for precise clock
synchronization in networking and telecom hardware.

The device includes one or more DPLLs (channels) and one or more
physical input/output pins.

Each DPLL channel is used either to provide a pulse-per-clock signal or
to drive an Ethernet equipment clock.

The input and output pins have the following properties:
* label: specifies board label
* connection type: specifies its usage depending on wiring
* list of supported or allowed frequencies: depending on how the pin
  is connected and where)
* embedded sync capability: indicates whether the pin supports this

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agovirtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp()
Bui Quang Minh [Sat, 5 Jul 2025 07:55:14 +0000 (14:55 +0700)]
virtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp()

This commit does not do any functional changes. It moves xdp->data
adjustment for buffer other than first buffer to buf_to_xdp() helper so
that the xdp_buff adjustment does not scatter over different functions.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'add-vf-drivers-for-wangxun-virtual-functions'
Jakub Kicinski [Thu, 10 Jul 2025 01:39:15 +0000 (18:39 -0700)]
Merge branch 'add-vf-drivers-for-wangxun-virtual-functions'

Mengyuan Lou says:

====================
Add vf drivers for wangxun virtual functions

Introduces basic support for Wangxun’s virtual function (VF) network
drivers, specifically txgbevf and ngbevf. These drivers provide SR-IOV
VF functionality for Wangxun 10/25/40G network devices.
The first three patches add common APIs for Wangxun VF drivers, including
mailbox communication and shared initialization logic.These abstractions
are placed in libwx to reduce duplication across VF drivers.
Patches 4–8 introduce the txgbevf driver, including:
PCI device initialization, Hardware reset, Interrupt setup, Rx/Tx datapath
implementation and link status changeing flow.
Patches 9–12 implement the ngbevf driver, mirroring the functionality
added in txgbevf.

v2: https://lore.kernel.org/20250625102058.19898-1-mengyuanlou@net-swift.com
v1: https://lore.kernel.org/20250611083559.14175-1-mengyuanlou@net-swift.com
====================

Link: https://patch.msgid.link/20250704094923.652-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ngbevf: add link update flow
Mengyuan Lou [Fri, 4 Jul 2025 09:49:23 +0000 (17:49 +0800)]
net: ngbevf: add link update flow

Add link update flow to wangxun 1G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-13-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ngbevf: init interrupts and request irqs
Mengyuan Lou [Fri, 4 Jul 2025 09:49:22 +0000 (17:49 +0800)]
net: ngbevf: init interrupts and request irqs

Add specific parameters for irq alloc, then use
wx_init_interrupt_scheme to initialize interrupt
allocation in probe.
Add .ndo_start_xmit support and start all queues.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-12-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ngbevf: add sw init pci info and reset hardware
Mengyuan Lou [Fri, 4 Jul 2025 09:49:21 +0000 (17:49 +0800)]
net: ngbevf: add sw init pci info and reset hardware

Do sw init and reset hw for ngbevf virtual functions, then
register netdev.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-11-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: wangxun: add ngbevf build
Mengyuan Lou [Fri, 4 Jul 2025 09:49:20 +0000 (17:49 +0800)]
net: wangxun: add ngbevf build

Add doc build infrastructure for ngbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 1G virtual
functions for Wangxun.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-10-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: txgbevf: add link update flow
Mengyuan Lou [Fri, 4 Jul 2025 09:49:19 +0000 (17:49 +0800)]
net: txgbevf: add link update flow

Add link update flow to wangxun 10/25/40G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-9-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: txgbevf: Support Rx and Tx process path
Mengyuan Lou [Fri, 4 Jul 2025 09:49:18 +0000 (17:49 +0800)]
net: txgbevf: Support Rx and Tx process path

Improve the configuration of Rx and Tx ring.
Setup and alloc resources.
Configure Rx and Tx unit on hardware.
Add .ndo_start_xmit support and start all queues.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-8-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: txgbevf: init interrupts and request irqs
Mengyuan Lou [Fri, 4 Jul 2025 09:49:17 +0000 (17:49 +0800)]
net: txgbevf: init interrupts and request irqs

Add irq alloc flow functions for vf.
Alloc pcie msix irqs for drivers and request_irq for tx/rx rings
and misc other events.
If the application is successful, config vertors for interrupts.
Enable interrupts mask in wxvf_irq_enable.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-7-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: txgbevf: add sw init pci info and reset hardware
Mengyuan Lou [Fri, 4 Jul 2025 09:49:16 +0000 (17:49 +0800)]
net: txgbevf: add sw init pci info and reset hardware

Add sw init and reset hw for txgbevf virtual functions
which initialize basic parameters, and then register netdev.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-6-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: wangxun: add txgbevf build
Mengyuan Lou [Fri, 4 Jul 2025 09:49:15 +0000 (17:49 +0800)]
net: wangxun: add txgbevf build

Add doc build infrastructure for txgbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 10/25/40G virtual
functions for Wangxun.
Ioremap the space of bar0 and bar4 which will be used.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-5-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: libwx: add wangxun vf common api
Mengyuan Lou [Fri, 4 Jul 2025 09:49:14 +0000 (17:49 +0800)]
net: libwx: add wangxun vf common api

Add common wx_configure_vf and wx_set_mac_vf for
ngbevf and txgbevf.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-4-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: libwx: add base vf api for vf drivers
Mengyuan Lou [Fri, 4 Jul 2025 09:49:13 +0000 (17:49 +0800)]
net: libwx: add base vf api for vf drivers

Implement mbox_write_and_read_ack functions which are
used to set basic functions like set_mac, get_link.etc
for vf.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-3-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: libwx: add mailbox api for wangxun vf drivers
Mengyuan Lou [Fri, 4 Jul 2025 09:49:12 +0000 (17:49 +0800)]
net: libwx: add mailbox api for wangxun vf drivers

Implements the mailbox interfaces for Wangxun vf drivers which
will be used in txgbevf and ngbevf.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-2-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'atm-clip-fix-infinite-recursion-potential-null-ptr-deref-and-memleak'
Jakub Kicinski [Thu, 10 Jul 2025 00:52:30 +0000 (17:52 -0700)]
Merge branch 'atm-clip-fix-infinite-recursion-potential-null-ptr-deref-and-memleak'

Kuniyuki Iwashima says:

====================
atm: clip: Fix infinite recursion, potential null-ptr-deref, and memleak.

Patch 1 fixes racy access to atmarpd found while checking RTNL usage
in clip.c.

Patch 2 fixes memory leak by ioctl(ATMARP_MKIP) and ioctl(ATMARPD_CTRL).

Patch 3 fixes infinite recursive call of clip_vcc->old_push(), which
was reported by syzbot.

v1: https://lore.kernel.org/20250702020437.703698-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20250704062416.1613927-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoatm: clip: Fix infinite recursive call of clip_push().
Kuniyuki Iwashima [Fri, 4 Jul 2025 06:23:53 +0000 (06:23 +0000)]
atm: clip: Fix infinite recursive call of clip_push().

syzbot reported the splat below. [0]

This happens if we call ioctl(ATMARP_MKIP) more than once.

During the first call, clip_mkip() sets clip_push() to vcc->push(),
and the second call copies it to clip_vcc->old_push().

Later, when the socket is close()d, vcc_destroy_socket() passes
NULL skb to clip_push(), which calls clip_vcc->old_push(),
triggering the infinite recursion.

Let's prevent the second ioctl(ATMARP_MKIP) by checking
vcc->user_back, which is allocated by the first call as clip_vcc.

Note also that we use lock_sock() to prevent racy calls.

[0]:
BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000)
Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191
Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00
RSP: 0018:ffffc9000d670000 EFLAGS: 00010246
RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000
RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e
R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300
R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578
FS:  000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
...
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 clip_push+0x6dc/0x720 net/atm/clip.c:200
 vcc_destroy_socket net/atm/common.c:183 [inline]
 vcc_release+0x157/0x460 net/atm/common.c:205
 __sock_release net/socket.c:647 [inline]
 sock_close+0xc0/0x240 net/socket.c:1391
 __fput+0x449/0xa70 fs/file_table.c:465
 task_work_run+0x1d1/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114
 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline]
 do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff31c98e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f
R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c
R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090
 </TASK>
Modules linked in:

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoatm: clip: Fix memory leak of struct clip_vcc.
Kuniyuki Iwashima [Fri, 4 Jul 2025 06:23:52 +0000 (06:23 +0000)]
atm: clip: Fix memory leak of struct clip_vcc.

ioctl(ATMARP_MKIP) allocates struct clip_vcc and set it to
vcc->user_back.

The code assumes that vcc_destroy_socket() passes NULL skb
to vcc->push() when the socket is close()d, and then clip_push()
frees clip_vcc.

However, ioctl(ATMARPD_CTRL) sets NULL to vcc->push() in
atm_init_atmarp(), resulting in memory leak.

Let's serialise two ioctl() by lock_sock() and check vcc->push()
in atm_init_atmarp() to prevent memleak.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoatm: clip: Fix potential null-ptr-deref in to_atmarpd().
Kuniyuki Iwashima [Fri, 4 Jul 2025 06:23:51 +0000 (06:23 +0000)]
atm: clip: Fix potential null-ptr-deref in to_atmarpd().

atmarpd is protected by RTNL since commit f3a0592b37b8 ("[ATM]: clip
causes unregister hang").

However, it is not enough because to_atmarpd() is called without RTNL,
especially clip_neigh_solicit() / neigh_ops->solicit() is unsleepable.

Also, there is no RTNL dependency around atmarpd.

Let's use a private mutex and RCU to protect access to atmarpd in
to_atmarpd().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: Add support for Sophgo CV1800 dwmac
Inochi Amaoto [Thu, 3 Jul 2025 02:12:19 +0000 (10:12 +0800)]
dt-bindings: net: Add support for Sophgo CV1800 dwmac

The GMAC IP on CV1800 series SoC is a standard Synopsys
DesignWare MAC (version 3.70a).

Add necessary compatible string for this device.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250703021220.124195-1-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoKVM: x86: avoid underflow when scaling TSC frequency
Paolo Bonzini [Wed, 9 Jul 2025 17:45:14 +0000 (13:45 -0400)]
KVM: x86: avoid underflow when scaling TSC frequency

In function kvm_guest_time_update(), __scale_tsc() is used to calculate
a TSC *frequency* rather than a TSC value.  With low-enough ratios,
a TSC value that is less than 1 would underflow to 0 and to an infinite
while loop in kvm_get_time_scale():

  kvm_guest_time_update(struct kvm_vcpu *v)
    if (kvm_caps.has_tsc_control)
      tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz,
                                  v->arch.l1_tsc_scaling_ratio);
        __scale_tsc(u64 ratio, u64 tsc)
          ratio=122380531, tsc=2299998, N=48
          ratio*tsc >> N = 0.999... -> 0

Later in the function:

  Call Trace:
   <TASK>
   kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline]
   kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268
   vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678
   vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126
   kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352
   kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:871 [inline]
   __se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857
   do_syscall_x64 arch/x86/entry/common.c:51 [inline]
   do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81
   entry_SYSCALL_64_after_hwframe+0x78/0xe2

This can really happen only when fuzzing, since the TSC frequency
would have to be nonsensically low.

Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function")
Reported-by: Yuntao Liu <liuyuntao12@huawei.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>