On linux-next, building the bpf selftests displays a warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h'
differs from latest version at 'include/uapi/linux/if_xdp.h'.
Commit 8066e388be48 ("net: add UAPI to the header guard in various network headers")
changed the header guard from _LINUX_IF_XDP_H to _UAPI_LINUX_IF_XDP_H
in include/uapi/linux/if_xdp.h.
To resolve the warning, update tools/include/uapi/linux/if_xdp.h
to align with the changes in include/uapi/linux/if_xdp.h
Fixes: 8066e388be48 ("net: add UAPI to the header guard in various network headers") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/c2bc466d-dff2-4d0d-a797-9af7f676c065@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250527054138.1086006-1-skb99@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Phil Sutter [Tue, 27 May 2025 09:41:17 +0000 (11:41 +0200)]
selftests: netfilter: Fix skip of wildcard interface test
The script is supposed to skip wildcard interface testing if unsupported
by the host's nft tool. The failing check caused script abort due to
'set -e' though. Fix this by running the potentially failing nft command
inside the if-conditional pipe.
Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20250527094117.18589-1-phil@nwl.cc Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Fri, 23 May 2025 08:27:16 +0000 (10:27 +0200)]
net: phy: mscc: Stop clearing the UDPv4 checksum for L2 frames
We have noticed that when PHY timestamping is enabled, L2 frames seem
to be modified: 2 bytes are overwritten with a value of 0. The position of
these 2 bytes seems to be random (or I couldn't find a pattern). In most
cases userspace can ignore these frames, but if, for example,
those 2 bytes fall in the correction field, there is nothing to do. This
seems to happen when configuring the HW for IPv4, even though the flow is
not enabled.
These 2 bytes correspond to the UDPv4 checksum, and once we stop
clearing the checksum when using L2 frames, the frame no longer seems
to be changed.
Faicker Mo [Fri, 23 May 2025 03:41:43 +0000 (03:41 +0000)]
net: openvswitch: Fix the dead loop of MPLS parse
An unexpected MPLS packet may not end with a bottom-of-stack label.
When there are many labels, the label count value wraps around.
A dead loop occurs, eventually leading to a soft lockup/CPU stuck.
stack backtrace:
UBSAN: array-index-out-of-bounds in /build/linux-0Pa0xK/linux-5.15.0/net/openvswitch/flow.c:662:26
index -1 is out of range for type '__be32 [3]'
CPU: 34 PID: 0 Comm: swapper/34 Kdump: loaded Tainted: G OE 5.15.0-121-generic #131-Ubuntu
Hardware name: Dell Inc. PowerEdge C6420/0JP9TF, BIOS 2.12.2 07/14/2021
Call Trace:
<IRQ>
show_stack+0x52/0x5c
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_out_of_bounds.cold+0x44/0x49
key_extract_l3l4+0x82a/0x840 [openvswitch]
? kfree_skbmem+0x52/0xa0
key_extract+0x9c/0x2b0 [openvswitch]
ovs_flow_key_extract+0x124/0x350 [openvswitch]
ovs_vport_receive+0x61/0xd0 [openvswitch]
? kernel_init_free_pages.part.0+0x4a/0x70
? get_page_from_freelist+0x353/0x540
netdev_port_receive+0xc4/0x180 [openvswitch]
? netdev_port_receive+0x180/0x180 [openvswitch]
netdev_frame_hook+0x1f/0x40 [openvswitch]
__netif_receive_skb_core.constprop.0+0x23a/0xf00
__netif_receive_skb_list_core+0xfa/0x240
netif_receive_skb_list_internal+0x18e/0x2a0
napi_complete_done+0x7a/0x1c0
bnxt_poll+0x155/0x1c0 [bnxt_en]
__napi_poll+0x30/0x180
net_rx_action+0x126/0x280
? bnxt_msix+0x67/0x80 [bnxt_en]
handle_softirqs+0xda/0x2d0
irq_exit_rcu+0x96/0xc0
common_interrupt+0x8e/0xa0
</IRQ>
Fixes: fbdcdd78da7c ("Change in Openvswitch to support MPLS label depth of 3 in ingress direction") Signed-off-by: Faicker Mo <faicker.mo@zenlayer.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/259D3404-575D-4A6D-B263-1DF59A67CF89@zenlayer.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
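As an illustration of the failure mode (not the actual patch): the UBSAN report above shows index -1 into a three-entry array, which matches a label counter that has wrapped. Below is a minimal user-space sketch in C. The helper parse_label_stack(), the host-order entries and the constants are assumptions made for this example. The loop is bounded by the buffer length and uses a wide counter, so it terminates even when no bottom-of-stack entry is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_DEPTH 3            /* slots kept, as in '__be32 [3]' above */
#define BOTTOM_OF_STACK 0x100u   /* S bit of a host-order label stack entry */

/*
 * Hypothetical helper: walk n_entries host-order label stack entries,
 * store the first LABEL_DEPTH of them in out[], and return how many
 * entries were walked up to and including the bottom-of-stack entry.
 * Returns 0 if the buffer ends before a bottom-of-stack entry is seen.
 */
static size_t parse_label_stack(const uint32_t *entries, size_t n_entries,
                                uint32_t out[LABEL_DEPTH])
{
    size_t count = 0;   /* wide counter: cannot wrap before n_entries */

    assert(entries != NULL && out != NULL);

    while (count < n_entries) {          /* bounded by the buffer length */
        uint32_t lse = entries[count];

        if (count < LABEL_DEPTH)
            out[count] = lse;            /* index is always in range */
        count++;
        if (lse & BOTTOM_OF_STACK)
            return count;
    }
    return 0;   /* malformed: no bottom-of-stack marker */
}
```

A stack of 300 entries without the S bit set makes this helper return 0 instead of looping or indexing out of range.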
Kuniyuki Iwashima [Thu, 22 May 2025 22:18:56 +0000 (15:18 -0700)]
calipso: Don't call calipso functions for AF_INET sk.
syzkaller reported a null-ptr-deref in txopt_get(). [0]
The offset 0x70 was of struct ipv6_txoptions in struct ipv6_pinfo,
so struct ipv6_pinfo was NULL there.
However, this never happens for IPv6 sockets as inet_sk(sk)->pinet6
is always set in inet6_create(), meaning the socket was not an IPv6 one.
The root cause is missing validation in netlbl_conn_setattr().
netlbl_conn_setattr() switches branches based on struct
sockaddr.sa_family, which is passed from userspace. However,
netlbl_conn_setattr() does not check if the address family matches
the socket.
The syzkaller must have called connect() for an IPv6 address on
an IPv4 socket.
We have a proper validation in tcp_v[46]_connect(), but
security_socket_connect() is called in the earlier stage.
Let's copy the validation to netlbl_conn_setattr().
Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=M1LzunrcQB1fSGauMrJrhL6GGps5cPAKzHJXj6GQV+-g@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250522221858.91240-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
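The kind of validation described above can be sketched in user-space C. This is not the kernel code: check_addr_family() is a hypothetical helper, and the choice of -EAFNOSUPPORT as the error code is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: reject a connect() address whose family does not
 * match the family the socket was created with.
 */
static int check_addr_family(int sk_family, const struct sockaddr *addr,
                             socklen_t addrlen)
{
    assert(addr != NULL);

    if (addrlen < sizeof(addr->sa_family))
        return -EINVAL;            /* too short to even carry a family */
    if (addr->sa_family != sk_family)
        return -EAFNOSUPPORT;      /* e.g. IPv6 address on an IPv4 socket */
    return 0;
}
```

With such a check in front of the family-specific branches, an IPv6 address passed for an IPv4 socket is rejected before any IPv6-only state is dereferenced.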
====================
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
Savino says:
"We are writing to report that this recent patch
(141d34391abbb315d68556b7c67ad97885407547)
can be bypassed, and a UAF can still occur when HFSC is utilized with
NETEM.
The patch only checks the cl->cl_nactive field to determine whether
it is the first insertion or not, but this field is only
incremented by init_vf.
By using HFSC_RSC (which uses init_ed), it is possible to bypass the
check and insert the class twice in the eltree.
Under normal conditions, this would lead to an infinite loop in
hfsc_dequeue for the reasons we already explained in this report.
However, if TBF is added as root qdisc and it is configured with a
very low rate,
it can be utilized to prevent packets from being dequeued.
This behavior can be exploited to perform subsequent insertions in the
HFSC eltree and cause a UAF."
To fix both the UAF and the infinite loop, with netem as an hfsc child,
check explicitly in hfsc_enqueue whether the class is already in the eltree
whenever the HFSC_RSC flag is set.
Also add a TDC test to reproduce the UAF scenario.
====================
Pedro Tammela [Thu, 22 May 2025 18:14:48 +0000 (15:14 -0300)]
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
Reproduce the UAF scenario where netem is a child of HFSC and HFSC
is configured to use the eltree. In such case, this TDC test would
cause the HFSC class to be added to the eltree twice resulting
in a UAF.
Pedro Tammela [Thu, 22 May 2025 18:14:47 +0000 (15:14 -0300)]
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
Savino says:
"We are writing to report that this recent patch
(141d34391abbb315d68556b7c67ad97885407547) [1]
can be bypassed, and a UAF can still occur when HFSC is utilized with
NETEM.
The patch only checks the cl->cl_nactive field to determine whether
it is the first insertion or not [2], but this field is only
incremented by init_vf [3].
By using HFSC_RSC (which uses init_ed) [4], it is possible to bypass the
check and insert the class twice in the eltree.
Under normal conditions, this would lead to an infinite loop in
hfsc_dequeue for the reasons we already explained in this report [5].
However, if TBF is added as root qdisc and it is configured with a
very low rate,
it can be utilized to prevent packets from being dequeued.
This behavior can be exploited to perform subsequent insertions in the
HFSC eltree and cause a UAF."
To fix both the UAF and the infinite loop, with netem as an hfsc child,
check explicitly in hfsc_enqueue whether the class is already in the eltree
whenever the HFSC_RSC flag is set.
Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Reported-by: Savino Dicanosa <savy@syst3mfailure.io> Reported-by: William Liu <will@willsroot.io> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250522181448.1439717-2-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
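The idea of checking tree membership explicitly, rather than inferring it from an activity counter, can be shown with a toy model. Everything below is hypothetical (struct toy_class, toy_enqueue() and the boolean flag stand in for the real HFSC structures). It only illustrates why a reentrant enqueue must not link the same class twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an HFSC class; the fields are illustrative only. */
struct toy_class {
    bool uses_rsc;        /* class has a real-time service curve */
    bool in_eltree;       /* currently linked into the eligible tree */
    int  eltree_inserts;  /* how many times it was linked */
};

static void toy_eltree_insert(struct toy_class *cl)
{
    cl->in_eltree = true;
    cl->eltree_inserts++;
}

/*
 * Enqueue path: for real-time classes, link into the eligible tree only
 * if the class is not already there, whatever other counters say.
 */
static void toy_enqueue(struct toy_class *cl)
{
    assert(cl != NULL);

    if (cl->uses_rsc && !cl->in_eltree)
        toy_eltree_insert(cl);
}
```

Calling toy_enqueue() twice on the same class, as a reentrant child qdisc might, leaves eltree_inserts at 1.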
1. Active traffic on the leaf node must be stopped before its send queue
is reassigned to the parent. This patch resolves the issue by marking
the node as 'Inner'.
2. During a system reboot, the interface receives TC_HTB_LEAF_DEL
and TC_HTB_LEAF_DEL_LAST callbacks to delete its HTB queues.
In the case of TC_HTB_LEAF_DEL_LAST, although the same send queue
is reassigned to the parent, the current logic still attempts to update
the real number of queues, leading to the warnings below:
New queues can't be registered after device unregistration.
WARNING: CPU: 0 PID: 6475 at net/core/net-sysfs.c:1714
netdev_queue_update_kobjects+0x1e4/0x200
Fixes: 5e6808b4c68d ("octeontx2-pf: Add support for HTB offload") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250522115842.1499666-1-hkelam@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hariprasad Kelam [Thu, 22 May 2025 09:47:41 +0000 (15:17 +0530)]
octeontx2-pf: QOS: Perform cache sync on send queue teardown
QOS is designed to create a new send queue whenever a class
is created, ensuring proper shaping and scheduling. However,
when multiple send queues are created and deleted in a loop,
SMMU errors are observed.
This patch addresses the issue by performing a data cache sync
during the teardown of QOS send queues.
Haiyang Zhang [Mon, 19 May 2025 16:20:36 +0000 (09:20 -0700)]
net: mana: Add support for Multi Vports on Bare metal
To support Multi Vports on Bare metal, increase the device config response
version. Also skip the register HW vport and register filter steps when
the Bare metal hostmode is set.
Mina Almasry [Fri, 23 May 2025 23:05:23 +0000 (23:05 +0000)]
net: devmem: ksft: upgrade rx test to send 1K data
The current test just sends "hello\nworld" and verifies that is the
string received on the RX side. That is fine, but improve the test a bit
by sending 1K data. The test should be improved further to send more
data, but for now this should be a welcome improvement.
The test will send a repeating pattern of 0x01, 0x02, ... 0x06. The
ncdevmem `-v 7` flag will verify this pattern. ncdevmem will provide
useful debugging info when the test fails, such as the frags received
and verified fine, and which frag exactly failed, what was the expected
byte pattern, and what is the actual byte pattern received. All this
debug information will be useful when the test fails.
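A generic way to generate and verify a repeating byte pattern, and to report where verification first fails, might look like the sketch below. This is not ncdevmem's code, and how the `-v 7` flag maps onto the pattern is not shown here. fill_pattern() and first_mismatch() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with pattern[0..period-1] repeated. */
static void fill_pattern(uint8_t *buf, size_t len,
                         const uint8_t *pattern, size_t period)
{
    assert(period > 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern[i % period];
}

/*
 * Return the offset of the first byte that deviates from the repeating
 * pattern, or len if the whole buffer matches.
 */
static size_t first_mismatch(const uint8_t *buf, size_t len,
                             const uint8_t *pattern, size_t period)
{
    assert(period > 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern[i % period])
            return i;
    return len;
}
```

Returning the offset rather than a plain pass/fail lets the caller print the expected and actual bytes at the failing position.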
Mina Almasry [Fri, 23 May 2025 23:05:22 +0000 (23:05 +0000)]
net: devmem: ksft: add 5 tuple FS support
ncdevmem supports drivers that are limited to either 3-tuple or 5-tuple
FS support, but the ksft is currently 3-tuple only. Add a ksft arg to
cover drivers that support 5-tuple FS.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Mina Almasry [Fri, 23 May 2025 23:05:20 +0000 (23:05 +0000)]
net: devmem: ksft: add ipv4 support
ncdevmem supports both ipv4 and ipv6, but the ksft is currently
ipv6-only. Propagate the ipv4 support to the ksft, so that folks that
are limited to these networks can also test.
Mina Almasry [Fri, 23 May 2025 23:05:17 +0000 (23:05 +0000)]
net: devmem: move list_add to net_devmem_bind_dmabuf.
It's annoying for the list_add to be outside net_devmem_bind_dmabuf, but
the list_del is in net_devmem_unbind_dmabuf. Make it consistent by
having both the list_add/del be inside the net_devmem_[un]bind_dmabuf.
Cc: ap420073@gmail.com Signed-off-by: Mina Almasry <almasrymina@google.com> Tested-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250523230524.1107879-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Fri, 23 May 2025 12:16:57 +0000 (14:16 +0200)]
selftests: netfilter: nft_queue.sh: include file transfer duration in log message
Paolo Abeni says:
Recently the nipa CI infra went through some tuning, and the mentioned
self-test now often fails.
The failing test is the sctp+nfqueue one, where the file transfer takes
too long and hits the timeout (1 minute).
Because SCTP nfqueue tests had timeout related issues before (esp. on debug
kernels) print the file transfer duration in the PASS/FAIL message.
This would allow us to see if there is/was an unexpected slowdown
(CI keeps logs around) or 'creeping slowdown' where things got slower
over time until 'fail point' was reached.
Output of altered lines looks like this:
PASS: tcp and nfqueue in forward chan (duration: 2s)
PASS: tcp via loopback (duration: 2s)
PASS: sctp and nfqueue in forward chain (duration: 42s)
PASS: sctp and nfqueue in output chain with GSO (duration: 21s)
Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/netdev/584524ef-9fd7-4326-9f1b-693ca62c5692@redhat.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250523121700.20011-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Thu, 22 May 2025 11:57:22 +0000 (13:57 +0200)]
net: phy: mscc: Fix memory leak when using one step timestamping
Fix memory leak when running one-step timestamping. When running
one-step sync timestamping, the HW is configured to insert the TX time
into the frame, so there is no reason to keep the skb anymore. Since in
this case the HW will never generate an interrupt to say that the frame
was timestamped, the frame will never be released.
Fix this by freeing the frame in case of one-step timestamping.
Rengarajan S [Fri, 23 May 2025 17:33:26 +0000 (23:03 +0530)]
net: lan743x: Modify the EEPROM and OTP size for PCI1xxxx devices
Maximum OTP and EEPROM sizes for hearthstone PCI1xxxx devices are 8 Kb
and 64 Kb respectively. Adjust max size definitions and return correct
EEPROM length based on device. Also prevent out-of-bound read/write.
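An out-of-bound guard of the kind mentioned above can be written so that the check itself cannot overflow. The sketch below is illustrative only. check_nvm_range() is a hypothetical helper, and max_size is passed in by the caller rather than taken from the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical range check: accept an access of len bytes at offset only
 * if it lies entirely inside a device of max_size bytes. Comparing len
 * against (max_size - offset) avoids overflow in offset + len.
 */
static int check_nvm_range(size_t offset, size_t len, size_t max_size)
{
    if (offset > max_size || len > max_size - offset)
        return -EINVAL;
    return 0;
}
```

A naive `offset + len > max_size` test would wrap for very large offsets and wrongly accept the access.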
Jiawen Wu [Fri, 23 May 2025 08:04:37 +0000 (16:04 +0800)]
net: libwx: Fix statistics of multicast packets
When SR-IOV is enabled, the number of multicast packets is mistakenly
counted starting from queue 0. It would be a wrong count that includes
the packets received on VF. Fix it to count from the correct offset.
Jakub Kicinski [Wed, 28 May 2025 00:52:03 +0000 (17:52 -0700)]
Merge branch 'refactor-phy-reset-handling-and'
Thangaraj Samynathan says:
====================
Refactor PHY reset handling and fix WOL
This patch series refines the PHY reset and initialization logic in
the lan743x driver. Enhance the robustness of the driver initialization
process and prevent WOL-related issues during suspend/resume cycles.
====================
Thangaraj Samynathan [Mon, 26 May 2025 05:30:48 +0000 (11:00 +0530)]
net: lan743x: Fix PHY reset handling during initialization and WOL
Remove lan743x_phy_init from lan743x_hardware_init as it resets the PHY
registers, causing WOL to fail on subsequent attempts. Add a call to
lan743x_hw_reset_phy in the probe function to ensure the PHY is reset
during device initialization.
Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250526053048.287095-3-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thangaraj Samynathan [Mon, 26 May 2025 05:30:47 +0000 (11:00 +0530)]
net: lan743x: rename lan743x_reset_phy to lan743x_hw_reset_phy
Rename the function to lan743x_hw_reset_phy to better describe its
operation.
Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250526053048.287095-2-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Greg Kroah-Hartman [Thu, 22 May 2025 11:21:47 +0000 (13:21 +0200)]
net: phy: fix up const issues in to_mdio_device() and to_phy_device()
Both to_mdio_device() and to_phy_device() "throw away" the const pointer
attribute passed to them and return a non-const pointer, which generally
is not a good thing overall. Fix this up by using container_of_const()
which was designed for this very problem.
Cc: Alexander Lobakin <alobakin@pm.me> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Fixes: 7eab14de73a8 ("mdio, phy: fix -Wshadow warnings triggered by nested container_of()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2025052246-conduit-glory-8fc9@gregkh Signed-off-by: Jakub Kicinski <kuba@kernel.org>
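The const-preserving idea can be reproduced in plain C11 with _Generic. The sketch below is a user-space illustration with toy structures, not the kernel macro itself. The struct layouts and the hand-written to_mdio_device() are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

struct device { int id; };
struct mdio_device { int addr; struct device dev; };

/* Classic form: the cast always yields a non-const pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Const-preserving form for this one pair of types: a
 * 'const struct device *' maps to a 'const struct mdio_device *',
 * anything else to the non-const type.
 */
#define to_mdio_device(d)                                          \
    _Generic((d),                                                  \
        const struct device *:                                     \
            ((const struct mdio_device *)                          \
                 container_of(d, struct mdio_device, dev)),        \
        default:                                                   \
            ((struct mdio_device *)                                \
                 container_of(d, struct mdio_device, dev)))

/* Compile-time check that constness survives the conversion. */
static_assert(_Generic(to_mdio_device((const struct device *)0),
                       const struct mdio_device *: 1, default: 0),
              "const must be preserved");
```

With this form, writing through the result of to_mdio_device() on a const pointer is diagnosed by the compiler instead of being silently allowed.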
Saeed Mahameed [Thu, 22 May 2025 21:41:16 +0000 (00:41 +0300)]
net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR
GENERIC_ALLOCATOR is a non-prompt kconfig, meaning users can't enable it
selectively. All kconfig users of GENERIC_ALLOCATOR select it, except for
NET_DEVMEM, which only depends on it. There is no easy way to turn
GENERIC_ALLOCATOR on unless we select other unnecessary configs that
will select it.
Instead of depending on it, select it when NET_DEVMEM is enabled.
Hangbin Liu [Mon, 26 May 2025 01:46:00 +0000 (01:46 +0000)]
selftests: net: move wait_local_port_listen to lib.sh
The function wait_local_port_listen() is the only function defined in
net_helper.sh. Since some tests source both lib.sh and net_helper.sh,
we can simplify the setup by moving wait_local_port_listen() to lib.sh.
With this change, net_helper.sh becomes redundant and can be removed.
Christophe JAILLET [Sun, 25 May 2025 09:21:24 +0000 (11:21 +0200)]
cxgb4: Constify struct thermal_zone_device_ops
'struct thermal_zone_device_ops' are not modified in this driver.
Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
2912 1064 0 3976 f88 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o
After:
=====
text data bss dec hex filename
3040 936 0 3976 f88 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o
'struct thermal_zone_device_ops' are not modified in this driver.
Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.
While at it, also constify a struct thermal_zone_params.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
24899 8036 0 32935 80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o
After:
=====
text data bss dec hex filename
25379 7556 0 32935 80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o
Dan Carpenter [Fri, 23 May 2025 16:00:12 +0000 (19:00 +0300)]
net/mlx5: HWS, Fix an error code in mlx5hws_bwc_rule_create_complex()
This was intended to be negative -ENOMEM but the '-' character was left
off accidentally. This typo doesn't affect runtime because the caller
treats all non-zero returns the same.
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/aDCbjNcquNC68Hyj@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
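For readers less familiar with the convention: kernel functions usually return 0 on success and a negative errno value on failure. The user-space sketch below uses hypothetical helpers (alloc_table(), is_error(), is_error_strict()) to show why a missing '-' goes unnoticed by a caller that only tests for non-zero, but not by one that tests for a negative value.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical allocator wrapper following the kernel convention:
 * 0 on success, a negative errno value on failure.
 */
static int alloc_table(size_t n, int **out)
{
    int *p;

    assert(out != NULL);

    p = calloc(n, sizeof(*p));
    if (!p)
        return -ENOMEM;   /* negative, not a bare ENOMEM */
    *out = p;
    return 0;
}

/* A caller that treats every non-zero value as failure. */
static int is_error(int ret)
{
    return ret != 0;
}

/* A caller that uses the stricter 'ret < 0' idiom. */
static int is_error_strict(int ret)
{
    return ret < 0;
}
```

A positive ENOMEM passes the first check as an error but slips through the second, which is why the sign matters even when current callers are unaffected.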
Zilin Guan [Fri, 23 May 2025 11:47:17 +0000 (11:47 +0000)]
tipc: use kfree_sensitive() for aead cleanup
The tipc_aead_free() function currently uses kfree() to release the aead
structure. However, this structure contains sensitive information, such
as the key's SALT value, which should be securely erased from memory to
prevent potential leakage.
To enhance security, replace kfree() with kfree_sensitive() when freeing
the aead structure. This change ensures that sensitive data is explicitly
cleared before memory deallocation, aligning with the approach used in
tipc_aead_init() and adhering to best practices for handling confidential
information.
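A user-space analogue of "clear, then free" is sketched below. It is not the kernel implementation. wipe() and free_sensitive() are hypothetical helpers, and unlike the kernel call the caller has to supply the buffer size. The stores go through a volatile pointer so that the compiler is less likely to drop them as dead writes before free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zero len bytes through a volatile pointer. */
static void wipe(void *p, size_t len)
{
    volatile unsigned char *b = p;

    assert(p != NULL || len == 0);

    while (len--)
        *b++ = 0;
}

/* Clear the buffer, then release it; a NULL pointer is a no-op. */
static void free_sensitive(void *p, size_t len)
{
    if (!p)
        return;
    wipe(p, len);
    free(p);
}
```

A plain memset() immediately followed by free() is a common candidate for dead-store elimination, which is the reason for a dedicated helper.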
Donald Hunter [Fri, 23 May 2025 10:30:31 +0000 (11:30 +0100)]
tools: ynl: parse extack for sub-messages
Extend the Python YNL extack decoding to handle sub-messages in the same
way that YNL C does. This involves retaining the input values so that
they are available during extack decoding.
Wentao Liang [Sat, 24 May 2025 16:34:25 +0000 (00:34 +0800)]
net/mlx5: Add error handling in mlx5_query_nic_vport_node_guid()
The function mlx5_query_nic_vport_node_guid() calls the function
mlx5_query_nic_vport_context() but does not check its return value.
A proper implementation can be found in mlx5_nic_vport_query_local_lb().
Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.
Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields") Cc: stable@vger.kernel.org # v4.5 Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250524163425.1695-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
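The pattern described above (check the query's return value, free the output buffer, return the error) is commonly written with a single cleanup label. The sketch below uses toy functions. toy_query_context() and toy_query_node_guid() are hypothetical and only mirror the shape of the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a firmware query that can fail. */
static int toy_query_context(int should_fail, uint64_t *out)
{
    if (should_fail)
        return -EIO;
    out[0] = 0x1122334455667788ULL;
    return 0;
}

static int toy_query_node_guid(int should_fail, uint64_t *guid)
{
    uint64_t *out;
    int err;

    assert(guid != NULL);

    out = calloc(4, sizeof(*out));
    if (!out)
        return -ENOMEM;

    err = toy_query_context(should_fail, out);
    if (err)
        goto out_free;      /* do not read out[] after a failed query */

    *guid = out[0];

out_free:
    free(out);              /* buffer is released on both paths */
    return err;
}
```

On failure the caller's guid is left untouched and the error code is propagated.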
Christophe JAILLET [Sat, 24 May 2025 07:29:11 +0000 (09:29 +0200)]
net: airoha: Fix an error handling path in airoha_alloc_gdm_port()
If register_netdev() fails, the error handling path of the probe will not
free the memory allocated by the previous airoha_metadata_dst_alloc() call
because port->dev->reg_state will not be NETREG_REGISTERED.
So, an explicit airoha_metadata_dst_free() call is needed in this case to
avoid a memory leak.
Wei Fang [Fri, 23 May 2025 08:37:59 +0000 (16:37 +0800)]
net: phy: clear phydev->devlink when the link is deleted
There is a potential crash issue when disabling and re-enabling the
network port. When disabling the network port, phy_detach() calls
device_link_del() to remove the device link, but it does not clear
phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
network port is re-enabled, but if phy_attach_direct() fails before
calling device_link_add(), the code jumps to the "error" label and
calls phy_detach(). Since phydev->devlink retains the old value from
the previous attach/detach cycle, device_link_del() uses the old value,
which accesses a NULL pointer and causes a crash. The simplified crash
log is as follows.
Therefore, phydev->devlink needs to be cleared when the device link is
deleted.
Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250523083759.3741168-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
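The attach/detach cycle described above can be modelled in a few lines of user-space C. The structures and functions below are toy stand-ins (toy_phy, toy_attach(), toy_detach()), not the phylib API. The point is only that the stored pointer must be reset when the object it refers to is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_link { int id; };
struct toy_phy { struct toy_link *link; };

/* Attach; with fail_early set, fail before any link is created. */
static int toy_attach(struct toy_phy *phy, int fail_early)
{
    assert(phy != NULL);

    if (fail_early)
        return -1;
    phy->link = malloc(sizeof(*phy->link));
    return phy->link ? 0 : -1;
}

static void toy_detach(struct toy_phy *phy)
{
    assert(phy != NULL);

    if (phy->link) {
        free(phy->link);
        phy->link = NULL;   /* otherwise a later detach reuses a stale pointer */
    }
}
```

After a successful attach/detach, a second attach that fails early followed by detach is harmless, because the pointer was cleared the first time.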
Christian Marangi [Thu, 22 May 2025 16:53:11 +0000 (18:53 +0200)]
net: phy: mediatek: Add Airoha AN7583 PHY support
Add Airoha AN7583 PHY support based on Airoha AN7581 with the small
difference that BMCR_PDOWN is enabled by default and needs to be cleared
to make the internal PHY work correctly.
Add airoha,an7583-switch additional compatible to the mt7530 DSA Switch
Family. This is an exact match of the airoha,en7581-switch (based on
mt7988-switch) with the additional requirement of tweaking the
GEPHY_CONN_CFG registers to make the internal PHY actually work.
Alok Tiwari [Thu, 22 May 2025 07:43:55 +0000 (00:43 -0700)]
Doc: networking: Fix various typos in rds.rst
Corrected "sages" to "messages" in the bitmap allocation description.
Fixed "competed" to "completed" in the recv path datagram handling section.
Corrected "privatee" to "private" in the multipath RDS section.
Fixed "mutlipath" to "multipath" in the transport capabilities description.
These changes improve documentation clarity and maintain consistency.
Mark Bloch [Thu, 22 May 2025 07:13:56 +0000 (10:13 +0300)]
net/mlx5e: Allow setting MAC address of representors
A representor netdev does not correspond to real hardware that needs to
be updated when setting the MAC address. The default eth_mac_addr() is
sufficient for simply updating the netdev's MAC address with validation.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
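"Updating the address with validation" can be pictured as follows. This is a user-space sketch, not the kernel helper. toy_set_mac() and the mac_is_*() functions are hypothetical, and the validity rule (non-zero, non-multicast) and the -EADDRNOTAVAIL error code are assumptions stated for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Multicast addresses have the lowest bit of the first octet set. */
static bool mac_is_multicast(const uint8_t *a)
{
    return (a[0] & 0x01) != 0;
}

static bool mac_is_zero(const uint8_t *a)
{
    static const uint8_t zero[ETH_ALEN];

    return memcmp(a, zero, ETH_ALEN) == 0;
}

static bool mac_is_valid(const uint8_t *a)
{
    return !mac_is_multicast(a) && !mac_is_zero(a);
}

/* Copy new_addr into dev_addr only if it is a valid unicast address. */
static int toy_set_mac(uint8_t *dev_addr, const uint8_t *new_addr)
{
    assert(dev_addr != NULL && new_addr != NULL);

    if (!mac_is_valid(new_addr))
        return -EADDRNOTAVAIL;
    memcpy(dev_addr, new_addr, ETH_ALEN);
    return 0;
}
```

No hardware is touched here, which matches the point made above: for a representor, validating and storing the address is all that is needed.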
====================
octeontx2-pf: Do not detect MACSEC block based on silicon
Out of various silicon variants of CN10K series some have hardware
MACSEC block for offloading MACSEC operations and some do not.
AF driver already has the information of whether MACSEC is present
or not on running silicon. Hence fetch that information from
AF via mailbox message.
====================
Subbaraya Sundeep [Thu, 22 May 2025 06:15:48 +0000 (11:45 +0530)]
octeontx2-pf: macsec: Get MACSEC capability flag from AF
The presence of MACSEC block is currently figured out based
on the running silicon variant. This may not be correct at all
times since the MACSEC block can be fused out. Hence get
the macsec info from AF via mailbox.
Subbaraya Sundeep [Thu, 22 May 2025 06:15:28 +0000 (11:45 +0530)]
octeontx2-af: Add MACSEC capability flag
MACSEC block may be fused out on some silicon variants; hence modify
get_hw_cap mailbox message to set a capability flag in its
response message based on MACSEC block availability.
xsk: add missing virtual address conversion for page
In commit 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use
page_pool_dev_alloc()"), when converting from netmem to page, I missed a
call to page_address() around skb_frag_page(frag) to get the virtual
address of the page. This commit uses skb_frag_address() helper to fix
the issue.
Fixes: 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()") Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250522040115.5057-1-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Calling `PACKET_ADD_MEMBERSHIP` on an ops-locked device can trigger
the `NETDEV_UNREGISTER` notifier, which may require disabling promiscuous
and/or allmulti mode. Both of these operations require acquiring
the netdev instance lock.
Move the call to `packet_dev_mc` outside of the RCU critical section.
The `mclist` modifications (add, del, flush, unregister) are protected by
the RTNL, not the RCU. The RCU only protects the `sklist` and its
associated `sks`. The delayed operation on the `mclist` entry remains
within the RTNL.
Reported-by: syzbot+b191b5ccad8d7a986286@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b191b5ccad8d7a986286 Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250522031129.3247266-1-stfomichev@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Wed, 21 May 2025 23:18:25 +0000 (01:18 +0200)]
vsock/test: Add test for an unexpectedly lingering close()
There was an issue with SO_LINGER: instead of blocking until all queued
messages for the socket have been successfully sent (or the linger timeout
has been reached), close() would block until packets were handled by the
peer.
Add a test to alert on close() lingering when it should not.
Michal Luczaj [Wed, 21 May 2025 23:18:22 +0000 (01:18 +0200)]
vsock: Move lingering logic to af_vsock core
Lingering should be transport-independent in the long run. In preparation
for supporting other transports, as well as the linger on shutdown(), move
code to core.
Generalize by querying vsock_transport::unsent_bytes(), guard against the
callback being unimplemented. Do not pass sk_lingertime explicitly. Pull
SOCK_LINGER check into vsock_linger().
Flatten the function. Remove the nested block by inverting the condition:
return early on !timeout.
Michal Luczaj [Wed, 21 May 2025 23:18:21 +0000 (01:18 +0200)]
vsock/virtio: Linger on unsent data
Currently vsock's lingering effectively boils down to waiting (or timing
out) until packets are consumed or dropped by the peer; be it by receiving
the data, closing or shutting down the connection.
To align with the semantics described in the SO_LINGER section of man
socket(7) and to mimic AF_INET's behaviour more closely, change the logic
of a lingering close(): instead of waiting for all data to be handled,
block until data is considered sent from the vsock's transport point of
view. That is, until the worker picks up the packets for processing and
decrements virtio_vsock_sock::bytes_unsent down to 0.
Note that (some interpretation of) lingering was always limited to
transports that called virtio_transport_wait_close() on transport release.
This does not change, i.e. under Hyper-V and VMCI no lingering would be
observed.
The implementation does not adhere strictly to man page's interpretation of
SO_LINGER: shutdown() will not trigger the lingering. This follows AF_INET.
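For reference, SO_LINGER is configured from user space with setsockopt() and struct linger, as described in socket(7). A minimal sketch follows, where enable_linger() is a hypothetical helper name:

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Enable lingering on close() for up to 'seconds' seconds.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_linger(int fd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };

    assert(seconds >= 0);

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

With the option set, close() on a socket with queued data blocks until that data is sent or the timeout expires. What counts as "sent" is the transport-specific detail this patch changes.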
Stefano Radaelli [Wed, 21 May 2025 21:28:15 +0000 (23:28 +0200)]
net: phy: add driver for MaxLinear MxL86110 PHY
Add support for the MaxLinear MxL86110 Gigabit Ethernet PHY, a low-power,
cost-optimized transceiver supporting 10/100/1000 Mbps over twisted-pair
copper, compliant with IEEE 802.3.
The driver implements basic features such as:
- Device initialization
- RGMII interface timing configuration
- Wake-on-LAN support
- LED initialization and control via /sys/class/leds
This driver has been tested on multiple Variscite boards, including:
- VAR-SOM-MX93 (i.MX93)
- VAR-SOM-MX8M-PLUS (i.MX8MP)
Paolo Abeni [Tue, 27 May 2025 07:10:05 +0000 (09:10 +0200)]
Merge branch 'wireguard-updates-for-6-16'
Jason A. Donenfeld says:
====================
wireguard updates for 6.16
This small series contains mostly cleanups and one new feature:
1) Kees' __nonstring annotation comes to wireguard.
2) Two selftest fixes, one to help with compilation on gcc 15, and one
removing stale config options.
3) Adoption of NLA_POLICY_MASK.
4) Jordan has added the ability to run:
# wg set ... peer ... allowed-ips -192.168.1.0/24
Which will remove the allowed IP for that peer. Previously you had to
replace all the IPs non-atomically, or move it to a dummy peer
atomically, which wasn't very clean.
====================
Jordan Rife [Wed, 21 May 2025 21:27:06 +0000 (23:27 +0200)]
wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flag
The current netlink API for WireGuard does not directly support removal
of allowed ips from a peer. A user can remove an allowed ip from a peer
in one of two ways:
1. By using the WGPEER_F_REPLACE_ALLOWEDIPS flag and providing a new
list of allowed ips which omits the allowed ip that is to be removed.
2. By reassigning an allowed ip to a "dummy" peer then removing that
peer with WGPEER_F_REMOVE_ME.
With the first approach, the driver completely rebuilds the allowed ip
list for a peer. If my current configuration is such that a peer has
allowed ips 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2
the actual transition looks like this.
[192.168.0.2, 192.168.0.3] <-- Initial state
[] <-- Step 1: Allowed ips removed for peer
[192.168.0.3] <-- Step 2: Allowed ips added back for peer
This is true even if the allowed ip list is small and the update does
not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the
removal and subsequent addition of ips is non-atomic within a single
request. Consequently, wg_allowedips_lookup_dst and
wg_allowedips_lookup_src may return NULL while reconfiguring a peer even
for packets bound for ips a user did not intend to remove, leading to
unintended interruptions in connectivity. This presents in userspace as
failed calls to sendto and sendmsg for UDP sockets. In my case, I ran
netperf while repeatedly reconfiguring the allowed ips for a peer with
wg.
/usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host
While this may not be of particular concern for environments where peers
and allowed ips are mostly static, systems like Cilium manage peers and
allowed ips in a dynamic environment where peers (i.e. Kubernetes nodes)
and allowed ips (i.e. pods running on those nodes) can frequently
change making WGPEER_F_REPLACE_ALLOWEDIPS problematic.
The second approach avoids any possible connectivity interruptions
but is hacky and less direct, requiring the creation of a temporary
peer just to dispose of an allowed ip.
Introduce a new flag called WGALLOWEDIP_F_REMOVE_ME which, in the same
way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from
a WireGuard device's configuration, allows a user to remove an ip from a
peer's set of allowed ips. This enables incremental updates to a
device's configuration without any connectivity blips or messy
workarounds.
A corresponding patch for wg extends the existing `wg set` interface to
leverage this feature.
$ wg set wg0 peer <PUBKEY> allowed-ips +192.168.88.0/24,-192.168.0.1/32
When '+' or '-' is prepended to any ip in the list, wg clears
WGPEER_F_REPLACE_ALLOWEDIPS and sets the WGALLOWEDIP_F_REMOVE_ME flag on
any ip prefixed with '-'.
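The difference between the replace-all update and the incremental removal can be modelled in userspace. This is only a sketch: the flat array, the sketch_* names and the flag value are hypothetical and do not reflect the driver's trie or the real value of WGALLOWEDIP_F_REMOVE_ME.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_IPS 8
/* Hypothetical stand-in for WGALLOWEDIP_F_REMOVE_ME; the value is illustrative. */
#define SKETCH_F_REMOVE_ME (1u << 0)

/* Toy model of one peer's allowed ips (host addresses only). */
struct sketch_peer {
	uint32_t ips[SKETCH_MAX_IPS];
	size_t count;
};

static bool sketch_lookup(const struct sketch_peer *p, uint32_t ip)
{
	for (size_t i = 0; i < p->count; i++)
		if (p->ips[i] == ip)
			return true;
	return false;
}

/* Apply one entry: add it, or remove it when the remove flag is set. */
static void sketch_apply(struct sketch_peer *p, uint32_t ip, uint32_t flags)
{
	if (flags & SKETCH_F_REMOVE_ME) {
		for (size_t i = 0; i < p->count; i++) {
			if (p->ips[i] == ip) {
				/* Overwrite the hole with the last entry. */
				p->ips[i] = p->ips[--p->count];
				return;
			}
		}
		return;
	}
	if (!sketch_lookup(p, ip) && p->count < SKETCH_MAX_IPS)
		p->ips[p->count++] = ip;
}

/*
 * Replace-all update: clear, then re-add. Returns true if 'keep' was
 * unreachable at some point during the update.
 */
static bool sketch_replace_all(struct sketch_peer *p, const uint32_t *ips,
			       size_t n, uint32_t keep)
{
	bool gap;

	p->count = 0;
	gap = !sketch_lookup(p, keep);	/* the list is empty here */
	for (size_t i = 0; i < n; i++)
		sketch_apply(p, ips[i], 0);
	return gap;
}
```

In this model the incremental path never passes through the empty list, whereas sketch_replace_all() always reports a gap for the ip that was meant to stay.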
Signed-off-by: Jordan Rife <jordan@jrife.io>
[Jason: minor style nits, fixes to selftest, bump of wireguard-tools version] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-5-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason A. Donenfeld [Wed, 21 May 2025 21:27:05 +0000 (23:27 +0200)]
wireguard: netlink: use NLA_POLICY_MASK where possible
Rather than manually validating flags against the various __ALL_*
constants, put this in the netlink policy description and have the upper
layer machinery check it for us.
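As a rough userspace illustration of what a mask-based policy check does, the sketch below keeps the allowed bits next to the attribute description and rejects anything outside them. The sketch_* names, the flag values and the error code are assumptions made for the example; this is not the kernel's NLA_POLICY_MASK implementation.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag set; names and values are illustrative only. */
#define SKETCH_F_REPLACE (1u << 0)
#define SKETCH_F_REMOVE  (1u << 1)
#define SKETCH_F_ALL     (SKETCH_F_REPLACE | SKETCH_F_REMOVE)

/* Table-driven policy entry: the mask is part of the attribute description. */
struct sketch_policy {
	uint32_t mask;
};

/* Generic check done once by the "upper layer" instead of in each handler. */
static int sketch_validate_flags(const struct sketch_policy *pol, uint32_t flags)
{
	return (flags & ~pol->mask) ? -EINVAL : 0;
}
```

With the check in the policy table, a handler can assume that every flag word it sees contains only known bits.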
Kees Cook [Wed, 21 May 2025 21:27:04 +0000 (23:27 +0200)]
wireguard: global: add __nonstring annotations for unterminated strings
When a character array without a terminating NUL character has a static
initializer, GCC 15's -Wunterminated-string-initialization will only
warn if the array lacks the "nonstring" attribute[1]. Mark the arrays
with __nonstring to correctly identify the char array as "not a C string"
and thereby eliminate the warning:
../drivers/net/wireguard/cookie.c:29:56: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization]
29 | static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----";
| ^~~~~~~~~~
../drivers/net/wireguard/cookie.c:30:58: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization]
30 | static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--";
| ^~~~~~~~~~
../drivers/net/wireguard/noise.c:28:38: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (38 chars into 37 available) [-Wunterminated-string-initialization]
28 | static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s";
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/wireguard/noise.c:29:39: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (35 chars into 34 available) [-Wunterminated-string-initialization]
29 | static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com";
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The arrays are always used with their fixed size, so use __nonstring.
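A minimal standalone sketch of the same pattern, assuming a compiler that understands the GCC "nonstring" attribute (the macro falls back to nothing otherwise). SKETCH_NONSTRING and the sketch_* identifiers are made up for the example; the kernel uses its own __nonstring macro.

```c
#include <string.h>

/* Use the attribute when the compiler knows it, otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(nonstring)
#  define SKETCH_NONSTRING __attribute__((nonstring))
# endif
#endif
#ifndef SKETCH_NONSTRING
# define SKETCH_NONSTRING
#endif

#define SKETCH_LABEL_LEN 8

/* Exactly 8 bytes with no room for a trailing NUL: a byte label, not a C string. */
static const unsigned char sketch_label[SKETCH_LABEL_LEN] SKETCH_NONSTRING = "mac1----";

/* Always compare with the fixed size; never use strlen()/strcmp() on the label. */
static int sketch_label_matches(const unsigned char *buf)
{
	return memcmp(buf, sketch_label, sizeof(sketch_label)) == 0;
}
```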
Commit 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL")
removed the CONFIG_UBSAN_SANITIZE_ALL configuration option.
Eliminate invalid configurations to improve code readability.
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-2-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
net: Convert dev_set_mac_address() to struct sockaddr_storage
As part of the effort to allow the compiler to reason about object sizes,
we need to deal with the problematic variably sized struct sockaddr,
which has no internal runtime size tracking. In much of the network
stack the use of struct sockaddr_storage has been adopted. Continue the
transition toward this for more of the internal APIs. Specifically:
Kees Cook [Wed, 21 May 2025 20:46:15 +0000 (13:46 -0700)]
rtnetlink: do_setlink: Use struct sockaddr_storage
Instead of heap allocating a variably sized struct sockaddr and lying
about the type in the call to netif_set_mac_address(), use a stack
allocated struct sockaddr_storage. This lets us drop the cast and avoid
the allocation.
Putting "ss" on the stack means it will get a reused stack slot since
it is the same size (128B) as other existing single-scope stack variables,
like the vfinfo array (128B), so no additional stack space is used by
this function.
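The idea can be sketched in userspace with the POSIX struct sockaddr_storage: the caller owns a fixed-size object, so no allocation and no size bookkeeping is needed. sketch_fill_hwaddr() and SKETCH_ETH_ALEN are hypothetical names, and writing the address at the sa_data offset is an assumption of this example, not the kernel's code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_ETH_ALEN 6

/* Fill a caller-provided, fixed-size sockaddr_storage with a hardware address. */
static void sketch_fill_hwaddr(struct sockaddr_storage *ss, sa_family_t family,
			       const unsigned char *mac)
{
	memset(ss, 0, sizeof(*ss));
	ss->ss_family = family;
	/* The address bytes sit where struct sockaddr keeps sa_data. */
	memcpy((unsigned char *)ss + offsetof(struct sockaddr, sa_data),
	       mac, SKETCH_ETH_ALEN);
}
```

A caller simply declares `struct sockaddr_storage ss;` on the stack and passes `&ss`, which is the shape of the change described above.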
Kees Cook [Wed, 21 May 2025 20:46:14 +0000 (13:46 -0700)]
net: core: Convert dev_set_mac_address() to struct sockaddr_storage
All users of dev_set_mac_address() are now using a struct sockaddr_storage.
Convert the internal data type to struct sockaddr_storage, drop the casts,
and update pointer types.
Kees Cook [Wed, 21 May 2025 20:46:12 +0000 (13:46 -0700)]
ieee802154: Use struct sockaddr_storage with dev_set_mac_address()
Switch to struct sockaddr_storage for calling dev_set_mac_address(). Add
a temporary cast to struct sockaddr, which will be removed in a
subsequent patch.
Kees Cook [Wed, 21 May 2025 20:46:10 +0000 (13:46 -0700)]
net: core: Switch netif_set_mac_address() to struct sockaddr_storage
In order to avoid passing around struct sockaddr that has a size the
compiler cannot reason about (nor track at runtime), convert
netif_set_mac_address() to take struct sockaddr_storage. This is just a
cast conversion, so there are no binary changes. Following patches
will make actual allocation changes.
In the dmaengine flow, the driver maintains struct skbuf_dma_descriptor
rings, each element of which corresponds to an skb. In the Tx datapath,
compare the available space in the skb ring with the number of skbs
instead of the number of skb fragments.
Replace x * (MAX_SKB_FRAGS) in netif_txq_completed_wake() and
netif_txq_maybe_stop() with x * (1 skb) to fix the comparison.
Fixes: 6a91b846af85 ("net: axienet: Introduce dmaengine support") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250521181608.669554-1-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Baris Can Goral [Wed, 21 May 2025 16:10:37 +0000 (19:10 +0300)]
replace strncpy with strscpy_pad
The strncpy() function is actively dangerous to use since it may not
NUL-terminate the destination string, resulting in potential memory
content exposures, unbounded reads, or crashes.
Link: https://github.com/KSPP/linux/issues/90
In addition, strscpy_pad is more appropriate because it also zero-fills
any remaining space in the destination if the source is shorter than
the provided buffer size.
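The semantics described here (always terminate, zero-fill the tail, report truncation) can be modelled in a few lines of userspace C. sketch_strscpy_pad() is a hypothetical reimplementation for illustration, not the kernel function itself.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' into 'dst' (capacity 'size'), always NUL-terminate and
 * zero-fill the unused tail. Returns the copied length, or -E2BIG when
 * the source had to be truncated or 'size' is zero.
 */
static long sketch_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];

	/* Zero the rest, which also writes the terminating NUL. */
	memset(dst + i, 0, size - i);

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strncpy(), the destination is a valid C string in every case, and no stale bytes remain after the terminator.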
The function mlx5_query_nic_vport_qkey_viol_cntr() calls the function
mlx5_query_nic_vport_context() but does not check its return value. This
could lead to undefined behavior if the query fails. A proper
implementation can be found in mlx5_nic_vport_query_local_lb().
Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.
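The pattern being added is the usual "check, free, return" shape. A minimal userspace sketch, in which sketch_query_context() is a hypothetical stand-in for the firmware query and calloc()/free() stand in for the kernel allocators:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical query: fills 'out' or fails, standing in for a firmware command. */
static int sketch_query_context(uint32_t *out, int simulate_failure)
{
	if (simulate_failure)
		return -EIO;
	out[0] = 42;
	return 0;
}

static int sketch_query_counter(uint16_t *counter, int simulate_failure)
{
	uint32_t *out = calloc(4, sizeof(*out));
	int err;

	if (!out)
		return -ENOMEM;

	err = sketch_query_context(out, simulate_failure);
	if (err)
		goto out_free;	/* do not parse a buffer the query never filled */

	*counter = (uint16_t)out[0];
out_free:
	free(out);
	return err;
}
```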
Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields") Cc: stable@vger.kernel.org # v4.5 Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250521133620.912-1-vulab@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Wed, 21 May 2025 12:41:59 +0000 (14:41 +0200)]
net: lan966x: Fix 1-step timestamping over ipv4 or ipv6
When 1-step timestamping is enabled for PTP frames carried over UDPv4 or
UDPv6, the timestamp is inserted at the wrong offset in the frame. The
hardware therefore modifies the frame in the wrong place and the frame
ends up malformed.
To fix this, the HW needs to know which kind of frame it is handling in
order to know where to insert the timestamp. For that there is a
PDU_TYPE field in the IFH, which can be NONE (the default value), IPV4
or IPV6. Therefore make sure to set the PDU_TYPE so the HW knows where
to insert the timestamp.
As mentioned before, the issue is not seen with L2 frames because by
default the PDU_TYPE has a value of 0, which represents L2 frames.
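A sketch of the selection logic, assuming the frame type is derived from the EtherType. The enum names and the non-zero values are hypothetical; only "NONE is 0 and is the default" comes from the description above.

```c
#include <stdint.h>

/* Hypothetical encoding: only NONE == 0 is taken from the text. */
enum sketch_pdu_type {
	SKETCH_PDU_NONE = 0,	/* default, plain L2 frames */
	SKETCH_PDU_IPV4,
	SKETCH_PDU_IPV6,
};

/* Pick the PDU type from the EtherType so the timestamp offset is right. */
static enum sketch_pdu_type sketch_pdu_type(uint16_t ethertype)
{
	switch (ethertype) {
	case 0x0800:	/* IPv4 */
		return SKETCH_PDU_IPV4;
	case 0x86DD:	/* IPv6 */
		return SKETCH_PDU_IPV6;
	default:	/* e.g. 0x88F7, PTP directly over Ethernet */
		return SKETCH_PDU_NONE;
	}
}
```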
Stefano Garzarella [Wed, 21 May 2025 12:17:05 +0000 (14:17 +0200)]
vsock/virtio: fix `rx_bytes` accounting for stream sockets
In `struct virtio_vsock_sock`, we maintain two counters:
- `rx_bytes`: used internally to track how many bytes have been read.
This supports mechanisms like .stream_has_data() and sock_rcvlowat().
- `fwd_cnt`: used for the credit mechanism to inform available receive
buffer space to the remote peer.
These counters are updated via virtio_transport_inc_rx_pkt() and
virtio_transport_dec_rx_pkt().
Since the beginning with commit 06a8fc78367d ("VSOCK: Introduce
virtio_vsock_common.ko"), we call virtio_transport_dec_rx_pkt() in
virtio_transport_stream_do_dequeue() only when we consume the entire
packet, so partial reads do not update `rx_bytes` and `fwd_cnt`.
This is fine for `fwd_cnt`, because we still have space used for the
entire packet, and we don't want to update the credit for the other
peer until we free the space of the entire packet. However, this
causes `rx_bytes` to be stale on partial reads.
Previously, this didn’t cause issues because `rx_bytes` was used only by
.stream_has_data(), and any unread portion of a packet implied data was
still available. However, since commit 93b808876682
("virtio/vsock: fix logic which reduces credit update messages"), we now
rely on `rx_bytes` to determine if a credit update should be sent when
the data in the RX queue drops below SO_RCVLOWAT value.
This patch fixes the accounting by updating `rx_bytes` with the number
of bytes actually read, even on partial reads, while leaving `fwd_cnt`
untouched until the packet is fully consumed. Also introduce a new
`buf_used` counter to check that the remote peer is honoring the given
credit; this was previously done via `rx_bytes`.
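The accounting described above can be modelled with three counters and a per-packet read offset. This is a simplified sketch: the struct layout and sketch_* helpers are hypothetical, and treating `buf_used` as "space still held by queued packets" is an assumption of the example.

```c
#include <stdint.h>

struct sketch_vsock {
	uint32_t rx_bytes;	/* bytes queued and not yet read */
	uint32_t fwd_cnt;	/* bytes whose buffer space has been released */
	uint32_t buf_used;	/* buffer space still held by queued packets */
};

struct sketch_pkt {
	uint32_t len;	/* total payload length */
	uint32_t off;	/* bytes already consumed */
};

static void sketch_enqueue(struct sketch_vsock *vs, const struct sketch_pkt *pkt)
{
	vs->rx_bytes += pkt->len;
	vs->buf_used += pkt->len;
}

/* Read up to 'want' bytes from one packet; returns the number of bytes read. */
static uint32_t sketch_dequeue(struct sketch_vsock *vs, struct sketch_pkt *pkt,
			       uint32_t want)
{
	uint32_t avail = pkt->len - pkt->off;
	uint32_t n = want < avail ? want : avail;

	pkt->off += n;
	vs->rx_bytes -= n;		/* updated on every read, even partial */
	if (pkt->off == pkt->len) {	/* packet fully consumed */
		vs->buf_used -= pkt->len;
		vs->fwd_cnt += pkt->len;	/* credit is returned only now */
	}
	return n;
}
```

After a partial read, `rx_bytes` reflects what is really left to read, while `fwd_cnt` stays put until the whole packet is gone.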
Paolo Abeni [Mon, 26 May 2025 16:30:47 +0000 (18:30 +0200)]
Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Remove some unnecessary strscpy_pad() size arguments.
From Thorsten Blum.
2) Correct use of xso.real_dev on bonding offloads.
Patchset from Cosmin Ratiu.
3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
From Chiachang Wang.
4) Refactor migration setup during cloning. This was
done after the clone was created. Now it is done
in the cloning function itself.
From Chiachang Wang.
5) Validate assignment of maximal possible SEQ number.
Prevent setting it to the maximum sequence number,
as this would cause traffic drops.
From Leon Romanovsky.
6) Prevent configuration of interface index when offload
is used. Hardware can't handle this case.
From Leon Romanovsky.
7) Always use kfree_sensitive() for SA secret zeroization.
From Zilin Guan.
ipsec-next-2025-05-23
* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: use kfree_sensitive() for SA secret zeroization
xfrm: prevent configuration of interface index when offload is used
xfrm: validate assignment of maximal possible SEQ number
xfrm: Refactor migration setup during the cloning process
xfrm: Migrate offload configuration
bonding: Fix multiple long standing offload races
bonding: Mark active offloaded xfrm_states
xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
xfrm: Remove unneeded device check from validate_xmit_xfrm
xfrm: Use xdo.dev instead of xdo.real_dev
net/mlx5: Avoid using xso.real_dev unnecessarily
xfrm: Remove unnecessary strscpy_pad() size arguments
====================
Subbaraya Sundeep [Wed, 21 May 2025 10:30:43 +0000 (16:00 +0530)]
octeontx2-af: Send Link events one by one
Send link events one after another; otherwise a new message
overwrites the message which is still being processed by the PF.
Fixes: a88e0f936ba9 ("octeontx2: Detect the mbox up or down message via register") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747823443-404-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jeremy Kerr [Wed, 21 May 2025 09:33:36 +0000 (17:33 +0800)]
net: mctp: use nlmsg_payload() for netlink message data extraction
Jakub suggests:
> I have a different request :) Matt, once this ends up in net-next
> (end of this week) could you refactor it to use nlmsg_payload() ?
> It doesn't exist in net but this is exactly why it was added.
This refactors the additions to both mctp_dump_addrinfo(), and
mctp_rtm_getneigh() - two cases where we're calling nlh_data() on an
incoming netlink message, without a prior nlmsg_parse().
For the neigh.c case, we cannot hit the failure where the nlh does not
contain a full ndmsg at present, as the core handler
(net/core/neighbour.c, neigh_get()) has already validated the size
through neigh_valid_req_get(), and would have failed the get operation
before the MCTP handler is called.
However, relying on that is a bit fragile, so apply the nlmsg_payload
refactor here too.
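The point of the helper is that the length check and the payload access happen together. A simplified userspace sketch of that idea, in which struct sketch_hdr and sketch_payload() are hypothetical and netlink alignment padding is ignored:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header; 'len' covers the header plus the payload. */
struct sketch_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

/*
 * Return a pointer to the payload only if the message is long enough to
 * hold 'want' bytes of it; otherwise return NULL.
 */
static void *sketch_payload(struct sketch_hdr *h, size_t want)
{
	if (h->len < sizeof(*h) + want)
		return NULL;
	return (unsigned char *)h + sizeof(*h);
}
```

A caller that asks for a fixed-size request struct gets either a usable pointer or NULL, so a short message cannot be read past its end.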
====================
Add the capability to consume SRAM for hwfd descriptor queue in airoha_eth driver
In order to improve packet processing and packet forwarding
performance, EN7581 SoC supports consuming SRAM instead of DRAM for the
hw forwarding descriptors queue. For downlink hw accelerated traffic,
request to consume SRAM memory for the hw forwarding descriptors queue.
Moreover, in some configurations QDMA blocks require a contiguous block
of system memory for hwfd buffers queue. Introduce the capability to
allocate hw buffers forwarding queue via the reserved-memory DTS
property instead of running dmam_alloc_coherent().
Lorenzo Bianconi [Wed, 21 May 2025 07:16:39 +0000 (09:16 +0200)]
net: airoha: Add the capability to allocate hfwd descriptors in SRAM
In order to improve packet processing and packet forwarding
performance, EN7581 SoC supports consuming SRAM instead of DRAM for
the hw forwarding descriptors queue.
For downlink hw accelerated traffic, request to consume SRAM memory
for the hw forwarding descriptors queue.
Lorenzo Bianconi [Wed, 21 May 2025 07:16:38 +0000 (09:16 +0200)]
net: airoha: Add the capability to allocate hwfd buffers via reserved-memory
In some configurations QDMA blocks require a contiguous block of
system memory for hwfd buffers queue. Introduce the capability to allocate
hw buffers forwarding queue via the reserved-memory DTS property instead of
running dmam_alloc_coherent().
Lorenzo Bianconi [Wed, 21 May 2025 07:16:37 +0000 (09:16 +0200)]
net: airoha: Do not store hfwd references in airoha_qdma struct
Since hfwd descriptor and buffer queues are allocated via
dmam_alloc_coherent() we do not need to store their references
in airoha_qdma struct. This patch does not introduce any logical changes,
just code clean-up.
Introduce memory-region and memory-region-names properties for the
ethernet node available on EN7581 SoC in order to reserve system memory
for hw forwarding buffers queue used by the QDMA modules.
Jiawen Wu [Wed, 21 May 2025 06:44:02 +0000 (14:44 +0800)]
net: txgbe: Implement SRIOV for AML devices
Since .mac_link_up and .mac_link_down are changed for AML 25G/10G NICs,
the SR-IOV related function should be invoked in these new functions, to
bring VFs link up.
Jiawen Wu [Wed, 21 May 2025 06:44:00 +0000 (14:44 +0800)]
net: txgbe: Restrict the use of mismatched FW versions
The newly added mailbox commands require a newly released firmware
version. Otherwise, many "Unknown FW command" messages would be logged
and the devices may not work properly. So add the test command in the
probe function.
Jiawen Wu [Wed, 21 May 2025 06:43:59 +0000 (14:43 +0800)]
net: txgbe: Correct the current link settings
For AML 25G/10G devices, some of the information returned from
phylink_ethtool_ksettings_get() is not correct, since there is a
fixed-link mode. So add additional corrections.
Jiawen Wu [Wed, 21 May 2025 06:43:58 +0000 (14:43 +0800)]
net: txgbe: Support to handle GPIO IRQs for AML devices
The driver needs to handle GPIO interrupts to identify SFP module and
configure PHY by sending mailbox messages to firmware.
Since an inserted SFP module needs some time to become ready before its
information can be read, a workqueue is added to handle delayed tasks.
Each SW-FW interaction also takes time to complete, so these are
processed in the workqueue instead of in the IRQ handler function.
Jiawen Wu [Wed, 21 May 2025 06:43:57 +0000 (14:43 +0800)]
net: txgbe: Implement PHYLINK for AML 25G/10G devices
There is a new PHY attached to the AML 25G/10G NIC, which is different
from the SP 10G/1G NIC. But the PHY configuration is handed over to
firmware, and I2C is also controlled by firmware. So a different PHYLINK
fixed-link mode is added for these devices.
Jiawen Wu [Wed, 21 May 2025 06:43:56 +0000 (14:43 +0800)]
net: txgbe: Distinguish between 40G and 25G devices
To prepare for the following patches, which add PHYLINK support for AML
25G devices, split out a separate MAC type, wx_mac_aml40, for the 40G
devices, since 40G support will be completed later.
This patch also makes the 25G devices use some PHYLINK interfaces, but
it does not yet create PHYLINK and cannot be used on its own. It is just
preparation for the next patches.
Jiawen Wu [Wed, 21 May 2025 06:43:55 +0000 (14:43 +0800)]
net: wangxun: Use specific flag bit to simplify the code
Most of the code in the common library that depends on the MAC type
exists because NGBE only supports a few queues and pools, unlike TXGBE,
which supports 128 queues and 64 pools. This difference accounts for
most of the hardware configuration differences in the driver code. So
add a flag bit "WX_FLAG_MULTI_64_FUNC" for them to clean up the driver
code.