www.infradead.org Git - users/jedix/linux-maple.git/log

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Merge in late fixes to prepare for the 6.16 net-next PR.

No conflicts nor adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/bpf: Fix bpf selftest build warning

On linux-next, build for bpf selftest displays a warning:

Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h'
differs from latest version at 'include/uapi/linux/if_xdp.h'.

Commit 8066e388be48 ("net: add UAPI to the header guard in various network headers")
changed the header guard from _LINUX_IF_XDP_H to _UAPI_LINUX_IF_XDP_H
in include/uapi/linux/if_xdp.h.

To resolve the warning, update tools/include/uapi/linux/if_xdp.h
to align with the changes in include/uapi/linux/if_xdp.h

Fixes: 8066e388be48 ("net: add UAPI to the header guard in various network headers")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/c2bc466d-dff2-4d0d-a797-9af7f676c065@linux.ibm.com/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250527054138.1086006-1-skb99@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: netfilter: Fix skip of wildcard interface test

The script is supposed to skip wildcard interface testing if unsupported
by the host's nft tool. The failing check caused script abort due to
'set -e' though. Fix this by running the potentially failing nft command
inside the if-conditional pipe.

Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20250527094117.18589-1-phil@nwl.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames

We have noticed that when PHY timestamping is enabled, L2 frames seems
to be modified by changing two 2 bytes with a value of 0. The place were
these 2 bytes seems to be random(or I couldn't find a pattern). In most
of the cases the userspace can ignore these frames but if for example
those 2 bytes are in the correction field there is nothing to do. This
seems to happen when configuring the HW for IPv4 even that the flow is
not enabled.
These 2 bytes correspond to the UDPv4 checksum and once we don't enable
clearing the checksum when using L2 frames then the frame doesn't seem
to be changed anymore.

Fixes: 7d272e63e0979d ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250523082716.2935895-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: openvswitch: Fix the dead loop of MPLS parse

The unexpected MPLS packet may not end with the bottom label stack.
When there are many stacks, The label count value has wrapped around.
A dead loop occurs, soft lockup/CPU stuck finally.

stack backtrace:
UBSAN: array-index-out-of-bounds in /build/linux-0Pa0xK/linux-5.15.0/net/openvswitch/flow.c:662:26
index -1 is out of range for type '__be32 [3]'
CPU: 34 PID: 0 Comm: swapper/34 Kdump: loaded Tainted: G OE 5.15.0-121-generic #131-Ubuntu
Hardware name: Dell Inc. PowerEdge C6420/0JP9TF, BIOS 2.12.2 07/14/2021
Call Trace:
<IRQ>
show_stack+0x52/0x5c
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_out_of_bounds.cold+0x44/0x49
key_extract_l3l4+0x82a/0x840 [openvswitch]
? kfree_skbmem+0x52/0xa0
key_extract+0x9c/0x2b0 [openvswitch]
ovs_flow_key_extract+0x124/0x350 [openvswitch]
ovs_vport_receive+0x61/0xd0 [openvswitch]
? kernel_init_free_pages.part.0+0x4a/0x70
? get_page_from_freelist+0x353/0x540
netdev_port_receive+0xc4/0x180 [openvswitch]
? netdev_port_receive+0x180/0x180 [openvswitch]
netdev_frame_hook+0x1f/0x40 [openvswitch]
__netif_receive_skb_core.constprop.0+0x23a/0xf00
__netif_receive_skb_list_core+0xfa/0x240
netif_receive_skb_list_internal+0x18e/0x2a0
napi_complete_done+0x7a/0x1c0
bnxt_poll+0x155/0x1c0 [bnxt_en]
__napi_poll+0x30/0x180
net_rx_action+0x126/0x280
? bnxt_msix+0x67/0x80 [bnxt_en]
handle_softirqs+0xda/0x2d0
irq_exit_rcu+0x96/0xc0
common_interrupt+0x8e/0xa0
</IRQ>

Fixes: fbdcdd78da7c ("Change in Openvswitch to support MPLS label depth of 3 in ingress direction")
Signed-off-by: Faicker Mo <faicker.mo@zenlayer.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/259D3404-575D-4A6D-B263-1DF59A67CF89@zenlayer.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

calipso: Don't call calipso functions for AF_INET sk.

syzkaller reported a null-ptr-deref in txopt_get(). [0]

The offset 0x70 was of struct ipv6_txoptions in struct ipv6_pinfo,
so struct ipv6_pinfo was NULL there.

However, this never happens for IPv6 sockets as inet_sk(sk)->pinet6
is always set in inet6_create(), meaning the socket was not IPv6 one.

The root cause is missing validation in netlbl_conn_setattr().

netlbl_conn_setattr() switches branches based on struct
sockaddr.sa_family, which is passed from userspace.  However,
netlbl_conn_setattr() does not check if the address family matches
the socket.

The syzkaller must have called connect() for an IPv6 address on
an IPv4 socket.

We have a proper validation in tcp_v[46]_connect(), but
security_socket_connect() is called in the earlier stage.

Let's copy the validation to netlbl_conn_setattr().

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 2 UID: 0 PID: 12928 Comm: syz.9.1677 Not tainted 6.12.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:txopt_get include/net/ipv6.h:390 [inline]
RIP: 0010:
Code: 02 00 00 49 8b ac 24 f8 02 00 00 e8 84 69 2a fd e8 ff 00 16 fd 48 8d 7d 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 53 02 00 00 48 8b 6d 70 48 85 ed 0f 84 ab 01 00
RSP: 0018:ffff88811b8afc48 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: 1ffff11023715f8a RCX: ffffffff841ab00c
RDX: 000000000000000e RSI: ffffc90007d9e000 RDI: 0000000000000070
RBP: 0000000000000000 R08: ffffed1023715f9d R09: ffffed1023715f9e
R10: ffffed1023715f9d R11: 0000000000000003 R12: ffff888123075f00
R13: ffff88810245bd80 R14: ffff888113646780 R15: ffff888100578a80
FS:  00007f9019bd7640(0000) GS:ffff8882d2d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f901b927bac CR3: 0000000104788003 CR4: 0000000000770ef0
PKRU: 80000000
Call Trace:
<TASK>
calipso_sock_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:557
netlbl_conn_setattr+0x10c/0x280 net/netlabel/netlabel_kapi.c:1177
selinux_netlbl_socket_connect_helper+0xd3/0x1b0 security/selinux/netlabel.c:569
selinux_netlbl_socket_connect_locked security/selinux/netlabel.c:597 [inline]
selinux_netlbl_socket_connect+0xb6/0x100 security/selinux/netlabel.c:615
selinux_socket_connect+0x5f/0x80 security/selinux/hooks.c:4931
security_socket_connect+0x50/0xa0 security/security.c:4598
__sys_connect_file+0xa4/0x190 net/socket.c:2067
__sys_connect+0x12c/0x170 net/socket.c:2088
__do_sys_connect net/socket.c:2098 [inline]
__se_sys_connect net/socket.c:2095 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:2095
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xaa/0x1b0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f901b61a12d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9019bd6fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f901b925fa0 RCX: 00007f901b61a12d
RDX: 000000000000001c RSI: 0000200000000140 RDI: 0000000000000003
RBP: 00007f901b701505 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f901b5b62a0 R15: 00007f9019bb7000
</TASK>
Modules linked in:

Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: John Cheung <john.cs.hey@gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=M1LzunrcQB1fSGauMrJrhL6GGps5cPAKzHJXj6GQV+-g@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250522221858.91240-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net_sched-hfsc-address-reentrant-enqueue-adding-class-to-eltree-twice'

Pedro Tammela says:

====================
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice

Savino says:
    "We are writing to report that this recent patch
    (141d34391abbb315d68556b7c67ad97885407547)
    can be bypassed, and a UAF can still occur when HFSC is utilized with
    NETEM.

    The patch only checks the cl->cl_nactive field to determine whether
    it is the first insertion or not, but this field is only
    incremented by init_vf.

    By using HFSC_RSC (which uses init_ed), it is possible to bypass the
    check and insert the class twice in the eltree.
    Under normal conditions, this would lead to an infinite loop in
    hfsc_dequeue for the reasons we already explained in this report.

    However, if TBF is added as root qdisc and it is configured with a
    very low rate,
    it can be utilized to prevent packets from being dequeued.
    This behavior can be exploited to perform subsequent insertions in the
    HFSC eltree and cause a UAF."

To fix both the UAF and the infinite loop, with netem as an hfsc child,
check explicitly in hfsc_enqueue whether the class is already in the eltree
whenever the HFSC_RSC flag is set.

Also add a TDC test to reproduce the UAF scenario.
====================

Link: https://patch.msgid.link/20250522181448.1439717-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem

Reproduce the UAF scenario where netem is a child of HFSC and HFSC
is configured to use the eltree. In such case, this TDC test would
cause the HFSC class to be added to the eltree twice resulting
in a UAF.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250522181448.1439717-3-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net_sched: hfsc: Address reentrant enqueue adding class to eltree twice

Savino says:
    "We are writing to report that this recent patch
    (141d34391abbb315d68556b7c67ad97885407547) [1]
    can be bypassed, and a UAF can still occur when HFSC is utilized with
    NETEM.

    The patch only checks the cl->cl_nactive field to determine whether
    it is the first insertion or not [2], but this field is only
    incremented by init_vf [3].

    By using HFSC_RSC (which uses init_ed) [4], it is possible to bypass the
    check and insert the class twice in the eltree.
    Under normal conditions, this would lead to an infinite loop in
    hfsc_dequeue for the reasons we already explained in this report [5].

    However, if TBF is added as root qdisc and it is configured with a
    very low rate,
    it can be utilized to prevent packets from being dequeued.
    This behavior can be exploited to perform subsequent insertions in the
    HFSC eltree and cause a UAF."

To fix both the UAF and the infinite loop, with netem as an hfsc child,
check explicitly in hfsc_enqueue whether the class is already in the eltree
whenever the HFSC_RSC flag is set.

[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=141d34391abbb315d68556b7c67ad97885407547
[2] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1572
[3] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L677
[4] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1574
[5] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/T/#u

Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Reported-by: William Liu <will@willsroot.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250522181448.1439717-2-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback

This patch addresses below issues,

1. Active traffic on the leaf node must be stopped before its send queue
   is reassigned to the parent. This patch resolves the issue by marking
   the node as 'Inner'.

2. During a system reboot, the interface receives TC_HTB_LEAF_DEL
   and TC_HTB_LEAF_DEL_LAST callbacks to delete its HTB queues.
   In the case of TC_HTB_LEAF_DEL_LAST, although the same send queue
   is reassigned to the parent, the current logic still attempts to update
   the real number of queues, leadning to below warnings

        New queues can't be registered after device unregistration.
        WARNING: CPU: 0 PID: 6475 at net/core/net-sysfs.c:1714
        netdev_queue_update_kobjects+0x1e4/0x200

Fixes: 5e6808b4c68d ("octeontx2-pf: Add support for HTB offload")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250522115842.1499666-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-pf: QOS: Perform cache sync on send queue teardown

QOS is designed to create a new send queue whenever a class
is created, ensuring proper shaping and scheduling. However,
when multiple send queues are created and deleted in a loop,
SMMU errors are observed.

This patch addresses the issue by performing an data cache sync
during the teardown of QOS send queues.

Fixes: ab6dddd2a669 ("octeontx2-pf: qos send queues management")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250522094742.1498295-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Add support for Multi Vports on Bare metal

To support Multi Vports on Bare metal, increase the device config response
version. And, skip the register HW vport, and register filter steps, when
the Bare metal hostmode is set.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1747671636-5810-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'devmem-tcp-minor-cleanups-and-ksft-improvements'

Mina Almasry says:

====================
Devmem TCP minor cleanups and ksft improvements

v2: https://lore.kernel.org/20250519023517.4062941-1-almasrymina@google.com
====================

Link: https://patch.msgid.link/20250523230524.1107879-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: ncdevmem: remove unused variable

This variable is unused and can be removed.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-9-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: ksft: upgrade rx test to send 1K data

The current test just sends "hello\nworld" and verifies that is the
string received on the RX side. That is fine, but improve the test a bit
by sending 1K data. The test should be improved further to send more
data, but for now this should be a welcome improvement.

The test will send a repeating pattern of 0x01, 0x02, ... 0x06. The
ncdevmem `-v 7` flag will verify this pattern. ncdevmem will provide
useful debugging info when the test fails, such as the frags received
and verified fine, and which frag exactly failed, what was the expected
byte pattern, and what is the actual byte pattern received. All this
debug information will be useful when the test fails.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-8-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: ksft: add 5 tuple FS support

ncdevmem supports drivers that are limited to either 3-tuple or 5-tuple
FS support, but the ksft is currently 3-tuple only. Support drivers that
have 5-tuple FS supported by adding a ksft arg.

Signed-off-by: Mina Almasry <almasrymina@google.com>
fix 5-tuple

fix 5-tuple
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-7-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: ksft: add exit_wait to make rx test pass

This exit_wait seems necessary to make the rx side test pass for me.
I think this is just missed from the original test add patch. Add it now.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: ksft: add ipv4 support

ncdevmem supports both ipv4 and ipv6, but the ksft is currently
ipv6-only. Propagate the ipv4 support to the ksft, so that folks that
are limited to these networks can also test.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-5-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: preserve sockc_err

Preserve the error code returned by sock_cmsg_send and return that on
err.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

page_pool: fix ugly page_pool formatting

Minor cleanup; this line is badly formatted.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: move list_add to net_devmem_bind_dmabuf.

It's annoying for the list_add to be outside net_devmem_bind_dmabuf, but
the list_del is in net_devmem_unbind_dmabuf. Make it consistent by
having both the list_add/del be inside the net_devmem_[un]bind_dmabuf.

Cc: ap420073@gmail.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250523230524.1107879-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: netfilter: nft_queue.sh: include file transfer duration in log message

Paolo Abeni says:
Recently the nipa CI infra went through some tuning, and the mentioned
self-test now often fails.

The failing test is the sctp+nfqueue one, where the file transfer takes
too long and hits the timeout (1 minute).

Because SCTP nfqueue tests had timeout related issues before (esp. on debug
kernels) print the file transfer duration in the PASS/FAIL message.
This would aallow us to see if there is/was an unexpected slowdown
(CI keeps logs around) or 'creeping slowdown' where things got slower
over time until 'fail point' was reached.

Output of altered lines looks like this:
  PASS: tcp and nfqueue in forward chan (duration: 2s)
  PASS: tcp via loopback (duration: 2s)
  PASS: sctp and nfqueue in forward chain (duration: 42s)
  PASS: sctp and nfqueue in output chain with GSO (duration: 21s)

Reported-by: Paolo Abeni <pabeni@redhat.com
Closes: https://lore.kernel.org/netdev/584524ef-9fd7-4326-9f1b-693ca62c5692@redhat.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250523121700.20011-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mscc: Fix memory leak when using one step timestamping

Fix memory leak when running one-step timestamping. When running
one-step sync timestamping, the HW is configured to insert the TX time
into the frame, so there is no reason to keep the skb anymore. As in
this case the HW will never generate an interrupt to say that the frame
was timestamped, then the frame will never released.
Fix this by freeing the frame in case of one-step timestamping.

Fixes: 7d272e63e0979d ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250522115722.2827199-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: mark sctp_do_peeloff static

sctp_do_peeloff is only used inside of net/sctp/socket.c,
so mark it static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20250526054745.2329201-1-hch@lst.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: Modify the EEPROM and OTP size for PCI1xxxx devices

Maximum OTP and EEPROM size for hearthstone PCI1xxxx devices are 8 Kb
and 64 Kb respectively. Adjust max size definitions and return correct
EEPROM length based on device. Also prevent out-of-bound read/write.

Signed-off-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250523173326.18509-1-rengarajan.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Support the FDIR rules assigned to VFs

When SR-IOV is enabled, the FDIR rule is supported to filter packets to
VFs. The action queue id is calculated as an absolute id.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/BE7EA355FDDAAA97+20250523080438.27968-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: libwx: Fix statistics of multicast packets

When SR-IOV is enabled, the number of multicast packets is mistakenly
counted starting from queue 0. It would be a wrong count that includes
the packets received on VF. Fix it to count from the correct offset.

Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/F70910CFE86C1F6F+20250523080438.27968-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'refactor-phy-reset-handling-and'

Thangaraj Samynathan says:

====================
Refactor PHY reset handling and fix WOL

This patch series refines the PHY reset and initialization logic in
the lan743x driver. Enhance the robustness of the driver initialization
process and prevent WOL-related issues during suspend/resume cycles.
====================

Link: https://patch.msgid.link/20250526053048.287095-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: Fix PHY reset handling during initialization and WOL

Remove lan743x_phy_init from lan743x_hardware_init as it resets the PHY
registers, causing WOL to fail on subsequent attempts. Add a call to
lan743x_hw_reset_phy in the probe function to ensure the PHY is reset
during device initialization.

Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250526053048.287095-3-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: rename lan743x_reset_phy to lan743x_hw_reset_phy

rename the function to lan743x_hw_reset_phy to better describe it
operation.

Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250526053048.287095-2-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: Check return value of dma_set_mask_and_coherent()

Issue flagged by coverity. Add a safety check for the return value
of dma_set_mask_and_coherent, go to a safe exit if it returns error.

Link: https://scan7.scan.coverity.com/#/project-view/53936/11354?selectedIssue=1643754
Signed-off-by: Sergio Perez Gonzalez <sperezglz@gmail.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://patch.msgid.link/20250526032034.84900-1-sperezglz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fix up const issues in to_mdio_device() and to_phy_device()

Both to_mdio_device() and to_phy_device() "throw away" the const pointer
attribute passed to them and return a non-const pointer, which generally
is not a good thing overall. Fix this up by using container_of_const()
which was designed for this very problem.

Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Fixes: 7eab14de73a8 ("mdio, phy: fix -Wshadow warnings triggered by nested container_of()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2025052246-conduit-glory-8fc9@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR

GENERIC_ALLOCATOR is a non-prompt kconfig, meaning users can't enable it
selectively. All kconfig users of GENERIC_ALLOCATOR select it, except of
NET_DEVMEM which only depends on it, there is no easy way to turn
GENERIC_ALLOCATOR on unless we select other unnecessary configs that
will select it.

Instead of depending on it, select it when NET_DEVMEM is enabled.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/1747950086-1246773-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: move wait_local_port_listen to lib.sh

The function wait_local_port_listen() is the only function defined in
net_helper.sh. Since some tests source both lib.sh and net_helper.sh,
we can simplify the setup by moving wait_local_port_listen() to lib.sh.

With this change, net_helper.sh becomes redundant and can be removed.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250526014600.9128-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cxgb4: Constify struct thermal_zone_device_ops

'struct thermal_zone_device_ops' are not modified in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

On a x86_64, with allmodconfig:
Before:
======
   text    data     bss     dec     hex filename
   2912    1064       0    3976     f88 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o

After:
=====
   text    data     bss     dec     hex filename
   3040     936       0    3976     f88 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://patch.msgid.link/e6416e0d15ea27a55fe1fb4e349928ac7bae1b95.1748164843.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: core_thermal: Constify struct thermal_zone_device_ops

'struct thermal_zone_device_ops' are not modified in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

While at it, also constify a struct thermal_zone_params.

On a x86_64, with allmodconfig:
Before:
======
   text    data     bss     dec     hex filename
  24899    8036       0   32935    80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o

After:
=====
   text    data     bss     dec     hex filename
  25379    7556       0   32935    80a7 drivers/net/ethernet/mellanox/mlxsw/core_thermal.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/4516676973f5adc1cdb76db1691c0f98b6fa6614.1748164348.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Fix an error code in mlx5hws_bwc_rule_create_complex()

This was intended to be negative -ENOMEM but the '-' character was left
off accidentally. This typo doesn't affect runtime because the caller
treats all non-zero returns the same.

Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/aDCbjNcquNC68Hyj@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: use kfree_sensitive() for aead cleanup

The tipc_aead_free() function currently uses kfree() to release the aead
structure. However, this structure contains sensitive information, such
as key's SALT value, which should be securely erased from memory to
prevent potential leakage.

To enhance security, replace kfree() with kfree_sensitive() when freeing
the aead structure. This change ensures that sensitive data is explicitly
cleared before memory deallocation, aligning with the approach used in
tipc_aead_init() and adhering to best practices for handling confidential
information.

Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20250523114717.4021518-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: parse extack for sub-messages

Extend the Python YNL extack decoding to handle sub-messages in the same
way that YNL C does. This involves retaining the input values so that
they are available during extack decoding.

./tools/net/ynl/pyynl/cli.py --family rt-link --do newlink --create \
    --json '{
        "linkinfo": {"kind": "netkit", "data": {"policy": 10} }
    }'
Netlink error: Invalid argument
nl_len = 92 (76) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'Provided default xmit policy not supported', 'bad-attr': '.linkinfo.data(netkit).policy'}

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250523103031.80236-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mctp: start tx queue on netdev open

We stop queues in ndo_stop, so they need to be restarted in ndo_open.
This allows us to resume tx after a link down/up cycle.

Suggested-by: Nitin Singh <nitsingh@nvidia.com>
Fixes: 0791c0327a6e ("net: mctp: Add MCTP USB transport driver")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20250526-dev-mctp-usb-v1-1-c7bd6cb75aa0@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add error handling in mlx5_query_nic_vport_node_guid()

The function mlx5_query_nic_vport_node_guid() calls the function
mlx5_query_nic_vport_context() but does not check its return value.
A proper implementation can be found in mlx5_nic_vport_query_local_lb().

Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.

Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250524163425.1695-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Fix an error handling path in airoha_alloc_gdm_port()

If register_netdev() fails, the error handling path of the probe will not
free the memory allocated by the previous airoha_metadata_dst_alloc() call
because port->dev->reg_state will not be NETREG_REGISTERED.

So, an explicit airoha_metadata_dst_free() call is needed in this case to
avoid a memory leak.

Fixes: af3cf757d5c9 ("net: airoha: Move DSA tag in DMA descriptor")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1b94b91345017429ed653e2f05d25620dc2823f9.1746715755.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: clear phydev->devlink when the link is deleted

There is a potential crash issue when disabling and re-enabling the
network port. When disabling the network port, phy_detach() calls
device_link_del() to remove the device link, but it does not clear
phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
network port is re-enabled, but if phy_attach_direct() fails before
calling device_link_add(), the code jumps to the "error" label and
calls phy_detach(). Since phydev->devlink retains the old value from
the previous attach/detach cycle, device_link_del() uses the old value,
which accesses a NULL pointer and causes a crash. The simplified crash
log is as follows.

[   24.702421] Call trace:
[   24.704856]  device_link_put_kref+0x20/0x120
[   24.709124]  device_link_del+0x30/0x48
[   24.712864]  phy_detach+0x24/0x168
[   24.716261]  phy_attach_direct+0x168/0x3a4
[   24.720352]  phylink_fwnode_phy_connect+0xc8/0x14c
[   24.725140]  phylink_of_phy_connect+0x1c/0x34

Therefore, phydev->devlink needs to be cleared when the device link is
deleted.

Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250523083759.3741168-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mediatek: Add Airoha AN7583 PHY support

Add Airoha AN7583 PHY support based on Airoha AN7581 with the small
difference that BMCR_PDOWN is enabled by default and needs to be cleared
to make the internal PHY correctly work.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250522165313.6411-4-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: Add AN7583 support

Add Airoha AN7583 Switch support. This is based on Airoha EN7581 that is
based on Mediatek MT7988 Switch.

Airoha AN7583 require additional tweak to the GEPHY_CONN_CFG register to
make the internal PHY work.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250522165313.6411-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: mediatek,mt7530: Add airoha,an7583-switch

Add airoha,an7583-switch additional compatible to the mt7530 DSA Switch
Family. This is an exact match of the airoha,en7581-switch (based on
mt7988-switch) with the additional requirement of tweak on the
GEPHY_CONN_CFG registers to make the internal PHY actually work.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250522165313.6411-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Doc: networking: Fix various typos in rds.rst

Corrected "sages" to "messages" in the bitmap allocation description.
Fixed "competed" to "completed" in the recv path datagram handling section.
Corrected "privatee" to "private" in the multipath RDS section.
Fixed "mutlipath" to "multipath" in the transport capabilities description.

These changes improve documentation clarity and maintain consistency.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250522074413.3634446-1-alok.a.tiwari@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Allow setting MAC address of representors

A representor netdev does not correspond to real hardware that needs to
be updated when setting the MAC address. The default eth_mac_addr() is
sufficient for simply updating the netdev's MAC address with validation.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'octeontx2-pf-do-not-detect-macsec-block-based-on-silicon'

Subbaraya Sundeep says:

====================
octeontx2-pf: Do not detect MACSEC block based on silicon

Out of various silicon variants of CN10K series some have hardware
MACSEC block for offloading MACSEC operations and some do not.
AF driver already has the information of whether MACSEC is present
or not on running silicon. Hence fetch that information from
AF via mailbox message.
====================

Link: https://patch.msgid.link/1747894516-4565-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-pf: macsec: Get MACSEC capability flag from AF

The presence of MACSEC block is currently figured out based
on the running silicon variant. This may not be correct all
the times since the MACSEC block can be fused out. Hence get
the macsec info from AF via mailbox.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747894548-4657-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-af: Add MACSEC capability flag

MACSEC block may be fused out on some silicons hence modify
get_hw_cap mailbox message to set a capability flag in its
response message based on MACSEC block availability.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747894528-4611-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xsk: add missing virtual address conversion for page

In commit 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use
page_pool_dev_alloc()"), when converting from netmem to page, I missed a
call to page_address() around skb_frag_page(frag) to get the virtual
address of the page. This commit uses skb_frag_address() helper to fix
the issue.

Fixes: 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()")
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250522040115.5057-1-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

af_packet: move notifier's packet_dev_mc out of rcu critical section

Syzkaller reports the following issue:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
__mutex_lock+0x106/0xe80 kernel/locking/mutex.c:746
team_change_rx_flags+0x38/0x220 drivers/net/team/team_core.c:1781
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f8/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286 packet_dev_mc net/packet/af_packet.c:3698 [inline]
packet_dev_mclist_delete net/packet/af_packet.c:3722 [inline]
packet_notifier+0x292/0xa60 net/packet/af_packet.c:4247
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2214 [inline]
call_netdevice_notifiers net/core/dev.c:2228 [inline]
unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11972
rtnl_delete_link net/core/rtnetlink.c:3522 [inline]
rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534

Calling `PACKET_ADD_MEMBERSHIP` on an ops-locked device can trigger
the `NETDEV_UNREGISTER` notifier, which may require disabling promiscuous
and/or allmulti mode. Both of these operations require acquiring
the netdev instance lock.

Move the call to `packet_dev_mc` outside of the RCU critical section.
The `mclist` modifications (add, del, flush, unregister) are protected by
the RTNL, not the RCU. The RCU only protects the `sklist` and its
associated `sks`. The delayed operation on the `mclist` entry remains
within the RTNL.

Reported-by: syzbot+b191b5ccad8d7a986286@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b191b5ccad8d7a986286
Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250522031129.3247266-1-stfomichev@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'vsock-sock_linger-rework'

Michal Luczaj says:

====================
vsock: SOCK_LINGER rework

Change vsock's lingerning to wait on close() until all data is sent, i.e.
until workers picked all the packets for processing.

v5: https://lore.kernel.org/r/20250521-vsock-linger-v5-0-94827860d1d6@rbox.co
v4: https://lore.kernel.org/r/20250501-vsock-linger-v4-0-beabbd8a0847@rbox.co
v3: https://lore.kernel.org/r/20250430-vsock-linger-v3-0-ddbe73b53457@rbox.co
v2: https://lore.kernel.org/r/20250421-vsock-linger-v2-0-fe9febd64668@rbox.co
v1: https://lore.kernel.org/r/20250407-vsock-linger-v1-0-1458038e3492@rbox.co

Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================

Link: https://patch.msgid.link/20250522-vsock-linger-v6-0-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock/test: Add test for an unexpectedly lingering close()

There was an issue with SO_LINGER: instead of blocking until all queued
messages for the socket have been successfully sent (or the linger timeout
has been reached), close() would block until packets were handled by the
peer.

Add a test to alert on close() lingering when it should not.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250522-vsock-linger-v6-5-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock/test: Introduce enable_so_linger() helper

Add a helper function that sets SO_LINGER. Adapt the caller.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250522-vsock-linger-v6-4-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock/test: Introduce vsock_wait_sent() helper

Distill the virtio_vsock_sock::bytes_unsent checking loop (ioctl SIOCOUTQ)
and move it to utils. Tweak the comment.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250522-vsock-linger-v6-3-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock: Move lingering logic to af_vsock core

Lingering should be transport-independent in the long run. In preparation
for supporting other transports, as well as the linger on shutdown(), move
code to core.

Generalize by querying vsock_transport::unsent_bytes(), guard against the
callback being unimplemented. Do not pass sk_lingertime explicitly. Pull
SOCK_LINGER check into vsock_linger().

Flatten the function. Remove the nested block by inverting the condition:
return early on !timeout.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250522-vsock-linger-v6-2-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock/virtio: Linger on unsent data

Currently vsock's lingering effectively boils down to waiting (or timing
out) until packets are consumed or dropped by the peer; be it by receiving
the data, closing or shutting down the connection.

To align with the semantics described in the SO_LINGER section of man
socket(7) and to mimic AF_INET's behaviour more closely, change the logic
of a lingering close(): instead of waiting for all data to be handled,
block until data is considered sent from the vsock's transport point of
view. That is until worker picks the packets for processing and decrements
virtio_vsock_sock::bytes_unsent down to 0.

Note that (some interpretation of) lingering was always limited to
transports that called virtio_transport_wait_close() on transport release.
This does not change, i.e. under Hyper-V and VMCI no lingering would be
observed.

The implementation does not adhere strictly to man page's interpretation of
SO_LINGER: shutdown() will not trigger the lingering. This follows AF_INET.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250522-vsock-linger-v6-1-2ad00b0e447e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: add driver for MaxLinear MxL86110 PHY

Add support for the MaxLinear MxL86110 Gigabit Ethernet PHY, a low-power,
cost-optimized transceiver supporting 10/100/1000 Mbps over twisted-pair
copper, compliant with IEEE 802.3.

The driver implements basic features such as:
- Device initialization
- RGMII interface timing configuration
- Wake-on-LAN support
- LED initialization and control via /sys/class/leds

This driver has been tested on multiple Variscite boards, including:
- VAR-SOM-MX93 (i.MX93)
- VAR-SOM-MX8M-PLUS (i.MX8MP)

Example boot log showing driver probe:
[ 7.692101] imx-dwmac 428a0000.ethernet eth0:
PHY [stmmac-0:00] driver [MXL86110 Gigabit Ethernet] (irq=POLL)

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250521212821.593057-1-stefano.radaelli21@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'wireguard-updates-for-6-16'

Jason A. Donenfeld says:

====================
wireguard updates for 6.16

This small series contains mostly cleanups and one new feature:

1) Kees' __nonstring annotation comes to wireguard.

2) Two selftest fixes, one to help with compilation on gcc 15, and one
   removing stale config options.

3) Adoption of NLA_POLICY_MASK.

4) Jordan has added the ability to run:

    # wg set ... peer ... allowed-ips -192.168.1.0/24

  Which will remove the allowed IP for that peer. Previously you had to
  replace all the IPs non-atomically, or move it to a dummy peer
  atomically, which wasn't very clean.
====================

Link: https://patch.msgid.link/20250521212707.1767879-1-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wireguard: selftests: specify -std=gnu17 for bash

GCC 15 defaults to C23, which bash can't compile under, so specify gnu17
explicitly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250521212707.1767879-6-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flag

The current netlink API for WireGuard does not directly support removal
of allowed ips from a peer. A user can remove an allowed ip from a peer
in one of two ways:

1. By using the WGPEER_F_REPLACE_ALLOWEDIPS flag and providing a new
   list of allowed ips which omits the allowed ip that is to be removed.
2. By reassigning an allowed ip to a "dummy" peer then removing that
   peer with WGPEER_F_REMOVE_ME.

With the first approach, the driver completely rebuilds the allowed ip
list for a peer. If my current configuration is such that a peer has
allowed ips 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2
the actual transition looks like this.

[192.168.0.2, 192.168.0.3] <-- Initial state
[]                         <-- Step 1: Allowed ips removed for peer
[192.168.0.3]              <-- Step 2: Allowed ips added back for peer

This is true even if the allowed ip list is small and the update does
not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the
removal and subsequent addition of ips is non-atomic within a single
request. Consequently, wg_allowedips_lookup_dst and
wg_allowedips_lookup_src may return NULL while reconfiguring a peer even
for packets bound for ips a user did not intend to remove leading to
unintended interruptions in connectivity. This presents in userspace as
failed calls to sendto and sendmsg for UDP sockets. In my case, I ran
netperf while repeatedly reconfiguring the allowed ips for a peer with
wg.

/usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host

While this may not be of particular concern for environments where peers
and allowed ips are mostly static, systems like Cilium manage peers and
allowed ips in a dynamic environment where peers (i.e. Kubernetes nodes)
and allowed ips (i.e. pods running on those nodes) can frequently
change making WGPEER_F_REPLACE_ALLOWEDIPS problematic.

The second approach avoids any possible connectivity interruptions
but is hacky and less direct, requiring the creation of a temporary
peer just to dispose of an allowed ip.

Introduce a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same
way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from
a WireGuard device's configuration allows a user to remove an ip from a
peer's set of allowed ips. This enables incremental updates to a
device's configuration without any connectivity blips or messy
workarounds.

A corresponding patch for wg extends the existing `wg set` interface to
leverage this feature.

$ wg set wg0 peer <PUBKEY> allowed-ips +192.168.88.0/24,-192.168.0.1/32

When '+' or '-' is prepended to any ip in the list, wg clears
WGPEER_F_REPLACE_ALLOWEDIPS and sets the WGALLOWEDIP_F_REMOVE_ME flag on
any ip prefixed with '-'.

Signed-off-by: Jordan Rife <jordan@jrife.io>
[Jason: minor style nits, fixes to selftest, bump of wireguard-tools version]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250521212707.1767879-5-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wireguard: netlink: use NLA_POLICY_MASK where possible

Rather than manually validating flags against the various __ALL_*
constants, put this in the netlink policy description and have the upper
layer machinery check it for us.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250521212707.1767879-4-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wireguard: global: add __nonstring annotations for unterminated strings

When a character array without a terminating NUL character has a static
initializer, GCC 15's -Wunterminated-string-initialization will only
warn if the array lacks the "nonstring" attribute[1]. Mark the arrays
with __nonstring to correctly identify the char array as "not a C string"
and thereby eliminate the warning:

../drivers/net/wireguard/cookie.c:29:56: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization]
   29 | static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----";
      |                                                        ^~~~~~~~~~
../drivers/net/wireguard/cookie.c:30:58: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization]
   30 | static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--";
      |                                                          ^~~~~~~~~~
../drivers/net/wireguard/noise.c:28:38: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (38 chars into 37 available) [-Wunterminated-string-initialization]
   28 | static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s";
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/wireguard/noise.c:29:39: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (35 chars into 34 available) [-Wunterminated-string-initialization]
   29 | static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com";
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The arrays are always used with their fixed size, so use __nonstring.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250521212707.1767879-3-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wireguard: selftests: cleanup CONFIG_UBSAN_SANITIZE_ALL

Commit 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL")
removed the CONFIG_UBSAN_SANITIZE_ALL configuration option.
Eliminate invalid configurations to improve code readability.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250521212707.1767879-2-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-convert-dev_set_mac_address-to-struct-sockaddr_storage'

Kees Cook says:

====================
net: Convert dev_set_mac_address() to struct sockaddr_storage

As part of the effort to allow the compiler to reason about object sizes,
we need to deal with the problematic variably sized struct sockaddr,
which has no internal runtime size tracking. In much of the network
stack the use of struct sockaddr_storage has been adopted. Continue the
transition toward this for more of the internal APIs. Specifically:

- inet_addr_is_any()
- netif_set_mac_address()
- dev_set_mac_address()
- dev_set_mac_address_user()

Only a few callers of dev_set_mac_address() needed adjustment; all others
were already using struct sockaddr_storage internally.

v1: https://lore.kernel.org/all/20250520222452.work.063-kees@kernel.org/
====================

Link: https://patch.msgid.link/20250521204310.it.500-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: core: Convert dev_set_mac_address_user() to use struct sockaddr_storage

Convert callers of dev_set_mac_address_user() to use struct
sockaddr_storage. Add sanity checks on dev->addr_len usage.

Signed-off-by: Kees Cook <kees@kernel.org>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-8-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

rtnetlink: do_setlink: Use struct sockaddr_storage

Instead of a heap allocating a variably sized struct sockaddr and lying
about the type in the call to netif_set_mac_address(), use a stack
allocated struct sockaddr_storage. This lets us drop the cast and avoid
the allocation.

Putting "ss" on the stack means it will get a reused stack slot since
it is the same size (128B) as other existing single-scope stack variables,
like the vfinfo array (128B), so no additional stack space is used by
this function.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-7-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: core: Convert dev_set_mac_address() to struct sockaddr_storage

All users of dev_set_mac_address() are now using a struct sockaddr_storage.
Convert the internal data type to struct sockaddr_storage, drop the casts,
and update pointer types.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-6-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: r8152: Convert to use struct sockaddr_storage internally

To support coming API type changes, switch to sockaddr_storage usage
internally.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-5-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ieee802154: Use struct sockaddr_storage with dev_set_mac_address()

Switch to struct sockaddr_storage for calling dev_set_mac_address(). Add
a temporary cast to struct sockaddr, which will be removed in a
subsequent patch.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-4-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/ncsi: Use struct sockaddr_storage for pending_mac

To avoid future casting with coming API type changes, switch struct
ncsi_dev_priv::pending_mac to a full struct sockaddr_storage.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-3-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: core: Switch netif_set_mac_address() to struct sockaddr_storage

In order to avoid passing around struct sockaddr that has a size the
compiler cannot reason about (nor track at runtime), convert
netif_set_mac_address() to take struct sockaddr_storage. This is just a
cast conversion, so there is are no binary changes. Following patches
will make actual allocation changes.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-2-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: core: Convert inet_addr_is_any() to sockaddr_storage

All the callers of inet_addr_is_any() have a sockaddr_storage-backed
sockaddr. Avoid casts and switch prototype to the actual object being
used.

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250521204619.2301870-1-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: xilinx: axienet: Fix Tx skb circular buffer occupancy check in dmaengine xmit

In Dmaengine flow, driver maintains struct skbuf_dma_descriptor rings each
element of which corresponds to a skb. In Tx datapath, compare available
space in skb ring with number of skbs instead of skb fragments.
Replace x * (MAX_SKB_FRAGS) in netif_txq_completed_wake() and
netif_txq_maybe_stop() with x * (1 skb) to fix the comparison.

Fixes: 6a91b846af85 ("net: axienet: Introduce dmaengine support")
Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Reviewed-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250521181608.669554-1-suraj.gupta2@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

replace strncpy with strscpy_pad

The strncpy() function is actively dangerous to use since it may not
NULL-terminate the destination string, resulting in potential memory
content exposures, unbounded reads, or crashes.
Link: https://github.com/KSPP/linux/issues/90
In addition, strscpy_pad is more appropriate because it also zero-fills
any remaining space in the destination if the source is shorter than
the provided buffer size.

Signed-off-by: Baris Can Goral <goralbaris@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250521161036.14489-1-goralbaris@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5_core: Add error handling inmlx5_query_nic_vport_qkey_viol_cntr()

The function mlx5_query_nic_vport_qkey_viol_cntr() calls the function
mlx5_query_nic_vport_context() but does not check its return value. This
could lead to undefined behavior if the query fails. A proper
implementation can be found in mlx5_nic_vport_query_local_lb().

Add error handling for mlx5_query_nic_vport_context(). If it fails, free
the out buffer via kvfree() and return error code.

Fixes: 9efa75254593 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250521133620.912-1-vulab@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: lan966x: Fix 1-step timestamping over ipv4 or ipv6

When enabling 1-step timestamping for ptp frames that are over udpv4 or
udpv6 then the inserted timestamp is added at the wrong offset in the
frame, meaning that will modify the frame at the wrong place, so the
frame will be malformed.
To fix this, the HW needs to know which kind of frame it is to know
where to insert the timestamp. For that there is a field in the IFH that
says the PDU_TYPE, which can be NONE which is the default value,
IPV4 or IPV6. Therefore make sure to set the PDU_TYPE so the HW knows
where to insert the timestamp.
Like I mention before the issue is not seen with L2 frames because by
default the PDU_TYPE has a value of 0, which represents the L2 frames.

Fixes: 77eecf25bd9d2f ("net: lan966x: Update extraction/injection for timestamping")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250521124159.2713525-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock/virtio: fix `rx_bytes` accounting for stream sockets

In `struct virtio_vsock_sock`, we maintain two counters:
- `rx_bytes`: used internally to track how many bytes have been read.
This supports mechanisms like .stream_has_data() and sock_rcvlowat().
- `fwd_cnt`: used for the credit mechanism to inform available receive
buffer space to the remote peer.

These counters are updated via virtio_transport_inc_rx_pkt() and
virtio_transport_dec_rx_pkt().

Since the beginning with commit 06a8fc78367d ("VSOCK: Introduce
virtio_vsock_common.ko"), we call virtio_transport_dec_rx_pkt() in
virtio_transport_stream_do_dequeue() only when we consume the entire
packet, so partial reads, do not update `rx_bytes` and `fwd_cnt`.

This is fine for `fwd_cnt`, because we still have space used for the
entire packet, and we don't want to update the credit for the other
peer until we free the space of the entire packet. However, this
causes `rx_bytes` to be stale on partial reads.

Previously, this didn’t cause issues because `rx_bytes` was used only by
.stream_has_data(), and any unread portion of a packet implied data was
still available. However, since commit 93b808876682
("virtio/vsock: fix logic which reduces credit update messages"), we now
rely on `rx_bytes` to determine if a credit update should be sent when
the data in the RX queue drops below SO_RCVLOWAT value.

This patch fixes the accounting by updating `rx_bytes` with the number
of bytes actually read, even on partial reads, while leaving `fwd_cnt`
untouched until the packet is fully consumed. Also introduce a new
`buf_used` counter to check that the remote peer is honoring the given
credit; this was previously done via `rx_bytes`.

Fixes: 93b808876682 ("virtio/vsock: fix logic which reduces credit update messages")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250521121705.196379-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next,
specifically 26 patches: 5 patches adding/updating selftests,
4 fixes, 3 PREEMPT_RT fixes, and 14 patches to enhance nf_tables):

1) Improve selftest coverage for pipapo 4 bit group format, from
   Florian Westphal.

2) Fix incorrect dependencies when compiling a kernel without
   legacy ip{6}tables support, also from Florian.

3) Two patches to fix nft_fib vrf issues, including selftest updates
   to improve coverage, also from Florian Westphal.

4) Fix incorrect nesting in nft_tunnel's GENEVE support, from
   Fernando F. Mancera.

5) Three patches to fix PREEMPT_RT issues with nf_dup infrastructure
   and nft_inner to match in inner headers, from Sebastian Andrzej Siewior.

6) Integrate conntrack information into nft trace infrastructure,
   from Florian Westphal.

7) A series of 13 patches to allow to specify wildcard netdevice in
   netdev basechain and flowtables, eg.

   table netdev filter {
       chain ingress {
           type filter hook ingress devices = { eth0, eth1, vlan* } priority 0; policy accept;
       }
   }

   This also allows for runtime hook registration on NETDEV_{UN}REGISTER
   event, from Phil Sutter.

netfilter pull request 25-05-23

* tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: (26 commits)
  selftests: netfilter: Torture nftables netdev hooks
  netfilter: nf_tables: Add notifications for hook changes
  netfilter: nf_tables: Support wildcard netdev hook specs
  netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()
  netfilter: nf_tables: Handle NETDEV_CHANGENAME events
  netfilter: nf_tables: Wrap netdev notifiers
  netfilter: nf_tables: Respect NETDEV_REGISTER events
  netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events
  netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook
  netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()
  netfilter: nf_tables: Introduce nft_register_flowtable_ops()
  netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()
  netfilter: nf_tables: Introduce functions freeing nft_hook objects
  netfilter: nf_tables: add packets conntrack state to debug trace info
  netfilter: conntrack: make nf_conntrack_id callable without a module dependency
  netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit
  netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx
  netfilter: nf_dup{4, 6}: Move duplication check to task_struct
  netfilter: nft_tunnel: fix geneve_opt dump
  selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs
  ...
====================

Link: https://patch.msgid.link/20250523132712.458507-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Remove some unnecessary strscpy_pad() size arguments.
   From Thorsten Blum.

2) Correct use of xso.real_dev on bonding offloads.
   Patchset from Cosmin Ratiu.

3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
   From Chiachang Wang.

4) Refactor migration setup during cloning. This was
   done after the clone was created. Now it is done
   in the cloning function itself.
   From Chiachang Wang.

5) Validate assignment of maximal possible SEQ number.
   Prevent from setting to the maximum sequrnce number
   as this would cause for traffic drop.
   From Leon Romanovsky.

6) Prevent configuration of interface index when offload
   is used. Hardware can't handle this case.i
   From Leon Romanovsky.

7) Always use kfree_sensitive() for SA secret zeroization.
   From Zilin Guan.

ipsec-next-2025-05-23

* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: use kfree_sensitive() for SA secret zeroization
  xfrm: prevent configuration of interface index when offload is used
  xfrm: validate assignment of maximal possible SEQ number
  xfrm: Refactor migration setup during the cloning process
  xfrm: Migrate offload configuration
  bonding: Fix multiple long standing offload races
  bonding: Mark active offloaded xfrm_states
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  xfrm: Remove unneeded device check from validate_xmit_xfrm
  xfrm: Use xdo.dev instead of xdo.real_dev
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Remove unnecessary strscpy_pad() size arguments
====================

Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'linux-can-next-for-6.16-20250522' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-05-22

this is a pull request of 22 patches for net-next/main.

The series by Biju Das contains 19 patches and adds RZ/G3E CANFD
support to the rcar_canfd driver.

The patch by Vincent Mailhol adds a struct data_bittiming_params to
group FD parameters as a preparation patch for CAN-XL support.

Felix Maurer's patch imports tst-filter from can-tests into the kernel
self tests and Vincent Mailhol adds support for physical CAN
interfaces.

linux-can-next-for-6.16-20250522

* tag 'linux-can-next-for-6.16-20250522' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
  selftests: can: test_raw_filter.sh: add support of physical interfaces
  selftests: can: Import tst-filter from can-tests
  can: dev: add struct data_bittiming_params to group FD parameters
  can: rcar_canfd: Add RZ/G3E support
  can: rcar_canfd: Enhance multi_channel_irqs handling
  can: rcar_canfd: Add external_clk variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add sh variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add struct rcanfd_regs variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add shared_can_regs variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add ch_interface_mode variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add {nom,data}_bittiming variables to struct rcar_canfd_hw_info
  can: rcar_canfd: Add max_cftml variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add max_aflpn variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Add rnc_field_width variable to struct rcar_canfd_hw_info
  can: rcar_canfd: Update RCANFD_GAFLCFG macro
  can: rcar_canfd: Add rcar_canfd_setrnc()
  can: rcar_canfd: Drop the mask operation in RCANFD_GAFLCFG_SETRNC macro
  can: rcar_canfd: Update RCANFD_GERFL_ERR macro
  can: rcar_canfd: Drop RCANFD_GAFLCFG_GETRNC macro
  can: rcar_canfd: Use of_get_available_child_by_name()
  ...
====================

Link: https://patch.msgid.link/20250522084128.501049-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-af: Send Link events one by one

Send link events one after another otherwise new message
is overwriting the message which is being processed by PF.

Fixes: a88e0f936ba9 ("octeontx2: Detect the mbox up or down message via register")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747823443-404-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mctp: use nlmsg_payload() for netlink message data extraction

Jakub suggests:

> I have a different request :) Matt, once this ends up in net-next
> (end of this week) could you refactor it to use nlmsg_payload() ?
> It doesn't exist in net but this is exactly why it was added.

This refactors the additions to both mctp_dump_addrinfo(), and
mctp_rtm_getneigh() - two cases where we're calling nlh_data() on an
an incoming netlink message, without a prior nlmsg_parse().

For the neigh.c case, we cannot hit the failure where the nlh does not
contain a full ndmsg at present, as the core handler
(net/core/neighbour.c, neigh_get()) has already validated the size
through neigh_valid_req_get(), and would have failed the get operation
before the MCTP hander is called.

However, relying on that is a bit fragile, so apply the nlmsg_payload
refector here too.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20250521-mctp-nlmsg-payload-v2-1-e85df160c405@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-the-capability-to-consume-sram-for-hwfd-descriptor-queue-in-airoha_eth-driver'

Lorenzo Bianconi says:

====================
Add the capability to consume SRAM for hwfd descriptor queue in airoha_eth driver

In order to improve packet processing and packet forwarding
performances, EN7581 SoC supports consuming SRAM instead of DRAM for hw
forwarding descriptors queue. For downlink hw accelerated traffic
request to consume SRAM memory for hw forwarding descriptors queue.
Moreover, in some configurations QDMA blocks require a contiguous block
of system memory for hwfd buffers queue. Introduce the capability to
allocate hw buffers forwarding queue via the reserved-memory DTS
property instead of running dmam_alloc_coherent().

v2: https://lore.kernel.org/r/20250509-airopha-desc-sram-v2-0-9dc3d8076dfb@kernel.org
v1: https://lore.kernel.org/r/20250507-airopha-desc-sram-v1-0-d42037431bfa@kernel.org
====================

Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-0-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: Add the capability to allocate hfwd descriptors in SRAM

In order to improve packet processing and packet forwarding
performances, EN7581 SoC supports consuming SRAM instead of DRAM for
hw forwarding descriptors queue.
For downlink hw accelerated traffic request to consume SRAM memory
for hw forwarding descriptors queue.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-4-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: Add the capability to allocate hwfd buffers via reserved-memory

In some configurations QDMA blocks require a contiguous block of
system memory for hwfd buffers queue. Introduce the capability to allocate
hw buffers forwarding queue via the reserved-memory DTS property instead of
running dmam_alloc_coherent().

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-3-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: Do not store hfwd references in airoha_qdma struct

Since hfwd descriptor and buffer queues are allocated via
dmam_alloc_coherent() we do not need to store their references
in airoha_qdma struct. This patch does not introduce any logical changes,
just code clean-up.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-2-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: airoha: Add EN7581 memory-region property

Introduce memory-region and memory-region-names properties for the
ethernet node available on EN7581 SoC in order to reserve system memory
for hw forwarding buffers queue used by the QDMA modules.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-1-a6e9b085b4f0@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-functions-for-txgbe-aml-devices'

Jiawen Wu says:

====================
Support phylink and link/gpio irqs for AML 25G/10G devices, and complete
PTP and SRIOV.
====================

Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Implement SRIOV for AML devices

Since .mac_link_up and .mac_link_down are changed for AML 25G/10G NICs,
the SR-IOV related function should be invoked in these new functions, to
bring VFs link up.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/BA8B302B7AAB6EA6+20250521064402.22348-10-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Implement PTP for AML devices

Support PTP clock and 1PPS output signal for AML devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/F2F6E5E8899D2C20+20250521064402.22348-9-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Restrict the use of mismatched FW versions

The new added mailbox commands require a new released firmware version.
Otherwise, a lot of logs "Unknown FW command" would be printed. And the
devices may not work properly. So add the test command in the probe
function.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/18283F17BE0FA335+20250521064402.22348-8-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Correct the currect link settings

For AML 25G/10G devices, some of the information returned from
phylink_ethtool_ksettings_get() is not correct, since there is a
fixed-link mode. So add additional corrections.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/C94BF867617C544D+20250521064402.22348-7-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Support to handle GPIO IRQs for AML devices

The driver needs to handle GPIO interrupts to identify SFP module and
configure PHY by sending mailbox messages to firmware.

Since the SFP module needs to wait for ready to get information when it
is inserted, workqueue is added to handle delayed tasks. And each SW-FW
interaction takes time to wait, so they are processed in the workqueue
instead of IRQ handler function.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/399624AF221E8E28+20250521064402.22348-6-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Implement PHYLINK for AML 25G/10G devices

There is a new PHY attached to AML 25G/10G NIC, which is different from
SP 10G/1G NIC. But the PHY configuration is handed over to firmware, and
also I2C is controlled by firmware. So the different PHYLINK fixed-link
mode is added for these devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/987B973A5929CD48+20250521064402.22348-5-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Distinguish between 40G and 25G devices

For the following patches to support PHYLINK for AML 25G devices,
separate MAC type wx_mac_aml40 to maintain the driver of 40G devices.
Because 40G devices will complete support later, not now.

And this patch makes the 25G devices use some PHYLINK interfaces, but it
is not yet create PHYLINK and cannot be used on its own. It is just
preparation for the next patches.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/592B1A6920867D0C+20250521064402.22348-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wangxun: Use specific flag bit to simplify the code

Most of the different code that requires MAC type in the common library
is due to NGBE only supports a few queues and pools, unlike TXGBE, which
supports 128 queues and 64 pools. This difference accounts for most of
the hardware configuration differences in the driver code. So add a flag
bit "WX_FLAG_MULTI_64_FUNC" for them to clean-up the driver code.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/C731132E124D75E5+20250521064402.22348-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: Remove specified SP type

Since AML devices are going to reuse some definitions, remove the "SP"
qualifier from these definitions.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8EF712EC14B8FF70+20250521064402.22348-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>