www.infradead.org Git - users/hch/misc.git/log

dql: Fix dql->limit value when reset.

Executing dql_reset after setting a non-zero value for limit_min can
lead to an unreasonable situation where dql->limit is less than
dql->limit_min.

For instance, after setting
/sys/class/net/eth*/queues/tx-0/byte_queue_limits/limit_min,
an ifconfig down/up operation might cause the ethernet driver to call
netdev_tx_reset_queue, which in turn invokes dql_reset.

In this case, dql->limit is reset to 0 while dql->limit_min remains
non-zero value, which is unexpected. The limit should always be
greater than or equal to limit_min.

Signed-off-by: Jing Su <jingsusu@didiglobal.com>
Link: https://patch.msgid.link/Z9qHD1s/NEuQBdgH@pilot-ThinkCentre-M930t-N000
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-mixed-select-polling-mode-for-tcp-ao-tests'

Dmitry Safonov via says:

====================
selftests/net: Mixed select()+polling mode for TCP-AO tests

Should fix flaky tcp-ao/connect-deny-ipv6 test.

v1: https://lore.kernel.org/20250312-tcp-ao-selftests-polling-v1-0-72a642b855d5@gmail.com
====================

Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-0-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Drop timeout argument from test_client_verify()

It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test,
that is more paranoia-long timeout rather than based on requirements.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Delete timeout from test_connect_socket()

Unused: it's always either the default timeout or asynchronous
connect().

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print the testing side in unsigned-md5

As both client and server print the same test name on failure or pass,
add "[server]" so that it's more obvious from a log which side printed
"ok" or "not ok".

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Add mixed select()+polling mode to TCP-AO tests

Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one,
TEST_RETRANSMIT_SEC is used for operations that are expected to succeed
in order for a test to pass. It is usually not consumed and exists only
to avoid indefinite test run if the operation didn't complete.
The second one, TEST_RETRANSMIT_SEC exists for the tests that checking
operations, that are expected to fail/timeout. It is shorter as it is
fully consumed, with an expectation that if operation didn't succeed
during that period, it will timeout. And the related test that expects
the timeout is passing. The actual operation failure is then
cross-verified by other means like counters checks.

The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
initial TCP timeout. So, in case the initial segment gets lost (quite
unlikely on local veth interface between two net namespaces, yet happens
in slow VMs), the retransmission never happens and as a result, the test
is not actually testing the functionality. Which in the end fails
counters checks.

As I want tcp_ao selftests to be fast and finishing in a reasonable
amount of time on manual run, I didn't consider increasing
TEST_RETRANSMIT_SEC.

Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
to make the initial TCP timeout shorter. But as it's not a socket bpf
attached thing, but sock_ops (attaches to cgroups), the selftests would
have to use libbpf, which I wanted to avoid if not absolutely required.

Instead, use a mixed select() and counters polling mode with the longer
TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
actually not only allows losing segments and succeeding after
the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
the tests expecting timeout/failure pass faster.

The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
"wrong snd id", which checks for no key on SYN-ACK for which there is no
counter in the kernel (see tcp_make_synack()). Yet it can be speed up
by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).

Fixes: ed9d09b309b1 ("selftests/net: Add a test for TCP-AO keys matching")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Fetch and check TCP-MD5 counters

There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests
that can benefit from checking the related counters, not only from
validating operations timeouts.

It also prepares the code for introduction of mixed select()+poll mode,
see the follow-up patches.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Provide tcp-ao counters comparison helper

Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and
test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they
are asserts, rather than just compare functions.

Provide test_cmp_counters() helper, that's going to be used to compare
ao_info and netns counters as a stop condition for polling the sockets.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Print TCP flags in more common format

Before:
># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

After:
># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

For the history, I think the initial format was to emphasize the absence
of flags as well as their presence (!R meant no RST flag). But looking
again, it's just unreadable and hard to understand.
Make it the standard/expected one.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: devlink: add missing board-serial-number

Add a missing attribute of board serial number.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250320085947.103419-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-add-missing-metadata-support-for-some-xdp-drvs'

Lorenzo Bianconi says:

====================
net: xdp: Add missing metadata support for some xdp drvs

Introduce missing metadata support for some xdp drivers setting metadata
size building the skb from xdp_buff.
Please note most of the drivers are just compile tested.

v1: https://lore.kernel.org/20250311-mvneta-xdp-meta-v1-0-36cf1c99790e@kernel.org
====================

Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-0-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: cpsw: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in cpsw/cpsw_new
drivers. ti cpsw and cpsw_new drivers set xdp headroom at least to
CPSW_HEADROOM_NA:

CPSW_HEADROOM_NA max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-7-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mana driver.
mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is
large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mediatek: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mediatek driver.
mtk_eth_soc driver sets xdp headroom to XDP_PACKET_HEADROOM so the
headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-5-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: octeontx2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in octeontx2 driver.
octeontx2 driver sets xdp headroom to OTX2_HEAD_ROOM

OTX2_HEAD_ROOM OTX2_ALIGN
OTX2_ALIGN 128

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-4-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netsec: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in netsec driver.
netsec driver sets xdp headroom to NETSEC_RXBUF_HEADROOM:

NETSEC_RXBUF_HEADROOM max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-3-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvpp2: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvpp2 driver
mvpp2 driver sets xdp headroom to:

MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM

where

MVPP2_MH_SIZE 2
MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)

so the headroom is large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-2-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: Add metadata support for xdp mode

Set metadata size building the skb from xdp_buff in mvneta driver
mvneta sets xdp headroom to:

MVNETA_MH_SIZE + MVNETA_SKB_HEADROOM

where

MVNETA_MH_SIZE 2
MVNETA_SKB_HEADROOM max(NET_SKB_PAD, XDP_PACKET_HEADROOM)

so the headroom is large enough to contain xdp_frame and xdp metadata.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-1-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tulip: avoid unused variable warning

There is an effort to achieve W=1 kernel builds without warnings.
As part of that effort Helge Deller highlighted the following warnings
in the tulip driver when compiling with W=1 and CONFIG_TULIP_MWI=n:

.../tulip_core.c: In function ‘tulip_init_one’:
.../tulip_core.c:1309:22: warning: variable ‘force_csr0’ set but not used

This patch addresses that problem using IS_ENABLED(). This approach has
the added benefit of reducing conditionally compiled code. And thus
increasing compile coverage. E.g. for allmodconfig builds which enable
CONFIG_TULIP_MWI.

Compile tested only.
No run-time effect intended.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318-tulip-w1-v3-1-a813fadd164d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'af_unix-clean-up-headers'

Kuniyuki Iwashima says:

====================
af_unix: Clean up headers.

AF_UNIX files include many unnecessary headers (netdevice.h and
rtnetlink.h, etc), and this series cleans them up.

Note that there are still some headers included indirectly and
modifying them triggers rebuild, which seems mostly inevitable. [0]

  $ python3 include_graph.py net/unix/garbage.c linux/rtnetlink.h linux/netdevice.h
  ...
  include/net/af_unix.h
  | include/linux/net.h
  | | include/linux/once.h
  | | include/linux/sockptr.h
  | | include/uapi/linux/net.h
  | include/net/sock.h
  | | include/linux/netdevice.h   <---
  ...
  | | include/net/dst.h
  | | | include/linux/rtnetlink.h <---

[0]: https://gist.github.com/q2ven/9c5897f11a493145829029c0bfb364d0
====================

Link: https://patch.msgid.link/20250318034934.86708-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Clean up #include under net/unix/.

net/unix/*.c include many unnecessary header files (rtnetlink.h,
netdevice.h, etc).

Let's clean them up.

af_unix.c:

  +uapi/linux/sockios.h   : Only exist under include/uapi
  +uapi/linux/termios.h   : Only exist under include/uapi

  -linux/freezer.h        : No longer use freezable_schedule_timeout()
  -linux/in.h             : No ipv4_is_XXX() etc
  -linux/module.h         : No longer support CONFIG_UNIX=m
  -linux/netdevice.h      : No dev used
  -linux/rtnetlink.h      : Not part of rtnetlink API
  -linux/signal.h         : signal_pending() is defined in sched/signal.h
  -linux/stat.h           : No struct stat used
  -net/checksum.h         : CHECKSUM_UNNECESSARY is defined in skbuff.h

diag.c:

  +linux/dcache.h         : struct dentry in sk_diag_dump_vfs()
  +linux/user_namespace.h : struct user_namespace in sk_diag_dump_uid()
  +uapi/linux/unix_diag.h : Only exist under include/uapi/

garbage.c:

  +linux/list.h           : struct unix_{vertex,edge}, etc
  +linux/workqueue.h      : DECLARE_WORK(unix_gc_work, ...)

  -linux/file.h           : No fget() etc
  -linux/kernel.h         : No cond_resched() etc
  -linux/netdevice.h      : No dev used
  -linux/proc_fs.h        : No procfs provided
  -linux/string.h         : No memcpy(), kmemdup(), etc

sysctl_net_unix.c:

  +linux/string.h         : kmemdup()
  +net/net_namespace.h    : struct net, net_eq()

  -linux/mm.h             : slab.h is enough

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250318034934.86708-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Explicitly include headers for non-pointer struct fields.

include/net/af_unix.h indirectly includes some definitions for structs.

Let's include such headers explicitly.

  linux/atomic.h   : scm_stat.nr_fds
  linux/net.h      : unix_sock.peer_wq
  linux/path.h     : unix_sock.path
  linux/spinlock.h : unix_sock.lock
  linux/wait.h     : unix_sock.peer_wake
  uapi/linux/un.h  : unix_address.name[]

linux/socket.h is removed as the structs there are not used directly,
and linux/un.h is clarified with uapi as un.h only exists under
include/uapi.

While at it, duplicate headers are removed from .c files.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250318034934.86708-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Move internal definitions to net/unix/.

net/af_unix.h is included by core and some LSMs, but most definitions
need not be.

Let's move struct unix_{vertex,edge} to net/unix/garbage.c and other
definitions to net/unix/af_unix.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Sort headers.

This is a prep patch to make the following changes cleaner.

No functional change intended.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-tcp_rto_min_us-and-tcp_delack_max_us-for-set-getsockopt'

Jason Xing says:

====================
support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt

Add set/getsockopt supports for TCP_RTO_MIN_US and TCP_DELACK_MAX_US.
====================

Link: https://patch.msgid.link/20250317120314.41404-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_DELACK_MAX_US for set/getsockopt use

Support adjusting/reading delayed ack max for socket level by using
set/getsockopt().

This option aligns with TCP_BPF_DELACK_MAX usage. Considering that bpf
option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt()
happens to write one value to icsk_delack_max while icsk_delack_max is
being read.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_RTO_MIN_US for set/getsockopt use

Support adjusting/reading RTO MIN for socket level by using set/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.

This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-add-vxlan-to-the-same-hardware-domain-as-physical-bridge-ports'

Petr Machata says:

====================
mlxsw: Add VXLAN to the same hardware domain as physical bridge ports

Amit Cohen writes:

Packets which are trapped to CPU for forwarding in software data path
are handled according to driver marking of skb->offload_{,l3}_fwd_mark.
Packets which are marked as L2-forwarded in hardware, will not be flooded
by the bridge to bridge ports which are in the same hardware domain as the
ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some
scenarios this can result in remote VTEPs receiving duplicate packets.

To solve such packets duplication, add VXLAN bridge ports to the same
hardware domain as other bridge ports.

One complication is ARP suppression which requires the local VTEP to avoid
flooding ARP packets to remote VTEPs if the local VTEP is able to reply on
behalf of remote hosts. This is currently implemented by having the device
flood ARP packets in hardware and trapping them during VXLAN encapsulation,
but marking them with skb->offload_fwd_mark=1 so that the bridge will not
re-flood them to physical bridge ports.

The above scheme will break when VXLAN bridge ports are added to the same
hardware domain as physical bridge ports as ARP packets that cannot be
suppressed by the bridge will not be able to egress the VXLAN bridge ports
due to hardware domain filtering. This is solved by trapping ARP packets
when they enter the device and not marking them as being forwarded in
hardware.

Patch set overview:
Patch #1 sets hardware to trap ARP packets at layer 2
Patches #2-#4 are preparations for setting hardwarwe domain of VXLAN
Patch #5 sets hardware domain of VXLAN
Patch #6 extends VXLAN flood test to verify that this set solves the
packets duplication
====================

Link: https://patch.msgid.link/cover.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: vxlan_bridge: Test flood with unresolved FDB entry

Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge ports

When hardware floods packets to bridge ports, but flooding to VXLAN bridge
port fails during encapsulation to one of the remote VTEPs, the packets are
trapped to CPU. In such case, the packets are marked with
skb->offload_fwd_mark, which means that packet was L2-forwarded in
hardware. Software data path repeats flooding, but packets which are
marked with skb->offload_fwd_mark will not be flooded by the bridge to
bridge ports which are in the same hardware domain as the ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some scenarios
(as mentioned above) this can result in remote VTEPs receiving duplicate
packets. The packets are first flooded by hardware and after an
encapsulation failure, they are flooded again to all remote VTEPs by
software.

Solve this by adding VXLAN bridge ports to the same hardware domain as
physical bridge ports, so then nbp_switchdev_allowed_egress() will return
false also for VXLAN, and packets will not be sent twice from VXLAN device.

switchdev_bridge_port_offload() should get vxlan_dev not as const, so
some changes are required. Call switchdev API from
mlxsw_sp_bridge_vxlan_{join,leave}() which handle offload configurations.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/
Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Move mlxsw_sp_bridge_vxlan_join()

Next patch will call __mlxsw_sp_bridge_vxlan_leave() from
mlxsw_sp_bridge_vxlan_join() as part of error flow, move the function to
be able to call the second one.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Add an internal API for VXLAN leave

There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.

This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.

Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().

No functional changes intended.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum: Call mlxsw_sp_bridge_vxlan_{join, leave}() for VLAN-aware bridge

mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the
bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a
VLAN, but at this point no VLANs are configured on the VxLAN device. This
means that we can call the APIs, but there is no point to do that, as they
do not configure anything in such cases.

Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.

Align the call to mlxsw_sp_bridge_vxlan_leave() to be called like
mlxsw_sp_bridge_vxlan_join(), only in case that the VXLAN device is up,
so move the check to be done before calling
mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing
behavior, as there is a similar check inside mlxsw_sp_bridge_vxlan_leave().

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Trap ARP packets at layer 2 instead of layer 3

Next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software when
they were already forwarded by hardware.

ARP packets are not flooded by hardware to VXLAN, so software should handle
such flooding. When hardware domain of VXLAN device will be changed, ARP
packets which are trapped and marked with offload_fwd_mark will not be
flooded to VXLAN also in software, which will break VXLAN traffic.

To prevent such breaking, trap ARP packets at layer 2 and don't mark them
as L2-forwarded in hardware, then flooding ARP packets will be done only
in software, and VXLAN will send ARP packets.

Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are
trapped when they enter the device.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce per netns packet chains

Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.

Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.

While at that drop the global ptype_all list: no in kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).

Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing the per netns lists).

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tty: caif: removed unused function debugfs_tx()

Remove debugfs_tx() which was added when the caif driver was added in
commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
but it has never been used.

Flagged by LLVM 19.1.7 W=1 builds.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: Drop unused of_gpio.h

of_gpio.h is deprecated. Since there is no of_gpio_x API, drop
unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if
no user in driver.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: transition to the faux device interface

The net fixed phy driver does not require the creation of a platform
device. Originally, this approach was chosen for simplicity when the
driver was first implemented.

With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the device to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will get rid of the fake platform device.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-cleanups-2025-03-19'

Tariq Toukan says:

====================
mlx5 cleanups 2025-03-19

This series contains small cleanups to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Always select CONFIG_PAGE_POOL_STATS

Always set PAGE_POOL_STATS in mlx5 Eth driver.
Cleanup the corresponding #ifdefs.

Page pool stats are essential to monitor and analyze RX performance.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use right API to free bitmap memory

Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove NULL check before dev_{put, hold}

Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: Remove unused function pointer from phylink structure

From what I can tell the .get_fixed_state pointer in the phylink structure
hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate
phylink_fixed_state_cb()") . Since I can't find any users for it we might
as well just drop the pointer.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Eliminate redundant assignment

The assignment of zero to udph->check is unnecessary as it is
immediately overwritten in the subsequent line. Remove the redundant
assignment.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Call xpcs_config_eee_mult_fact() only when xpcs is present

Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs.

Don't call xpcs_config_eee_mult_fact() in this case, as this causes a
crash at init :

Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write

[...]

Call trace:
  xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c
  stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244
  stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc
  socfpga_dwmac_probe from platform_probe+0x5c/0xb0

Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_ctx: Don't assume indirection table is present

The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs/kcm: Fix typo "BFP"

'BFP' should be 'BPF'.

Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8169-enable-more-devices-aspm-support'

ChunHao Lin says:

====================
r8169: enable more devices ASPM support

This series of patches will enable more devices ASPM support.
It also fix a RTL8126 cannot enter L1 substate issue when ASPM is
enabled.
====================

Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable RTL8126 ZRX-DC timeout

Disable it due to it dose not meet ZRX-DC specification. If it is enabled,
device will exit L1 substate every 100ms. Disable it for saving more power
in L1 substate.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: enable RTL8168H/RTL8168EP/RTL8168FP ASPM support

This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on
the platforms that have tested with ASPM enabled.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: strparser: Fix a typo

The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.

Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.

This typo was caught by AI during code review.

Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: fix the path of example code and example commands for device memory TCP

This updates the old path and fixes the description of unavailable options.

Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp/dccp: Remove inet_connection_sock_af_ops.addr2sockaddr().

inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all
in the git era.

$ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1)

Let's remove it.

Note that there was a 4 bytes hole after sockaddr_len and now it's
6 bytes, so the binary layout is not changed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: net: update proc_net_pktgen (add more imix_weights test cases)

Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pktgen: add strict buffer parsing index check

Add strict buffer parsing index check to avoid the following Smatch
warning:

net/core/pktgen.c:877 get_imix_entries()
warn: check that incremented offset 'i' is capped

Checking the buffer index i after every get_user/i++ step and returning
with error code immediately avoids the current indirect (but correct)
error handling.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netdev/36cf3ee2-38b1-47e5-a42a-363efeb0ace3@stanley.mountain/
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-1-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: adjust the file entry in INTEL PMC CORE DRIVER

Commit 7e2f7e25f6ff ("arch: x86: add IPC mailbox accessor function and add
SoC register access") adds a new file entry referring to the non-existent
file linux/platform_data/x86/intel_pmc_ipc.h in section INTEL PMC CORE
DRIVER rather than referring to the file
include/linux/platform_data/x86/intel_pmc_ipc.h added with this commit.
Note that it was missing 'include' in the beginning.

Adjust the file reference to the intended file.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250317092717.322862-1-lukas.bulwahn@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move icsk_clean_acked to a better location

As a followup of my presentation in Zagreb for netdev 0x19:

icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE
is enabled from tcp_ack().

Rename it to tcp_clean_acked, move it to tcp_sock structure
in the tcp_sock_read_rx for better cache locality in TCP
fast path.

Define this field only when CONFIG_TLS_DEVICE is enabled
saving 8 bytes on configs not using it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83822: fix transmit amplitude if CONFIG_OF_MDIO not defined

When CONFIG_OF_MDIO is not defined the index for selecting the transmit
amplitude voltage for 100BASE-TX is set to 0, but it should be -1, if there
is no need to modify the transmit amplitude voltage. Move initialization of
the index from dp83822_of_init to dp8382x_probe.

Fixes: 4f3735e82d8a ("net: phy: dp83822: Add support for changing the transmit amplitude voltage")
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Link: https://patch.msgid.link/20250317-dp83822-fix-transceiver-mdio-v2-1-fb09454099a4@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: resolve WARN_ON when freeing IRQs

When IRQs are freed, a WARN_ON is triggered as the
affinity notifier is not released.
This results in the below stack trace:

[  484.544586]  ? __warn+0x84/0x130
[  484.544843]  ? free_irq+0x5c/0x70
[  484.545105]  ? report_bug+0x18a/0x1a0
[  484.545390]  ? handle_bug+0x53/0x90
[  484.545664]  ? exc_invalid_op+0x14/0x70
[  484.545959]  ? asm_exc_invalid_op+0x16/0x20
[  484.546279]  ? free_irq+0x5c/0x70
[  484.546545]  ? free_irq+0x10/0x70
[  484.546807]  ena_free_io_irq+0x5f/0x70 [ena]
[  484.547138]  ena_down+0x250/0x3e0 [ena]
[  484.547435]  ena_destroy_device+0x118/0x150 [ena]
[  484.547796]  __ena_shutoff+0x5a/0xe0 [ena]
[  484.548110]  pci_device_remove+0x3b/0xb0
[  484.548412]  device_release_driver_internal+0x193/0x200
[  484.548804]  driver_detach+0x44/0x90
[  484.549084]  bus_remove_driver+0x69/0xf0
[  484.549386]  pci_unregister_driver+0x2a/0xb0
[  484.549717]  ena_cleanup+0xc/0x130 [ena]
[  484.550021]  __do_sys_delete_module.constprop.0+0x176/0x310
[  484.550438]  ? syscall_trace_enter+0xfb/0x1c0
[  484.550782]  do_syscall_64+0x5b/0x170
[  484.551067]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Adding a call to `netif_napi_set_irq` with -1 as the IRQ index,
which frees the notifier.

Fixes: de340d8206bf ("net: ena: use napi's aRFS rmap notifers")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250317071147.1105-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: openvswitch: fix kernel-doc warnings in internal headers

Some field descriptions were missing, some were not very accurate.
Not touching the uAPI header or .c files for now.

Formatting of those comments isn't great in general, but at least
they are not missing anything now.

Before:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l
  16

After:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l
  0

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250320224431.252489-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: phy_interface_t: Fix RGMII_TXID code comment

Fix copy-paste error in the code comment for Interface Mode definitions.
The code refers to Internal TX delay, not Internal RX delay. It was likely
copied from the line above this one.

Signed-off-by: Ihor Matushchak <ihor.matushchak@foobox.net>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250316071551.9794-1-ihor.matushchak@foobox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: disable PHY-mode EEE

Realtek RTL8211F has a "PHY-mode" EEE support which interferes with an
IEEE 802.3 compliant implementation. This mode defaults to enabled, and
results in the MAC receive path not seeing the link transition to LPI
state.

Fix this by disabling PHY-mode EEE.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: fix genphy_c45_eee_is_active() for disabled EEE

Commit 809265fe96fe ("net: phy: c45: remove local advertisement
parameter from genphy_c45_eee_is_active") stopped reading the local
advertisement from the PHY earlier in this development cycle, which
broke "ethtool --set-eee ethX eee off".

When ethtool is used to set EEE off, genphy_c45_eee_is_active()
indicates that EEE was active if the link partner reported an
advertisement, which causes phylib to set phydev->enable_tx_lpi on
link up, despite our local advertisement in hardware being empty.
However, phydev->advertising_eee is preserved while EEE is turned off,
which leads to genphy_c45_eee_is_active() incorrectly reporting that
EEE is active.

Fix it by checking phydev->eee_cfg.eee_enabled, and if clear,
immediately indicate that EEE is not active.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mlx5e-support-recovery-counter-in-reset'

Tariq Toukan says:

====================
mlx5e: Support recovery counter in reset

This series by Yael adds a recovery counter in ethtool, for any recovery
type during port reset cycle.
Series starts with some cleanup and refactoring patches.
New counter is added and exposed to ethtool stats in patch #4.
====================

Link: https://patch.msgid.link/1742112876-2890-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Expose port reset cycle recovery counter via ethtool

Display recovery event of PPCNT recovery counters group. Counts (per
link) the number of total successful recovery events of any recovery
types during port reset cycle.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Get counter group size by FW capability

Retrieve the number of fields supported by each PPCNT counter group
based on the FW capability for this group.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Access PHY layer counter group as other counter groups

Adjust the way physical layer counters group is accessed to match the
generic method used for accessing other PPCNT counter groups.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Ensure each counter group uses its PCAM bit

The code was incorrectly relying on PCAM bit of ppcnt_statistical_group
for accessing per_lane_error_counters.
If ppcnt_statistical_group PCAM bit was not set, we would not read
per_lane_error_counters, even when its PCAM bit is set.
Given the existing device capabilities, it seems to cause no harm, so
this change primarily serves as cleanup.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sockopt: fix getting freebind & transparent

When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.

IP(V6)_FREEBIND and IP(V6)_TRANSPARENT support for the setsockopt part
has been added a while ago, but it looks like the get part got
forgotten. It should have been present as a way to verify a setting has
been set as expected, and not to act differently from TCP or any other
socket types.

Everything was in place to expose it, just the last step was missing.
Only new code is added to cover these specific getsockopt(), that seems
safe.

Fixes: c9406a23c116 ("mptcp: sockopt: add SOL_IP freebind & transparent options")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-3-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sockopt: fix getting IPV6_V6ONLY

When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.

IPV6_V6ONLY support for the setsockopt part has been added a while ago,
but it looks like the get part got forgotten. It should have been
present as a way to verify a setting has been set as expected, and not
to act differently from TCP or any other socket types.

Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want
to check the default value, before doing extra actions. On Linux, the
default value is 0, but this can be changed with the net.ipv6.bindv6only
sysctl knob. On Windows, it is set to 1 by default. So supporting the
get part, like for all other socket options, is important.

Everything was in place to expose it, just the last step was missing.
Only new code is added to cover this specific getsockopt(), that seems
safe.

Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'netconsole-add-support-for-userdata-release'

Breno Leitao says:

====================
netconsole: Add support for userdata release

I am submitting a series of patches that introduce a new feature for the
netconsole subsystem, specifically the addition of the 'release' field
to the sysdata structure. This feature allows the kernel release/version
to be appended to the userdata dictionary in every message sent,
enhancing the information available for debugging and monitoring
purposes.

This complements the already supported release prepend feature, which
was added some time ago. The release prepend appends the release
information at the message header, which is not ideal for two reasons:

1) It is difficult to determine if a message includes this information,
making it hard and resource-intensive to parse.

2) When a message is fragmented, the release information is appended to
every message fragment, consuming valuable space in the packet.

The "release prepend" feature was created before the concept of userdata
and sysdata. Now that this format has proven successful, we are
implementing the release feature as part of this enhanced structure.

This patch series aims to improve the netconsole subsystem by providing
a more efficient and user-friendly way to include kernel release
information in messages. I believe these changes will significantly aid
in system analysis and troubleshooting.

Suggested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
====================

Link: https://patch.msgid.link/20250314-netcons_release-v1-0-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

docs: netconsole: document release feature

Add documentation explaining the kernel release auto-population feature
in netconsole.

This feature appends kernel version information to the userdata
dictionary in every message sent when enabled via the `release_enabled`
file in the configfs hierarchy.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: netconsole: Add tests for 'release' feature in sysdata

Expands the self-tests to include the 'release' feature in
sysdata.

Verifies that enabling the 'release' feature appends the
correct data and ensures that disabling it functions as expected.

When enabled, the message should have an item similar to in the
userdata: `release=$(uname -r)`

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: append release to sysdata

Append the init_utsname()->release to sysdata buffer before sending the
message in case the feature is set.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-4-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: add 'sysdata' suffix to related functions

This commit appends a common "sysdata" suffix to functions responsible
for appending data to sysdata.

This change enhances code clarity and prevents naming conflicts with
other "append" functions, particularly in anticipation of the upcoming
inclusion of the `release` field in the next patch.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-3-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: implement configfs for release_enabled

Implement the configfs helpers to show and set release_enabled configfs
directories under userdata.

When enabled, set the feature bit in netconsole_target->sysdata_fields.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-2-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: introduce 'release' as a new sysdata field

This commit adds a new feature to the sysdata structure, allowing the
kernel release/version to be appended as part of sysdata. Additionally,
it updates the logic to count this new field as a used entry when
enabled.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-1-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: fix CONFIG_DEBUG_FS check

The #if check causes a build failure when CONFIG_DEBUG_FS is turned
off:

In file included from drivers/net/ethernet/airoha/airoha_eth.c:17:
drivers/net/ethernet/airoha/airoha_eth.h:543:5: error: "CONFIG_DEBUG_FS" is not defined, evaluates to 0 [-Werror=undef]
543 | #if CONFIG_DEBUG_FS
| ^~~~~~~~~~~~~~~

Replace it with the correct #ifdef.

Fixes: 3fe15c640f38 ("net: airoha: Introduce PPE debugfs support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250314155009.4114308-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mctp: Remove unnecessary cast in mctp_cb

The void * cast in mctp_cb is unnecessary as it's already been done
at the start of the function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/Z9PwOQeBSYlgZlHq@gondor.apana.org.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-phy-remove-calls-to-devm_hwmon_sanitize_name'

Heiner Kallweit says:

====================
net: phy: remove calls to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.
====================

Link: https://patch.msgid.link/198f3cd0-6c39-4783-afe7-95576a4b8539@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: marvell-88q2xxx: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/59c485e4-983c-42f6-9114-916703a62e3f@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mxl-gpy: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/e34c4802-20ce-4556-a47c-812e602e8526@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: tja11xx: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Note that neither priv->hwmon_name nor priv->hwmon_dev are used
outside tja11xx_hwmon_register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/4452cb7e-1a2f-4213-b49f-9de196be9204@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: realtek: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/6e8d26f4-8d0a-4c83-aec3-378847a377eb@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: remove sb1000 cable modem driver

This one is hilariously outdated, it provided a faster downlink over
TV cable for users of analog modems in the 1990s, through an ISA card.

The web page for the userspace tools has been broken for 25 years, and
the driver has only ever seen mechanical updates.

Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 4 non-merge commits during the last 3 day(s) which contain
a total of 2 files changed, 35 insertions(+), 12 deletions(-).

The main changes are:

1) bpf_getsockopt support for TCP_BPF_RTO_MIN and TCP_BPF_DELACK_MAX,
   from Jason Xing

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_DELACK_MAX
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_RTO_MIN
  tcp: bpf: Introduce bpf_sol_tcp_getsockopt to support TCP_BPF flags
====================

Link: https://patch.msgid.link/20250313221620.2512684-1-martin.lau@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.14-rc8).

Conflict:

tools/testing/selftests/net/Makefile
  03544faad761 ("selftest: net: add proc_net_pktgen")
  3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")

tools/testing/selftests/net/config:
  85cb3711acb8 ("selftests: net: Add test cases for link and peer netns")
  3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")

Adjacent commits:

tools/testing/selftests/net/Makefile
  c935af429ec2 ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY")
  355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, bluetooth and ipsec.

  This contains a last minute revert of a recent GRE patch, mostly to
  allow me stating there are no known regressions outstanding.

  Current release - regressions:

   - revert "gre: Fix IPv6 link-local address generation."

   - eth: ti: am65-cpsw: fix NAPI registration sequence

  Previous releases - regressions:

   - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().

   - mptcp: fix data stream corruption in the address announcement

   - bluetooth: fix connection regression between LE and non-LE adapters

   - can:
       - flexcan: only change CAN state when link up in system PM
       - ucan: fix out of bound read in strscpy() source

  Previous releases - always broken:

   - lwtunnel: fix reentry loops

   - ipv6: fix TCP GSO segmentation with NAT

   - xfrm: force software GSO only in tunnel mode

   - eth: ti: icssg-prueth: add lock to stats

  Misc:

   - add Andrea Mayer as a maintainer of SRv6"

* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
  MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
  Revert "gre: Fix IPv6 link-local address generation."
  Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
  net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
  tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
  mptcp: Fix data stream corruption in the address announcement
  selftests: net: test for lwtunnel dst ref loops
  net: ipv6: ioam6: fix lwtunnel_output() loop
  net: lwtunnel: fix recursion loops
  net: ti: icssg-prueth: Add lock to stats
  net: atm: fix use after free in lec_send()
  xsk: fix an integer overflow in xp_create_and_assign_umem()
  net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
  selftests: drv-net: use defer in the ping test
  phy: fix xa_alloc_cyclic() error handling
  dpll: fix xa_alloc_cyclic() error handling
  devlink: fix xa_alloc_cyclic() error handling
  ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
  ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
  net: ipv6: fix TCP GSO segmentation with NAT
  ...

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Collected driver fixes from the last few weeks, I was surprised how
  significant many of them seemed to be.

   - Fix rdma-core test failures due to wrong startup ordering in rxe

   - Don't crash in bnxt_re if the FW supports more than 64k QPs

   - Fix wrong QP table indexing math in bnxt_re

   - Calculate the max SRQs for userspace properly in bnxt_re

   - Don't try to do math on errno for mlx5's rate calculation

   - Properly allow userspace to control the VLAN in the QP state during
     INIT->RTR for bnxt_re

   - 6 bug fixes for HNS:
      - Soft lockup when processing huge MRs, add a cond_resched()
      - Fix missed error unwind for doorbell allocation
      - Prevent bad send queue parameters from userspace
      - Wrong error unwind in qp creation
      - Missed xa_destroy during driver shutdown
      - Fix reporting to userspace of max_sge_rd, hns doesn't have a
        read/write difference"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Fix wrong value of max_sge_rd
  RDMA/hns: Fix missing xa_destroy()
  RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common()
  RDMA/hns: Fix invalid sq params not being blocked
  RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db()
  RDMA/hns: Fix soft lockup during bt pages loop
  RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path
  RDMA/mlx5: Handle errors returned from mlx5r_ib_rate()
  RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chips
  RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx
  RDMA/bnxt_re: Fix allocation of QP table
  RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests

Merge tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

- sdhci-brcmstb: Fix CQE suspend/resume support

- atmel-mci: Add a missing clk_disable_unprepare() in ->probe()

* tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-brcmstb: add cqhci suspend/resume to PM ops
mmc: atmel-mci: Add missing clk_disable_unprepare()

Merge tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:
"Here's a final batch of EFI fixes for v6.14.

  The efivarfs ones are fixes for changes that were made this cycle.
  James's fix is somewhat of a band-aid, but it was blessed by the VFS
  folks, who are working with James to come up with something better for
  the next cycle.

   - Avoid physical address 0x0 for random page allocations

   - Add correct lockdep annotation when traversing efivarfs on resume

   - Avoid NULL mount in kernel_file_open() when traversing efivarfs on
     resume"

* tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: fix NULL dereference on resume
  efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume
  efi/libstub: Avoid physical address 0x0 when doing random allocation

MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6

Andrea has made significant contributions to SRv6 support in Linux.
Acknowledge the work and on-going interest in Srv6 support with a
maintainers entry for these files so hopefully he is included
on patches going forward.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250312092212.46299-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'gre-revert-ipv6-link-local-address-fix'

Guillaume Nault says:

====================
gre: Revert IPv6 link-local address fix.

Following Paolo's suggestion, let's revert the IPv6 link-local address
generation fix for GRE devices. The patch introduced regressions in the
upstream CI, which are still under investigation.

Start by reverting the kselftest that depend on that fix (patch 1), then
revert the kernel code itself (patch 2).
====================

Link: https://patch.msgid.link/cover.1742418408.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "gre: Fix IPv6 link-local address generation."

This reverts commit 183185a18ff96751db52a46ccf93fff3a1f42815.

This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.

Fixes: 183185a18ff9 ("gre: Fix IPv6 link-local address generation.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."

This reverts commit 6f50175ccad4278ed3a9394c00b797b75441bd6e.

Commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation.") is
going to be reverted. So let's revert the corresponding kselftest
first.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2025-03-19

1) Fix tunnel mode TX datapath in packet offload mode
   by directly putting it to the xmit path.
   From Alexandre Cassen.

2) Force software GSO only in tunnel mode in favor
   of potential HW GSO. From Cosmin Ratiu.

ipsec-2025-03-19

* tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm_output: Force software GSO only in tunnel mode
  xfrm: fix tunnel mode TX datapath in packet offload mode
====================

Link: https://patch.msgid.link/20250319065513.987135-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is batman-adv bugfix:

- Ignore own maximum aggregation size during RX, Sven Eckelmann

* tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge:
batman-adv: Ignore own maximum aggregation size during RX
====================

Link: https://patch.msgid.link/20250318150035.35356-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES

Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.

Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tools headers: Sync uapi/asm-generic/socket.h with the kernel sources

This also fixes a wrong definitions for SCM_TS_OPT_ID & SO_RCVPRIORITY.

Accidentally found while working on another patchset.

Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>
Cc: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: a89568e9be75 ("selftests: txtimestamp: add SCM_TS_OPT_ID test")
Fixes: e45469e594b2 ("sock: Introduce SO_RCVPRIORITY socket option")
Link: https://lore.kernel.org/netdev/20250314195257.34854-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250314214155.16046-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: Fix data stream corruption in the address announcement

Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, group of
fields linked to different options are part of the same union.

There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not ended up sending an ADD_ADDR. Later on, back in
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.

Using an intermediate variable, prevents from corrupting previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.

Fixes: 1bff1e43a30e ("mptcp: optimize out option generation")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>