From db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e Mon Sep 17 00:00:00 2001
From: Mirco Barone
Date: Thu, 5 Jun 2025 14:06:16 +0200
Subject: [PATCH 01/16] wireguard: device: enable threaded NAPI
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any kind
of multi-tunnel deployment, regardless of whether the tunnels share the
same endpoints or not (i.e., a VPN concentrator type of gateway would
also be affected).

The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.

This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.

As a result, once two or more tunnel softirqs have been scheduled on the
same core, they remain pinned there until the surge ends.

In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.

The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.

On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.

We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.

More details are available in our Netdev paper.

Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld
Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski
---
 drivers/net/wireguard/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 3ffeeba5dccf4..4a529f1f9beab 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -366,6 +366,7 @@ static int wg_newlink(struct net_device *dev,
 	if (ret < 0)
 		goto err_free_handshake_queue;

+	dev_set_threaded(dev, true);
 	ret = register_netdevice(dev);
 	if (ret < 0)
 		goto err_uninit_ratelimiter;
--
2.51.0
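Note on the change above: threaded NAPI moves a device's poll loop out of softirq context into a per-NAPI kernel thread, which the scheduler can then migrate across cores. The minimal sketch below shows how a driver generally opts in on recent kernels; the function name is hypothetical and this is illustrative rather than the exact WireGuard code path, but netif_napi_add(), dev_set_threaded() and register_netdevice() are the real kernel APIs.

  #include <linux/netdevice.h>

  /* Hypothetical driver setup: register the poll handler and ask for
   * threaded polling before the netdev is registered, mirroring the
   * one-line change in the WireGuard patch above.
   */
  static int example_setup_napi(struct net_device *dev,
                                struct napi_struct *napi,
                                int (*poll)(struct napi_struct *, int))
  {
          netif_napi_add(dev, napi, poll);  /* attach the NAPI poll handler */
          dev_set_threaded(dev, true);      /* poll in a kthread, not softirq */
          return register_netdevice(dev);   /* caller must hold RTNL */
  }

The same switch can also be flipped at run time from user space via /sys/class/net/<dev>/threaded, which is convenient for A/B-testing the scheduling behaviour described in the changelog.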
From 7eb6b63aa3c3b406512db15943ce9eb114095b51 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Tue, 3 Jun 2025 17:16:52 -0700
Subject: [PATCH 02/16] selftests: drv-net: add configs for the TSO test

Add missing config options for the tso.py test, specifically to make
sure the kernel is built with vxlan and gre tunnels.

I noticed this while adding a TSO-capable QEMU device to the CI.
Previously we only ran virtio tests, and virtio doesn't report LSO
stats on the QEMU we have.

Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn
Link: https://patch.msgid.link/20250604001653.853008-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski
---
 tools/testing/selftests/drivers/net/hw/config | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/net/hw/config

diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config
new file mode 100644
index 0000000000000..88ae719e6f8f4
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/config
@@ -0,0 +1,5 @@
+CONFIG_IPV6=y
+CONFIG_IPV6_GRE=y
+CONFIG_NET_IPGRE=y
+CONFIG_NET_IPGRE_DEMUX=y
+CONFIG_VXLAN=y
--
2.51.0

From c68804c934e3197e34560744854c57cf88dff8e7 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Tue, 3 Jun 2025 18:20:31 -0700
Subject: [PATCH 03/16] selftests: drv-net: tso: fix the GRE device name

The device type for IPv4 GRE is "gre" not "ipgre", unlike for IPv6
which uses "ip6gre". Not sure how I missed this when writing the test,
perhaps because all HW I have access to is on an IPv6-only network.

Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Simon Horman
Reviewed-by: Willem de Bruijn
Link: https://patch.msgid.link/20250604012031.891242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski
---
 tools/testing/selftests/drivers/net/hw/tso.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py
index e1ecb92f79d9b..eec647e7ec19c 100755
--- a/tools/testing/selftests/drivers/net/hw/tso.py
+++ b/tools/testing/selftests/drivers/net/hw/tso.py
@@ -216,7 +216,7 @@ def main() -> None:
         ("", "6", "tx-tcp6-segmentation", None),
         ("vxlan", "", "tx-udp_tnl-segmentation", ("vxlan", True, "id 100 dstport 4789 noudpcsum")),
         ("vxlan_csum", "", "tx-udp_tnl-csum-segmentation", ("vxlan", False, "id 100 dstport 4789 udpcsum")),
-        ("gre", "4", "tx-gre-segmentation", ("ipgre", False, "")),
+        ("gre", "4", "tx-gre-segmentation", ("gre", False, "")),
         ("gre", "6", "tx-gre-segmentation", ("ip6gre", False, "")),
     )

--
2.51.0

From e6854be4d80ea266a7be64a65a0322bcdfa72807 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Tue, 3 Jun 2025 18:20:55 -0700
Subject: [PATCH 04/16] selftests: drv-net: tso: make bkg() wait for socat to quit

Commit 846742f7e32f ("selftests: drv-net: add a warning for bkg +
shell + terminate") added a warning for bkg() used with terminate=True.
The tso test was missed as we didn't have it running anywhere in NIPA.

Add exit_wait=True, to avoid:

  # Warning: combining shell and terminate is risky!
  # SIGTERM may not reach the child on zsh/ksh!

getting printed twice for every variant.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn
Link: https://patch.msgid.link/20250604012055.891431-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski
---
 tools/testing/selftests/drivers/net/hw/tso.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py
index eec647e7ec19c..3370827409aa0 100755
--- a/tools/testing/selftests/drivers/net/hw/tso.py
+++ b/tools/testing/selftests/drivers/net/hw/tso.py
@@ -39,7 +39,7 @@ def run_one_stream(cfg, ipver, remote_v4, remote_v6, should_lso):
     port = rand_port()
     listen_cmd = f"socat -{ipver} -t 2 -u TCP-LISTEN:{port},reuseport /dev/null,ignoreeof"

-    with bkg(listen_cmd, host=cfg.remote) as nc:
+    with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as nc:
         wait_port_listen(port, host=cfg.remote)

         if ipver == "4":
--
2.51.0

From 535caaca921c653b1f9838fbd5c4e9494cafc3d9 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Wed, 4 Jun 2025 09:39:28 +0000
Subject: [PATCH 05/16] net: annotate data-races around cleanup_net_task

from_cleanup_net() reads cleanup_net_task locklessly.

Add READ_ONCE()/WRITE_ONCE() annotations to avoid a potential KCSAN
warning, even if the race is harmless.

Fixes: 0734d7c3d93c ("net: expedite synchronize_net() for cleanup_net()")
Signed-off-by: Eric Dumazet
Reviewed-by: Jason Xing
Link: https://patch.msgid.link/20250604093928.1323333-1-edumazet@google.com
Signed-off-by: Jakub Kicinski
---
 net/core/dev.c           | 2 +-
 net/core/net_namespace.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a388f459a3666..be97c440ecd5f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10499,7 +10499,7 @@ static void dev_index_release(struct net *net, int ifindex)
 static bool from_cleanup_net(void)
 {
 #ifdef CONFIG_NET_NS
-	return current == cleanup_net_task;
+	return current == READ_ONCE(cleanup_net_task);
 #else
 	return false;
 #endif
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 42ee7fce3d95b..ae54f26709ca2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -654,7 +654,7 @@ static void cleanup_net(struct work_struct *work)
 	struct net *net, *tmp, *last;
 	LIST_HEAD(net_exit_list);

-	cleanup_net_task = current;
+	WRITE_ONCE(cleanup_net_task, current);

 	/* Atomically snapshot the list of namespaces to cleanup */
 	net_kill_list = llist_del_all(&cleanup_list);
@@ -704,7 +704,7 @@ static void cleanup_net(struct work_struct *work)
 		put_user_ns(net->user_ns);
 		net_passive_dec(net);
 	}
-	cleanup_net_task = NULL;
+	WRITE_ONCE(cleanup_net_task, NULL);
 }

 /**
--
2.51.0
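The annotation pattern in the patch above is worth spelling out: when one context writes a shared variable and another reads it without taking any lock, KCSAN flags the pair as a data race unless both sides are marked. A generic sketch of the idiom follows; the variable and function names are made up for illustration, and only READ_ONCE()/WRITE_ONCE() themselves are the real kernel primitives.

  #include <linux/compiler.h>
  #include <linux/sched.h>

  static struct task_struct *worker_task;   /* hypothetical shared pointer */

  /* Writer side: only the worker updates the pointer. */
  static void example_work_fn(void)
  {
          WRITE_ONCE(worker_task, current);
          /* ... do the serialized work ... */
          WRITE_ONCE(worker_task, NULL);
  }

  /* Reader side: any context may check it locklessly. */
  static bool example_called_from_worker(void)
  {
          return current == READ_ONCE(worker_task);
  }

The accesses were already safe in practice (a torn read of an aligned pointer is not expected), so the annotations mainly document the lockless access and silence KCSAN, exactly as the changelog says.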
From feafc73f3e6ae73371777a037d41d2e31c929636 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Wed, 4 Jun 2025 10:58:15 +0000
Subject: [PATCH 06/16] net: prevent a NULL deref in rtnl_create_link()

At the time rtnl_create_link() is running, dev->netdev_ops is NULL, so
we must not use netdev_lock_ops() or we risk a NULL dereference if
CONFIG_NET_SHAPER is defined.

Use netif_set_group() instead of dev_set_group().

RIP: 0010:netdev_need_ops_lock include/net/netdev_lock.h:33 [inline]
RIP: 0010:netdev_lock_ops include/net/netdev_lock.h:41 [inline]
RIP: 0010:dev_set_group+0xc0/0x230 net/core/dev_api.c:82
Call Trace:
 rtnl_create_link+0x748/0xd10 net/core/rtnetlink.c:3674
 rtnl_newlink_create+0x25c/0xb00 net/core/rtnetlink.c:3813
 __rtnl_newlink net/core/rtnetlink.c:3940 [inline]
 rtnl_newlink+0x16d6/0x1c70 net/core/rtnetlink.c:4055
 rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6944
 netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2534
 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
 netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
 sock_sendmsg_nosec net/socket.c:712 [inline]

Reported-by: syzbot+9fc858ba0312b42b577e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6840265f.a00a0220.d4325.0009.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Acked-by: Stanislav Fomichev
Link: https://patch.msgid.link/20250604105815.1516973-1-edumazet@google.com
Signed-off-by: Jakub Kicinski
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f9a35bdc58ad2..c57692eb8da9d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3671,7 +3671,7 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 	if (tb[IFLA_LINKMODE])
 		dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
 	if (tb[IFLA_GROUP])
-		dev_set_group(dev, nla_get_u32(tb[IFLA_GROUP]));
+		netif_set_group(dev, nla_get_u32(tb[IFLA_GROUP]));
 	if (tb[IFLA_GSO_MAX_SIZE])
 		netif_set_gso_max_size(dev, nla_get_u32(tb[IFLA_GSO_MAX_SIZE]));
 	if (tb[IFLA_GSO_MAX_SEGS])
--
2.51.0

From 7632fedb266d93ed0ed9f487133e6c6314a9b2d1 Mon Sep 17 00:00:00 2001
From: Ido Schimmel
Date: Wed, 4 Jun 2025 14:32:52 +0300
Subject: [PATCH 07/16] seg6: Fix validation of nexthop addresses

The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.

Fix by validating that the provided length exactly matches the
specified one.
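The distinction the fix relies on is easy to miss: for NLA_BINARY, the .len field in a netlink policy is only an upper bound, so a userspace attribute shorter than sizeof(struct in_addr) still passes validation and a later fixed-size copy reads uninitialized memory. NLA_POLICY_EXACT_LEN() rejects anything that is not exactly the stated size. A condensed sketch of the two styles, using a hypothetical attribute enum rather than the seg6 one, is shown below; it mirrors the seg6_local_policy change in the diff that follows rather than reproducing it verbatim.

  #include <net/netlink.h>
  #include <linux/in.h>
  #include <linux/in6.h>

  enum { EX_ATTR_UNSPEC, EX_ATTR_NH4, EX_ATTR_NH6, __EX_ATTR_MAX };
  #define EX_ATTR_MAX (__EX_ATTR_MAX - 1)

  static const struct nla_policy ex_policy[EX_ATTR_MAX + 1] = {
          /* Too permissive: .len is a maximum, shorter payloads still pass.
           * [EX_ATTR_NH4] = { .type = NLA_BINARY,
           *                   .len = sizeof(struct in_addr) },
           */

          /* What the fix switches to: the length must match exactly. */
          [EX_ATTR_NH4] = NLA_POLICY_EXACT_LEN(sizeof(struct in_addr)),
          [EX_ATTR_NH6] = NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
  };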
Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel") Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- net/ipv6/seg6_local.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index ac1dbd492c22d..a11a02b4ba95b 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -1644,10 +1644,8 @@ static const struct nla_policy seg6_local_policy[SEG6_LOCAL_MAX + 1] = { [SEG6_LOCAL_SRH] = { .type = NLA_BINARY }, [SEG6_LOCAL_TABLE] = { .type = NLA_U32 }, [SEG6_LOCAL_VRFTABLE] = { .type = NLA_U32 }, - [SEG6_LOCAL_NH4] = { .type = NLA_BINARY, - .len = sizeof(struct in_addr) }, - [SEG6_LOCAL_NH6] = { .type = NLA_BINARY, - .len = sizeof(struct in6_addr) }, + [SEG6_LOCAL_NH4] = NLA_POLICY_EXACT_LEN(sizeof(struct in_addr)), + [SEG6_LOCAL_NH6] = NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)), [SEG6_LOCAL_IIF] = { .type = NLA_U32 }, [SEG6_LOCAL_OIF] = { .type = NLA_U32 }, [SEG6_LOCAL_BPF] = { .type = NLA_NESTED }, -- 2.51.0 From 3cae906e1a6184cdc9e4d260e4dbdf9a118d94ad Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 4 Jun 2025 13:38:26 +0000 Subject: [PATCH 08/16] calipso: unlock rcu before returning -EAFNOSUPPORT syzbot reported that a recent patch forgot to unlock rcu in the error path. Adopt the convention that netlbl_conn_setattr() is already using. Fixes: 6e9f2df1c550 ("calipso: Don't call calipso functions for AF_INET sk.") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Kuniyuki Iwashima Acked-by: Paul Moore Link: https://patch.msgid.link/20250604133826.1667664-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/netlabel/netlabel_kapi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c index 6ea16138582c0..33b77084a4e5f 100644 --- a/net/netlabel/netlabel_kapi.c +++ b/net/netlabel/netlabel_kapi.c @@ -1165,8 +1165,10 @@ int netlbl_conn_setattr(struct sock *sk, break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: - if (sk->sk_family != AF_INET6) - return -EAFNOSUPPORT; + if (sk->sk_family != AF_INET6) { + ret_val = -EAFNOSUPPORT; + goto conn_setattr_return; + } addr6 = (struct sockaddr_in6 *)addr; entry = netlbl_domhsh_getentry_af6(secattr->domain, -- 2.51.0 From 71052a800371c64b4546a1d7f69ad515a779316f Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Thu, 5 Jun 2025 10:11:56 -0700 Subject: [PATCH 09/16] MAINTAINERS: add entry for crypto library I am volunteering to maintain the kernel's crypto library code. [ And Jason and Ard piped up too - Linus ] Signed-off-by: Eric Biggers Acked-by: Jason A. Donenfeld Acked-by: Ard Biesheuvel Signed-off-by: Linus Torvalds --- MAINTAINERS | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index ac8ccc837baba..f2668b81115cb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6374,11 +6374,20 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git F: Documentation/crypto/ F: Documentation/devicetree/bindings/crypto/ F: arch/*/crypto/ -F: arch/*/lib/crypto/ F: crypto/ F: drivers/crypto/ F: include/crypto/ F: include/linux/crypto* + +CRYPTO LIBRARY +M: Eric Biggers +M: Jason A. 
Donenfeld +M: Ard Biesheuvel +L: linux-crypto@vger.kernel.org +S: Maintained +T: git https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git libcrypto-next +T: git https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git libcrypto-fixes +F: arch/*/lib/crypto/ F: lib/crypto/ CRYPTO SPEED TEST COMPARE -- 2.51.0 From bc0cb64db1c765a81f69997d5a28f539e1731bc0 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 9 Jun 2025 02:46:26 -0700 Subject: [PATCH 10/16] netconsole: Only register console drivers when targets are configured The netconsole driver currently registers the basic console driver unconditionally during initialization, even when only extended targets are configured. This results in unnecessary console registration and performance overhead, as the write_msg() callback is invoked for every log message only to return early when no matching targets are found. Optimize the driver by conditionally registering console drivers based on the actual target configuration. The basic console driver is now registered only when non-extended targets exist, same as the extended console. The implementation also handles dynamic target creation through the configfs interface. This change eliminates unnecessary console driver registrations, redundant write_msg() callbacks for unused console types, and associated lock contention and target list iterations. The optimization is particularly beneficial for systems using only the most common extended console type. Fixes: e2f15f9a79201 ("netconsole: implement extended console support") Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20250609-netcons_ext-v3-1-5336fa670326@debian.org Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 4289ccd3e41bf..01baa45221b4b 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -86,10 +86,10 @@ static DEFINE_SPINLOCK(target_list_lock); static DEFINE_MUTEX(target_cleanup_list_lock); /* - * Console driver for extended netconsoles. Registered on the first use to - * avoid unnecessarily enabling ext message formatting. + * Console driver for netconsoles. Register only consoles that have + * an associated target of the same type. */ -static struct console netconsole_ext; +static struct console netconsole_ext, netconsole; struct netconsole_target_stats { u64_stats_t xmit_drop_count; @@ -97,6 +97,11 @@ struct netconsole_target_stats { struct u64_stats_sync syncp; }; +enum console_type { + CONS_BASIC = BIT(0), + CONS_EXTENDED = BIT(1), +}; + /* Features enabled in sysdata. Contrary to userdata, this data is populated by * the kernel. The fields are designed as bitwise flags, allowing multiple * features to be set in sysdata_fields. @@ -491,6 +496,12 @@ static ssize_t enabled_store(struct config_item *item, if (nt->extended && !console_is_registered(&netconsole_ext)) register_console(&netconsole_ext); + /* User might be enabling the basic format target for the very + * first time, make sure the console is registered. + */ + if (!nt->extended && !console_is_registered(&netconsole)) + register_console(&netconsole); + /* * Skip netpoll_parse_options() -- all the attributes are * already configured via configfs. Just print them out. 
@@ -1691,8 +1702,8 @@ static int __init init_netconsole(void) { int err; struct netconsole_target *nt, *tmp; + u32 console_type_needed = 0; unsigned int count = 0; - bool extended = false; unsigned long flags; char *target_config; char *input = config; @@ -1708,9 +1719,10 @@ static int __init init_netconsole(void) } /* Dump existing printks when we register */ if (nt->extended) { - extended = true; + console_type_needed |= CONS_EXTENDED; netconsole_ext.flags |= CON_PRINTBUFFER; } else { + console_type_needed |= CONS_BASIC; netconsole.flags |= CON_PRINTBUFFER; } @@ -1729,9 +1741,10 @@ static int __init init_netconsole(void) if (err) goto undonotifier; - if (extended) + if (console_type_needed & CONS_EXTENDED) register_console(&netconsole_ext); - register_console(&netconsole); + if (console_type_needed & CONS_BASIC) + register_console(&netconsole); pr_info("network logging started\n"); return err; @@ -1761,7 +1774,8 @@ static void __exit cleanup_netconsole(void) if (console_is_registered(&netconsole_ext)) unregister_console(&netconsole_ext); - unregister_console(&netconsole); + if (console_is_registered(&netconsole)) + unregister_console(&netconsole); dynamic_netconsole_exit(); unregister_netdevice_notifier(&netconsole_netdev_notifier); -- 2.51.0 From e99d938f867173937373b1bb08cbc2d102a07d0c Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 9 Jun 2025 02:46:27 -0700 Subject: [PATCH 11/16] netconsole: Add automatic console unregistration on target removal Add unregister_netcons_consoles() function to automatically unregister console handlers when no targets of the corresponding type remain active. The function iterates through the target list to determine which console types (basic vs extended) are still needed, and unregisters any console handlers that are no longer required. This prevents having registered console handlers without corresponding active targets. The function is called when a target is disabled and moved to the cleanup list, ensuring proper cleanup of unused console registrations. Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20250609-netcons_ext-v3-2-5336fa670326@debian.org Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 01baa45221b4b..21077aff061c5 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -460,6 +460,33 @@ static ssize_t sysdata_release_enabled_show(struct config_item *item, return sysfs_emit(buf, "%d\n", release_enabled); } +/* Iterate in the list of target, and make sure we don't have any console + * register without targets of the same type + */ +static void unregister_netcons_consoles(void) +{ + struct netconsole_target *nt; + u32 console_type_needed = 0; + unsigned long flags; + + spin_lock_irqsave(&target_list_lock, flags); + list_for_each_entry(nt, &target_list, list) { + if (nt->extended) + console_type_needed |= CONS_EXTENDED; + else + console_type_needed |= CONS_BASIC; + } + spin_unlock_irqrestore(&target_list_lock, flags); + + if (!(console_type_needed & CONS_EXTENDED) && + console_is_registered(&netconsole_ext)) + unregister_console(&netconsole_ext); + + if (!(console_type_needed & CONS_BASIC) && + console_is_registered(&netconsole)) + unregister_console(&netconsole); +} + /* * This one is special -- targets created through the configfs interface * are not enabled (and the corresponding netpoll activated) by default. 
@@ -493,14 +520,18 @@ static ssize_t enabled_store(struct config_item *item, goto out_unlock; } - if (nt->extended && !console_is_registered(&netconsole_ext)) + if (nt->extended && !console_is_registered(&netconsole_ext)) { + netconsole_ext.flags |= CON_ENABLED; register_console(&netconsole_ext); + } /* User might be enabling the basic format target for the very * first time, make sure the console is registered. */ - if (!nt->extended && !console_is_registered(&netconsole)) + if (!nt->extended && !console_is_registered(&netconsole)) { + netconsole.flags |= CON_ENABLED; register_console(&netconsole); + } /* * Skip netpoll_parse_options() -- all the attributes are @@ -528,6 +559,10 @@ static ssize_t enabled_store(struct config_item *item, list_move(&nt->list, &target_cleanup_list); spin_unlock_irqrestore(&target_list_lock, flags); mutex_unlock(&target_cleanup_list_lock); + /* Unregister consoles, whose the last target of that type got + * disabled. + */ + unregister_netcons_consoles(); } ret = strnlen(buf, count); -- 2.51.0 From 69b25dd20c8368750d1e8b12cfe31387c545bdd1 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 9 Jun 2025 02:46:28 -0700 Subject: [PATCH 12/16] selftests: netconsole: Do not exit from inside the validation function Remove the exit call from validate_result() function and move the test exit logic to the main script. This allows the function to be reused in scenarios where the test needs to continue execution after validation, rather than terminating immediately. The validate_result() function should focus on validation logic only, while the calling script maintains control over program flow and exit conditions. This change improves code modularity and prepares for potential future enhancements where multiple validations might be needed in a single test run. Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20250609-netcons_ext-v3-3-5336fa670326@debian.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh | 1 - tools/testing/selftests/drivers/net/netcons_basic.sh | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index 29b01b8e2215c..2d5dd3297693c 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -185,7 +185,6 @@ function validate_result() { # Delete the file once it is validated, otherwise keep it # for debugging purposes rm "${TMPFILENAME}" - exit "${ksft_pass}" } function check_for_dependencies() { diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh index fe765da498e84..d2f0685d24ba3 100755 --- a/tools/testing/selftests/drivers/net/netcons_basic.sh +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh @@ -50,3 +50,5 @@ busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" # Make sure the message was received in the dst part # and exit validate_result "${OUTPUT_FILE}" + +exit "${ksft_pass}" -- 2.51.0 From 224a6e602fb371b42ba5f854f32191d23b7e140a Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 9 Jun 2025 02:46:29 -0700 Subject: [PATCH 13/16] selftests: netconsole: Add support for basic netconsole target format Extend the netconsole selftest to validate both basic and extended target formats. The basic format is a simpler variant that doesn't support userdata or release functionality. 
The test now validates that netconsole works correctly in both configurations, improving test coverage for different netconsole deployment scenarios. Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20250609-netcons_ext-v3-4-5336fa670326@debian.org Signed-off-by: Jakub Kicinski --- .../drivers/net/lib/sh/lib_netcons.sh | 26 ++++++++-- .../selftests/drivers/net/netcons_basic.sh | 48 ++++++++++++------- 2 files changed, 52 insertions(+), 22 deletions(-) diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index 2d5dd3297693c..71a5a8b1712c0 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -95,6 +95,8 @@ function set_network() { } function create_dynamic_target() { + local FORMAT=${1:-"extended"} + DSTMAC=$(ip netns exec "${NAMESPACE}" \ ip link show "${DSTIF}" | awk '/ether/ {print $2}') @@ -106,6 +108,16 @@ function create_dynamic_target() { echo "${DSTMAC}" > "${NETCONS_PATH}"/remote_mac echo "${SRCIF}" > "${NETCONS_PATH}"/dev_name + if [ "${FORMAT}" == "basic" ] + then + # Basic target does not support release + echo 0 > "${NETCONS_PATH}"/release + echo 0 > "${NETCONS_PATH}"/extended + elif [ "${FORMAT}" == "extended" ] + then + echo 1 > "${NETCONS_PATH}"/extended + fi + echo 1 > "${NETCONS_PATH}"/enabled } @@ -159,6 +171,7 @@ function listen_port_and_save_to() { function validate_result() { local TMPFILENAME="$1" + local FORMAT=${2:-"extended"} # TMPFILENAME will contain something like: # 6.11.1-0_fbk0_rc13_509_g30d75cea12f7,13,1822,115075213798,-;netconsole selftest: netcons_gtJHM @@ -176,10 +189,15 @@ function validate_result() { exit "${ksft_fail}" fi - if ! grep -q "${USERDATA_KEY}=${USERDATA_VALUE}" "${TMPFILENAME}"; then - echo "FAIL: ${USERDATA_KEY}=${USERDATA_VALUE} not found in ${TMPFILENAME}" >&2 - cat "${TMPFILENAME}" >&2 - exit "${ksft_fail}" + # userdata is not supported on basic format target, + # thus, do not validate it. + if [ "${FORMAT}" != "basic" ]; + then + if ! grep -q "${USERDATA_KEY}=${USERDATA_VALUE}" "${TMPFILENAME}"; then + echo "FAIL: ${USERDATA_KEY}=${USERDATA_VALUE} not found in ${TMPFILENAME}" >&2 + cat "${TMPFILENAME}" >&2 + exit "${ksft_fail}" + fi fi # Delete the file once it is validated, otherwise keep it diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh index d2f0685d24ba3..40a6ac6191b8b 100755 --- a/tools/testing/selftests/drivers/net/netcons_basic.sh +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh @@ -32,23 +32,35 @@ check_for_dependencies echo "6 5" > /proc/sys/kernel/printk # Remove the namespace, interfaces and netconsole target on exit trap cleanup EXIT -# Create one namespace and two interfaces -set_network -# Create a dynamic target for netconsole -create_dynamic_target -# Set userdata "key" with the "value" value -set_user_data -# Listed for netconsole port inside the namespace and destination interface -listen_port_and_save_to "${OUTPUT_FILE}" & -# Wait for socat to start and listen to the port. 
-wait_local_port_listen "${NAMESPACE}" "${PORT}" udp -# Send the message -echo "${MSG}: ${TARGET}" > /dev/kmsg -# Wait until socat saves the file to disk -busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" - -# Make sure the message was received in the dst part -# and exit -validate_result "${OUTPUT_FILE}" +# Run the test twice, with different format modes +for FORMAT in "basic" "extended" +do + echo "Running with target mode: ${FORMAT}" + # Create one namespace and two interfaces + set_network + # Create a dynamic target for netconsole + create_dynamic_target "${FORMAT}" + # Only set userdata for extended format + if [ "$FORMAT" == "extended" ] + then + # Set userdata "key" with the "value" value + set_user_data + fi + # Listed for netconsole port inside the namespace and destination interface + listen_port_and_save_to "${OUTPUT_FILE}" & + # Wait for socat to start and listen to the port. + wait_local_port_listen "${NAMESPACE}" "${PORT}" udp + # Send the message + echo "${MSG}: ${TARGET}" > /dev/kmsg + # Wait until socat saves the file to disk + busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" + + # Make sure the message was received in the dst part + # and exit + validate_result "${OUTPUT_FILE}" "${FORMAT}" + cleanup +done + +trap - EXIT exit "${ksft_pass}" -- 2.51.0 From c09ef59e17c6921c577d54bc8da4331b955d01a7 Mon Sep 17 00:00:00 2001 From: Dipayaan Roy Date: Mon, 9 Jun 2025 03:01:03 -0700 Subject: [PATCH 14/16] net: mana: Expose additional hardware counters for drop and TC via ethtool. Add support for reporting additional hardware counters for drop and TC using the ethtool -S interface. These counters include: - Aggregate Rx/Tx drop counters - Per-TC Rx/Tx packet counters - Per-TC Rx/Tx byte counters - Per-TC Rx/Tx pause frame counters The counters are exposed using ethtool_ops->get_ethtool_stats and ethtool_ops->get_strings. This feature/counters are not available to all versions of hardware. Signed-off-by: Dipayaan Roy Reviewed-by: Subbaraya Sundeep Reviewed-by: Haiyang Zhang Link: https://patch.msgid.link/20250609100103.GA7102@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski --- .../net/ethernet/microsoft/mana/hw_channel.c | 6 +- drivers/net/ethernet/microsoft/mana/mana_en.c | 87 +++++++++++- .../ethernet/microsoft/mana/mana_ethtool.c | 76 +++++++++- include/net/mana/mana.h | 131 ++++++++++++++++++ 4 files changed, 292 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c index a8c4d8db75a56..3d3677c0d0147 100644 --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c @@ -2,6 +2,7 @@ /* Copyright (c) 2021, Microsoft Corporation. 
*/ #include +#include #include #include @@ -890,8 +891,9 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len, } if (ctx->status_code && ctx->status_code != GDMA_STATUS_MORE_ENTRIES) { - dev_err(hwc->dev, "HWC: Failed hw_channel req: 0x%x\n", - ctx->status_code); + if (req_msg->req.msg_type != MANA_QUERY_PHY_STAT) + dev_err(hwc->dev, "HWC: Failed hw_channel req: 0x%x\n", + ctx->status_code); err = -EPROTO; goto out; } diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index ccd2885c939e0..e68b8190bb7a8 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -774,8 +774,9 @@ static int mana_send_request(struct mana_context *ac, void *in_buf, err = mana_gd_send_request(gc, in_len, in_buf, out_len, out_buf); if (err || resp->status) { - dev_err(dev, "Failed to send mana message: %d, 0x%x\n", - err, resp->status); + if (req->req.msg_type != MANA_QUERY_PHY_STAT) + dev_err(dev, "Failed to send mana message: %d, 0x%x\n", + err, resp->status); return err ? err : -EPROTO; } @@ -2611,6 +2612,88 @@ void mana_query_gf_stats(struct mana_port_context *apc) apc->eth_stats.hc_tx_err_gdma = resp.tx_err_gdma; } +void mana_query_phy_stats(struct mana_port_context *apc) +{ + struct mana_query_phy_stat_resp resp = {}; + struct mana_query_phy_stat_req req = {}; + struct net_device *ndev = apc->ndev; + int err; + + mana_gd_init_req_hdr(&req.hdr, MANA_QUERY_PHY_STAT, + sizeof(req), sizeof(resp)); + err = mana_send_request(apc->ac, &req, sizeof(req), &resp, + sizeof(resp)); + if (err) + return; + + err = mana_verify_resp_hdr(&resp.hdr, MANA_QUERY_PHY_STAT, + sizeof(resp)); + if (err || resp.hdr.status) { + netdev_err(ndev, + "Failed to query PHY stats: %d, resp:0x%x\n", + err, resp.hdr.status); + return; + } + + /* Aggregate drop counters */ + apc->phy_stats.rx_pkt_drop_phy = resp.rx_pkt_drop_phy; + apc->phy_stats.tx_pkt_drop_phy = resp.tx_pkt_drop_phy; + + /* Per TC traffic Counters */ + apc->phy_stats.rx_pkt_tc0_phy = resp.rx_pkt_tc0_phy; + apc->phy_stats.tx_pkt_tc0_phy = resp.tx_pkt_tc0_phy; + apc->phy_stats.rx_pkt_tc1_phy = resp.rx_pkt_tc1_phy; + apc->phy_stats.tx_pkt_tc1_phy = resp.tx_pkt_tc1_phy; + apc->phy_stats.rx_pkt_tc2_phy = resp.rx_pkt_tc2_phy; + apc->phy_stats.tx_pkt_tc2_phy = resp.tx_pkt_tc2_phy; + apc->phy_stats.rx_pkt_tc3_phy = resp.rx_pkt_tc3_phy; + apc->phy_stats.tx_pkt_tc3_phy = resp.tx_pkt_tc3_phy; + apc->phy_stats.rx_pkt_tc4_phy = resp.rx_pkt_tc4_phy; + apc->phy_stats.tx_pkt_tc4_phy = resp.tx_pkt_tc4_phy; + apc->phy_stats.rx_pkt_tc5_phy = resp.rx_pkt_tc5_phy; + apc->phy_stats.tx_pkt_tc5_phy = resp.tx_pkt_tc5_phy; + apc->phy_stats.rx_pkt_tc6_phy = resp.rx_pkt_tc6_phy; + apc->phy_stats.tx_pkt_tc6_phy = resp.tx_pkt_tc6_phy; + apc->phy_stats.rx_pkt_tc7_phy = resp.rx_pkt_tc7_phy; + apc->phy_stats.tx_pkt_tc7_phy = resp.tx_pkt_tc7_phy; + + /* Per TC byte Counters */ + apc->phy_stats.rx_byte_tc0_phy = resp.rx_byte_tc0_phy; + apc->phy_stats.tx_byte_tc0_phy = resp.tx_byte_tc0_phy; + apc->phy_stats.rx_byte_tc1_phy = resp.rx_byte_tc1_phy; + apc->phy_stats.tx_byte_tc1_phy = resp.tx_byte_tc1_phy; + apc->phy_stats.rx_byte_tc2_phy = resp.rx_byte_tc2_phy; + apc->phy_stats.tx_byte_tc2_phy = resp.tx_byte_tc2_phy; + apc->phy_stats.rx_byte_tc3_phy = resp.rx_byte_tc3_phy; + apc->phy_stats.tx_byte_tc3_phy = resp.tx_byte_tc3_phy; + apc->phy_stats.rx_byte_tc4_phy = resp.rx_byte_tc4_phy; + apc->phy_stats.tx_byte_tc4_phy = resp.tx_byte_tc4_phy; + apc->phy_stats.rx_byte_tc5_phy = 
resp.rx_byte_tc5_phy; + apc->phy_stats.tx_byte_tc5_phy = resp.tx_byte_tc5_phy; + apc->phy_stats.rx_byte_tc6_phy = resp.rx_byte_tc6_phy; + apc->phy_stats.tx_byte_tc6_phy = resp.tx_byte_tc6_phy; + apc->phy_stats.rx_byte_tc7_phy = resp.rx_byte_tc7_phy; + apc->phy_stats.tx_byte_tc7_phy = resp.tx_byte_tc7_phy; + + /* Per TC pause Counters */ + apc->phy_stats.rx_pause_tc0_phy = resp.rx_pause_tc0_phy; + apc->phy_stats.tx_pause_tc0_phy = resp.tx_pause_tc0_phy; + apc->phy_stats.rx_pause_tc1_phy = resp.rx_pause_tc1_phy; + apc->phy_stats.tx_pause_tc1_phy = resp.tx_pause_tc1_phy; + apc->phy_stats.rx_pause_tc2_phy = resp.rx_pause_tc2_phy; + apc->phy_stats.tx_pause_tc2_phy = resp.tx_pause_tc2_phy; + apc->phy_stats.rx_pause_tc3_phy = resp.rx_pause_tc3_phy; + apc->phy_stats.tx_pause_tc3_phy = resp.tx_pause_tc3_phy; + apc->phy_stats.rx_pause_tc4_phy = resp.rx_pause_tc4_phy; + apc->phy_stats.tx_pause_tc4_phy = resp.tx_pause_tc4_phy; + apc->phy_stats.rx_pause_tc5_phy = resp.rx_pause_tc5_phy; + apc->phy_stats.tx_pause_tc5_phy = resp.tx_pause_tc5_phy; + apc->phy_stats.rx_pause_tc6_phy = resp.rx_pause_tc6_phy; + apc->phy_stats.tx_pause_tc6_phy = resp.tx_pause_tc6_phy; + apc->phy_stats.rx_pause_tc7_phy = resp.rx_pause_tc7_phy; + apc->phy_stats.tx_pause_tc7_phy = resp.tx_pause_tc7_phy; +} + static int mana_init_port(struct net_device *ndev) { struct mana_port_context *apc = netdev_priv(ndev); diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c index c419626073f5f..4fb3a04994a2d 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c @@ -7,10 +7,12 @@ #include -static const struct { +struct mana_stats_desc { char name[ETH_GSTRING_LEN]; u16 offset; -} mana_eth_stats[] = { +}; + +static const struct mana_stats_desc mana_eth_stats[] = { {"stop_queue", offsetof(struct mana_ethtool_stats, stop_queue)}, {"wake_queue", offsetof(struct mana_ethtool_stats, wake_queue)}, {"hc_rx_discards_no_wqe", offsetof(struct mana_ethtool_stats, @@ -75,6 +77,59 @@ static const struct { rx_cqe_unknown_type)}, }; +static const struct mana_stats_desc mana_phy_stats[] = { + { "hc_rx_pkt_drop_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_drop_phy) }, + { "hc_tx_pkt_drop_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_drop_phy) }, + { "hc_tc0_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc0_phy) }, + { "hc_tc0_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc0_phy) }, + { "hc_tc0_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc0_phy) }, + { "hc_tc0_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc0_phy) }, + { "hc_tc1_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc1_phy) }, + { "hc_tc1_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc1_phy) }, + { "hc_tc1_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc1_phy) }, + { "hc_tc1_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc1_phy) }, + { "hc_tc2_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc2_phy) }, + { "hc_tc2_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc2_phy) }, + { "hc_tc2_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc2_phy) }, + { "hc_tc2_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc2_phy) }, + { "hc_tc3_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc3_phy) }, + { "hc_tc3_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc3_phy) }, + { 
"hc_tc3_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc3_phy) }, + { "hc_tc3_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc3_phy) }, + { "hc_tc4_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc4_phy) }, + { "hc_tc4_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc4_phy) }, + { "hc_tc4_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc4_phy) }, + { "hc_tc4_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc4_phy) }, + { "hc_tc5_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc5_phy) }, + { "hc_tc5_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc5_phy) }, + { "hc_tc5_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc5_phy) }, + { "hc_tc5_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc5_phy) }, + { "hc_tc6_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc6_phy) }, + { "hc_tc6_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc6_phy) }, + { "hc_tc6_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc6_phy) }, + { "hc_tc6_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc6_phy) }, + { "hc_tc7_rx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, rx_pkt_tc7_phy) }, + { "hc_tc7_rx_byte_phy", offsetof(struct mana_ethtool_phy_stats, rx_byte_tc7_phy) }, + { "hc_tc7_tx_pkt_phy", offsetof(struct mana_ethtool_phy_stats, tx_pkt_tc7_phy) }, + { "hc_tc7_tx_byte_phy", offsetof(struct mana_ethtool_phy_stats, tx_byte_tc7_phy) }, + { "hc_tc0_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc0_phy) }, + { "hc_tc0_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc0_phy) }, + { "hc_tc1_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc1_phy) }, + { "hc_tc1_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc1_phy) }, + { "hc_tc2_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc2_phy) }, + { "hc_tc2_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc2_phy) }, + { "hc_tc3_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc3_phy) }, + { "hc_tc3_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc3_phy) }, + { "hc_tc4_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc4_phy) }, + { "hc_tc4_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc4_phy) }, + { "hc_tc5_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc5_phy) }, + { "hc_tc5_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc5_phy) }, + { "hc_tc6_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc6_phy) }, + { "hc_tc6_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc6_phy) }, + { "hc_tc7_rx_pause_phy", offsetof(struct mana_ethtool_phy_stats, rx_pause_tc7_phy) }, + { "hc_tc7_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc7_phy) }, +}; + static int mana_get_sset_count(struct net_device *ndev, int stringset) { struct mana_port_context *apc = netdev_priv(ndev); @@ -83,8 +138,8 @@ static int mana_get_sset_count(struct net_device *ndev, int stringset) if (stringset != ETH_SS_STATS) return -EINVAL; - return ARRAY_SIZE(mana_eth_stats) + num_queues * - (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT); + return ARRAY_SIZE(mana_eth_stats) + ARRAY_SIZE(mana_phy_stats) + + num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT); } static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data) @@ -99,6 +154,9 @@ static void 
mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data) for (i = 0; i < ARRAY_SIZE(mana_eth_stats); i++) ethtool_puts(&data, mana_eth_stats[i].name); + for (i = 0; i < ARRAY_SIZE(mana_phy_stats); i++) + ethtool_puts(&data, mana_phy_stats[i].name); + for (i = 0; i < num_queues; i++) { ethtool_sprintf(&data, "rx_%d_packets", i); ethtool_sprintf(&data, "rx_%d_bytes", i); @@ -128,6 +186,7 @@ static void mana_get_ethtool_stats(struct net_device *ndev, struct mana_port_context *apc = netdev_priv(ndev); unsigned int num_queues = apc->num_queues; void *eth_stats = &apc->eth_stats; + void *phy_stats = &apc->phy_stats; struct mana_stats_rx *rx_stats; struct mana_stats_tx *tx_stats; unsigned int start; @@ -151,9 +210,18 @@ static void mana_get_ethtool_stats(struct net_device *ndev, /* we call mana function to update stats from GDMA */ mana_query_gf_stats(apc); + /* We call this mana function to get the phy stats from GDMA and includes + * aggregate tx/rx drop counters, Per-TC(Traffic Channel) tx/rx and pause + * counters. + */ + mana_query_phy_stats(apc); + for (q = 0; q < ARRAY_SIZE(mana_eth_stats); q++) data[i++] = *(u64 *)(eth_stats + mana_eth_stats[q].offset); + for (q = 0; q < ARRAY_SIZE(mana_phy_stats); q++) + data[i++] = *(u64 *)(phy_stats + mana_phy_stats[q].offset); + for (q = 0; q < num_queues; q++) { rx_stats = &apc->rxqs[q]->stats; diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index 9abb664612110..4176edf1be719 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -404,6 +404,65 @@ struct mana_ethtool_stats { u64 rx_cqe_unknown_type; }; +struct mana_ethtool_phy_stats { + /* Drop Counters */ + u64 rx_pkt_drop_phy; + u64 tx_pkt_drop_phy; + + /* Per TC traffic Counters */ + u64 rx_pkt_tc0_phy; + u64 tx_pkt_tc0_phy; + u64 rx_pkt_tc1_phy; + u64 tx_pkt_tc1_phy; + u64 rx_pkt_tc2_phy; + u64 tx_pkt_tc2_phy; + u64 rx_pkt_tc3_phy; + u64 tx_pkt_tc3_phy; + u64 rx_pkt_tc4_phy; + u64 tx_pkt_tc4_phy; + u64 rx_pkt_tc5_phy; + u64 tx_pkt_tc5_phy; + u64 rx_pkt_tc6_phy; + u64 tx_pkt_tc6_phy; + u64 rx_pkt_tc7_phy; + u64 tx_pkt_tc7_phy; + + u64 rx_byte_tc0_phy; + u64 tx_byte_tc0_phy; + u64 rx_byte_tc1_phy; + u64 tx_byte_tc1_phy; + u64 rx_byte_tc2_phy; + u64 tx_byte_tc2_phy; + u64 rx_byte_tc3_phy; + u64 tx_byte_tc3_phy; + u64 rx_byte_tc4_phy; + u64 tx_byte_tc4_phy; + u64 rx_byte_tc5_phy; + u64 tx_byte_tc5_phy; + u64 rx_byte_tc6_phy; + u64 tx_byte_tc6_phy; + u64 rx_byte_tc7_phy; + u64 tx_byte_tc7_phy; + + /* Per TC pause Counters */ + u64 rx_pause_tc0_phy; + u64 tx_pause_tc0_phy; + u64 rx_pause_tc1_phy; + u64 tx_pause_tc1_phy; + u64 rx_pause_tc2_phy; + u64 tx_pause_tc2_phy; + u64 rx_pause_tc3_phy; + u64 tx_pause_tc3_phy; + u64 rx_pause_tc4_phy; + u64 tx_pause_tc4_phy; + u64 rx_pause_tc5_phy; + u64 tx_pause_tc5_phy; + u64 rx_pause_tc6_phy; + u64 tx_pause_tc6_phy; + u64 rx_pause_tc7_phy; + u64 tx_pause_tc7_phy; +}; + struct mana_context { struct gdma_dev *gdma_dev; @@ -474,6 +533,8 @@ struct mana_port_context { struct mana_ethtool_stats eth_stats; + struct mana_ethtool_phy_stats phy_stats; + /* Debugfs */ struct dentry *mana_port_debugfs; }; @@ -501,6 +562,7 @@ struct bpf_prog *mana_xdp_get(struct mana_port_context *apc); void mana_chn_setxdp(struct mana_port_context *apc, struct bpf_prog *prog); int mana_bpf(struct net_device *ndev, struct netdev_bpf *bpf); void mana_query_gf_stats(struct mana_port_context *apc); +void mana_query_phy_stats(struct mana_port_context *apc); int mana_pre_alloc_rxbufs(struct mana_port_context *apc, int mtu, int num_queues); void 
mana_pre_dealloc_rxbufs(struct mana_port_context *apc); @@ -527,6 +589,7 @@ enum mana_command_code { MANA_FENCE_RQ = 0x20006, MANA_CONFIG_VPORT_RX = 0x20007, MANA_QUERY_VPORT_CONFIG = 0x20008, + MANA_QUERY_PHY_STAT = 0x2000c, /* Privileged commands for the PF mode */ MANA_REGISTER_FILTER = 0x28000, @@ -689,6 +752,74 @@ struct mana_query_gf_stat_resp { u64 tx_err_gdma; }; /* HW DATA */ +/* Query phy stats */ +struct mana_query_phy_stat_req { + struct gdma_req_hdr hdr; + u64 req_stats; +}; /* HW DATA */ + +struct mana_query_phy_stat_resp { + struct gdma_resp_hdr hdr; + u64 reported_stats; + + /* Aggregate Drop Counters */ + u64 rx_pkt_drop_phy; + u64 tx_pkt_drop_phy; + + /* Per TC(Traffic class) traffic Counters */ + u64 rx_pkt_tc0_phy; + u64 tx_pkt_tc0_phy; + u64 rx_pkt_tc1_phy; + u64 tx_pkt_tc1_phy; + u64 rx_pkt_tc2_phy; + u64 tx_pkt_tc2_phy; + u64 rx_pkt_tc3_phy; + u64 tx_pkt_tc3_phy; + u64 rx_pkt_tc4_phy; + u64 tx_pkt_tc4_phy; + u64 rx_pkt_tc5_phy; + u64 tx_pkt_tc5_phy; + u64 rx_pkt_tc6_phy; + u64 tx_pkt_tc6_phy; + u64 rx_pkt_tc7_phy; + u64 tx_pkt_tc7_phy; + + u64 rx_byte_tc0_phy; + u64 tx_byte_tc0_phy; + u64 rx_byte_tc1_phy; + u64 tx_byte_tc1_phy; + u64 rx_byte_tc2_phy; + u64 tx_byte_tc2_phy; + u64 rx_byte_tc3_phy; + u64 tx_byte_tc3_phy; + u64 rx_byte_tc4_phy; + u64 tx_byte_tc4_phy; + u64 rx_byte_tc5_phy; + u64 tx_byte_tc5_phy; + u64 rx_byte_tc6_phy; + u64 tx_byte_tc6_phy; + u64 rx_byte_tc7_phy; + u64 tx_byte_tc7_phy; + + /* Per TC(Traffic Class) pause Counters */ + u64 rx_pause_tc0_phy; + u64 tx_pause_tc0_phy; + u64 rx_pause_tc1_phy; + u64 tx_pause_tc1_phy; + u64 rx_pause_tc2_phy; + u64 tx_pause_tc2_phy; + u64 rx_pause_tc3_phy; + u64 tx_pause_tc3_phy; + u64 rx_pause_tc4_phy; + u64 tx_pause_tc4_phy; + u64 rx_pause_tc5_phy; + u64 tx_pause_tc5_phy; + u64 rx_pause_tc6_phy; + u64 tx_pause_tc6_phy; + u64 rx_pause_tc7_phy; + u64 tx_pause_tc7_phy; +}; /* HW DATA */ + /* Configure vPort Rx Steering */ struct mana_cfg_rx_steer_req_v2 { struct gdma_req_hdr hdr; -- 2.51.0 From 31557b3487b349464daf42bc4366153743c1e727 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 9 Jun 2025 07:39:33 -0700 Subject: [PATCH 15/16] uapi: in6: restore visibility of most IPv6 socket options MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit A decade ago commit 6d08acd2d32e ("in6: fix conflict with glibc") hid the definitions of IPV6 options, because GCC was complaining about duplicates. 
The commit did not list the warnings seen, but trying to recreate them
now I think they are (building iproute2):

In file included from ./include/uapi/rdma/rdma_user_cm.h:39,
                 from rdma.h:16,
                 from res.h:9,
                 from res-ctx.c:7:
../include/uapi/linux/in6.h:171:9: warning: ‘IPV6_ADD_MEMBERSHIP’ redefined
  171 | #define IPV6_ADD_MEMBERSHIP 20
      |         ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/netinet/in.h:37,
                 from rdma.h:13:
/usr/include/bits/in.h:233:10: note: this is the location of the previous definition
  233 | # define IPV6_ADD_MEMBERSHIP IPV6_JOIN_GROUP
      |          ^~~~~~~~~~~~~~~~~~~
../include/uapi/linux/in6.h:172:9: warning: ‘IPV6_DROP_MEMBERSHIP’ redefined
  172 | #define IPV6_DROP_MEMBERSHIP 21
      |         ^~~~~~~~~~~~~~~~~~~~
/usr/include/bits/in.h:234:10: note: this is the location of the previous definition
  234 | # define IPV6_DROP_MEMBERSHIP IPV6_LEAVE_GROUP
      |          ^~~~~~~~~~~~~~~~~~~~

Compilers don't complain about redefinition if the defines are
identical, but here we have the kernel using the literal value, and
glibc using an indirection (defining to a name of another define, with
the same numerical value).

Problem is, the commit in question hid all the IPV6 socket options, and
glibc has a pretty sparse list. For instance it lacks Flow Label related
options. Willem called this out in commit 3fb321fde22d
("selftests/net: ipv6 flowlabel"):

  /* uapi/glibc weirdness may leave this undefined */
  #ifndef IPV6_FLOWINFO
  #define IPV6_FLOWINFO 11
  #endif

More interestingly, some applications (socat) use a #ifdef
IPV6_FLOWINFO to gate compilation of their rudimentary flow label
support. (For added confusion socat misspells it as IPV4_FLOWINFO in
some places.)

Hide only the two defines we know glibc has a problem with. If we
discover more warnings we can hide more, but we should avoid covering
the entire block of defines for "IPV6 socket options".

Link: https://patch.msgid.link/20250609143933.1654417-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/uapi/linux/in6.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index ff8d21f9e95b7..5a47339ef7d76 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -152,7 +152,6 @@ struct in6_flowlabel_req {
 /*
  *	IPV6 socket options
  */
-#if __UAPI_DEF_IPV6_OPTIONS
 #define IPV6_ADDRFORM		1
 #define IPV6_2292PKTINFO	2
 #define IPV6_2292HOPOPTS	3
@@ -169,8 +168,10 @@ struct in6_flowlabel_req {
 #define IPV6_MULTICAST_IF	17
 #define IPV6_MULTICAST_HOPS	18
 #define IPV6_MULTICAST_LOOP	19
+#if __UAPI_DEF_IPV6_OPTIONS
 #define IPV6_ADD_MEMBERSHIP	20
 #define IPV6_DROP_MEMBERSHIP	21
+#endif
 #define IPV6_ROUTER_ALERT	22
 #define IPV6_MTU_DISCOVER	23
 #define IPV6_MTU		24
@@ -203,7 +204,6 @@ struct in6_flowlabel_req {
 #define IPV6_IPSEC_POLICY	34
 #define IPV6_XFRM_POLICY	35
 #define IPV6_HDRINCL		36
-#endif

 /*
  *	Multicast:
--
2.51.0

From 1f07789152b8d1e646d6bfecd96a2cf7bd2b9a05 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 9 Jun 2025 16:23:30 +0100
Subject: [PATCH 16/16] cxgb3/l2t: Remove unused t3_l2t_send_event

The last use of t3_l2t_send_event() was removed in 2019 by commit
30e0f6cf5acb ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel").

Remove it.

Signed-off-by: Dr. David Alan Gilbert
Link: https://patch.msgid.link/20250609152330.24027-1-linux@treblig.org
Signed-off-by: Jakub Kicinski
---
 drivers/net/ethernet/chelsio/cxgb3/l2t.c | 37 ------------------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.h |  1 -
 2 files changed, 38 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 9749d1239f58e..5d5f3380ecca8 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -176,43 +176,6 @@ again:

 EXPORT_SYMBOL(t3_l2t_send_slow);

-void t3_l2t_send_event(struct t3cdev *dev, struct l2t_entry *e)
-{
-again:
-	switch (e->state) {
-	case L2T_STATE_STALE:	/* entry is stale, kick off revalidation */
-		neigh_event_send(e->neigh, NULL);
-		spin_lock_bh(&e->lock);
-		if (e->state == L2T_STATE_STALE) {
-			e->state = L2T_STATE_VALID;
-		}
-		spin_unlock_bh(&e->lock);
-		return;
-	case L2T_STATE_VALID:	/* fast-path, send the packet on */
-		return;
-	case L2T_STATE_RESOLVING:
-		spin_lock_bh(&e->lock);
-		if (e->state != L2T_STATE_RESOLVING) {
-			/* ARP already completed */
-			spin_unlock_bh(&e->lock);
-			goto again;
-		}
-		spin_unlock_bh(&e->lock);
-
-		/*
-		 * Only the first packet added to the arpq should kick off
-		 * resolution. However, because the alloc_skb below can fail,
-		 * we allow each packet added to the arpq to retry resolution
-		 * as a way of recovering from transient memory exhaustion.
-		 * A better way would be to use a work request to retry L2T
-		 * entries when there's no memory.
-		 */
-		neigh_event_send(e->neigh, NULL);
-	}
-}
-
-EXPORT_SYMBOL(t3_l2t_send_event);
-
 /*
  * Allocate a free L2T entry. Must be called with l2t_data.lock held.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.h b/drivers/net/ethernet/chelsio/cxgb3/l2t.h
index 646ca0bc25bd0..33558f177497e 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.h
@@ -113,7 +113,6 @@ struct l2t_entry *t3_l2t_get(struct t3cdev *cdev, struct dst_entry *dst,
 			     struct net_device *dev, const void *daddr);
 int t3_l2t_send_slow(struct t3cdev *dev, struct sk_buff *skb,
 		     struct l2t_entry *e);
-void t3_l2t_send_event(struct t3cdev *dev, struct l2t_entry *e);
 struct l2t_data *t3_init_l2t(unsigned int l2t_capacity);
 int cxgb3_ofld_send(struct t3cdev *dev, struct sk_buff *skb);

--
2.51.0
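As a closing illustration of the uapi/glibc interplay described in the in6.h patch above: portable userspace code cannot assume that including <netinet/in.h> exposes every IPV6_* option the kernel knows about, which is why the flow-label selftest quoted in that changelog carries its own fallback define. A minimal userspace sketch of the same defensive pattern follows; the numeric fallback value is the one quoted in the changelog, while the setsockopt() usage is illustrative and not taken from the patch.

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  /* uapi/glibc weirdness may leave this undefined */
  #ifndef IPV6_FLOWINFO
  #define IPV6_FLOWINFO 11
  #endif

  int main(void)
  {
          int one = 1;
          int fd = socket(AF_INET6, SOCK_DGRAM, 0);

          if (fd < 0)
                  return 1;
          /* Ask the kernel to report received flow info via ancillary data. */
          if (setsockopt(fd, IPPROTO_IPV6, IPV6_FLOWINFO, &one, sizeof(one)) < 0)
                  perror("setsockopt(IPV6_FLOWINFO)");
          close(fd);
          return 0;
  }

Whether the fallback is actually needed depends on the glibc version in use; newer headers may already provide the define.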