From ba3b7ac4f7143568ed6480180a847dc752780ece Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Fri, 1 Nov 2024 12:18:51 +0200 Subject: [PATCH 01/16] net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns() flow->irq is initialized to 0 which is a valid IRQ. Set it to -EINVAL in error path of am65_cpsw_nuss_init_rx_chns() so we do not try to free an unallocated IRQ in am65_cpsw_nuss_remove_rx_chns(). If user tried to change number of RX queues and am65_cpsw_nuss_init_rx_chns() failed due to any reason, the warning will happen if user tries to change the number of RX queues after the error condition. root@am62xx-evm:~# ethtool -L eth0 rx 3 [ 40.385293] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19 [ 40.393211] am65-cpsw-nuss 8000000.ethernet: Failed to init rx flow2 netlink error: Invalid argument root@am62xx-evm:~# ethtool -L eth0 rx 2 [ 82.306427] ------------[ cut here ]------------ [ 82.311075] WARNING: CPU: 0 PID: 378 at kernel/irq/devres.c:144 devm_free_irq+0x84/0x90 [ 82.469770] Call trace: [ 82.472208] devm_free_irq+0x84/0x90 [ 82.475777] am65_cpsw_nuss_remove_rx_chns+0x6c/0xac [ti_am65_cpsw_nuss] [ 82.482487] am65_cpsw_nuss_update_tx_rx_chns+0x2c/0x9c [ti_am65_cpsw_nuss] [ 82.489442] am65_cpsw_set_channels+0x30/0x4c [ti_am65_cpsw_nuss] [ 82.495531] ethnl_set_channels+0x224/0x2dc [ 82.499713] ethnl_default_set_doit+0xb8/0x1b8 [ 82.504149] genl_family_rcv_msg_doit+0xc0/0x124 [ 82.508757] genl_rcv_msg+0x1f0/0x284 [ 82.512409] netlink_rcv_skb+0x58/0x130 [ 82.516239] genl_rcv+0x38/0x50 [ 82.519374] netlink_unicast+0x1d0/0x2b0 [ 82.523289] netlink_sendmsg+0x180/0x3c4 [ 82.527205] __sys_sendto+0xe4/0x158 [ 82.530779] __arm64_sys_sendto+0x28/0x38 [ 82.534782] invoke_syscall+0x44/0x100 [ 82.538526] el0_svc_common.constprop.0+0xc0/0xe0 [ 82.543221] do_el0_svc+0x1c/0x28 [ 82.546528] el0_svc+0x28/0x98 [ 82.549578] el0t_64_sync_handler+0xc0/0xc4 [ 82.553752] el0t_64_sync+0x190/0x194 [ 82.557407] ---[ end trace 0000000000000000 ]--- Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx") Signed-off-by: Roger Quadros Signed-off-by: Paolo Abeni --- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 70aea654c79f..ba6db61dd227 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -2441,6 +2441,7 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common) flow = &rx_chn->flows[i]; flow->id = i; flow->common = common; + flow->irq = -EINVAL; rx_flow_cfg.ring_rxfdq0_id = fdqring_id; rx_flow_cfg.rx_cfg.size = max_desc_num; @@ -2483,6 +2484,7 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common) if (ret) { dev_err(dev, "failure requesting rx %d irq %u, %d\n", i, flow->irq, ret); + flow->irq = -EINVAL; goto err; } } -- 2.51.0 From 256748d5480bb3c4b731236c6d6fc86a8e2815d8 Mon Sep 17 00:00:00 2001 From: Diogo Silva Date: Sat, 2 Nov 2024 16:15:05 +0100 Subject: [PATCH 02/16] net: phy: ti: add PHY_RST_AFTER_CLK_EN flag DP83848 datasheet (section 4.7.2) indicates that the reset pin should be toggled after the clocks are running. Add the PHY_RST_AFTER_CLK_EN to make sure that this indication is respected. In my experience not having this flag enabled would lead to, on some boots, the wrong MII mode being selected if the PHY was initialized on the bootloader and was receiving data during Linux boot. Signed-off-by: Diogo Silva Reviewed-by: Andrew Lunn Fixes: 34e45ad9378c ("net: phy: dp83848: Add TI DP83848 Ethernet PHY") Link: https://patch.msgid.link/20241102151504.811306-1-paissilva@ld-100007.ds1.internal Signed-off-by: Jakub Kicinski --- drivers/net/phy/dp83848.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c index 937061acfc61..351411f0aa6f 100644 --- a/drivers/net/phy/dp83848.c +++ b/drivers/net/phy/dp83848.c @@ -147,6 +147,8 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl); /* IRQ related */ \ .config_intr = dp83848_config_intr, \ .handle_interrupt = dp83848_handle_interrupt, \ + \ + .flags = PHY_RST_AFTER_CLK_EN, \ } static struct phy_driver dp83848_driver[] = { -- 2.51.0 From cfbbd4859882a5469f6f4945937a074ee78c4b46 Mon Sep 17 00:00:00 2001 From: "Matthieu Baerts (NGI0)" Date: Mon, 4 Nov 2024 13:31:41 +0100 Subject: [PATCH 03/16] mptcp: no admin perm to list endpoints During the switch to YNL, the command to list all endpoints has been accidentally restricted to users with admin permissions. It looks like there are no reasons to have this restriction which makes it harder for a user to quickly check if the endpoint list has been correctly populated by an automated tool. Best to go back to the previous behaviour then. mptcp_pm_gen.c has been modified using ynl-gen-c.py: $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ --spec Documentation/netlink/specs/mptcp_pm.yaml --source \ -o net/mptcp/mptcp_pm_gen.c The header file doesn't need to be regenerated. Fixes: 1d0507f46843 ("net: mptcp: convert netlink from small_ops to ops") Cc: stable@vger.kernel.org Reviewed-by: Davide Caratti Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://patch.msgid.link/20241104-net-mptcp-misc-6-12-v1-1-c13f2ff1656f@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/netlink/specs/mptcp_pm.yaml | 1 - net/mptcp/mptcp_pm_gen.c | 1 - 2 files changed, 2 deletions(-) diff --git a/Documentation/netlink/specs/mptcp_pm.yaml b/Documentation/netlink/specs/mptcp_pm.yaml index 30d8342cacc8..dc190bf838fe 100644 --- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -293,7 +293,6 @@ operations: doc: Get endpoint information attribute-set: attr dont-validate: [ strict ] - flags: [ uns-admin-perm ] do: &get-addr-attrs request: attributes: diff --git a/net/mptcp/mptcp_pm_gen.c b/net/mptcp/mptcp_pm_gen.c index c30a2a90a192..bfb37c5a88c4 100644 --- a/net/mptcp/mptcp_pm_gen.c +++ b/net/mptcp/mptcp_pm_gen.c @@ -112,7 +112,6 @@ const struct genl_ops mptcp_pm_nl_ops[11] = { .dumpit = mptcp_pm_nl_get_addr_dumpit, .policy = mptcp_pm_get_addr_nl_policy, .maxattr = MPTCP_PM_ATTR_TOKEN, - .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = MPTCP_PM_CMD_FLUSH_ADDRS, -- 2.51.0 From 99635c91fb8b860a6404b9bc8b769df7bdaa2ae3 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Mon, 4 Nov 2024 13:31:42 +0100 Subject: [PATCH 04/16] mptcp: use sock_kfree_s instead of kfree The local address entries on userspace_pm_local_addr_list are allocated by sock_kmalloc(). It's then required to use sock_kfree_s() instead of kfree() to free these entries in order to adjust the allocated size on the sk side. Fixes: 24430f8bf516 ("mptcp: add address into userspace pm list") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Link: https://patch.msgid.link/20241104-net-mptcp-misc-6-12-v1-2-c13f2ff1656f@kernel.org Signed-off-by: Jakub Kicinski --- net/mptcp/pm_userspace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c index 2cceded3a83a..56dfea9862b7 100644 --- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -91,6 +91,7 @@ static int mptcp_userspace_pm_delete_local_addr(struct mptcp_sock *msk, struct mptcp_pm_addr_entry *addr) { struct mptcp_pm_addr_entry *entry, *tmp; + struct sock *sk = (struct sock *)msk; list_for_each_entry_safe(entry, tmp, &msk->pm.userspace_pm_local_addr_list, list) { if (mptcp_addresses_equal(&entry->addr, &addr->addr, false)) { @@ -98,7 +99,7 @@ static int mptcp_userspace_pm_delete_local_addr(struct mptcp_sock *msk, * be used multiple times (e.g. fullmesh mode). */ list_del_rcu(&entry->list); - kfree(entry); + sock_kfree_s(sk, entry, sizeof(*entry)); msk->pm.local_addr_used--; return 0; } -- 2.51.0 From 1f26339b2ed63d1e8e18a18674fb73a392f3660e Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Tue, 5 Nov 2024 17:31:01 +0100 Subject: [PATCH 05/16] net: vertexcom: mse102x: Fix possible double free of TX skb The scope of the TX skb is wider than just mse102x_tx_frame_spi(), so in case the TX skb room needs to be expanded, we should free the the temporary skb instead of the original skb. Otherwise the original TX skb pointer would be freed again in mse102x_tx_work(), which leads to crashes: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G D 6.6.23 Hardware name: chargebyte Charge SOM DC-ONE (DT) Workqueue: events mse102x_tx_work [mse102x] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_release_data+0xb8/0x1d8 lr : skb_release_data+0x1ac/0x1d8 sp : ffff8000819a3cc0 x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0 x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50 x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000 x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000 x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8 x8 : fffffc00001bc008 x7 : 0000000000000000 x6 : 0000000000000008 x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009 x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000 Call trace: skb_release_data+0xb8/0x1d8 kfree_skb_reason+0x48/0xb0 mse102x_tx_work+0x164/0x35c [mse102x] process_one_work+0x138/0x260 worker_thread+0x32c/0x438 kthread+0x118/0x11c ret_from_fork+0x10/0x20 Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660) Cc: stable@vger.kernel.org Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren Link: https://patch.msgid.link/20241105163101.33216-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/vertexcom/mse102x.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c index a04d4073def9..2c37957478fb 100644 --- a/drivers/net/ethernet/vertexcom/mse102x.c +++ b/drivers/net/ethernet/vertexcom/mse102x.c @@ -222,7 +222,7 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, struct mse102x_net_spi *mses = to_mse102x_spi(mse); struct spi_transfer *xfer = &mses->spi_xfer; struct spi_message *msg = &mses->spi_msg; - struct sk_buff *tskb; + struct sk_buff *tskb = NULL; int ret; netif_dbg(mse, tx_queued, mse->ndev, "%s: skb %p, %d@%p\n", @@ -235,7 +235,6 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, if (!tskb) return -ENOMEM; - dev_kfree_skb(txp); txp = tskb; } @@ -257,6 +256,8 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, mse->stats.xfer_err++; } + dev_kfree_skb(tskb); + return ret; } -- 2.51.0 From 25d70702142ac2115e75e01a0a985c6ea1d78033 Mon Sep 17 00:00:00 2001 From: =?utf8?q?N=C3=ADcolas=20F=2E=20R=2E=20A=2E=20Prado?= Date: Fri, 1 Nov 2024 17:17:29 -0400 Subject: [PATCH 06/16] net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Commit a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls") introduced checks to prevent unbalanced enable and disable IRQ wake calls. However it only initialized the auxiliary variable on one of the paths, stmmac_request_irq_multi_msi(), missing the other, stmmac_request_irq_single(). Add the same initialization on stmmac_request_irq_single() to prevent "Unbalanced IRQ wake disable" warnings from being printed the first time disable_irq_wake() is called on platforms that run on that code path. Fixes: a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls") Signed-off-by: Nícolas F. R. A. Prado Reviewed-by: Simon Horman Link: https://patch.msgid.link/20241101-stmmac-unbalanced-wake-single-fix-v1-1-5952524c97f0@collabora.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 208dbc68aaf9..7bf275f127c9 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3780,6 +3780,7 @@ static int stmmac_request_irq_single(struct net_device *dev) /* Request the Wake IRQ in case of another line * is used for WoL */ + priv->wol_irq_disabled = true; if (priv->wol_irq > 0 && priv->wol_irq != dev->irq) { ret = request_irq(priv->wol_irq, stmmac_interrupt, IRQF_SHARED, dev->name, dev); -- 2.51.0 From c03d278fdf35e73dd0ec543b9b556876b9d9a8dc Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 5 Nov 2024 12:07:22 +0100 Subject: [PATCH 07/16] netfilter: nf_tables: wait for rcu grace period on net_device removal 8c873e219970 ("netfilter: core: free hooks with call_rcu") removed synchronize_net() call when unregistering basechain hook, however, net_device removal event handler for the NFPROTO_NETDEV was not updated to wait for RCU grace period. Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") does not remove basechain rules on device removal, I was hinted to remove rules on net_device removal later, see 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on netdevice removal"). Although NETDEV_UNREGISTER event is guaranteed to be handled after synchronize_net() call, this path needs to wait for rcu grace period via rcu callback to release basechain hooks if netns is alive because an ongoing netlink dump could be in progress (sockets hold a reference on the netns). Note that nf_tables_pre_exit_net() unregisters and releases basechain hooks but it is possible to see NETDEV_UNREGISTER at a later stage in the netns exit path, eg. veth peer device in another netns: cleanup_net() default_device_exit_batch() unregister_netdevice_many_notify() notifier_call_chain() nf_tables_netdev_event() __nft_release_basechain() In this particular case, same rule of thumb applies: if netns is alive, then wait for rcu grace period because netlink dump in the other netns could be in progress. Otherwise, if the other netns is going away then no netlink dump can be in progress and basechain hooks can be released inmediately. While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain validation, which should not ever happen. Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 4 +++ net/netfilter/nf_tables_api.c | 41 +++++++++++++++++++++++++------ 2 files changed, 38 insertions(+), 7 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 91ae20cb7648..066a3ea33b12 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1103,6 +1103,7 @@ struct nft_rule_blob { * @name: name of the chain * @udlen: user data length * @udata: user data in the chain + * @rcu_head: rcu head for deferred release * @blob_next: rule blob pointer to the next in the chain */ struct nft_chain { @@ -1120,6 +1121,7 @@ struct nft_chain { char *name; u16 udlen; u8 *udata; + struct rcu_head rcu_head; /* Only used during control plane commit phase: */ struct nft_rule_blob *blob_next; @@ -1263,6 +1265,7 @@ static inline void nft_use_inc_restore(u32 *use) * @sets: sets in the table * @objects: stateful objects in the table * @flowtables: flow tables in the table + * @net: netnamespace this table belongs to * @hgenerator: handle generator state * @handle: table handle * @use: number of chain references to this table @@ -1282,6 +1285,7 @@ struct nft_table { struct list_head sets; struct list_head objects; struct list_head flowtables; + possible_net_t net; u64 hgenerator; u64 handle; u32 use; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index a24fe62650a7..588a2757986c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1495,6 +1495,7 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info, INIT_LIST_HEAD(&table->sets); INIT_LIST_HEAD(&table->objects); INIT_LIST_HEAD(&table->flowtables); + write_pnet(&table->net, net); table->family = family; table->flags = flags; table->handle = ++nft_net->table_handle; @@ -11430,22 +11431,48 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data, } EXPORT_SYMBOL_GPL(nft_data_dump); -int __nft_release_basechain(struct nft_ctx *ctx) +static void __nft_release_basechain_now(struct nft_ctx *ctx) { struct nft_rule *rule, *nr; - if (WARN_ON(!nft_is_base_chain(ctx->chain))) - return 0; - - nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain); list_for_each_entry_safe(rule, nr, &ctx->chain->rules, list) { list_del(&rule->list); - nft_use_dec(&ctx->chain->use); nf_tables_rule_release(ctx, rule); } + nf_tables_chain_destroy(ctx->chain); +} + +static void nft_release_basechain_rcu(struct rcu_head *head) +{ + struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head); + struct nft_ctx ctx = { + .family = chain->table->family, + .chain = chain, + .net = read_pnet(&chain->table->net), + }; + + __nft_release_basechain_now(&ctx); + put_net(ctx.net); +} + +int __nft_release_basechain(struct nft_ctx *ctx) +{ + struct nft_rule *rule; + + if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain))) + return 0; + + nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain); + list_for_each_entry(rule, &ctx->chain->rules, list) + nft_use_dec(&ctx->chain->use); + nft_chain_del(ctx->chain); nft_use_dec(&ctx->table->use); - nf_tables_chain_destroy(ctx->chain); + + if (maybe_get_net(ctx->net)) + call_rcu(&ctx->chain->rcu_head, nft_release_basechain_rcu); + else + __nft_release_basechain_now(ctx); return 0; } -- 2.51.0 From 86a48a00efdf61197b6658e52c6140463eb313dc Mon Sep 17 00:00:00 2001 From: Philo Lu Date: Mon, 4 Nov 2024 16:57:03 +0800 Subject: [PATCH 08/16] virtio_net: Support dynamic rss indirection table size When reading/writing virtio_net_ctrl_rss, we get the indirection table size from vi->rss_indir_table_size, which is initialized in virtnet_probe(). However, the actual size of indirection_table was set as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if the vi->rss_indir_table_size exceeds 128. This patch instead uses dynamic indirection table, allocated with vi->rss after vi->rss_indir_table_size initialized. And free it in virtnet_remove(). In virtnet_commit_rss_command(), sgs for rss is initialized differently with hash_report. So indirection_table is not used if !vi->has_rss, and then we don't need to alloc indirection_table for hash_report only uses. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu Signed-off-by: Xuan Zhuo Acked-by: Joe Damato Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 792e9eadbfc3..4b507007d242 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -368,15 +368,16 @@ struct receive_queue { * because table sizes may be differ according to the device configuration. */ #define VIRTIO_NET_RSS_MAX_KEY_SIZE 40 -#define VIRTIO_NET_RSS_MAX_TABLE_LEN 128 struct virtio_net_ctrl_rss { u32 hash_types; u16 indirection_table_mask; u16 unclassified_queue; - u16 indirection_table[VIRTIO_NET_RSS_MAX_TABLE_LEN]; + u16 hash_cfg_reserved; /* for HASH_CONFIG (see virtio_net_hash_config for details) */ u16 max_tx_vq; u8 hash_key_length; u8 key[VIRTIO_NET_RSS_MAX_KEY_SIZE]; + + u16 *indirection_table; }; /* Control VQ buffers: protected by the rtnl lock */ @@ -512,6 +513,25 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb, struct page *page, void *buf, int len, int truesize); +static int rss_indirection_table_alloc(struct virtio_net_ctrl_rss *rss, u16 indir_table_size) +{ + if (!indir_table_size) { + rss->indirection_table = NULL; + return 0; + } + + rss->indirection_table = kmalloc_array(indir_table_size, sizeof(u16), GFP_KERNEL); + if (!rss->indirection_table) + return -ENOMEM; + + return 0; +} + +static void rss_indirection_table_free(struct virtio_net_ctrl_rss *rss) +{ + kfree(rss->indirection_table); +} + static bool is_xdp_frame(void *ptr) { return (unsigned long)ptr & VIRTIO_XDP_FLAG; @@ -3828,11 +3848,15 @@ static bool virtnet_commit_rss_command(struct virtnet_info *vi) /* prepare sgs */ sg_init_table(sgs, 4); - sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table); + sg_buf_size = offsetof(struct virtio_net_ctrl_rss, hash_cfg_reserved); sg_set_buf(&sgs[0], &vi->rss, sg_buf_size); - sg_buf_size = sizeof(uint16_t) * (vi->rss.indirection_table_mask + 1); - sg_set_buf(&sgs[1], vi->rss.indirection_table, sg_buf_size); + if (vi->has_rss) { + sg_buf_size = sizeof(uint16_t) * vi->rss_indir_table_size; + sg_set_buf(&sgs[1], vi->rss.indirection_table, sg_buf_size); + } else { + sg_set_buf(&sgs[1], &vi->rss.hash_cfg_reserved, sizeof(uint16_t)); + } sg_buf_size = offsetof(struct virtio_net_ctrl_rss, key) - offsetof(struct virtio_net_ctrl_rss, max_tx_vq); @@ -6420,6 +6444,9 @@ static int virtnet_probe(struct virtio_device *vdev) virtio_cread16(vdev, offsetof(struct virtio_net_config, rss_max_indirection_table_length)); } + err = rss_indirection_table_alloc(&vi->rss, vi->rss_indir_table_size); + if (err) + goto free; if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_key_size = @@ -6674,6 +6701,8 @@ static void virtnet_remove(struct virtio_device *vdev) remove_vq_common(vi); + rss_indirection_table_free(&vi->rss); + free_netdev(vi->dev); } -- 2.51.0 From 3f7d9c1964fcd16d02a8a9d4fd6f6cb60c4cc530 Mon Sep 17 00:00:00 2001 From: Philo Lu Date: Mon, 4 Nov 2024 16:57:04 +0800 Subject: [PATCH 09/16] virtio_net: Add hash_key_length check Add hash_key_length check in virtnet_probe() to avoid possible out of bound errors when setting/reading the hash key. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu Signed-off-by: Xuan Zhuo Acked-by: Joe Damato Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 4b507007d242..545dda8ec077 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -6451,6 +6451,12 @@ static int virtnet_probe(struct virtio_device *vdev) if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_key_size = virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size)); + if (vi->rss_key_size > VIRTIO_NET_RSS_MAX_KEY_SIZE) { + dev_err(&vdev->dev, "rss_max_key_size=%u exceeds the limit %u.\n", + vi->rss_key_size, VIRTIO_NET_RSS_MAX_KEY_SIZE); + err = -EINVAL; + goto free; + } vi->rss_hash_types_supported = virtio_cread32(vdev, offsetof(struct virtio_net_config, supported_hash_types)); -- 2.51.0 From dc749b7b06082ccaacc602e724445da19cd03e9f Mon Sep 17 00:00:00 2001 From: Philo Lu Date: Mon, 4 Nov 2024 16:57:05 +0800 Subject: [PATCH 10/16] virtio_net: Sync rss config to device when virtnet_probe During virtnet_probe, default rss configuration is initialized, but was not committed to the device. This patch fix this by sending rss command after device ready in virtnet_probe. Otherwise, the actual rss configuration used by device can be different with that read by user from driver, which may confuse the user. If the command committing fails, driver rss will be disabled. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu Signed-off-by: Xuan Zhuo Acked-by: Joe Damato Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 545dda8ec077..b3232b8baa25 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -6584,6 +6584,15 @@ static int virtnet_probe(struct virtio_device *vdev) virtio_device_ready(vdev); + if (vi->has_rss || vi->has_rss_hash_report) { + if (!virtnet_commit_rss_command(vi)) { + dev_warn(&vdev->dev, "RSS disabled because committing failed.\n"); + dev->hw_features &= ~NETIF_F_RXHASH; + vi->has_rss_hash_report = false; + vi->has_rss = false; + } + } + virtnet_set_queues(vi, vi->curr_queue_pairs); /* a random MAC address has been assigned, notify the device. -- 2.51.0 From 50bfcaedd78e53135ec0504302269b3b65bf1eff Mon Sep 17 00:00:00 2001 From: Philo Lu Date: Mon, 4 Nov 2024 16:57:06 +0800 Subject: [PATCH 11/16] virtio_net: Update rss when set queue RSS configuration should be updated with queue number. In particular, it should be updated when (1) rss enabled and (2) default rss configuration is used without user modification. During rss command processing, device updates queue_pairs using rss.max_tx_vq. That is, the device updates queue_pairs together with rss, so we can skip the sperate queue_pairs update (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly. Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq, because this is not used in the other hash_report case. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu Signed-off-by: Xuan Zhuo Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 65 +++++++++++++++++++++++++++++++--------- 1 file changed, 51 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index b3232b8baa25..53a038fcbe99 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3394,15 +3394,59 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi) dev_warn(&vi->dev->dev, "Failed to ack link announce.\n"); } +static bool virtnet_commit_rss_command(struct virtnet_info *vi); + +static void virtnet_rss_update_by_qpairs(struct virtnet_info *vi, u16 queue_pairs) +{ + u32 indir_val = 0; + int i = 0; + + for (; i < vi->rss_indir_table_size; ++i) { + indir_val = ethtool_rxfh_indir_default(i, queue_pairs); + vi->rss.indirection_table[i] = indir_val; + } + vi->rss.max_tx_vq = queue_pairs; +} + static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) { struct virtio_net_ctrl_mq *mq __free(kfree) = NULL; - struct scatterlist sg; + struct virtio_net_ctrl_rss old_rss; struct net_device *dev = vi->dev; + struct scatterlist sg; if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ)) return 0; + /* Firstly check if we need update rss. Do updating if both (1) rss enabled and + * (2) no user configuration. + * + * During rss command processing, device updates queue_pairs using rss.max_tx_vq. That is, + * the device updates queue_pairs together with rss, so we can skip the sperate queue_pairs + * update (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly. + */ + if (vi->has_rss && !netif_is_rxfh_configured(dev)) { + memcpy(&old_rss, &vi->rss, sizeof(old_rss)); + if (rss_indirection_table_alloc(&vi->rss, vi->rss_indir_table_size)) { + vi->rss.indirection_table = old_rss.indirection_table; + return -ENOMEM; + } + + virtnet_rss_update_by_qpairs(vi, queue_pairs); + + if (!virtnet_commit_rss_command(vi)) { + /* restore ctrl_rss if commit_rss_command failed */ + rss_indirection_table_free(&vi->rss); + memcpy(&vi->rss, &old_rss, sizeof(old_rss)); + + dev_warn(&dev->dev, "Fail to set num of queue pairs to %d, because committing RSS failed\n", + queue_pairs); + return -EINVAL; + } + rss_indirection_table_free(&old_rss); + goto succ; + } + mq = kzalloc(sizeof(*mq), GFP_KERNEL); if (!mq) return -ENOMEM; @@ -3415,12 +3459,12 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n", queue_pairs); return -EINVAL; - } else { - vi->curr_queue_pairs = queue_pairs; - /* virtnet_open() will refill when device is going to up. */ - if (dev->flags & IFF_UP) - schedule_delayed_work(&vi->refill, 0); } +succ: + vi->curr_queue_pairs = queue_pairs; + /* virtnet_open() will refill when device is going to up. */ + if (dev->flags & IFF_UP) + schedule_delayed_work(&vi->refill, 0); return 0; } @@ -3880,21 +3924,14 @@ err: static void virtnet_init_default_rss(struct virtnet_info *vi) { - u32 indir_val = 0; - int i = 0; - vi->rss.hash_types = vi->rss_hash_types_supported; vi->rss_hash_types_saved = vi->rss_hash_types_supported; vi->rss.indirection_table_mask = vi->rss_indir_table_size ? vi->rss_indir_table_size - 1 : 0; vi->rss.unclassified_queue = 0; - for (; i < vi->rss_indir_table_size; ++i) { - indir_val = ethtool_rxfh_indir_default(i, vi->curr_queue_pairs); - vi->rss.indirection_table[i] = indir_val; - } + virtnet_rss_update_by_qpairs(vi, vi->curr_queue_pairs); - vi->rss.max_tx_vq = vi->has_rss ? vi->curr_queue_pairs : 0; vi->rss.hash_key_length = vi->rss_key_size; netdev_rss_key_fill(vi->rss.key, vi->rss_key_size); -- 2.51.0 From 71803c1dfa29e0d13b99e48fda11107cc8caebc7 Mon Sep 17 00:00:00 2001 From: Johan Jonker Date: Mon, 4 Nov 2024 21:01:38 +0800 Subject: [PATCH 12/16] net: arc: fix the device for dma_map_single/dma_unmap_single The ndev->dev and pdev->dev aren't the same device, use ndev->dev.parent which has dma_mask, ndev->dev.parent is just pdev->dev. Or it would cause the following issue: [ 39.933526] ------------[ cut here ]------------ [ 39.938414] WARNING: CPU: 1 PID: 501 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x90/0x1f8 Fixes: f959dcd6ddfd ("dma-direct: Fix potential NULL pointer dereference") Signed-off-by: David Wu Signed-off-by: Johan Jonker Signed-off-by: Andy Yan Signed-off-by: Paolo Abeni --- drivers/net/ethernet/arc/emac_main.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index 31ee477dd131..8283aeee35fb 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -111,6 +111,7 @@ static void arc_emac_tx_clean(struct net_device *ndev) { struct arc_emac_priv *priv = netdev_priv(ndev); struct net_device_stats *stats = &ndev->stats; + struct device *dev = ndev->dev.parent; unsigned int i; for (i = 0; i < TX_BD_NUM; i++) { @@ -140,7 +141,7 @@ static void arc_emac_tx_clean(struct net_device *ndev) stats->tx_bytes += skb->len; } - dma_unmap_single(&ndev->dev, dma_unmap_addr(tx_buff, addr), + dma_unmap_single(dev, dma_unmap_addr(tx_buff, addr), dma_unmap_len(tx_buff, len), DMA_TO_DEVICE); /* return the sk_buff to system */ @@ -174,6 +175,7 @@ static void arc_emac_tx_clean(struct net_device *ndev) static int arc_emac_rx(struct net_device *ndev, int budget) { struct arc_emac_priv *priv = netdev_priv(ndev); + struct device *dev = ndev->dev.parent; unsigned int work_done; for (work_done = 0; work_done < budget; work_done++) { @@ -223,9 +225,9 @@ static int arc_emac_rx(struct net_device *ndev, int budget) continue; } - addr = dma_map_single(&ndev->dev, (void *)skb->data, + addr = dma_map_single(dev, (void *)skb->data, EMAC_BUFFER_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, addr)) { + if (dma_mapping_error(dev, addr)) { if (net_ratelimit()) netdev_err(ndev, "cannot map dma buffer\n"); dev_kfree_skb(skb); @@ -237,7 +239,7 @@ static int arc_emac_rx(struct net_device *ndev, int budget) } /* unmap previosly mapped skb */ - dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr), + dma_unmap_single(dev, dma_unmap_addr(rx_buff, addr), dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE); pktlen = info & LEN_MASK; @@ -423,6 +425,7 @@ static int arc_emac_open(struct net_device *ndev) { struct arc_emac_priv *priv = netdev_priv(ndev); struct phy_device *phy_dev = ndev->phydev; + struct device *dev = ndev->dev.parent; int i; phy_dev->autoneg = AUTONEG_ENABLE; @@ -445,9 +448,9 @@ static int arc_emac_open(struct net_device *ndev) if (unlikely(!rx_buff->skb)) return -ENOMEM; - addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data, + addr = dma_map_single(dev, (void *)rx_buff->skb->data, EMAC_BUFFER_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, addr)) { + if (dma_mapping_error(dev, addr)) { netdev_err(ndev, "cannot dma map\n"); dev_kfree_skb(rx_buff->skb); return -ENOMEM; @@ -548,6 +551,7 @@ static void arc_emac_set_rx_mode(struct net_device *ndev) static void arc_free_tx_queue(struct net_device *ndev) { struct arc_emac_priv *priv = netdev_priv(ndev); + struct device *dev = ndev->dev.parent; unsigned int i; for (i = 0; i < TX_BD_NUM; i++) { @@ -555,7 +559,7 @@ static void arc_free_tx_queue(struct net_device *ndev) struct buffer_state *tx_buff = &priv->tx_buff[i]; if (tx_buff->skb) { - dma_unmap_single(&ndev->dev, + dma_unmap_single(dev, dma_unmap_addr(tx_buff, addr), dma_unmap_len(tx_buff, len), DMA_TO_DEVICE); @@ -579,6 +583,7 @@ static void arc_free_tx_queue(struct net_device *ndev) static void arc_free_rx_queue(struct net_device *ndev) { struct arc_emac_priv *priv = netdev_priv(ndev); + struct device *dev = ndev->dev.parent; unsigned int i; for (i = 0; i < RX_BD_NUM; i++) { @@ -586,7 +591,7 @@ static void arc_free_rx_queue(struct net_device *ndev) struct buffer_state *rx_buff = &priv->rx_buff[i]; if (rx_buff->skb) { - dma_unmap_single(&ndev->dev, + dma_unmap_single(dev, dma_unmap_addr(rx_buff, addr), dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE); @@ -679,6 +684,7 @@ static netdev_tx_t arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) unsigned int len, *txbd_curr = &priv->txbd_curr; struct net_device_stats *stats = &ndev->stats; __le32 *info = &priv->txbd[*txbd_curr].info; + struct device *dev = ndev->dev.parent; dma_addr_t addr; if (skb_padto(skb, ETH_ZLEN)) @@ -692,10 +698,9 @@ static netdev_tx_t arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) return NETDEV_TX_BUSY; } - addr = dma_map_single(&ndev->dev, (void *)skb->data, len, - DMA_TO_DEVICE); + addr = dma_map_single(dev, (void *)skb->data, len, DMA_TO_DEVICE); - if (unlikely(dma_mapping_error(&ndev->dev, addr))) { + if (unlikely(dma_mapping_error(dev, addr))) { stats->tx_dropped++; stats->tx_errors++; dev_kfree_skb_any(skb); -- 2.51.0 From 0a1c7a7b0adbf595ce7f218609db53749e966573 Mon Sep 17 00:00:00 2001 From: Johan Jonker Date: Mon, 4 Nov 2024 21:01:39 +0800 Subject: [PATCH 13/16] net: arc: rockchip: fix emac mdio node support The binding emac_rockchip.txt is converted to YAML. Changed against the original binding is an added MDIO subnode. This make the driver failed to find the PHY, and given the 'mdio has invalid PHY address' it is probably looking in the wrong node. Fix emac_mdio.c so that it can handle both old and new device trees. Fixes: 1dabb74971b3 ("ARM: dts: rockchip: restyle emac nodes") Signed-off-by: Johan Jonker Tested-by: Andy Yan Link: https://lore.kernel.org/r/20220603163539.537-3-jbx6244@gmail.com Signed-off-by: Andy Yan Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- drivers/net/ethernet/arc/emac_mdio.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/arc/emac_mdio.c b/drivers/net/ethernet/arc/emac_mdio.c index 87f40c2ba904..078b1a72c161 100644 --- a/drivers/net/ethernet/arc/emac_mdio.c +++ b/drivers/net/ethernet/arc/emac_mdio.c @@ -133,6 +133,7 @@ int arc_mdio_probe(struct arc_emac_priv *priv) struct arc_emac_mdio_bus_data *data = &priv->bus_data; struct device_node *np = priv->dev->of_node; const char *name = "Synopsys MII Bus"; + struct device_node *mdio_node; struct mii_bus *bus; int error; @@ -164,7 +165,13 @@ int arc_mdio_probe(struct arc_emac_priv *priv) snprintf(bus->id, MII_BUS_ID_SIZE, "%s", bus->name); - error = of_mdiobus_register(bus, priv->dev->of_node); + /* Backwards compatibility for EMAC nodes without MDIO subnode. */ + mdio_node = of_get_child_by_name(np, "mdio"); + if (!mdio_node) + mdio_node = of_node_get(np); + + error = of_mdiobus_register(bus, mdio_node); + of_node_put(mdio_node); if (error) { mdiobus_free(bus); return dev_err_probe(priv->dev, error, -- 2.51.0 From de88df01796b309903b70888fbdf2b89607e3a6a Mon Sep 17 00:00:00 2001 From: Wenjia Zhang Date: Wed, 6 Nov 2024 09:26:12 +0100 Subject: [PATCH 14/16] net/smc: Fix lookup of netdev by using ib_device_get_netdev() The SMC-R variant of the SMC protocol used direct call to function ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device driver to run SMC-R, it failed to find a device, because in mlx5_ib the internal net device management for retrieving net devices was replaced by a common interface ib_device_get_netdev() in commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions"). Since such direct accesses to the internal net device management is not recommended at all, update the SMC-R code to use proper API ib_device_get_netdev(). Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration") Reported-by: Aswin K Reviewed-by: Gerd Bayer Reviewed-by: Halil Pasic Reviewed-by: Simon Horman Reviewed-by: Dust Li Reviewed-by: Wen Gu Reviewed-by: Zhu Yanjun Reviewed-by: D. Wythe Signed-off-by: Wenjia Zhang Reviewed-by: Leon Romanovsky Link: https://patch.msgid.link/20241106082612.57803-1-wenjia@linux.ibm.com Signed-off-by: Jakub Kicinski --- net/smc/smc_ib.c | 8 ++------ net/smc/smc_pnet.c | 4 +--- 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c index 9297dc20bfe2..9c563cdbea90 100644 --- a/net/smc/smc_ib.c +++ b/net/smc/smc_ib.c @@ -899,9 +899,7 @@ static void smc_copy_netdev_ifindex(struct smc_ib_device *smcibdev, int port) struct ib_device *ibdev = smcibdev->ibdev; struct net_device *ndev; - if (!ibdev->ops.get_netdev) - return; - ndev = ibdev->ops.get_netdev(ibdev, port + 1); + ndev = ib_device_get_netdev(ibdev, port + 1); if (ndev) { smcibdev->ndev_ifidx[port] = ndev->ifindex; dev_put(ndev); @@ -921,9 +919,7 @@ void smc_ib_ndev_change(struct net_device *ndev, unsigned long event) port_cnt = smcibdev->ibdev->phys_port_cnt; for (i = 0; i < min_t(size_t, port_cnt, SMC_MAX_PORTS); i++) { libdev = smcibdev->ibdev; - if (!libdev->ops.get_netdev) - continue; - lndev = libdev->ops.get_netdev(libdev, i + 1); + lndev = ib_device_get_netdev(libdev, i + 1); dev_put(lndev); if (lndev != ndev) continue; diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index a04aa0e882f8..716808f374a8 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -1054,9 +1054,7 @@ static void smc_pnet_find_rdma_dev(struct net_device *netdev, for (i = 1; i <= SMC_MAX_PORTS; i++) { if (!rdma_is_port_valid(ibdev->ibdev, i)) continue; - if (!ibdev->ibdev->ops.get_netdev) - continue; - ndev = ibdev->ibdev->ops.get_netdev(ibdev->ibdev, i); + ndev = ib_device_get_netdev(ibdev->ibdev, i); if (!ndev) continue; dev_put(ndev); -- 2.51.0 From fc9de52de38f656399d2ce40f7349a6b5f86e787 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 6 Nov 2024 13:03:22 +0000 Subject: [PATCH 15/16] rxrpc: Fix missing locking causing hanging calls If a call gets aborted (e.g. because kafs saw a signal) between it being queued for connection and the I/O thread picking up the call, the abort will be prioritised over the connection and it will be removed from local->new_client_calls by rxrpc_disconnect_client_call() without a lock being held. This may cause other calls on the list to disappear if a race occurs. Fix this by taking the client_call_lock when removing a call from whatever list its ->wait_link happens to be on. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org Reported-by: Marc Dionne Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Link: https://patch.msgid.link/726660.1730898202@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski --- include/trace/events/rxrpc.h | 1 + net/rxrpc/conn_client.c | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index a1b126a6b0d7..cc22596c7250 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -287,6 +287,7 @@ EM(rxrpc_call_see_input, "SEE input ") \ EM(rxrpc_call_see_release, "SEE release ") \ EM(rxrpc_call_see_userid_exists, "SEE u-exists") \ + EM(rxrpc_call_see_waiting_call, "SEE q-conn ") \ E_(rxrpc_call_see_zap, "SEE zap ") #define rxrpc_txqueue_traces \ diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index d25bf1cf3670..bb11e8289d6d 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -516,6 +516,7 @@ void rxrpc_connect_client_calls(struct rxrpc_local *local) spin_lock(&local->client_call_lock); list_move_tail(&call->wait_link, &bundle->waiting_calls); + rxrpc_see_call(call, rxrpc_call_see_waiting_call); spin_unlock(&local->client_call_lock); if (rxrpc_bundle_has_space(bundle)) @@ -586,7 +587,10 @@ void rxrpc_disconnect_client_call(struct rxrpc_bundle *bundle, struct rxrpc_call _debug("call is waiting"); ASSERTCMP(call->call_id, ==, 0); ASSERT(!test_bit(RXRPC_CALL_EXPOSED, &call->flags)); + /* May still be on ->new_client_calls. */ + spin_lock(&local->client_call_lock); list_del_init(&call->wait_link); + spin_unlock(&local->client_call_lock); return; } -- 2.51.0 From d293958a8595ba566fb90b99da4d6263e14fee15 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 6 Nov 2024 22:19:22 +0000 Subject: [PATCH 16/16] net/smc: do not leave a dangling sk pointer in __smc_create() Thanks to commit 4bbd360a5084 ("socket: Print pf->create() when it does not clear sock->sk on failure."), syzbot found an issue with AF_SMC: smc_create must clear sock->sk on failure, family: 43, type: 1, protocol: 0 WARNING: CPU: 0 PID: 5827 at net/socket.c:1565 __sock_create+0x96f/0xa30 net/socket.c:1563 Modules linked in: CPU: 0 UID: 0 PID: 5827 Comm: syz-executor259 Not tainted 6.12.0-rc6-next-20241106-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__sock_create+0x96f/0xa30 net/socket.c:1563 Code: 03 00 74 08 4c 89 e7 e8 4f 3b 85 f8 49 8b 34 24 48 c7 c7 40 89 0c 8d 8b 54 24 04 8b 4c 24 0c 44 8b 44 24 08 e8 32 78 db f7 90 <0f> 0b 90 90 e9 d3 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c ee f7 RSP: 0018:ffffc90003e4fda0 EFLAGS: 00010246 RAX: 099c6f938c7f4700 RBX: 1ffffffff1a595fd RCX: ffff888034823c00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000ffffffe9 R08: ffffffff81567052 R09: 1ffff920007c9f50 R10: dffffc0000000000 R11: fffff520007c9f51 R12: ffffffff8d2cafe8 R13: 1ffffffff1a595fe R14: ffffffff9a789c40 R15: ffff8880764298c0 FS: 000055557b518380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa62ff43225 CR3: 0000000031628000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sock_create net/socket.c:1616 [inline] __sys_socket_create net/socket.c:1653 [inline] __sys_socket+0x150/0x3c0 net/socket.c:1700 __do_sys_socket net/socket.c:1714 [inline] __se_sys_socket net/socket.c:1712 [inline] For reference, see commit 2d859aff775d ("Merge branch 'do-not-leave-dangling-sk-pointers-in-pf-create-functions'") Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Signed-off-by: Eric Dumazet Cc: Ignat Korchagin Cc: D. Wythe Cc: Dust Li Reviewed-by: Kuniyuki Iwashima Reviewed-by: Wenjia Zhang Link: https://patch.msgid.link/20241106221922.1544045-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/smc/af_smc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 0316217b7687..9d76e902fd77 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3359,8 +3359,10 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol, else rc = smc_create_clcsk(net, sk, family); - if (rc) + if (rc) { sk_common_release(sk); + sock->sk = NULL; + } out: return rc; } -- 2.51.0