From 6e33123a18bf6c69c941d09e091e8bfef5fb0d41 Mon Sep 17 00:00:00 2001 From: Vyshnav Ajith Date: Fri, 22 Nov 2024 03:48:52 +0530 Subject: [PATCH 01/16] Fix spelling mistake Changed from reequires to require. A minute typo. Signed-off-by: Vyshnav Ajith Link: https://patch.msgid.link/20241121221852.10754-1-puthen1977@gmail.com Signed-off-by: Paolo Abeni --- Documentation/networking/cdc_mbim.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/networking/cdc_mbim.rst b/Documentation/networking/cdc_mbim.rst index 37f968acc473..8404a3f794f3 100644 --- a/Documentation/networking/cdc_mbim.rst +++ b/Documentation/networking/cdc_mbim.rst @@ -51,7 +51,7 @@ Such userspace applications includes, but are not limited to: - mbimcli (included with the libmbim [3] library), and - ModemManager [4] -Establishing a MBIM IP session reequires at least these actions by the +Establishing a MBIM IP session requires at least these actions by the management application: - open the control channel -- 2.51.0 From 04f5cb48995d51deed0af71aaba1b8699511313f Mon Sep 17 00:00:00 2001 From: Leo Stone Date: Sun, 24 Nov 2024 15:00:02 -0800 Subject: [PATCH 02/16] Documentation: tls_offload: fix typos and grammar Fix typos and grammar where it improves readability. Signed-off-by: Leo Stone Reviewed-by: Bagas Sanjaya Link: https://patch.msgid.link/20241124230002.56058-1-leocstone@gmail.com Signed-off-by: Paolo Abeni --- Documentation/networking/tls-offload.rst | 29 ++++++++++++------------ 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst index 5f0dea3d571e..7354d48cdf92 100644 --- a/Documentation/networking/tls-offload.rst +++ b/Documentation/networking/tls-offload.rst @@ -51,7 +51,7 @@ and send them to the device for encryption and transmission. RX -- -On the receive side if the device handled decryption and authentication +On the receive side, if the device handled decryption and authentication successfully, the driver will set the decrypted bit in the associated :c:type:`struct sk_buff `. The packets reach the TCP stack and are handled normally. ``ktls`` is informed when data is queued to the socket @@ -120,8 +120,9 @@ before installing the connection state in the kernel. RX -- -In RX direction local networking stack has little control over the segmentation, -so the initial records' TCP sequence number may be anywhere inside the segment. +In the RX direction, the local networking stack has little control over +segmentation, so the initial records' TCP sequence number may be anywhere +inside the segment. Normal operation ================ @@ -138,8 +139,8 @@ There are no guarantees on record length or record segmentation. In particular segments may start at any point of a record and contain any number of records. Assuming segments are received in order, the device should be able to perform crypto operations and authentication regardless of segmentation. For this -to be possible device has to keep small amount of segment-to-segment state. -This includes at least: +to be possible, the device has to keep a small amount of segment-to-segment +state. This includes at least: * partial headers (if a segment carried only a part of the TLS header) * partial data block @@ -175,12 +176,12 @@ and packet transformation functions) the device validates the Layer 4 checksum and performs a 5-tuple lookup to find any TLS connection the packet may belong to (technically a 4-tuple lookup is sufficient - IP addresses and TCP port numbers, as the protocol -is always TCP). If connection is matched device confirms if the TCP sequence -number is the expected one and proceeds to TLS handling (record delineation, -decryption, authentication for each record in the packet). The device leaves -the record framing unmodified, the stack takes care of record decapsulation. -Device indicates successful handling of TLS offload in the per-packet context -(descriptor) passed to the host. +is always TCP). If the packet is matched to a connection, the device confirms +if the TCP sequence number is the expected one and proceeds to TLS handling +(record delineation, decryption, authentication for each record in the packet). +The device leaves the record framing unmodified, the stack takes care of record +decapsulation. Device indicates successful handling of TLS offload in the +per-packet context (descriptor) passed to the host. Upon reception of a TLS offloaded packet, the driver sets the :c:member:`decrypted` mark in :c:type:`struct sk_buff ` @@ -439,7 +440,7 @@ by the driver: * ``rx_tls_resync_req_end`` - number of times the TLS async resync request properly ended with providing the HW tracked tcp-seq. * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request - procedure was started by not properly ended. + procedure was started but not properly ended. * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to the driver was successfully handled. * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to @@ -507,8 +508,8 @@ in packets as seen on the wire. Transport layer transparency ---------------------------- -The device should not modify any packet headers for the purpose -of the simplifying TLS offload. +For the purpose of simplifying TLS offload, the device should not modify any +packet headers. The device should not depend on any packet headers beyond what is strictly necessary for TLS offload. -- 2.51.0 From f6d7695b5ae22092fa2cc42529bb7462f7e0c4ad Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 28 Nov 2024 17:18:04 +0100 Subject: [PATCH 03/16] ipmr: fix build with clang and DEBUG_NET disabled. Sasha reported a build issue in ipmr:: net/ipv4/ipmr.c:320:13: error: function 'ipmr_can_free_table' is not \ needed and will not be emitted \ [-Werror,-Wunneeded-internal-declaration] 320 | static bool ipmr_can_free_table(struct net *net) Apparently clang is too smart with BUILD_BUG_ON_INVALID(), let's fallback to a plain WARN_ON_ONCE(). Reported-by: Sasha Levin Closes: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.11-25635-g6813e2326f1e/testrun/26111580/suite/build/test/clang-nightly-lkftconfig/details/ Fixes: 11b6e701bce9 ("ipmr: add debug check for mr table cleanup") Signed-off-by: Paolo Abeni Link: https://patch.msgid.link/ee75faa926b2446b8302ee5fc30e129d2df73b90.1732810228.git.pabeni@redhat.com Signed-off-by: Paolo Abeni --- net/ipv4/ipmr.c | 2 +- net/ipv6/ip6mr.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 383ea8b91cc7..c5b8ec5c0a8c 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -437,7 +437,7 @@ static void ipmr_free_table(struct mr_table *mrt) { struct net *net = read_pnet(&mrt->net); - DEBUG_NET_WARN_ON_ONCE(!ipmr_can_free_table(net)); + WARN_ON_ONCE(!ipmr_can_free_table(net)); timer_shutdown_sync(&mrt->ipmr_expire_timer); mroute_clean_tables(mrt, MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC | diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 4147890fe98f..7f1902ac3586 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -416,7 +416,7 @@ static void ip6mr_free_table(struct mr_table *mrt) { struct net *net = read_pnet(&mrt->net); - DEBUG_NET_WARN_ON_ONCE(!ip6mr_can_free_table(net)); + WARN_ON_ONCE(!ip6mr_can_free_table(net)); timer_shutdown_sync(&mrt->ipmr_expire_timer); mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MIFS_STATIC | -- 2.51.0 From 8e00072c31e28e7de1ffd9b28c773a3143aa4167 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 25 Nov 2024 17:07:18 +0800 Subject: [PATCH 04/16] net: enetc: read TSN capabilities from port register, not SI Configuring TSN (Qbv, Qbu, PSFP) capabilities requires access to port registers, which are available to the PSI but not the VSI. Yet, the SI port capability register 0 (PSICAPR0), exposed to both PSIs and VSIs, presents the same capabilities to the VF as to the PF, thus leading the VF driver into thinking it can configure these features. In the case of ENETC_SI_F_QBU, having it set in the VF leads to a crash: root@ls1028ardb:~# tc qdisc add dev eno0vf0 parent root handle 100: \ mqprio num_tc 4 map 0 0 1 1 2 2 3 3 queues 1@0 1@1 1@2 1@3 hw 1 [ 187.290775] Unable to handle kernel paging request at virtual address 0000000000001f00 [ 187.424831] pc : enetc_mm_commit_preemptible_tcs+0x1c4/0x400 [ 187.430518] lr : enetc_mm_commit_preemptible_tcs+0x30c/0x400 [ 187.511140] Call trace: [ 187.513588] enetc_mm_commit_preemptible_tcs+0x1c4/0x400 [ 187.518918] enetc_setup_tc_mqprio+0x180/0x214 [ 187.523374] enetc_vf_setup_tc+0x1c/0x30 [ 187.527306] mqprio_enable_offload+0x144/0x178 [ 187.531766] mqprio_init+0x3ec/0x668 [ 187.535351] qdisc_create+0x15c/0x488 [ 187.539023] tc_modify_qdisc+0x398/0x73c [ 187.542958] rtnetlink_rcv_msg+0x128/0x378 [ 187.547064] netlink_rcv_skb+0x60/0x130 [ 187.550910] rtnetlink_rcv+0x18/0x24 [ 187.554492] netlink_unicast+0x300/0x36c [ 187.558425] netlink_sendmsg+0x1a8/0x420 [ 187.606759] ---[ end trace 0000000000000000 ]--- while the other TSN features in the VF are harmless, because the net_device_ops used for the VF driver do not expose entry points for these other features. These capability bits are in the process of being defeatured from the SI registers. We should read them from the port capability register, where they are also present, and which is naturally only exposed to the PF. The change to blame (relevant for stable backports) is the one where this started being a problem, aka when the kernel started to crash due to the wrong capability seen by the VF driver. Fixes: 827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active") Reported-by: Wei Fang Signed-off-by: Vladimir Oltean Reviewed-by: Frank Li Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/enetc/enetc.c | 9 --------- .../net/ethernet/freescale/enetc/enetc_hw.h | 6 +++--- .../net/ethernet/freescale/enetc/enetc_pf.c | 19 +++++++++++++++++++ 3 files changed, 22 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 35634c516e26..bece220535a1 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -1756,15 +1756,6 @@ void enetc_get_si_caps(struct enetc_si *si) rss = enetc_rd(hw, ENETC_SIRSSCAPR); si->num_rss = ENETC_SIRSSCAPR_GET_NUM_RSS(rss); } - - if (val & ENETC_SIPCAPR0_QBV) - si->hw_features |= ENETC_SI_F_QBV; - - if (val & ENETC_SIPCAPR0_QBU) - si->hw_features |= ENETC_SI_F_QBU; - - if (val & ENETC_SIPCAPR0_PSFP) - si->hw_features |= ENETC_SI_F_PSFP; } EXPORT_SYMBOL_GPL(enetc_get_si_caps); diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h index 7c3285584f8a..55ba949230ff 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h @@ -23,10 +23,7 @@ #define ENETC_SICTR0 0x18 #define ENETC_SICTR1 0x1c #define ENETC_SIPCAPR0 0x20 -#define ENETC_SIPCAPR0_PSFP BIT(9) #define ENETC_SIPCAPR0_RSS BIT(8) -#define ENETC_SIPCAPR0_QBV BIT(4) -#define ENETC_SIPCAPR0_QBU BIT(3) #define ENETC_SIPCAPR0_RFS BIT(2) #define ENETC_SIPCAPR1 0x24 #define ENETC_SITGTGR 0x30 @@ -194,6 +191,9 @@ enum enetc_bdr_type {TX, RX}; #define ENETC_PCAPR0 0x0900 #define ENETC_PCAPR0_RXBDR(val) ((val) >> 24) #define ENETC_PCAPR0_TXBDR(val) (((val) >> 16) & 0xff) +#define ENETC_PCAPR0_PSFP BIT(9) +#define ENETC_PCAPR0_QBV BIT(4) +#define ENETC_PCAPR0_QBU BIT(3) #define ENETC_PCAPR1 0x0904 #define ENETC_PSICFGR0(n) (0x0940 + (n) * 0xc) /* n = SI index */ #define ENETC_PSICFGR0_SET_TXBDR(val) ((val) & 0xff) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index c47b4a743d93..203862ec1114 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -409,6 +409,23 @@ static void enetc_port_assign_rfs_entries(struct enetc_si *si) enetc_port_wr(hw, ENETC_PRFSMR, ENETC_PRFSMR_RFSE); } +static void enetc_port_get_caps(struct enetc_si *si) +{ + struct enetc_hw *hw = &si->hw; + u32 val; + + val = enetc_port_rd(hw, ENETC_PCAPR0); + + if (val & ENETC_PCAPR0_QBV) + si->hw_features |= ENETC_SI_F_QBV; + + if (val & ENETC_PCAPR0_QBU) + si->hw_features |= ENETC_SI_F_QBU; + + if (val & ENETC_PCAPR0_PSFP) + si->hw_features |= ENETC_SI_F_PSFP; +} + static void enetc_port_si_configure(struct enetc_si *si) { struct enetc_pf *pf = enetc_si_priv(si); @@ -416,6 +433,8 @@ static void enetc_port_si_configure(struct enetc_si *si) int num_rings, i; u32 val; + enetc_port_get_caps(si); + val = enetc_port_rd(hw, ENETC_PCAPR0); num_rings = min(ENETC_PCAPR0_RXBDR(val), ENETC_PCAPR0_TXBDR(val)); -- 2.51.0 From b2420b8c81ec674552d00c55d46245e5c184b260 Mon Sep 17 00:00:00 2001 From: Wei Fang Date: Mon, 25 Nov 2024 17:07:19 +0800 Subject: [PATCH 05/16] net: enetc: Do not configure preemptible TCs if SIs do not support Both ENETC PF and VF drivers share enetc_setup_tc_mqprio() to configure MQPRIO. And enetc_setup_tc_mqprio() calls enetc_change_preemptible_tcs() to configure preemptible TCs. However, only PF is able to configure preemptible TCs. Because only PF has related registers, while VF does not have these registers. So for VF, its hw->port pointer is NULL. Therefore, VF will access an invalid pointer when accessing a non-existent register, which will cause a crash issue. The simplified log is as follows. root@ls1028ardb:~# tc qdisc add dev eno0vf0 parent root handle 100: \ mqprio num_tc 4 map 0 0 1 1 2 2 3 3 queues 1@0 1@1 1@2 1@3 hw 1 [ 187.290775] Unable to handle kernel paging request at virtual address 0000000000001f00 [ 187.424831] pc : enetc_mm_commit_preemptible_tcs+0x1c4/0x400 [ 187.430518] lr : enetc_mm_commit_preemptible_tcs+0x30c/0x400 [ 187.511140] Call trace: [ 187.513588] enetc_mm_commit_preemptible_tcs+0x1c4/0x400 [ 187.518918] enetc_setup_tc_mqprio+0x180/0x214 [ 187.523374] enetc_vf_setup_tc+0x1c/0x30 [ 187.527306] mqprio_enable_offload+0x144/0x178 [ 187.531766] mqprio_init+0x3ec/0x668 [ 187.535351] qdisc_create+0x15c/0x488 [ 187.539023] tc_modify_qdisc+0x398/0x73c [ 187.542958] rtnetlink_rcv_msg+0x128/0x378 [ 187.547064] netlink_rcv_skb+0x60/0x130 [ 187.550910] rtnetlink_rcv+0x18/0x24 [ 187.554492] netlink_unicast+0x300/0x36c [ 187.558425] netlink_sendmsg+0x1a8/0x420 [ 187.606759] ---[ end trace 0000000000000000 ]--- In addition, some PFs also do not support configuring preemptible TCs, such as eno1 and eno3 on LS1028A. It won't crash like it does for VFs, but we should prevent these PFs from accessing these unimplemented registers. Fixes: 827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active") Signed-off-by: Wei Fang Suggested-by: Vladimir Oltean Reviewed-by: Frank Li Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/enetc/enetc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index bece220535a1..535969fa0fdb 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -29,6 +29,9 @@ EXPORT_SYMBOL_GPL(enetc_port_mac_wr); static void enetc_change_preemptible_tcs(struct enetc_ndev_priv *priv, u8 preemptible_tcs) { + if (!(priv->si->hw_features & ENETC_SI_F_QBU)) + return; + priv->preemptible_tcs = preemptible_tcs; enetc_mm_commit_preemptible_tcs(priv); } -- 2.51.0 From 0a4cc4accf00b49b4728bb7639cb90a6a5b674e2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 25 Nov 2024 09:30:39 +0000 Subject: [PATCH 06/16] tcp: populate XPS related fields of timewait sockets syzbot reported that netdev_core_pick_tx() was reading an uninitialized field [1]. This is indeed hapening for timewait sockets after recent commits. We can copy the original established socket sk_tx_queue_mapping and sk_rx_queue_mapping fields, instead of adding more checks in fast paths. As a bonus, packets will use the same transmit queue than prior ones, this potentially can avoid reordering. [1] BUG: KMSAN: uninit-value in netdev_pick_tx+0x5c7/0x1550 netdev_pick_tx+0x5c7/0x1550 netdev_core_pick_tx+0x1d2/0x4a0 net/core/dev.c:4312 __dev_queue_xmit+0x128a/0x57d0 net/core/dev.c:4394 dev_queue_xmit include/linux/netdevice.h:3168 [inline] neigh_hh_output include/net/neighbour.h:523 [inline] neigh_output include/net/neighbour.h:537 [inline] ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236 __ip_finish_output+0x287/0x810 ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434 dst_output include/net/dst.h:450 [inline] ip_local_out net/ipv4/ip_output.c:130 [inline] ip_send_skb net/ipv4/ip_output.c:1505 [inline] ip_push_pending_frames+0x444/0x570 net/ipv4/ip_output.c:1525 ip_send_unicast_reply+0x18c1/0x1b30 net/ipv4/ip_output.c:1672 tcp_v4_send_reset+0x238d/0x2a40 net/ipv4/tcp_ipv4.c:910 tcp_v4_rcv+0x48f8/0x5750 net/ipv4/tcp_ipv4.c:2431 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline] ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762 __netif_receive_skb_list net/core/dev.c:5814 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905 gro_normal_list include/net/gro.h:515 [inline] napi_complete_done+0x3d4/0x810 net/core/dev.c:6256 virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline] virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013 __napi_poll+0xe7/0x980 net/core/dev.c:6877 napi_poll net/core/dev.c:6946 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068 handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0x68/0x180 kernel/softirq.c:655 irq_exit_rcu+0x12/0x20 kernel/softirq.c:671 common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693 __preempt_count_sub arch/x86/include/asm/preempt.h:84 [inline] kmsan_virt_addr_valid arch/x86/include/asm/kmsan.h:95 [inline] virt_to_page_or_null+0xfb/0x150 mm/kmsan/shadow.c:75 kmsan_get_metadata+0x13e/0x1c0 mm/kmsan/shadow.c:141 kmsan_get_shadow_origin_ptr+0x4d/0xb0 mm/kmsan/shadow.c:102 get_shadow_origin_ptr mm/kmsan/instrumentation.c:38 [inline] __msan_metadata_ptr_for_store_4+0x27/0x40 mm/kmsan/instrumentation.c:93 rcu_preempt_read_enter kernel/rcu/tree_plugin.h:390 [inline] __rcu_read_lock+0x46/0x70 kernel/rcu/tree_plugin.h:413 rcu_read_lock include/linux/rcupdate.h:847 [inline] batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline] batadv_nc_worker+0x114/0x19e0 net/batman-adv/network-coding.c:719 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xae0/0x1c40 kernel/workqueue.c:3310 worker_thread+0xea7/0x14f0 kernel/workqueue.c:3391 kthread+0x3e2/0x540 kernel/kthread.c:389 ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Uninit was created at: __alloc_pages_noprof+0x9a7/0xe00 mm/page_alloc.c:4774 alloc_pages_mpol_noprof+0x299/0x990 mm/mempolicy.c:2265 alloc_pages_noprof+0x1bf/0x1e0 mm/mempolicy.c:2344 alloc_slab_page mm/slub.c:2412 [inline] allocate_slab+0x320/0x12e0 mm/slub.c:2578 new_slab mm/slub.c:2631 [inline] ___slab_alloc+0x12ef/0x35e0 mm/slub.c:3818 __slab_alloc mm/slub.c:3908 [inline] __slab_alloc_node mm/slub.c:3961 [inline] slab_alloc_node mm/slub.c:4122 [inline] kmem_cache_alloc_noprof+0x57a/0xb20 mm/slub.c:4141 inet_twsk_alloc+0x11f/0x9d0 net/ipv4/inet_timewait_sock.c:188 tcp_time_wait+0x83/0xf50 net/ipv4/tcp_minisocks.c:309 tcp_rcv_state_process+0x145a/0x49d0 tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939 tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline] ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762 __netif_receive_skb_list net/core/dev.c:5814 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905 gro_normal_list include/net/gro.h:515 [inline] napi_complete_done+0x3d4/0x810 net/core/dev.c:6256 virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline] virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013 __napi_poll+0xe7/0x980 net/core/dev.c:6877 napi_poll net/core/dev.c:6946 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068 handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0x68/0x180 kernel/softirq.c:655 irq_exit_rcu+0x12/0x20 kernel/softirq.c:671 common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693 CPU: 0 UID: 0 PID: 3962 Comm: kworker/u8:18 Not tainted 6.12.0-syzkaller-09073-g9f16d5e6f220 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: bat_events batadv_nc_worker Fixes: 79636038d37e ("ipv4: tcp: give socket pointer to control skbs") Fixes: 507a96737d99 ("ipv6: tcp: give socket pointer to control skbs") Reported-by: syzbot+8b0959fc16551d55896b@syzkaller.appspotmail.com Link: https://lore.kernel.org/netdev/674442bd.050a0220.1cc393.0072.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Reviewed-by: Brian Vazquez Link: https://patch.msgid.link/20241125093039.3095790-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/inet_timewait_sock.h | 2 ++ net/ipv4/tcp_minisocks.c | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index beb533a0e880..62c0a7e65d6b 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -45,6 +45,8 @@ struct inet_timewait_sock { #define tw_node __tw_common.skc_nulls_node #define tw_bind_node __tw_common.skc_bind_node #define tw_refcnt __tw_common.skc_refcnt +#define tw_tx_queue_mapping __tw_common.skc_tx_queue_mapping +#define tw_rx_queue_mapping __tw_common.skc_rx_queue_mapping #define tw_hash __tw_common.skc_hash #define tw_prot __tw_common.skc_prot #define tw_net __tw_common.skc_net diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index bb1fe1ba867a..7121d8573928 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -326,6 +326,10 @@ void tcp_time_wait(struct sock *sk, int state, int timeo) tcptw->tw_last_oow_ack_time = 0; tcptw->tw_tx_delay = tp->tcp_tx_delay; tw->tw_txhash = sk->sk_txhash; + tw->tw_tx_queue_mapping = sk->sk_tx_queue_mapping; +#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING + tw->tw_rx_queue_mapping = sk->sk_rx_queue_mapping; +#endif #if IS_ENABLED(CONFIG_IPV6) if (tw->tw_family == PF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); -- 2.51.0 From 98337d7c87577ded71114f6976edb70a163e27bc Mon Sep 17 00:00:00 2001 From: Ajay Kaher Date: Mon, 25 Nov 2024 10:59:54 +0000 Subject: [PATCH 07/16] ptp: Add error handling for adjfine callback in ptp_clock_adjtime ptp_clock_adjtime sets ptp->dialed_frequency even when adjfine callback returns an error. This causes subsequent reads to return an incorrect value. Fix this by adding error check before ptp->dialed_frequency is set. Fixes: 39a8cbd9ca05 ("ptp: remember the adjusted frequency") Signed-off-by: Ajay Kaher Acked-by: Richard Cochran Link: https://patch.msgid.link/20241125105954.1509971-1-ajay.kaher@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/ptp/ptp_clock.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index c56cd0f63909..77a36e7bddd5 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -150,7 +150,8 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct __kernel_timex *tx) if (ppb > ops->max_adj || ppb < -ops->max_adj) return -ERANGE; err = ops->adjfine(ops, tx->freq); - ptp->dialed_frequency = tx->freq; + if (!err) + ptp->dialed_frequency = tx->freq; } else if (tx->modes & ADJ_OFFSET) { if (ops->adjphase) { s32 max_phase_adj = ops->getmaxphase(ops); -- 2.51.0 From 1596a135e3180c92e42dd1fbcad321f4fb3e3b17 Mon Sep 17 00:00:00 2001 From: Martin Ottens Date: Mon, 25 Nov 2024 18:46:07 +0100 Subject: [PATCH 08/16] net/sched: tbf: correct backlog statistic for GSO packets When the length of a GSO packet in the tbf qdisc is larger than the burst size configured the packet will be segmented by the tbf_segment function. Whenever this function is used to enqueue SKBs, the backlog statistic of the tbf is not increased correctly. This can lead to underflows of the 'backlog' byte-statistic value when these packets are dequeued from tbf. Reproduce the bug: Ensure that the sender machine has GSO enabled. Configured the tbf on the outgoing interface of the machine as follows (burstsize = 1 MTU): $ tc qdisc add dev root handle 1: tbf rate 50Mbit burst 1514 latency 50ms Send bulk TCP traffic out via this interface, e.g., by running an iPerf3 client on this machine. Check the qdisc statistics: $ tc -s qdisc show dev The 'backlog' byte-statistic has incorrect values while traffic is transferred, e.g., high values due to u32 underflows. When the transfer is stopped, the value is != 0, which should never happen. This patch fixes this bug by updating the statistics correctly, even if single SKBs of a GSO SKB cannot be enqueued. Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets") Signed-off-by: Martin Ottens Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20241125174608.1484356-1-martin.ottens@fau.de Signed-off-by: Jakub Kicinski --- net/sched/sch_tbf.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c index f1d09183ae63..dc26b22d53c7 100644 --- a/net/sched/sch_tbf.c +++ b/net/sched/sch_tbf.c @@ -208,7 +208,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch, struct tbf_sched_data *q = qdisc_priv(sch); struct sk_buff *segs, *nskb; netdev_features_t features = netif_skb_features(skb); - unsigned int len = 0, prev_len = qdisc_pkt_len(skb); + unsigned int len = 0, prev_len = qdisc_pkt_len(skb), seg_len; int ret, nb; segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); @@ -219,21 +219,27 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch, nb = 0; skb_list_walk_safe(segs, segs, nskb) { skb_mark_not_on_list(segs); - qdisc_skb_cb(segs)->pkt_len = segs->len; - len += segs->len; + seg_len = segs->len; + qdisc_skb_cb(segs)->pkt_len = seg_len; ret = qdisc_enqueue(segs, q->qdisc, to_free); if (ret != NET_XMIT_SUCCESS) { if (net_xmit_drop_count(ret)) qdisc_qstats_drop(sch); } else { nb++; + len += seg_len; } } sch->q.qlen += nb; - if (nb > 1) + sch->qstats.backlog += len; + if (nb > 0) { qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len); - consume_skb(skb); - return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP; + consume_skb(skb); + return NET_XMIT_SUCCESS; + } + + kfree_skb(skb); + return NET_XMIT_DROP; } static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch, -- 2.51.0 From eedcad2f2a371786f8a32d0046794103dadcedf3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 26 Nov 2024 14:59:11 +0000 Subject: [PATCH 09/16] selinux: use sk_to_full_sk() in selinux_ip_output() In blamed commit, TCP started to attach timewait sockets to some skbs. syzbot reported that selinux_ip_output() was not expecting them yet. Note that using sk_to_full_sk() is still allowing the following sk_listener() check to work as before. BUG: KASAN: slab-out-of-bounds in selinux_sock security/selinux/include/objsec.h:207 [inline] BUG: KASAN: slab-out-of-bounds in selinux_ip_output+0x1e0/0x1f0 security/selinux/hooks.c:5761 Read of size 8 at addr ffff88804e86e758 by task syz-executor347/5894 CPU: 0 UID: 0 PID: 5894 Comm: syz-executor347 Not tainted 6.12.0-syzkaller-05480-gfcc79e1714e8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 selinux_sock security/selinux/include/objsec.h:207 [inline] selinux_ip_output+0x1e0/0x1f0 security/selinux/hooks.c:5761 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xbb/0x200 net/netfilter/core.c:626 nf_hook+0x386/0x6d0 include/linux/netfilter.h:269 __ip_local_out+0x339/0x640 net/ipv4/ip_output.c:119 ip_local_out net/ipv4/ip_output.c:128 [inline] ip_send_skb net/ipv4/ip_output.c:1505 [inline] ip_push_pending_frames+0xa0/0x5b0 net/ipv4/ip_output.c:1525 ip_send_unicast_reply+0xd0e/0x1650 net/ipv4/ip_output.c:1672 tcp_v4_send_ack+0x976/0x13f0 net/ipv4/tcp_ipv4.c:1024 tcp_v4_timewait_ack net/ipv4/tcp_ipv4.c:1077 [inline] tcp_v4_rcv+0x2f96/0x4390 net/ipv4/tcp_ipv4.c:2428 ip_protocol_deliver_rcu+0xba/0x4c0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x316/0x570 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_local_deliver+0x18e/0x1f0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_rcv_finish net/ipv4/ip_input.c:447 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0x2c3/0x5d0 net/ipv4/ip_input.c:567 __netif_receive_skb_one_core+0x199/0x1e0 net/core/dev.c:5672 __netif_receive_skb+0x1d/0x160 net/core/dev.c:5785 process_backlog+0x443/0x15f0 net/core/dev.c:6117 __napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6877 napi_poll net/core/dev.c:6946 [inline] net_rx_action+0xa94/0x1010 net/core/dev.c:7068 handle_softirqs+0x213/0x8f0 kernel/softirq.c:554 do_softirq kernel/softirq.c:455 [inline] do_softirq+0xb2/0xf0 kernel/softirq.c:442 __local_bh_enable_ip+0x100/0x120 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0x8af/0x43e0 net/core/dev.c:4461 dev_queue_xmit include/linux/netdevice.h:3168 [inline] neigh_hh_output include/net/neighbour.h:523 [inline] neigh_output include/net/neighbour.h:537 [inline] ip_finish_output2+0xc6c/0x2150 net/ipv4/ip_output.c:236 __ip_finish_output net/ipv4/ip_output.c:314 [inline] __ip_finish_output+0x49e/0x950 net/ipv4/ip_output.c:296 ip_finish_output+0x35/0x380 net/ipv4/ip_output.c:324 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:434 dst_output include/net/dst.h:450 [inline] ip_local_out+0x33e/0x4a0 net/ipv4/ip_output.c:130 __ip_queue_xmit+0x777/0x1970 net/ipv4/ip_output.c:536 __tcp_transmit_skb+0x2b39/0x3df0 net/ipv4/tcp_output.c:1466 tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline] tcp_write_xmit+0x12b1/0x8560 net/ipv4/tcp_output.c:2827 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3010 tcp_send_fin+0x154/0xc70 net/ipv4/tcp_output.c:3616 __tcp_close+0x96b/0xff0 net/ipv4/tcp.c:3130 tcp_close+0x28/0x120 net/ipv4/tcp.c:3221 inet_release+0x13c/0x280 net/ipv4/af_inet.c:435 __sock_release net/socket.c:640 [inline] sock_release+0x8e/0x1d0 net/socket.c:668 smc_clcsock_release+0xb7/0xe0 net/smc/smc_close.c:34 __smc_release+0x5c2/0x880 net/smc/af_smc.c:301 smc_release+0x1fc/0x5f0 net/smc/af_smc.c:344 __sock_release+0xb0/0x270 net/socket.c:640 sock_close+0x1c/0x30 net/socket.c:1408 __fput+0x3f8/0xb60 fs/file_table.c:450 __fput_sync+0xa1/0xc0 fs/file_table.c:535 __do_sys_close fs/open.c:1550 [inline] __se_sys_close fs/open.c:1535 [inline] __x64_sys_close+0x86/0x100 fs/open.c:1535 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f6814c9ae10 Code: ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d b1 e2 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c RSP: 002b:00007fffb2389758 EFLAGS: 00000202 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6814c9ae10 RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000000f4240 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000202 R12: 00007fffb23897b0 R13: 00000000000141c3 R14: 00007fffb238977c R15: 00007fffb2389790 Fixes: 79636038d37e ("ipv4: tcp: give socket pointer to control skbs") Reported-by: syzbot+2d9f5f948c31dcb7745e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/6745e1a2.050a0220.1286eb.001c.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Acked-by: Paul Moore Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20241126145911.4187198-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- security/selinux/hooks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index f5a08f94e094..366c87a40bd1 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -5738,7 +5738,7 @@ static unsigned int selinux_ip_output(void *priv, struct sk_buff *skb, /* we do this in the LOCAL_OUT path and not the POST_ROUTING path * because we want to make sure we apply the necessary labeling * before IPsec is applied so we can leverage AH protection */ - sk = skb->sk; + sk = sk_to_full_sk(skb->sk); if (sk) { struct sk_security_struct *sksec; -- 2.51.0 From 16ed454515a4fdcb8050a399bdcba6961aca089d Mon Sep 17 00:00:00 2001 From: Vyshnav Ajith Date: Fri, 22 Nov 2024 04:18:27 +0530 Subject: [PATCH 10/16] docs: net: bareudp: fix spelling and grammar mistakes The BareUDP documentation had several grammar and spelling mistakes, making it harder to read. This patch fixes those errors to improve clarity and readability for developers. Signed-off-by: Vyshnav Ajith Link: https://patch.msgid.link/20241121224827.12293-1-puthen1977@gmail.com Signed-off-by: Jakub Kicinski --- Documentation/networking/bareudp.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst index b9d04ee6dac1..621cb9575c8f 100644 --- a/Documentation/networking/bareudp.rst +++ b/Documentation/networking/bareudp.rst @@ -6,16 +6,17 @@ Bare UDP Tunnelling Module Documentation There are various L3 encapsulation standards using UDP being discussed to leverage the UDP based load balancing capability of different networks. -MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. +MPLSoUDP (https://tools.ietf.org/html/rfc7510) is one among them. The Bareudp tunnel module provides a generic L3 encapsulation support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP tunnel. Special Handling ---------------- + The bareudp device supports special handling for MPLS & IP as they can have multiple ethertypes. -MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). +The MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC with a flag called multiproto mode. @@ -52,7 +53,7 @@ be enabled explicitly with the "multiproto" flag. 3) Device Usage The bareudp device could be used along with OVS or flower filter in TC. -The OVS or TC flower layer must set the tunnel information in SKB dst field before -sending packet buffer to the bareudp device for transmission. On reception the -bareudp device extracts and stores the tunnel information in SKB dst field before +The OVS or TC flower layer must set the tunnel information in the SKB dst field before +sending the packet buffer to the bareudp device for transmission. On reception, the +bareUDP device extracts and stores the tunnel information in the SKB dst field before passing the packet buffer to the network stack. -- 2.51.0 From b9653d19e556c6afd035602927a93d100a0d7644 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 26 Nov 2024 14:43:44 +0000 Subject: [PATCH 11/16] net: hsr: avoid potential out-of-bound access in fill_frame_info() syzbot is able to feed a packet with 14 bytes, pretending it is a vlan one. Since fill_frame_info() is relying on skb->mac_len already, extend the check to cover this case. BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:709 [inline] BUG: KMSAN: uninit-value in hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724 fill_frame_info net/hsr/hsr_forward.c:709 [inline] hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724 hsr_dev_xmit+0x2f0/0x350 net/hsr/hsr_device.c:235 __netdev_start_xmit include/linux/netdevice.h:5002 [inline] netdev_start_xmit include/linux/netdevice.h:5011 [inline] xmit_one net/core/dev.c:3590 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3606 __dev_queue_xmit+0x366a/0x57d0 net/core/dev.c:4434 dev_queue_xmit include/linux/netdevice.h:3168 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3146 [inline] packet_sendmsg+0x91ae/0xa6f0 net/packet/af_packet.c:3178 sock_sendmsg_nosec net/socket.c:711 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:726 __sys_sendto+0x594/0x750 net/socket.c:2197 __do_sys_sendto net/socket.c:2204 [inline] __se_sys_sendto net/socket.c:2200 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2200 x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4091 [inline] slab_alloc_node mm/slub.c:4134 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1323 [inline] alloc_skb_with_frags+0xc8/0xd00 net/core/skbuff.c:6612 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2881 packet_alloc_skb net/packet/af_packet.c:2995 [inline] packet_snd net/packet/af_packet.c:3089 [inline] packet_sendmsg+0x74c6/0xa6f0 net/packet/af_packet.c:3178 sock_sendmsg_nosec net/socket.c:711 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:726 __sys_sendto+0x594/0x750 net/socket.c:2197 __do_sys_sendto net/socket.c:2204 [inline] __se_sys_sendto net/socket.c:2200 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2200 x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 48b491a5cc74 ("net: hsr: fix mac_len checks") Reported-by: syzbot+671e2853f9851d039551@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6745dc7f.050a0220.21d33d.0018.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Cc: WingMan Kwok Cc: Murali Karicheri Cc: MD Danish Anwar Cc: Jiri Pirko Cc: George McCollister Link: https://patch.msgid.link/20241126144344.4177332-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/hsr/hsr_forward.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c index aa6acebc7c1e..87bb3a91598e 100644 --- a/net/hsr/hsr_forward.c +++ b/net/hsr/hsr_forward.c @@ -700,6 +700,8 @@ static int fill_frame_info(struct hsr_frame_info *frame, frame->is_vlan = true; if (frame->is_vlan) { + if (skb->mac_len < offsetofend(struct hsr_vlan_ethhdr, vlanhdr)) + return -EINVAL; vlan_hdr = (struct hsr_vlan_ethhdr *)ethhdr; proto = vlan_hdr->vlanhdr.h_vlan_encapsulated_proto; } -- 2.51.0 From be75cda92a65a13db242117d674cd5584477a168 Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Wed, 27 Nov 2024 15:58:29 -0700 Subject: [PATCH 12/16] bnxt_en: ethtool: Supply ntuple rss context action Commit 2f4f9fe5bf5f ("bnxt_en: Support adding ntuple rules on RSS contexts") added support for redirecting to an RSS context as an ntuple rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE codepath. This caused `ethtool -n` to always report the action as "Action: Direct to queue 0" which is wrong. Fix by teaching bnxt driver to report the RSS context when applicable. Fixes: 2f4f9fe5bf5f ("bnxt_en: Support adding ntuple rules on RSS contexts") Reviewed-by: Michael Chan Signed-off-by: Daniel Xu Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index f1f6bb328a55..d87681d71106 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -1187,10 +1187,14 @@ static int bnxt_grxclsrule(struct bnxt *bp, struct ethtool_rxnfc *cmd) } } - if (fltr->base.flags & BNXT_ACT_DROP) + if (fltr->base.flags & BNXT_ACT_DROP) { fs->ring_cookie = RX_CLS_FLOW_DISC; - else + } else if (fltr->base.flags & BNXT_ACT_RSS_CTX) { + fs->flow_type |= FLOW_RSS; + cmd->rss_context = fltr->base.fw_vnic_id; + } else { fs->ring_cookie = fltr->base.rxq; + } rc = 0; fltr_err: -- 2.51.0 From 7078d43b2374daedf254662053e924c938eb86b3 Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Wed, 27 Nov 2024 15:58:30 -0700 Subject: [PATCH 13/16] selftests: drv-net: rss_ctx: Add test for ntuple rule Extend the rss_ctx test suite to test that an ntuple action that redirects to an RSS context contains that information in `ethtool -n`. Otherwise the output from ethtool is highly deceiving. This test helps ensure drivers are compliant with the API. Signed-off-by: Daniel Xu Link: https://patch.msgid.link/759870e430b7c93ecaae6e448f30a47284c59637.1732748253.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/rss_ctx.py | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py index 0b49ce7ae678..ca8a7edff3dd 100755 --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py @@ -3,7 +3,8 @@ import datetime import random -from lib.py import ksft_run, ksft_pr, ksft_exit, ksft_eq, ksft_ne, ksft_ge, ksft_lt +import re +from lib.py import ksft_run, ksft_pr, ksft_exit, ksft_eq, ksft_ne, ksft_ge, ksft_lt, ksft_true from lib.py import NetDrvEpEnv from lib.py import EthtoolFamily, NetdevFamily from lib.py import KsftSkipEx, KsftFailEx @@ -96,6 +97,13 @@ def _send_traffic_check(cfg, port, name, params): f"traffic on inactive queues ({name}): " + str(cnts)) +def _ntuple_rule_check(cfg, rule_id, ctx_id): + """Check that ntuple rule references RSS context ID""" + text = ethtool(f"-n {cfg.ifname} rule {rule_id}").stdout + pattern = f"RSS Context (ID: )?{ctx_id}" + ksft_true(re.search(pattern, text), "RSS context not referenced in ntuple rule") + + def test_rss_key_indir(cfg): """Test basics like updating the main RSS key and indirection table.""" @@ -459,6 +467,8 @@ def test_rss_context(cfg, ctx_cnt=1, create_with_cfg=None): ntuple = ethtool_create(cfg, "-N", flow) defer(ethtool, f"-N {cfg.ifname} delete {ntuple}") + _ntuple_rule_check(cfg, ntuple, ctx_id) + for i in range(ctx_cnt): _send_traffic_check(cfg, ports[i], f"context {i}", { 'target': (2+i*2, 3+i*2), -- 2.51.0 From c44daa7e3c73229f7ac74985acb8c7fb909c4e0a Mon Sep 17 00:00:00 2001 From: Dong Chenchen Date: Wed, 27 Nov 2024 12:08:50 +0800 Subject: [PATCH 14/16] net: Fix icmp host relookup triggering ip_rt_bug arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is: WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20 Modules linked in: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ip_rt_bug+0x14/0x20 Call Trace: ip_send_skb+0x14/0x40 __icmp_send+0x42d/0x6a0 ipv4_link_failure+0xe2/0x1d0 arp_error_report+0x3c/0x50 neigh_invalidate+0x8d/0x100 neigh_timer_handler+0x2e1/0x330 call_timer_fn+0x21/0x120 __run_timer_base.part.0+0x1c9/0x270 run_timer_softirq+0x4c/0x80 handle_softirqs+0xac/0x280 irq_exit_rcu+0x62/0x80 sysvec_apic_timer_interrupt+0x77/0x90 The script below reproduces this scenario: ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \ dir out priority 0 ptype main flag localok icmp ip l a veth1 type veth ip a a 192.168.141.111/24 dev veth0 ip l s veth0 up ping 192.168.141.155 -c 1 icmp_route_lookup() create input routes for locally generated packets while xfrm relookup ICMP traffic.Then it will set input route (dst->out = ip_rt_bug) to skb for DESTUNREACH. For ICMP err triggered by locally generated packets, dst->dev of output route is loopback. Generally, xfrm relookup verification is not required on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1). Skip icmp relookup for locally generated packets to fix it. Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") Signed-off-by: Dong Chenchen Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski --- net/ipv4/icmp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 4f088fa1c2f2..963a89ae9c26 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -517,6 +517,9 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, if (!IS_ERR(dst)) { if (rt != rt2) return rt; + if (inet_addr_type_dev_table(net, route_lookup_dev, + fl4->daddr) == RTN_LOCAL) + return rt; } else if (PTR_ERR(dst) == -EPERM) { rt = NULL; } else { -- 2.51.0 From a747e02430dfb3657141f99aa6b09331283fa493 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 26 Nov 2024 19:28:27 +0000 Subject: [PATCH 15/16] ipv6: avoid possible NULL deref in modify_prefix_route() syzbot found a NULL deref [1] in modify_prefix_route(), caused by one fib6_info without a fib6_table pointer set. This can happen for net->ipv6.fib6_null_entry [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted 6.12.0-syzkaller-09567-g7eef7e306d3c #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089 Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84 RSP: 0018:ffffc900035d7268 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000006 RSI: 1ffff920006bae5f RDI: 0000000000000030 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff90608e17 R11: 0000000000000001 R12: 0000000000000030 R13: ffff888036334880 R14: 0000000000000000 R15: 0000000000000000 FS: 0000555579e90380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc59cc4278 CR3: 0000000072b54000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831 inet6_addr_modify net/ipv6/addrconf.c:4923 [inline] inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055 rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline] netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347 netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891 sock_sendmsg_nosec net/socket.c:711 [inline] __sock_sendmsg net/socket.c:726 [inline] ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583 ___sys_sendmsg+0x135/0x1e0 net/socket.c:2637 __sys_sendmsg+0x16e/0x220 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fd1dcef8b79 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc59cc4378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd1dcef8b79 RDX: 0000000000040040 RSI: 0000000020000140 RDI: 0000000000000004 RBP: 00000000000113fd R08: 0000000000000006 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffc59cc438c R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u Signed-off-by: Eric Dumazet CC: Kui-Feng Lee Cc: David Ahern Reviewed-by: David Ahern Signed-off-by: David S. Miller --- net/ipv6/addrconf.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index c489a1e6aec9..0e765466d7f7 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4821,7 +4821,7 @@ inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, ifm->ifa_prefixlen, extack); } -static int modify_prefix_route(struct inet6_ifaddr *ifp, +static int modify_prefix_route(struct net *net, struct inet6_ifaddr *ifp, unsigned long expires, u32 flags, bool modify_peer) { @@ -4845,7 +4845,9 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp, ifp->prefix_len, ifp->rt_priority, ifp->idev->dev, expires, flags, GFP_KERNEL); - } else { + return 0; + } + if (f6i != net->ipv6.fib6_null_entry) { table = f6i->fib6_table; spin_lock_bh(&table->tb6_lock); @@ -4858,9 +4860,8 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp, } spin_unlock_bh(&table->tb6_lock); - - fib6_info_release(f6i); } + fib6_info_release(f6i); return 0; } @@ -4939,7 +4940,7 @@ static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp, int rc = -ENOENT; if (had_prefixroute) - rc = modify_prefix_route(ifp, expires, flags, false); + rc = modify_prefix_route(net, ifp, expires, flags, false); /* prefix route could have been deleted; if so restore it */ if (rc == -ENOENT) { @@ -4949,7 +4950,7 @@ static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp, } if (had_prefixroute && !ipv6_addr_any(&ifp->peer_addr)) - rc = modify_prefix_route(ifp, expires, flags, true); + rc = modify_prefix_route(net, ifp, expires, flags, true); if (rc == -ENOENT && !ipv6_addr_any(&ifp->peer_addr)) { addrconf_prefix_route(&ifp->peer_addr, ifp->prefix_len, -- 2.51.0 From 28866d6e84b8d36a76b2cde221391aed1294e5cd Mon Sep 17 00:00:00 2001 From: Geetha sowjanya Date: Tue, 26 Nov 2024 17:14:31 +0530 Subject: [PATCH 16/16] octeontx2-af: Fix SDP MAC link credits configuration Current driver allows only packet size < 512B as SDP_LINK_CREDIT register is set to default value. This patch fixes this issue by configure the register with maximum HW supported value to allow packet size > 512B. Fixes: 2f7f33a09516 ("octeontx2-pf: Add representors for sdp MAC") Signed-off-by: Geetha sowjanya Signed-off-by: David S. Miller --- drivers/net/ethernet/marvell/octeontx2/af/common.h | 1 + drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/common.h b/drivers/net/ethernet/marvell/octeontx2/af/common.h index 5d84386ed22d..406c59100a35 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/common.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/common.h @@ -159,6 +159,7 @@ enum nix_scheduler { #define SDP_HW_MIN_FRS 16 #define CN10K_LMAC_LINK_MAX_FRS 16380 /* 16k - FCS */ #define CN10K_LBK_LINK_MAX_FRS 65535 /* 64k */ +#define SDP_LINK_CREDIT 0x320202 /* NIX RX action operation*/ #define NIX_RX_ACTIONOP_DROP (0x0ull) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index 5d5a01dbbca1..a5d1e2bddd58 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -4672,6 +4672,9 @@ static void nix_link_config(struct rvu *rvu, int blkaddr, rvu_get_lbk_link_max_frs(rvu, &lbk_max_frs); rvu_get_lmac_link_max_frs(rvu, &lmac_max_frs); + /* Set SDP link credit */ + rvu_write64(rvu, blkaddr, NIX_AF_SDP_LINK_CREDIT, SDP_LINK_CREDIT); + /* Set default min/max packet lengths allowed on NIX Rx links. * * With HW reset minlen value of 60byte, HW will treat ARP pkts -- 2.51.0