From: David S. Miller
Date: Wed, 20 Apr 2022 09:42:57 +0000 (+0100)
Subject: Merge branch 'atlantic-xdp-multi-buffer'
X-Git-Tag: sched-urgent-2022-06-05~35^2~292
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=e97e917b0efbfbf5dabffac63c6cacfd765fa403;p=users%2Fdwmw2%2Flinux.git

[PATCH net-next v5 0/3] net: atlantic: Add XDP support

From: Taehee Yoo @ 2022-04-17 10:12 UTC
To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk, john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
Cc: ap420073

This patchset adds multi-buffer XDP support to the atlantic driver.

The first patch implements the XDP control plane.
aq_xdp(), the callback of .xdp_bpf, is added.

The second patch implements the XDP data plane.
XDP_TX, XDP_DROP, and XDP_PASS are supported.
__aq_ring_xdp_clean() is added to receive packets and execute the XDP program.
aq_nic_xmit_xdpf() is added to send packets by XDP.

The third patch implements the .ndo_xdp_xmit callback.
aq_xdp_xmit() is added to send redirected packets, and it internally
calls aq_nic_xmit_xdpf().

The memory model is MEM_TYPE_PAGE_SHARED.

Order-2 page allocation is used when XDP is enabled.

LRO will be disabled if the XDP program does not support multi buffer.

The AQC chip supports 32 multi-queues and 8 vectors (irq).
There are two options:
1. Up to 8 cores, with a maximum of 4 tx queues per core.
2. Up to 4 cores, with a maximum of 8 tx queues per core.

As in other drivers, these tx queues could be reserved for XDP_TX and
XDP_REDIRECT only, in which case no tx_lock would be needed.
This patchset does not use that strategy, because the cost of getting
the hardware tx queue index is too high.
So tx_lock is used in aq_nic_xmit_xdpf().

single-core, single queue, 80% cpu utilization.
  32.30%  [kernel]                  [k] aq_get_rxpages_xdp
  10.44%  [kernel]                  [k] aq_hw_read_reg                <---------- here
   9.86%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
   5.51%  [kernel]                  [k] aq_ring_rx_clean

single-core, 8 queues, 100% cpu utilization, half PPS.

  52.03%  [kernel]                  [k] aq_hw_read_reg                <---------- here
  18.24%  [kernel]                  [k] aq_get_rxpages_xdp
   4.30%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
   4.24%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
   2.79%  [kernel]                  [k] aq_ring_rx_clean

Performance result (64 Byte)
1. XDP_TX
  a. xdp_generic, single core
    - 2.5Mpps, 100% cpu
  b. xdp_driver, single core
    - 4.5Mpps, 80% cpu
  c. xdp_generic, 8 core (hyper thread)
    - 6.3Mpps, 40% cpu
  d. xdp_driver, 8 core (hyper thread)
    - 6.3Mpps, 30% cpu

2. XDP_REDIRECT
  a. xdp_generic, single core
    - 2.3Mpps
  b. xdp_driver, single core
    - 4.5Mpps

v5:
 - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
 - Use 2K frame size instead of 3K
 - Use order-2 page allocation instead of order-0
 - Rename aq_get_rxpage() to aq_alloc_rxpages()
 - Add missing PageFree stats for ethtool
 - Remove aq_unset_rxpage_xdp(), introduced by the v2 patch, due to the
   change of memory model
 - Fix wrong last parameter value of xdp_prepare_buff()
 - Add aq_get_rxpages_xdp() to increase page reference count

v4:
 - Fix compile warning

v3:
 - Change wrong PPS performance result 40% -> 80% in single
   core (Intel i3-12100)
 - Separate aq_nic_map_xdp() from aq_nic_map_skb()
 - Drop multi buffer packets if single buffer XDP is attached
 - Disable LRO when single buffer XDP is attached
 - Use xdp_get_{frame/buff}_len()

v2:
 - Do not use inline in C file

Taehee Yoo (3):
  net: atlantic: Implement xdp control plane
  net: atlantic: Implement xdp data plane
  net: atlantic: Implement .ndo_xdp_xmit handler

 .../net/ethernet/aquantia/atlantic/aq_cfg.h   |   1 +
 .../ethernet/aquantia/atlantic/aq_ethtool.c   |   9 +
 .../net/ethernet/aquantia/atlantic/aq_main.c  |  87 ++++
 .../net/ethernet/aquantia/atlantic/aq_main.h  |   2 +
 .../net/ethernet/aquantia/atlantic/aq_nic.c   | 136 ++++++
 .../net/ethernet/aquantia/atlantic/aq_nic.h   |   5 +
 .../net/ethernet/aquantia/atlantic/aq_ring.c  | 409 ++++++++++++++++--
 .../net/ethernet/aquantia/atlantic/aq_ring.h  |  21 +-
 .../net/ethernet/aquantia/atlantic/aq_vec.c   |  23 +-
 .../net/ethernet/aquantia/atlantic/aq_vec.h   |   6 +
 .../aquantia/atlantic/hw_atl/hw_atl_a0.c      |   6 +-
 .../aquantia/atlantic/hw_atl/hw_atl_b0.c      |  10 +-
 12 files changed, 670 insertions(+), 45 deletions(-)

--
2.17.1

[PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane

From: Taehee Yoo @ 2022-04-17 10:12 UTC
To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk, john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
Cc: ap420073

aq_xdp() is the XDP setup callback function for the Atlantic driver.
When XDP is attached or detached, the device is restarted because it
uses different headroom, tailroom, and page order values.

When XDP is enabled, the default page order value switches from 0 to 2.
The default maximum frame size is still 2K, but additional area is
needed for headroom and tailroom, so the total size
(headroom + frame size + tailroom) is 2624 bytes.
With an order-0 page, 1472 bytes would therefore always be wasted for
every frame.
With order-2, each page can be used 6 times with the flip strategy,
which means only about 106 bytes per frame are wasted.

It also supports the XDP fragment feature.
The MTU can be 16K if the XDP program supports XDP fragments.
If not, the MTU cannot exceed 2K - ETH_HLEN - ETH_FCS.

A static key is also added; it will be used to call the xdp_clean
handler in ->poll().
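
The page-usage figures quoted above (1472 bytes, 6 uses per page, about 106 bytes) can be re-derived with a small standalone sketch. This is not code from the patch: the helper names are hypothetical, the 4 KiB base page size is an assumption, and the 14-byte Ethernet header and 4-byte FCS are the usual values rather than anything stated in the message. Only the 2624-byte footprint and the 2K frame size come from the text.

```c
/* Values quoted in the commit message. */
#define FRAME_FOOTPRINT 2624u /* headroom + 2K frame + tailroom */
#define FRAME_MAX       2048u /* "2K" default maximum frame size */

/* Assumptions, not stated in the commit message. */
#define BASE_PAGE_SIZE  4096u /* assumed 4 KiB base page */
#define HDR_LEN           14u /* usual Ethernet header length */
#define FCS_LEN            4u /* usual Ethernet FCS length */

/* Hypothetical helper: bytes in one allocation of the given page order. */
static unsigned int alloc_size(unsigned int order)
{
	return BASE_PAGE_SIZE << order;
}

/* Hypothetical helper: number of whole frames that fit in one allocation. */
static unsigned int frames_per_alloc(unsigned int order)
{
	return alloc_size(order) / FRAME_FOOTPRINT;
}

/* Hypothetical helper: unused bytes per frame, rounded down. */
static unsigned int waste_per_frame(unsigned int order)
{
	unsigned int frames = frames_per_alloc(order);
	unsigned int unused = alloc_size(order) - frames * FRAME_FOOTPRINT;

	return unused / frames;
}

/* Hypothetical helper: MTU bound for a single-buffer XDP program. */
static unsigned int single_buffer_mtu_limit(void)
{
	return FRAME_MAX - HDR_LEN - FCS_LEN;
}
```

Under these assumptions, an order-0 allocation holds one frame and leaves 1472 bytes unused, while an order-2 allocation (16384 bytes) holds six frames and leaves 640 bytes unused in total, which is where the figure of roughly 106 bytes per frame comes from.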
The data plane implementation is contained in the following patch.

Signed-off-by: Taehee Yoo
---
v5:
 - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
 - Use 2K frame size instead of 3K
 - Use order-2 page allocation instead of order-0
 - Rename aq_get_rxpage() to aq_alloc_rxpages()

v4:
 - No change

v3:
 - Disable LRO when single buffer XDP is attached

v2:
 - No change
---

e97e917b0efbfbf5dabffac63c6cacfd765fa403