Merge branch 'bpf-cpumap-enable-gro-for-xdp_pass-frames'
author     Paolo Abeni <pabeni@redhat.com>
           Thu, 27 Feb 2025 13:03:52 +0000 (14:03 +0100)
committer  Paolo Abeni <pabeni@redhat.com>
           Thu, 27 Feb 2025 13:03:52 +0000 (14:03 +0100)
commit     a0a9d4d2b7f3f5ea84853e658c8c396334654ce7
tree       6aa87c40948fffded186c2c596eb76292ee99c03
parent     01358e8fe922f716c05d7864ac2213b2440026e7
parent     b696d289c07d8480a7d4752e448f4ee2bee9e443
Merge branch 'bpf-cpumap-enable-gro-for-xdp_pass-frames'

Alexander Lobakin says:

====================
bpf: cpumap: enable GRO for XDP_PASS frames

Several months ago, I had been looking through my old XDP hints tree[0]
to check whether some patches not directly related to hints could be sent
standalone. Roughly at the same time, Daniel appeared and asked[1] about
GRO for cpumap from that tree.

Currently, cpumap uses its own kthread, which processes cpumap-redirected
frames in batches of 8, without any weighting (but with rescheduling
points). The resulting skbs get passed to the stack via
netif_receive_skb_list(), which means no GRO happens.
Even though we can't currently pass checksum status from the drivers,
in many cases GRO performs better than listified Rx without
aggregation, as confirmed by tests.
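
To make the current flow concrete, here is a heavily simplified sketch of
the pre-series kthread loop in kernel/bpf/cpumap.c. The batch size and the
netif_receive_skb_list() hand-off match the description above, while
build_skb_from_xdp_frame() is a hypothetical placeholder for the XDP
program run and skb construction steps, not a real kernel helper:

#define CPUMAP_BATCH 8

static int cpu_map_kthread_run_sketch(void *data)
{
        struct bpf_cpu_map_entry *rcpu = data;

        while (!kthread_should_stop()) {
                void *frames[CPUMAP_BATCH];
                struct sk_buff *skb;
                LIST_HEAD(list);
                int i, n;

                /* Pull up to 8 redirected frames from the ptr_ring. */
                n = __ptr_ring_consume_batched(rcpu->queue, frames,
                                               CPUMAP_BATCH);

                for (i = 0; i < n; i++) {
                        /* Run the cpumap XDP prog and build an skb for
                         * XDP_PASS frames (placeholder, details omitted).
                         */
                        skb = build_skb_from_xdp_frame(frames[i]);
                        if (skb)
                                list_add_tail(&skb->list, &list);
                }

                /* Listified Rx: skbs go straight to the stack, no GRO. */
                netif_receive_skb_list(&list);

                /* Rescheduling point between batches. */
                cond_resched();
        }

        return 0;
}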

In order to enable GRO in cpumap, we need to do the following:

* patches 1-2: decouple the GRO struct from the NAPI struct and allow
  using it outside of a NAPI entity within the kernel core code;
* patch 3: switch cpumap from netif_receive_skb_list() to
  gro_receive_skb() (a rough sketch of the resulting flow follows this
  list).
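
A minimal sketch of what this enables, assuming the decoupled GRO state
lands as a struct gro_node with init/flush helpers; those names and
signatures are assumptions here, only gro_receive_skb() is named above:

struct cpumap_gro_ctx {
        struct gro_node gro;    /* standalone GRO state, no NAPI behind it */
};

static void cpumap_gro_deliver(struct cpumap_gro_ctx *ctx,
                               struct sk_buff *skb)
{
        /* Feed the skb to GRO instead of queueing it for
         * netif_receive_skb_list(); aggregated packets reach the stack
         * on flush.
         */
        gro_receive_skb(&ctx->gro, skb);
}

static void cpumap_gro_flush(struct cpumap_gro_ctx *ctx)
{
        /* Push out whatever GRO is still holding at the end of a batch;
         * the second argument (flush only "old" packets) is assumed.
         */
        gro_flush(&ctx->gro, false);
}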

Additional improvements:

* patch 4: optimize XDP_PASS in cpumap by using arrays instead of linked
  lists;
* patches 5-6: introduce and use a function to get skbs from the NAPI
  percpu caches in bulk, not one at a time (sketched after this list);
* patches 7-8: use that function in veth as well and remove the one now
  superseded by it.
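
A rough sketch of the bulk-allocation idea from patches 5-6.
napi_skb_cache_get_bulk() and its signature are inferred from this cover
letter and are assumptions, while __xdp_build_skb_from_frame() is the
existing helper that attaches an xdp_frame to a preallocated skb:

static u32 cpumap_build_skbs(struct xdp_frame **frames, u32 n,
                             struct sk_buff **skbs)
{
        u32 i, got;

        /* One bulk call to the NAPI percpu skb cache instead of n
         * separate allocations (assumed API).
         */
        got = napi_skb_cache_get_bulk((void **)skbs, n);

        for (i = 0; i < got; i++) {
                /* Attach the frame's data to the preallocated skb head
                 * (error handling omitted).
                 */
                skbs[i] = __xdp_build_skb_from_frame(frames[i], skbs[i],
                                                     frames[i]->dev_rx);
        }

        return got;
}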

My trafficgen UDP GRO tests, small frame sizes (Mpps):

                        GRO off    GRO on
baseline                2.7        N/A
patch 3                 2.3        4
patch 8                 2.4        4.7

baseline -> patch 3     -17%       +48%
baseline -> patch 8     -11%       +74%

Daniel reported throughput gains from +14%[2] to +18%[3] in neper's TCP RR
tests. On my system, however, the same test gave me up to +100%.

Note that there's a series from Lorenzo[4] which achieves the same, but
in a different way. During the discussions, the approach using a
standalone GRO instance was preferred over the threaded NAPI.

[0] https://github.com/alobakin/linux/tree/xdp_hints
[1] https://lore.kernel.org/bpf/cadda351-6e93-4568-ba26-21a760bf9a57@app.fastmail.com
[2] https://lore.kernel.org/bpf/merfatcdvwpx2lj4j2pahhwp4vihstpidws3jwljwazhh76xkd@t5vsh4gvk4mh
[3] https://lore.kernel.org/bpf/yzda66wro5twmzpmjoxvy4si5zvkehlmgtpi6brheek3sj73tj@o7kd6nurr3o6
[4] https://lore.kernel.org/bpf/20241130-cpumap-gro-v1-0-c1180b1b5758@kernel.org
====================

Link: https://patch.msgid.link/20250225171751.2268401-1-aleksander.lobakin@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>