300-400B RPC requests are fairly common. With the current default
of 256B HDS threshold bnxt ends up splitting those, lowering PCIe
bandwidth efficiency and increasing the number of memory allocation.
Increase the HDS threshold to fit 4 buffers in a 4k page.
This works out to 640B as the threshold on a typical kernel confing.
This change increases the performance for a microbenchmark which
receives 400B RPCs and sends empty responses by 4.5%.
Admittedly this is just a single benchmark, but 256B works out to
just 6 (so 2 more) packets per head page, because shinfo size
dominates the headers.
Now that we use page pool for the header pages I was also tempted
to default rx_copybreak to 0, but in synthetic testing the copybreak
size doesn't seem to make much difference.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
static void bnxt_init_ring_params(struct bnxt *bp)
{
+ unsigned int rx_size;
+
bp->rx_copybreak = BNXT_DEFAULT_RX_COPYBREAK;
- bp->dev->cfg->hds_thresh = BNXT_DEFAULT_RX_COPYBREAK;
+ /* Try to fit 4 chunks into a 4k page */
+ rx_size = SZ_1K -
+ NET_SKB_PAD - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ bp->dev->cfg->hds_thresh = max(BNXT_DEFAULT_RX_COPYBREAK, rx_size);
}
/* bp->rx_ring_size, bp->tx_ring_size, dev->mtu, BNXT_FLAG_{G|L}RO flags must