net: mana: add a function to spread IRQs across CPUs

Souradeep found that the driver performs faster when IRQs are spread
across CPUs according to the following heuristics:

1. No more than one IRQ per CPU, if possible;
2. NUMA locality is the second priority;
3. Sibling dislocality (spreading IRQs over distinct cores before
   reusing SMT siblings) is the last priority.

Let's consider this topology:

Node            0               1
Core        0       1       2       3
CPU       0   1   2   3   4   5   6   7

The most performant IRQ distribution based on the above topology
and heuristics may look like this:

IRQ     Node    Core    CPUs
0       0       0       0-1
1       0       1       2-3
2       0       0       0-1
3       0       1       2-3
4       1       2       4-5
5       1       3       6-7
6       1       2       4-5
7       1       3       6-7

The irq_setup() routine introduced in this patch uses the
for_each_numa_hop_mask() iterator to walk CPUs in order of increasing
NUMA distance and assigns IRQs to sibling groups as described above.

According to [1], with the NUMA-aware but sibling-ignorant IRQ
distribution based on cpumask_local_spread(), the performance test
results look like this:

  ./ntttcp -r -m 16
  NTTTCP for Linux 1.4.0
  ---------------------------------------------------------
  08:05:20 INFO: 17 threads created
  08:05:28 INFO: Network activity progressing...
  08:06:28 INFO: Test run completed.
  08:06:28 INFO: Test cycle finished.
  08:06:28 INFO: ##### Totals: #####
  08:06:28 INFO: test duration    :60.00 seconds
  08:06:28 INFO: total bytes      :630292053310
  08:06:28 INFO: throughput       :84.04Gbps
  08:06:28 INFO: retrans segs     :4
  08:06:28 INFO: cpu cores        :192
  08:06:28 INFO: cpu speed        :3799.725MHz
  08:06:28 INFO: user             :0.05%
  08:06:28 INFO: system           :1.60%
  08:06:28 INFO: idle             :96.41%
  08:06:28 INFO: iowait           :0.00%
  08:06:28 INFO: softirq          :1.94%
  08:06:28 INFO: cycles/byte      :2.50
  08:06:28 INFO: cpu busy (all)   :534.41%

For NUMA- and sibling-aware IRQ distribution, the same test achieves
about 18% higher throughput:

  ./ntttcp -r -m 16
  NTTTCP for Linux 1.4.0
  ---------------------------------------------------------
  08:08:51 INFO: 17 threads created
  08:08:56 INFO: Network activity progressing...
  08:09:56 INFO: Test run completed.
  08:09:56 INFO: Test cycle finished.
  08:09:56 INFO: ##### Totals: #####
  08:09:56 INFO: test duration    :60.00 seconds
  08:09:56 INFO: total bytes      :741966608384
  08:09:56 INFO: throughput       :98.93Gbps
  08:09:56 INFO: retrans segs     :6
  08:09:56 INFO: cpu cores        :192
  08:09:56 INFO: cpu speed        :3799.791MHz
  08:09:56 INFO: user             :0.06%
  08:09:56 INFO: system           :1.81%
  08:09:56 INFO: idle             :96.18%
  08:09:56 INFO: iowait           :0.00%
  08:09:56 INFO: softirq          :1.95%
  08:09:56 INFO: cycles/byte      :2.25
  08:09:56 INFO: cpu busy (all)   :569.22%

[1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>