net/mlx4_core: Use round robin scheme to avoid stale caches
The mlx4 driver in uek4 has a bug where frequent re-use of CQs, MPTs,
or SRQs leads to memory corruption and subsequent crash of lwipc.
The issue has not been root-caused, but by partly reverting the
upstream commit
7c6d74d23a33 ("mlx4_core: Roll back round robin bitmap
allocation commit for CQs, SRQs, and MPTs") by re-introducing
round-robin (RR) allocation of said structures, we have a mitigation,
and the bug does not reproduce.
The root-cause of bug
25730857 is tracked by bug
26266051.
The commit message of the upstream commit states a performance concern
related to the use of RR. Simple testing using this commit reveals up
to 20% performance regression running simple OF-UV tests in loop, but
these tests are not deemed close to any real use-cases.
The same RR is in uek2 and performance issues are not reported related
to the concern.
The plan is therefore to merge this commit, to buy some time to
root-cause the issue. When the issue is root-caused, this commit
should be reverted.
Orabug:
25730857
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>