The mlx4 driver in uek4 has a bug where frequent re-use of CQs, MPTs,
or SRQs leads to memory corruption and subsequent crash of lwipc.
The issue has not been root-caused, but partly reverting upstream
commit 7c6d74d23a33 ("mlx4_core: Roll back round robin bitmap
allocation commit for CQs, SRQs, and MPTs"), which re-introduces
round-robin (RR) allocation of these structures, serves as a
mitigation: with it, the bug does not reproduce.
The root cause of bug 25730857 is tracked by bug 26266051.
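To illustrate why RR changes how quickly an index is re-used, here is a
toy model. It is not the mlx4 code: the names toy_bitmap, toy_alloc and
toy_free are hypothetical, and the only assumption carried over from the
driver is that a non-RR free pulls the search start back to the freed
index, while an RR free leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SIZE 8	/* hypothetical table size, for illustration only */

struct toy_bitmap {
	bool used[TOY_SIZE];
	unsigned int last;	/* index where the next search starts */
};

/* Allocate the first free index at or after 'last', wrapping around. */
static int toy_alloc(struct toy_bitmap *b)
{
	unsigned int i;

	for (i = 0; i < TOY_SIZE; i++) {
		unsigned int idx = (b->last + i) % TOY_SIZE;

		if (!b->used[idx]) {
			b->used[idx] = true;
			b->last = (idx + 1) % TOY_SIZE;
			return (int)idx;
		}
	}
	return -1;	/* table is full */
}

/*
 * Free an index. Without RR the search start is pulled back, so the
 * freed index is handed out again by the very next allocation. With RR
 * the search start keeps advancing, so re-use is delayed until the
 * search wraps around.
 */
static void toy_free(struct toy_bitmap *b, unsigned int idx, bool use_rr)
{
	assert(idx < TOY_SIZE);
	if (!use_rr && idx < b->last)
		b->last = idx;
	b->used[idx] = false;
}
```

In this model, an alloc/free/alloc sequence on an empty table returns
index 0 twice without RR, but returns 0 and then 1 with RR. Delaying
re-use in this way is the effect the mitigation relies on.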
The commit message of the upstream commit states a performance concern
related to the use of RR. Simple testing with this commit shows a
performance regression of up to 20% when running simple OF-UV tests in
a loop, but these tests are not deemed close to any real use case.
The same RR scheme is in uek2, and no performance issues related to
this concern have been reported there.
The plan is therefore to merge this commit, to buy some time to
root-cause the issue. When the issue is root-caused, this commit
should be reverted.
Orabug: 25730857
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
mlx4_table_put(dev, &cq_table->table, *cqn);
err_out:
- mlx4_bitmap_free(&cq_table->bitmap, *cqn, MLX4_NO_RR);
+ mlx4_bitmap_free(&cq_table->bitmap, *cqn, MLX4_USE_RR);
return err;
}
mlx4_table_put(dev, &cq_table->cmpt_table, cqn);
mlx4_table_put(dev, &cq_table->table, cqn);
- mlx4_bitmap_free(&cq_table->bitmap, cqn, MLX4_NO_RR);
+ mlx4_bitmap_free(&cq_table->bitmap, cqn, MLX4_USE_RR);
}
static void mlx4_cq_free_icm(struct mlx4_dev *dev, int cqn)
{
struct mlx4_priv *priv = mlx4_priv(dev);
- mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index, MLX4_NO_RR);
+ mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index, MLX4_USE_RR);
}
static void mlx4_mpt_release(struct mlx4_dev *dev, u32 index)
mlx4_table_put(dev, &srq_table->table, *srqn);
err_out:
- mlx4_bitmap_free(&srq_table->bitmap, *srqn, MLX4_NO_RR);
+ mlx4_bitmap_free(&srq_table->bitmap, *srqn, MLX4_USE_RR);
return err;
}
mlx4_table_put(dev, &srq_table->cmpt_table, srqn);
mlx4_table_put(dev, &srq_table->table, srqn);
- mlx4_bitmap_free(&srq_table->bitmap, srqn, MLX4_NO_RR);
+ mlx4_bitmap_free(&srq_table->bitmap, srqn, MLX4_USE_RR);
}
static void mlx4_srq_free_icm(struct mlx4_dev *dev, int srqn)