IB/ipoib: Change send workqueue size for CM mode
The idea here is that one misbehaving connection should not become a
single point of failure.
priv->tx_outstanding is shared by all QPs, and when it reaches
sendq_size, the network interface queue is stopped.
In connected mode, the TX QP size for every connection is sendq_size.
So if one QP starts misbehaving and we don't receive send completions
in time, priv->tx_outstanding can reach the limit at which the network
interface queue has to be stopped.
This can bring down the entire cluster, because even ping will not go
through from that point onwards.
With this patch, we limit the size of the CM QP created for send operations:
+int ipoib_cm_sendq_size __read_mostly = ipoib_sendq_size / 8;
Based on Yuval's suggestion, added a module parameter to dictate how many
bad connections we want to allow (the 8 above is configurable).
If the outstanding completions for a particular connection reach
ipoib_cm_sendq_size, we halt sending data on that connection until we
receive at least one completion.
In summary, multiple QPs (instead of one) now have to misbehave in order
to bring down the entire cluster.
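The accounting described above can be sketched as a small user-space
model. This is an illustration only, not driver code: the struct and
function names (model_dev, model_conn, model_send, model_complete,
model_flood) and the MAX_CONNS bound are hypothetical, and the model
restarts the queue on any completion, which is a simplification of
whatever wake-up rule the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CONNS 16	/* hypothetical bound, for the model only */

/* Hypothetical stand-in for one connected-mode TX QP. */
struct model_conn {
	unsigned int outstanding;	/* sends posted, completion not yet seen */
};

/* Hypothetical stand-in for the per-device state. */
struct model_dev {
	unsigned int sendq_size;	/* device-wide limit (ipoib_sendq_size) */
	unsigned int cm_sendq_size;	/* per-connection limit (ipoib_cm_sendq_size) */
	unsigned int tx_outstanding;	/* shared by all QPs (priv->tx_outstanding) */
	bool queue_stopped;		/* network interface queue stopped? */
	struct model_conn conn[MAX_CONNS];
};

/* divisor models the configurable "8"; divisor 1 models the old sizing. */
static void model_init(struct model_dev *dev, unsigned int sendq_size,
		       unsigned int divisor)
{
	*dev = (struct model_dev){ 0 };
	dev->sendq_size = sendq_size;
	dev->cm_sendq_size = sendq_size / (divisor ? divisor : 1);
	if (dev->cm_sendq_size == 0)
		dev->cm_sendq_size = 1;
}

/* Try to post one send on connection idx; returns true if it was posted. */
static bool model_send(struct model_dev *dev, unsigned int idx)
{
	struct model_conn *c;

	assert(idx < MAX_CONNS);
	c = &dev->conn[idx];

	if (dev->queue_stopped)
		return false;	/* whole interface is stalled */
	if (c->outstanding >= dev->cm_sendq_size)
		return false;	/* only this connection is halted */

	c->outstanding++;
	if (++dev->tx_outstanding == dev->sendq_size)
		dev->queue_stopped = true;
	return true;
}

/* One send completion arrives for connection idx. */
static void model_complete(struct model_dev *dev, unsigned int idx)
{
	struct model_conn *c;

	assert(idx < MAX_CONNS);
	c = &dev->conn[idx];

	if (c->outstanding == 0)
		return;
	c->outstanding--;
	dev->tx_outstanding--;
	dev->queue_stopped = false;	/* simplified wake-up rule */
}

/* Post up to n sends on a connection that never completes any of them. */
static unsigned int model_flood(struct model_dev *dev, unsigned int idx,
				unsigned int n)
{
	unsigned int posted = 0;

	while (n-- && model_send(dev, idx))
		posted++;
	return posted;
}
```

In this model, with sendq_size = 128 and a divisor of 8, flooding one
stalled connection stops at 16 outstanding sends and leaves the interface
queue running for the other connections; it takes 8 stalled connections
to stop the queue. With a divisor of 1 (the old sizing), a single stalled
connection is enough.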
As a clarification, this patch does not try to recover or change the
behavior of a connection which may have gone bad; it only reduces the
impact of a bad connection.
Orabug: 23254764
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>