net/rds: Fix endless RNR situation

Working with the following SRs:

  Exadata SR# 3-15640329311
  Linux SR# 3-15675579325

it was discovered that inserting IB_SEND_SOLICITED at regular
intervals removes the endless RNR Retry situation. The test inserted
IB_SEND_SOLICITED at the same interval at which IB_SEND_SIGNALED is
inserted, that is, by default on every 17th fragment.
This commit introduces the sysctl variable
net.rds.ib.max_unsolicited_wr. A value of zero disables the
functionality of inserting IB_SEND_SOLICITED. A value of N will insert
IB_SEND_SOLICITED for every Nth fragment.
net.rds.ib.max_unsolicited_wr defaults to 16, so that no customization
is needed when this fix is applied at a customer site.
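
The every-Nth-fragment rule described above can be sketched roughly as
follows. This is an illustration only, not the code in this patch: the
struct, the function names and the SKETCH_SEND_SOLICITED value are
hypothetical stand-ins, and the real flag is IB_SEND_SOLICITED from the
kernel's verbs headers.

```c
/* Hypothetical stand-in for IB_SEND_SOLICITED; illustration only. */
#define SKETCH_SEND_SOLICITED (1u << 2)

/* Hypothetical per-connection state for this sketch. */
struct sketch_conn {
	unsigned int max_unsolicited_wr; /* sysctl value; 0 disables */
	unsigned int unsolicited_count;  /* fragments since last solicited */
};

/*
 * Return the solicited flag for the next fragment: set on every Nth
 * fragment when N = max_unsolicited_wr is non-zero, never set when N
 * is zero.
 */
static unsigned int sketch_solicited_flag(struct sketch_conn *c)
{
	if (c->max_unsolicited_wr == 0)
		return 0;

	if (++c->unsolicited_count >= c->max_unsolicited_wr) {
		c->unsolicited_count = 0;
		return SKETCH_SEND_SOLICITED;
	}
	return 0;
}

/* Count how many of nfrags consecutive fragments get the flag. */
static unsigned int sketch_count_solicited(unsigned int max_unsolicited_wr,
					   unsigned int nfrags)
{
	struct sketch_conn c = { max_unsolicited_wr, 0 };
	unsigned int i, n = 0;

	for (i = 0; i < nfrags; i++)
		if (sketch_solicited_flag(&c) & SKETCH_SEND_SOLICITED)
			n++;
	return n;
}
```

Under the assumption of 4 KiB fragments, a message of 32*1024+256
bytes is 9 fragments, and sketch_count_solicited(4, 9) marks the 4th
and 8th of them, which is consistent with the "two SEND_SOLICITED per
~32K" case measured below.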
This fix also has the nice side-effect that it improves IOPS for 1Q,
1D, 1T cases:
-q 1M -a 256:
Without fix:
tsks   tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
   1   1161     0  1189243.20     0.00     0.00   203.52  857.34  -1.00  (average)
With fix (with default net.rds.ib.max_unsolicited_wr = 16):
tsks   tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
   1   1323     0  1355849.36     0.00     0.00   203.76  751.50  -1.00  (average)
-q $[32*1024+256] -a 256:
With fix (net.rds.ib.max_unsolicited_wr = 0, i.e. disabled):
tsks   tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
   1  15243     0   492547.75     0.00     0.00    10.58   62.01  -1.00  (average)
Ditto with net.rds.ib.max_unsolicited_wr = 4 (two SEND_SOLICITED per ~32K):
tsks   tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
   1  16422     0   530641.03     0.00     0.00    10.28   57.25  -1.00  (average)

Orabug: 28857027

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>