www.infradead.org Git - users/jedix/linux-maple.git/commit

author	Wei Lin Guay <wei.lin.guay@oracle.com>
	Fri, 25 Aug 2017 09:13:51 +0000 (11:13 +0200)
committer	Brian Maly <brian.maly@oracle.com>
	Tue, 12 Jun 2018 00:37:25 +0000 (20:37 -0400)
commit	8456dc853376a9254b325db2933c25d3ec9e8532
tree	eaaecc3b0c5c4c5d5dc814e84c35832930a7e2c7
parent	d8bd5dfb5de44f079d3d4858d5aafbf092987f78

net/rds: Avoid stalled connection due to CM REQ retries

RDS drops a connection and destroys its cm_id once a CM REJ is sent. In a
congested fabric, there is a race where a remote node receives a CM REJ
after the CM has already retried the CM REQ. In this scenario, the cm_id
that sent the CM REQ no longer exists, even though the remote end might
respond with a CM REP and wait for an incoming CM RTU. RDS connection
establishment is then stuck until the connection is destroyed after the CM
timeout, which leads to a very long brownout. This patch therefore adds a
mechanism to detect a rejected CM REQ and reject all subsequent CM REQs
that the CM retries.
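
The idea can be illustrated with a minimal, standalone sketch. It is not
the code from this patch: the struct, its fields, and both helper names
are hypothetical, and it assumes a retried CM REQ can be recognised by an
identifier that it shares with the original REQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection bookkeeping; the real fields live in
 * net/rds/ib.h and are not reproduced here. */
struct conn_state {
	bool     req_rejected;	/* a CM REJ was sent for rejected_id */
	uint32_t rejected_id;	/* assumed identifier shared by REQ retries */
};

enum req_verdict { REQ_ACCEPT, REQ_REJECT };

/* Remember that a CM REJ was sent in answer to the REQ req_id. */
static void note_reject(struct conn_state *c, uint32_t req_id)
{
	assert(c != NULL);
	c->req_rejected = true;
	c->rejected_id = req_id;
}

/* Decide what to do with an incoming CM REQ. A retry of a REQ that was
 * already rejected is rejected again, so no CM REP is sent to a peer
 * whose cm_id is gone. A REQ with a different identifier is treated as
 * a fresh attempt and clears the marker. */
static enum req_verdict handle_req(struct conn_state *c, uint32_t req_id)
{
	assert(c != NULL);
	if (c->req_rejected && c->rejected_id == req_id)
		return REQ_REJECT;
	c->req_rejected = false;
	return REQ_ACCEPT;
}
```

With this bookkeeping, every retry of a rejected REQ gets the same answer
as the original, so neither side is left waiting for a CM RTU that can
never arrive.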

Orabug: 28068627

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
(cherry picked from commit c5c4f1472bc788ddc69af713f975ad92bdefe206
repo https://linux-git.us.oracle.com/UEK/linux-wguay-public)

Conflict:
net/rds/ib_cm.c

Made it checkpatch clean.

v1->v2:
Added Shannon's recommendations

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>

net/rds/ib.h
net/rds/ib_cm.c