RDS: avoid double destroy of cm_id when rdma_resolve_route fails
The crash occurs in rds_ib_conn_shutdown, which was using a cm_id that had
already been freed. The cm_id was freed more than 15 seconds earlier, during
a previous connect attempt.
This was the sequence of the earlier connect attempt: rds_ib_conn_connect calls
rdma_resolve_addr. The synchronous part of rdma_resolve_addr succeeds, but the
asynchronous part fails later on. The RDMA Connection Manager first delivers
the event RDMA_CM_EVENT_ADDR_RESOLVED, so address resolution itself succeeds.
Next, RDS calls rdma_resolve_route from rds_rdma_cm_event_handler, and this
call fails. We return the error to the RDMA CM, whose addr_handler (cma.c)
then destroys the cm_id.
Later, when a new connect request comes in from the remote side, we shut down
this cm_id and try to reconnect:
    /*
     * after 15 seconds, give up on existing connection
     * attempts and make them try again. At this point
     * it's no longer a race but something has gone
     * horribly wrong
     */
    if (now > conn->c_connection_start &&
        now - conn->c_connection_start > 15) {
            printk(KERN_CRIT "rds connection racing for 15s, forcing reset "
                    "connection %u.%u.%u.%u->%u.%u.%u.%u\n",
                    NIPQUAD(conn->c_laddr), NIPQUAD(conn->c_faddr));
            rds_conn_drop(conn);
    ....
We crash during the shutdown, because it operates on the cm_id that
addr_handler has already destroyed.
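The ownership problem described above can be illustrated with a small
user-space sketch. It is not the kernel code: every name below (fake_cm_id,
fake_conn, fake_addr_handler and so on) is hypothetical and only stands in
for the RDMA CM and RDS objects. It assumes one possible way to avoid the
second destroy: when the event handler is about to return an error, it
clears the connection's pointer to the cm_id, since the caller will destroy
that cm_id on a non-zero return.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the cm_id and the RDS IB connection state. */
struct fake_cm_id { int unused; };
struct fake_conn  { struct fake_cm_id *cm_id; };

/* Counts how many times a cm_id has been destroyed in this model. */
static int destroy_calls;

static void fake_destroy_id(struct fake_cm_id *id)
{
	destroy_calls++;
	free(id);
}

/*
 * Models the event handler. route_ret stands in for the return value of
 * the route resolution call. On failure the handler forgets the cm_id,
 * because the caller destroys it when a non-zero value is returned.
 */
static int fake_event_handler(struct fake_conn *conn,
			      struct fake_cm_id *id, int route_ret)
{
	if (route_ret && conn->cm_id == id)
		conn->cm_id = NULL;
	return route_ret;
}

/* Models the CM side: destroy the id if the handler reports an error. */
static void fake_addr_handler(struct fake_conn *conn,
			      struct fake_cm_id *id, int route_ret)
{
	if (fake_event_handler(conn, id, route_ret))
		fake_destroy_id(id);
}

/* Models the later shutdown: destroy only a cm_id we still own. */
static void fake_conn_shutdown(struct fake_conn *conn)
{
	if (conn->cm_id) {
		fake_destroy_id(conn->cm_id);
		conn->cm_id = NULL;
	}
}

/* Failed connect attempt followed by a forced shutdown. */
static int run_failed_connect_then_shutdown(void)
{
	struct fake_conn conn;

	destroy_calls = 0;
	conn.cm_id = calloc(1, sizeof(*conn.cm_id));
	if (!conn.cm_id)
		return -1;

	fake_addr_handler(&conn, conn.cm_id, -1); /* route resolution fails */
	fake_conn_shutdown(&conn);                /* later forced reset */
	return destroy_calls;
}
```

In this model run_failed_connect_then_shutdown() reports a single destroy.
Without the pointer reset in fake_event_handler, the shutdown path would
hand the already freed cm_id to fake_destroy_id a second time.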
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>