RDS: fix double free of rdma_cm_id

RDS currently offloads rdma_destroy_id() to an aux thread as part of
connection shutdown. This was done to work around a bug in which
rdma_destroy_id() could block and cause the RDS reconnect to hang.
Unfortunately, queuing the rdma_destroy_id() work opens a timing window
in which a pending CMA_ADDR_QUERY request might not be canceled right
away and can race with rdma_destroy_id().

In that case, rdma_destroy_id() runs and frees the cm id. Then the
CMA_ADDR_QUERY completes and invokes the RDS event handler, which calls
rds_resolve_route on the already destroyed cm id. The event handler
returns failure, which causes RDMA CM to call rdma_destroy_id() again on
the same cm id, freeing it a second time.

Since MLX has fixed the rdma_destroy_id() bug by offloading the blocking
operation to the worker thread, RDS no longer needs to queue
rdma_destroy_id(). Calling it directly closes the window above and fixes
the double free.

Orabug: 17192816
Signed-off-by: Richard Frank <richard.frank@oracle.com>
(cherry picked from commit 3fec98717bf926d869d049e17baad849d1ba7d78)