www.infradead.org Git - users/hch/misc.git/commit
IB/cm: Rework sending DREQ when destroying a cm_id
author Sean Hefty <shefty@nvidia.com>
Wed, 13 Nov 2024 11:12:56 +0000 (13:12 +0200)
committer Leon Romanovsky <leon@kernel.org>
Sun, 17 Nov 2024 09:51:49 +0000 (04:51 -0500)
commit fc0856c3a32576fb21c494f38b9c6c8dc3bf58ab
tree 0770648a1f89f7581fee15376a547c053dc0e88a
parent 1e5159219076ddb2e44338c667c83fd1bd43dfef

A DREQ is sent in 2 situations:

  1. When requested by the user.
     This DREQ has to wait for a DREP, which will be routed to the user.

  2. When the cm_id is destroyed.
     This DREQ is generated by the CM to notify the peer that the
     connection has been destroyed.

In the latter case, any DREP that is received will be discarded.
There's no need to hold a reference on the cm_id.  Today, both
situations are covered by the same function: cm_send_dreq_locked().
When invoked in the cm_id destroy path, the cm_id reference would be
held until the DREQ completes, blocking the destruction.  Because it
could take several seconds to minutes before the DREQ receives a DREP,
the destroy call posts a send for the DREQ then immediately cancels the
MAD.  However, cancellation is not immediate in the MAD layer.  There
could still be a delay before the MAD layer returns the DREQ to the CM.
Moreover, the only guarantee is that the DREQ will be sent at most once.

Introduce a separate flow for sending a DREQ when destroying the cm_id.
The new flow will not hold a reference on the cm_id, allowing it to be
cleaned up immediately.  The cancellation trick is no longer needed.
The MAD layer will send the DREQ exactly once.

Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/a288a098b8e0550305755fd4a7937431699317f4.1731495873.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
drivers/infiniband/core/cm.c