compl_nr in ctx_flush_and_put() is checked without holding uring_lock,
which may cause problems when it is accessed in parallel.
Say compl_nr > 0:
  ctx_flush_and_put                  other context
   if (compl_nr)                      get mutex
                                      compl_nr > 0
                                      do flush
                                          compl_nr = 0
                                      release mutex
        get mutex
           do flush (*)
        release mutex
At (*), we still call io_cqring_ev_posted() even though nothing is left
to flush, so users are notified but likely get no events. To avoid these
spurious events, re-check the value once under the lock.
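The check / lock / re-check pattern can be sketched in userspace as
follows. This is only an illustration, not the kernel code: struct
demo_ctx and the demo_* helpers are hypothetical names, a pthread mutex
stands in for uring_lock, and a C11 atomic stands in for the unlocked
read of compl_nr.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the ring context; not the kernel's struct. */
struct demo_ctx {
	pthread_mutex_t lock;	/* plays the role of uring_lock */
	atomic_uint compl_nr;	/* batched completions waiting for a flush */
	unsigned int flushes;	/* number of flushes that actually ran */
};

/* Caller must hold ctx->lock. */
static void demo_flush(struct demo_ctx *ctx)
{
	atomic_store(&ctx->compl_nr, 0);
	ctx->flushes++;
}

/* Take the lock, re-check, and flush only if work is still pending. */
static int demo_flush_locked(struct demo_ctx *ctx)
{
	int flushed = 0;

	pthread_mutex_lock(&ctx->lock);
	if (atomic_load(&ctx->compl_nr)) {
		demo_flush(ctx);
		flushed = 1;
	}
	pthread_mutex_unlock(&ctx->lock);
	return flushed;
}

static void demo_flush_and_put(struct demo_ctx *ctx)
{
	/* Unlocked fast-path check; may be stale once the lock is held. */
	if (atomic_load(&ctx->compl_nr))
		demo_flush_locked(ctx);
}
```

If another context drains compl_nr between the unlocked check and the
lock acquisition, demo_flush_locked() returns 0 and no second flush (and
hence no spurious notification) happens.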
Fixes: 2c32395d8111 ("io_uring: fix __tctx_task_work() ctx race")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210820221954.61815-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
                return;
        if (ctx->submit_state.compl_nr) {
                mutex_lock(&ctx->uring_lock);
-               io_submit_flush_completions(ctx);
+               if (ctx->submit_state.compl_nr)
+                       io_submit_flush_completions(ctx);
                mutex_unlock(&ctx->uring_lock);
        }
        percpu_ref_put(&ctx->refs);