From: Stefan Hajnoczi Date: Tue, 12 Sep 2023 23:10:34 +0000 (-0400) Subject: test-bdrv-drain: avoid race with BH in IOThread drain test X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=c8bf923d5eeed0b2bbe26feb37bb399ec1a894f5;p=qemu-nvme.git test-bdrv-drain: avoid race with BH in IOThread drain test This patch fixes a race condition in test-bdrv-drain that is difficult to reproduce. test-bdrv-drain sometimes fails without an error message on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able to reproduce it locally and found that "block-backend: process I/O in the current AioContext" (in this patch series) is the first commit where it reproduces. I do not know why "block-backend: process I/O in the current AioContext" exposes this bug. It might be related to the fact that the test's preadv request runs in the main thread instead of IOThread a after my commit. That might simply change the timing of the test. Now on to the race condition in test-bdrv-drain. The main thread schedules a BH in IOThread a and then drains the BDS: aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data); /* The request is running on the IOThread a. Draining its block device * will make sure that it has completed as far as the BDS is concerned, * but the drain in this thread can continue immediately after * bdrv_dec_in_flight() and aio_ret might be assigned only slightly * later. */ do_drain_begin(drain_type, bs); If the BH completes before do_drain_begin() then there is nothing to worry about. If the BH invokes bdrv_flush() before do_drain_begin(), then do_drain_begin() waits for it to complete. The problematic case is when do_drain_begin() runs before the BH enters bdrv_flush(). Then do_drain_begin() misses the BH and the drain mechanism has failed in quiescing I/O. Fix this by incrementing the in_flight counter so that do_drain_begin() waits for test_iothread_main_thread_bh(). Signed-off-by: Stefan Hajnoczi Message-ID: <20230912231037.826804-3-stefanha@redhat.com> Reviewed-by: Eric Blake Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c index b040a73bb9..0b603e7c57 100644 --- a/tests/unit/test-bdrv-drain.c +++ b/tests/unit/test-bdrv-drain.c @@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque) * executed during drain, otherwise this would deadlock. */ aio_context_acquire(bdrv_get_aio_context(data->bs)); bdrv_flush(data->bs); + bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */ aio_context_release(bdrv_get_aio_context(data->bs)); } @@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) aio_context_acquire(ctx_a); } + /* + * Increment in_flight so that do_drain_begin() waits for + * test_iothread_main_thread_bh(). This prevents the race between + * test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in + * this thread. test_iothread_main_thread_bh() decrements in_flight. + */ + bdrv_inc_in_flight(bs); aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data); /* The request is running on the IOThread a. Draining its block device