Commit 
0c0be98bbe67 ("md/raid10: prevent unnecessary calls to wake_up()
in fast path") missed one place, for example, with:
	fio -direct=1 -rw=write/randwrite -iodepth=1 ...
Plug and unplug are called for each io, then wake_up() from raid10_unplug()
will cause lock contention as well.
Avoid this contention by using wake_up_barrier() instead of wake_up(),
where spin_lock is not held if waitqueue is empty.
Fio test script:
[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]
Test result with ramdisk raid10(By Ali):
	Before this patch	With this patch
READ	IOPS=2033k		IOPS=3642k
WRITE	IOPS=871k		IOPS=1561K
By the way, in this scenario, blk_plug_cb() will be allocated and freed
for each io, this seems need to be optimized as well.
Reported-and-tested-by: Ali Gholami Rudi <aligrudi@gmail.com>
Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
                spin_lock_irq(&conf->device_lock);
                bio_list_merge(&conf->pending_bio_list, &plug->pending);
                spin_unlock_irq(&conf->device_lock);
-               wake_up(&conf->wait_barrier);
+               wake_up_barrier(conf);
                md_wakeup_thread(mddev->thread);
                kfree(plug);
                return;
        /* we aren't scheduling, so we can do the write-out directly. */
        bio = bio_list_get(&plug->pending);
        raid1_prepare_flush_writes(mddev->bitmap);
-       wake_up(&conf->wait_barrier);
+       wake_up_barrier(conf);
 
        while (bio) { /* submit pending writes */
                struct bio *next = bio->bi_next;