www.infradead.org Git - users/jedix/linux-maple.git/commit
md/raid5: fix a race condition in stripe batch
author Shaohua Li <shli@fb.com>
Mon, 3 Dec 2018 00:53:59 +0000 (08:53 +0800)
committer Brian Maly <brian.maly@oracle.com>
Tue, 4 Dec 2018 20:50:19 +0000 (15:50 -0500)
commit 57f7422cfb21150a42b7b426a54b462c24e1b0d2
tree adf5af63d9e3fd7e03f37f474b48f7e030064bb8
parent adb2cf4030a4c2d59542f18d53784b5b593f3aeb

There is a race condition in the scenario below. Say we have 3 consecutive
stripes, sh1, sh2 and sh3, where sh1 is the batch head of sh2 and sh3:

CPU1                    CPU2                              CPU3
handle_stripe(sh3)
                        stripe_add_to_batch_list(sh3)
                        -> lock(sh2, sh3)
                        -> lock batch_lock(sh1)
                        -> add sh3 to batch_list of sh1
                        -> unlock batch_lock(sh1)
                                                          clear_batch_ready(sh1)
                                                          -> lock(sh1) and batch_lock(sh1)
                                                          -> clear STRIPE_BATCH_READY for all stripes in batch_list
                                                          -> unlock(sh1) and batch_lock(sh1)
-> clear_batch_ready(sh3)
--> test_and_clear_bit(STRIPE_BATCH_READY, sh3)
---> return 0 as sh3->batch_head == NULL
                        -> sh3->batch_head = sh1
                        -> unlock(sh2, sh3)

On CPU1, handle_stripe will continue to handle sh3 even though it is already in
the batch list of sh1. By moving the sh3->batch_head assignment inside
batch_lock, we make it impossible to clear STRIPE_BATCH_READY before batch_head
is set.
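
The ordering described above can be sketched in user space. This is a simplified model, not the code in drivers/md/raid5.c: struct stripe, stripe_init() and add_to_batch() are hypothetical stand-ins, a pthread mutex replaces the kernel spinlock, a plain bool replaces the STRIPE_BATCH_READY bit, and the per-stripe lock is left out. It only illustrates that, once batch_head is assigned inside the batch_lock critical section, any path that sees the ready flag cleared also sees batch_head set.

```c
/* User-space model of the fixed ordering. Not kernel code: the types and
 * helper names are assumptions made for illustration only. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct stripe {
    pthread_mutex_t batch_lock;  /* stands in for the kernel's batch_lock */
    struct stripe *batch_head;   /* NULL until the stripe joins a batch */
    struct stripe *batch_next;   /* simplified singly linked batch list */
    bool batch_ready;            /* stands in for STRIPE_BATCH_READY */
};

static void stripe_init(struct stripe *sh)
{
    pthread_mutex_init(&sh->batch_lock, NULL);
    sh->batch_head = NULL;
    sh->batch_next = NULL;
    sh->batch_ready = true;
}

/* Fixed ordering: list insertion and the batch_head assignment share one
 * critical section. Before the fix, the assignment ran after the unlock. */
static void add_to_batch(struct stripe *head, struct stripe *sh)
{
    pthread_mutex_lock(&head->batch_lock);
    sh->batch_next = head->batch_next;
    head->batch_next = sh;
    sh->batch_head = head;
    pthread_mutex_unlock(&head->batch_lock);
}

/* Returns 1 if the caller must skip sh because another head owns it.
 * The kernel uses an atomic test_and_clear_bit(); this model is only
 * meant to be driven from a single thread. */
static int clear_batch_ready(struct stripe *sh)
{
    bool was_ready = sh->batch_ready;

    sh->batch_ready = false;
    if (!was_ready)
        return sh->batch_head != NULL && sh->batch_head != sh;
    if (sh->batch_head == NULL)
        return 0;                /* not batched: handle normally */
    if (sh->batch_head != sh)
        return 1;                /* batched under another head: skip */

    /* sh is the head: clear the flag on every member under batch_lock. */
    pthread_mutex_lock(&sh->batch_lock);
    for (struct stripe *s = sh->batch_next; s != NULL; s = s->batch_next) {
        assert(s->batch_head == sh);  /* holds because of add_to_batch() */
        s->batch_ready = false;
    }
    pthread_mutex_unlock(&sh->batch_lock);
    return 0;
}
```

Replaying the scenario in this model (sh1 made its own head, sh3 added to it, then clear_batch_ready() called on sh1 and on sh3), the second call returns 1, so the caller would skip sh3 instead of handling it independently.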

Thanks Stephane for helping debug this tricky issue.

Reported-and-tested-by: Stephane Thiell <sthiell@stanford.edu>
Cc: stable@vger.kernel.org (v4.1+)
Signed-off-by: Shaohua Li <shli@fb.com>
(cherry pick from upstream commit 3664847d95e60a9a943858b7800f8484669740fc)

Orabug: 28917012

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
drivers/md/raid5.c