blk-mq: fix freeze queue race
Orabug:
24914952
There are several race conditions while freezing queue.
When unfreezing queue, there is a small window between decrementing
q->mq_freeze_depth to zero and percpu_ref_reinit() call with
q->mq_usage_counter. If the other calls blk_mq_freeze_queue_start()
in the window, q->mq_freeze_depth is increased from zero to one and
percpu_ref_kill() is called with q->mq_usage_counter which is already
killed. percpu refcount should be re-initialized before killed again.
Also, there is a race condition while switching to percpu mode.
percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
executed at the same time as the following scenario is possible:
1. q->mq_usage_counter is initialized in atomic mode.
(atomic counter: 1)
2. After the disk registration, a process like systemd-udev starts
accessing the disk, and successfully increases refcount successfully
by percpu_ref_tryget_live() in blk_mq_queue_enter().
(atomic counter: 2)
3. In the final stage of initialization, q->mq_usage_counter is being
switched to percpu mode by percpu_ref_switch_to_percpu() in
blk_mq_finish_init(). But if CONFIG_PREEMPT_VOLUNTARY is enabled,
the process is rescheduled in the middle of switching when calling
wait_event() in __percpu_ref_switch_to_percpu().
(atomic counter: 2)
4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
request queue. q->mq_usage_counter is decreased and marked as
DEAD. Wait until all requests have finished.
(atomic counter: 1)
5. The process rescheduled in the step 3. is resumed and finishes
all remaining work in __percpu_ref_switch_to_percpu().
A bias value is added to atomic counter of q->mq_usage_counter.
(atomic counter: PERCPU_COUNT_BIAS + 1)
6. A request issed in the step 2. is finished and q->mq_usage_counter
is decreased by blk_mq_queue_exit(). q->mq_usage_counter is DEAD,
so atomic counter is decreased and no release handler is called.
(atomic counter: PERCPU_COUNT_BIAS)
7. CPU hotplug handling in the step 4. will wait forever as
q->mq_usage_counter will never be zero.
Also, percpu_ref_reinit() and percpu_ref_kill() must not be executed
at the same time. Because both functions could call
__percpu_ref_switch_to_percpu() which adds the bias value and
initialize percpu counter.
Fix those races by serializing with per-queue mutex.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
(cherry picked from https://patchwork.kernel.org/patch/
7269471/)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>