On a heavily loaded arm64 system, the LTP pcrypt_aead01 test can trigger
a use-after-free (UAF). Because the full pcrypt_aead01 call chain is
long, the problem is described here with a simplified model.
Suppose a padata user, `user_function`, follows the padata requirement
of calling `padata_free_shell` only after `serial()` has been invoked,
as in the following code:
```c
struct request {
    struct padata_priv padata;
    struct completion *done;
};
void parallel(struct padata_priv *padata) {
    do_something();
}
void serial(struct padata_priv *padata) {
    struct request *request = container_of(padata,
                                           struct request,
                                           padata);
    complete(request->done);
}
void user_function() {
    DECLARE_COMPLETION(done);
    padata->parallel = parallel;
    padata->serial = serial;
    padata_do_parallel();
    wait_for_completion(&done);
    padata_free_shell();
}
```
In the corresponding padata.c file, there's the following code:
```c
static void padata_serial_worker(struct work_struct *serial_work) {
    ...
    cnt = 0;
    while (!list_empty(&local_list)) {
        ...
        padata->serial(padata);
        cnt++;
    }
    local_bh_enable();
    if (refcount_sub_and_test(cnt, &pd->refcnt))
        padata_free_pd(pd);
}
```
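`serial()` calls `complete()`, so `user_function` may wake up and call
`padata_free_shell()` while `padata_serial_worker()` is still running.
Under high load, pending softirqs accumulate and `local_bh_enable()`
takes longer than usual to return, which widens this window. By the
time the worker reaches `refcount_sub_and_test()`, `pd` has already
been freed unconditionally by `padata_free_shell()`, so the access to
`pd->refcnt` is a use-after-free.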
The fix is straightforward: in `padata_free_shell`, call
`padata_free_pd` only when `refcount_dec_and_test` reports that the
last reference to `pd` has been dropped.
Fixes: 07928d9bfc81 ("padata: Remove broken queue flushing")
Signed-off-by: WangJinchao <wangjinchao@xfusion.com>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  */
 void padata_free_shell(struct padata_shell *ps)
 {
+       struct parallel_data *pd;
+
        if (!ps)
                return;
 
        mutex_lock(&ps->pinst->lock);
        list_del(&ps->list);
-       padata_free_pd(rcu_dereference_protected(ps->pd, 1));
+       pd = rcu_dereference_protected(ps->pd, 1);
+       if (refcount_dec_and_test(&pd->refcnt))
+               padata_free_pd(pd);
        mutex_unlock(&ps->pinst->lock);
 
        kfree(ps);