It seems that alloc_retstack_tasklist() can also take a lockless
approach for scanning the tasklist, instead of using the big global
tasklist_lock. To do so, we also kill another deprecated and RCU-unsafe
tsk->thread_group user, replacing it with for_each_process_thread()
while maintaining semantics.

Here tasklist_lock does not protect anything other than the list
against concurrent fork/exit. And considering that the whole thing
is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
problem to have a potentially stale, yet stable, list. The task cannot
go away either, so we don't risk racing with ftrace_graph_exit_task()
which clears the retstack.

The tsk->ret_stack management is not protected by tasklist_lock, being
serialized with the corresponding publish/subscribe barriers against
concurrent ftrace_push_return_trace(). In addition, this plays nicer
with cachelines by avoiding two atomic ops in the uncontended case.
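
As a rough userspace illustration of the publish/subscribe pairing
mentioned above (not the kernel code itself), the sketch below uses C11
atomics in place of smp_wmb()/smp_rmb(). The struct layout, DEPTH, and
the helpers publish_ret_stack() and push_return() are hypothetical
stand-ins chosen for the example; only the ordering idea is meant to
carry over: initialise the task's state first, then make the pointer
visible, and have the reader test the pointer before touching anything
else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEPTH 8 /* hypothetical stack depth, for illustration only */

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct ret_entry {
	unsigned long ret;
};

struct task {
	int curr_ret_stack;                    /* index of the top entry, -1 if empty */
	_Atomic(struct ret_entry *) ret_stack; /* NULL until published */
};

/*
 * Publisher: set up the per-task state, then publish the pointer.
 * The release store plays the role of smp_wmb() followed by the
 * plain assignment to t->ret_stack in the patch.
 */
static void publish_ret_stack(struct task *t, struct ret_entry *stack)
{
	t->curr_ret_stack = -1;
	atomic_store_explicit(&t->ret_stack, stack, memory_order_release);
}

/*
 * Subscriber: test the pointer first. The acquire load guarantees that
 * if the pointer is seen, the initialisation done before the release
 * store is seen as well. Returns 0 on success, -1 if nothing has been
 * published yet.
 */
static int push_return(struct task *t, unsigned long ret)
{
	struct ret_entry *stack =
		atomic_load_explicit(&t->ret_stack, memory_order_acquire);
	int idx;

	if (!stack)
		return -1;

	idx = ++t->curr_ret_stack;
	assert(idx < DEPTH);
	stack[idx].ret = ret;
	return 0;
}
```

Because the reader never relies on a lock to observe a consistent
ret_stack, the lock held around the list walk on the writer side adds
nothing to this particular ordering.
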
Link: https://lkml.kernel.org/r/20200907013326.9870-1-dave@stgolabs.net
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
                }
        }
 
-       read_lock(&tasklist_lock);
-       do_each_thread(g, t) {
+       rcu_read_lock();
+       for_each_process_thread(g, t) {
                if (start == end) {
                        ret = -EAGAIN;
                        goto unlock;
                        smp_wmb();
                        t->ret_stack = ret_stack_list[start++];
                }
-       } while_each_thread(g, t);
+       }
 
 unlock:
-       read_unlock(&tasklist_lock);
+       rcu_read_unlock();
 free:
        for (i = start; i < end; i++)
                kfree(ret_stack_list[i]);