Mateusz Guzik <mjguzik@gmail.com> says:
The clone side contends against the exit side in a way which avoidably
exacerbates the problem: the latter waits on locks held by the former
while itself holding the tasklist_lock.

Whacking this for both add_device_randomness() and pid allocation gives
me a 15% speedup for thread creation/destruction in a 24-core VM.
The random patch alone is worth about 4%.

The new bottleneck is pidmap_lock itself, with the biggest problem being
the allocation taking the lock *twice*.
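
The contention pattern described at the top of this message can be
sketched in userspace. This is only an analogy under stated assumptions,
not kernel code: list_lock stands in for tasklist_lock, id_lock for
pidmap_lock, and entry_create(), entry_destroy() and ids_in_use() are
hypothetical helpers invented for the illustration. The point is that
the exit side drops the outer lock before taking the allocator lock, so
it never waits on the clone side while holding the outer lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_IDS 64

/* Hypothetical stand-ins: list_lock ~ tasklist_lock, id_lock ~ pidmap_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

static bool id_used[MAX_IDS];	/* protected by id_lock */
static bool on_list[MAX_IDS];	/* protected by list_lock */

/* "Clone" side: allocate the lowest free id, then publish the entry. */
static int entry_create(void)
{
	int id = -1;

	pthread_mutex_lock(&id_lock);
	for (int i = 0; i < MAX_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&id_lock);
	if (id < 0)
		return -1;

	pthread_mutex_lock(&list_lock);
	on_list[id] = true;
	pthread_mutex_unlock(&list_lock);
	return id;
}

/*
 * "Exit" side: unlink under list_lock, but release the id only after
 * list_lock has been dropped, so id_lock is never acquired while
 * list_lock is held.
 */
static void entry_destroy(int id)
{
	pthread_mutex_lock(&list_lock);
	on_list[id] = false;
	pthread_mutex_unlock(&list_lock);

	pthread_mutex_lock(&id_lock);
	id_used[id] = false;
	pthread_mutex_unlock(&id_lock);
}

/* Count the ids currently allocated. */
static int ids_in_use(void)
{
	int n = 0;

	pthread_mutex_lock(&id_lock);
	for (int i = 0; i < MAX_IDS; i++)
		n += id_used[i] ? 1 : 0;
	pthread_mutex_unlock(&id_lock);
	return n;
}
```

Had entry_destroy() cleared id_used[] inside the list_lock section,
every other list_lock waiter would also stall behind id_lock holders,
which is the avoidable part.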
Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c
    #include <assert.h>
    #include <pthread.h>

    char *testcase_description = "Thread creation and teardown";

    /* Thread body: exit immediately so only spawn/teardown is measured. */
    static void *worker(void *arg)
    {
            return (NULL);
    }

    /* Create and join one thread per iteration, forever. */
    void testcase(unsigned long long *iterations, unsigned long nr)
    {
            pthread_t thread;
            int error;

            while (1) {
                    error = pthread_create(&thread, NULL, worker, NULL);
                    assert(error == 0);
                    error = pthread_join(thread, NULL);
                    assert(error == 0);
                    (*iterations)++;
            }
    }
* patches from https://lore.kernel.org/r/20250206164415.450051-1-mjguzik@gmail.com:
  pid: drop irq disablement around pidmap_lock
  pid: perform free_pid() calls outside of tasklist_lock
  pid: sprinkle tasklist_lock asserts
  exit: hoist get_pid() in release_task() outside of tasklist_lock
  exit: perform add_device_randomness() without tasklist_lock
Link: https://lore.kernel.org/r/20250206164415.450051-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>