Patch series "mm/oom_kill: Only delay OOM reaper for processes using
robust futexes", v4.
The OOM reaper quickly reclaims a process's memory when the system hits
OOM, helping the system recover. Without the OOM reaper, if a process
frozen by cgroup v1 is OOM killed, the victim's memory cannot be freed,
leaving the system in a poor state. Even if the process is not frozen by
cgroup v1, reclaiming victims' memory remains important, as having the
reaper work alongside the exiting process speeds up memory release.
When processes holding robust futexes are OOM killed but waiters on those
futexes remain alive, the robust futexes might be reaped before
futex_cleanup() runs. This can cause the waiters to block indefinitely
[1].
To prevent this issue, the OOM reaper's work is delayed by 2 seconds [1].
Since many killed processes exit within 2 seconds, the OOM reaper rarely
runs after this delay. However, robust futex users are few, so delaying
OOM reap for all victims is unnecessary.
If each thread's robust_list in a process is NULL, the process holds no
robust futexes. For such processes, the OOM reaper should not be delayed.
For processes holding robust futexes, to avoid issue [1], the OOM reaper
must still be delayed.
Patch 1 introduces process_has_robust_futex() to detect whether a process
uses robust futexes. Patch 2 delays the OOM reaper only for processes
holding robust futexes, improving OOM reaper performance. Patch 3 makes
the OOM reaper and exit_mmap() traverse the maple tree in opposite orders
to reduce PTE lock contention caused by unmapping the same vma.
This patch (of 3):
When the holders of robust futexes are OOM killed but the waiters on
robust futexes are still alive, the robust futexes might be reaped before
futex_cleanup() runs. This can cause the waiters to block indefinitely
[1]. To prevent this issue, the OOM reaper's work is delayed by 2 seconds
[1]. However, the OOM reaper now rarely runs since many killed processes
exit within 2 seconds.
Because robust futex users are few, delay the reaper's execution only for
processes holding robust futexes to improve the performance of the OOM
reaper.
Introduce the function process_has_robust_futex() to detect whether a
process uses robust futexes. If every thread's robust_list (and, under
CONFIG_COMPAT, compat_robust_list) in a process is NULL, the process holds
no robust futexes. Otherwise, at least one thread has registered a robust
list, and the process is treated as holding robust futexes.
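The check described above can be modelled in plain userspace C. This is a
minimal sketch, not the kernel implementation: struct model_thread and
model_process_has_robust_futex() are hypothetical names invented for
illustration, the thread list is a simple array rather than the kernel's
for_each_thread() walk, and there is no RCU protection.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-thread state consulted by the check.
 * The two fields model task_struct::robust_list and
 * task_struct::compat_robust_list; everything else is omitted.
 */
struct model_thread {
	const void *robust_list;
	const void *compat_robust_list;
};

/*
 * Return true if any thread in the array has registered a robust list
 * head (native or compat), false if all heads are NULL or the array is
 * empty.
 */
static bool model_process_has_robust_futex(const struct model_thread *threads,
					   size_t nr_threads)
{
	for (size_t i = 0; i < nr_threads; i++) {
		if (threads[i].robust_list || threads[i].compat_robust_list)
			return true;
	}
	return false;
}
```

In this model a single thread with a non-NULL head is enough to classify
the whole process as a robust futex user, which matches the conservative
direction of the check: only a process whose threads all have NULL heads
skips the delay.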
Link: https://lkml.kernel.org/r/20250814135555.17493-1-zhongjinji@honor.com
Link: https://lkml.kernel.org/r/20250814135555.17493-2-zhongjinji@honor.com
Link: https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u [1]
Signed-off-by: zhongjinji <zhongjinji@honor.com>
Cc: Andre Almeida <andrealmeid@igalia.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3);
int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
+bool process_has_robust_futex(struct task_struct *tsk);
#ifdef CONFIG_FUTEX_PRIVATE_HASH
int futex_hash_allocate_default(void);
{
return -EINVAL;
}
+static inline bool process_has_robust_futex(struct task_struct *tsk)
+{
+ return false;
+}
static inline int futex_hash_allocate_default(void)
{
return 0;
return ret;
}
+/*
+ * process_has_robust_futex() - check whether the given task holds robust futexes.
+ * @tsk: task struct of the task to consider
+ *
+ * If any thread in the task has a non-NULL robust_list or compat_robust_list,
+ * it indicates that the task holds robust futexes.
+ */
+bool process_has_robust_futex(struct task_struct *tsk)
+{
+ struct task_struct *t;
+ bool ret = false;
+
+ rcu_read_lock();
+ for_each_thread(tsk, t) {
+ if (unlikely(t->robust_list)) {
+ ret = true;
+ break;
+ }
+#ifdef CONFIG_COMPAT
+ if (unlikely(t->compat_robust_list)) {
+ ret = true;
+ break;
+ }
+#endif
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
static int __init futex_init(void)
{
unsigned long hashsize, i;