From 3ea9d72d66f8cf8d5ca08d80dd37b846999a14fd Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Mon, 1 Aug 2022 14:09:46 -0700 Subject: [PATCH] mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE) process_madvise(MADV_COLLAPSE) currently requires CAP_SYS_ADMIN when not acting on the caller's own mm. This is maximally restrictive, and perpetuates existing issues with CAP_SYS_ADMIN. Remove this requirement. When acting on an external process' memory, the biggest concerns for process_madvise(MADV_COLLAPSE) are (1) being able to influence process performance by moving memory, possibly between nodes, that is mapped into the address space of external process(es), (2) defeat of address-space-layout randomization, and (3), being able to increase process RSS and memcg usage, possibly causing memcg OOM. process_madvise(2) already enforces CAP_SYS_NICE and PTRACE_MODE_READ (in PTRACE_MODE_FSCREDS mode). A process with these credentials can already accomplish (1) and (2) via move_pages(MPOL_MF_MOVE_ALL), and (3) via process_madvise(MADV_WILLNEED). process_madvise(MADV_COLLAPSE) may also circumvent sysfs THP settings. When acting on one's own memory (which is equivalent to madvise(MADV_COLLAPSE)), this is deemed acceptable, since aside from the possibility of hoarding available hugepages (which is currently already possible) no harm to the system can be done. When acting on an external process' memory, circumventing sysfs THP settings should provide no additional threat compared to the ones listed. As such, imposing additional capabilities (such as CAP_SETUID, as a way to ensure the caller could have just altered the sysfs THP settings themselves) provides no extra protection. Link: https://lkml.kernel.org/r/20220801210946.3069083-1-zokeefe@google.com Fixes: 7ec952341312 ("mm/madvise: add MADV_COLLAPSE to process_madvise()") Signed-off-by: Zach O'Keefe Cc: Alex Shi Cc: Andrea Arcangeli Cc: Arnd Bergmann Cc: Axel Rasmussen Cc: Chris Kennelly Cc: Chris Zankel Cc: Dan Carpenter Cc: David Hildenbrand Cc: David Rientjes Cc: Helge Deller Cc: Hugh Dickins Cc: Ivan Kokshaysky Cc: James Bottomley Cc: Jens Axboe Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Matt Turner Cc: Max Filippov Cc: Miaohe Lin Cc: Michal Hocko Cc: Minchan Kim Cc: Pasha Tatashin Cc: Pavel Begunkov Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: "Souptick Joarder (HPE)" Cc: Thomas Bogendoerfer Cc: Vlastimil Babka Cc: Yang Shi Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/madvise.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index f9e11b6c9916..af97100a0727 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1170,16 +1170,14 @@ madvise_behavior_valid(int behavior) } } -static bool -process_madvise_behavior_valid(int behavior, struct task_struct *task) +static bool process_madvise_behavior_valid(int behavior) { switch (behavior) { case MADV_COLD: case MADV_PAGEOUT: case MADV_WILLNEED: - return true; case MADV_COLLAPSE: - return task == current || capable(CAP_SYS_ADMIN); + return true; default: return false; } @@ -1457,7 +1455,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto free_iov; } - if (!process_madvise_behavior_valid(behavior, task)) { + if (!process_madvise_behavior_valid(behavior)) { ret = -EINVAL; goto release_task; } -- 2.50.1