]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
2 years agomm: add merging after mremap resize
Jakub Matěna [Fri, 3 Jun 2022 14:57:19 +0000 (16:57 +0200)]
mm: add merging after mremap resize

When mremap call results in expansion, it might be possible to merge the
VMA with the next VMA which might become adjacent.  This patch adds
vma_merge call after the expansion is done to try and merge.

Link: https://lkml.kernel.org/r/20220603145719.1012094-3-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: refactor of vma_merge()
Jakub Matěna [Fri, 3 Jun 2022 14:57:18 +0000 (16:57 +0200)]
mm: refactor of vma_merge()

Patch series "Refactor of vma_merge and new merge call", v4.

I am currently working on my master's thesis trying to increase number of
merges of VMAs currently failing because of page offset incompatibility
and difference in their anon_vmas.  The following refactor and added merge
call included in this series is just two smaller upgrades I created along
the way.

This patch (of 2):

Refactor vma_merge() to make it shorter and more understandable.  Main
change is the elimination of code duplicity in the case of merge next
check.  This is done by first doing checks and caching the results before
executing the merge itself.  The variable 'area' is divided into 'mid' and
'res' as previously it was used for two purposes, as the middle VMA
between prev and next and also as the result of the merge itself.  Exit
paths are also unified.

Link: https://lkml.kernel.org/r/20220603145719.1012094-1-matenajakub@gmail.com
Link: https://lkml.kernel.org/r/20220603145719.1012094-2-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: delete unused MMF_OOM_VICTIM flag
Suren Baghdasaryan [Tue, 31 May 2022 22:31:00 +0000 (15:31 -0700)]
mm: delete unused MMF_OOM_VICTIM flag

With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now
unused and can be removed.

Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm-drop-oom-code-from-exit_mmap-fix-fix
Andrew Morton [Wed, 1 Jun 2022 22:17:52 +0000 (15:17 -0700)]
mm-drop-oom-code-from-exit_mmap-fix-fix

restore Suren's mmap_read_lock() optimization

Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: drop oom code from exit_mmap
Suren Baghdasaryan [Tue, 31 May 2022 22:30:59 +0000 (15:30 -0700)]
mm: drop oom code from exit_mmap

The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details).  The invocation has
moved around since then because of the interaction with the munlock logic
but the underlying reason has remained the same (see [2]).

Munlock code is no longer a problem since [3] and there shouldn't be any
blocking operation before the memory is unmapped by exit_mmap so the oom
reaper invocation can be dropped.  The unmapping part can be done with the
non-exclusive mmap_sem and the exclusive one is only required when page
tables are freed.

Remove the oom_reaper from exit_mmap which will make the code easier to
read.  This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.

[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")

Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap.c: pass in mapping to __vma_link_file()
Liam R. Howlett [Wed, 20 Jul 2022 02:18:05 +0000 (02:18 +0000)]
mm/mmap.c: pass in mapping to __vma_link_file()

__vma_link_file() resolves the mapping from the file, if there is one.
Pass through the mapping and check the vm_file externally since most
places already have the required information and check of vm_file.

Link: https://lkml.kernel.org/r/20220504011345.662299-54-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-70-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-70-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: drop range_has_overlap() function
Liam R. Howlett [Wed, 20 Jul 2022 02:18:04 +0000 (02:18 +0000)]
mm/mmap: drop range_has_overlap() function

Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.

Link: https://lkml.kernel.org/r/20220504011345.662299-53-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-69-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-69-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: remove the vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:04 +0000 (02:18 +0000)]
mm: remove the vma linked list

Replace any vm_next use with vma_find().

Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.

Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap().  At
the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new tree to hold the
vmas to remove.

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list.

Drop linked list update from __insert_vm_struct().

Rework validation of tree as it was depending on the linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-52-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220513141548.2019143-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-68-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-68-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoriscv: use vma iterator for vdso
Liam R. Howlett [Wed, 20 Jul 2022 02:18:04 +0000 (02:18 +0000)]
riscv: use vma iterator for vdso

Remove the linked list use in favour of the vma iterator.

Link: https://lkml.kernel.org/r/20220504011345.662299-51-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-67-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-67-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonommu: remove uses of VMA linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:03 +0000 (02:18 +0000)]
nommu: remove uses of VMA linked list

Use the maple tree or VMA iterator instead.  This is faster and will allow
us to shrink the VMA.

Link: https://lkml.kernel.org/r/20220504011345.662299-50-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-66-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-66-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoi915: use the VMA iterator
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:03 +0000 (02:18 +0000)]
i915: use the VMA iterator

Replace the linked list in probe_range() with the VMA iterator.

Link: https://lkml.kernel.org/r/20220504011345.662299-49-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-65-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-65-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/swapfile: use vma iterator instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:03 +0000 (02:18 +0000)]
mm/swapfile: use vma iterator instead of vma linked list

unuse_mm() no longer needs to reference the linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-48-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-64-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-64-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/pagewalk: use vma_find() instead of vma linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:02 +0000 (02:18 +0000)]
mm/pagewalk: use vma_find() instead of vma linked list

walk_page_range() no longer uses the one vma linked list reference.

Link: https://lkml.kernel.org/r/20220504011345.662299-47-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-63-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-63-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/oom_kill: use maple tree iterators instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:02 +0000 (02:18 +0000)]
mm/oom_kill: use maple tree iterators instead of vma linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-46-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-62-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-62-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/msync: use vma_find() instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:02 +0000 (02:18 +0000)]
mm/msync: use vma_find() instead of vma linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-45-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-61-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-61-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mremap: use vma_find_intersection() instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:02 +0000 (02:18 +0000)]
mm/mremap: use vma_find_intersection() instead of vma linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-44-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-60-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-60-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mprotect: use maple tree navigation instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:01 +0000 (02:18 +0000)]
mm/mprotect: use maple tree navigation instead of vma linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-43-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-59-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-59-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mlock: use vma iterator and maple state instead of vma linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:01 +0000 (02:18 +0000)]
mm/mlock: use vma iterator and maple state instead of vma linked list

Handle overflow checking in count_mm_mlocked_page_nr() differently.

Link: https://lkml.kernel.org/r/20220504011345.662299-42-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-58-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-58-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mempolicy: use vma iterator & maple state instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:01 +0000 (02:18 +0000)]
mm/mempolicy: use vma iterator & maple state instead of vma linked list

Reworked the way mbind_range() finds the first VMA to reuse the maple
state and limit the number of tree walks needed.

Note, this drops the VM_BUG_ON(!vma) call, which would catch a start
address higher than the last VMA.  The code was written in a way that
allowed no VMA updates to occur and still return success.  There should be
no functional change to this scenario with the new code.

Link: https://lkml.kernel.org/r/20220504011345.662299-41-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-57-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-57-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/memcontrol: stop using mm->highest_vm_end
Liam R. Howlett [Wed, 20 Jul 2022 02:18:01 +0000 (02:18 +0000)]
mm/memcontrol: stop using mm->highest_vm_end

Pass through ULONG_MAX instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-40-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-56-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-56-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/madvise: use vma_find() instead of vma linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:18:00 +0000 (02:18 +0000)]
mm/madvise: use vma_find() instead of vma linked list

madvise_walk_vmas() no longer uses linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-39-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-55-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-55-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/ksm: use vma iterators instead of vma linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:00 +0000 (02:18 +0000)]
mm/ksm: use vma iterators instead of vma linked list

Remove the use of the linked list for eventual removal.

Link: https://lkml.kernel.org/r/20220504011345.662299-38-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-54-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-54-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/khugepaged: stop using vma linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:18:00 +0000 (02:18 +0000)]
mm/khugepaged: stop using vma linked list

Use vma iterator & find_vma() instead of vma linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-37-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-53-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-53-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/gup: use maple tree navigation instead of linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:17:59 +0000 (02:17 +0000)]
mm/gup: use maple tree navigation instead of linked list

Use find_vma_intersection() to locate the VMAs in __mm_populate() instead
of using find_vma() and the linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-36-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-52-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-52-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agobpf: remove VMA linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:17:59 +0000 (02:17 +0000)]
bpf: remove VMA linked list

Use vma_next() and remove reference to the start of the linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-35-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-51-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-51-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofork: use VMA iterator
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:59 +0000 (02:17 +0000)]
fork: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Link: https://lkml.kernel.org/r/20220504011345.662299-34-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-50-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-50-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agosched: use maple tree iterator to walk VMAs
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:59 +0000 (02:17 +0000)]
sched: use maple tree iterator to walk VMAs

The linked list is slower than walking the VMAs using the maple tree.  We
can't use the VMA iterator here because it doesn't support moving to an
earlier position.

Link: https://lkml.kernel.org/r/20220504011345.662299-33-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-49-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-49-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoperf: use VMA iterator
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:58 +0000 (02:17 +0000)]
perf: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Link: https://lkml.kernel.org/r/20220504011345.662299-32-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-48-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-48-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoacct: use VMA iterator instead of linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:58 +0000 (02:17 +0000)]
acct: use VMA iterator instead of linked list

The VMA iterator is faster than the linked list.

Link: https://lkml.kernel.org/r/20220504011345.662299-31-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-47-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-47-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoipc/shm: use VMA iterator instead of linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:17:58 +0000 (02:17 +0000)]
ipc/shm: use VMA iterator instead of linked list

The VMA iterator is faster than the linked llist, and it can be walked
even when VMAs are being removed from the address space, so there's no
need to keep track of 'next'.

Link: https://lkml.kernel.org/r/20220504011345.662299-30-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-46-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-46-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agouserfaultfd: use maple tree iterator to iterate VMAs
Liam R. Howlett [Wed, 20 Jul 2022 02:17:57 +0000 (02:17 +0000)]
userfaultfd: use maple tree iterator to iterate VMAs

Don't use the mm_struct linked list or the vma->vm_next in prep for
removal.

Link: https://lkml.kernel.org/r/20220504011345.662299-29-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220615164150.652376-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-45-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-45-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofs/proc/task_mmu: stop using linked list and highest_vm_end
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:57 +0000 (02:17 +0000)]
fs/proc/task_mmu: stop using linked list and highest_vm_end

Remove references to mm_struct linked list and highest_vm_end for when
they are removed

Link: https://lkml.kernel.org/r/20220504011345.662299-28-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-44-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-44-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofs/proc/base: use maple tree iterators in place of linked list
Liam R. Howlett [Wed, 20 Jul 2022 02:17:57 +0000 (02:17 +0000)]
fs/proc/base: use maple tree iterators in place of linked list

Link: https://lkml.kernel.org/r/20220504011345.662299-27-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-43-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-43-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoexec: use VMA iterator instead of linked list
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:57 +0000 (02:17 +0000)]
exec: use VMA iterator instead of linked list

Remove a use of the vm_next list by doing the initial lookup with the VMA
iterator and then using it to find the next entry.

Link: https://lkml.kernel.org/r/20220504011345.662299-26-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-42-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-42-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agocoredump: remove vma linked list walk
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:56 +0000 (02:17 +0000)]
coredump: remove vma linked list walk

Use the Maple Tree iterator instead.  This is too complicated for the VMA
iterator to handle, so let's open-code it for now.  If this turns out to
be a common pattern, we can migrate it to common code.

Link: https://lkml.kernel.org/r/20220504011345.662299-25-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-41-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-41-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoum: remove vma linked list walk
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:56 +0000 (02:17 +0000)]
um: remove vma linked list walk

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-24-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-40-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-40-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agooptee: remove vma linked list walk
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:56 +0000 (02:17 +0000)]
optee: remove vma linked list walk

Use the VMA iterator instead.  Change the calling convention of
__check_mem_type() to pass in the mm instead of the first vma in the
range.

Link: https://lkml.kernel.org/r/20220504011345.662299-23-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-39-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-39-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agocxl: remove vma linked list walk
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:56 +0000 (02:17 +0000)]
cxl: remove vma linked list walk

Use the VMA iterator instead.  This requires a little restructuring of the
surrounding code to hoist the mm to the caller.  That turns
cxl_prefault_one() into a trivial function, so call cxl_fault_segment()
directly.

Link: https://lkml.kernel.org/r/20220504011345.662299-22-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-38-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-38-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoxtensa: remove vma linked list walks
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:55 +0000 (02:17 +0000)]
xtensa: remove vma linked list walks

Use the VMA iterator instead.  Since VMA can no longer be NULL in the
loop, then deal with out-of-memory outside the loop.  This means a
slightly longer run time in the failure case (-ENOMEM) - it will run to
the end of the VMAs before erroring instead of in the middle of the loop.

Link: https://lkml.kernel.org/r/20220504011345.662299-21-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-37-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-37-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agox86: remove vma linked list walks
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:55 +0000 (02:17 +0000)]
x86: remove vma linked list walks

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-20-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-36-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-36-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agos390: remove vma linked list walks
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:55 +0000 (02:17 +0000)]
s390: remove vma linked list walks

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-19-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-35-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-35-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agopowerpc: remove mmap linked list walks
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:54 +0000 (02:17 +0000)]
powerpc: remove mmap linked list walks

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-18-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-34-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-34-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoparisc: remove mmap linked list from cache handling
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:54 +0000 (02:17 +0000)]
parisc: remove mmap linked list from cache handling

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-17-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-33-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-33-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoarm64: Change elfcore for_each_mte_vma() to use VMA iterator
Liam R. Howlett [Wed, 20 Jul 2022 02:17:54 +0000 (02:17 +0000)]
arm64: Change elfcore for_each_mte_vma() to use VMA iterator

Rework for_each_mte_vma() to use a VMA iterator instead of an explicit
linked-list.

Link: https://lkml.kernel.org/r/20220504011345.662299-16-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-32-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-32-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220218023650.672072-1-Liam.Howlett@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoarm64: remove mmap linked list from vdso
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:53 +0000 (02:17 +0000)]
arm64: remove mmap linked list from vdso

Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220504011345.662299-15-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-31-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-31-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:53 +0000 (02:17 +0000)]
mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()

do_brk_munmap() has already aligned the address and has a maple tree state
to be used.  Use the new do_mas_align_munmap() to avoid unnecessary
alignment and error checks.

Link: https://lkml.kernel.org/r/20220504011345.662299-14-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220519150509.1290067-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-30-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-30-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: reorganize munmap to use maple states
Liam R. Howlett [Wed, 20 Jul 2022 02:17:53 +0000 (02:17 +0000)]
mm/mmap: reorganize munmap to use maple states

Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().

do_munmap() is a wrapper to create a maple state for any callers that have
not been converted to the maple tree.

do_mas_munmap() takes a maple state to mumap a range.  This is just a
small function which checks for error conditions and aligns the end of the
range.

do_mas_align_munmap() uses the aligned range to mumap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range.  Both start and end are split if necessary.
Then the VMAs are removed from the linked list and the mm mlock count is
updated at the same time.  Followed by a single tree operation of
overwriting the area in with a NULL.  Finally, the detached list is
unmapped and freed.

By reorganizing the munmap calls as outlined, it is now possible to avoid
extra work of aligning pre-aligned callers which are known to be safe,
avoid extra VMA lookups or tree walks for modifications.

detach_vmas_to_be_unmapped() is no longer used, so drop this code.

vm_brk_flags() can just call the do_mas_munmap() as it checks for
intersecting VMAs directly.

Link: https://lkml.kernel.org/r/20220504011345.662299-13-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-29-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-29-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: move mmap_region() below do_munmap()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:52 +0000 (02:17 +0000)]
mm/mmap: move mmap_region() below do_munmap()

Relocation of code for the next commit.  There should be no changes here.

Link: https://lkml.kernel.org/r/20220504011345.662299-12-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-28-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-28-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: convert vma_lookup() to use mtree_load()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:52 +0000 (02:17 +0000)]
mm: convert vma_lookup() to use mtree_load()

Unlike the rbtree, the Maple Tree will return a NULL if there's nothing at
a particular address.

Since the previous commit dropped the vmacache, it is now possible to
consult the tree directly.

Link: https://lkml.kernel.org/r/20220504011345.662299-11-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-27-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-27-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: remove vmacache
Liam R. Howlett [Wed, 20 Jul 2022 02:17:52 +0000 (02:17 +0000)]
mm: remove vmacache

By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Link: https://lkml.kernel.org/r/20220504011345.662299-10-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-26-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: use advanced maple tree API for mmap_region()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:51 +0000 (02:17 +0000)]
mm/mmap: use advanced maple tree API for mmap_region()

Changing mmap_region() to use the maple tree state and the advanced maple
tree interface allows for a lot less tree walking.

This change removes the last caller of munmap_vma_range(), so drop this
unused function.

Add vma_expand() to expand a VMA if possible by doing the necessary
hugepage check, uprobe_munmap of files, dcache flush, modifications then
undoing the detaches, etc.

Link: https://lkml.kernel.org/r/20220504011345.662299-9-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220519020341.rr3s6b4dr7o36cqb@revolver
Link: https://lkml.kernel.org/r/20220621204632.3370049-25-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-25-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: use maple tree operations for find_vma_intersection()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:51 +0000 (02:17 +0000)]
mm: use maple tree operations for find_vma_intersection()

Move find_vma_intersection() to mmap.c and change implementation to maple
tree.

When searching for a vma within a range, it is easier to use the maple
tree interface.

Exported find_vma_intersection() for kvm module.

Link: https://lkml.kernel.org/r/20220504011345.662299-8-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-24-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-24-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:50 +0000 (02:17 +0000)]
mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()

Avoid allocating a new VMA when it a vma modification can occur.  When a
brk() can expand or contract a VMA, then the single store operation will
only modify one index of the maple tree instead of causing a node to split
or coalesce.  This avoids unnecessary allocations/frees of maple tree
nodes and VMAs.

Move some limit & flag verifications out of the do_brk_flags() function to
use only relevant checks in the code path of bkr() and vm_brk_flags().

Set the vma to check if it can expand in vm_brk_flags() if extra criteria
are met.

Drop userfaultfd from do_brk_flags() path and only use it in
vm_brk_flags() path since that is the only place a munmap will happen.

Add a wraper for munmap for the brk case called do_brk_munmap().

Link: https://lkml.kernel.org/r/20220504011345.662299-7-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-23-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-23-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:50 +0000 (02:17 +0000)]
mm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()

vma_lookup() will walk the vma tree once and not continue to look for the
next vma.  Since the exact vma is checked below, this is a more optimal
way of searching.

Link: https://lkml.kernel.org/r/20220504011345.662299-6-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-22-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-22-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: optimize find_exact_vma() to use vma_lookup()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:50 +0000 (02:17 +0000)]
mm: optimize find_exact_vma() to use vma_lookup()

Use vma_lookup() to walk the tree to the start value requested.  If the
vma at the start does not match, then the answer is NULL and there is no
need to look at the next vma the way that find_vma() would.

Link: https://lkml.kernel.org/r/20220504011345.662299-5-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-21-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-21-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoxen: use vma_lookup() in privcmd_ioctl_mmap()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:49 +0000 (02:17 +0000)]
xen: use vma_lookup() in privcmd_ioctl_mmap()

vma_lookup() walks the VMA tree for a specific value, find_vma() will
search the tree after walking to a specific value.  It is more efficient
to only walk to the requested value since privcmd_ioctl_mmap() will exit
the loop if vm_start != msg->va.

Link: https://lkml.kernel.org/r/20220504011345.662299-4-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-20-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-20-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agommap: change zeroing of maple tree in __vma_adjust()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:49 +0000 (02:17 +0000)]
mmap: change zeroing of maple tree in __vma_adjust()

Only write to the maple tree if we are not inserting or the insert isn't
going to overwrite the area to clear.  This avoids spanning writes and
node coealescing when unnecessary.

The change requires a custom search for the linked list addition to find
the correct VMA for the prev link.

Link: https://lkml.kernel.org/r/20220504011345.662299-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-19-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-19-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: remove rb tree.
Liam R. Howlett [Wed, 20 Jul 2022 02:17:49 +0000 (02:17 +0000)]
mm: remove rb tree.

Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
lock is not held.

Link: https://lkml.kernel.org/r/20220504011345.662299-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-18-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-18-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoproc: remove VMA rbtree use from nommu
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:48 +0000 (02:17 +0000)]
proc: remove VMA rbtree use from nommu

These users of the rbtree should probably have been walks of the linked
list, but convert them to use walks of the maple tree.

Link: https://lkml.kernel.org/r/20220504011345.662299-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-17-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-17-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agodamon: convert __damon_va_three_regions to use the VMA iterator
Liam R. Howlett [Wed, 20 Jul 2022 02:17:48 +0000 (02:17 +0000)]
damon: convert __damon_va_three_regions to use the VMA iterator

This rather specialised walk can use the VMA iterator.  If this proves to
be too slow, we can write a custom routine to find the two largest gaps,
but it will be somewhat complicated, so let's see if we need it first.

Update the kunit test case to use the maple tree.  This also fixes an
issue with the kunit testcase not adding the last VMA to the list.

Link: https://lkml.kernel.org/r/20220504011215.661968-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-16-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-16-Liam.Howlett@oracle.com
Fixes: 17ccae8bb5c9 (mm/damon: add kunit tests)
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agokernel/fork: use maple tree for dup_mmap() during forking
Liam R. Howlett [Wed, 20 Jul 2022 02:17:48 +0000 (02:17 +0000)]
kernel/fork: use maple tree for dup_mmap() during forking

The maple tree was already tracking VMAs in this function by an earlier
commit, but the rbtree iterator was being used to iterate the list.
Change the iterator to use a maple tree native iterator and switch to the
maple tree advanced API to avoid multiple walks of the tree during insert
operations.  Unexport the now-unused vma_store() function.

For performance reasons we bulk allocate the maple tree nodes.  The node
calculations are done internally to the tree and use the VMA count and
assume the worst-case node requirements.  The VM_DONT_COPY flag does not
allow for the most efficient copy method of the tree and so a bulk loading
algorithm is used.

Link: https://lkml.kernel.org/r/20220504010716.661115-16-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-15-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-15-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: use maple tree for unmapped_area{_topdown}
Liam R. Howlett [Wed, 20 Jul 2022 02:17:47 +0000 (02:17 +0000)]
mm/mmap: use maple tree for unmapped_area{_topdown}

The maple tree code was added to find the unmapped area in a previous
commit and was checked against what the rbtree returned, but the actual
result was never used.  Start using the maple tree implementation and
remove the rbtree code.

Add kernel documentation comment for these functions.

Link: https://lkml.kernel.org/r/20220504010716.661115-15-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-14-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-14-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
Liam R. Howlett [Wed, 20 Jul 2022 02:17:47 +0000 (02:17 +0000)]
mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree

Use the maple tree's advanced API and a maple state to walk the tree for
the entry at the address of the next vma, then use the maple state to walk
back one entry to find the previous entry.

Add kernel documentation comments for this API.

Link: https://lkml.kernel.org/r/20220504010716.661115-14-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-13-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-13-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: use the maple tree in find_vma() instead of the rbtree.
Liam R. Howlett [Wed, 20 Jul 2022 02:17:47 +0000 (02:17 +0000)]
mm/mmap: use the maple tree in find_vma() instead of the rbtree.

Using the maple tree interface mt_find() will handle the RCU locking and
will start searching at the address up to the limit, ULONG_MAX in this
case.

Add kernel documentation to this API.

Link: https://lkml.kernel.org/r/20220504010716.661115-13-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-12-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-12-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agommap: use the VMA iterator in count_vma_pages_range()
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:46 +0000 (02:17 +0000)]
mmap: use the VMA iterator in count_vma_pages_range()

This simplifies the implementation and is faster than using the linked
list.

Link: https://lkml.kernel.org/r/20220504010716.661115-12-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-11-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-11-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: add VMA iterator
Matthew Wilcox (Oracle) [Wed, 20 Jul 2022 02:17:46 +0000 (02:17 +0000)]
mm: add VMA iterator

This thin layer of abstraction over the maple tree state is for iterating
over VMAs.  You can go forwards, go backwards or ask where the iterator
is.  Rename the existing vma_next() to __vma_next() -- it will be removed
by the end of this series.

Link: https://lkml.kernel.org/r/20220504010716.661115-11-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-10-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-10-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: start tracking VMAs with maple tree
Liam R. Howlett [Wed, 20 Jul 2022 02:17:45 +0000 (02:17 +0000)]
mm: start tracking VMAs with maple tree

Start tracking the VMAs with the new maple tree structure in parallel with
the rb_tree.  Add debug and trace events for maple tree operations and
duplicate the rb_tree that is created on forks into the maple tree.

The maple tree is added to the mm_struct including the mm_init struct,
added support in required mm/mmap functions, added tracking in kernel/fork
for process forking, and used to find the unmapped_area and checked
against what the rbtree finds.

This also moves the mmap_lock() in exit_mmap() since the oom reaper call
does walk the VMAs.  Otherwise lockdep will be unhappy if oom happens.

When splitting a vma fails due to allocations of the maple tree nodes,
the error path in __split_vma() calls new->vm_ops->close(new).  The page
accounting for hugetlb is actually in the close() operation,  so it
accounts for the removal of 1/2 of the VMA which was not adjusted.  This
results in a negative exit value.  To avoid the negative charge, set
vm_start = vm_end and vm_pgoff = 0.

There is also a potential accounting issue in special mappings from
insert_vm_struct() failing to allocate, so reverse the charge there in
the failure scenario.

Link: https://lkml.kernel.org/r/20220504010716.661115-10-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-9-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-9-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agolib/test_maple_tree: add testing for maple tree
Liam R. Howlett [Wed, 20 Jul 2022 02:17:45 +0000 (02:17 +0000)]
lib/test_maple_tree: add testing for maple tree

This is a test suite that uses the radix test infrastructure.  It has been
split into its own commit to allow for easier review of the maple tree
code.

Link: https://lkml.kernel.org/r/20220504010716.661115-9-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220511144304.1430851-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220615141921.417598-4-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-8-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-8-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoradix tree test suite: add lockdep_is_held to header
Liam R. Howlett [Wed, 20 Jul 2022 02:17:41 +0000 (02:17 +0000)]
radix tree test suite: add lockdep_is_held to header

maple tree uses lockdep_is_held, so define it as external in the header.

Link: https://lkml.kernel.org/r/20220504010716.661115-8-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-7-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-7-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoradix tree test suite: add support for slab bulk APIs
Liam R. Howlett [Wed, 20 Jul 2022 02:17:40 +0000 (02:17 +0000)]
radix tree test suite: add support for slab bulk APIs

Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the
radix tree test suite.

Link: https://lkml.kernel.org/r/20220504010716.661115-7-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-6-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoradix tree test suite: add allocation counts and size to kmem_cache
Liam R. Howlett [Wed, 20 Jul 2022 02:17:40 +0000 (02:17 +0000)]
radix tree test suite: add allocation counts and size to kmem_cache

Add functions to get the number of allocations, and total allocations from
a kmem_cache.  Also add a function to get the allocated size and a way to
zero the total allocations.

Link: https://lkml.kernel.org/r/20220504010716.661115-6-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-5-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-5-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoradix tree test suite: add kmem_cache_set_non_kernel()
Liam R. Howlett [Wed, 20 Jul 2022 02:17:39 +0000 (02:17 +0000)]
radix tree test suite: add kmem_cache_set_non_kernel()

kmem_cache_set_non_kernel() is a mechanism to allow a certain number of
kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in
the flags.  This functionality allows for testing different paths though
the code.

Link: https://lkml.kernel.org/r/20220504010716.661115-5-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-4-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoradix tree test suite: add pr_err define
Liam R. Howlett [Wed, 20 Jul 2022 02:17:39 +0000 (02:17 +0000)]
radix tree test suite: add pr_err define

define pr_err to printk

Link: https://lkml.kernel.org/r/20220404143501.2016403-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220504010716.661115-4-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoMaple Tree: add new data structure
Liam R. Howlett [Wed, 20 Jul 2022 02:17:39 +0000 (02:17 +0000)]
Maple Tree: add new data structure

Patch series "Introducing the Maple Tree", v12.

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlor said

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit.  This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

A similar work has been discovered in the academic press

https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article.  So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.

This patch (of 69):

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Link: https://lkml.kernel.org/r/20220720021727.17018-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220504010716.661115-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220504002554.654642-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220504010716.661115-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220511144304.1430851-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220517145913.3480729-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220517152209.3486724-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220519150304.1289636-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220607063834.7004-1-lukas.bulwahn@gmail.com
Link: https://lkml.kernel.org/r/20220615141921.417598-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220615141921.417598-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220616011739.802669-3-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220615174213.738849-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220617134609.1771611-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220621204632.3370049-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220720021727.17018-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoandroid: binder: fix lockdep check on clearing vma
Liam Howlett [Mon, 27 Jun 2022 15:18:59 +0000 (15:18 +0000)]
android: binder: fix lockdep check on clearing vma

When munmapping a vma, the mmap_lock can be degraded to a write before
calling close() on the file handle.  The binder close() function calls
binder_alloc_set_vma() to clear the vma address, which now has a lock dep
check for writing on the mmap_lock.  Change the lockdep check to ensure
the reading lock is held while clearing and keep the write check while
writing.

Link: https://lkml.kernel.org/r/20220627151857.2316964-1-Liam.Howlett@oracle.com
Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: syzbot+da54fa8d793ca89c741f@syzkaller.appspotmail.com
Acked-by: Todd Kjos <tkjos@google.com>
Cc: "Arve Hjønnevåg" <arve@android.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Martijn Coenen <maco@android.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoandroid-binder-stop-saving-a-pointer-to-the-vma-fix
Andrew Morton [Wed, 22 Jun 2022 02:16:17 +0000 (19:16 -0700)]
android-binder-stop-saving-a-pointer-to-the-vma-fix

fix drivers/android/binder_alloc_selftest.c

drivers/android/binder_alloc_selftest.c: In function 'binder_selftest_alloc':
drivers/android/binder_alloc_selftest.c:290:43: error: 'struct binder_alloc' has no member named 'vma'
  290 |         if (!binder_selftest_run || !alloc->vma)

Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoandroid: binder: stop saving a pointer to the VMA
Liam R. Howlett [Tue, 21 Jun 2022 01:09:09 +0000 (21:09 -0400)]
android: binder: stop saving a pointer to the VMA

Do not record a pointer to a VMA outside of the mmap_lock for later use.
This is unsafe and there are a number of failure paths *after* the
recorded VMA pointer may be freed during setup.  There is no callback to
the driver to clear the saved pointer from generic mm code.  Furthermore,
the VMA pointer may become stale if any number of VMA operations end up
freeing the VMA so saving it was fragile to being with.

Instead, change the binder_alloc struct to record the start address of the
VMA and use vma_lookup() to get the vma when needed.  Add lockdep
mmap_lock checks on updates to the vma pointer to ensure the lock is held
and depend on that lock for synchronization of readers and writers - which
was already the case anyways, so the smp_wmb()/smp_rmb() was not
necessary.

Link: https://lkml.kernel.org/r/20220621140212.vpkio64idahetbyf@revolver
Fixes: da1b9564e85b ("android: binder: fix the race mmap and alloc_new_buf_locked")
Reported-by: syzbot+58b51ac2b04e388ab7b0@syzkaller.appspotmail.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Martijn Coenen <maco@android.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomips: rename mt_init to mips_mt_init
Liam R. Howlett [Wed, 20 Jul 2022 21:41:31 +0000 (14:41 -0700)]
mips: rename mt_init to mips_mt_init

Move mt_init out of the way for the maple tree.  Use mips_mt prefix to
match the rest of the functions in the file.

Link: https://lkml.kernel.org/r/20220504002554.654642-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: discard __GFP_ATOMIC
NeilBrown [Wed, 20 Jul 2022 21:41:31 +0000 (14:41 -0700)]
mm: discard __GFP_ATOMIC

__GFP_ATOMIC serves little purpose.  Its main effect is to set
ALLOC_HARDER which adds a few little boosts to increase the chance of an
allocation succeeding, one of which is to lower the water-mark at which it
will succeed.

It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark.  It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this.  We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep.  This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set
   (as well as ALLOC_HIGH).
 - makes other adjustments as suggested by the above.

The net result is not change to GFP_ATOMIC allocations.  Other
allocations that use __GFP_HIGH will benefit from a few different extra
privileges.  This affects:
  xen, dm, md, ntfs3
  the vermillion frame buffer
  hibernation
  ksm
  swap
all of which likely produce more benefit than cost if these selected
allocation are more likely to succeed quickly.

Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE
Muchun Song [Tue, 28 Jun 2022 09:22:35 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE

There is already a macro PTRS_PER_PTE to represent the number of page
table entries, just use it.

Link: https://lkml.kernel.org/r/20220628092235.91270-9-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
Muchun Song [Tue, 28 Jun 2022 09:22:34 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst

All the comments which explains how HVO works are moved to
vmemmap_dedup.rst since

  commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")

except some comments above page_fixed_fake_head().  This commit moves
those comments to vmemmap_dedup.rst and improve vmemmap_dedup.rst as well.

Link: https://lkml.kernel.org/r/20220628092235.91270-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability
Muchun Song [Tue, 28 Jun 2022 09:22:33 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability

There is a discussion about the name of hugetlb_vmemmap_alloc/free in
thread [1].  The suggestion suggested by David is rename "alloc/free" to
"optimize/restore" to make functionalities clearer to users, "optimize"
means the function will optimize vmemmap pages, while "restore" means
restoring its vmemmap pages discared before.  This commit does this.

Another discussion is the confusion RESERVE_VMEMMAP_NR isn't used
explicitly for vmemmap_addr but implicitly for vmemmap_end in
hugetlb_vmemmap_alloc/free.  David suggested we can compute what
hugetlb_vmemmap_init() does now at runtime.  We do not need to worry for
the overhead of computing at runtime since the calculation is simple
enough and those functions are not in a hot path.  This commit has the
following improvements:

  1) The function suffixed name ("optimize/restore") is more expressive.
  2) The logic becomes less weird in hugetlb_vmemmap_optimize/restore().
  3) The hugetlb_vmemmap_init() does not need to be exported anymore.
  4) A ->optimize_vmemmap_pages field in struct hstate is killed.
  5) There is only one place where checks is_power_of_2(sizeof(struct
     page)) instead of two places.
  6) Add more comments for hugetlb_vmemmap_optimize/restore().
  7) For external users, hugetlb_optimize_vmemmap_pages() is used for
     detecting if the HugeTLB's vmemmap pages is optimizable originally.
     In this commit, it is killed and we introduce a new helper
     hugetlb_vmemmap_optimizable() to replace it.  The name is more
     expressive.

Link: https://lore.kernel.org/all/20220404074652.68024-2-songmuchun@bytedance.com/
Link: https://lkml.kernel.org/r/20220628092235.91270-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: replace early_param() with core_param()
Muchun Song [Tue, 28 Jun 2022 09:22:32 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: replace early_param() with core_param()

After the following commit:

  78f39084b41d ("mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl")

There is no order requirement between the parameter of
"hugetlb_free_vmemmap" and "hugepages" since we have removed the check of
whether HVO is enabled from hugetlb_vmemmap_init().  Therefore we can
safely replace early_param() with core_param() to simplify the code.

Link: https://lkml.kernel.org/r/20220628092235.91270-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c
Muchun Song [Tue, 28 Jun 2022 09:22:31 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c

When I first introduced vmemmap manipulation functions related to HugeTLB,
I thought those functions may be reused by other modules (e.g.  using
similar approach to optimize vmemmap pages, unfortunately, the DAX used
the same approach but does not use those functions).  After two years, we
didn't see any other users.  So move those functions to hugetlb_vmemmap.c.
Code movement without any functional change.

Link: https://lkml.kernel.org/r/20220628092235.91270-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: introduce the name HVO
Muchun Song [Tue, 28 Jun 2022 09:22:30 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: introduce the name HVO

It it inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others since there
is no specific or abbreviated name for it when it is first introduced.
Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now.

This commit also updates the document about "hugetlb_free_vmemmap" by the
way discussed in thread [1].

Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: optimize vmemmap_optimize_mode handling
Muchun Song [Tue, 28 Jun 2022 09:22:29 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: optimize vmemmap_optimize_mode handling

We hold an another reference to hugetlb_optimize_vmemmap_key when making
vmemmap_optimize_mode on, because we use static_key to tell memory_hotplug
that memory_hotplug.memmap_on_memory should be overridden.  However, this
rule has gone when we have introduced PageVmemmapSelfHosted.  Therefore,
we could simplify vmemmap_optimize_mode handling by not holding an another
reference to hugetlb_optimize_vmemmap_key.  This also means that we not
incur the extra page_fixed_fake_head checks if there are no vmemmap
optinmized hugetlb pages after this change.

Link: https://lkml.kernel.org/r/20220628092235.91270-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled()
Muchun Song [Tue, 28 Jun 2022 09:22:28 +0000 (17:22 +0800)]
mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled()

Patch series "Simplify hugetlb vmemmap and improve its readability", v2.

This series aims to simplify hugetlb vmemmap and improve its readability.

This patch (of 8):

The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests
two conditions (enabled and pages in use).  Instead of coming up to an
appropriate name, we could just delete it.  There is already a discussion
about deleting it in thread [1].

There is only one user of hugetlb_optimize_vmemmap_enabled() outside of
hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c.
However, it does not need to call hugetlb_optimize_vmemmap_enabled() in
flush_dcache_page() since HugeTLB pages are always fully mapped and only
head page will be set PG_dcache_clean meaning only head page's flag may
need to be cleared (see commit cf5a501d985b).  So it is easy to remove
hugetlb_optimize_vmemmap_enabled().

Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/
Link: https://lkml.kernel.org/r/20220628092235.91270-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm-shrinkers-fix-double-kfree-on-shrinker-name-fix
Roman Gushchin [Wed, 20 Jul 2022 16:29:45 +0000 (09:29 -0700)]
mm-shrinkers-fix-double-kfree-on-shrinker-name-fix

zero shrinker->name in all cases where shrinker->name is freed

Link: https://lkml.kernel.org/r/YtgteTnQTgyuKUSY@castle
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: shrinkers: fix double kfree on shrinker name
Tetsuo Handa [Wed, 20 Jul 2022 14:47:55 +0000 (23:47 +0900)]
mm: shrinkers: fix double kfree on shrinker name

syzbot is reporting double kfree() at free_prealloced_shrinker() [1], for
destroy_unused_super() calls free_prealloced_shrinker() even if
prealloc_shrinker() returned an error.  Explicitly clear shrinker name
when prealloc_shrinker() called kfree().

Link: https://syzkaller.appspot.com/bug?extid=8b481578352d4637f510
Link: https://lkml.kernel.org/r/ffa62ece-6a42-2644-16cf-0d33ef32c676@I-love.SAKURA.ne.jp
Fixes: e33c267ab70de424 ("mm: shrinkers: provide shrinkers with names")
Reported-by: syzbot <syzbot+8b481578352d4637f510@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoMerge branch 'mm-stable' into mm-unstable
akpm [Wed, 20 Jul 2022 21:41:02 +0000 (14:41 -0700)]
Merge branch 'mm-stable' into mm-unstable

2 years agomailmap: update Gao Xiang's email addresses
Gao Xiang [Tue, 19 Jul 2022 15:42:46 +0000 (23:42 +0800)]
mailmap: update Gao Xiang's email addresses

I've been in Alibaba Cloud for more than one year, mainly to address
cloud-native challenges (such as high-performance container images) for
open source communities.

Update my email addresses on behalf of my current employer (Alibaba Cloud)
to support all my (team) work in this area.  Also add an outdated
@redhat.com address of me.

Link: https://lkml.kernel.org/r/20220719154246.62970-1-xiang@kernel.org
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agouserfaultfd: provide properly masked address for huge-pages
Nadav Amit [Mon, 11 Jul 2022 16:59:06 +0000 (09:59 -0700)]
userfaultfd: provide properly masked address for huge-pages

Commit 824ddc601adc ("userfaultfd: provide unmasked address on
page-fault") was introduced to fix an old bug, in which the offset in the
address of a page-fault was masked.  Concerns were raised - although were
never backed by actual code - that some userspace code might break because
the bug has been around for quite a while.  To address these concerns a
new flag was introduced, and only when this flag is set by the user,
userfaultfd provides the exact address of the page-fault.

The commit however had a bug, and if the flag is unset, the offset was
always masked based on a base-page granularity.  Yet, for huge-pages, the
behavior prior to the commit was that the address is masked to the
huge-page granulrity.

While there are no reports on real breakage, fix this issue.  If the flag
is unset, use the address with the masking that was done before.

Link: https://lkml.kernel.org/r/20220711165906.2682-1-namit@vmware.com
Fixes: 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
Miaohe Lin [Tue, 12 Jul 2022 13:05:42 +0000 (21:05 +0800)]
mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte

In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
cache are installed in the ptes.  But hugepage_add_new_anon_rmap is called
for them mistakenly because they're not vm_shared.  This will corrupt the
page->mapping used by page cache code.

Link: https://lkml.kernel.org/r/20220712130542.18836-1-linmiaohe@huawei.com
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoRevert "ocfs2: mount shared volume without ha stack"
Junxiao Bi [Fri, 3 Jun 2022 22:28:01 +0000 (15:28 -0700)]
Revert "ocfs2: mount shared volume without ha stack"

This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.

This commit introduced a regression that can cause mount hung.  The
changes in __ocfs2_find_empty_slot causes that any node with none-zero
node number can grab the slot that was already taken by node 0, so node 1
will access the same journal with node 0, when it try to grab journal
cluster lock, it will hung because it was already acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first, then
node 1, you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718]  <TASK>
[13148.752019]  ? usleep_range+0x90/0x89
[13148.753882]  __schedule+0x210/0x567
[13148.755684]  schedule+0x44/0xa8
[13148.757270]  schedule_timeout+0x106/0x13c
[13148.759273]  ? __prepare_to_swait+0x53/0x78
[13148.761218]  __wait_for_common+0xae/0x163
[13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401]  ? iput+0x69/0xba
[13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756]  mount_bdev+0x190/0x1b7
[13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634]  legacy_get_tree+0x27/0x48
[13148.787466]  vfs_get_tree+0x25/0xd0
[13148.789270]  do_new_mount+0x18c/0x2d9
[13148.791046]  __x64_sys_mount+0x10e/0x142
[13148.792911]  do_syscall_64+0x3b/0x89
[13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564]  </TASK>

To fix it, we can just fix __ocfs2_find_empty_slot.  But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.

Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agohugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
Miaohe Lin [Sat, 9 Jul 2022 09:26:29 +0000 (17:26 +0800)]
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

When alloc_huge_page fails, *pagep is set to NULL without put_page first.
So the hugepage indicated by *pagep is leaked.

Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofs: sendfile handles O_NONBLOCK of out_fd
Andrei Vagin [Sun, 17 Jul 2022 04:37:10 +0000 (21:37 -0700)]
fs: sendfile handles O_NONBLOCK of out_fd

sendfile has to return EAGAIN if out_fd is nonblocking and the write into
it would block.

Here is a small reproducer for the problem:

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sendfile.h>

#define FILE_SIZE (1UL << 30)
int main(int argc, char **argv) {
        int p[2], fd;

        if (pipe2(p, O_NONBLOCK))
                return 1;

        fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
        if (fd < 0)
                return 1;
        ftruncate(fd, FILE_SIZE);

        if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                fprintf(stderr, "FAIL\n");
        }
        if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                fprintf(stderr, "FAIL\n");
        }
        return 0;
}

It worked before b964bf53e540, it is stuck after b964bf53e540, and it
works again with this fix.

This regression occurred because do_splice_direct() calls pipe_write
that handles O_NONBLOCK.  Here is a trace log from the reproducer:

 1)               |  __x64_sys_sendfile64() {
 1)               |    do_sendfile() {
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      do_splice_direct() {
 1)               |        rw_verify_area()
 1)               |        splice_direct_to_actor() {
 1)               |          do_splice_to() {
 1)               |            rw_verify_area()
 1)               |            generic_file_splice_read()
 1) + 74.153 us   |          }
 1)               |          direct_splice_actor() {
 1)               |            iter_file_splice_write() {
 1)               |              __kmalloc()
 1)   0.148 us    |              pipe_lock();
 1)   0.153 us    |              splice_from_pipe_next.part.0();
 1)   0.162 us    |              page_cache_pipe_buf_confirm();
... 16 times
 1)   0.159 us    |              page_cache_pipe_buf_confirm();
 1)               |              vfs_iter_write() {
 1)               |                do_iter_write() {
 1)               |                  rw_verify_area()
 1)               |                  do_iter_readv_writev() {
 1)               |                    pipe_write() {
 1)               |                      mutex_lock()
 1)   0.153 us    |                      mutex_unlock();
 1)   1.368 us    |                    }
 1)   1.686 us    |                  }
 1)   5.798 us    |                }
 1)   6.084 us    |              }
 1)   0.174 us    |              kfree();
 1)   0.152 us    |              pipe_unlock();
 1) + 14.461 us   |            }
 1) + 14.783 us   |          }
 1)   0.164 us    |          page_cache_pipe_buf_release();
... 16 times
 1)   0.161 us    |          page_cache_pipe_buf_release();
 1)               |          touch_atime()
 1) + 95.854 us   |        }
 1) + 99.784 us   |      }
 1) ! 107.393 us  |    }
 1) ! 107.699 us  |  }

Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agontfs: fix use-after-free in ntfs_ucsncmp()
ChenXiaoSong [Thu, 7 Jul 2022 10:53:29 +0000 (18:53 +0800)]
ntfs: fix use-after-free in ntfs_ucsncmp()

Syzkaller reported use-after-free bug as follows:

==================================================================
BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
Read of size 2 at addr ffff8880751acee8 by task a.out/879

CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1c0/0x2b0
 print_address_description.constprop.0.cold+0xd4/0x484
 print_report.cold+0x55/0x232
 kasan_report+0xbf/0xf0
 ntfs_ucsncmp+0x123/0x130
 ntfs_are_names_equal.cold+0x2b/0x41
 ntfs_attr_find+0x43b/0xb90
 ntfs_attr_lookup+0x16d/0x1e0
 ntfs_read_locked_attr_inode+0x4aa/0x2360
 ntfs_attr_iget+0x1af/0x220
 ntfs_read_locked_inode+0x246c/0x5120
 ntfs_iget+0x132/0x180
 load_system_files+0x1cc6/0x3480
 ntfs_fill_super+0xa66/0x1cf0
 mount_bdev+0x38d/0x460
 legacy_get_tree+0x10d/0x220
 vfs_get_tree+0x93/0x300
 do_new_mount+0x2da/0x6d0
 path_mount+0x496/0x19d0
 __x64_sys_mount+0x284/0x300
 do_syscall_64+0x3b/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f3f2118d9ea
Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

The buggy address belongs to the physical page:
page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
memcg:ffff888101f7e180
anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                                          ^
 ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

The reason is that struct ATTR_RECORD->name_offset is 6485, end address of
name string is out of bounds.

Fix this by adding sanity check on end address of attribute name string.

[akpm@linux-foundation.org: coding-style cleanups]
[chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agosecretmem: fix unhandled fault in truncate
Mike Rapoport [Thu, 7 Jul 2022 16:56:50 +0000 (19:56 +0300)]
secretmem: fix unhandled fault in truncate

syzkaller reports the following issue:

BUG: unable to handle page fault for address: ffff888021f7e005
PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
Oops: 0002 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 zero_user_segments include/linux/highmem.h:272 [inline]
 folio_zero_range include/linux/highmem.h:428 [inline]
 truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
 truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
 truncate_inode_pages mm/truncate.c:452 [inline]
 truncate_pagecache+0x63/0x90 mm/truncate.c:753
 simple_setattr+0xed/0x110 fs/libfs.c:535
 secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
 notify_change+0xb8c/0x12b0 fs/attr.c:424
 do_truncate+0x13c/0x200 fs/open.c:65
 do_sys_ftruncate+0x536/0x730 fs/open.c:193
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb29d900899
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
 </TASK>
Modules linked in:
CR2: ffff888021f7e005
---[ end trace 0000000000000000 ]---

Eric Biggers suggested that this happens when
secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
a page that is faulted in by secretmem_fault() (and thus removed from the
direct map) is zeroed by inode truncation right afterwards.

Use mapping->invalidate_lock to make secretmem_fault() and
secretmem_setattr() mutually exclusive.

[rppt@linux.ibm.com: v3]
Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
Reported-by: syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
Naoya Horiguchi [Mon, 4 Jul 2022 01:33:05 +0000 (10:33 +0900)]
mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()

Originally copy_hugetlb_page_range() handles migration entries and
hwpoisoned entries in similar manner.  But recently the related code path
has more code for migration entries, and when
is_writable_migration_entry() was converted to
!is_readable_migration_entry(), hwpoison entries on source processes got
to be unexpectedly updated (which is legitimate for migration entries, but
not for hwpoison entries).  This results in unexpected serious issues like
kernel panic when forking processes with hwpoison entries in pmd.

Separate the if branch into one for hwpoison entries and one for migration
entries.

Link: https://lkml.kernel.org/r/20220704013312.2415700-3-naoya.horiguchi@linux.dev
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> [5.18]
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: fix missing wake-up event for FSDAX pages
Muchun Song [Tue, 5 Jul 2022 12:35:32 +0000 (20:35 +0800)]
mm: fix missing wake-up event for FSDAX pages

FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
1, then the page is freed.  The FSDAX pages can be pinned through GUP,
then they will be unpinned via unpin_user_page() using a folio variant
to put the page, however, folio variants did not consider this special
case, the result will be to miss a wakeup event (like the user of
__fuse_dax_break_layouts()).  This results in a task being permanently
stuck in TASK_INTERRUPTIBLE state.

Since FSDAX pages are only possibly obtained by GUP users, so fix GUP
instead of folio_put() to lower overhead.

Link: https://lkml.kernel.org/r/20220705123532.283-1-songmuchun@bytedance.com
Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>