www.infradead.org Git - users/jedix/linux-maple.git/log
3 years agomm/mmap: drop range_has_overlap() function
Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]
mm/mmap: drop range_has_overlap() function

Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.
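
As a rough sketch of the single call that now covers both uses (mm, start
and end are placeholders, not lifted from the patch):

        /* first VMA overlapping [start, end), or NULL if none */
        struct vm_area_struct *vma = find_vma_intersection(mm, start, end);

        if (vma)
                pr_debug("range overlaps %lx-%lx\n", vma->vm_start, vma->vm_end);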

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm: remove the vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]
mm: remove the vma linked list

Replace any vm_next use with vma_find().

Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.

Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap().  At
the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new tree to hold the
vmas to remove.

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list.

Drop the linked list update from __insert_vm_struct().

Rework validation of the tree as it was depending on the linked list.
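
As a hedged sketch of the conversion pattern (VMA_ITERATOR() and
for_each_vma() are the iterator helpers from this series; do_something()
is a placeholder):

        struct vm_area_struct *vma;
        VMA_ITERATOR(vmi, mm, 0);

        /* before: linked-list walk */
        for (vma = mm->mmap; vma; vma = vma->vm_next)
                do_something(vma);

        /* after: maple tree backed iteration */
        for_each_vma(vmi, vma)
                do_something(vma);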

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoriscv: use vma iterator for vdso
Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]
riscv: use vma iterator for vdso

Remove the linked list use in favour of the vma iterator.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agonommu: remove uses of VMA linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]
nommu: remove uses of VMA linked list

Use the maple tree or VMA iterator instead.  This is faster and will allow
us to shrink the VMA.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoi915: use the VMA iterator
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]
i915: use the VMA iterator

Replace the linked list in probe_range() with the VMA iterator.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/swapfile: use vma iterator instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]
mm/swapfile: use vma iterator instead of vma linked list

unuse_mm() no longer needs to reference the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/pagewalk: use vma_find() instead of vma linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]
mm/pagewalk: use vma_find() instead of vma linked list

walk_page_range() no longer uses the one vma linked list reference.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/oom_kill: use maple tree iterators instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]
mm/oom_kill: use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/msync: use vma_find() instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]
mm/msync: use vma_find() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mremap: use vma_find_intersection() instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]
mm/mremap: use vma_find_intersection() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mprotect: use maple tree navigation instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]
mm/mprotect: use maple tree navigation instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mlock: use vma iterator and maple state instead of vma linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]
mm/mlock: use vma iterator and maple state instead of vma linked list

Handle overflow checking in count_mm_mlocked_page_nr() differently.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mempolicy: use vma iterator & maple state instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]
mm/mempolicy: use vma iterator & maple state instead of vma linked list

Reworked the way mbind_range() finds the first VMA to reuse the maple
state and limit the number of tree walks needed.

Note, this drops the VM_BUG_ON(!vma) call, which would catch a start
address higher than the last VMA.  The code was written in a way that
allowed no VMA updates to occur and still return success.  There should be
no functional change to this scenario with the new code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm/memcontrol: stop using mm->highest_vm_end
Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]
mm/memcontrol: stop using mm->highest_vm_end

Pass through ULONG_MAX instead.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/madvise: use vma_find() instead of vma linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
mm/madvise: use vma_find() instead of vma linked list

madvise_walk_vmas() no longer uses the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/ksm: use vma iterators instead of vma linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
mm/ksm: use vma iterators instead of vma linked list

Remove the use of the linked list in preparation for its eventual removal.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/khugepaged: stop using vma linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
mm/khugepaged: stop using vma linked list

Use vma iterator & find_vma() instead of vma linked list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agomm/gup: use maple tree navigation instead of linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
mm/gup: use maple tree navigation instead of linked list

Use find_vma_intersection() to locate the VMAs in __mm_populate() instead
of using find_vma() and the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agobpf: remove VMA linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
bpf: remove VMA linked list

Use vma_next() and remove the reference to the start of the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofork: use VMA iterator
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]
fork: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agosched: use maple tree iterator to walk VMAs
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]
sched: use maple tree iterator to walk VMAs

The linked list is slower than walking the VMAs using the maple tree.  We
can't use the VMA iterator here because it doesn't support moving to an
earlier position.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoperf: use VMA iterator
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]
perf: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoacct: use VMA iterator instead of linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]
acct: use VMA iterator instead of linked list

The VMA iterator is faster than the linked list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoipc/shm: use VMA iterator instead of linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]
ipc/shm: use VMA iterator instead of linked list

The VMA iterator is faster than the linked list, and it can be walked
even when VMAs are being removed from the address space, so there's no
need to keep track of 'next'.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agouserfaultfd: use maple tree iterator to iterate VMAs
Liam R. Howlett [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]
userfaultfd: use maple tree iterator to iterate VMAs

Don't use the mm_struct linked list or vma->vm_next, in preparation for
their removal.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofs/proc/task_mmu: stop using linked list and highest_vm_end
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]
fs/proc/task_mmu: stop using linked list and highest_vm_end

Remove references to the mm_struct linked list and highest_vm_end, as
they are being removed.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofs/proc/base: use maple tree iterators in place of linked list
Liam R. Howlett [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]
fs/proc/base: use maple tree iterators in place of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoexec: use VMA iterator instead of linked list
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]
exec: use VMA iterator instead of linked list

Remove a use of the vm_next list by doing the initial lookup with the VMA
iterator and then using it to find the next entry.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agocoredump: remove vma linked list walk
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]
coredump: remove vma linked list walk

Use the Maple Tree iterator instead.  This is too complicated for the VMA
iterator to handle, so let's open-code it for now.  If this turns out to
be a common pattern, we can migrate it to common code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoum: remove vma linked list walk
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]
um: remove vma linked list walk

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agooptee: remove vma linked list walk
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]
optee: remove vma linked list walk

Use the VMA iterator instead.  Change the calling convention of
__check_mem_type() to pass in the mm instead of the first vma in the
range.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agocxl: remove vma linked list walk
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]
cxl: remove vma linked list walk

Use the VMA iterator instead.  This requires a little restructuring of the
surrounding code to hoist the mm to the caller.  That turns
cxl_prefault_one() into a trivial function, so call cxl_fault_segment()
directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoxtensa: remove vma linked list walks
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]
xtensa: remove vma linked list walks

Use the VMA iterator instead.  Since the VMA can no longer be NULL in the
loop, deal with out-of-memory outside the loop.  This means a
slightly longer run time in the failure case (-ENOMEM) - it will run to
the end of the VMAs before erroring instead of in the middle of the loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agox86: remove vma linked list walks
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]
x86: remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agos390: remove vma linked list walks
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]
s390: remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agopowerpc: remove mmap linked list walks
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]
powerpc: remove mmap linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoparisc: remove mmap linked list from cache handling
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]
parisc: remove mmap linked list from cache handling

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoarm64: Change elfcore for_each_mte_vma() to use VMA iterator
Liam R. Howlett [Fri, 18 Feb 2022 02:37:04 +0000 (02:37 +0000)]
arm64: Change elfcore for_each_mte_vma() to use VMA iterator

Rework for_each_mte_vma() to use a VMA iterator instead of an explicit
linked-list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220218023650.672072-1-Liam.Howlett@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: remove mmap linked list from vdso
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]
arm64: remove mmap linked list from vdso

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]
mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()

do_brk_munmap() has already aligned the address and has a maple tree state
to be used.  Use the new do_mas_align_munmap() to avoid unnecessary
alignment and error checks.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: reorganize munmap to use maple states
Liam R. Howlett [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]
mm/mmap: reorganize munmap to use maple states

Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().

do_munmap() is a wrapper to create a maple state for any callers that have
not been converted to the maple tree.

do_mas_munmap() takes a maple state to munmap a range.  This is just a
small function which checks for error conditions and aligns the end of the
range.

do_mas_align_munmap() uses the aligned range to munmap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range.  Both start and end are split if necessary.
Then the VMAs are removed from the linked list and the mm mlock count is
updated at the same time.  This is followed by a single tree operation
that overwrites the area with NULL.  Finally, the detached list is
unmapped and freed.

By reorganizing the munmap calls as outlined, it is now possible to avoid
the extra work of aligning pre-aligned callers which are known to be safe,
and to avoid extra VMA lookups or tree walks for modifications.

detach_vmas_to_be_unmapped() is no longer used, so drop this code.

vm_brk_flags() can just call do_mas_munmap(), as it checks for
intersecting VMAs directly.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: move mmap_region() below do_munmap()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]
mm/mmap: move mmap_region() below do_munmap()

Relocation of code for the next commit.  There should be no changes here.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: convert vma_lookup() to use mtree_load()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]
mm: convert vma_lookup() to use mtree_load()

Unlike the rbtree, the Maple Tree will return a NULL if there's nothing at
a particular address.

Since the previous commit dropped the vmacache, it is now possible to
consult the tree directly.
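
A minimal sketch of the idea, assuming the VMA tree lives in mm->mm_mt and
stores each VMA over its [vm_start, vm_end) range (not necessarily the
exact upstream code):

        struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
        {
                /* the entry covering addr, or NULL if nothing is mapped there */
                return mtree_load(&mm->mm_mt, addr);
        }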

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm: remove vmacache
Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]
mm: remove vmacache

By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mmap: use advanced maple tree API for mmap_region()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]
mm/mmap: use advanced maple tree API for mmap_region()

Changing mmap_region() to use the maple tree state and the advanced maple
tree interface allows for a lot less tree walking.

This change removes the last caller of munmap_vma_range(), so drop this
unused function.

Add vma_expand() to expand a VMA if possible by doing the necessary
hugepage check, uprobe_munmap of files, dcache flush, modifications then
undoing the detaches, etc.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: use maple tree operations for find_vma_intersection()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]
mm: use maple tree operations for find_vma_intersection()

Move find_vma_intersection() to mmap.c and change implementation to maple
tree.

When searching for a vma within a range, it is easier to use the maple
tree interface.

Export find_vma_intersection() for the kvm module.
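
A hedged sketch of what the maple tree based implementation looks like
under these assumptions (mm->mm_mt holds the VMAs; mt_find() is the simple
search interface):

        struct vm_area_struct *find_vma_intersection(struct mm_struct *mm,
                                                     unsigned long start_addr,
                                                     unsigned long end_addr)
        {
                unsigned long index = start_addr;

                /* first entry overlapping [start_addr, end_addr), or NULL */
                return mt_find(&mm->mm_mt, &index, end_addr - 1);
        }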

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]
mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()

Avoid allocating a new VMA when a modification of an existing VMA can occur instead.  When a
brk() can expand or contract a VMA, then the single store operation will
only modify one index of the maple tree instead of causing a node to split
or coalesce.  This avoids unnecessary allocations/frees of maple tree
nodes and VMAs.

Move some limit & flag verifications out of the do_brk_flags() function to
use only relevant checks in the code path of brk() and vm_brk_flags().

Set the vma to check if it can expand in vm_brk_flags() if extra criteria
are met.

Drop userfaultfd from do_brk_flags() path and only use it in
vm_brk_flags() path since that is the only place a munmap will happen.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]
mm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()

vma_lookup() will walk the vma tree once and not continue to look for the
next vma.  Since the exact vma is checked below, this is a more optimal
way of searching.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm: optimize find_exact_vma() to use vma_lookup()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]
mm: optimize find_exact_vma() to use vma_lookup()

Use vma_lookup() to walk the tree to the start value requested.  If the
vma at the start does not match, then the answer is NULL and there is no
need to look at the next vma the way that find_vma() would.
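
An illustrative sketch of the pattern described above (a mirror of the
idea, not a verbatim copy of the patch):

        static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
                        unsigned long vm_start, unsigned long vm_end)
        {
                struct vm_area_struct *vma = vma_lookup(mm, vm_start);

                /* only an exact [vm_start, vm_end) match counts */
                if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end))
                        vma = NULL;

                return vma;
        }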

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoxen: use vma_lookup() in privcmd_ioctl_mmap()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]
xen: use vma_lookup() in privcmd_ioctl_mmap()

vma_lookup() walks the VMA tree to a specific value; find_vma() will
search the tree after walking to a specific value.  It is more efficient
to only walk to the requested value since privcmd_ioctl_mmap() will exit
the loop if vm_start != msg->va.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agommap: change zeroing of maple tree in __vma_adjust()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]
mmap: change zeroing of maple tree in __vma_adjust()

Only write to the maple tree if we are not inserting or the insert isn't
going to overwrite the area to clear.  This avoids spanning writes and
node coalescing when unnecessary.

The change requires a custom search for the linked list addition to find
the correct VMA for the prev link.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agomm: remove rb tree.
Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]
mm: remove rb tree.

Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
lock is not held.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoproc: remove VMA rbtree use from nommu
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]
proc: remove VMA rbtree use from nommu

These users of the rbtree should probably have been walks of the linked
list, but convert them to use walks of the maple tree.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agodamon-convert-__damon_va_three_regions-to-use-the-vma-iterator-fix
Andrew Morton [Wed, 11 May 2022 00:47:12 +0000 (17:47 -0700)]
damon-convert-__damon_va_three_regions-to-use-the-vma-iterator-fix

vaddr-test.h no longer needs internal.h

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agodamon: Convert __damon_va_three_regions to use the VMA iterator
Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]
damon: Convert __damon_va_three_regions to use the VMA iterator

This rather specialised walk can use the VMA iterator.  If this proves to
be too slow, we can write a custom routine to find the two largest gaps,
but it will be somewhat complicated, so let's see if we need it first.

Update the kunit test case to use the maple tree.  This also fixes an
issue with the kunit testcase not adding the last VMA to the list.

Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
3 years agokernel/fork: use maple tree for dup_mmap() during forking
Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]
kernel/fork: use maple tree for dup_mmap() during forking

The maple tree was already tracking VMAs in this function by an earlier
commit, but the rbtree iterator was being used to iterate the list.
Change the iterator to use a maple tree native iterator and switch to the
maple tree advanced API to avoid multiple walks of the tree during insert
operations.  Unexport the now-unused vma_store() function.

For performance reasons we bulk allocate the maple tree nodes.  The node
calculations are done internally to the tree and use the VMA count and
assume the worst-case node requirements.  The VM_DONTCOPY flag does not
allow for the most efficient copy method of the tree and so a bulk loading
algorithm is used.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mmap: use maple tree for unmapped_area{_topdown}
Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]
mm/mmap: use maple tree for unmapped_area{_topdown}

The maple tree code was added to find the unmapped area in a previous
commit and was checked against what the rbtree returned, but the actual
result was never used.  Start using the maple tree implementation and
remove the rbtree code.

Add kernel documentation comment for these functions.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]
mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree

Use the maple tree's advanced API and a maple state to walk the tree for
the entry at the address of the next vma, then use the maple state to walk
back one entry to find the previous entry.

Add kernel documentation comments for this API.
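
A sketch of the walk described above, using the advanced maple tree API
(details may differ from the patch):

        struct vm_area_struct *find_vma_prev(struct mm_struct *mm, unsigned long addr,
                                             struct vm_area_struct **pprev)
        {
                struct vm_area_struct *vma;
                MA_STATE(mas, &mm->mm_mt, addr, addr);

                vma = mas_walk(&mas);           /* entry at addr, if any */
                *pprev = mas_prev(&mas, 0);     /* walk back one entry */
                if (!vma)                       /* addr fell in a gap */
                        vma = mas_next(&mas, ULONG_MAX);
                return vma;
        }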

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm/mmap: use the maple tree in find_vma() instead of the rbtree.
Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]
mm/mmap: use the maple tree in find_vma() instead of the rbtree.

Using the maple tree interface mt_find() will handle the RCU locking and
will start searching at the address up to the limit, ULONG_MAX in this
case.

Add kernel documentation to this API.
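
A minimal sketch of the described lookup, assuming the VMA tree is
mm->mm_mt (not a verbatim copy of the patch):

        struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
        {
                unsigned long index = addr;

                /* mt_find() handles RCU locking and searches from addr up to the limit */
                return mt_find(&mm->mm_mt, &index, ULONG_MAX);
        }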

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agommap: use the VMA iterator in count_vma_pages_range()
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]
mmap: use the VMA iterator in count_vma_pages_range()

This simplifies the implementation and is faster than using the linked
list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomm: add VMA iterator
Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]
mm: add VMA iterator

This thin layer of abstraction over the maple tree state is for iterating
over VMAs.  You can go forwards, go backwards or ask where the iterator
is.  Rename the existing vma_next() to __vma_next() -- it will be removed
by the end of this series.
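
An illustrative use of the abstraction (names as introduced by this
series; mm and addr are placeholders):

        struct vm_area_struct *vma;
        VMA_ITERATOR(vmi, mm, addr);

        vma = vma_next(&vmi);   /* go forwards */
        vma = vma_prev(&vmi);   /* go backwards */
        /* vma_find(&vmi, limit) returns the next VMA below limit */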

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
3 years agomapletree: build fix
Andrew Morton [Wed, 11 May 2022 00:47:10 +0000 (17:47 -0700)]
mapletree: build fix

Fix the vma_mas_store()/vma_mas_remove() issues: missing prototypes and a
missing implementation on nommu.

Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: start tracking VMAs with maple tree
Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]
mm: start tracking VMAs with maple tree

Start tracking the VMAs with the new maple tree structure in parallel with
the rb_tree.  Add debug and trace events for maple tree operations and
duplicate the rb_tree that is created on forks into the maple tree.

The maple tree is added to the mm_struct, including the mm_init struct;
support is added in the required mm/mmap functions and tracking in
kernel/fork for process forking; and the tree is used to find the
unmapped_area and checked against what the rbtree finds.

This also moves the mmap_lock() in exit_mmap() since the oom reaper call
does walk the VMAs.  Otherwise lockdep will be unhappy if oom happens.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agolib/test_maple_tree: add testing for maple tree
Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]
lib/test_maple_tree: add testing for maple tree

This is a test suite that uses the radix test infrastructure.  It has
been split into its own commit to allow for easier review of the maple
tree code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agoradix tree test suite: add lockdep_is_held to header
Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]
radix tree test suite: add lockdep_is_held to header

The maple tree uses lockdep_is_held(), so define it as external in the header.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agoradix tree test suite: add support for slab bulk APIs
Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]
radix tree test suite: add support for slab bulk APIs

Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the
radix tree test suite.
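
For reference, the kernel interfaces being mimicked are used roughly like
this (cachep and the object array are placeholders):

        void *objs[16];
        int allocated;

        allocated = kmem_cache_alloc_bulk(cachep, GFP_KERNEL, ARRAY_SIZE(objs), objs);
        /* ... use the allocated objects ... */
        kmem_cache_free_bulk(cachep, allocated, objs);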

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoradix tree test suite: add allocation counts and size to kmem_cache
Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]
radix tree test suite: add allocation counts and size to kmem_cache

Add functions to get the number of allocations, and total allocations from
a kmem_cache.  Also add a function to get the allocated size and a way to
zero the total allocations.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoradix tree test suite: add kmem_cache_set_non_kernel()
Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]
radix tree test suite: add kmem_cache_set_non_kernel()

kmem_cache_set_non_kernel() is a mechanism to allow a certain number of
kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in
the flags.  This functionality allows for testing different paths through
the code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoradix tree test suite: add pr_err define
Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]
radix tree test suite: add pr_err define

Patch series "Introducing the Maple Tree".

This patch (of 70):

Define pr_err to printk.

Link: https://lkml.kernel.org/r/20220404143501.2016403-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoMaple Tree: add new data structure
Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]
Maple Tree: add new data structure

The maple tree is an RCU-safe range-based B-tree designed to use modern
processor caches efficiently.  There are a number of places in the kernel where a
non-overlapping range-based tree would be beneficial, especially one with a
simple interface.  If you use an rbtree with other data structures to improve
performance or an interval tree to track non-overlapping ranges, then this is
for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes.
With the increased branching factor, it is significantly shorter than the
rbtree so it has fewer cache misses.  The removal of the linked list between
subsequent entries also reduces the cache misses and the need to pull in the
previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct, where
three data structures are replaced by the maple tree: the augmented rbtree, the
vma cache, and the linked list of VMAs in the mm_struct.  The long term goal is
to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be allowed
at a time.  A reader re-walks if stale data is encountered. VMAs would be RCU
enabled and this mode would be entered once multiple tasks are using the
mm_struct.
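
A hedged, minimal example of the simple interface mentioned above (the
indices and the stored value are arbitrary):

        static DEFINE_MTREE(mt);
        static int data;

        static int example(void)
        {
                /* store a pointer over the range [10, 20] */
                int ret = mtree_store_range(&mt, 10, 20, &data, GFP_KERNEL);

                if (ret)
                        return ret;

                /* any index inside the range resolves to the stored entry */
                WARN_ON(mtree_load(&mt, 15) != &data);

                /* erase the entry containing index 15 */
                mtree_erase(&mt, 15);
                return 0;
        }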

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
3 years agomips: rename mt_init to mips_mt_init
Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]
mips: rename mt_init to mips_mt_init

Move mt_init out of the way for the maple tree.  Use mips_mt prefix to
match the rest of the functions in the file.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agomm/damon/reclaim: fix the timer always stays active
Hailong Tu [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]
mm/damon/reclaim: fix the timer always stays active

The timer stays active even if the reclaim mechanism is never enabled.
This unnecessary overhead can be completely avoided by using
module_param_cb() for the enabled flag.

Link: https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@gmail.com
Signed-off-by: Hailong Tu <tuhailong@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/damon: remove unnecessary type castings
Yu Zhe [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]
mm/damon: remove unnecessary type castings

Remove unnecessary void* type castings.

Link: https://lkml.kernel.org/r/20220421153056.8474-1-yuzhe@nfschina.com
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: liqiong <liqiong@nfschina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/damon/core-test: add a kunit test case for ops registration
SeongJae Park [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]
mm/damon/core-test: add a kunit test case for ops registration

This commit adds a simple kunit test case for the DAMON operations
registration feature.

Link: https://lkml.kernel.org/r/20220419122225.290518-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agodamon: vaddr-test: tweak code to make the logic clearer
Xiaomeng Tong [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]
damon: vaddr-test: tweak code to make the logic clearer

Move these two lines into the damon_for_each_region loop, as it is always
the last region that is being tested.  Also avoid using a list iterator
'r' outside the loop, which is considered harmful[1].

[1]:  https://lkml.org/lkml/2022/2/17/1032

Link: https://lkml.kernel.org/r/20220328115252.31675-1-xiam0nd.tong@gmail.com
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoselftests: cgroup: add a selftest for memory.reclaim
Yosry Ahmed [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]
selftests: cgroup: add a selftest for memory.reclaim

Add a new test for memory.reclaim that verifies that the interface
correctly reclaims memory as intended, from both anon and file pages.

Link: https://lkml.kernel.org/r/20220425190040.2475377-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoselftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
Yosry Ahmed [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]
selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory

Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees
the allocated memory. alloc_anon_noexit() is usually used with
cg_run_nowait() to run a process in the background that allocates
memory. It makes sense for the background process to keep the memory
allocated and not instantly free it (otherwise there is no point in
running it in the background).

Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoselftests: cgroup: return -errno from cg_read()/cg_write() on failure
Yosry Ahmed [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]
selftests: cgroup: return -errno from cg_read()/cg_write() on failure

Currently, cg_read()/cg_write() returns 0 on success and -1 on failure.
Modify them to return the -errno on failure.

Link: https://lkml.kernel.org/r/20220425190040.2475377-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomemcg: introduce per-memcg reclaim interface
Shakeel Butt [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]
memcg: introduce per-memcg reclaim interface

This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.

This patch (of 4):

Introduce a memcg interface to trigger memory reclaim on a memory cgroup.

Use case: Proactive Reclaim
---------------------------

A userspace proactive reclaimer can continuously probe the memcg to
reclaim a small amount of memory.  This gives more accurate and up-to-date
workingset estimation as the LRUs are continuously sorted and can
potentially provide more deterministic memory overcommit behavior.  The
memory overcommit controller can provide more proactive response to the
changing behavior of the running applications instead of being reactive.

A userspace reclaimer's purpose in this case is not a complete replacement
for kswapd or direct reclaim; it is to proactively identify memory savings
opportunities and reclaim some amount of cold pages set by the policy to
free up the memory for more demanding jobs or scheduling new jobs.

A user space proactive reclaimer is used in Google data centers.
Additionally, Meta's TMO paper recently referenced a very similar
interface used for user space proactive reclaim:
https://dl.acm.org/doi/pdf/10.1145/3503222.3507731

Benefits of a user space reclaimer:
-----------------------------------

1) More flexible on who should be charged for the cpu of the memory
   reclaim.  For proactive reclaim, it makes more sense to be centralized.

2) More flexible on dedicating the resources (like cpu).  The memory
   overcommit controller can balance the cost between the cpu usage and
   the memory reclaimed.

3) Provides a way for the applications to keep their LRUs sorted, so
   that better reclaim candidates are selected under memory pressure.
   This also gives a more accurate and up-to-date notion of the working
   set for an application.

Why is memory.high not enough?
------------------------------

- memory.high can be used to trigger reclaim in a memcg and can
  potentially be used for proactive reclaim.  However there is a big
  downside in using memory.high.  It can potentially introduce high
  reclaim stalls in the target application as the allocations from the
  processes or the threads of the application can hit the temporary
  memory.high limit.

- Userspace proactive reclaimers usually use feedback loops to decide
  how much memory to proactively reclaim from a workload.  The metrics
  used for this are usually either refaults or PSI, and these metrics will
  become messy if the application gets throttled by hitting the high
  limit.

- memory.high is a stateful interface; if the userspace proactive
  reclaimer crashes for any reason while triggering reclaim it can leave
  the application in a bad state.

- If a workload is rapidly expanding, setting memory.high to proactively
  reclaim memory can result in actually reclaiming more memory than
  intended.

The benefits of such interface and shortcomings of existing interface were
further discussed in this RFC thread:
https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/

Interface:
----------

Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to
trigger reclaim in the target memory cgroup.

The interface is introduced as a nested-keyed file to allow for future
optional arguments to be easily added to configure the behavior of
reclaim.

Possible Extensions:
--------------------

- This interface can be extended with an additional parameter or flags
  to allow specifying one or more types of memory to reclaim from (e.g.
  file, anon, ..).

- The interface can also be extended with a node mask to reclaim from
  specific nodes. This has use cases for reclaim-based demotion in memory
  tiering systems.

- A similar per-node interface can also be added to support proactive
  reclaim and reclaim-based demotion in systems without memcg.

- Add a timeout parameter to make it easier for user space to call the
  interface without worrying about being blocked for an undefined amount
  of time.

For now, let's keep things simple by adding the basic functionality.

[yosryahmed@google.com: worked on versions v2 onwards, refreshed to
current master, updated commit message based on recent
discussions and use cases]
Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Wei Xu <weixugc@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agozram: add a huge_idle writeback mode
Brian Geffon [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]
zram: add a huge_idle writeback mode

Today it's only possible to write back as a page, idle, or huge.  A user
might want to writeback pages which are huge and idle first as these idle
pages do not require decompression and make a good first pass for
writeback.

Idle writeback specifically has the advantage that a refault is unlikely
given that the page has been swapped for some amount of time without being
refaulted.

Huge writeback has the advantage that you're guaranteed to get the maximum
benefit from a single page writeback, that is, you're reclaiming one full
page of memory.  Writing back a page that is compressed in zram yields a
benefit that is always less than a page size, because the page was
compressed.

The primary use of this is for minimizing refaults in situations where the
device has to be sensitive to storage endurance.  On ChromeOS we have
devices with slow eMMC and repeated writes and refaults can negatively
affect performance and endurance.

Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/page_alloc: simplify update of pgdat in wake_all_kswapds
Chen Wandun [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]
mm/page_alloc: simplify update of pgdat in wake_all_kswapds

There is no need to update last_pgdat for each zone; only update
last_pgdat when iterating the first zone of a node.
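
The shape of the change, as a hedged sketch (the surrounding loop and
variables follow the usual wake_all_kswapds() pattern and are assumptions
here):

        pg_data_t *last_pgdat = NULL;

        for_each_zone_zonelist_nodemask(zone, z, zonelist, highest_zoneidx, nodemask) {
                /* only the first zone of each node needs to wake kswapd */
                if (last_pgdat == zone->zone_pgdat)
                        continue;
                wakeup_kswapd(zone, gfp_mask, order, highest_zoneidx);
                last_pgdat = zone->zone_pgdat;
        }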

Link: https://lkml.kernel.org/r/20220322115635.2708989-1-chenwandun@huawei.com
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agokasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t
Andrey Konovalov [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]
kasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t

Fix sparse warning:

mm/kasan/shadow.c:496:15: warning: restricted kasan_vmalloc_flags_t degrades to integer

Link: https://lkml.kernel.org/r/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agokasan: fix sleeping function called from invalid context on RT kernel
Zqiang [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]
kasan: fix sleeping function called from invalid context on RT kernel

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
...........
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
 __might_resched.cold+0x13b/0x173
rt_spin_lock+0x5b/0xf0
 ___cache_free+0xa5/0x180
qlist_free_all+0x7a/0x160
per_cpu_remove_cache+0x5f/0x70
smp_call_function_many_cond+0x4c4/0x4f0
on_each_cpu_cond_mask+0x49/0xc0
kasan_quarantine_remove_cache+0x54/0xf0
kasan_cache_shrink+0x9/0x10
kmem_cache_shrink+0x13/0x20
acpi_os_purge_cache+0xe/0x20
acpi_purge_cached_objects+0x21/0x6d
acpi_initialize_objects+0x15/0x3b
acpi_init+0x130/0x5ba
do_one_initcall+0xe5/0x5b0
kernel_init_freeable+0x34f/0x3ad
kernel_init+0x1e/0x140
ret_from_fork+0x22/0x30

When kmem_cache_shrink() is called, an IPI is triggered and
___cache_free() is called in IPI interrupt context, where the local-lock
or spin-lock will be acquired.  On a PREEMPT_RT kernel, these locks are
replaced with sleepable rt-spinlocks, so the above problem is triggered.
Fix it by moving the qlist_free_all() call from IPI interrupt context to
task context when PREEMPT_RT is enabled.

[akpm@linux-foundation.org: reduce ifdeffery]
Link: https://lkml.kernel.org/r/20220401134649.2222485-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: hugetlb: add missing cache flushing in hugetlb_unshare_all_pmds()
Baolin Wang [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]
mm: hugetlb: add missing cache flushing in hugetlb_unshare_all_pmds()

A call to flush_cache_range() was missed before removing the sharing PMD
entries; otherwise data consistency issues may occur on some
architectures whose caches are strict and require a virtual->physical
translation to exist for a virtual address.  Thus add it.

Currently no architectures enabling PMD sharing will be affected, since they do
not have a VIVT cache.  That means this issue cannot happen in
practice so far.

Link: https://lkml.kernel.org/r/47441086affcabb6ecbe403173e9283b0d904b38.1650956489.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/419b0e777c9e6d1454dcd906e0f5b752a736d335.1650781755.git.baolin.wang@linux.alibaba.com
Fixes: 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/khugepaged: use vma_is_anonymous
xu xin [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]
mm/khugepaged: use vma_is_anonymous

Clean up the vma->vm_ops usage.  Use vma_is_anonymous instead of
vma->vm_ops to make it more understandable.

Link: https://lkml.kernel.org/r/20220424071642.3234971-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: use for_each_online_node and node_online instead of open coding
Peng Liu [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]
mm: use for_each_online_node and node_online instead of open coding

Use more generic functions to deal with issues related to online nodes.
The changes simplify the code.
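
For illustration, the generic helpers referred to above are used like this
(target_nid and the loop body are placeholders):

        int nid;

        for_each_online_node(nid)
                pr_info("node %d is online\n", nid);

        if (node_online(target_nid))
                pr_info("node %d is valid and online\n", target_nid);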

Link: https://lkml.kernel.org/r/20220429030218.644635-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agohugetlb: fix return value of __setup handlers
Peng Liu [Fri, 29 Apr 2022 21:36:57 +0000 (14:36 -0700)]
hugetlb: fix return value of __setup handlers

When a __setup() handler returns '0', an invalid option value causes the
entire kernel boot option string to be reported as Unknown.  Hugetlb calls
__setup() and returns '0' when an invalid parameter string is passed.

The following phenomenon is observed:
 cmdline:
  hugepagesz=1Y hugepages=1
 dmesg:
  HugeTLB: unsupported hugepagesz=1Y
  HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
  Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"

Since hugetlb already prints warning/error information before returning
for an invalid parameter string, just return '1' to avoid printing it again.
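
A hedged illustration of the convention (example_setup and example_val are
made-up names, not taken from the patch):

        static unsigned long example_val;

        static int __init example_setup(char *s)
        {
                if (kstrtoul(s, 0, &example_val))
                        pr_warn("example=: invalid value '%s'\n", s);

                /* always return 1: the option was consumed (and already warned
                 * about), so it is not reported as an unknown parameter */
                return 1;
        }
        __setup("example=", example_setup);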

Link: https://lkml.kernel.org/r/20220413032915.251254-4-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agohugetlb: fix hugepages_setup when deal with pernode
Peng Liu [Fri, 29 Apr 2022 21:36:57 +0000 (14:36 -0700)]
hugetlb: fix hugepages_setup when deal with pernode

Hugepages can be specified per node since "hugetlbfs: extend the
definition of hugepages parameter to support node allocation", but the
following problem is observed.

Confusing behavior is observed when both 1G and 2M hugepages are set
after "numa=off".
 cmdline hugepage settings:
  hugepagesz=1G hugepages=0:3,1:3
  hugepagesz=2M hugepages=0:1024,1:1024
 results:
  HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

Furthermore, confusing behavior can also be observed when an invalid node
follows a valid node.  To fix this, never allocate any typical hugepages
when an invalid parameter is received.

Link: https://lkml.kernel.org/r/20220413032915.251254-3-liupeng256@huawei.com
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agohugetlb: fix wrong use of nr_online_nodes
Peng Liu [Fri, 29 Apr 2022 21:36:57 +0000 (14:36 -0700)]
hugetlb: fix wrong use of nr_online_nodes

Patch series "hugetlb: Fix some incorrect behavior", v3.

This series fixes three bugs in hugetlb:
1) Invalid use of nr_online_nodes;
2) Inconsistency between 1G hugepage and 2M hugepage;
3) Useless information in dmesg.

This patch (of 4):

Certain systems are designed to have sparse/discontiguous nodes.  In this
case, nr_online_nodes cannot be used to walk through the NUMA nodes.  Also, a
valid node may be greater than nr_online_nodes.

However, in hugetlb, it is assumed that nodes are contiguous.

For sparse/discontiguous nodes, the current code may treat a valid node
as invalid, and will fail to allocate all hugepages on a valid node where
"nid >= nr_online_nodes".

As David suggested:

        if (tmp >= nr_online_nodes)
                goto invalid;

Just imagine node 0 and node 2 are online, and node 1 is offline.
Assuming that "node < 2" is valid is wrong.

Recheck all the places that use nr_online_nodes, and repair them one by
one.

[liupeng256@huawei.com: v4]
Link: https://lkml.kernel.org/r/20220416103526.3287348-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220413032915.251254-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220413032915.251254-2-liupeng256@huawei.com
Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agodrivers/base/memory: fix an unlikely reference counting issue in __add_memory_block()
Christophe JAILLET [Fri, 29 Apr 2022 06:16:19 +0000 (23:16 -0700)]
drivers/base/memory: fix an unlikely reference counting issue in __add_memory_block()

__add_memory_block() calls both put_device() and device_unregister() when
storing the memory block into the xarray fails.  This is incorrect because
the xarray does not take an additional reference and device_unregister()
already calls put_device().

Triggering the issue looks really unlikely, and its only effect should be a
spurious warning about a reference counting problem.  A condensed sketch of
the error path follows.
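
A condensed sketch of the problematic error path (names taken from
drivers/base/memory.c, exact code assumed):

	ret = xa_err(xa_store(&memory_blocks, memory->dev.id, memory,
			      GFP_KERNEL));
	if (ret) {
		put_device(&memory->dev);	 /* drops one reference ...   */
		device_unregister(&memory->dev); /* ... and this also ends in
						  * a put_device(), one put
						  * too many */
	}

Since device_unregister() already ends with a put_device(), only one of the
two calls should remain in this error path.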

Link: https://lkml.kernel.org/r/d44c63d78affe844f020dc02ad6af29abc448fc4.1650611702.git.christophe.jaillet@wanadoo.fr
Fixes: 4fb6eabf1037 ("drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Scott Cheloha <cheloha@linux.vnet.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: make sure highest is above the min_pfn
Miaohe Lin [Fri, 29 Apr 2022 06:16:19 +0000 (23:16 -0700)]
mm: compaction: make sure highest is above the min_pfn

It's not guaranteed that highest will be above the min_pfn.  If highest is
below the min_pfn, migrate_pfn and free_pfn can meet prematurely and lead
to some useless work.  Make sure highest is above min_pfn to avoid making
a futile effort.

Link: https://lkml.kernel.org/r/20220418141253.24298-13-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: simplify the code in __compact_finished
Miaohe Lin [Fri, 29 Apr 2022 06:16:19 +0000 (23:16 -0700)]
mm: compaction: simplify the code in __compact_finished

Since commit efe771c7603b ("mm, compaction: always finish scanning of a
full pageblock"), compaction always finishes scanning a full pageblock, so
migrate_pfn is guaranteed to be aligned to pageblock_nr_pages by the time
we reach this point.  The IS_ALIGNED check of migrate_pfn below is
therefore always true, and we always return COMPACT_SUCCESS once a suitable
fallback is found.  Simplify the code to make this clear and improve
readability; a condensed before/after sketch follows.  No functional change
intended.
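
A condensed before/after sketch of the idea (the surrounding
find_suitable_fallback() handling is omitted; not the literal diff):

	/* before (assumed): the alignment test chose between SUCCESS and CONTINUE */
	if (cc->mode == MIGRATE_ASYNC ||
	    IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
		return COMPACT_SUCCESS;
	return COMPACT_CONTINUE;

	/* after: migrate_pfn is always pageblock-aligned here, so the test can
	 * never fall through to COMPACT_CONTINUE and the branch collapses */
	return COMPACT_SUCCESS;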

Link: https://lkml.kernel.org/r/20220418141253.24298-12-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS
Miaohe Lin [Fri, 29 Apr 2022 06:16:18 +0000 (23:16 -0700)]
mm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS

When compact_result indicates that the allocation should now succeed, i.e.
compact_result == COMPACT_SUCCESS, compaction_zonelist_suitable should
return false because there is no need to run compaction again; an
illustrative sketch follows.
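
An illustrative shape of the change inside the zone loop of
compaction_zonelist_suitable() (placement and details assumed, not the
literal diff):

	compact_result = __compaction_suitable(zone, order, alloc_flags,
					       ac->highest_zoneidx, available);
	if (compact_result == COMPACT_CONTINUE)
		return true;	/* compaction is still worth running */
	/* COMPACT_SUCCESS falls through: the allocation should already
	 * succeed, so this zone does not make compaction "suitable" */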

Link: https://lkml.kernel.org/r/20220418141253.24298-11-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: avoid possible NULL pointer dereference in kcompactd_cpu_online
Miaohe Lin [Fri, 29 Apr 2022 06:16:18 +0000 (23:16 -0700)]
mm: compaction: avoid possible NULL pointer dereference in kcompactd_cpu_online

It's possible that kcompactd_run() fails to start kcompactd for a hot-added
node, leaving pgdat->kcompactd NULL.  So pgdat->kcompactd should be checked
here before use to avoid a possible NULL pointer dereference; a sketch of
the guarded access follows.
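
A sketch of the guarded access, paraphrasing the loop in
kcompactd_cpu_online() (details assumed):

	for_each_node_state(nid, N_MEMORY) {
		pg_data_t *pgdat = NODE_DATA(nid);
		const struct cpumask *mask = cpumask_of_node(pgdat->node_id);

		/* kcompactd_run() may have failed for a hot-added node,
		 * leaving pgdat->kcompactd NULL */
		if (pgdat->kcompactd &&
		    cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
			set_cpus_allowed_ptr(pgdat->kcompactd, mask);
	}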

Link: https://lkml.kernel.org/r/20220418141253.24298-10-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: clean up comment about async compaction in isolate_migratepages
Miaohe Lin [Fri, 29 Apr 2022 06:16:18 +0000 (23:16 -0700)]
mm: compaction: clean up comment about async compaction in isolate_migratepages

Since commit 282722b0d258 ("mm, compaction: restrict async compaction to
pageblocks of same migratetype"), async direct compaction is restricted to
scanning pageblocks of the same migratetype.  Correct the comment
accordingly.

Link: https://lkml.kernel.org/r/20220418141253.24298-9-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: use helper compound_nr in isolate_migratepages_block
Miaohe Lin [Fri, 29 Apr 2022 06:16:18 +0000 (23:16 -0700)]
mm: compaction: use helper compound_nr in isolate_migratepages_block

Use the helper compound_nr() so that the cached compound page count is used
directly when CONFIG_64BIT is enabled, and simplify the code a bit; a
hypothetical before/after line follows.
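
A hypothetical before/after line to illustrate the change (the exact call
site in isolate_migratepages_block() is assumed):

	low_pfn += (1UL << compound_order(page)) - 1;	/* before (assumed) */
	low_pfn += compound_nr(page) - 1;		/* after: on 64-bit this
							 * reads the cached page
							 * count directly */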

Link: https://lkml.kernel.org/r/20220418141253.24298-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: use COMPACT_CLUSTER_MAX in compaction.c
Miaohe Lin [Fri, 29 Apr 2022 06:16:18 +0000 (23:16 -0700)]
mm: compaction: use COMPACT_CLUSTER_MAX in compaction.c

Always use COMPACT_CLUSTER_MAX in compaction.c, since this is compaction
code.  Minor readability improvement; a hypothetical before/after follows.
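
COMPACT_CLUSTER_MAX is simply defined as SWAP_CLUSTER_MAX, so the change is
about naming intent rather than behaviour; a hypothetical before/after
(call site assumed):

	/* before (assumed) */
	if (nr_isolated >= SWAP_CLUSTER_MAX)
		break;

	/* after: same value, but the name matches the compaction context */
	if (nr_isolated >= COMPACT_CLUSTER_MAX)
		break;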

Link: https://lkml.kernel.org/r/20220418141253.24298-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: clean up comment about suitable migration target recheck
Miaohe Lin [Fri, 29 Apr 2022 06:16:17 +0000 (23:16 -0700)]
mm: compaction: clean up comment about suitable migration target recheck

checked_pageblock is already removed and suitable_migration_target is not
rechecked under the zone lock since commit f8224aa5a0a4 ("mm, compaction:
do not recheck suitable_migration_target under lock").  Correct the
comment accordingly.

Link: https://lkml.kernel.org/r/20220418141253.24298-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: clean up comment for sched contention
Miaohe Lin [Fri, 29 Apr 2022 06:16:17 +0000 (23:16 -0700)]
mm: compaction: clean up comment for sched contention

Since commit cf66f0700c8f ("mm, compaction: do not consider a need to
reschedule as contention"), async compaction won't abort when scheduling
is needed.  Correct the relevant comment accordingly.

Link: https://lkml.kernel.org/r/20220418141253.24298-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: compaction: remove unneeded assignment to isolate_start_pfn
Miaohe Lin [Fri, 29 Apr 2022 06:16:17 +0000 (23:16 -0700)]
mm: compaction: remove unneeded assignment to isolate_start_pfn

isolate_start_pfn is unused when cc->nr_freepages != 0.  Otherwise
cc->free_pfn will overwrite it unconditionally.  So we should remove this
unneeded and somewhat misleading assignment.

Link: https://lkml.kernel.org/r/20220418141253.24298-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>