www.infradead.org Git - users/jedix/linux-maple.git/log

Liam R. Howlett [Fri, 13 May 2022 14:13:22 +0000 (10:13 -0400)]

mm/mmap: Fix potential leak on do_mas_align_munmap()

There is a leak when the system is low on resources and fails to
allocate enough memory to complete the munmap task. Fix this by adding
the necessary free operations in the unwinding.
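
The class of bug described above can be illustrated with a small userspace
sketch. This is not the kernel code: model_alloc(), model_free(),
model_two_step() and the live_allocs counter are hypothetical names used
only to show how an error path that forgets to unwind leaks an earlier
allocation.

```c
#include <stdlib.h>

/* Hypothetical model: count live allocations so a leak is observable. */
static int live_allocs;

static void *model_alloc(int fail)
{
	void *p;

	if (fail)
		return NULL;	/* simulate running low on memory */
	p = malloc(16);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/*
 * A two-step operation. If the second allocation fails, the unwind
 * path must release the first one; omitting that free is the leak.
 * Returns 0 on success, -1 on failure.
 */
static int model_two_step(int fail_second)
{
	void *first, *second;

	first = model_alloc(0);
	if (!first)
		return -1;

	second = model_alloc(fail_second);
	if (!second)
		goto free_first;

	/* ... work using both buffers would go here ... */

	model_free(second);
	model_free(first);
	return 0;

free_first:
	model_free(first);
	return -1;
}
```

After either a successful or a failed call, live_allocs should be back at
zero; dropping the model_free() under the free_first label leaves it at one.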

Fixes: a760774e7b7b (mm: start tracking VMAs with maple tree)
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 12 May 2022 17:53:29 +0000 (13:53 -0400)]

mm/mmap: Fix leak on expand_downwards() and expand_upwards()

A memory leak is possible in the race and error path in both
expand_downwards() and expand_upwards() due to the maple tree
preallocations. Fix these by always destroying the maple state.

Fixes: a760774e7b7b (mm: start tracking VMAs with maple tree)
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Wed, 11 May 2022 15:51:36 +0000 (11:51 -0400)]

mm/mmap: Remove case 1,6 from using vma_adjust()

Cases 1 and 6 can use vma_expand() as it is written today.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Wed, 11 May 2022 13:51:25 +0000 (09:51 -0400)]

test_maple_tree: Add null expansion tests

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Wed, 11 May 2022 14:43:59 +0000 (10:43 -0400)]

radix tree test suite: Add static-libasan

static-libasan removes the (new?) need to use LD_PRELOAD when running
the tests.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Wed, 11 May 2022 13:51:01 +0000 (09:51 -0400)]

maple_tree: Drop unnecessary code from mas_wr_extend_null()

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 10 May 2022 16:20:17 +0000 (12:20 -0400)]

vma_replace: Fix memory leak

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 10 May 2022 02:46:56 +0000 (22:46 -0400)]

mm/mremap: Remove vma_adjust() call from mremap.

vma_adjust() is called after vma_expandable() ensures the area is
completely empty. Since this is the trivial expansion case, the
vma_expand() function can be used instead.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 10 May 2022 02:41:23 +0000 (22:41 -0400)]

fs/exec: Remove vma_adjust() call and use mmap and maple tree calls.

Remove yet another user of vma_adjust().

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 10 May 2022 02:41:03 +0000 (22:41 -0400)]

mm/mmap.c: Fix vma_link() documentation typo

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 10 May 2022 02:37:55 +0000 (22:37 -0400)]

maple_tree: Fix null expand into ULONG_MAX causing incorrect metadata

When expanding a null write to ULONG_MAX, it may cause the metadata
calculation to be off by one. Fix this issue by detecting the offset
with write maple state end_piv instead of reading the node data.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 5 May 2022 20:51:19 +0000 (13:51 -0700)]

mm/mmap: Introduce sub_vma() and use it in __split_vma()

sub_vma() creates a vma that covers a subset of the addresses covered by
the larger vma. Use this new function in __split_vma() and in later
commits of the series.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 28 Apr 2022 16:18:54 +0000 (12:18 -0400)]

vma_replace fix

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 12 Apr 2022 15:12:06 +0000 (11:12 -0400)]

mm: Replace vma on split_vma() calls.

When splitting a VMA, create two new VMAs to replace both parts of the
VMA. Change the callers to pass in a pointer and update the pointer to
the new VMA based on the value of new_below.

do_mas_align_munmap() needed to update a local variable in the case of
splitting the end and only having one VMA to split.

mprotect_fixup() needed to change where it set the previous pointer to
after the split to avoid a use-after-free scenario.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Wed, 30 Mar 2022 17:35:49 +0000 (13:35 -0400)]

mm: Change munmap splitting order and move_vma()

Splitting can be more efficient when done in the reverse order to
minimize VMA walking. Change do_mas_align_munmap() to reduce walking of
the tree during split operations.

move_vma() must also be altered to remove the dependency of keeping the
original VMA as the active part of the split. Look up the new VMA or
two if necessary.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 22 Mar 2022 16:38:04 +0000 (12:38 -0400)]

maple_tree: change mas_split_final_node() logic a bit

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Tue, 22 Mar 2022 16:37:31 +0000 (12:37 -0400)]

maple_tree: Reduce mas_split() memory usage

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]

mm/mmap.c: pass in mapping to __vma_link_file()

__vma_link_file() resolves the mapping from the file, if there is one.
Pass in the mapping and check vm_file in the callers instead, since most
call sites already have the mapping and already check vm_file.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]

mm/mmap: drop range_has_overlap() function

Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.
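
A minimal userspace sketch of the overlap test that both functions reduce
to, assuming VMAs are modelled as half-open ranges [start, end) in a sorted
array. struct range and model_find_intersection() are hypothetical names,
not kernel interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the first range that overlaps [start, end), or NULL if the
 * whole interval is free. Two half-open intervals overlap when each
 * one starts before the other ends.
 */
static const struct range *
model_find_intersection(const struct range *r, size_t n,
			unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].start < end && start < r[i].end)
			return &r[i];
	return NULL;
}
```

A NULL result answers the "has overlap" question with no, and any non-NULL
result answers it with yes, which is why a separate helper is redundant.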

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]

mm: remove the vma linked list

Replace any vm_next use with vma_find().

Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.

Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap(). At
the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new tree to hold the
vmas to remove.

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list.

Drop linked list update from __insert_vm_struct().

Rework validation of tree as it was depending on the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]

riscv: use vma iterator for vdso

Remove the linked list use in favour of the vma iterator.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:24 +0000 (23:07 -0700)]

nommu: remove uses of VMA linked list

Use the maple tree or VMA iterator instead. This is faster and will allow
us to shrink the VMA.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]

i915: use the VMA iterator

Replace the linked list in probe_range() with the VMA iterator.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]

mm/swapfile: use vma iterator instead of vma linked list

unuse_mm() no longer needs to reference the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]

mm/pagewalk: use vma_find() instead of vma linked list

walk_page_range() no longer uses the one vma linked list reference.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]

mm/oom_kill: use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:23 +0000 (23:07 -0700)]

mm/msync: use vma_find() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]

mm/mremap: use vma_find_intersection() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]

mm/mprotect: use maple tree navigation instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]

mm/mlock: use vma iterator and maple state instead of vma linked list

Handle overflow checking in count_mm_mlocked_page_nr() differently.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]

mm/mempolicy: use vma iterator & maple state instead of vma linked list

Reworked the way mbind_range() finds the first VMA to reuse the maple
state and limit the number of tree walks needed.

Note, this drops the VM_BUG_ON(!vma) call, which would catch a start
address higher than the last VMA. The code was written in a way that
allowed no VMA updates to occur and still return success. There should be
no functional change to this scenario with the new code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:22 +0000 (23:07 -0700)]

mm/memcontrol: stop using mm->highest_vm_end

Pass through ULONG_MAX instead.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

mm/madvise: use vma_find() instead of vma linked list

madvise_walk_vmas() no longer uses linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

mm/ksm: use vma iterators instead of vma linked list

Remove the use of the linked list for eventual removal.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

mm/khugepaged: stop using vma linked list

Use vma iterator & find_vma() instead of vma linked list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

mm/gup: use maple tree navigation instead of linked list

Use find_vma_intersection() to locate the VMAs in __mm_populate() instead
of using find_vma() and the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

bpf: remove VMA linked list

Use vma_next() and remove the reference to the start of the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:21 +0000 (23:07 -0700)]

fork: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]

sched: use maple tree iterator to walk VMAs

The linked list is slower than walking the VMAs using the maple tree. We
can't use the VMA iterator here because it doesn't support moving to an
earlier position.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]

perf: use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]

acct: use VMA iterator instead of linked list

The VMA iterator is faster than the linked list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]

ipc/shm: use VMA iterator instead of linked list

The VMA iterator is faster than the linked list, and it can be walked
even when VMAs are being removed from the address space, so there's no
need to keep track of 'next'.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:20 +0000 (23:07 -0700)]

userfaultfd: use maple tree iterator to iterate VMAs

Don't use the mm_struct linked list or the vma->vm_next in prep for
removal.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]

fs/proc/task_mmu: stop using linked list and highest_vm_end

Remove references to the mm_struct linked list and highest_vm_end in
preparation for their removal.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]

fs/proc/base: use maple tree iterators in place of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]

exec: use VMA iterator instead of linked list

Remove a use of the vm_next list by doing the initial lookup with the VMA
iterator and then using it to find the next entry.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]

coredump: remove vma linked list walk

Use the Maple Tree iterator instead. This is too complicated for the VMA
iterator to handle, so let's open-code it for now. If this turns out to
be a common pattern, we can migrate it to common code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:19 +0000 (23:07 -0700)]

um: remove vma linked list walk

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]

optee: remove vma linked list walk

Use the VMA iterator instead. Change the calling convention of
__check_mem_type() to pass in the mm instead of the first vma in the
range.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]

cxl: remove vma linked list walk

Use the VMA iterator instead. This requires a little restructuring of the
surrounding code to hoist the mm to the caller. That turns
cxl_prefault_one() into a trivial function, so call cxl_fault_segment()
directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]

xtensa: remove vma linked list walks

Use the VMA iterator instead. Since the VMA can no longer be NULL in the
loop, deal with out-of-memory outside the loop. This means a slightly
longer run time in the failure case (-ENOMEM): it will run to the end of
the VMAs before erroring instead of stopping in the middle of the loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]

x86: remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:18 +0000 (23:07 -0700)]

s390: remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]

powerpc: remove mmap linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]

parisc: remove mmap linked list from cache handling

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Fri, 18 Feb 2022 02:37:04 +0000 (02:37 +0000)]

arm64: Change elfcore for_each_mte_vma() to use VMA iterator

Rework for_each_mte_vma() to use a VMA iterator instead of an explicit
linked-list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220218023650.672072-1-Liam.Howlett@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]

arm64: remove mmap linked list from vdso

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]

mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()

do_brk_munmap() has already aligned the address and has a maple tree state
to be used. Use the new do_mas_align_munmap() to avoid unnecessary
alignment and error checks.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:17 +0000 (23:07 -0700)]

mm/mmap: reorganize munmap to use maple states

Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().

do_munmap() is a wrapper to create a maple state for any callers that have
not been converted to the maple tree.

do_mas_munmap() takes a maple state to munmap a range.  This is just a
small function which checks for error conditions and aligns the end of the
range.

do_mas_align_munmap() uses the aligned range to munmap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range.  Both start and end are split if necessary.
Then the VMAs are removed from the linked list and the mm mlock count is
updated at the same time.  This is followed by a single tree operation
that overwrites the area with NULL.  Finally, the detached list is
unmapped and freed.

By reorganizing the munmap calls as outlined, it is now possible to avoid
the extra work of aligning pre-aligned callers which are known to be safe,
and to avoid extra VMA lookups or tree walks for modifications.

detach_vmas_to_be_unmapped() is no longer used, so drop this code.

vm_brk_flags() can just call the do_mas_munmap() as it checks for
intersecting VMAs directly.
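
The layering can be sketched in userspace as three small functions. This
is only a model under stated assumptions: struct model_state stands in for
the maple state, MODEL_PAGE_SIZE and MODEL_TASK_SIZE are made-up limits,
and the model_*() functions do not have the signatures of the kernel
functions they are named after.

```c
#include <errno.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))
#define MODEL_TASK_SIZE (~0UL >> 1)	/* hypothetical address-space limit */

/* Hypothetical stand-in for the maple state carried between layers. */
struct model_state {
	unsigned long start;
	unsigned long end;
};

/* Records the range the innermost layer was asked to unmap. */
static struct model_state last_unmapped;

/* Innermost layer: trusts that [start, end) is already page aligned. */
static int model_align_munmap(struct model_state *st)
{
	last_unmapped = *st;
	return 0;
}

/* Middle layer: checks for error conditions and aligns the end. */
static int model_mas_munmap(struct model_state *st,
			    unsigned long start, unsigned long len)
{
	unsigned long end;

	if ((start & ~MODEL_PAGE_MASK) || start > MODEL_TASK_SIZE ||
	    len > MODEL_TASK_SIZE - start)
		return -EINVAL;

	end = start + ((len + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK);
	if (end == start)
		return -EINVAL;

	st->start = start;
	st->end = end;
	return model_align_munmap(st);
}

/* Outer wrapper: creates the state for callers that do not have one. */
static int model_munmap(unsigned long start, unsigned long len)
{
	struct model_state st = { 0, 0 };

	return model_mas_munmap(&st, start, len);
}
```

A caller that already holds a state and an aligned range can enter at the
innermost layer and skip the checks, which is the saving described above.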

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]

mm/mmap: move mmap_region() below do_munmap()

Relocation of code for the next commit. There should be no changes here.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]

mm: convert vma_lookup() to use mtree_load()

Unlike the rbtree, the Maple Tree will return a NULL if there's nothing at
a particular address.

Since the previous commit dropped the vmacache, it is now possible to
consult the tree directly.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]

mm: remove vmacache

By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code. Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]

mm/mmap: use advanced maple tree API for mmap_region()

Changing mmap_region() to use the maple tree state and the advanced maple
tree interface allows for a lot less tree walking.

This change removes the last caller of munmap_vma_range(), so drop this
unused function.

Add vma_expand() to expand a VMA if possible by doing the necessary
hugepage check, uprobe_munmap of files, dcache flush, modifications then
undoing the detaches, etc.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:16 +0000 (23:07 -0700)]

mm: use maple tree operations for find_vma_intersection()

Move find_vma_intersection() to mmap.c and change implementation to maple
tree.

When searching for a vma within a range, it is easier to use the maple
tree interface.

Exported find_vma_intersection() for kvm module.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]

mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()

Avoid allocating a new VMA when a VMA modification can occur. When a
brk() can expand or contract a VMA, then the single store operation will
only modify one index of the maple tree instead of causing a node to split
or coalesce. This avoids unnecessary allocations/frees of maple tree
nodes and VMAs.

Move some limit & flag verifications out of the do_brk_flags() function to
use only relevant checks in the code path of brk() and vm_brk_flags().

Set the vma to check if it can expand in vm_brk_flags() if extra criteria
are met.

Drop userfaultfd from do_brk_flags() path and only use it in
vm_brk_flags() path since that is the only place a munmap will happen.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]

mm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()

vma_lookup() will walk the vma tree once and not continue to look for the
next vma. Since the exact vma is checked below, this is a more optimal
way of searching.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]

mm: optimize find_exact_vma() to use vma_lookup()

Use vma_lookup() to walk the tree to the start value requested. If the
vma at the start does not match, then the answer is NULL and there is no
need to look at the next vma the way that find_vma() would.
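
The difference in semantics can be shown with a userspace sketch over a
sorted array of half-open ranges. struct range and the model_*() helpers
are hypothetical; they only mirror the behaviour described here (a find
that continues to the next VMA, a lookup that does not) and say nothing
about how the tree is actually walked.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/* find_vma()-like: first range that ends above addr, or NULL. */
static const struct range *
model_find(const struct range *r, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].end > addr)
			return &r[i];
	return NULL;
}

/* vma_lookup()-like: the range containing addr, or NULL if none. */
static const struct range *
model_lookup(const struct range *r, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n && r[i].start <= addr; i++)
		if (addr < r[i].end)
			return &r[i];
	return NULL;
}

/* Exact match: one lookup at start, then compare both boundaries. */
static const struct range *
model_find_exact(const struct range *r, size_t n,
		 unsigned long start, unsigned long end)
{
	const struct range *hit = model_lookup(r, n, start);

	if (hit && hit->start == start && hit->end == end)
		return hit;
	return NULL;
}
```

For an address that falls in a gap, model_find() returns the following
range while model_lookup() returns NULL, so the exact-match helper never
needs to inspect a neighbouring entry.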

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]

xen: use vma_lookup() in privcmd_ioctl_mmap()

vma_lookup() walks the VMA tree for a specific value, whereas find_vma()
will search the tree after walking to a specific value. It is more
efficient to only walk to the requested value since privcmd_ioctl_mmap()
will exit the loop if vm_start != msg->va.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:15 +0000 (23:07 -0700)]

mmap: change zeroing of maple tree in __vma_adjust()

Only write to the maple tree if we are not inserting or the insert isn't
going to overwrite the area to clear. This avoids spanning writes and
node coalescing when unnecessary.

The change requires a custom search for the linked list addition to find
the correct VMA for the prev link.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]

mm: remove rb tree.

Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
lock is not held.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]

proc: remove VMA rbtree use from nommu

These users of the rbtree should probably have been walks of the linked
list, but convert them to use walks of the maple tree.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Andrew Morton [Wed, 11 May 2022 00:47:12 +0000 (17:47 -0700)]

damon-convert-__damon_va_three_regions-to-use-the-vma-iterator-fix

vaddr-test.h no longer needs internal.h

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]

damon: Convert __damon_va_three_regions to use the VMA iterator

This rather specialised walk can use the VMA iterator. If this proves to
be too slow, we can write a custom routine to find the two largest gaps,
but it will be somewhat complicated, so let's see if we need it first.
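
As a rough illustration of the walk, the following userspace sketch makes
a single pass over sorted, non-overlapping ranges, remembers the two
largest gaps, and derives three regions from them. struct range and
model_three_regions() are hypothetical, and details of the real routine
such as region alignment are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Walk sorted, non-overlapping ranges once and track the two largest
 * gaps between neighbours. The three output regions are what remains
 * of [first start, last end) after cutting out those two gaps.
 * Returns 0 on success, -1 if there are fewer than two non-empty gaps.
 */
static int model_three_regions(const struct range *vmas, size_t n,
			       struct range out[3])
{
	struct range gap1 = { 0, 0 }, gap2 = { 0, 0 };	/* gap1 is largest */

	if (n == 0)
		return -1;

	for (size_t i = 1; i < n; i++) {
		struct range gap = { vmas[i - 1].end, vmas[i].start };
		unsigned long sz = gap.end - gap.start;

		if (sz > gap1.end - gap1.start) {
			gap2 = gap1;
			gap1 = gap;
		} else if (sz > gap2.end - gap2.start) {
			gap2 = gap;
		}
	}

	if (gap2.end == gap2.start)
		return -1;

	/* Order the two gaps by address before cutting. */
	if (gap1.start > gap2.start) {
		struct range tmp = gap1;

		gap1 = gap2;
		gap2 = tmp;
	}

	out[0] = (struct range){ vmas[0].start, gap1.start };
	out[1] = (struct range){ gap1.end, gap2.start };
	out[2] = (struct range){ gap2.end, vmas[n - 1].end };
	return 0;
}
```

The cost is one visit per VMA, which is the trade-off mentioned above: a
routine that finds the largest gaps directly could visit fewer entries,
but it would be more complicated.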

Update the kunit test case to use the maple tree. This also fixes an
issue with the kunit testcase not adding the last VMA to the list.

Fixes: 17ccae8bb5c9 (mm/damon: add kunit tests)
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: SeongJae Park <sj@kernel.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]

kernel/fork: use maple tree for dup_mmap() during forking

The maple tree was already tracking VMAs in this function by an earlier
commit, but the rbtree iterator was being used to iterate the list.
Change the iterator to use a maple tree native iterator and switch to the
maple tree advanced API to avoid multiple walks of the tree during insert
operations.  Unexport the now-unused vma_store() function.

For performance reasons we bulk allocate the maple tree nodes.  The node
calculations are done internally to the tree and use the VMA count and
assume the worst-case node requirements.  The VM_DONT_COPY flag does not
allow for the most efficient copy method of the tree and so a bulk loading
algorithm is used.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:14 +0000 (23:07 -0700)]

mm/mmap: use maple tree for unmapped_area{_topdown}

The maple tree code was added to find the unmapped area in a previous
commit and was checked against what the rbtree returned, but the actual
result was never used. Start using the maple tree implementation and
remove the rbtree code.

Add kernel documentation comment for these functions.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]

mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree

Use the maple tree's advanced API and a maple state to walk the tree for
the entry at the address of the next vma, then use the maple state to walk
back one entry to find the previous entry.
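
A userspace sketch of the same idea on a sorted array: locate the first
entry that ends above the address, then step back one slot for the
previous entry. struct range and model_find_prev() are hypothetical names,
and the array index stands in for the maple state.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the first range that ends above addr (or NULL), and store the
 * entry just before it in *pprev (or NULL if there is none). When addr
 * lies above every range, *pprev is the last range.
 */
static const struct range *
model_find_prev(const struct range *r, size_t n, unsigned long addr,
		const struct range **pprev)
{
	size_t i = 0;

	/* Walk forward to the entry at or after addr. */
	while (i < n && r[i].end <= addr)
		i++;

	/* Step back one entry from the position the walk reached. */
	*pprev = i > 0 ? &r[i - 1] : NULL;
	return i < n ? &r[i] : NULL;
}
```

Both results come from one walk plus one step, rather than two independent
searches.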

Add kernel documentation comments for this API.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]

mm/mmap: use the maple tree in find_vma() instead of the rbtree.

Using the maple tree interface mt_find() will handle the RCU locking and
will start searching at the address up to the limit, ULONG_MAX in this
case.

Add kernel documentation to this API.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]

mmap: use the VMA iterator in count_vma_pages_range()

This simplifies the implementation and is faster than using the linked
list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]

mm: add VMA iterator

This thin layer of abstraction over the maple tree state is for iterating
over VMAs. You can go forwards, go backwards or ask where the iterator
is. Rename the existing vma_next() to __vma_next() -- it will be removed
by the end of this series.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Andrew Morton [Wed, 11 May 2022 00:47:10 +0000 (17:47 -0700)]

mapletree: build fix

Fix the vma_mas_store/vma_mas_remove issues. Missing prototypes, missing
implementation on nommu.

Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:13 +0000 (23:07 -0700)]

mm: start tracking VMAs with maple tree

Start tracking the VMAs with the new maple tree structure in parallel with
the rb_tree. Add debug and trace events for maple tree operations and
duplicate the rb_tree that is created on forks into the maple tree.

The maple tree is added to the mm_struct (including the mm_init struct),
supported in the required mm/mmap functions, and tracked in kernel/fork
for process forking. It is also used to find the unmapped_area, and the
result is checked against what the rbtree finds.

This also moves the mmap_lock() in exit_mmap() since the oom reaper call
does walk the VMAs. Otherwise lockdep will be unhappy if oom happens.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]

lib/test_maple_tree: add testing for maple tree

This is a test suite that uses the radix test infrastructure. It has
been split into its own commit to allow for easier review of the maple
tree code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]

radix tree test suite: add lockdep_is_held to header

maple tree uses lockdep_is_held, so define it as external in the header.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]

radix tree test suite: add support for slab bulk APIs

Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the
radix tree test suite.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]

radix tree test suite: add allocation counts and size to kmem_cache

Add functions to get the number of allocations, and total allocations from
a kmem_cache. Also add a function to get the allocated size and a way to
zero the total allocations.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]

radix tree test suite: add kmem_cache_set_non_kernel()

kmem_cache_set_non_kernel() is a mechanism to allow a certain number of
kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in
the flags. This functionality allows for testing different paths through
the code.
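The mechanism can be sketched in userspace as an allocation budget. This is a minimal illustration, not the test suite's code: `struct toy_cache`, `TOY_GFP_KERNEL`, `toy_cache_set_non_kernel()` and `toy_cache_alloc()` are hypothetical names, and malloc() stands in for the real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_GFP_KERNEL 0x1u	/* hypothetical flag bit for this sketch */

struct toy_cache {
	size_t object_size;
	unsigned int non_kernel;	/* non-GFP_KERNEL allocations still allowed */
};

/* Allow the next `count` allocations without TOY_GFP_KERNEL to succeed. */
static void toy_cache_set_non_kernel(struct toy_cache *cache, unsigned int count)
{
	assert(cache != NULL);
	cache->non_kernel = count;
}

/* Allocate one object; requests without TOY_GFP_KERNEL consume the budget. */
static void *toy_cache_alloc(struct toy_cache *cache, unsigned int gfp)
{
	assert(cache != NULL);

	if (!(gfp & TOY_GFP_KERNEL)) {
		if (cache->non_kernel == 0)
			return NULL;	/* budget exhausted: simulate failure */
		cache->non_kernel--;
	}
	return malloc(cache->object_size);
}
```

A test can set the budget to N, issue N + 1 allocations without the flag, and then check that the caller's error path handles the final NULL correctly.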

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:11 +0000 (23:07 -0700)]

radix tree test suite: add pr_err define

Patch series "Introducing the Maple Tree".

This patch (of 70):

define pr_err to printk

Link: https://lkml.kernel.org/r/20220404143501.2016403-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]

Maple Tree: add new data structure

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel that a
non-overlapping range-based tree would be beneficial, especially one with a
simple interface.  If you use an rbtree with other data structures to improve
performance or an interval tree to track non-overlapping ranges, then this is
for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes.
With the increased branching factor, it is significantly shorter than the
rbtree so it has fewer cache misses.  The removal of the linked list between
subsequent entries also reduces the cache misses and the need to pull in the
previous and next VMA during many tree alterations.
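The height difference can be estimated with a little arithmetic. The sketch below assumes completely full nodes (16 entries per leaf, 10 children per internal node) and compares against a perfectly balanced binary tree, so both figures are best-case lower bounds; real trees are taller. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest height whose capacity covers `entries`, assuming every node
 * is full: one leaf holds 16 entries, each extra level multiplies the
 * capacity by 10.
 */
static unsigned int maple_min_height(uint32_t entries)
{
	uint64_t capacity = 16;	/* a single leaf node */
	unsigned int height = 1;

	assert(entries > 0);
	while (capacity < entries) {
		capacity *= 10;
		height++;
	}
	return height;
}

/*
 * Height of a perfectly balanced binary tree with `entries` nodes:
 * the smallest h such that 2^h - 1 >= entries.
 */
static unsigned int binary_min_height(uint32_t entries)
{
	unsigned int height = 0;

	assert(entries > 0);
	while (entries > 0) {
		entries >>= 1;
		height++;
	}
	return height;
}
```

Under these assumptions, 1000 entries need at least 3 levels with the stated branching factors and at least 10 levels in a binary tree, so a lookup touches far fewer nodes.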

The first user that is covered in this patch set is the vm_area_struct, where
three data structures are replaced by the maple tree: the augmented rbtree, the
vma cache, and the linked list of VMAs in the mm_struct.  The long term goal is
to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be allowed
at a time.  A reader re-walks if stale data is encountered. VMAs would be RCU
enabled and this mode would be entered once multiple tasks are using the
mm_struct.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

Liam R. Howlett [Thu, 14 Apr 2022 06:07:12 +0000 (23:07 -0700)]

mips: rename mt_init to mips_mt_init

Move mt_init out of the way for the maple tree. Use mips_mt prefix to
match the rest of the functions in the file.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

commit | commitdiff | tree

Hailong Tu [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]

mm/damon/reclaim: fix the timer always stays active

The timer stays active even if the reclaim mechanism is never enabled.
This unnecessary overhead can be completely avoided by using
module_param_cb() for the enabled flag.

Link: https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@gmail.com
Signed-off-by: Hailong Tu <tuhailong@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Yu Zhe [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]

mm/damon: remove unnecessary type castings

Remove unnecessary void* type castings.

Link: https://lkml.kernel.org/r/20220421153056.8474-1-yuzhe@nfschina.com
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: liqiong <liqiong@nfschina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

SeongJae Park [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]

mm/damon/core-test: add a kunit test case for ops registration

This commit adds a simple kunit test case for DAMON operations
registration feature.

Link: https://lkml.kernel.org/r/20220419122225.290518-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Xiaomeng Tong [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]

damon: vaddr-test: tweak code to make the logic clearer

Move these two lines into the damon_for_each_region loop, since they
always test the last region. This also avoids using the list iterator 'r'
outside the loop, which is considered harmful[1].

[1]: https://lkml.org/lkml/2022/2/17/1032

Link: https://lkml.kernel.org/r/20220328115252.31675-1-xiam0nd.tong@gmail.com
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Yosry Ahmed [Fri, 29 Apr 2022 21:37:00 +0000 (14:37 -0700)]

selftests: cgroup: add a selftest for memory.reclaim

Add a new test for memory.reclaim that verifies that the interface
correctly reclaims memory as intended, from both anon and file pages.

Link: https://lkml.kernel.org/r/20220425190040.2475377-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Yosry Ahmed [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]

selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory

Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees
the allocated memory. alloc_anon_noexit() is usually used with
cg_run_nowait() to run a process in the background that allocates
memory. It makes sense for the background process to keep the memory
allocated and not instantly free it (otherwise there is no point of
running it in the background).

Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Yosry Ahmed [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]

selftests: cgroup: return -errno from cg_read()/cg_write() on failure

Currently, cg_read()/cg_write() returns 0 on success and -1 on failure.
Modify them to return the -errno on failure.
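The return convention can be illustrated with a standalone helper. This is a sketch, not the selftest's code: `read_text_file()` is a hypothetical name, and the only point shown is returning 0 on success and the negated errno on failure so that callers can tell failure causes apart.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len - 1 bytes of `path` into `buf`
 * and NUL-terminate it. Returns 0 on success, -errno on failure.
 */
static int read_text_file(const char *path, char *buf, size_t len)
{
	ssize_t n;
	int fd, err;

	assert(path != NULL && buf != NULL && len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, buf, len - 1);
	err = errno;	/* save it before close() can overwrite errno */
	close(fd);
	if (n < 0)
		return -err;

	buf[n] = '\0';
	return 0;
}
```

With this convention a caller can, for example, compare the result against -ENOENT to distinguish a missing control file from other failures, which a bare -1 cannot express.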

Link: https://lkml.kernel.org/r/20220425190040.2475377-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shakeel Butt [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]

memcg: introduce per-memcg reclaim interface

This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.

This patch (of 4):

Introduce a memcg interface to trigger memory reclaim on a memory cgroup.

Use case: Proactive Reclaim
---------------------------

A userspace proactive reclaimer can continuously probe the memcg to
reclaim a small amount of memory.  This gives more accurate and up-to-date
workingset estimation as the LRUs are continuously sorted and can
potentially provide more deterministic memory overcommit behavior.  The
memory overcommit controller can provide more proactive response to the
changing behavior of the running applications instead of being reactive.

A userspace reclaimer's purpose in this case is not to completely replace
kswapd or direct reclaim. It is to proactively identify memory savings
opportunities and reclaim some amount of cold pages set by the policy to
free up the memory for more demanding jobs or scheduling new jobs.

A user space proactive reclaimer is used in Google data centers.
Additionally, Meta's TMO paper recently referenced a very similar
interface used for user space proactive reclaim:
https://dl.acm.org/doi/pdf/10.1145/3503222.3507731

Benefits of a user space reclaimer:
-----------------------------------

1) More flexible on who should be charged for the cpu of the memory
   reclaim.  For proactive reclaim, it makes more sense to be centralized.

2) More flexible on dedicating the resources (like cpu).  The memory
   overcommit controller can balance the cost between the cpu usage and
   the memory reclaimed.

3) Provides a way for applications to keep their LRUs sorted, so that
   better reclaim candidates are selected under memory pressure.  This
   also gives a more accurate and up-to-date notion of the working set
   for an application.

Why memory.high is not enough?
------------------------------

- memory.high can be used to trigger reclaim in a memcg and can
  potentially be used for proactive reclaim.  However there is a big
  downside in using memory.high.  It can potentially introduce high
  reclaim stalls in the target application as the allocations from the
  processes or the threads of the application can hit the temporary
  memory.high limit.

- Userspace proactive reclaimers usually use feedback loops to decide
  how much memory to proactively reclaim from a workload.  The metrics
  used for this are usually either refaults or PSI, and these metrics will
  become messy if the application gets throttled by hitting the high
  limit.

- memory.high is a stateful interface, if the userspace proactive
  reclaimer crashes for any reason while triggering reclaim it can leave
  the application in a bad state.

- If a workload is rapidly expanding, setting memory.high to proactively
  reclaim memory can result in actually reclaiming more memory than
  intended.

The benefits of such an interface and the shortcomings of the existing
interfaces were further discussed in this RFC thread:
https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/

Interface:
----------

Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to
trigger reclaim in the target memory cgroup.

The interface is introduced as a nested-keyed file to allow for future
optional arguments to be easily added to configure the behavior of
reclaim.
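From a program, the shell command above amounts to a single write to the control file. The following is a minimal sketch: `request_reclaim()` is a hypothetical helper, and the caller supplies the path to the cgroup's memory.reclaim file, which depends on where cgroup v2 is mounted and on the cgroup's name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `amount` (for example "10M") to the given
 * memory.reclaim file. Returns 0 on success, -errno on failure.
 */
static int request_reclaim(const char *reclaim_file, const char *amount)
{
	size_t len;
	ssize_t written;
	int fd, err;

	assert(reclaim_file != NULL && amount != NULL);

	fd = open(reclaim_file, O_WRONLY);
	if (fd < 0)
		return -errno;

	len = strlen(amount);
	written = write(fd, amount, len);
	err = errno;	/* save it before close() can overwrite errno */
	close(fd);

	if (written < 0)
		return -err;
	/* Treat a short write as an I/O error in this sketch. */
	return (size_t)written == len ? 0 : -EIO;
}
```

A proactive reclaimer would call this in its feedback loop, for instance `request_reclaim("/sys/fs/cgroup/workload/memory.reclaim", "10M")`, where the `workload` cgroup path is an assumed example.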

Possible Extensions:
--------------------

- This interface can be extended with an additional parameter or flags
  to allow specifying one or more types of memory to reclaim from (e.g.
  file, anon, ..).

- The interface can also be extended with a node mask to reclaim from
  specific nodes. This has use cases for reclaim-based demotion in memory
  tiering systems.

- A similar per-node interface can also be added to support proactive
  reclaim and reclaim-based demotion in systems without memcg.

- Add a timeout parameter to make it easier for user space to call the
  interface without worrying about being blocked for an undefined amount
  of time.

For now, let's keep things simple by adding the basic functionality.

[yosryahmed@google.com: worked on versions v2 onwards, refreshed to
current master, updated commit message based on recent
discussions and use cases]
Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Wei Xu <weixugc@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Brian Geffon [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]

zram: add a huge_idle writeback mode

Today it's only possible to write back as a page, idle, or huge.  A user
might want to writeback pages which are huge and idle first as these idle
pages do not require decompression and make a good first pass for
writeback.

Idle writeback specifically has the advantage that a refault is unlikely
given that the page has been swapped for some amount of time without being
refaulted.

Huge writeback has the advantage that you're guaranteed to get the maximum
benefit from a single page writeback, that is, you're reclaiming one full
page of memory.  Pages which are compressed in zram being written back
result in some benefit which is always less than a page size because of
the fact that it was compressed.

The primary use of this is for minimizing refaults in situations where the
device has to be sensitive to storage endurance.  On ChromeOS we have
devices with slow eMMC and repeated writes and refaults can negatively
affect performance and endurance.

Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Chen Wandun [Fri, 29 Apr 2022 21:36:59 +0000 (14:36 -0700)]

mm/page_alloc: simplify update of pgdat in wake_all_kswapds

There is no need to update last_pgdat for each zone, only update
last_pgdat when iterating the first zone of a node.
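The pattern can be modelled in userspace as follows. This is a sketch with hypothetical names (`struct toy_zone`, `wake_nodes_once()`), not the kernel function: zones arrive grouped by node, the per-node action runs once per group, and the "last seen" marker is written only when the node changes. Node ids are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: only its node id matters here. */
struct toy_zone {
	int node_id;
};

/*
 * Walk zones that are grouped by node and count one wakeup per node.
 * The last-seen marker is updated only on the first zone of each node.
 */
static unsigned int wake_nodes_once(const struct toy_zone *zones, size_t count)
{
	unsigned int wakeups = 0;
	int last_node = -1;	/* no node seen yet */
	size_t i;

	assert(zones != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (zones[i].node_id == last_node)
			continue;	/* same node as before: nothing to do */
		wakeups++;		/* stands in for waking this node's kswapd */
		last_node = zones[i].node_id;
	}
	return wakeups;
}
```

Updating the marker inside the branch gives the same result as updating it on every iteration, with one store per node instead of one per zone.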

Link: https://lkml.kernel.org/r/20220322115635.2708989-1-chenwandun@huawei.com
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrey Konovalov [Fri, 29 Apr 2022 21:36:58 +0000 (14:36 -0700)]

kasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t

Fix sparse warning:

mm/kasan/shadow.c:496:15: warning: restricted kasan_vmalloc_flags_t degrades to integer

Link: https://lkml.kernel.org/r/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
