]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
3 years agomm/mmap: Drop range_has_overlap() function maple_v5.16-rc2
Liam R. Howlett [Fri, 18 Jun 2021 19:06:12 +0000 (15:06 -0400)]
mm/mmap: Drop range_has_overlap() function

Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.  There is only one
place that actually needs the previous vma, so just use vma_prev() in
that one case.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: Remove the vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:10:54 +0000 (15:10 -0500)]
mm: Remove the vma linked list

Replace any vm_next use with vma_find().

Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.

Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap().
At the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new table to hold
the lock

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list

Rework validation of tree as it was depending on the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agonommu: Remove uses of VMA linked list
Matthew Wilcox (Oracle) [Fri, 29 Oct 2021 14:18:13 +0000 (10:18 -0400)]
nommu: Remove uses of VMA linked list

Use the maple tree or VMA iterator instead.  This is faster and will
allow us to shrink the VMA.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoi915: Use the VMA iterator
Matthew Wilcox (Oracle) [Sat, 30 Oct 2021 03:46:58 +0000 (23:46 -0400)]
i915: Use the VMA iterator

Replace the O(n.log(n)) loop with an O(n) loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm/swapfile: Use maple tree iterator instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:07:11 +0000 (15:07 -0500)]
mm/swapfile: Use maple tree iterator instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/pagewalk: Use vma_find() instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:06:47 +0000 (15:06 -0500)]
mm/pagewalk: Use vma_find() instead of vma linked list

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/oom_kill: Use maple tree iterators instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:06:25 +0000 (15:06 -0500)]
mm/oom_kill: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/msync: Use vma_find() instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:04:14 +0000 (15:04 -0500)]
mm/msync: Use vma_find() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mremap: Use vma_find() instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:03:49 +0000 (15:03 -0500)]
mm/mremap: Use vma_find() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mprotect: Use maple tree navigation instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:02:28 +0000 (15:02 -0500)]
mm/mprotect: Use maple tree navigation instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mlock: Use maple tree iterators instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 20:00:14 +0000 (15:00 -0500)]
mm/mlock: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mempolicy: Use maple tree iterators instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:59:52 +0000 (14:59 -0500)]
mm/mempolicy: Use maple tree iterators instead of vma linked list

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/memcontrol: Stop using mm->highest_vm_end
Liam R. Howlett [Mon, 4 Jan 2021 19:59:05 +0000 (14:59 -0500)]
mm/memcontrol: Stop using mm->highest_vm_end

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/madvise: Use vma_find() instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:58:35 +0000 (14:58 -0500)]
mm/madvise: Use vma_find() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/ksm: Use maple tree iterators instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:58:11 +0000 (14:58 -0500)]
mm/ksm: Use maple tree iterators instead of vma linked list

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/khugepaged: Use maple tree iterators instead of vma linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:56:58 +0000 (14:56 -0500)]
mm/khugepaged: Use maple tree iterators instead of vma linked list

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/gup: Use maple tree navigation instead of linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:56:20 +0000 (14:56 -0500)]
mm/gup: Use maple tree navigation instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agobpf: Remove VMA linked list
Liam R. Howlett [Wed, 7 Apr 2021 19:56:08 +0000 (15:56 -0400)]
bpf: Remove VMA linked list

Use vma_next() and remove reference to the start of the linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofork: Use VMA iterator
Liam R. Howlett [Mon, 4 Jan 2021 19:54:49 +0000 (14:54 -0500)]
fork: Use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agosched: Use maple tree iterator to walk VMAs
Liam R. Howlett [Mon, 4 Jan 2021 19:54:07 +0000 (14:54 -0500)]
sched: Use maple tree iterator to walk VMAs

The linked list is slower than walking the VMAs using the maple tree.
We can't use the VMA iterator here because it doesn't support
moving to an earlier position.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoperf: Use VMA iterator
Liam R. Howlett [Mon, 4 Jan 2021 19:52:39 +0000 (14:52 -0500)]
perf: Use VMA iterator

The VMA iterator is faster than the linked list and removing the linked
list will shrink the vm_area_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoacct: Use VMA iterator instead of linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:52:15 +0000 (14:52 -0500)]
acct: Use VMA iterator instead of linked list

The VMA iterator is faster than the linked list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoipc/shm: Use VMA iterator instead of linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:51:19 +0000 (14:51 -0500)]
ipc/shm: Use VMA iterator instead of linked list

The VMA iterator is faster than the linked llist, and it can be walked
even when VMAs are being removed from the address space, so there's no
need to keep track of 'next'.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agouserfaultfd: Use maple tree iterator to iterate VMAs
Liam R. Howlett [Mon, 4 Jan 2021 19:49:47 +0000 (14:49 -0500)]
userfaultfd: Use maple tree iterator to iterate VMAs

Don't use the mm_struct linked list or the vma->vm_next in prep for removal

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofs/proc/task_mmu: Stop using linked list and highest_vm_end
Liam R. Howlett [Mon, 4 Jan 2021 19:47:48 +0000 (14:47 -0500)]
fs/proc/task_mmu: Stop using linked list and highest_vm_end

Remove references to mm_struct linked list and highest_vm_end for when
they are removed

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agofs/proc/base: Use maple tree iterators in place of linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:46:23 +0000 (14:46 -0500)]
fs/proc/base: Use maple tree iterators in place of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoexec: Use VMA iterator instead of linked list
Liam R. Howlett [Mon, 4 Jan 2021 19:45:37 +0000 (14:45 -0500)]
exec: Use VMA iterator instead of linked list

Remove a use of the vm_next list by doing the initial lookup with the
VMA iterator and then using it to find the next entry.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agobinfmt_elf: Take the mmap lock when walking the VMA list
Matthew Wilcox (Oracle) [Tue, 26 Oct 2021 21:20:15 +0000 (17:20 -0400)]
binfmt_elf: Take the mmap lock when walking the VMA list

I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
is very careful to take the mmap_lock in write mode.  We only need to
take it in read mode here as we do not care if the size of the stack
VMA changes underneath us.

If it can be changed underneath us, this is a potential use-after-free
for a multithreaded process which is dumping core.

Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agocoredump: Remove vma linked list walk
Matthew Wilcox (Oracle) [Tue, 26 Oct 2021 21:18:31 +0000 (17:18 -0400)]
coredump: Remove vma linked list walk

Use the Maple Tree iterator instead.  This is too complicated for the
VMA iterator to handle, so let's open-code it for now.  If this turns
out to be a common pattern, we can migrate it to common code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agobinfmt_elf: Remove vma linked list walk
Liam R. Howlett [Mon, 4 Jan 2021 19:44:16 +0000 (14:44 -0500)]
binfmt_elf: Remove vma linked list walk

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoum: Remove vma linked list walk
Liam R. Howlett [Tue, 16 Mar 2021 19:56:41 +0000 (15:56 -0400)]
um: Remove vma linked list walk

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agooptee: Remove vma linked list walk
Liam R. Howlett [Mon, 4 Jan 2021 19:43:36 +0000 (14:43 -0500)]
optee: Remove vma linked list walk

Use the VMA iterator instead.  Change the calling convention of
__check_mem_type() to pass in the mm instead of the first vma in the
range.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agocxl: Remove vma linked list walk
Liam R. Howlett [Mon, 4 Jan 2021 19:31:50 +0000 (14:31 -0500)]
cxl: Remove vma linked list walk

Use the VMA iterator instead.  This requires a little restructuring
of the surrounding code to hoist the mm to the caller.  That turns
cxl_prefault_one() into a trivial function, so call cxl_fault_segment()
directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoxtensa: Remove vma linked list walks
Liam R. Howlett [Mon, 4 Jan 2021 19:30:59 +0000 (14:30 -0500)]
xtensa: Remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agox86: Remove vma linked list walks
Liam R. Howlett [Mon, 4 Jan 2021 19:30:25 +0000 (14:30 -0500)]
x86: Remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agos390: Remove vma linked list walks
Liam R. Howlett [Mon, 4 Jan 2021 19:29:19 +0000 (14:29 -0500)]
s390: Remove vma linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agopowerpc: Remove mmap linked list walks
Liam R. Howlett [Mon, 4 Jan 2021 19:25:54 +0000 (14:25 -0500)]
powerpc: Remove mmap linked list walks

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoparisc: Remove mmap linked list from cache handling
Liam R. Howlett [Mon, 4 Jan 2021 19:25:19 +0000 (14:25 -0500)]
parisc: Remove mmap linked list from cache handling

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoarm64: Remove mmap linked list from vdso
Liam R. Howlett [Mon, 4 Jan 2021 19:24:40 +0000 (14:24 -0500)]
arm64: Remove mmap linked list from vdso

Use the VMA iterator instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()
Liam R. Howlett [Tue, 1 Dec 2020 02:30:04 +0000 (21:30 -0500)]
mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()

do_brk_munmap() has already aligned the address and has a maple tree
state to be used.  Use the new do_mas_align_munmap() to avoid
unnecessary alignment and error checks.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Reorganize munmap to use maple states
Liam R. Howlett [Thu, 19 Nov 2020 17:57:23 +0000 (12:57 -0500)]
mm/mmap: Reorganize munmap to use maple states

Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().

do_munmap() is a wrapper to create a maple state for any callers that
have not been converted to the maple tree.

do_mas_munmap() takes a maple state to mumap a range.  This is just a
small function which checks for error conditions and aligns the end of
the range.

do_mas_align_munmap() uses the aligned range to mumap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range.  Both start and end are split if necessary.
Then the VMAs are unlocked and removed from the linked list at the same
time.  Followed by a single tree operation of overwriting the area in
with a NULL.  Finally, the detached list is unmapped and freed.

By reorganizing the munmap calls as outlined, it is now possible to
avoid extra work of aligning pre-aligned callers which are known to be
safe, avoid extra VMA lookups or tree walks for modifications.

detach_vmas_to_be_unmapped() is no longer used, so drop this code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Move mmap_region() below do_munmap()
Liam R. Howlett [Wed, 27 Jan 2021 15:25:13 +0000 (10:25 -0500)]
mm/mmap: Move mmap_region() below do_munmap()

Relocation of code for the next commit.  There should be no changes
here.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: Remove vmacache
Liam R. Howlett [Mon, 16 Nov 2020 19:50:20 +0000 (14:50 -0500)]
mm: Remove vmacache

By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Use advanced maple tree API for mmap_region()
Liam R. Howlett [Tue, 10 Nov 2020 18:37:40 +0000 (13:37 -0500)]
mm/mmap: Use advanced maple tree API for mmap_region()

Changing mmap_region() to use the maple tree state and the advanced
maple tree interface allows for a lot less tree walking.

This change removes the last caller of munmap_vma_range(), so drop this
unused function.

Add vma_expand() to expand a VMA if possible by doing the necessary
hugepage check, uprobe_munmap of files, dcache flush, modifications then
undoing the detaches, etc.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: Use maple tree operations for find_vma_intersection() and find_vma()
Liam R. Howlett [Mon, 28 Sep 2020 19:50:19 +0000 (15:50 -0400)]
mm: Use maple tree operations for find_vma_intersection() and find_vma()

Move find_vma_intersection() to mmap.c and change implementation to
maple tree.

When searching for a vma within a range, it is easier to use the maple
tree interface.  This means the find_vma() call changes to a special
case of the find_vma_intersection().

Exported for kvm module.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Change do_brk_flags() to expand existing VMA and add
Liam R. Howlett [Mon, 21 Sep 2020 14:47:34 +0000 (10:47 -0400)]
mm/mmap: Change do_brk_flags() to expand existing VMA and add
do_brk_munmap()

Avoid allocating a new VMA when it a vma modification can occur.  When a
brk() can expand or contract a VMA, then the single store operation will
only modify one index of the maple tree instead of causing a node to
split or coalesce.  This avoids unnecessary allocations/frees of maple
tree nodes and VMAs.

Use the advanced API for the maple tree to avoid unnecessary walks of
the tree.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/khugepaged: Optimize collapse_pte_mapped_thp() by using vma_lookup()
Liam R. Howlett [Thu, 8 Apr 2021 20:21:58 +0000 (16:21 -0400)]
mm/khugepaged: Optimize collapse_pte_mapped_thp() by using vma_lookup()

vma_lookup() will walk the vma tree once and not continue to look for
the next vma.  Since the exact vma is checked below, this is a more
optimal way of searching.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: Optimize find_exact_vma() to use vma_lookup()
Liam R. Howlett [Thu, 8 Apr 2021 20:15:00 +0000 (16:15 -0400)]
mm: Optimize find_exact_vma() to use vma_lookup()

Use vma_lookup() to walk the tree to the start value requested.  If
the vma at the start does not match, then the answer is NULL and there
is no need to look at the next vma the way that find_vma() would.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoxen: Use vma_lookup() in privcmd_ioctl_mmap()
Liam R. Howlett [Thu, 8 Apr 2021 20:06:36 +0000 (16:06 -0400)]
xen: Use vma_lookup() in privcmd_ioctl_mmap()

vma_lookup() walks the VMA tree for a specific value, find_vma() will
search the tree after walking to a specific value.  It is more efficient
to only walk to the requested value as this case requires the address to
equal the vm_start.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agommap: Change zeroing of maple tree in __vma_adjust
Liam R. Howlett [Mon, 6 Sep 2021 03:31:15 +0000 (23:31 -0400)]
mmap: Change zeroing of maple tree in __vma_adjust

Only write to the maple tree if we are not inserting or the insert isn't
going to overwrite the area to clear.  This avoids spanning writes and
node coealescing when unnecessary.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
3 years agomm: Remove rb tree.
Liam R. Howlett [Fri, 24 Jul 2020 19:06:25 +0000 (15:06 -0400)]
mm: Remove rb tree.

Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as
the lock is not held.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm: Convert vma_lookup() to use the Maple Tree
Matthew Wilcox (Oracle) [Fri, 29 Oct 2021 01:53:09 +0000 (21:53 -0400)]
mm: Convert vma_lookup() to use the Maple Tree

Unlike the rbtree, the Maple Tree will return a NULL if there's
nothing at a particular address.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoproc: Remove VMA rbtree use from nommu
Matthew Wilcox (Oracle) [Thu, 28 Oct 2021 22:22:49 +0000 (18:22 -0400)]
proc: Remove VMA rbtree use from nommu

These users of the rbtree should probably have been walks of the
linked list, but convert them to use walks of the maple tree.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agodamon: Convert __damon_va_three_regions to use the VMA iterator
Matthew Wilcox (Oracle) [Thu, 28 Oct 2021 18:39:28 +0000 (14:39 -0400)]
damon: Convert __damon_va_three_regions to use the VMA iterator

This rather specialised walk can use the VMA iterator.  If this proves
to be too slow, we can write a custom routine to find the two largest
gaps, but it will be somewhat complicated, so let's see if we need it
first.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agokernel/fork: Use maple tree for dup_mmap() during forking
Liam R. Howlett [Fri, 22 Oct 2021 17:53:01 +0000 (13:53 -0400)]
kernel/fork: Use maple tree for dup_mmap() during forking

The maple tree was already tracking VMAs in this function by an earlier
commit, but the rbtree iterator was being used to iterate the list.
Change the iterator to use a maple tree native iterator and switch to
the maple tree advanced API to avoid multiple walks of the tree during
insert operations.  Unexport the now-unused vma_store() function.

We track whether we need to free the VMAs and tree nodes through RCU
(ie whether there have been multiple threads that can see the mm_struct
simultaneously; by pthread(), ptrace() or looking at /proc/$pid/maps).
This setting is sticky because it's too tricky to decide when it's safe
to exit RCU mode.

For performance reasons we bulk allocate the maple tree nodes.  The node
calculations are done internally to the tree and use the VMA count and
assume the worst-case node requirements.  The VM_DONT_COPY flag does
not allow for the most efficient copy method of the tree and so a bulk
loading algorithm is used.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm/mmap: Use maple tree for unmapped_area{_topdown}
Liam R. Howlett [Fri, 24 Jul 2020 16:01:47 +0000 (12:01 -0400)]
mm/mmap: Use maple tree for unmapped_area{_topdown}

The maple tree code was added to find the unmapped area in a previous
commit and was checked against what the rbtree returned, but the actual
result was never used.  Start using the maple tree implementation and
remove the rbtree code.

Add kernel documentation comment for these functions.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agomm/mmap: Use the maple tree for find_vma_prev() instead of the rbtree
Liam R. Howlett [Fri, 24 Jul 2020 15:48:08 +0000 (11:48 -0400)]
mm/mmap: Use the maple tree for find_vma_prev() instead of the rbtree

Use the maple tree's advanced API and a maple state to walk the tree
for the entry at the address of the next vma, then use the maple state
to walk back one entry to find the previous entry.

Add kernel documentation comments for this API.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm/mmap: Use the maple tree in find_vma() instead of the rbtree.
Liam R. Howlett [Fri, 24 Jul 2020 15:38:39 +0000 (11:38 -0400)]
mm/mmap: Use the maple tree in find_vma() instead of the rbtree.

Using the maple tree interface mt_find() will handle the RCU locking and
will start searching at the address up to the limit, ULONG_MAX in this
case.

Add kernel documentation to this API.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agommap: Use the VMA iterator in count_vma_pages_range()
Matthew Wilcox (Oracle) [Fri, 29 Oct 2021 13:08:39 +0000 (09:08 -0400)]
mmap: Use the VMA iterator in count_vma_pages_range()

This simplifies the implementation and is faster than using the linked
list.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm: Add VMA iterator
Matthew Wilcox (Oracle) [Fri, 22 Oct 2021 14:52:21 +0000 (10:52 -0400)]
mm: Add VMA iterator

This thin layer of abstraction over the maple tree state is for
iterating over VMAs.  You can go forwards, go backwards or ask where
the iterator is.  Rename the existing vma_next() to __vma_next() --
it will be removed by the end of this series.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agomm: Start tracking VMAs with maple tree
Liam R. Howlett [Fri, 24 Jul 2020 17:04:30 +0000 (13:04 -0400)]
mm: Start tracking VMAs with maple tree

Start tracking the VMAs with the new maple tree structure in parallel
with the rb_tree.  Add debug and trace events for maple tree operations
and duplicate the rb_tree that is created on forks into the maple tree.

The maple tree is added to the mm_struct including the mm_init struct,
added support in required mm/mmap functions, added tracking in
kernel/fork for process forking, and used to find the unmapped_area and
checked against what the rbtree finds.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoMaple Tree: Add new data structure
Liam R. Howlett [Fri, 24 Jul 2020 17:02:52 +0000 (13:02 -0400)]
Maple Tree: Add new data structure

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  The first user that is covered in this
patch set is the vm_area_struct, where three data structures are
replaced by the maple tree: the augmented rbtree, the vma cache, and the
linked list of VMAs in the mm_struct.  The long term goal is to reduce
or remove the mmap_sem contention.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter than
the rbtree so it has fewer cache misses.  The removal of the linked list
between subsequent entries also reduces the cache misses and the need to pull
in the previous and next VMA during many tree alterations.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoradix tree test suite: Add support for slab bulk APIs
Liam R. Howlett [Tue, 25 Aug 2020 19:51:54 +0000 (15:51 -0400)]
radix tree test suite: Add support for slab bulk APIs

Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to
the radix tree test suite.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoradix tree test suite: Add allocation counts and size to kmem_cache
Liam R. Howlett [Tue, 1 Jun 2021 14:35:43 +0000 (10:35 -0400)]
radix tree test suite: Add allocation counts and size to kmem_cache

Add functions to get the number of allocations, and total allocations
from a kmem_cache.  Also add a function to get the allocated size and a
way to zero the total allocations.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoradix tree test suite: Add kmem_cache_set_non_kernel()
Liam R. Howlett [Thu, 27 Feb 2020 20:13:20 +0000 (15:13 -0500)]
radix tree test suite: Add kmem_cache_set_non_kernel()

kmem_cache_set_non_kernel() is a mechanism to allow a certain number of
kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in
the flags.  This functionality allows for testing different paths though
the code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
3 years agoradix tree test suite: Add pr_err define
Liam R. Howlett [Tue, 1 Jun 2021 14:25:22 +0000 (10:25 -0400)]
radix tree test suite: Add pr_err define

define pr_err to printk

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
3 years agoLinux 5.16-rc2
Linus Torvalds [Sun, 21 Nov 2021 21:47:39 +0000 (13:47 -0800)]
Linux 5.16-rc2

3 years agoMerge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Nov 2021 19:25:19 +0000 (11:25 -0800)]
Merge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Move the command line preparation and the early command line parsing
   earlier so that the command line parameters which affect
   early_reserve_memory(), e.g. efi=nosftreserve, are taken into
   account. This was broken when the invocation of
   early_reserve_memory() was moved recently.

 - Use an atomic type for the SGX page accounting, which is read and
   written locklessly, to plug various race conditions related to it.

* tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Fix free page accounting
  x86/boot: Pull up cmdline preparation and early param parsing

3 years agoMerge tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Nov 2021 19:17:50 +0000 (11:17 -0800)]
Merge tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Thomas Gleixner:

 - Remove unneded PEBS disabling when taking LBR snapshots to prevent an
   unchecked MSR access error.

 - Fix IIO event constraints for Snowridge and Skylake server chips.

* tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Fix snapshot_branch_stack warning in VM
  perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
  perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
  perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server

3 years agoMerge tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 21 Nov 2021 18:26:35 +0000 (10:26 -0800)]
Merge tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc fixes from Michael Ellerman:

 - Fix a bug in copying of sigset_t for 32-bit systems, which caused X
   to not start.

 - Fix handling of shared LSIs (rare) with the xive interrupt controller
   (Power9/10).

 - Fix missing TOC setup in some KVM code, which could result in oopses
   depending on kernel data layout.

 - Fix DMA mapping when we have persistent memory and only one DMA
   window available.

 - Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a
   recent fix.

 - A couple of other minor fixes.

Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater,
Christian Zigotzky, Christophe Leroy, Daniel Axtens, Finn Thain, Greg
Kurz, Masahiro Yamada, Nicholas Piggin, and Uwe Kleine-König.

* tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Change IRQ domain to a tree domain
  powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
  powerpc/signal32: Fix sigset_t copy
  powerpc/book3e: Fix TLBCAM preset at boot
  powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
  powerpc/pseries/ddw: simplify enable_ddw()
  powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
  powerpc/pseries: Fix numa FORM2 parsing fallback code
  powerpc/pseries: rename numa_dist_table to form2_distances
  powerpc: clean vdso32 and vdso64 directories
  powerpc/83xx/mpc8349emitx: Drop unused variable
  KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()

3 years agopstore/blk: Use "%lu" to format unsigned long
Geert Uytterhoeven [Thu, 18 Nov 2021 18:26:21 +0000 (10:26 -0800)]
pstore/blk: Use "%lu" to format unsigned long

On 32-bit:

    fs/pstore/blk.c: In function â€˜__best_effort_init’:
    include/linux/kern_levels.h:5:18: warning: format â€˜%zu’ expects argument of type â€˜size_t’, but argument 3 has type â€˜long unsigned int’ [-Wformat=]
5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
  |                  ^~~~~~
    include/linux/kern_levels.h:14:19: note: in expansion of macro â€˜KERN_SOH’
       14 | #define KERN_INFO KERN_SOH "6" /* informational */
  |                   ^~~~~~~~
    include/linux/printk.h:373:9: note: in expansion of macro â€˜KERN_INFO’
      373 |  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  |         ^~~~~~~~~
    fs/pstore/blk.c:314:3: note: in expansion of macro â€˜pr_info’
      314 |   pr_info("attached %s (%zu) (no dedicated panic_write!)\n",
  |   ^~~~~~~

Cc: stable@vger.kernel.org
Fixes: 7bb9557b48fcabaa ("pstore/blk: Use the normal block device I/O path")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210629103700.1935012-1-geert@linux-m68k.org
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 20 Nov 2021 21:17:24 +0000 (13:17 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "15 patches.

  Subsystems affected by this patch series: ipc, hexagon, mm (swap,
  slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  proc/vmcore: fix clearing user buffer by properly using clear_user()
  kmap_local: don't assume kmap PTEs are linear arrays in memory
  mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
  mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
  kasan: test: silence intentional read overflow warnings
  hugetlb, userfaultfd: fix reservation restore on userfaultfd error
  hugetlb: fix hugetlb cgroup refcounting during mremap
  mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
  hexagon: ignore vmlinux.lds
  hexagon: clean up timer-regs.h
  hexagon: export raw I/O routines for modules
  mm: emit the "free" trace report before freeing memory in kmem_cache_free()
  shm: extend forced shm destroy to support objects from several IPC nses
  ipc: WARN if trying to remove ipc object which is absent
  mm/swap.c:put_pages_list(): reinitialise the page list

3 years agoMerge tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 20 Nov 2021 19:05:10 +0000 (11:05 -0800)]
Merge tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Flip a cap check to avoid a selinux error (Alistair)

 - Fix for a regression this merge window where we can miss a queue ref
   put (me)

 - Un-mark pstore-blk as broken, as the condition that triggered that
   change has been rectified (Kees)

 - Queue quiesce and sync fixes (Ming)

 - FUA insertion fix (Ming)

 - blk-cgroup error path put fix (Yu)

* tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block:
  blk-mq: don't insert FUA request with data into scheduler queue
  blk-cgroup: fix missing put device in error path from blkg_conf_pref()
  block: avoid to quiesce queue in elevator_init_mq
  Revert "mark pstore-blk as broken"
  blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
  block: fix missing queue put in error path
  block: Check ADMIN before NICE for IOPRIO_CLASS_RT

3 years agoMerge tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 20 Nov 2021 18:59:03 +0000 (10:59 -0800)]
Merge tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "There is an ACPI stubs fix which is ACKed by the ACPI maintainer for
  merging through my tree.

  One item stand out and that is that I delete the <linux/sdb.h> header
  that is used by nothing. I deleted this subsystem (through the GPIO
  tree) a while back so I feel responsible for tidying up the floor.

  Other than that it is the usual mistakes, a bit noisy around build
  issue and Kconfig then driver fixes.

  Specifics:

   - Fix some stubs causing compile issues for ACPI.

   - Fix some wakeups on AMD IRQs shared between GPIO and SCI.

   - Fix a build warning in the Tegra driver.

   - Fix a Kconfig issue in the Qualcomm driver.

   - Add a missing include the RALink driver.

   - Return a valid type for the Apple pinctrl IRQs.

   - Implement some Qualcomm SDM845 dual-edge errata.

   - Remove the unused <linux/sdb.h> header. (The subsystem was once
     deleted by the pinctrl maintainer...)

   - Fix a duplicate initialized in the Tegra driver.

   - Fix register offsets for UFS and SDC in the Qualcomm SM8350 driver"

* tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: sm8350: Correct UFS and SDC offsets
  pinctrl: tegra194: remove duplicate initializer again
  Remove unused header <linux/sdb.h>
  pinctrl: qcom: sdm845: Enable dual edge errata
  pinctrl: apple: Always return valid type in apple_gpio_irq_type
  pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'
  pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP
  pinctrl: tegra: Return const pointer from tegra_pinctrl_get_group()
  pinctrl: amd: Fix wakeups when IRQ is shared with SCI
  ACPI: Add stubs for wakeup handler functions

3 years agoMerge tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 20 Nov 2021 18:55:50 +0000 (10:55 -0800)]
Merge tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add missing Kconfig option for ftrace direct multi sample, so it can
   be compiled again, and also add s390 support for this sample.

 - Update Christian Borntraeger's email address.

 - Various fixes for memory layout setup. Besides other this makes it
   possible to load shared DCSS segments again.

 - Fix copy to user space of swapped kdump oldmem.

 - Remove -mstack-guard and -mstack-size compile options when building
   vdso binaries. This can happen when CONFIG_VMAP_STACK is disabled and
   results in broken vdso code which causes more or less random
   exceptions. Also remove the not needed -nostdlib option.

 - Fix memory leak on cpu hotplug and return code handling in kexec
   code.

 - Wire up futex_waitv system call.

 - Replace snprintf with sysfs_emit where appropriate.

* tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  ftrace/samples: add s390 support for ftrace direct multi sample
  ftrace/samples: add missing Kconfig option for ftrace direct multi sample
  MAINTAINERS: update email address of Christian Borntraeger
  s390/kexec: fix memory leak of ipl report buffer
  s390/kexec: fix return code handling
  s390/dump: fix copying to user-space of swapped kdump oldmem
  s390: wire up sys_futex_waitv system call
  s390/vdso: filter out -mstack-guard and -mstack-size
  s390/vdso: remove -nostdlib compiler flag
  s390: replace snprintf in show functions with sysfs_emit
  s390/boot: simplify and fix kernel memory layout setup
  s390/setup: re-arrange memblock setup
  s390/setup: avoid using memblock_enforce_memory_limit
  s390/setup: avoid reserving memory above identity mapping

3 years agoMerge tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 20 Nov 2021 18:47:16 +0000 (10:47 -0800)]
Merge tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three small cifs/smb3 fixes: two to address minor coverity issues and
  one cleanup"

* tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: introduce cifs_ses_mark_for_reconnect() helper
  cifs: protect srv_count with cifs_tcp_ses_lock
  cifs: move debug print out of spinlock

3 years agoproc/vmcore: fix clearing user buffer by properly using clear_user()
David Hildenbrand [Sat, 20 Nov 2021 00:43:58 +0000 (16:43 -0800)]
proc/vmcore: fix clearing user buffer by properly using clear_user()

To clear a user buffer we cannot simply use memset, we have to use
clear_user().  With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":

  systemd[1]: Starting Kdump Vmcore Save Service...
  kdump[420]: Kdump is using the default log level(3).
  kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[465]: saving vmcore-dmesg.txt complete
  kdump[467]: saving vmcore
  BUG: unable to handle page fault for address: 00007f2374e01000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
  Oops: 0003 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
  RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
  Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
  RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
  RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
  RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
  RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
  R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
  R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
  FS:  00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
  Call Trace:
   read_vmcore+0x236/0x2c0
   proc_reg_read+0x55/0xa0
   vfs_read+0x95/0x190
   ksys_read+0x4f/0xc0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access.  In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().

To fix, properly use clear_user() when we're dealing with a user buffer.

Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokmap_local: don't assume kmap PTEs are linear arrays in memory
Ard Biesheuvel [Sat, 20 Nov 2021 00:43:55 +0000 (16:43 -0800)]
kmap_local: don't assume kmap PTEs are linear arrays in memory

The kmap_local conversion broke the ARM architecture, because the new
code assumes that all PTEs used for creating kmaps form a linear array
in memory, and uses array indexing to look up the kmap PTE belonging to
a certain kmap index.

On ARM, this cannot work, not only because the PTE pages may be
non-adjacent in memory, but also because ARM/!LPAE interleaves hardware
entries and extended entries (carrying software-only bits) in a way that
is not compatible with array indexing.

Fortunately, this only seems to affect configurations with more than 8
CPUs, due to the way the per-CPU kmap slots are organized in memory.

Work around this by permitting an architecture to set a Kconfig symbol
that signifies that the kmap PTEs do not form a lineary array in memory,
and so the only way to locate the appropriate one is to walk the page
tables.

Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kernel.org/
Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org
Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/damon/dbgfs: fix missed use of damon_dbgfs_lock
SeongJae Park [Sat, 20 Nov 2021 00:43:52 +0000 (16:43 -0800)]
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock

DAMON debugfs is supposed to protect dbgfs_ctxs, dbgfs_nr_ctxs, and
dbgfs_dirs using damon_dbgfs_lock.  However, some of the code is
accessing the variables without the protection.  This fixes it by
protecting all such accesses.

Link: https://lkml.kernel.org/r/20211110145758.16558-3-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
SeongJae Park [Sat, 20 Nov 2021 00:43:49 +0000 (16:43 -0800)]
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation

Patch series "DAMON fixes".

This patch (of 2):

DAMON users can trigger below warning in '__alloc_pages()' by invoking
write() to some DAMON debugfs files with arbitrarily high count
argument, because DAMON debugfs interface allocates some buffers based
on the user-specified 'count'.

        if (unlikely(order >= MAX_ORDER)) {
                WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
                return NULL;
        }

Because the DAMON debugfs interface code checks failure of the
'kmalloc()', this commit simply suppresses the warnings by adding
'__GFP_NOWARN' flag.

Link: https://lkml.kernel.org/r/20211110145758.16558-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211110145758.16558-2-sj@kernel.org
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokasan: test: silence intentional read overflow warnings
Kees Cook [Sat, 20 Nov 2021 00:43:46 +0000 (16:43 -0800)]
kasan: test: silence intentional read overflow warnings

As done in commit d73dad4eb5ad ("kasan: test: bypass __alloc_size
checks") for __write_overflow warnings, also silence some more cases
that trip the __read_overflow warnings seen in 5.16-rc1[1]:

  In file included from include/linux/string.h:253,
                   from include/linux/bitmap.h:10,
                   from include/linux/cpumask.h:12,
                   from include/linux/mm_types_task.h:14,
                   from include/linux/mm_types.h:5,
                   from include/linux/page-flags.h:13,
                   from arch/arm64/include/asm/mte.h:14,
                   from arch/arm64/include/asm/pgtable.h:12,
                   from include/linux/pgtable.h:6,
                   from include/linux/kasan.h:29,
                   from lib/test_kasan.c:10:
  In function 'memcmp',
      inlined from 'kasan_memcmp' at lib/test_kasan.c:897:2:
  include/linux/fortify-string.h:263:25: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter)
    263 |                         __read_overflow();
        |                         ^~~~~~~~~~~~~~~~~
  In function 'memchr',
      inlined from 'kasan_memchr' at lib/test_kasan.c:872:2:
  include/linux/fortify-string.h:277:17: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter)
    277 |                 __read_overflow();
        |                 ^~~~~~~~~~~~~~~~~

[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14660585/log/

Link: https://lkml.kernel.org/r/20211116004111.3171781-1-keescook@chromium.org
Fixes: d73dad4eb5ad ("kasan: test: bypass __alloc_size checks")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohugetlb, userfaultfd: fix reservation restore on userfaultfd error
Mina Almasry [Sat, 20 Nov 2021 00:43:43 +0000 (16:43 -0800)]
hugetlb, userfaultfd: fix reservation restore on userfaultfd error

Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >=
size, or !huge_pte_none(), the code will detect that new_pagecache_page
== false, and so call restore_reserve_on_error().  In this case I see
restore_reserve_on_error() delete the reservation, and the following
call to remove_inode_hugepages() will increment h->resv_hugepages
causing a 100% reproducible leak.

We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is
no reservation to restore on the error path, and we need not call
restore_reserve_on_error().  Rename new_pagecache_page to
page_in_pagecache to make that clear.

Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reported-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohugetlb: fix hugetlb cgroup refcounting during mremap
Bui Quang Minh [Sat, 20 Nov 2021 00:43:40 +0000 (16:43 -0800)]
hugetlb: fix hugetlb cgroup refcounting during mremap

When hugetlb_vm_op_open() is called during copy_vma(), we may take the
reference to resv_map->css.  Later, when clearing the reservation
pointer of old_vma after transferring it to new_vma, we forget to drop
the reference to resv_map->css.  This leads to a reference leak of css.

Fixes this by adding a check to drop reservation css reference in
clear_vma_resv_huge_pages()

Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com
Fixes: 550a7d60bd5e35 ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
Rustam Kovhaev [Sat, 20 Nov 2021 00:43:37 +0000 (16:43 -0800)]
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag

When kmemleak is enabled for SLOB, system does not boot and does not
print anything to the console.  At the very early stage in the boot
process we hit infinite recursion from kmemleak_init() and eventually
kernel crashes.

kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but
kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not
valid for SLOB.

Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB

Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com
Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Glauber Costa <glommer@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohexagon: ignore vmlinux.lds
Nathan Chancellor [Sat, 20 Nov 2021 00:43:34 +0000 (16:43 -0800)]
hexagon: ignore vmlinux.lds

After building allmodconfig, there is an untracked vmlinux.lds file in
arch/hexagon/kernel:

    $ git ls-files . --exclude-standard --others
    arch/hexagon/kernel/vmlinux.lds

Ignore it as all other architectures have.

Link: https://lkml.kernel.org/r/20211115174250.1994179-4-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohexagon: clean up timer-regs.h
Nathan Chancellor [Sat, 20 Nov 2021 00:43:31 +0000 (16:43 -0800)]
hexagon: clean up timer-regs.h

When building allmodconfig, there is a warning about TIMER_ENABLE being
redefined:

  drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined]
  #define TIMER_ENABLE            BIT(7)
          ^
  arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here
  #define TIMER_ENABLE            0
           ^
  1 error generated.

The values in this header are only used in one file each, if they are
used at all.  Remove the header and sink all of the constants into their
respective files.

TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h

TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in
arch/hexagon/kernel/time.c.

SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the
file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer
functions").

TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the
definition, rather than its use.

Link: https://lkml.kernel.org/r/20211115174250.1994179-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohexagon: export raw I/O routines for modules
Nathan Chancellor [Sat, 20 Nov 2021 00:43:28 +0000 (16:43 -0800)]
hexagon: export raw I/O routines for modules

Patch series "Fixes for ARCH=hexagon allmodconfig", v2.

This series fixes some issues noticed with ARCH=hexagon allmodconfig.

This patch (of 3):

When building ARCH=hexagon allmodconfig, the following errors occur:

  ERROR: modpost: "__raw_readsl" [drivers/i3c/master/svc-i3c-master.ko] undefined!
  ERROR: modpost: "__raw_writesl" [drivers/i3c/master/dw-i3c-master.ko] undefined!
  ERROR: modpost: "__raw_readsl" [drivers/i3c/master/dw-i3c-master.ko] undefined!
  ERROR: modpost: "__raw_writesl" [drivers/i3c/master/i3c-master-cdns.ko] undefined!
  ERROR: modpost: "__raw_readsl" [drivers/i3c/master/i3c-master-cdns.ko] undefined!

Export these symbols so that modules can use them without any errors.

Link: https://lkml.kernel.org/r/20211115174250.1994179-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20211115174250.1994179-2-nathan@kernel.org
Fixes: 013bf24c3829 ("Hexagon: Provide basic implementation and/or stubs for I/O routines.")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: emit the "free" trace report before freeing memory in kmem_cache_free()
Yunfeng Ye [Sat, 20 Nov 2021 00:43:25 +0000 (16:43 -0800)]
mm: emit the "free" trace report before freeing memory in kmem_cache_free()

After the memory is freed, it can be immediately allocated by other
CPUs, before the "free" trace report has been emitted.  This causes
inaccurate traces.

For example, if the following sequence of events occurs:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
  (2) free  xxxxxx
                         (3) alloc xxxxxx
                         (4) free  xxxxxx

Then they will be inaccurately reported via tracing, so that they appear
to have happened in this order:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
                         (2) alloc xxxxxx
  (3) free  xxxxxx
                         (4) free  xxxxxx

This makes it look like CPU 1 somehow managed to allocate memory that
CPU 0 still had allocated for itself.

In order to avoid this, emit the "free xxxxxx" tracing report just
before the actual call to free the memory, instead of just after it.

Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoshm: extend forced shm destroy to support objects from several IPC nses
Alexander Mikhalitsyn [Sat, 20 Nov 2021 00:43:21 +0000 (16:43 -0800)]
shm: extend forced shm destroy to support objects from several IPC nses

Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.

This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).

This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.

To achieve that we do several things:

1. add a namespace (non-refcounted) pointer to the struct shmid_kernel

2. during new shm object creation (newseg()/shmget syscall) we
   initialize this pointer by current task IPC ns

3. exit_shm() fully reworked such that it traverses over all shp's in
   task->sysvshm.shm_clist and gets IPC namespace not from current task
   as it was before but from shp's object itself, then call
   shm_destroy(shp, ns).

Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed.  To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".

Q/A

Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
   lifetime, so, if we get shp object from the task->sysvshm.shm_clist
   while holding task_lock(task) nobody can steal our namespace.

Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
   namespace without getting task->sysvshm.shm_clist list cleaned up.

Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoipc: WARN if trying to remove ipc object which is absent
Alexander Mikhalitsyn [Sat, 20 Nov 2021 00:43:18 +0000 (16:43 -0800)]
ipc: WARN if trying to remove ipc object which is absent

Patch series "shm: shm_rmid_forced feature fixes".

Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily.  After some
investigation I've constructed the minimal reproducer.  It was found
that it's use-after-free and it happens only if sysctl
kernel.shm_rmid_forced = 1.

The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from
different IPC namespaces.  In most cases this list will contain only
items from one IPC namespace.

How can this list contain object from different namespaces? The
exit_shm() function is designed to clean up this list always when
process leaves IPC namespace.  But we made a mistake a long time ago and
did not add a exit_shm() call into the setns() syscall procedures.

The first idea was just to add this call to setns() syscall but it
obviously changes semantics of setns() syscall and that's
userspace-visible change.  So, I gave up on this idea.

The first real attempt to address the issue was just to omit forced
destroy if we meet shp object not from current task IPC namespace [1].
But that was not the best idea because task->sysvshm.shm_clist was
protected by rwsem which belongs to current task IPC namespace.  It
means that list corruption may occur.

Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2].  This is really non-trivial thing, I've
put a lot of effort into that but not believed that it's possible to
make it fully safe, clean and clear.

Thanks to the efforts of Manfred Spraul working an elegant solution was
designed.  Thanks a lot, Manfred!

Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments") Eric's
idea was to maintain a list of shm_clists one per IPC namespace, use
lock-less lists.  But there is some extra memory consumption-related
concerns.

An alternative solution which was suggested by me was implemented in
("shm: reset shm_clist on setns but omit forced shm destroy").  The idea
is pretty simple, we add exit_shm() syscall to setns() but DO NOT
destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
clean up the task->sysvshm.shm_clist list.

This chages semantics of setns() syscall a little bit but in comparision
to the "naive" solution when we just add exit_shm() without any special
exclusions this looks like a safer option.

[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736

This patch (of 2):

Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.

This allows us to catch possible bugs when the ipc_rmid() function was
called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
arguments.

Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@virtuozzo.com
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@virtuozzo.com
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/swap.c:put_pages_list(): reinitialise the page list
Matthew Wilcox [Sat, 20 Nov 2021 00:43:15 +0000 (16:43 -0800)]
mm/swap.c:put_pages_list(): reinitialise the page list

While free_unref_page_list() puts pages onto the CPU local LRU list, it
does not remove them from the list they were passed in on.  That makes
the list_head appear to be non-empty, and would lead to various
corruption problems if we didn't have an assertion that the list was
empty.

Reinitialise the list after calling free_unref_page_list() to avoid this
problem.

Link: https://lkml.kernel.org/r/YYp40A2lNrxaZji8@casper.infradead.org
Fixes: 988c69f1bc23 ("mm: optimise put_pages_list()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Steve French <stfrench@microsoft.com>
Reported-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Steve French <stfrench@microsoft.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Hyeoncheol Lee <hyc.lee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'libata-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 19 Nov 2021 22:15:14 +0000 (14:15 -0800)]
Merge tag 'libata-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull libata fixes from Damien Le Moal:

 - Prevent accesses to unsupported log pages as that causes device scan
   failures with LLDDs using libsas (from me).

 - A couple of fixes for AMD AHCI adapters handling of low power modes
   and resume (from Mario).

 - Fix a compilation warning (from me).

* tag 'libata-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-sata: Declare ata_ncq_sdev_attrs static
  ata: libahci: Adjust behavior when StorageD3Enable _DSD is set
  ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
  ata: libata: add missing ata_identify_page_supported() calls
  ata: libata: improve ata_read_log_page() error message

3 years agoMerge tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 19 Nov 2021 21:50:48 +0000 (13:50 -0800)]
Merge tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix double free in destroy_hist_field

 - Harden memset() of trace_iterator structure

 - Do not warn in trace printk check when test buffer fills up

* tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't use out-of-sync va_list in event printing
  tracing: Use memset_startat() to zero struct trace_iterator
  tracing/histogram: Fix UAF in destroy_hist_field()

3 years agoMerge tag 'perf-tools-fixes-for-v5.16-2021-11-19' of git://git.kernel.org/pub/scm...
Linus Torvalds [Fri, 19 Nov 2021 20:47:29 +0000 (12:47 -0800)]
Merge tag 'perf-tools-fixes-for-v5.16-2021-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix the 'local_weight', 'weight' (memory access latency),
   'local_ins_lat', 'ins_lat' (instruction latency) and 'pstage_cyc'
   (pipeline stage cycles) sort key sample aggregation.

 - Fix 'perf test' entry for watchpoints on s/390.

 - Fix branch_stack entry endianness check in the 'perf test' sample
   parsing test.

 - Fix ARM SPE handling on 'perf inject'.

 - Fix memory leaks detected with ASan.

 - Fix build on arm64 related to reallocarray() availability.

 - Sync copies of kernel headers: cpufeatures, kvm, MIPS syscalltable
   (futex_waitv).

* tag 'perf-tools-fixes-for-v5.16-2021-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf evsel: Fix memory leaks relating to unit
  perf report: Fix memory leaks around perf_tip()
  perf hist: Fix memory leak of a perf_hpp_fmt
  tools headers UAPI: Sync MIPS syscall table file changed by new futex_waitv syscall
  tools build: Fix removal of feature-sync-compare-and-swap feature detection
  perf inject: Fix ARM SPE handling
  perf bench: Fix two memory leaks detected with ASan
  perf test sample-parsing: Fix branch_stack entry endianness check
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  perf sort: Fix the 'p_stage_cyc' sort key behavior
  perf sort: Fix the 'ins_lat' sort key behavior
  perf sort: Fix the 'weight' sort key behavior
  perf tools: Set COMPAT_NEED_REALLOCARRAY for CONFIG_AUXTRACE=1
  perf tests wp: Remove unused functions on s390
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources

3 years agoMerge tag 'riscv-for-linus-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 19 Nov 2021 19:40:14 +0000 (11:40 -0800)]
Merge tag 'riscv-for-linus-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "I have two patches for 5.16:

   - allow external modules to be built against read-only source trees

   - turn KVM on in the defconfigs

  The second one isn't technically a fix, but it got tied up pending
  some defconfig cleanups that ended up finding some larger issues. I
  figured it'd be better to get the config changes some more testing,
  but didn't want to hold up turning KVM on for that"

* tag 'riscv-for-linus-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: fix building external modules
  RISC-V: Enable KVM in RV64 and RV32 defconfigs as a module

3 years agoMerge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 19 Nov 2021 19:33:31 +0000 (11:33 -0800)]
Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull exit-vs-signal handling fixes from Eric Biederman:
 "This is a small set of changes where debuggers were no longer able to
  intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
  cleanups.

  This is essentially the change you suggested with all of i's dotted
  and the t's crossed so that ptrace can intercept all of the cases it
  has been able to intercept the past, and all of the cases that made it
  to exit without giving ptrace a chance still don't give ptrace a
  chance"

* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Replace force_fatal_sig with force_exit_sig when in doubt
  signal: Don't always set SA_IMMUTABLE for forced signals

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 19 Nov 2021 19:19:58 +0000 (11:19 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Six fixes, five in drivers (ufs, qla2xxx, iscsi) and one core change
  to fix a regression in user space device state setting, which is used
  by the iscsi daemons to effect device recovery"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
  scsi: ufs: core: Fix another task management completion race
  scsi: ufs: core: Fix task management completion timeout race
  scsi: core: sysfs: Fix hang when device state is set via sysfs
  scsi: iscsi: Unblock session then wake up error handler
  scsi: ufs: core: Improve SCSI abort handling

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 19 Nov 2021 19:07:13 +0000 (11:07 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "There are a few big regression items from the merge window suggesting
  that people are testing rc1's but not testing the for-next branches:

   - Warnings fixes

   - Crash in hf1 when creating QPs and setting counters

   - Some old mlx4 cards fail to probe due to missing counters

   - Syzkaller crash in the new counters code"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  MAINTAINERS: Update for VMware PVRDMA driver
  RDMA/nldev: Check stat attribute before accessing it
  RDMA/mlx4: Do not fail the registration on port stats
  IB/hfi1: Properly allocate rdma counter desc memory
  RDMA/core: Set send and receive CQ before forwarding to the driver
  RDMA/netlink: Add __maybe_unused to static inline in C file

3 years agoMerge tag 'gpio-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 19 Nov 2021 19:02:09 +0000 (11:02 -0800)]
Merge tag 'gpio-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix a coccicheck warning in gpio-virtio

 - fix gpio selftests build issues

 - fix a Kconfig issue in gpio-rockchip

* tag 'gpio-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: rockchip: needs GENERIC_IRQ_CHIP to fix build errors
  selftests: gpio: restore CFLAGS options
  selftests: gpio: fix uninitialised variable warning
  selftests: gpio: fix gpio compiling error
  gpio: virtio: remove unneeded semicolon

3 years agoMerge tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 19 Nov 2021 18:50:11 +0000 (10:50 -0800)]
Merge tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This week's fixes, pretty quiet, about right for rc2. amdgpu is the
  bulk of them but the scheduler ones have been reported in a few places
  I think.

  Otherwise just some minor i915 fixes and a few other scattered around:

  scheduler:
   - two refcounting fixes

  cma-helper:
   - use correct free path for noncoherent

  efifb:
   - probing fix

  amdgpu:
   - Better debugging info for SMU msgs
   - Better error reporting when adding IP blocks
   - Fix UVD powergating regression on CZ
   - Clock reporting fix for navi1x
   - OLED panel backlight fix
   - Fix scaling on VGA/DVI for non-DC display code
   - Fix GLFCLK handling for RGP on some APUs
   - fix potential memory leak

  amdkfd:
   - GPU reset fix

  i915:
   - return error handling fix
   - ADL-P display fix
   - TGL DSI display clocks fix

  nouveau:
   - infoframe corruption fix

  sun4i:
   - Kconfig fix"

* tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/amdgpu: fix potential memleak
  drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
  drm/amd/pm: add GFXCLK/SCLK clocks level print support for APUs
  drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
  drm/amd/display: Fix OLED brightness control on eDP
  drm/amd/pm: Remove artificial freq level on Navi1x
  drm/amd/pm: avoid duplicate powergate/ungate setting
  drm/amdgpu: add error print when failing to add IP block(v2)
  drm/amd/pm: Enhanced reporting also for a stuck command
  drm/i915/guc: fix NULL vs IS_ERR() checking
  drm/i915/dsi/xelpd: Fix the bit mask for wakeup GB
  Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"
  fbdev: Prevent probing generic drivers if a FB is already registered
  drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder
  drm/scheduler: fix drm_sched_job_add_implicit_dependencies
  drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
  drm/cma-helper: Release non-coherent memory with dma_free_noncoherent()
  drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame