www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
4 years ago (maple_spf_fixes)
Liam R. Howlett [Mon, 4 Jan 2021 18:55:29 +0000 (13:55 -0500)]
mm/mmap: Convert __insert_vm_struct to use mas, convert vma_link to use vma_mas_link()

Drop unused vma_store()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:15:50 +0000 (15:15 -0500)]
mm: Remove vma linked list.

The vma linked list has been replaced by the maple tree iterators and the
vma_next() and vma_prev() functions.

As part of this change, free_pgd_range(), zap_page_range(), and
unmap_single_vma() are converted to the iterators as well.
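
As an illustration of the replacement pattern (a sketch assuming the
MA_STATE()/mas_for_each() maple tree iterator API used in this series):

MA_STATE(mas, &mm->mm_mt, 0, 0);
struct vm_area_struct *vma;

/* was: for (vma = mm->mmap; vma; vma = vma->vm_next) */
mas_for_each(&mas, vma, ULONG_MAX) {
        /* operate on each vma in address order */
}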

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:10:54 +0000 (15:10 -0500)]
mm/util: Remove __vma_link_list() and __vma_unlink_list()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:10:18 +0000 (15:10 -0500)]
mm/nommu: Stop inserting into the vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:07:11 +0000 (15:07 -0500)]
mm/swapfile: Use maple tree iterator instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:06:47 +0000 (15:06 -0500)]
mm/pagewalk: Use vma_next() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:06:25 +0000 (15:06 -0500)]
mm/oom_kill: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:04:42 +0000 (15:04 -0500)]
mm/nommu: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:04:14 +0000 (15:04 -0500)]
mm/msync: Use vma_next() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:03:49 +0000 (15:03 -0500)]
mm/mremap: Use vma_next() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:02:28 +0000 (15:02 -0500)]
mm/mprotect: Use maple tree navigation instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:00:14 +0000 (15:00 -0500)]
mm/mlock: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:59:52 +0000 (14:59 -0500)]
mm/mempolicy: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:59:05 +0000 (14:59 -0500)]
mm/memcontrol: Stop using mm->highest_vm_end

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:58:35 +0000 (14:58 -0500)]
mm/madvise: Use vma_next instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:58:11 +0000 (14:58 -0500)]
mm/ksm: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:57:47 +0000 (14:57 -0500)]
mm/khugepaged: Use maple tree iterators instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:56:58 +0000 (14:56 -0500)]
mm/huge_memory: Use vma_next() instead of vma linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:56:20 +0000 (14:56 -0500)]
mm/gup: Use maple tree navigation instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:54:49 +0000 (14:54 -0500)]
kernel/sys: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:54:07 +0000 (14:54 -0500)]
kernel/sched/fair: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:53:01 +0000 (14:53 -0500)]
kernel/events/uprobes: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:52:39 +0000 (14:52 -0500)]
kernel/events/core: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:52:15 +0000 (14:52 -0500)]
kernel/acct: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:51:19 +0000 (14:51 -0500)]
ipc/shm: Stop using the vma linked list

When searching for a VMA, a maple state can be used instead of the linked
list in the mm_struct.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:49:47 +0000 (14:49 -0500)]
fs/userfaultfd: Stop using vma linked list.

Don't use the mm_struct linked list or vma->vm_next, in preparation for
their removal.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:47:48 +0000 (14:47 -0500)]
fs/proc/task_mmu: Stop using linked list and highest_vm_end

Remove references to the mm_struct linked list and highest_vm_end, in
preparation for their removal.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:46:23 +0000 (14:46 -0500)]
fs/proc/base: Use maple tree iterators in place of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:45:37 +0000 (14:45 -0500)]
fs/exec: Use vma_next() instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:44:45 +0000 (14:44 -0500)]
fs/coredump: Use maple tree iterators in place of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:44:16 +0000 (14:44 -0500)]
fs/binfmt_elf: Use maple tree iterators for fill_files_note()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:43:36 +0000 (14:43 -0500)]
drivers/tee/optee: Use maple tree iterators for __check_mem_type()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:42:37 +0000 (14:42 -0500)]
drivers/oprofile: Lookup address in tree instead of linked list.

Use the vma interface to find the vma, if one exists, instead of walking
the linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:31:50 +0000 (14:31 -0500)]
drivers/misc/cxl: Use maple tree iterators for cxl_prefault_vma()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:30:59 +0000 (14:30 -0500)]
arch/xtensa: Use maple tree iterators for unmapped area

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:30:25 +0000 (14:30 -0500)]
arch/x86: Use maple tree iterators for vdso/vma

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:29:56 +0000 (14:29 -0500)]
arch/um: Use maple tree iterators instead of linked list

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:29:19 +0000 (14:29 -0500)]
arch/s390: Use maple tree iterators instead of linked list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:28:48 +0000 (14:28 -0500)]
arch/powerpc: Optimize cell spu task sync.

Use the vma API to look up the spu reference instead of walking the linked
list.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:26:34 +0000 (14:26 -0500)]
arch/powerpc: Remove mmap linked list from mm/book2s32/subpage_prot

Start using the maple tree

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:25:54 +0000 (14:25 -0500)]
arch/powerpc: Remove mmap linked list from mm/book2s32/tlb

Start using the maple tree

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:25:19 +0000 (14:25 -0500)]
arch/parisc: Remove mmap linked list from kernel/cache

Start using the maple tree

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 19:24:40 +0000 (14:24 -0500)]
arch/arm64: Remove mmap linked list from vdso.

Start using the maple tree

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 4 Jan 2021 20:13:14 +0000 (15:13 -0500)]
mm: Introduce vma_next() and vma_prev()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Wed, 25 Nov 2020 01:24:10 +0000 (20:24 -0500)]
mmap: make remove_vma_list() inline

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 1 Dec 2020 02:30:04 +0000 (21:30 -0500)]
mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()

do_brk_munmap() already has aligned addresses.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 24 Nov 2020 21:27:58 +0000 (16:27 -0500)]
mmap: Remove __do_munmap() in favour of do_mas_munmap()

Export the new interface and use it in place of the old interface.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 24 Nov 2020 21:23:38 +0000 (16:23 -0500)]
mmap: Use find_vma_intersection in do_mmap() for overlap

When detecting a conflict with MAP_FIXED_NOREPLACE, using the new
interface avoids the need for a temp variable.
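
A sketch of the check this enables (per the description; the exact call
site may differ):

/* any existing VMA overlapping [addr, addr + len) is a conflict */
if (find_vma_intersection(mm, addr, addr + len))
        return -EEXIST;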

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 24 Nov 2020 19:50:43 +0000 (14:50 -0500)]
mm/mmap: Add do_mas_munmap() and wrapper for __do_munmap()

To avoid extra tree work, it is necessary to support passing in a maple state
to key functions.  Start this work with __do_munmap().
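
The assumed shape of the new entry point (a sketch based on this
description, not necessarily the final signature):

/* Callers that already hold a maple state pass it in, so the walk
 * done for the lookup can be reused for the munmap itself. */
int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
                  unsigned long start, size_t len,
                  struct list_head *uf, bool downgrade);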

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Wed, 27 Jan 2021 15:25:13 +0000 (10:25 -0500)]
mm/mmap: Move mmap_region() below do_munmap()

Relocation of code for the next commit.  There should be no changes
here.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Thu, 19 Nov 2020 17:57:23 +0000 (12:57 -0500)]
mm/mmap: Change __do_munmap() to avoid unnecessary lookups.

As there is no longer a vmacache, find_vma() is more expensive, so avoid
unnecessary lookups.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 16 Nov 2020 19:50:20 +0000 (14:50 -0500)]
mm: Remove vmacache

The maple tree is able to find a VMA quickly enough to no longer need the
vma cache.  Remove the vmacache to reduce the work of keeping it up to
date and to reduce code complexity.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Wed, 11 Nov 2020 18:32:57 +0000 (13:32 -0500)]
mm/mmap: Drop munmap_vma_range()

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 10 Nov 2020 18:37:40 +0000 (13:37 -0500)]
mm/mmap: Change mmap_region to use maple tree state

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 28 Sep 2020 19:50:19 +0000 (15:50 -0400)]
mm: Change find_vma_intersection to maple tree and make find_vma inline

Move find_vma_intersection() to mmap.c and change the implementation to
use the maple tree.

When searching for a vma within a range, it is easier to use the maple
tree interface.  This means find_vma() becomes a special case of
find_vma_intersection().  find_vma_intersection() is exported for the kvm
module.
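
A sketch of that relationship, assuming find_vma_intersection() returns
the first VMA overlapping [start_addr, end_addr):

static inline struct vm_area_struct *find_vma(struct mm_struct *mm,
                                              unsigned long addr)
{
        /* "intersect" with the whole address space above addr */
        return find_vma_intersection(mm, addr, ULONG_MAX);
}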

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 9 Oct 2020 19:57:40 +0000 (15:57 -0400)]
mm/mmap: Change vm_brk_flags() to use mm_populate_vma()

Skip searching for the vma to populate after creating it by passing in a pointer.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 21 Sep 2020 14:47:34 +0000 (10:47 -0400)]
mm/mmap: Change do_brk_flags() to expand existing VMA and add
do_brk_munmap()

Avoid allocating a new VMA when it is not necessary.  Expand or contract
the existing VMA instead.  This avoids unnecessary tree manipulations
and allocations.

Once the VMA is known, use it directly when populating to avoid
unnecessary lookup work.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Wed, 2 Dec 2020 18:15:58 +0000 (13:15 -0500)]
mm/gup: Add mm_populate_vma() for use when the vma is known

When a vma is known, avoid calling mm_populate to search for the vma to
populate.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 19:06:25 +0000 (15:06 -0400)]
mm: Remove rb tree.

Remove the RB tree and start using the maple tree for vm_area_struct
tracking.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 17:32:30 +0000 (13:32 -0400)]
kernel/fork: Convert dup_mmap to use maple tree

Use the maple tree iterator to duplicate the mm_struct trees.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 16:01:47 +0000 (12:01 -0400)]
mm/mmap: Change unmapped_area and unmapped_area_topdown to use maple
tree

Use the new maple tree data structure to find an unmapped area.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 15:48:08 +0000 (11:48 -0400)]
mm/mmap: Change find_vma_prev() to use maple tree

Change the implementation of find_vma_prev to use the new maple tree
data structure.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 15:38:39 +0000 (11:38 -0400)]
mm/mmap: Change find_vma() to use the maple tree

Start using the maple tree to find VMA entries in an mm_struct.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 21 Sep 2020 19:37:14 +0000 (15:37 -0400)]
mm/mmap: Introduce unlock_range() for code cleanup

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 17:04:30 +0000 (13:04 -0400)]
mm: Start tracking VMAs with maple tree

Start tracking the VMAs with the new maple tree structure in parallel
with the rb_tree.  Add debug and trace events for maple tree operations
and duplicate the rb_tree that is created on forks into the maple tree.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Fri, 24 Jul 2020 17:02:52 +0000 (13:02 -0400)]
Maple Tree: Add new data structure

The Maple Tree is a new data structure optimised for storing
ranges.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
4 years ago
Liam R. Howlett [Thu, 14 Jan 2021 01:30:02 +0000 (20:30 -0500)]
radix tree test suite: Add __must_be_array() support

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Sat, 17 Oct 2020 00:05:01 +0000 (20:05 -0400)]
radix tree test suite: Add kmem_cache_alloc_bulk() support

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Tue, 25 Aug 2020 19:51:54 +0000 (15:51 -0400)]
radix tree test suite: Add support for kmem_cache_free_bulk

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Mon, 27 Jul 2020 19:28:58 +0000 (15:28 -0400)]
radix tree test suite: Add support for fallthrough attribute

Add support for fallthrough in case statements.  Note this does *NOT*
check for missing fallthrough, but it does allow code that uses
fallthrough in case statements to compile.
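
A sketch of the usual shim (the exact guards used by the test suite may
differ):

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Map the kernel's fallthrough keyword onto the compiler attribute
 * where available, and onto a no-op otherwise. */
#if __has_attribute(__fallthrough__)
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif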

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
4 years ago
Liam R. Howlett [Thu, 27 Feb 2020 20:13:20 +0000 (15:13 -0500)]
radix tree test suite: Enhancements for Maple Tree

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
4 years ago
Linus Torvalds [Thu, 31 Dec 2020 22:05:12 +0000 (22:05 +0000)]
pci: test for unexpectedly disabled bridges

The all-ones value is not just a "device didn't exist" case, it's also
potentially a quite valid value, so not restoring it would be wrong.

What *would* be interesting is to hear where the bad values came from in
the first place.  It sounds like the device state is saved after the PCI
bus controller in front of the device has been crapped on, resulting in the
PCI config cycles never reaching the device at all.

Something along the lines of this patch (together with suspend/resume
debugging output) might help pinpoint it.  But it really sounds like
something totally brokenly turned off the PCI bridge (some ACPI shutdown
crud?  I wouldn't be entirely surprised).

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Andrew Morton [Thu, 31 Dec 2020 22:05:11 +0000 (22:05 +0000)]
kernel/fork.c: export kernel_thread() to modules

mutex-subsystem-synchro-test-module.patch needs this

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
David Howells [Thu, 31 Dec 2020 22:05:09 +0000 (22:05 +0000)]
mutex subsystem, synchro-test module

The attached patch adds a module for testing and benchmarking mutexes,
semaphores and R/W semaphores.

Using it is simple:

insmod synchro-test.ko <args>

It will exit with error ENOANO after running the tests and printing the
results to the kernel console log.

The available arguments are:

 (*) mx=N

Start up to N mutex thrashing threads, where N is at most 20. All will
try and thrash the same mutex.

 (*) sm=N

Start up to N counting semaphore thrashing threads, where N is at most
20. All will try and thrash the same semaphore.

 (*) ism=M

Initialise the counting semaphore with M, where M is any integer greater
than zero. The default is 4.

 (*) rd=N
 (*) wr=O
 (*) dg=P

Start up to N reader thrashing threads, O writer thrashing threads and
P downgrader thrashing threads, where N, O and P are at most 20
apiece. All will try and thrash the same read/write semaphore.

 (*) elapse=N

Run the tests for N seconds. The default is 5.

 (*) load=N

Each thread delays for N uS whilst holding the lock. The default is 0.

 (*) interval=N

Each thread delays for N uS whilst not holding the lock. The default
is 0.

 (*) do_sched=1

Each thread will call schedule() if required after each iteration.

 (*) v=1

Print more verbose information, including a thread iteration
distribution list.
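
An illustrative invocation (argument values chosen arbitrarily):

insmod synchro-test.ko mx=5 sm=5 rd=2 wr=2 elapse=10 v=1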

The module should be enabled by setting CONFIG_DEBUG_SYNCHRO_TEST to "m".

[randy.dunlap@oracle.com: fix build errors, add <sched.h> header file]
[akpm@linux-foundation.org: remove smp_lock.h inclusion]
[viro@ZenIV.linux.org.uk: kill daemonize() calls]
[rdunlap@xenotime.net: fix printk format warnings]
[walken@google.com: add spinlock test]
[walken@google.com: document default load and interval values]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Matthew Wilcox [Thu, 31 Dec 2020 22:05:08 +0000 (22:05 +0000)]
Releasing resources with children

What does it mean to release a resource with children?  Should the children
become children of the released resource's parent?  Should they be released
too?  Should we fail the release?

I bet we have no callers who expect this right now, but with
insert_resource() we may get some.  At the point where someone hits this
BUG we can figure out what semantics we want.

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Matthew Wilcox [Thu, 31 Dec 2020 22:05:08 +0000 (22:05 +0000)]
Make sure nobody's leaking resources

Currently, releasing a resource also releases all of its children.  That
made sense when request_resource was the main method of dividing up the
memory map.  With the increased use of insert_resource, it seems to me that
we should instead reparent the newly orphaned resources.  Before we do
that, let's make sure that nobody's actually relying on the current
semantics.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Cc: Greg KH <greg@kroah.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
John Hubbard [Thu, 31 Dec 2020 22:05:07 +0000 (22:05 +0000)]
secretmem-test-add-basic-selftest-for-memfd_secret2-fix

fix build

Link: https://lkml.kernel.org/r/248f928b-1383-48ea-8584-ec10146e60c9@nvidia.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:07 +0000 (22:05 +0000)]
secretmem: test: add basic selftest for memfd_secret(2)

The test verifies that a file descriptor created with memfd_secret does
not allow read/write operations, that secret memory mappings respect
RLIMIT_MEMLOCK, and that remote accesses with process_vm_readv() and
ptrace() to the secret memory fail.

Link: https://lkml.kernel.org/r/20201203062949.5484-11-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Stephen Rothwell [Thu, 31 Dec 2020 22:05:07 +0000 (22:05 +0000)]
arch-mm-wire-up-memfd_secret-system-call-were-relevant-fix-fix

fix syscall numbering

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Andrew Morton [Thu, 31 Dec 2020 22:05:06 +0000 (22:05 +0000)]
arch-mm-wire-up-memfd_secret-system-call-were-relevant-fix

Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:06 +0000 (22:05 +0000)]
arch, mm: wire up memfd_secret system call where relevant

Wire up the memfd_secret system call on architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.

Link: https://lkml.kernel.org/r/20201203062949.5484-10-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:05 +0000 (22:05 +0000)]
PM: hibernate: disable when there are active secretmem users

It is unsafe to allow saving of secretmem areas to the hibernation
snapshot, as they would be visible after resume, and this would
essentially defeat the purpose of secret memory mappings.

Prevent hibernation whenever there are active secret memory users.

Link: https://lkml.kernel.org/r/20201203062949.5484-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:04 +0000 (22:05 +0000)]
secretmem: add memcg accounting

Account memory consumed by secretmem to memcg. The accounting is updated
when the memory is actually allocated and freed.

Link: https://lkml.kernel.org/r/20201203062949.5484-8-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:04 +0000 (22:05 +0000)]
secretmem: use PMD-size pages to amortize direct map fragmentation

Removing a PAGE_SIZE page from the direct map every time such a page is
allocated for a secret memory mapping will cause severe fragmentation of
the direct map.  This fragmentation can be reduced by using PMD-size pages
as a pool of small pages for secret memory mappings.

Add a gen_pool per secretmem inode and lazily populate this pool with
PMD-size pages.
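
A minimal sketch of that idea using the lib/genalloc.c API (the
surrounding secretmem plumbing is elided; this is not the actual
implementation):

struct gen_pool *pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);

/* on demand: add a freshly allocated PMD-size page to the pool */
gen_pool_add(pool, (unsigned long)page_address(page), PMD_SIZE,
             NUMA_NO_NODE);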

As pages allocated by secretmem become unmovable, use CMA to back large
page caches so that the page allocator won't be surprised by a failing
attempt to migrate these pages.

The CMA area used by secretmem is controlled by the "secretmem=" kernel
parameter.  This allows explicit control over the memory available for
secretmem and provides an upper hard limit for secretmem consumption.

Link: https://lkml.kernel.org/r/20201203062949.5484-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:04 +0000 (22:05 +0000)]
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix

restore original white space

Link: https://lkml.kernel.org/r/20201220060536.GB392333@linux.ibm.com
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:03 +0000 (22:05 +0000)]
mm: introduce memfd_secret system call to create "secret" memory areas

Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.

The user will create a file descriptor using the memfd_secret() system
call. The memory areas created by mmap() calls from this file descriptor
will be unmapped from the kernel direct map and will be mapped only in the
page table of the owning mm.

The secret memory remains accessible in the process context using uaccess
primitives, but it is not accessible using direct/linear map addresses.

Functions in the follow_page()/get_user_pages() family will refuse to
return a page that belongs to the secret memory area.

A page that was a part of the secret memory area is cleared when it is
freed.

The following example demonstrates creation of a secret mapping (error
handling is omitted):

fd = memfd_secret(0);                    /* create the secret memory fd */
ftruncate(fd, MAP_SIZE);                 /* size the secret area */
ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Link: https://lkml.kernel.org/r/20201203062949.5484-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Anders Roxell [Thu, 31 Dec 2020 22:05:02 +0000 (22:05 +0000)]
kfence: fix implicit function declaration

When building kfence the following error shows up:

In file included from mm/kfence/report.c:13:
arch/arm64/include/asm/kfence.h: In function `kfence_protect_page':
arch/arm64/include/asm/kfence.h:12:2: error: implicit declaration of function `set_memory_valid' [-Werror=implicit-function-declaration]
   12 |  set_memory_valid(addr, 1, !protect);
      |  ^~~~~~~~~~~~~~~~

Use the correct include; both
f2b7c491916d ("set_memory: allow querying whether set_direct_map_*() is actually enabled")
and 4c4c75881536 ("arm64, kfence: enable KFENCE for ARM64") went in on the
same day via different trees.

Link: https://lkml.kernel.org/r/20201204121804.1532849-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:02 +0000 (22:05 +0000)]
set_memory: allow querying whether set_direct_map_*() is actually enabled

On arm64, set_direct_map_*() functions may return 0 without actually
changing the linear map. This behaviour can be controlled using kernel
parameters, so we need a way to determine at runtime whether calls to
set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have
any effect.

Extend the set_memory API with a can_set_direct_map() function that allows
checking whether calling set_direct_map_*() will actually change the page
table, replace several occurrences of open-coded checks in arm64 with the
new function, and provide a generic stub for architectures that always
modify page tables upon calls to the set_direct_map APIs.
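
The generic stub implied by the description (a sketch):

/* Architectures that always modify the page tables in the
 * set_direct_map_*() calls simply report true. */
static inline bool can_set_direct_map(void)
{
        return true;
}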

Link: https://lkml.kernel.org/r/20201203062949.5484-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Andrew Morton [Thu, 31 Dec 2020 22:05:01 +0000 (22:05 +0000)]
set_memory-allow-set_direct_map__noflush-for-multiple-pages-fix

fix kernel/power/snapshot.c

Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:01 +0000 (22:05 +0000)]
set_memory: allow set_direct_map_*_noflush() for multiple pages

The underlying implementations of set_direct_map_invalid_noflush() and
set_direct_map_default_noflush() allow updating multiple contiguous pages
at once.

Add a numpages parameter to set_direct_map_*_noflush() to expose this
ability through these APIs.
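
The resulting prototypes, as implied by the description (a sketch):

int set_direct_map_invalid_noflush(struct page *page, int numpages);
int set_direct_map_default_noflush(struct page *page, int numpages);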

Link: https://lkml.kernel.org/r/20201203062949.5484-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:05:00 +0000 (22:05 +0000)]
mmap: make mlock_future_check() global

It will be used by the upcoming secret memory implementation.

Link: https://lkml.kernel.org/r/20201203062949.5484-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Mike Rapoport [Thu, 31 Dec 2020 22:04:59 +0000 (22:04 +0000)]
mm: add definition of PMD_PAGE_ORDER

Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v14.

This is an implementation of "secret" mappings backed by a file descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call.  The desired protection mode for the
memory is configured using the flags parameter of the system call.  The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping.  The pages in that mapping will be marked as not present
in the direct map and will be present only in the page table of the owning
mm.

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.

Additionally, in the future the secret mappings may be used as a means to
protect guest memory in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

that does two things: the first is to act as a preloader for openssl,
redirecting all the OPENSSL_malloc calls to secret memory, meaning any
secret keys get automatically protected this way; the other thing it does
is expose the API to the user who needs it.  We anticipate that a lot of
the use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g.  page migration callbacks.

The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native"
mm ABIs in the future.

To limit fragmentation of the direct map to splitting only PUD-size pages,
I've added an amortizing cache of PMD-size pages to each file descriptor
that is used as an allocation pool for the secret memory areas.

As the memory allocated by secretmem becomes unmovable, we use CMA to back
large page caches so that the page allocator won't be surprised by a
failing attempt to migrate these pages.

This patch (of 10):

The definition of PMD_PAGE_ORDER, denoting the number of base pages in a
second-level leaf page, is already used by DAX and may be handy in other
cases as well.

Several architectures already have a definition of PMD_ORDER as the size
of a second-level page table, so to avoid conflict with these definitions,
use the PMD_PAGE_ORDER name and update DAX accordingly.
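
The definition this implies (a sketch from the description):

/* number of base pages in a PMD-size leaf page */
#define PMD_PAGE_ORDER  (PMD_SHIFT - PAGE_SHIFT)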

Link: https://lkml.kernel.org/r/20201203062949.5484-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201203062949.5484-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Andrew Morton [Thu, 31 Dec 2020 22:04:59 +0000 (22:04 +0000)]
linux-next-git-rejects

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Andrew Morton [Thu, 31 Dec 2020 22:04:58 +0000 (22:04 +0000)]
linux-next

GIT e775f22d9be6cc53a0b81a0b15bde19d463bbccc

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Oleg Nesterov [Thu, 31 Dec 2020 22:04:43 +0000 (22:04 +0000)]
aio: simplify read_events()

Change wait_event_hrtimeout() to not call __wait_event_hrtimeout() if
timeout == 0; this matches the other _timeout() helpers in wait.h.

This allows simplifying its only user, read_events(): it no longer needs
to optimize the "until == 0" case by hand.

Note: this patch doesn't use ___wait_cond_timeout because _hrtimeout()
also differs in that it returns 0 on success and -ETIME on timeout.
Perhaps we should change this to make it fully compatible with the other
helpers.

Link: http://lkml.kernel.org/r/20190607175413.GA29187@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Yuqi Jin [Thu, 31 Dec 2020 22:04:43 +0000 (22:04 +0000)]
lib: optimize cpumask_local_spread()

In a multi-processor NUMA system, an I/O driver must find the cpu cores to
which its IRQs shall be bound.  When the cpu cores in the local NUMA node
have been used up, it is better for performance to find the node closest
to the local NUMA node, instead of choosing any online cpu immediately.

On an arm64 or x86 platform that has 2 sockets and 4 NUMA nodes, if the
network card is located in node2 of socket1 and the number of network card
queues is greater than the number of cores in node2, then once all cores
of node2 have been bound to the queues, the remaining queues will be bound
to the cores of node0, which is further away than NUMA node3.  This is not
friendly for performance, or for Intel's DDIO (Data Direct I/O
Technology), if the user enables SNC (sub-NUMA clustering).  Let's improve
it and find the nearest unused node through NUMA distance for the
non-local NUMA nodes.

On a Huawei Kunpeng 920 server, there are 4 NUMA nodes (0-3) in the 2-cpu
system (0-1).  The topology of this server is as follows:
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 63379 MB
node 0 free: 61899 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 64509 MB
node 1 free: 63942 MB
node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 2 size: 64509 MB
node 2 free: 63056 MB
node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 3 size: 63997 MB
node 3 free: 63420 MB
node distances:
node   0   1   2   3
  0:  10  16  32  33
  1:  16  10  25  32
  2:  32  25  10  16
  3:  33  32  16  10

We performed a PS (parameter server) business test; the behavior of the
service is that the client initiates a request through the network card
and the server responds to the request after calculation.  When two PS
processes run on node2 and node3 separately and the network card is
located on 'node2', which is in cpu1, the performance of node2 (26W QPS)
and node3 (22W QPS) is different.

It would be better if the NIC queues were bound to the cpu1 cores in turn;
then XPS would also be properly initialized.  But cpumask_local_spread()
only considers the local node, and when the number of NIC queues exceeds
the number of cores in the local node, it returns an online core directly.
So when PS runs on node3 sending a calculated request, the performance is
not as good as on node2.

With this patch, IRQs 369-392 will be bound to NUMA node3 instead of NUMA
node0.  Before the patch:

Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list
0
Euler:/sys/bus/pci # cat /proc/irq/370/smp_affinity_list
1
...
Euler:/sys/bus/pci # cat /proc/irq/391/smp_affinity_list
22
Euler:/sys/bus/pci # cat /proc/irq/392/smp_affinity_list
23
After the patch:
Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list
72
Euler:/sys/bus/pci # cat /proc/irq/370/smp_affinity_list
73
...
Euler:/sys/bus/pci # cat /proc/irq/391/smp_affinity_list
94
Euler:/sys/bus/pci # cat /proc/irq/392/smp_affinity_list
95

So with the patch, the performance of node3 is the same as node2's, i.e.
26W QPS, while the network card is still in 'node2'.

Link: https://lkml.kernel.org/r/1605668072-44780-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Juergen Gross <jgross@suse.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Randy Dunlap [Thu, 31 Dec 2020 22:04:42 +0000 (22:04 +0000)]
lib/linear_ranges: fix repeated words & one typo

Change "which which" to "for which" in 3 places.
Change "ranges" to possessive "range's" in 1 place.

Link: https://lkml.kernel.org/r/20201221040610.12809-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Julius Hemanth Pitti [Thu, 31 Dec 2020 22:04:42 +0000 (22:04 +0000)]
proc/sysctl: make protected_* world readable

protected_* files have 600 permissions, which prevents non-superusers from
reading them.

Containers like "AWS greengrass" refuse to launch unless
protected_hardlinks and protected_symlinks are set.  When containers like
these run with "userns-remap" or "--user" mapping the container's root to
a non-superuser on the host, they fail to run due to denied read access to
these files.

As these protections are hardly a secret, and do not pose any security
risk, make them world readable.

Though the above greengrass use case needs read access only to the
protected_hardlinks and protected_symlinks files, set all other
protected_* files to 644 as well, to keep things consistent.

Link: http://lkml.kernel.org/r/20200709235115.56954-1-jpitti@cisco.com
Fixes: 800179c9b8a1 ("fs: add link restrictions")
Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Lin Feng [Thu, 31 Dec 2020 22:04:41 +0000 (22:04 +0000)]
sysctl.c: fix underflow value setting risk in vm_table

Apart from ctl_tables with a subsystem-specific .proc_handler, all
ctl_tables with the extra1 and extra2 members set should use
proc_dointvec_minmax instead of proc_dointvec; otherwise the limits set in
extra* never take effect, and echoing underflow values (negative numbers)
is likely to make the system unstable.

Especially for vfs_cache_pressure and zone_reclaim_mode, -1 is apparently
not a valid value, yet we can currently set them to it, and then the
kernel may crash.

# echo -1 > /proc/sys/vm/vfs_cache_pressure
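
An illustrative vm_table entry showing the intended pattern (the field
values here are examples, not the exact patch):

static int zero;

{
        .procname       = "vfs_cache_pressure",
        .data           = &sysctl_vfs_cache_pressure,
        .maxlen         = sizeof(sysctl_vfs_cache_pressure),
        .mode           = 0644,
        .proc_handler   = proc_dointvec_minmax, /* not proc_dointvec */
        .extra1         = &zero,                /* reject negative values */
},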

Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com
Signed-off-by: Lin Feng <linf@wangsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years ago
Helge Deller [Thu, 31 Dec 2020 22:04:41 +0000 (22:04 +0000)]
proc/wchan: use printk format instead of lookup_symbol_name()

To resolve the symbol function name for wchan, use the printk format
specifier %ps instead of manually looking up the symbol function name
via lookup_symbol_name().
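
A sketch of the change at the proc_pid_wchan() call site (assumed shape):

/* was: lookup_symbol_name(wchan, symname) + printing symname */
seq_printf(m, "%ps", (void *)wchan);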

Link: https://lkml.kernel.org/r/20201217165413.GA1959@ls3530.fritz.box
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>