www.infradead.org Git - users/jedix/linux-maple.git/log
3 days ago  rust: maple_tree: add MapleTree
Alice Ryhl [Tue, 2 Sep 2025 08:35:11 +0000 (08:35 +0000)]
rust: maple_tree: add MapleTree

Patch series "Add Rust abstraction for Maple Trees", v3.

This will be used in the Tyr driver [1] to allocate from the portion of
the GPU's VA space that is owned by the kernel rather than userspace, for
kernel GPU mappings.

Danilo tells me that in nouveau, the maple tree is used for keeping track
of "VM regions" on top of GPUVM, and that he will most likely end up doing
the same in the Rust Nova driver as well.

These abstractions intentionally do not expose any way to make use of
external locking.  You are required to use the internal spinlock.  For
now, we do not support loads that rely only on RCU for protection.

This contains some parts taken from Andrew Ballance's RFC [2] from April.
However, it has also been reworked significantly compared to that RFC
taking the use-cases in Tyr into account.

This patch (of 3):

The maple tree will be used in the Tyr driver to allocate and keep track
of GPU allocations created internally (i.e.  not by userspace).  It will
likely also be used in the Nova driver eventually.

This adds the simplest methods for addition and removal that do not
require any special care with respect to concurrency.
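
For context, a minimal sketch of how the underlying C maple tree API (which
this Rust abstraction wraps) is typically used with its internal lock; the
Rust method names added by this patch may differ:

  #include <linux/maple_tree.h>

  /* Tree protected by its internal spinlock (no external-lock flags set). */
  static DEFINE_MTREE(kernel_va_ranges);

  static int store_and_drop(void *region)
  {
          int err;

          /* Store "region" for the index range [0x1000, 0x1fff]. */
          err = mtree_insert_range(&kernel_va_ranges, 0x1000, 0x1fff,
                                   region, GFP_KERNEL);
          if (err)
                  return err;

          /* Any index inside the range resolves to the stored entry. */
          WARN_ON(mtree_load(&kernel_va_ranges, 0x1800) != region);

          /* Remove the entry again (returns the stored pointer). */
          mtree_erase(&kernel_va_ranges, 0x1000);
          return 0;
  }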

This implementation is based on the RFC by Andrew but with significant
changes to simplify the implementation.

Link: https://lkml.kernel.org/r/20250902-maple-tree-v3-0-fb5c8958fb1e@google.com
Link: https://lkml.kernel.org/r/20250902-maple-tree-v3-1-fb5c8958fb1e@google.com
Link: https://lore.kernel.org/r/20250627-tyr-v1-1-cb5f4c6ced46@collabora.com
Link: https://lore.kernel.org/r/20250405060154.1550858-1-andrewjballance@gmail.com
Co-developed-by: Andrew Ballance <andrewjballance@gmail.com>
Signed-off-by: Andrew Ballance <andrewjballance@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Almeida <daniel.almeida@collabora.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Trevor Gross <tmgross@umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm/show_mem: add trylock while printing alloc info
Yueyang Pan [Wed, 3 Sep 2025 11:16:14 +0000 (04:16 -0700)]
mm/show_mem: add trylock while printing alloc info

In production, show_mem() can be called concurrently from two different
entities, for example one from oom_kill_process() and another from
__alloc_pages_slowpath() in another kthread.  This patch adds a spinlock
and takes it with trylock before printing the kernel alloc info in
show_mem().  This way two alloc info dumps won't interleave with each
other, which makes parsing easier.
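
A minimal sketch of the approach (the lock and function names here are
illustrative, not necessarily those used in the patch): guard the dump with
a dedicated spinlock and simply skip printing if another context holds it:

  static DEFINE_SPINLOCK(alloc_info_print_lock);

  static void show_mem_print_alloc_info(void)
  {
          /* Bail out rather than block if another CPU is already printing. */
          if (!spin_trylock(&alloc_info_print_lock))
                  return;

          /* ... print the per-callsite allocation info here ... */

          spin_unlock(&alloc_info_print_lock);
  }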

Link: https://lkml.kernel.org/r/4ed91296e0c595d945a38458f7a8d9611b0c1e52.1756897825.git.pyyjason@gmail.com
Signed-off-by: Yueyang Pan <pyyjason@gmail.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm/show_mem: dump the status of the mem alloc profiling before printing
Yueyang Pan [Wed, 3 Sep 2025 11:16:13 +0000 (04:16 -0700)]
mm/show_mem: dump the status of the mem alloc profiling before printing

This patchset fixes two issues we saw in production rollout.

The first issue is that we saw all-zero output of memory allocation
profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING is set
and sysctl.vm.mem_profiling=0.  This causes ambiguity as we don't know
what 0B actually means in the output: it can mean either that memory
allocation profiling is temporarily disabled or that the allocation at
that position is actually 0.  Such ambiguity makes further parsing harder
as we cannot differentiate between the two cases.

The second issue is that multiple entities can call show_mem() which
messed up the allocation info in dmesg.  We saw outputs like this:

    327 MiB    83635 mm/compaction.c:1880 func:compaction_alloc
   48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
   7.48 GiB    10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
    298 MiB    95216 kernel/fork.c:318 func:alloc_thread_stack_node
    250 MiB    63901 mm/zsmalloc.c:987 func:alloc_zspage
    1.42 GiB   372527 mm/memory.c:1063 func:folio_prealloc
    1.17 GiB    95693 mm/slub.c:2424 func:alloc_slab_page
     651 MiB   166732 mm/readahead.c:270 func:page_cache_ra_unbounded
     419 MiB   107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
     404 MiB   103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one

The above example happens because one kthread invokes show_mem() from
__alloc_pages_slowpath() while the kernel itself calls oom_kill_process().

This patch (of 2):

This patch prints the status of memory allocation profiling before
__show_mem() prints the detailed allocation info.  This way we know
whether a `0B` in the allocation info means that profiling is disabled or
that the allocation is actually 0B.
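
A hedged sketch of the idea; mem_alloc_profiling_enabled() is the existing
helper, while the function name and message text here are illustrative:

  static void show_mem_profiling_status(void)
  {
  #ifdef CONFIG_MEM_ALLOC_PROFILING
          /* Make a later "0B" unambiguous: say whether profiling is on at all. */
          pr_info("Memory allocation profiling: %s\n",
                  mem_alloc_profiling_enabled() ? "enabled" : "disabled");
  #endif
  }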

Link: https://lkml.kernel.org/r/cover.1756897825.git.pyyjason@gmail.com
Link: https://lkml.kernel.org/r/d7998ea0ddc2ea1a78bb6e89adf530526f76679a.1756897825.git.pyyjason@gmail.com
Signed-off-by: Yueyang Pan <pyyjason@gmail.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()
Lorenzo Stoakes [Wed, 3 Sep 2025 17:48:42 +0000 (18:48 +0100)]
mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()

In commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for
nested file systems") we introduced the ability for stacked drivers and
file systems to correctly invoke the f_op->mmap_prepare() handler from an
f_op->mmap() handler via a compatibility layer implemented in
compat_vma_mmap_prepare().

This populates vm_area_desc fields according to those found in the (not
yet fully initialised) VMA passed to f_op->mmap().

However this function implicitly assumes that the struct file which we are
operating upon is equal to vma->vm_file.  This is not a safe assumption in
all cases.

The only really sane situation in which this matters would be something
like i915_gem_dmabuf_mmap(), which invokes vfs_mmap() against
obj->base.filp:

	ret = vfs_mmap(obj->base.filp, vma);
	if (ret)
		return ret;

And then sets the VMA's file to this, should the mmap operation succeed:

	vma_set_file(vma, obj->base.filp);

That is - it is the file that is intended to back the VMA mapping.

This is not an issue currently, as so far we have only implemented
f_op->mmap_prepare() handlers for some file systems and internal mm uses,
and the only stacked f_op->mmap() operations that can be performed upon
these are those in backing_file_mmap() and coda_file_mmap(), both of which
use vma->vm_file.

However, moving forward, as we convert drivers to using
f_op->mmap_prepare(), this will become a problem.

Resolve this issue by explicitly setting desc->file to the provided file
parameter and update callers accordingly.

Callers are expected to read desc->file and update desc->vm_file - the
former will be the file provided by the caller (if stacked, this may
differ from vma->vm_file).

If the caller needs to differentiate between the two they therefore now
can.

While we are here, also provide a variant of compat_vma_mmap_prepare()
that operates against a pointer to any file_operations struct and does not
assume that the file_operations struct we are interested in is file->f_op.

This function is __compat_vma_mmap_prepare() and we invoke it from
compat_vma_mmap_prepare() so that we share code between the two functions.

This is important, because some drivers provide hooks in a separate
struct, for instance struct drm_device provides an fops field for this
purpose.
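
A rough sketch of the shape described above, using the names from this
message with the desc population abbreviated; desc->file carries the file
the caller was invoked with, which may differ from vma->vm_file:

  int __compat_vma_mmap_prepare(const struct file_operations *f_op,
                                struct file *file, struct vm_area_struct *vma)
  {
          struct vm_area_desc desc;
          int err;

          /* ... populate desc from the (not yet fully initialised) VMA ... */
          desc.vm_file = vma->vm_file;  /* what will become the VMA's file */
          desc.file = file;             /* the file we were invoked with */

          err = f_op->mmap_prepare(&desc);
          if (err)
                  return err;

          set_vma_from_desc(vma, &desc);
          return 0;
  }

  int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
  {
          return __compat_vma_mmap_prepare(file->f_op, file, vma);
  }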

Also update the VMA selftests accordingly.

Link: https://lkml.kernel.org/r/dd0c72df8a33e8ffaa243eeb9b01010b670610e9.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: specify separate file and vm_file params in vm_area_desc
Lorenzo Stoakes [Wed, 3 Sep 2025 17:48:41 +0000 (18:48 +0100)]
mm: specify separate file and vm_file params in vm_area_desc

Patch series "mm: do not assume file == vma->vm_file in
compat_vma_mmap_prepare()", v2.

As part of the efforts to eliminate the problematic f_op->mmap callback, a
new callback - f_op->mmap_prepare was provided.

While we are converting these callbacks, we must deal with 'stacked'
filesystems and drivers - those which in their own f_op->mmap callback
invoke an inner f_op->mmap callback.

To accommodate this, a compatibility layer is provided which, via
vfs_mmap(), detects whether f_op->mmap_prepare is provided and, if so,
generates a vm_area_desc containing the VMA's metadata and invokes the
hook.

So far, we have provided desc->file equal to vma->vm_file.  However this
is not necessarily valid, especially in the case of stacked drivers which
wish to assign a new file after the inner hook is invoked.

To account for this, we adjust vm_area_desc to have both file and vm_file
fields.  The .vm_file field is strictly set to vma->vm_file (or in the
case of a new mapping, what will become vma->vm_file).

However, .file is set to whichever file vfs_mmap() is invoked with when
using the compatibility layer.

Therefore, if the VMA's file needs to be updated in .mmap_prepare,
desc->vm_file should be assigned, whilst desc->file should be read.

No current f_op->mmap_prepare users assign desc->file so this is safe to
do.

This makes the .mmap_prepare callback in the context of a stacked
filesystem or driver completely consistent with the existing .mmap
implementations.

While we're here, we do a few small cleanups, and ensure that we const-ify
things correctly in the vm_area_desc struct to avoid hooks accidentally
trying to assign fields they should not.

This patch (of 2):

Stacked filesystems and drivers may invoke mmap hooks with a struct file
pointer that differs from the overlying file.  We will make this
functionality possible in a subsequent patch.

In order to prepare for this, let's update vm_area_struct to separately
provide desc->file and desc->vm_file parameters.

The desc->file parameter is the file that the hook is expected to operate
upon, and is not assignable (though the hook may wish to, for instance,
update the file's accessed time).

The desc->vm_file defaults to what will become vma->vm_file and is what
the hook must reassign should it wish to change the VMA's vma->vm_file.

For now we keep desc->file, vm_file the same to remain consistent.

No f_op->mmap_prepare() callback sets a new vma->vm_file currently, so
this is safe to change.

While we're here, make the mm_struct pointed to by desc->mm immutable, as
well as the desc->mm field itself.
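
A sketch of the resulting split in vm_area_desc (other members omitted;
the exact const placement is per the patch and is an assumption here):

  struct vm_area_desc {
          /* Immutable context for the hook: */
          const struct mm_struct *const mm;
          struct file *const file;  /* file the hook was invoked with: read it */

          /* Mutable: reassign this to change what becomes vma->vm_file. */
          struct file *vm_file;

          /* ... start, end, pgoff, vm_flags, ... */
  };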

As part of this change, also update the single hook which this would
otherwise break - mlock_future_ok(), invoked by secretmem_mmap_prepare().

We additionally update set_vma_from_desc() to compare fields in a more
logical fashion, checking the (possibly) user-modified fields as the first
operand against the existing value as the second one.

Additionally, update VMA tests to accommodate changes.

Link: https://lkml.kernel.org/r/cover.1756920635.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3fa15a861bb7419f033d22970598aa61850ea267.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  virtio_balloon: stop calling page_address() in free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:21 +0000 (11:59 -0700)]
virtio_balloon: stop calling page_address() in free_pages()

free_pages() should be used when we only have a virtual address.  We
should call __free_pages() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-8-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  arm64: stop calling page_address() in free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:20 +0000 (11:59 -0700)]
arm64: stop calling page_address() in free_pages()

free_pages() should be used when we only have a virtual address.  We
should call __free_pages() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-7-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  powerpc: stop calling page_address() in free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:19 +0000 (11:59 -0700)]
powerpc: stop calling page_address() in free_pages()

free_pages() should be used when we only have a virtual address.  We
should call __free_pages() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-6-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  riscv: stop calling page_address() in free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:18 +0000 (11:59 -0700)]
riscv: stop calling page_address() in free_pages()

free_pages() should be used when we only have a virtual address.  We
should call __free_pages() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-5-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  x86: stop calling page_address() in free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:17 +0000 (11:59 -0700)]
x86: stop calling page_address() in free_pages()

free_pages() should be used when we only have a virtual address.  We
should call __free_pages() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  aoe: stop calling page_address() in free_page()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:16 +0000 (11:59 -0700)]
aoe: stop calling page_address() in free_page()

free_page() should be used when we only have a virtual address.  We should
call __free_page() directly on our page instead.

Link: https://lkml.kernel.org/r/20250903185921.1785167-3-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm/page_alloc: add kernel-docs for free_pages()
Vishal Moola (Oracle) [Wed, 3 Sep 2025 18:59:15 +0000 (11:59 -0700)]
mm/page_alloc: add kernel-docs for free_pages()

Patch series "Cleanup free_pages() misuse", v3.

free_pages() is supposed to be called when we only have a virtual address.
__free_pages() is supposed to be called when we have a page.

There are a number of callers that use page_address() to get a page's
virtual address then call free_pages() on it when they should just call
__free_pages() directly.

Add kernel-docs for free_pages() to help callers better understand which
function they should be calling, and replace the obvious cases of misuse.
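
As a concrete illustration of the misuse pattern and its replacement (a
hedged sketch, not a specific call site from this series):

  /* Before: detour through the virtual address just to free the page. */
  static void put_buf_old(struct page *page, unsigned int order)
  {
          free_pages((unsigned long)page_address(page), order);
  }

  /* After: free the page we already hold directly. */
  static void put_buf_new(struct page *page, unsigned int order)
  {
          __free_pages(page, order);
  }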

This patch (of 7):

Add kernel-docs to free_pages().  This will help callers understand when
to use it instead of __free_pages().

Link: https://lkml.kernel.org/r/20250903185921.1785167-1-vishal.moola@gmail.com
Link: https://lkml.kernel.org/r/20250903185921.1785167-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: remove mlock_count from struct page
Matthew Wilcox (Oracle) [Wed, 3 Sep 2025 19:10:39 +0000 (20:10 +0100)]
mm: remove mlock_count from struct page

All users now use folio->mlock_count so we can remove this element of
struct page.  Move the useful comments over to struct folio.

Link: https://lkml.kernel.org/r/20250903191041.1630338-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mpage: convert do_mpage_readpage() to return void type
Chi Zhiling [Fri, 29 Aug 2025 02:36:59 +0000 (10:36 +0800)]
mpage: convert do_mpage_readpage() to return void type

The return value of do_mpage_readpage() is arg->bio, which is already set
in the arg structure.  Returning it again is redundant.

This patch changes the return type to void since the caller doesn't care
about the return value.
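
Sketched before/after of the prototype change (the args struct already
carries the bio):

  /* Before: the bio is returned even though it already lives in *args. */
  static struct bio *do_mpage_readpage(struct mpage_readpage_args *args);

  /* After: callers simply look at args->bio when they need it. */
  static void do_mpage_readpage(struct mpage_readpage_args *args);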

Link: https://lkml.kernel.org/r/20250829023659.688649-2-chizhiling@163.com
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Sungjong Seo <sj1557.seo@samsung.com>
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mpage: terminate read-ahead on read error
Chi Zhiling [Fri, 29 Aug 2025 02:36:58 +0000 (10:36 +0800)]
mpage: terminate read-ahead on read error

For exFAT filesystems with 4MB read_ahead_size, removing the storage
device during read operations can delay EIO error reporting by several
minutes.  This occurs because the read-ahead implementation in mpage
doesn't handle errors.

Another reason for the delay is that the filesystem requires metadata to
issue file read requests.  When the storage device is removed, the
metadata buffers are invalidated, causing mpage to repeatedly attempt to
fetch metadata during each get_block call.

The original purpose of this patch was to terminate read-ahead when we
fail to get metadata.  To make the patch more generic, implement it by
checking the folio status instead of checking the return value of
get_block().

So, if a folio is synchronously unlocked and non-uptodate, should we quit
the read-ahead?

I think it depends on whether the error is permanent or temporary, and
whether further read-ahead might succeed.  A device being unplugged is one
reason for returning such a folio, but we could return it for many other
reasons (e.g., metadata errors).  I think most errors won't go away in a
short time, so we should quit read-ahead when they occur.
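
A hedged sketch of the resulting loop shape in mpage_readahead() (details
abbreviated and not identical to the actual patch): once a folio comes back
unlocked but not uptodate, the error was reported synchronously, so further
read-ahead is abandoned:

  void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
  {
          struct folio *folio;
          struct mpage_readpage_args args = {
                  .get_block = get_block,
                  .is_readahead = true,
          };

          while ((folio = readahead_folio(rac)) != NULL) {
                  args.folio = folio;
                  do_mpage_readpage(&args);

                  /*
                   * Unlocked but not uptodate means the read failed
                   * synchronously (e.g. metadata could not be fetched);
                   * stop issuing further read-ahead.
                   */
                  if (!folio_test_locked(folio) && !folio_test_uptodate(folio))
                          break;
          }
          if (args.bio)
                  mpage_bio_submit_read(args.bio);
  }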

Link: https://lkml.kernel.org/r/20250829023659.688649-1-chizhiling@163.com
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Sungjong Seo <sj1557.seo@samsung.com>
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm-filemap-align-last_index-to-folio-size-fix-fix
Andrew Morton [Wed, 3 Sep 2025 22:50:51 +0000 (15:50 -0700)]
mm-filemap-align-last_index-to-folio-size-fix-fix

update it for Max's constification efforts

Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Chi Zhiling <chizhiling@kylinos.cn>
Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Youling Tang <youling.tang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm-filemap-align-last_index-to-folio-size-fix
Klara Modin [Mon, 14 Jul 2025 22:34:12 +0000 (00:34 +0200)]
mm-filemap-align-last_index-to-folio-size-fix

fix overflow on 32-bit

iocb->ki_pos is loff_t (long long) while pgoff_t is unsigned long and
this overflow seems to happen in practice, resulting in last_index being
before index.

Link: https://lkml.kernel.org/r/yru7qf5gvyzccq5ohhpylvxug5lr5tf54omspbjh4sm6pcdb2r@fpjgj2pxw7va
Signed-off-by: Klara Modin <klarasmodin@gmail.com>
Cc: Chi Zhiling <chizhiling@kylinos.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Youling Tang <youling.tang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm/filemap: align last_index to folio size
Youling Tang [Fri, 11 Jul 2025 05:55:09 +0000 (13:55 +0800)]
mm/filemap: align last_index to folio size

On XFS systems with pagesize=4K, blocksize=16K, and
CONFIG_TRANSPARENT_HUGEPAGE enabled, we observed the following readahead
behaviors:

 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=64k count=1
 # ./tools/mm/page-types -r -L -f  /mnt/xfs/test
 foffset offset flags
 0 136d4c __RU_l_________H______t_________________F_1
 1 136d4d __RU_l__________T_____t_________________F_1
 2 136d4e __RU_l__________T_____t_________________F_1
 3 136d4f __RU_l__________T_____t_________________F_1
 ...
 c 136bb8 __RU_l_________H______t_________________F_1
 d 136bb9 __RU_l__________T_____t_________________F_1
 e 136bba __RU_l__________T_____t_________________F_1
 f 136bbb __RU_l__________T_____t_________________F_1   <-- first read
 10 13c2cc ___U_l_________H______t______________I__F_1   <-- readahead flag
 11 13c2cd ___U_l__________T_____t______________I__F_1
 12 13c2ce ___U_l__________T_____t______________I__F_1
 13 13c2cf ___U_l__________T_____t______________I__F_1
 ...
 1c 1405d4 ___U_l_________H______t_________________F_1
 1d 1405d5 ___U_l__________T_____t_________________F_1
 1e 1405d6 ___U_l__________T_____t_________________F_1
 1f 1405d7 ___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset offset flags
 0 136048 __RU_l_________H______t_________________F_1
 ...
 c 110a40 __RU_l_________H______t_________________F_1
 d 110a41 __RU_l__________T_____t_________________F_1
 e 110a42 __RU_l__________T_____t_________________F_1   <-- first read
 f 110a43 __RU_l__________T_____t_________________F_1   <-- first readahead flag
 10 13e7a8 ___U_l_________H______t_________________F_1
 ...
 20 137a00 ___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
 21 137a01 ___U_l__________T_____t_______P______I__F_1
 ...
 3f 10d4af ___U_l__________T_____t_______P_________F_1
 [first readahead: ra_size = 32, req_count = 15, async_size = 17]

When reading 64k data (same for 61-63k range, where last_index is
page-aligned in filemap_get_pages()), 128k readahead is triggered via
page_cache_sync_ra() and the PG_readahead flag is set on the next folio
(the one containing 0x10 page).

When reading 60k data, 128k readahead is also triggered via
page_cache_sync_ra().  However, in this case the readahead flag is set on
the 0xf page.  Although the requested read size (req_count) is 60k, the
actual read will be aligned to folio size (64k), which triggers the
readahead flag and initiates asynchronous readahead via
page_cache_async_ra().  This results in two readahead operations totaling
256k.

The root cause is that when the requested size is smaller than the actual
read size (due to folio alignment), it triggers asynchronous readahead.
By changing last_index alignment from page size to folio size, we ensure
the requested size matches the actual read size, preventing the case where
a single read operation triggers two readahead operations.
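
A hedged sketch of the alignment change in filemap_get_pages() (helper
usage and placement are assumptions): round the requested end index up to
the mapping's minimum folio size instead of to a single page:

  static pgoff_t requested_last_index(const struct kiocb *iocb, size_t count,
                                      struct address_space *mapping)
  {
          pgoff_t last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);

          /* Align to the minimum folio size so req_count matches what is read. */
          return round_up(last_index, mapping_min_folio_nrpages(mapping));
  }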

After applying the patch:
 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset offset flags
 0 136d4c __RU_l_________H______t_________________F_1
 1 136d4d __RU_l__________T_____t_________________F_1
 2 136d4e __RU_l__________T_____t_________________F_1
 3 136d4f __RU_l__________T_____t_________________F_1
 ...
 c 136bb8 __RU_l_________H______t_________________F_1
 d 136bb9 __RU_l__________T_____t_________________F_1
 e 136bba __RU_l__________T_____t_________________F_1   <-- first read
 f 136bbb __RU_l__________T_____t_________________F_1
 10 13c2cc ___U_l_________H______t______________I__F_1   <-- readahead flag
 11 13c2cd ___U_l__________T_____t______________I__F_1
 12 13c2ce ___U_l__________T_____t______________I__F_1
 13 13c2cf ___U_l__________T_____t______________I__F_1
 ...
 1c 1405d4 ___U_l_________H______t_________________F_1
 1d 1405d5 ___U_l__________T_____t_________________F_1
 1e 1405d6 ___U_l__________T_____t_________________F_1
 1f 1405d7 ___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

The same behavior now occurs when reading anywhere from 49k to 64k: the
readahead flag is set on the next folio.

Because the minimum folio order in the address_space corresponds to the
block size (at least in xfs and bcachefs, which already support bs > ps),
aligning request_count to the block size will not cause overread.

Link: https://lkml.kernel.org/r/20250711055509.91587-1-youling.tang@linux.dev
Co-developed-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Youling Tang <youling.tang@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm-constify-highmem-related-functions-for-improved-const-correctness-fix
Andrew Morton [Thu, 4 Sep 2025 02:51:59 +0000 (19:51 -0700)]
mm-constify-highmem-related-functions-for-improved-const-correctness-fix

nastily "fix" folio_page()

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify highmem related functions for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:21 +0000 (22:50 +0200)]
mm: constify highmem related functions for improved const-correctness

Lots of functions in mm/highmem.c do not write to the given pointers and
do not call functions that take non-const pointers and can therefore be
constified.

This includes functions like kunmap() which might be implemented in a way
that writes to the pointer (e.g.  to update reference counters or mapping
fields), but currently are not.

kmap() on the other hand cannot be made const because it calls
set_page_address() which is non-const in some
architectures/configurations.
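
Illustrated with kunmap(), which the text names explicitly (a
representative before/after; the patch touches many more functions):

  /* before */
  void kunmap(struct page *page);

  /* after: the function only reads through the pointer */
  void kunmap(const struct page *page);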

Link: https://lkml.kernel.org/r/20250901205021.3573313-13-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify assert/test functions in mm.h
Max Kellermann [Mon, 1 Sep 2025 20:50:20 +0000 (22:50 +0200)]
mm: constify assert/test functions in mm.h

For improved const-correctness.

We select certain assert and test functions which either invoke each
other, functions that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for further const-ification further up the call stack.

Link: https://lkml.kernel.org/r/20250901205021.3573313-12-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify various inline functions for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:19 +0000 (22:50 +0200)]
mm: constify various inline functions for improved const-correctness

We select certain test functions plus folio_migrate_refs() from
mm_inline.h which either invoke each other, functions that are already
const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for further const-ification further up the call stack.

One exception is folio_migrate_refs(), which does write to the "new" folio
pointer; there, only the "old" folio pointer is constified: only its
"flags" field is read, and nothing is written.
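
A sketch of the resulting signature (parameter names and order are assumed
here); only the source folio becomes const, the destination is still
written to:

  static inline void folio_migrate_refs(struct folio *new,
                                        const struct folio *old);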

Link: https://lkml.kernel.org/r/20250901205021.3573313-11-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify ptdesc_pmd_pts_count() and folio_get_private()
Max Kellermann [Mon, 1 Sep 2025 20:50:18 +0000 (22:50 +0200)]
mm: constify ptdesc_pmd_pts_count() and folio_get_private()

These functions from mm_types.h are trivial getters that should never
write to the given pointers.

Link: https://lkml.kernel.org/r/20250901205021.3573313-10-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify arch_pick_mmap_layout() for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:17 +0000 (22:50 +0200)]
mm: constify arch_pick_mmap_layout() for improved const-correctness

This function only reads from the rlimit pointer (but writes to the
mm_struct pointer which is kept without `const`).

All callees are already const-ified or (internal functions) are being
constified by this patch.

Link: https://lkml.kernel.org/r/20250901205021.3573313-9-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  parisc: constify mmap_upper_limit() parameter
Max Kellermann [Mon, 1 Sep 2025 20:50:16 +0000 (22:50 +0200)]
parisc: constify mmap_upper_limit() parameter

For improved const-correctness.

This piece is necessary to make the `rlim_stack` parameter to mmap_base()
const.

Link: https://lkml.kernel.org/r/20250901205021.3573313-8-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm, s390: constify mapping related test/getter functions
Max Kellermann [Mon, 1 Sep 2025 20:50:15 +0000 (22:50 +0200)]
mm, s390: constify mapping related test/getter functions

For improved const-correctness.

We select certain test functions which either invoke each other, functions
that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for further const-ification further up the call stack.

(Even though seemingly unrelated, this also constifies the pointer
parameter of mmap_is_legacy() in arch/s390/mm/mmap.c because a copy of the
function exists in mm/util.c.)

Link: https://lkml.kernel.org/r/20250901205021.3573313-7-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify process_shares_mm() for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:14 +0000 (22:50 +0200)]
mm: constify process_shares_mm() for improved const-correctness

This function only reads from the pointer arguments.

Local (loop) variables are also annotated with `const` to clarify that
these will not be written to.

Link: https://lkml.kernel.org/r/20250901205021.3573313-6-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  fs: constify mapping related test functions for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:13 +0000 (22:50 +0200)]
fs: constify mapping related test functions for improved const-correctness

We select certain test functions which either invoke each other, functions
that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for further const-ification further up the call stack.

Link: https://lkml.kernel.org/r/20250901205021.3573313-5-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days ago  mm: constify zone related test/getter functions
Max Kellermann [Mon, 1 Sep 2025 20:50:12 +0000 (22:50 +0200)]
mm: constify zone related test/getter functions

For improved const-correctness.

We select certain test functions which either invoke each other,
functions that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for further const-ification further up the call stack.

Link: https://lkml.kernel.org/r/20250901205021.3573313-4-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: constify pagemap related test/getter functions
Max Kellermann [Mon, 1 Sep 2025 20:50:11 +0000 (22:50 +0200)]
mm: constify pagemap related test/getter functions

For improved const-correctness.

We select certain test functions which invoke only each other, functions
that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for additional const-ification further up the call stack.

Link: https://lkml.kernel.org/r/20250901205021.3573313-3-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: constify shmem related test functions for improved const-correctness
Max Kellermann [Mon, 1 Sep 2025 20:50:10 +0000 (22:50 +0200)]
mm: constify shmem related test functions for improved const-correctness

Patch series "mm: establish const-correctness for pointer parameters", v6.

This series improves const-correctness in the low-level
memory-management subsystem, which provides a basis for additional
constification further up the call stack (e.g. filesystems).

I started this work when I tried to constify the Ceph filesystem code, but
found that to be impossible because many "mm" functions accept non-const
pointers, even though they modify nothing.

This patch (of 12):

We select certain test functions which invoke only each other, functions
that are already const-ified, or no further functions.

It is therefore relatively trivial to const-ify them, which provides a
basis for additional const-ification further up the call stack.

Link: https://lkml.kernel.org/r/20250901205021.3573313-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250901205021.3573313-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Zankel <chris@zankel.net>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Nysal Jan K.A" <nysal@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: hugetlb: check NUMA_NO_NODE in only_alloc_fresh_hugetlb_folio()
Kefeng Wang [Wed, 10 Sep 2025 13:39:58 +0000 (21:39 +0800)]
mm: hugetlb: check NUMA_NO_NODE in only_alloc_fresh_hugetlb_folio()

Move the NUMA_NO_NODE check out of the buddy and gigantic folio allocation
paths to clean up the code a bit; this also avoids NUMA_NO_NODE being passed
as 'nid' to node_isset() in alloc_buddy_hugetlb_folio().
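
The effect is roughly the following resolution happening once in
only_alloc_fresh_hugetlb_folio() rather than in each allocation helper
(sketch, not the literal diff; numa_mem_id() is how the existing buddy path
already resolves it):

          if (nid == NUMA_NO_NODE)
                  nid = numa_mem_id();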

Link: https://lkml.kernel.org/r/20250910133958.301467-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: hugetlb: remove struct hstate from init_new_hugetlb_folio()
Kefeng Wang [Wed, 10 Sep 2025 13:39:57 +0000 (21:39 +0800)]
mm: hugetlb: remove struct hstate from init_new_hugetlb_folio()

The struct hstate argument has been unused since commit d67e32f26713
("hugetlb: restructure pool allocations"), so remove it.

Link: https://lkml.kernel.org/r/20250910133958.301467-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: hugetlb: directly pass order when allocating a hugetlb folio
Kefeng Wang [Wed, 10 Sep 2025 13:39:56 +0000 (21:39 +0800)]
mm: hugetlb: directly pass order when allocating a hugetlb folio

Use the order instead of struct hstate to remove the huge_page_order() call
from all hugetlb folio allocations; also add order_is_gigantic() to check
whether an order is gigantic.
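
A plausible shape for the new helper, mirroring the existing
hstate_is_gigantic() check (assumed for illustration, not copied from the
patch):

  static inline bool order_is_gigantic(unsigned int order)
  {
          return order > MAX_PAGE_ORDER;
  }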

Link: https://lkml.kernel.org/r/20250910133958.301467-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: hugetlb: convert to account_new_hugetlb_folio()
Kefeng Wang [Wed, 10 Sep 2025 13:39:55 +0000 (21:39 +0800)]
mm: hugetlb: convert to account_new_hugetlb_folio()

To avoid passing the wrong nid into the accounting (a mistake we have made
before), it's better to move folio_nid() into
account_new_hugetlb_folio().
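
A minimal sketch of the resulting helper, reusing the existing hstate
counters (the exact body in the patch may differ):

  static void account_new_hugetlb_folio(struct hstate *h, struct folio *folio)
  {
          int nid = folio_nid(folio);     /* derived here, not passed in */

          lockdep_assert_held(&hugetlb_lock);
          h->nr_huge_pages++;
          h->nr_huge_pages_node[nid]++;
  }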

Link: https://lkml.kernel.org/r/20250910133958.301467-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: hugetlb: convert to use more alloc_fresh_hugetlb_folio()
Kefeng Wang [Wed, 10 Sep 2025 13:39:54 +0000 (21:39 +0800)]
mm: hugetlb: convert to use more alloc_fresh_hugetlb_folio()

Patch series "mm: hugetlb: cleanup hugetlb folio allocation", v3.

Some cleanups for hugetlb folio allocation.

This patch (of 3):

Simplify alloc_fresh_hugetlb_folio() and convert more functions to use it,
which helps us remove prep_new_hugetlb_folio() and
__prep_new_hugetlb_folio().

Link: https://lkml.kernel.org/r/20250910133958.301467-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20250910133958.301467-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: show_mem: show number of zspages in show_free_areas
Thadeu Lima de Souza Cascardo [Tue, 2 Sep 2025 12:49:21 +0000 (09:49 -0300)]
mm: show_mem: show number of zspages in show_free_areas

When OOM is triggered, it will show where the pages might be for each
zone.  When using zram or zswap, it might look like lots of pages are
missing.  After this patch, zspages are shown as below.

[   48.792859] Node 0 DMA free:2812kB boost:0kB min:60kB low:72kB high:84kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:11160kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   48.792962] lowmem_reserve[]: 0 956 956 956 956
[   48.792988] Node 0 DMA32 free:3512kB boost:0kB min:3912kB low:4888kB high:5864kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:28kB active_file:8kB inactive_file:16kB unevictable:0kB writepending:0kB zspages:916780kB present:1032064kB managed:978944kB mlocked:0kB bounce:0kB free_pcp:500kB local_pcp:248kB free_cma:0kB
[   48.793118] lowmem_reserve[]: 0 0 0 0 0

Link: https://lkml.kernel.org/r/20250902-show_mem_zspages-v2-1-545daaa8b410@igalia.com
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/hugetlb: retry to allocate for early boot hugepage allocation
Li RongQing [Mon, 1 Sep 2025 08:20:52 +0000 (16:20 +0800)]
mm/hugetlb: retry to allocate for early boot hugepage allocation

In cloud environments with massive hugepage reservations (95%+ of system
RAM), single-attempt allocation during early boot often fails due to
memory pressure.

Commit 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
intensified this by deferring page frees, increasing peak memory usage
during allocation.

Introduce a retry mechanism that leverages vmemmap optimization reclaim
(~1.6% memory) when available.  Upon initial allocation failure, the
system retries until successful or no further progress is made, ensuring
reliable hugepage allocation while preserving batched vmemmap freeing
benefits.

Testing on a 256G machine allocating 252G of hugepages:
Before: 128056/129024 hugepages allocated
After:  Successfully allocated all 129024 hugepages

Link: https://lkml.kernel.org/r/20250901082052.3247-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agokasan-apply-write-only-mode-in-kasan-kunit-testcases-v7
Yeoreum Yun [Wed, 3 Sep 2025 15:00:20 +0000 (16:00 +0100)]
kasan-apply-write-only-mode-in-kasan-kunit-testcases-v7

update a comment

Link: https://lkml.kernel.org/r/20250903150020.1131840-3-yeoreum.yun@arm.com
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io>
Cc: James Morse <james.morse@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Pankaj Gupta <pankaj.gupta@amd.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agokasan: apply write-only mode in kasan kunit testcases
Yeoreum Yun [Mon, 1 Sep 2025 10:46:23 +0000 (11:46 +0100)]
kasan: apply write-only mode in kasan kunit testcases

When KASAN is configured in write-only mode, fetch/load operations do not
trigger tag check faults.

As a result, the outcome of some test cases may differ compared to when
KASAN is configured without write-only mode.

Therefore, modify the pre-existing testcases to check that write-only mode
raises a tag check fault (TCF) when a write is performed to "allocated
memory" with an invalid tag (e.g. a redzone write in the atomic_set()
testcases), and that an invalid fetch/read does not generate a TCF.

Also, skip some testcases that are affected by the initial value, e.g. the
atomic_cmpxchg() testcase may succeed if it is passed a valid atomic_t
address and an invalid oldval address.  In this case, if the invalid memory
doesn't hold the same oldval, no write operation is triggered, so the test
will pass.
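
Illustrative only (kasan_write_only_enabled() is a hypothetical helper name,
not an interface from this series):

  char *ptr = kmalloc(64, GFP_KERNEL);

  /* an out-of-bounds write must still raise a tag check fault */
  KUNIT_EXPECT_KASAN_FAIL(test, ptr[64] = 'x');

  /* in write-only mode the matching out-of-bounds read must not fault */
  if (kasan_write_only_enabled())
          (void)READ_ONCE(ptr[64]);

  kfree(ptr);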

Link: https://lkml.kernel.org/r/20250901104623.402172-3-yeoreum.yun@arm.com
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io>
Cc: James Morse <james.morse@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Pankaj Gupta <pankaj.gupta@amd.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agokasan/hw-tags: introduce kasan.write_only option
Yeoreum Yun [Mon, 1 Sep 2025 10:46:22 +0000 (11:46 +0100)]
kasan/hw-tags: introduce kasan.write_only option

Patch series "introduce kasan.write_only option in hw-tags", v7.

Hardware tag based KASAN is implemented using the Memory Tagging Extension
(MTE) feature.

MTE is built on top of the ARMv8.0 virtual address tagging TBI (Top Byte
Ignore) feature and allows software to access a 4-bit allocation tag for
each 16-byte granule in the physical address space.  A logical tag is
derived from bits 59-56 of the virtual address used for the memory access.
A CPU with MTE enabled will compare the logical tag against the
allocation tag and potentially raise a tag check fault on mismatch,
subject to system register configuration.
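
Purely for illustration (not part of the patch), extracting that logical
tag is just:

  /* logical tag: bits 59-56 of the virtual address */
  static inline u8 mte_logical_tag(const void *addr)
  {
          return ((unsigned long)addr >> 56) & 0xf;
  }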

Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raising tag
check faults to store operations only.

Using this feature (FEAT_MTE_STORE_ONLY), introduce a KASAN write-only mode
which restricts KASAN checks to write (store) operations only.  This mode
omits KASAN checks for read (fetch/load) operations.  Therefore, it might
be used not only for debugging purposes but also in normal environments.

This patch (of 2):

Since Armv8.9, the FEAT_MTE_STORE_ONLY feature allows restricting tag check
faults to store operations only.  Introduce a KASAN write-only mode based
on this feature.

KASAN write-only mode restricts KASAN checks to write operations only and
omits the checks for fetch/read operations when accessing memory.  So it
might be used not only in debugging environments but also in normal
environments to check memory safety.

This feature can be controlled with the "kasan.write_only" argument.  When
"kasan.write_only=on", KASAN checks write operations only; otherwise KASAN
checks all operations.
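
For example, booting with hardware tag-based KASAN in write-only mode would
roughly look like:

  kasan=on kasan.write_only=on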

This changes the MTE_STORE_ONLY feature to a BOOT_CPU_FEATURE, like
ARM64_MTE_ASYMM, so that it is initialised in kasan_init_hw_tags() together
with the other features.

Link: https://lkml.kernel.org/r/20250903150020.1131840-1-yeoreum.yun@arm.com
Link: https://lkml.kernel.org/r/20250901104623.402172-1-yeoreum.yun@arm.com
Link: https://lkml.kernel.org/r/20250901104623.402172-2-yeoreum.yun@arm.com
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io>
Cc: James Morse <james.morse@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: levi.yun <yeoreum.yun@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Pankaj Gupta <pankaj.gupta@amd.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: remove nth_page()
David Hildenbrand [Mon, 1 Sep 2025 15:03:58 +0000 (17:03 +0200)]
mm: remove nth_page()

Now that all users are gone, let's remove it.

Link: https://lkml.kernel.org/r/20250901150359.867252-38-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoblock: update comment of "struct bio_vec" regarding nth_page()
David Hildenbrand [Mon, 1 Sep 2025 15:03:57 +0000 (17:03 +0200)]
block: update comment of "struct bio_vec" regarding nth_page()

Ever since commit 858c708d9efb ("block: move the bi_size update out of
__bio_try_merge_page"), page_is_mergeable() no longer exists, and the
logic in bvec_try_merge_page() is now a simple page pointer comparison.

Link: https://lkml.kernel.org/r/20250901150359.867252-37-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agokfence: drop nth_page() usage
David Hildenbrand [Mon, 1 Sep 2025 15:03:56 +0000 (17:03 +0200)]
kfence: drop nth_page() usage

We want to get rid of nth_page(), and kfence init code is the last user.

Unfortunately, we might actually walk a PFN range where the pages are not
contiguous, because we might be allocating an area from memblock that
could span memory sections in problematic kernel configs (SPARSEMEM
without SPARSEMEM_VMEMMAP).

We could check whether the page range is contiguous using
page_range_contiguous() and fail kfence init, or make kfence
incompatible with these problematic kernel configs.

Let's keep it simple and simply use pfn_to_page() by iterating PFNs.
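
A sketch of the resulting iteration (variable names illustrative):

  unsigned long pfn;

  for (pfn = start_pfn; pfn < end_pfn; pfn++) {
          struct page *page = pfn_to_page(pfn);

          /* ... per-page initialization, as before ... */
  }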

Link: https://lkml.kernel.org/r/20250901150359.867252-36-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
David Hildenbrand [Mon, 1 Sep 2025 15:03:55 +0000 (17:03 +0200)]
mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()

There is a concern that unpin_user_page_range_dirty_lock() might do some
weird merging of PFN ranges -- either now or in the future -- such that the
PFN range is contiguous but the page range might not be.

Let's sanity-check for that and drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-35-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agocrypto: remove nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:54 +0000 (17:03 +0200)]
crypto: remove nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-34-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agovfio/pci: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:53 +0000 (17:03 +0200)]
vfio/pci: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-33-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoscsi: sg: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:52 +0000 (17:03 +0200)]
scsi: sg: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-32-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoscsi: scsi_lib: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:51 +0000 (17:03 +0200)]
scsi: scsi_lib: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-31-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agommc: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:50 +0000 (17:03 +0200)]
mmc: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-30-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Lars Persson <lars.persson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomemstick: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:49 +0000 (17:03 +0200)]
memstick: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-29-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomspro_block: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:48 +0000 (17:03 +0200)]
mspro_block: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-28-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agodrm/i915/gem: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:47 +0000 (17:03 +0200)]
drm/i915/gem: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-27-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoata: libata-sff: drop nth_page() usage within SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:46 +0000 (17:03 +0200)]
ata: libata-sff: drop nth_page() usage within SG entry

It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-26-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoscatterlist: disallow non-contiguous page ranges in a single SG entry
David Hildenbrand [Mon, 1 Sep 2025 15:03:45 +0000 (17:03 +0200)]
scatterlist: disallow non-contiguous page ranges in a single SG entry

The expectation is that there is currently no user that would pass in
non-contiguous page ranges: no allocator, not even VMA, will hand these
out.

The only problematic part would be if someone would provide a range
obtained directly from memblock, or manually merge problematic ranges.  If
we find such cases, we should fix them to create separate SG entries.

Let's check in sg_set_page() that this is really the case.  No need to
check in sg_set_folio(), as pages in a folio are guaranteed to be
contiguous.  As sg_set_page() gets inlined into modules, we have to export
the page_range_contiguous() helper -- use EXPORT_SYMBOL, there is nothing
special about this helper such that we would want to enforce GPL-only
modules.

We can now drop the nth_page() usage in sg_page_iter_page().
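
Roughly the shape of the added check (the exact expression in the patch may
differ):

  static inline void sg_set_page(struct scatterlist *sg, struct page *page,
                                 unsigned int len, unsigned int offset)
  {
          /* all pages covered by [offset, offset + len) must be contiguous */
          WARN_ON_ONCE(!page_range_contiguous(page,
                          PAGE_ALIGN(offset + len) >> PAGE_SHIFT));
          sg_assign_page(sg, page);
          sg->offset = offset;
          sg->length = len;
  }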

Link: https://lkml.kernel.org/r/20250901150359.867252-25-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agodma-remap: drop nth_page() in dma_common_contiguous_remap()
David Hildenbrand [Mon, 1 Sep 2025 15:03:44 +0000 (17:03 +0200)]
dma-remap: drop nth_page() in dma_common_contiguous_remap()

dma_common_contiguous_remap() is used to remap an "allocated contiguous
region".  Within a single allocation, there is no need to use nth_page()
anymore.

Neither the buddy, nor hugetlb, nor CMA will hand out problematic page
ranges.

Link: https://lkml.kernel.org/r/20250901150359.867252-24-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm-cma-refuse-handing-out-non-contiguous-page-ranges-fix
David Hildenbrand [Tue, 9 Sep 2025 09:50:13 +0000 (05:50 -0400)]
mm-cma-refuse-handing-out-non-contiguous-page-ranges-fix

apparently we can have NOMMU configs with SPARSEMEM enabled

Link: https://lkml.kernel.org/r/6ec933b1-b3f7-41c0-95d8-e518bb87375e@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/cma: refuse handing out non-contiguous page ranges
David Hildenbrand [Mon, 1 Sep 2025 15:03:43 +0000 (17:03 +0200)]
mm/cma: refuse handing out non-contiguous page ranges

Let's disallow handing out PFN ranges with non-contiguous pages, so we can
remove the nth-page usage in __cma_alloc(), and so any callers don't have
to worry about that either when wanting to blindly iterate pages.

This is really only a problem in configs with SPARSEMEM but without
SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
cases.

Will this cause harm?  Probably not, because it's mostly 32bit that does
not support SPARSEMEM_VMEMMAP.  If this ever becomes a problem we could
look into allocating the memmap for the memory sections spanned by a
single CMA region in one go from memblock.

Link: https://lkml.kernel.org/r/20250901150359.867252-23-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()
David Hildenbrand [Mon, 1 Sep 2025 15:03:42 +0000 (17:03 +0200)]
mips: mm: convert __flush_dcache_pages() to __flush_dcache_folio_pages()

Let's make it clearer that we are operating within a single folio by
providing both the folio and the page.

This implies that for flush_dcache_folio() we'll now avoid one more
page->folio lookup, and that we can safely drop the "nth_page" usage.

While at it, drop the "extern" from the function declaration.

Link: https://lkml.kernel.org/r/20250901150359.867252-22-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoio_uring/zcrx: remove nth_page() usage within folio
David Hildenbrand [Mon, 1 Sep 2025 15:03:41 +0000 (17:03 +0200)]
io_uring/zcrx: remove nth_page() usage within folio

Within a folio/compound page, nth_page() is no longer required.  Given
that we call folio_test_partial_kmap()+kmap_local_page(), the code would
already be problematic if the pages were to span multiple folios.

So let's just assume that all src pages belong to a single folio/compound
page and can be iterated ordinarily.  The dst page is currently always a
single page, so we're not actually iterating anything.

Link: https://lkml.kernel.org/r/20250901150359.867252-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agofixup: mm/gup: remove record_subpages()
David Hildenbrand [Fri, 5 Sep 2025 06:38:43 +0000 (08:38 +0200)]
fixup: mm/gup: remove record_subpages()

pages is not adjusted by the caller, but indexed by existing *nr.

Link: https://lkml.kernel.org/r/cc7f03f8-da8b-407e-a03a-e8e5a9ec5462@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+1ab243d3eebb2aabf4a4@syzkaller.appspotmail.com
Tested-by: syzbot+1ab243d3eebb2aabf4a4@syzkaller.appspotmail.com
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/gup: remove record_subpages()
David Hildenbrand [Mon, 1 Sep 2025 15:03:40 +0000 (17:03 +0200)]
mm/gup: remove record_subpages()

We can clean up the code by calculating the #refs earlier, so we can
just inline what remains of record_subpages().

Calculate the number of references/pages ahead of times, and record them
only once all our tests passed.

Link: https://lkml.kernel.org/r/20250901150359.867252-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/gup: drop nth_page() usage within folio when recording subpages
David Hildenbrand [Mon, 1 Sep 2025 15:03:39 +0000 (17:03 +0200)]
mm/gup: drop nth_page() usage within folio when recording subpages

nth_page() is no longer required when iterating over pages within a single
folio, so let's just drop it when recording subpages.
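
The change boils down to plain pointer arithmetic within the folio (sketch):

          for (i = 0; i < nr_pages; i++)
                  pages[i] = page + i;    /* was: nth_page(page, i) */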

Link: https://lkml.kernel.org/r/20250901150359.867252-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
David Hildenbrand [Mon, 1 Sep 2025 15:03:38 +0000 (17:03 +0200)]
mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()

It's no longer required to use nth_page() within a folio, so let's just
drop the nth_page() in folio_walk_start().

Link: https://lkml.kernel.org/r/20250901150359.867252-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agofs: hugetlbfs: cleanup folio in adjust_range_hwpoison()
David Hildenbrand [Mon, 1 Sep 2025 15:03:37 +0000 (17:03 +0200)]
fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()

Let's cleanup and simplify the function a bit.

Link: https://lkml.kernel.org/r/20250901150359.867252-17-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agofs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()
David Hildenbrand [Mon, 1 Sep 2025 15:03:36 +0000 (17:03 +0200)]
fs: hugetlbfs: remove nth_page() usage within folio in adjust_range_hwpoison()

The nth_page() is not really required anymore, so let's remove it.

Link: https://lkml.kernel.org/r/20250901150359.867252-16-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/percpu-km: drop nth_page() usage within single allocation
David Hildenbrand [Mon, 1 Sep 2025 15:03:35 +0000 (17:03 +0200)]
mm/percpu-km: drop nth_page() usage within single allocation

We're allocating a higher-order page from the buddy.  For these pages
(that are guaranteed to not exceed a single memory section) there is no
need to use nth_page().

Link: https://lkml.kernel.org/r/20250901150359.867252-15-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
David Hildenbrand [Mon, 1 Sep 2025 15:03:34 +0000 (17:03 +0200)]
mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()

We can now safely iterate over all pages in a folio, so no need for the
pfn_to_page().

Also, as we already force the refcount in __init_single_page() to 1
through init_page_count(), we can just set the refcount to 0 and avoid
page_ref_freeze() + VM_BUG_ON.  Likely, in the future, we would just want
to tell __init_single_page() to which value to initialize the refcount.

Further, adjust the comments to highlight that we are dealing with an
open-coded prep_compound_page() variant, and add another comment
explaining why we really need the __init_single_page() only on the tail
pages.

Note that the current code was likely problematic, but we never ran into
it: prep_compound_tail() would have been called with an offset that might
exceed a memory section, and prep_compound_tail() would have simply added
that offset to the page pointer -- which would not have done the right
thing on sparsemem without vmemmap.

Link: https://lkml.kernel.org/r/20250901150359.867252-14-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: simplify folio_page() and folio_page_idx()
David Hildenbrand [Mon, 1 Sep 2025 15:03:33 +0000 (17:03 +0200)]
mm: simplify folio_page() and folio_page_idx()

Now that a single folio/compound page can no longer span memory sections
in problematic kernel configurations, we can stop using nth_page() in
folio_page() and folio_page_idx().

While at it, turn both macros into static inline functions and add kernel
doc for folio_page_idx().
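
A plausible post-change shape of the two helpers (simplified, kernel-doc
omitted):

  static inline struct page *folio_page(struct folio *folio, unsigned long n)
  {
          return &folio->page + n;
  }

  static inline unsigned long folio_page_idx(const struct folio *folio,
                                             const struct page *page)
  {
          return page - &folio->page;
  }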

Link: https://lkml.kernel.org/r/20250901150359.867252-13-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: limit folio/compound page sizes in problematic kernel configs
David Hildenbrand [Mon, 1 Sep 2025 15:03:32 +0000 (17:03 +0200)]
mm: limit folio/compound page sizes in problematic kernel configs

Let's limit the maximum folio size in problematic kernel configs where the
memmap is allocated per memory section (SPARSEMEM without
SPARSEMEM_VMEMMAP) to a single memory section.

Currently, only a single architecture supports ARCH_HAS_GIGANTIC_PAGE but
not SPARSEMEM_VMEMMAP: sh.

Fortunately, the biggest hugetlb size sh supports is 64 MiB
(HUGETLB_PAGE_SIZE_64MB) and the section size is at least 64 MiB
(SECTION_SIZE_BITS == 26), so their use case is not degraded.

As folios and memory sections are naturally aligned to their order-2 size
in memory, a single folio can consequently no longer span multiple memory
sections on these problematic kernel configs.

nth_page() is no longer required when operating within a single compound
page / folio.

Link: https://lkml.kernel.org/r/20250901150359.867252-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: sanity-check maximum folio size in folio_set_order()
David Hildenbrand [Mon, 1 Sep 2025 15:03:31 +0000 (17:03 +0200)]
mm: sanity-check maximum folio size in folio_set_order()

Let's sanity-check in folio_set_order() whether we would be trying to
create a folio with an order that would make it exceed MAX_FOLIO_ORDER.

This will enable the check whenever a folio/compound page is initialized
through prepare_compound_head() / prepare_compound_page() with
CONFIG_DEBUG_VM set.
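
Roughly the check being added (only active with CONFIG_DEBUG_VM):

          VM_WARN_ON_ONCE(order > MAX_FOLIO_ORDER);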

Link: https://lkml.kernel.org/r/20250901150359.867252-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/mm_init: make memmap_init_compound() look more like prep_compound_page()
David Hildenbrand [Mon, 1 Sep 2025 15:03:30 +0000 (17:03 +0200)]
mm/mm_init: make memmap_init_compound() look more like prep_compound_page()

Grepping for "prep_compound_page" leaves one clueless as to how devdax gets
its compound pages initialized.

Let's add a comment that might help finding this open-coded
prep_compound_page() initialization more easily.

Further, let's be less smart about the ordering of initialization and just
perform the prep_compound_head() call after all tail pages were
initialized: just like prep_compound_page() does.

No need for a comment to describe the initialization order: again, just
like prep_compound_page().

Link: https://lkml.kernel.org/r/20250901150359.867252-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/hugetlb: check for unreasonable folio sizes when registering hstate
David Hildenbrand [Mon, 1 Sep 2025 15:03:29 +0000 (17:03 +0200)]
mm/hugetlb: check for unreasonable folio sizes when registering hstate

Let's check that no hstate that corresponds to an unreasonable folio size
is registered by an architecture.  If we were to succeed registering, we
could later try allocating an unsupported gigantic folio size.

Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
is sane at build time.  As HUGETLB_PAGE_ORDER is dynamic on powerpc, we
have to use a BUILD_BUG_ON_INVALID() to make it compile.

No existing kernel configuration should be able to trigger this check:
either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
gigantic folios will not exceed a memory section (the case on sparse).
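
A sketch of the compile-time part; BUILD_BUG_ON_INVALID() only type-checks
the expression, which keeps powerpc (where HUGETLB_PAGE_ORDER is dynamic)
compiling:

          BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);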

Link: https://lkml.kernel.org/r/20250901150359.867252-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()
David Hildenbrand [Mon, 1 Sep 2025 15:03:28 +0000 (17:03 +0200)]
mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()

Let's reject unreasonable folio sizes early, where we can still fail.
We'll add sanity checks to prepare_compound_head/prepare_compound_page
next.

Is there a way to configure a system such that unreasonable folio sizes
would be possible?  It would already be rather questionable.

If so, we'd probably want to bail out earlier, where we can avoid a WARN
and just report a proper error message that indicates where we messed up.

Link: https://lkml.kernel.org/r/20250901150359.867252-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_no...
David Hildenbrand [Mon, 1 Sep 2025 15:03:27 +0000 (17:03 +0200)]
mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()

Let's reject them early, which in turn makes folio_alloc_gigantic() reject
them properly.

To avoid converting from order to nr_pages, let's just add MAX_FOLIO_ORDER
and calculate MAX_FOLIO_NR_PAGES based on that.

While at it, let's just make the order a "const unsigned order".
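
A sketch of the relationship described above, plus the kind of early
rejection intended (exact form and placement may differ):

  #define MAX_FOLIO_NR_PAGES      (1UL << MAX_FOLIO_ORDER)

          /* reject over-sized contiguous allocation requests early */
          if (WARN_ON_ONCE(nr_pages > MAX_FOLIO_NR_PAGES))
                  return -EINVAL;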

Link: https://lkml.kernel.org/r/20250901150359.867252-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agowireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
David Hildenbrand [Mon, 1 Sep 2025 15:03:26 +0000 (17:03 +0200)]
wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config

It's no longer user-selectable (and the default was already "y"), so let's
just drop it.

It was never really relevant to the wireguard selftests either way.

Link: https://lkml.kernel.org/r/20250901150359.867252-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agox86/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
David Hildenbrand [Mon, 1 Sep 2025 15:03:25 +0000 (17:03 +0200)]
x86/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"

Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is
selected.

Link: https://lkml.kernel.org/r/20250901150359.867252-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agos390/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
David Hildenbrand [Mon, 1 Sep 2025 15:03:24 +0000 (17:03 +0200)]
s390/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"

Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is
selected.

Link: https://lkml.kernel.org/r/20250901150359.867252-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoarm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
David Hildenbrand [Mon, 1 Sep 2025 15:03:23 +0000 (17:03 +0200)]
arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"

Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is
selected.

Link: https://lkml.kernel.org/r/20250901150359.867252-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: stop making SPARSEMEM_VMEMMAP user-selectable
David Hildenbrand [Mon, 1 Sep 2025 15:03:22 +0000 (17:03 +0200)]
mm: stop making SPARSEMEM_VMEMMAP user-selectable

Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to then go from (++pfn)->page, which is rather nasty.
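
For reference, this is roughly what the current definition looks like:

  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
  #define nth_page(page, n)       pfn_to_page(page_to_pfn((page)) + (n))
  #else
  #define nth_page(page, n)       ((page) + (n))
  #endif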

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges as "non-contiguous pages".  If we
only deal with "contiguous pages", there is no need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired by the discussion at [1] between Linus, Jason
and me, so kudos to them.

This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow using SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, where we want to limit the
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly failing).

Link: https://lkml.kernel.org/r/20250901150359.867252-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250901150359.867252-2-david@redhat.com
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Bart van Assche <bvanassche@acm.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Brett Creeley <brett.creeley@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: James Bottomley <james.bottomley@HansenPartnership.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Niklas Cassel <cassel@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools-mm-slabinfo-fix-access-to-null-terminator-in-string-boundary-v4
Kaushlendra Kumar [Mon, 1 Sep 2025 04:49:55 +0000 (10:19 +0530)]
tools-mm-slabinfo-fix-access-to-null-terminator-in-string-boundary-v4

remove unnecessary blank line

Link: https://lkml.kernel.org/r/20250901044955.3902815-1-kaushlendra.kumar@intel.com
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools/mm/slabinfo: fix access to null terminator in string boundary
Kaushlendra Kumar [Sat, 30 Aug 2025 17:20:22 +0000 (22:50 +0530)]
tools/mm/slabinfo: fix access to null terminator in string boundary

The current code incorrectly accesses buffer[strlen(buffer)], which points
to the null terminator ('\0') at the end of the string.  This is
technically out-of-bounds access since valid string content ends at index
strlen(buffer)-1.

Fix by:
1. Declaring strlen() result variable at function scope
2. Adding bounds check (len > 0) to handle empty strings
3. Using buffer[len-1] to correctly access the last character before
   the null terminator
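
A minimal sketch of the resulting pattern (illustrative only, with a
hypothetical trailing-newline check; not the exact slabinfo code):

    size_t len = strlen(buffer);

    /* Only touch the last character if the string is non-empty. */
    if (len > 0 && buffer[len - 1] == '\n')
            buffer[len - 1] = '\0';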

Link: https://lkml.kernel.org/r/20250830172022.1927448-1-kaushlendra.kumar@intel.com
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomemfd: move MFD_ALL_FLAGS definition to memfd.h
Andrew Morton [Sun, 31 Aug 2025 19:29:57 +0000 (12:29 -0700)]
memfd: move MFD_ALL_FLAGS definition to memfd.h

It's not part of the UAPI, but putting it here is better from a
maintainability POV.
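
For context, MFD_ALL_FLAGS is roughly the OR of all memfd_create() flags,
along these lines (sketch; the exact flag set may differ):

    #define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
                           MFD_NOEXEC_SEAL | MFD_EXEC)
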
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/memfd: remove redundant casts
Joey Pabalinas [Sun, 31 Aug 2025 11:47:48 +0000 (01:47 -1000)]
mm/memfd: remove redundant casts

MFD_ALL_FLAGS is already an unsigned int.  Remove redundant casts to
unsigned int.

Link: https://lkml.kernel.org/r/efbbe6093b64a5b19f974871d5262d6e75dff2c0.1756639225.git.joeypabalinas@gmail.com
Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotask_stack.h: clean-up stack_not_used() implementation
Pasha Tatashin [Fri, 29 Aug 2025 11:44:41 +0000 (13:44 +0200)]
task_stack.h: clean-up stack_not_used() implementation

Inside the small stack_not_used() function there are several ifdefs for
the stack growing-up vs. regular versions.  Instead, implement the
function twice: once for growing-up stacks and once for the regular case.

Add comments like /* !CONFIG_DEBUG_STACK_USAGE */ to clarify what the
ifdefs are doing.
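
A sketch of the reworked shape (grows-down variant shown; the
CONFIG_STACK_GROWSUP variant mirrors it, and details may differ from the
actual patch):

    #ifdef CONFIG_DEBUG_STACK_USAGE
    /* Stack grows down: walk up from end_of_stack() past untouched zeroes. */
    static inline unsigned long stack_not_used(struct task_struct *p)
    {
            unsigned long *n = end_of_stack(p);

            do {    /* Skip over the canary */
                    n++;
            } while (!*n);

            return (unsigned long)n - (unsigned long)end_of_stack(p);
    }
    #endif /* CONFIG_DEBUG_STACK_USAGE */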

[linus.walleij@linaro.org: rebased, function moved elsewhere in the kernel]
Link: https://lkml.kernel.org/r/20250829-fork-cleanups-for-dynstack-v1-2-3bbaadce1f00@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-13-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agofork: check charging success before zeroing stack
Pasha Tatashin [Fri, 29 Aug 2025 11:44:40 +0000 (13:44 +0200)]
fork: check charging success before zeroing stack

Patch series "mm: task_stack: Stack handling cleanups".

These are some small cleanups for the fork code, split off from Pasha's
dynamic stack patch series; they are generally nice on their own, so
let's propose them for merging.

This patch (of 2):

There is no need to zero the cached stack if the memcg charge fails, so
move the charging attempt before the memset operation.
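
A simplified sketch of the reordering (error handling abbreviated; not the
exact kernel/fork.c code):

    /* s: the cached stack's vm_struct; stack: its mapped address.
     * Charge first; only zero the reused stack once the charge succeeded. */
    if (memcg_charge_kernel_stack(s))
            return -ENOMEM;

    /* Clear stale pointers from the reused stack. */
    memset(stack, 0, THREAD_SIZE);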

Link: https://lkml.kernel.org/r/20250829-fork-cleanups-for-dynstack-v1-0-3bbaadce1f00@linaro.org
Link: https://lkml.kernel.org/r/20250829-fork-cleanups-for-dynstack-v1-1-3bbaadce1f00@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-6-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agoselftests/mm/uffd: refactor non-composite global vars into struct
Ujwal Kundur [Fri, 29 Aug 2025 15:56:00 +0000 (21:26 +0530)]
selftests/mm/uffd: refactor non-composite global vars into struct

Refactor macros and non-composite global variable definitions into a
struct that is defined at the start of a test and is passed around instead
of relying on global vars.
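
Purely illustrative sketch of the shape of such a refactor (hypothetical
struct and field names, not the actual selftest code):

    struct uffd_test_opts {
            unsigned long nr_pages;
            unsigned long page_size;
            int uffd;               /* userfaultfd file descriptor */
            char *area_src;
            char *area_dst;
    };

    /* Each test takes the options explicitly instead of using globals. */
    static int run_test(struct uffd_test_opts *opts);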

Link: https://lkml.kernel.org/r/20250829155600.2000-1-ujwal.kundur@gmail.com
Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: zpdesc: minor naming and comment corrections
Johannes Weiner [Fri, 29 Aug 2025 16:15:28 +0000 (17:15 +0100)]
mm: zpdesc: minor naming and comment corrections

zpdesc is the page descriptor used by the zsmalloc backend allocator,
which in turn is used by zswap and zram.  The zpool layer is gone.

Link: https://lkml.kernel.org/r/20250829162212.208258-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: remove unused zpool layer
Johannes Weiner [Fri, 29 Aug 2025 16:15:27 +0000 (17:15 +0100)]
mm: remove unused zpool layer

With zswap using zsmalloc directly, there are no more in-tree users of
this code.  Remove it.

With zpool gone, zsmalloc is now always a simple dependency and no
longer something the user needs to configure. Hide CONFIG_ZSMALLOC
from the user and have zswap and zram pull it in as needed.

Link: https://lkml.kernel.org/r/20250829162212.208258-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm: zswap: interact directly with zsmalloc
Johannes Weiner [Fri, 29 Aug 2025 16:15:26 +0000 (17:15 +0100)]
mm: zswap: interact directly with zsmalloc

Patch series "mm: remove zpool".

zpool is an indirection layer for zswap to switch between multiple
allocator backends at runtime.  Since 6.15, zsmalloc is the only allocator
left in-tree, so there is no point in keeping zpool around.

This patch (of 3):

zswap goes through the zpool layer to enable runtime-switching of
allocator backends for compressed data.  However, since zbud and z3fold
were removed in 6.15, zsmalloc has been the only option available.

As such, the zpool indirection is unnecessary.  Make zswap deal with
zsmalloc directly.  This is comparable to zram, which also directly
interacts with zsmalloc and has never supported a different backend.

Note that this does not preclude future improvements and experiments with
different allocation strategies.  Should it become necessary, it's
possible to provide an alternate implementation for the zsmalloc API,
selectable at compile time.  However, zsmalloc is also rather mature and
feature rich, with years of widespread production exposure; it's
encouraged to make incremental improvements rather than fork it.

In any case, the complexity of runtime pluggability seems excessive and
unjustified at this time.  Switch zswap to zsmalloc to remove the last
user of the zpool API.
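
Conceptually, the change replaces the zpool indirection with direct
zsmalloc calls, roughly (simplified; signatures abbreviated and not the
exact zswap code):

    /* before: allocation dispatched through the zpool driver table */
    ret = zpool_malloc(zpool, size, gfp, &handle);

    /* after: zswap calls zsmalloc directly */
    handle = zs_malloc(zs_pool, size, gfp);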

Link: https://lkml.kernel.org/r/20250829162212.208258-1-hannes@cmpxchg.org
Link: https://lkml.kernel.org/r/20250829162212.208258-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Nacked-by: Vitaly Wool <vitaly.wool@konsulko.se>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomaple_tree: testing fix for spanning store on 32b
Liam R. Howlett [Thu, 28 Aug 2025 00:30:23 +0000 (20:30 -0400)]
maple_tree: testing fix for spanning store on 32b

32 bit nodes have a larger branching factor.  This affects the required
value to cause a height change.  Update the spanning store height test to
work for both 64 and 32 bit nodes.

Link: https://lkml.kernel.org/r/20250828003023.418966-3-Liam.Howlett@oracle.com
Fixes: f9d3a963fef4 ("maple_tree: use height and depth consistently")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomaple_tree: fix testing for 32 bit builds
Liam R. Howlett [Thu, 28 Aug 2025 00:30:22 +0000 (20:30 -0400)]
maple_tree: fix testing for 32 bit builds

Patch series "maple_tree: Fix testing for 32bit compiles".

The maple tree test suite supports 32bit builds, which result in 32bit
nodes and index/last values.  Some tests use values that are too large and
must be skipped, while others depend on certain actions causing the tree
to be altered in another measurable way (such as the height decreasing or
increasing).

Two tests were added that broke 32bit testing, either by compile warnings
or failures.  These fixes restore the tests to a working order.

Building a 32bit version can be done on a 32bit platform, or by using a
command like: BUILD=32 make clean maple

This patch (of 2):

Some tests are invalid on 32bit due to the size of the index and last.
Making those tests depend on the correct build flags stops compile
complaints.

Link: https://lkml.kernel.org/r/20250828003023.418966-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20250828003023.418966-2-Liam.Howlett@oracle.com
Fixes: 5d659bbb52a2 ("maple_tree: introduce mas_wr_store_type()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agohuge_mm.h: disallow is_huge_zero_folio(NULL)
Max Kellermann [Thu, 28 Aug 2025 08:48:20 +0000 (10:48 +0200)]
huge_mm.h: disallow is_huge_zero_folio(NULL)

Calling is_huge_zero_folio(NULL) should not be legal - it makes no sense,
and a different (theoretical) implementation may dereference the pointer.
But currently, lacking any explicit documentation, this call is possible.

But if somebody really passes NULL, the function should not return true -
this isn't the huge zero folio after all!  However, if the
`huge_zero_folio` hasn't been allocated yet, it's NULL, and
is_huge_zero_folio(NULL) just happens to return true, which is a lie.

This weird side effect prevented me from reproducing a kernel crash that
occurred when the elements of a folio_batch were NULL - since
folios_put_refs() skips huge zero folios, this sometimes causes a crash,
but sometimes does not.  For debugging, it is better to reveal such bugs
reliably and not hide them behind random preconditions like "has the huge
zero folio already been created?"

To improve detection of such bugs, David Hildenbrand suggested adding a
VM_WARN_ON_ONCE().
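
A sketch of the kind of check this results in (may differ in detail from
the actual huge_mm.h change):

    static inline bool is_huge_zero_folio(const struct folio *folio)
    {
            VM_WARN_ON_ONCE(!folio);

            return READ_ONCE(huge_zero_folio) == folio;
    }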

Link: https://lkml.kernel.org/r/20250828084820.570118-1-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm-page_alloc-find_large_buddy-from-start_pfn-aligned-order-v2
Wei Yang [Tue, 2 Sep 2025 02:58:07 +0000 (02:58 +0000)]
mm-page_alloc-find_large_buddy-from-start_pfn-aligned-order-v2

add comment on assignment of order

Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agomm/page_alloc: find_large_buddy() from start_pfn aligned order
Wei Yang [Thu, 28 Aug 2025 09:16:18 +0000 (09:16 +0000)]
mm/page_alloc: find_large_buddy() from start_pfn aligned order

To find a large buddy, we iterate the pfn from order 0 up to
MAX_PAGE_ORDER alignment.  But for any order below the order that
start_pfn is already aligned to, we get the same pfn and repeat the same
check.

Iterate from start_pfn's aligned order instead to reduce the duplicated
work.
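
An illustrative sketch of the idea (not the exact find_large_buddy() code):

    /* Every order below start_pfn's alignment order yields the same
     * aligned pfn, so start probing at that order directly. */
    unsigned int order = start_pfn ? min_t(unsigned int, __ffs(start_pfn),
                                           MAX_PAGE_ORDER) : MAX_PAGE_ORDER;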

Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools: testing: use existing atomic.h for vma/maple tests
Brendan Jackman [Thu, 28 Aug 2025 12:28:01 +0000 (12:28 +0000)]
tools: testing: use existing atomic.h for vma/maple tests

The shared userspace logic used for unit-testing maple tree and VMA code
currently has its own replacements for atomics helpers.  This is not
needed as the necessary APIs already have userspace implementations in the
tools tree.  Switching over to that allows deleting a bit of code.

Note that the implementation is different; while the version being deleted
here is implemented using liburcu, the existing version in tools uses
either x86 asm or compiler builtins.  It's assumed that both are equally
likely to be correct.

The tools tree's version of atomic_t is a struct type while the version
being deleted was just a typedef of an integer.  This means it's no longer
valid to call __sync_bool_compare_and_swap() directly on it.  One option
would be to just peek into the struct and call it on the field, but it
seems a little cleaner to just use the corresponding atomic.h API which has
been added recently.  Now the fake mapping_map_writable() is copied from
the real one.
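
For reference, the real helper that the fake now mirrors looks roughly
like this (sketch):

    static inline int mapping_map_writable(struct address_space *mapping)
    {
            return atomic_inc_unless_negative(&mapping->i_mmap_writable) ?
                    0 : -EPERM;
    }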

Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-4-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools: testing: support EXTRA_CFLAGS in shared.mk
Brendan Jackman [Thu, 28 Aug 2025 12:28:00 +0000 (12:28 +0000)]
tools: testing: support EXTRA_CFLAGS in shared.mk

This allows the user to set cflags when building tests that use this
shared build infrastructure.

For example, it enables building with -Werror so that patch-check scripts
fail when a warning is introduced:

make -C tools/testing/vma -j EXTRA_CFLAGS=-Werror

Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-3-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools: testing: allow importing arch headers in shared.mk
Brendan Jackman [Thu, 28 Aug 2025 12:27:59 +0000 (12:27 +0000)]
tools: testing: allow importing arch headers in shared.mk

There is an arch/ tree under tools.  This contains some useful stuff; to
make that available, add it to the -I flags.  This requires $(SRCARCH),
which is provided by Makefile.arch, so include that.

There still aren't that many headers so also just smush all of them into
SHARED_DEPS instead of starting to do any header dependency hocus pocus.

Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-2-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agotools/include: implement a couple of atomic_t ops
Brendan Jackman [Thu, 28 Aug 2025 12:27:58 +0000 (12:27 +0000)]
tools/include: implement a couple of atomic_t ops

Patch series "tools: testing: Use existing atomic.h for vma/maple tests",
v2.

De-duplicating this lets us delete a bit of code.

Ulterior motive: I'm working on a new set of the userspace-based unit
tests, which will need the atomics API too.  That would involve even more
duplication, so while the win in this patchset alone is very minimal, it
looks a lot more significant with my other WIP patchset.

I've tested these commands:

make -C tools/testing/vma -j
tools/testing/vma/vma

make -C tools/testing/radix-tree -j
tools/testing/radix-tree/maple

Note the EXTRA_CFLAGS patch is actually orthogonal; let me know if you'd
prefer I send it separately.

This patch (of 4):

The VMA tests need an operation equivalent to atomic_inc_unless_negative()
to implement a fake mapping_map_writable().  Adding it will enable them to
switch to the shared atomic headers and simplify that fake implementation.

In order to add that, also add atomic_try_cmpxchg() which can be used to
implement it.  This is copied from Documentation/atomic_t.txt.  Then,
implement atomic_inc_unless_negative() itself based on the
raw_atomic_dec_unless_positive() in
include/linux/atomic/atomic-arch-fallback.h.

There's no present need for a highly-optimised version of this (nor any
reason to think this implementation is sub-optimal on x86) so just
implement this with generic C, no x86-specifics.
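
A sketch of what that amounts to in generic C (following the forms in
Documentation/atomic_t.txt and the arch fallbacks; details may differ from
the patch):

    static inline bool atomic_try_cmpxchg(atomic_t *v, int *oldp, int new)
    {
            int old = *oldp;
            int ret = atomic_cmpxchg(v, old, new);

            if (ret != old)
                    *oldp = ret;
            return ret == old;
    }

    static inline bool atomic_inc_unless_negative(atomic_t *v)
    {
            int c = atomic_read(v);

            do {
                    if (c < 0)
                            return false;
            } while (!atomic_try_cmpxchg(v, &c, c + 1));

            return true;
    }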

Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-0-02d146a58ed2@google.com
Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-1-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agopagevec.h: add `const` to pointer parameters of getter functions
Max Kellermann [Thu, 28 Aug 2025 13:03:11 +0000 (15:03 +0200)]
pagevec.h: add `const` to pointer parameters of getter functions

For improved const-correctness.
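
For example, a getter like folio_batch_count() now takes a const pointer,
roughly (sketch):

    static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
    {
            return fbatch->nr;
    }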

Link: https://lkml.kernel.org/r/20250828130311.772993-1-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>