Charan Teja Kalla [Wed, 9 Aug 2023 08:05:44 +0000 (13:35 +0530)]
Multi-gen LRU: skip CMA pages when they are not eligible
This patch is based on the commit 5da226dbfce3("mm: skip CMA pages when
they are not available") which skips cma pages reclaim when they are not
eligible for the current allocation context. In mglru, such pages are
added to the tail of the immediate generation to maintain better LRU
order, which is unlike the case of conventional LRU where such pages are
directly added to the head of the LRU list(akin to adding to head of the
youngest generation in mglru).
No observable issue without this patch on MGLRU, but logically it make
sense to skip the CMA page reclaim when those pages can't be satisfied for
the current allocation context.
Kemeng Shi [Wed, 9 Aug 2023 10:07:53 +0000 (18:07 +0800)]
mm/page_alloc: remove track of active PCP lists range in bulk free
Patch series "Two minor cleanups for pcp list in page_alloc".
There are two minor cleanups for pcp list in page_alloc. More details
can be found in respective patches.
This patch (of 2):
After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
selected per pcp list during bulk free"), we will drain all pages in
selected pcp list. And we ensured passed count is < pcp->count. Then,
the search will finish before wrap-around and track of active PCP lists
range intended for wrap-around case is no longer needed.
mm/memory_hotplug: embed vmem_altmap details in memory block
With memmap on memory, some architecture needs more details w.r.t altmap
such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
computing them again when we remove a memory block, embed vmem_altmap
details in struct memory_block if we are using memmap on memory block
feature.
[yangyingliang@huawei.com: fix error return code in add_memory_resource()] Link: https://lkml.kernel.org/r/20230809081552.1351184-1-yangyingliang@huawei.com Link: https://lkml.kernel.org/r/20230808091501.287660-7-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
powerpc/book3s64/memhotplug: enable memmap on memory for radix
Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. Hence we skip the
restrictions w.r.t vmemmap size to be multiple of PMD_SIZE. This also
makes the feature widely useful because to use PMD_SIZE vmemmap area we
require a memory block size of 2GiB
We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature can
work with a memory block size of 256MB. Using altmap.reserve feature to
align things correctly at pageblock granularity. We can end up losing
some pages in memory with this. For ex: with a 256MiB memory block size,
we require 4 pages to map vmemmap pages, In order to align things
correctly we end up adding a reserve of 28 pages. ie, for every 4096
pages 28 pages get reserved.
Link: https://lkml.kernel.org/r/20230808091501.287660-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks
Currently, memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full page blocks. This is
because memory onlining/offlining code requires applicable ranges to be
pageblock-aligned, for example, to set the migratetypes properly.
This patch helps to lift that restriction by reserving more pages than
required for vmemmap space. This helps the start address to be page block
aligned with different memory block sizes. Using this facility implies
the kernel will be reserving some pages for every memoryblock. This
allows the memmap on memory feature to be widely useful with different
memory block size values.
For ex: with 64K page size and 256MiB memory block size, we require 4
pages to map vmemmap pages, To align things correctly we end up adding a
reserve of 28 pages. ie, for every 4096 pages 28 pages get reserved.
Link: https://lkml.kernel.org/r/20230808091501.287660-5-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Add support for memmap on memory feature on ppc64", v8.
This patch series update memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64 where alignment rules are different with 64K page size.
This patch (of 6):
Instead of adding menu entry with all supported architectures, add
mm/Kconfig variable and select the same from supported architectures.
Xiu Jianfeng [Tue, 8 Aug 2023 06:20:56 +0000 (06:20 +0000)]
mm: zswap: update comment for struct zswap_entry
Since commit 0bb488498c98 ("mm: zswap: remove zswap_header"), the 'offset'
has been replaced by swpentry, update the comment for it, and also add
comment for 'objcg'.
Kefeng Wang [Tue, 8 Aug 2023 03:33:59 +0000 (11:33 +0800)]
mm: memtest: convert to memtest_report_meminfo()
It is better to not expose too many internal variables of memtest,
add a helper memtest_report_meminfo() to show memtest results.
Link: https://lkml.kernel.org/r/20230808033359.174986-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tomas Mudrunka <tomas.mudrunka@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Mon, 7 Aug 2023 02:35:28 +0000 (10:35 +0800)]
mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE
It's more readable to use helper macro BITS_PER_LONG and BITS_PER_BYTE.
No functional change intended.
Link: https://lkml.kernel.org/r/20230807023528.325191-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Fri, 4 Aug 2023 15:27:24 +0000 (08:27 -0700)]
mm: move vma locking out of vma_prepare and dup_anon_vma
vma_prepare() is currently the central place where vmas are being locked
before vma_complete() applies changes to them. While this is convenient,
it also obscures vma locking and makes it harder to follow the locking
rules. Move vma locking out of vma_prepare() and take vma locks
explicitly at the locations where vmas are being modified. Move vma
locking and replace it with an assertion inside dup_anon_vma() to further
clarify the locking pattern inside vma_merge().
Suren Baghdasaryan [Fri, 4 Aug 2023 15:27:23 +0000 (08:27 -0700)]
mm: always lock new vma before inserting into vma tree
While it's not strictly necessary to lock a newly created vma before
adding it into the vma tree (as long as no further changes are performed
to it), it seems like a good policy to lock it and prevent accidental
changes after it becomes visible to the page faults. Lock the vma before
adding it into the vma tree.
[akpm@linux-foundation.org: fix reject fixing in vma_link(), per Jann] Link: https://lkml.kernel.org/r/20230804152724.3090321-6-surenb@google.com Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Fri, 4 Aug 2023 15:27:22 +0000 (08:27 -0700)]
mm: lock vma explicitly before doing vm_flags_reset and vm_flags_reset_once
Implicit vma locking inside vm_flags_reset() and vm_flags_reset_once() is
not obvious and makes it hard to understand where vma locking is happening.
Also in some cases (like in dup_userfaultfd()) vma should be locked earlier
than vma_flags modification. To make locking more visible, change these
functions to assert that the vma write lock is taken and explicitly lock
the vma beforehand. Fix userfaultfd functions which should lock the vma
earlier.
Suren Baghdasaryan [Fri, 4 Aug 2023 15:27:21 +0000 (08:27 -0700)]
mm: replace mmap with vma write lock assertions when operating on a vma
Vma write lock assertion always includes mmap write lock assertion and
additional vma lock checks when per-VMA locks are enabled. Replace
weaker mmap_assert_write_locked() assertions with stronger
vma_assert_write_locked() ones when we are operating on a vma which
is expected to be locked.
ZhangPeng [Fri, 4 Aug 2023 01:25:53 +0000 (09:25 +0800)]
mm: remove redundant K() macro definition
Patch series "cleanup with helper macro K()".
Use helper macro K() to improve code readability. No functional
modification involved. Remove redundant K() macro definition.
This patch (of 7):
Since commit eb8589b4f8c1 ("mm: move mem_init_print_info() to mm_init.c"),
the K() macro definition has been moved to mm/internal.h. Therefore, the
definitions in mm/memcontrol.c, mm/backing-dev.c and mm/oom_kill.c are
redundant. Drop redundant definitions.
Ma Wupeng [Wed, 2 Aug 2023 07:23:28 +0000 (15:23 +0800)]
mm: disable kernelcore=mirror when no mirror memory
For system with kernelcore=mirror enabled while no mirrored memory is
reported by efi. This could lead to kernel OOM during startup since all
memory beside zone DMA are in the movable zone and this prevents the
kernel to use it.
Zone DMA/DMA32 initialization is independent of mirrored memory and their
max pfn is set in zone_sizes_init(). Since kernel can fallback to zone
DMA/DMA32 if there is no memory in zone Normal, these zones are seen as
mirrored memory no mather their memory attributes are.
To solve this problem, disable kernelcore=mirror when there is no real
mirrored memory exists.
Link: https://lkml.kernel.org/r/20230802072328.2107981-1-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Levi Yun <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Fri, 4 Aug 2023 11:04:52 +0000 (19:04 +0800)]
mm/compaction: correct comment to complete migration failure
Commit cfccd2e63e7e0 ("mm, compaction: finish pageblocks on complete
migration failure") convert cc->order aligned check to page block order
aligned check. Correct comment relevant with it.
Link: https://lkml.kernel.org/r/20230804110454.2935878-7-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Fri, 4 Aug 2023 11:04:51 +0000 (19:04 +0800)]
mm/compaction: correct comment of cached migrate pfn update
Commit e380bebe47715 ("mm, compaction: keep migration source private to a
single compaction instance") moved update of async and sync
compact_cached_migrate_pfn from update_pageblock_skip to
update_cached_migrate but left the comment behind. Move the relevant
comment to correct this.
Link: https://lkml.kernel.org/r/20230804110454.2935878-6-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Fri, 4 Aug 2023 11:04:50 +0000 (19:04 +0800)]
mm/compaction: correct comment of fast_find_migrateblock in isolate_migratepages
After 90ed667c03fe5 ("Revert "Revert "mm/compaction: fix set skip in
fast_find_migrateblock"""), we remove skip set in fast_find_migrateblock.
Correct comment that fast_find_block is used to avoid isolation_suitable
check for pageblock returned from fast_find_migrateblock because
fast_find_migrateblock will mark found pageblock skipped.
Instead, comment that fast_find_block is used to avoid a redundant check
of fast found pageblock which is already checked skip flag inside
fast_find_migrateblock.
Link: https://lkml.kernel.org/r/20230804110454.2935878-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Fri, 4 Aug 2023 11:04:49 +0000 (19:04 +0800)]
mm/compaction: skip page block marked skip in isolate_migratepages_block
Move migrate_pfn to page block end when block is marked skip to avoid
unnecessary scan retry of that block from upper caller. For example,
compact_zone may wrongly rescan skip page block with finish_pageblock
set as following:
1. cc->migrate point to the start of page block
2. compact_zone record last_migrated_pfn to cc->migrate
3. compact_zone->isolate_migratepages->isolate_migratepages_block
tries to scan the block. The low_pfn maybe moved forward to middle of
block because of free pages at beginning of block.
4. we find first lru page could be isolated but block was exclusive
marked skip.
5. abort isolate_migratepages_block and make cc->migrate_pfn point to
found lru page at middle of block.
6. compact_zone find cc->migrate_pfn and last_migrated_pfn are in the
same block and wrongly rescan the block with finish_pageblock set.
Link: https://lkml.kernel.org/r/20230804110454.2935878-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Fri, 4 Aug 2023 11:04:48 +0000 (19:04 +0800)]
mm/compaction: correct last_migrated_pfn update in compact_zone
We record start pfn of last isolated page block with last_migrated_pfn. And
then:
1. We check if we mark the page block skip for exclusive access in
isolate_migratepages_block by test if next migrate pfn is still in last
isolated page block. If so, we will set finish_pageblock to do the
rescan.
2. We check if a full cc->order block is scanned by test if last scan
range passes the cc->order block boundary. If so, we flush the pages
were freed.
We treat cc->migrate_pfn before isolate_migratepages as the start pfn of
last isolated page range. However, we always align migrate_pfn to page
block or move to another page block in fast_find_migrateblock or in
linearly scan forward in isolate_migratepages before do page isolation in
isolate_migratepages_block.
Update last_migrated_pfn with pageblock_start_pfn(cc->migrate_pfn - 1)
after scan to correctly set start pfn of last isolated page range. To
avoid that:
1. Miss a rescan with finish_pageblock set as last_migrate_pfn does
not point to right pageblock and the migrate will not be in pageblock
of last_migrate_pfn as it should be.
2. Wrongly issue flush by test cc->order block boundary with wrong
last_migrate_pfn.
Link: https://lkml.kernel.org/r/20230804110454.2935878-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Fri, 4 Aug 2023 16:59:51 +0000 (12:59 -0400)]
maple_tree: replace data before marking dead in split and spanning store
Reorder the operations for split and spanning stores so that new data is
placed in the tree prior to marking the old data as dead. This will limit
re-walks on dead data to just once instead of a retry loop.
The order of operations is as follows: Create the new data, put the new
data in place, mark the top node of the old data as dead.
Then repair parent links in the reused nodes through all levels of the
tree, following the new nodes downwards. Finally walk the top dead node
looking for nodes that are no longer used, or subtrees that should be
destroyed (marked dead throughout then freed), follow the partially used
nodes downwards to discover other dead nodes and subtrees.
Link: https://lkml.kernel.org/r/20230804165951.2661157-7-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All calls to mas_adopt_children() currently pass the parent as the node in
the maple state. Allow for the parent pointer that is passed in to be
used instead.
Link: https://lkml.kernel.org/r/20230804165951.2661157-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a definition to shorten long code lines and clarify what the code is
doing. Use the new definition to get the maple tree parent pointer from
the maple state where possible.
Link: https://lkml.kernel.org/r/20230804165951.2661157-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Fri, 4 Aug 2023 16:59:48 +0000 (12:59 -0400)]
maple_tree: introduce mas_put_in_tree()
mas_replace() has a single user that takes a flag which is now always
true. Replace this function with mas_put_in_tree() to better align with
mas_replace_node(). Inline the remaining logic into the only caller;
mas_wmb_replace().
Link: https://lkml.kernel.org/r/20230804165951.2661157-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Fri, 4 Aug 2023 16:59:47 +0000 (12:59 -0400)]
maple_tree: reorder replacement of nodes to avoid live lock
Replacing nodes may cause a live lock-up if CPU resources are saturated by
write operations on the tree by continuously retrying on dead nodes. To
avoid the continuous retry scenario, ensure the new node is inserted into
the tree prior to marking the old data as dead. This will define a window
where old and new data is swapped.
When reusing lower level nodes, ensure the parent pointer is updated after
the parent is marked dead. This ensures that the child is still reachable
from the top of the tree, but walking up to a dead node will result in a
single retry that will start a fresh walk from the top down through the
new node.
Link: https://lkml.kernel.org/r/20230804165951.2661157-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Fri, 4 Aug 2023 16:59:46 +0000 (12:59 -0400)]
maple_tree: add hex output to maple_arange64 dump
Patch series "maple_tree: Change replacement strategy".
The maple tree marks nodes dead as soon as they are going to be replaced.
This could be problematic when used in the RCU context since the writer
may be starved of CPU time by the readers. This patch set addresses the
issue by switching the data replacement strategy to one that will only
mark data as dead once the new data is available.
This series changes the ordering of the node replacement so that the new
data is live before the old data is marked 'dead'. When readers hit
'dead' nodes, they will restart from the top of the tree and end up in the
new data.
In more complex scenarios, the replacement strategy means a subtree is
built and graphed into the tree leaving some nodes to point to the old
parent. The view of tasks into the old data will either remain with the
old data, or see the new data once the old data is marked 'dead'.
Iterators will see the 'dead' node and restart on their own and switch to
the new data. There is no risk of the reader seeing old data in these
cases.
The 'dead' subtree of data is then fully marked dead, but reused nodes
will still point to the dead nodes until the parent pointer is updated.
Walking up to a 'dead' node will cause a re-walk from the top of the tree
and enter the new data area where old data is not reachable.
Once the parent pointers are fully up to date in the active data, the
'dead' subtree is iterated to collect entirely 'dead' subtrees, and dead
nodes (nodes that partially contained reused data).
This patch (of 6):
When dumping the tree, honour formatting request to output hex for the
maple node type arange64.
It is better to use huge page size instead of PAGE_SIZE for stride when
flush hugepage, which reduces the loop in __flush_tlb_range().
Let's support arch's flush_hugetlb_tlb_range(), which is used in
hugetlb_unshare_all_pmds(), move_hugetlb_page_tables() and
hugetlb_change_protection() for now.
Note,: for hugepages based on contiguous bit, it has to be invalidated
individually since the contiguous PTE bit is just a hint, the hardware may
or may not take it into account.
Link: https://lkml.kernel.org/r/20230802012731.62512-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Will Deacon <will@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Tue, 1 Aug 2023 02:31:44 +0000 (10:31 +0800)]
mm: hugetlb: use flush_hugetlb_tlb_range() in move_hugetlb_page_tables()
Archs may need to do special things when flushing hugepage tlb, so use the
more applicable flush_hugetlb_tlb_range() instead of flush_tlb_range().
Link: https://lkml.kernel.org/r/20230801023145.17026-2-wangkefeng.wang@huawei.com Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mina Almasry <almasrymina@google.com> Cc: Will Deacon <will@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Thu, 3 Aug 2023 09:48:59 +0000 (17:48 +0800)]
mm/compaction: merge end_pfn boundary check in isolate_freepages_range
Merge the end_pfn boundary check for single page block forward and
multiple page blocks forward to avoid do twice boundary check for multiple
page blocks forward.
Link: https://lkml.kernel.org/r/20230803094901.2915942-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Thu, 3 Aug 2023 09:48:58 +0000 (17:48 +0800)]
mm/compaction: set compact_cached_free_pfn correctly in update_pageblock_skip
Patch series "Fixes and cleanups to compaction", v2.
This series contains random fixes and cleanups to free page isolation in
compaction. This is based on another compact series[1]. More details can
be found in respective patches.
This patch (of 4):
We will set skip to page block of block_start_pfn, it's more reasonable to
set compact_cached_free_pfn to page block before the block_start_pfn.
Miaohe Lin [Thu, 3 Aug 2023 12:00:21 +0000 (20:00 +0800)]
mm/memcg: fix wrong function name above obj_cgroup_charge_zswap()
The correct function name is obj_cgroup_may_zswap(). Correct the comment.
Link: https://lkml.kernel.org/r/20230803120021.762279-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Thu, 3 Aug 2023 11:49:34 +0000 (19:49 +0800)]
mm/page_alloc: remove unneeded variable base
Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations"), local variable base is just as same as order. So
remove it. No functional change intended.
Ruan Jinjie [Thu, 3 Aug 2023 11:38:23 +0000 (19:38 +0800)]
mm/z3fold: use helper function put_z3fold_locked() and put_z3fold_locked_list()
This code is already duplicated six times, use helper function
put_z3fold_locked() to release z3fold page instead of open code it to help
improve code readability a bit. And add put_z3fold_locked_list() helper
function to be consistent with it. No functional change involved.
SeongJae Park [Wed, 2 Aug 2023 21:43:08 +0000 (21:43 +0000)]
mm/damon/sysfs-schemes: support target damos filter
Extend DAMON sysfs interface to support the DAMON monitoring target based
DAMOS filter. Users can use it via writing 'target' to the filter's
'type' file and specifying the index of the target from the corresponding
DAMON context's monitoring targets list to 'target_idx' sysfs file.
Link: https://lkml.kernel.org/r/20230802214312.110532-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:07 +0000 (21:43 +0000)]
mm/damon/core: implement target type damos filter
One DAMON context can have multiple monitoring targets, and DAMOS schemes
are applied to all targets. In some cases, users need to apply different
scheme to different targets. Retrieving monitoring results via DAMON
sysfs interface' 'tried_regions' directory could be one good example.
Also, there could be cases that cgroup DAMOS filter is not enough. All
such use cases can be worked around by having multiple DAMON contexts
having only single target, but it is inefficient in terms of resource
usage, thogh the overhead is not estimated to be huge.
Implement DAMON monitoring target based DAMOS filter for the case. Like
address range target DAMOS filter, handle these filters in the DAMON core
layer, since it is more efficient than doing in operations set layer.
This also means that regions that filtered out by monitoring target type
DAMOS filters are counted as not tried by the scheme. Hence, target
granularity monitoring results retrieval via DAMON sysfs interface becomes
available.
Link: https://lkml.kernel.org/r/20230802214312.110532-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:04 +0000 (21:43 +0000)]
Docs/mm/damon/design: update for address range filters
Update DAMON design document's DAMOS filters section for address range
DAMOS filters. Because address range filters are handled by the core
layer and it makes difference in schemes tried regions and schemes
statistics, clearly describe it.
Link: https://lkml.kernel.org/r/20230802214312.110532-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:03 +0000 (21:43 +0000)]
selftests/damon/sysfs: test address range damos filter
Add a selftest for checking existence of addr_{start,end} files under
DAMOS filter directory, and 'addr' damos filter type input of DAMON sysfs
interface.
Link: https://lkml.kernel.org/r/20230802214312.110532-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:02 +0000 (21:43 +0000)]
mm/damon/core-test: add a unit test for __damos_filter_out()
Implement a kunit test for the core of address range DAMOS filter
handling, namely __damos_filter_out(). The test especially focus on
regions that overlap with given filter's target address range.
Link: https://lkml.kernel.org/r/20230802214312.110532-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:01 +0000 (21:43 +0000)]
mm/damon/sysfs-schemes: support address range type DAMOS filter
Extend DAMON sysfs interface to support address range based DAMOS filters,
by adding a special keyword for the filter/<N>/type file, namely 'addr',
and two files under filter/<N>/ for specifying the start and the end
addresses of the range, namely 'addr_start' and 'addr_end'.
Link: https://lkml.kernel.org/r/20230802214312.110532-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Wed, 2 Aug 2023 21:43:00 +0000 (21:43 +0000)]
mm/damon/core: introduce address range type damos filter
Patch series "Extend DAMOS filters for address ranges and DAMON monitoring
targets"
There are use cases that need to apply DAMOS schemes to specific address
ranges or DAMON monitoring targets. NUMA nodes in the physical address
space, special memory objects in the virtual address space, and monitoring
target specific efficient monitoring results snapshot retrieval could be
examples of such use cases. This patchset extends DAMOS filters feature
for such cases, by implementing two more filter types, namely address
ranges and DAMON monitoring types.
Patches sequence
----------------
The first seven patches are for the address ranges based DAMOS filter.
The first patch implements the filter feature and expose it via DAMON
kernel API. The second patch further expose the feature to users via
DAMON sysfs interface. The third and fourth patches implement unit tests
and selftests for the feature. Three patches (fifth to seventh) updating
the documents follow.
The following six patches are for the DAMON monitoring target based DAMOS
filter. The eighth patch implements the feature in the core layer and
expose it via DAMON's kernel API. The ninth patch further expose it to
users via DAMON sysfs interface. Tenth patch add a selftest, and two
patches (eleventh and twelfth) update documents.
Users can know special characteristic of specific address ranges. NUMA
nodes or special objects or buffers in virtual address space could be such
examples. For such cases, DAMOS schemes could required to be applied to
only specific address ranges. Implement yet another type of DAMOS filter
for the purpose.
Note that the existing filter types, namely anon pages and memcg DAMOS
filters needed page level type check. Because such check can be done
efficiently in the opertions set layer, those filters are handled in
operations set layer. Specifically, only paddr operations set
implementation supports these filters. Also, because statistics counting
is done in the DAMON core layer, the regions that filtered out by these
filters are counted as tried but failed to the statistics.
Unlike those, address range based filters can efficiently handled in the
core layer. Hence, do the handling in the layer, and count the regions
that filtered out by those as the scheme has not tried for the region.
This difference should clearly documented.
SeongJae Park [Wed, 2 Aug 2023 21:32:18 +0000 (21:32 +0000)]
mm/damon/sysfs: implement a command for updating only schemes tried total bytes
Using tried_regions/total_bytes file, users can efficiently retrieve the
total size of memory regions having specific access pattern. However,
DAMON sysfs interface in kernel still populates all the infomration on the
tried_regions subdirectories. That means the kernel part overhead for the
construction of tried regions directories still exists. To remove the
overhead, implement yet another command input for 'state' DAMON sysfs
file. Writing the input to the file makes DAMON sysfs interface to update
only the total_bytes file.
SeongJae Park [Wed, 2 Aug 2023 21:32:17 +0000 (21:32 +0000)]
mm/damon/sysfs-schemes: implement DAMOS tried total bytes file
Patch series "mm/damon/sysfs-schemes: implement DAMOS tried total bytes
file".
The tried_regions directory of DAMON sysfs interface is useful for
retrieving monitoring results snapshot or DAMOS debugging. However, for
common use case that need to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead for
directory construction and user overhead for reading the content could be
high if the number of monitoring region is not small. This patchset
implements DAMON sysfs files for efficient support of the use case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
The tried_regions directory can be used for retrieving the monitoring
results snapshot for regions of specific access pattern, by setting the
scheme's action as 'stat' and the access pattern as required. While the
interface provides every detail of the monitoring results, some use cases
including working set size monitoring requires only the total size of the
regions. For such cases, users should read all the information and
calculate the total size of the regions. However, it could incur high
overhead if the number of regions is high. Add a file for retrieving only
the information, namely 'total_bytes' file. It allows users to get the
total size by reading only the file.
Kalesh Singh [Wed, 2 Aug 2023 02:56:04 +0000 (19:56 -0700)]
Multi-gen LRU: fix can_swap in lru_gen_look_around()
walk->can_swap might be invalid since it's not guaranteed to be
initialized for the particular lruvec. Instead deduce it from the folio
type (anon/file).
Link: https://lkml.kernel.org/r/20230802025606.346758-3-kaleshsingh@google.com Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> [mediatek] Tested-by: Charan Teja Kalla <quic_charante@quicinc.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Steven Barrett <steven@liquorix.net> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kalesh Singh [Wed, 2 Aug 2023 02:56:03 +0000 (19:56 -0700)]
Multi-gen LRU: avoid race in inc_min_seq()
inc_max_seq() will try to inc_min_seq() if nr_gens == MAX_NR_GENS. This
is because the generations are reused (the last oldest now empty
generation will become the next youngest generation).
inc_min_seq() is retried until successful, dropping the lru_lock
and yielding the CPU on each failure, and retaking the lock before
trying again:
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
spin_lock_irq(&lruvec->lru_lock);
}
However, the initial condition that required incrementing the min_seq
(nr_gens == MAX_NR_GENS) is not retested. This can change by another
call to inc_max_seq() from run_aging() with force_scan=true from the
debugfs interface.
Since the eviction stalls when the nr_gens == MIN_NR_GENS, avoid
unnecessarily incrementing the min_seq by rechecking the number of
generations before each attempt.
This issue was uncovered in previous discussion on the list by Yu Zhao
and Aneesh Kumar [1].
Kalesh Singh [Wed, 2 Aug 2023 02:56:02 +0000 (19:56 -0700)]
Multi-gen LRU: fix per-zone reclaim
MGLRU has a LRU list for each zone for each type (anon/file) in each
generation:
long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
The min_seq (oldest generation) can progress independently for each
type but the max_seq (youngest generation) is shared for both anon and
file. This is to maintain a common frame of reference.
In order for eviction to advance the min_seq of a type, all the per-zone
lists in the oldest generation of that type must be empty.
The eviction logic only considers pages from eligible zones for
eviction or promotion.
Consider the system has the movable zone configured and default 4
generations. The current state of the system is as shown below
(only illustrating one type for simplicity):
Type: ANON
Zone DMA32 Normal Movable Device
Gen 0 0 0 4GB 0
Gen 1 0 1GB 1MB 0
Gen 2 1MB 4GB 1MB 0
Gen 3 1MB 1MB 1MB 0
Now consider there is a GFP_KERNEL allocation request (eligible zone
index <= Normal), evict_folios() will return without doing any work
since there are no pages to scan in the eligible zones of the oldest
generation. Reclaim won't make progress until triggered from a ZONE_MOVABLE
allocation request; which may not happen soon if there is a lot of free
memory in the movable zone. This can lead to OOM kills, although there
is 1GB pages in the Normal zone of Gen 1 that we have not yet tried to
reclaim.
This issue is not seen in the conventional active/inactive LRU since
there are no per-zone lists.
If there are no (not enough) folios to scan in the eligible zones, move
folios from ineligible zone (zone_index > reclaim_index) to the next
generation. This allows for the progression of min_seq and reclaiming
from the next generation (Gen 1).
Qualcomm, Mediatek and raspberrypi [1] discovered this issue independently.
Efly Young [Fri, 21 Jul 2023 01:41:16 +0000 (09:41 +0800)]
mm:vmscan: fix inaccurate reclaim during proactive reclaim
Before commit f53af4285d77 ("mm: vmscan: fix extreme overreclaim and swap
floods"), proactive reclaim will extreme overreclaim sometimes. But
proactive reclaim still inaccurate and some extent overreclaim.
Problematic case is easy to construct. Allocate lots of anonymous memory
(e.g., 20G) in a memcg, then swapping by writing memory.recalim and there
is a certain probability of overreclaim. For example, request 1G by
writing memory.reclaim will eventually reclaim 1.7G or other values more
than 1G.
The reason is that reclaimer may have already reclaimed part of requested
memory in one loop, but before adjust sc->nr_to_reclaim in outer loop,
call shrink_lruvec() again will still follow the current sc->nr_to_reclaim
to work. It will eventually lead to overreclaim. In theory, the amount
of reclaimed would be in [request, 2 * request).
Reclaimer usually tends to reclaim more than request. But either direct
or kswapd reclaim have much smaller nr_to_reclaim targets, so it is less
noticeable and not have much impact.
Proactive reclaim can usually come in with a larger value, so the error is
difficult to ignore. Considering proactive reclaim is usually low
frequency, handle the batching into smaller chunks is a better approach.
SeongJae Park [Sat, 29 Jul 2023 20:37:33 +0000 (20:37 +0000)]
mm/damon/core-test: add a test for damos_new_filter()
damos_new_filter() was having a bug that not initializing ->list field of
the returning damos_filter struct, which results in access to
uninitialized memory. Add a unit test for the function.
Since commit bef8620cd8e0 ("mm: memcg: deprecate the non-hierarchical
mode"), use_hierarchy is already deprecated. And it's further removed via
commit 9d9d341df4d5 ("cgroup: remove obsoleted broken_hierarchy and
warned_broken_hierarchy"). Update corresponding comment.
Link: https://lkml.kernel.org/r/20230801124359.2266860-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When free_pages is 0, alike_pages is not used. So alike_pages calculation
can be avoided by checking free_pages early to save cpu cycles. Also fix
typo 'comparable'. It should be 'compatible' here.
Kefeng Wang [Fri, 28 Jul 2023 05:00:43 +0000 (13:00 +0800)]
perf/core: use vma_is_initial_stack() and vma_is_initial_heap()
Use the helpers to simplify code, also kill unneeded goto cpy_name.
Link: https://lkml.kernel.org/r/20230728050043.59880-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Fri, 28 Jul 2023 05:00:42 +0000 (13:00 +0800)]
selinux: use vma_is_initial_stack() and vma_is_initial_heap()
Use the helpers to simplify code.
Link: https://lkml.kernel.org/r/20230728050043.59880-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Fri, 28 Jul 2023 05:00:41 +0000 (13:00 +0800)]
drm/amdkfd: use vma_is_initial_stack() and vma_is_initial_heap()
Use the helpers to simplify code.
Link: https://lkml.kernel.org/r/20230728050043.59880-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Fri, 28 Jul 2023 05:00:40 +0000 (13:00 +0800)]
mm: factor out VMA stack and heap checks
Patch series "mm: convert to vma_is_initial_heap/stack()", v3.
Add vma_is_initial_stack() and vma_is_initial_heap() helpers and use them
to simplify code.
This patch (of 4):
Factor out VMA stack and heap checks and name them vma_is_initial_stack()
and vma_is_initial_heap() for general use.
Link: https://lkml.kernel.org/r/20230728050043.59880-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20230728050043.59880-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add KSM_MERGE_TIME and KSM_MERGE_TIME_HUGE_PAGES tests with
size of 100.
./run_vmtests.sh -t ksm
-----------------------------
running ./ksm_tests -H -s 100
-----------------------------
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.399844662 s
Average speed: 250.097 MiB/s
[PASS]
-----------------------------
running ./ksm_tests -P -s 100
-----------------------------
Total size: 100 MiB
Total time: 0.451931496 s
Average speed: 221.272 MiB/s
[PASS]
Matthew Wilcox [Thu, 27 Jul 2023 20:39:44 +0000 (21:39 +0100)]
mm: improve the comment in isolate_migratepages_block()
A recent patch shows that not everybody understands that "stabilise the
mapping" really means "prevent the mapping from being freed", so change
the wording to hopefully make that more clear.
Link: https://lkml.kernel.org/r/ZMLWEB4m3zvX6SBN@casper.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Li [Thu, 27 Jul 2023 01:55:58 +0000 (09:55 +0800)]
mm/memory.c: fix some kernel-doc comments
Add description of @mas and @tree_end, remove @mt in unmap_vmas(). to
silence the warnings:
mm/memory.c:1837: warning: Function parameter or member 'mas' not described in 'unmap_vmas'
mm/memory.c:1837: warning: Function parameter or member 'tree_end' not described in 'unmap_vmas'
mm/memory.c:1837: warning: Excess function parameter 'mt' description in 'unmap_vmas'
Link: https://lkml.kernel.org/r/20230727015558.69554-1-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5996 Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Thu, 27 Jul 2023 11:59:34 +0000 (19:59 +0800)]
mm/memcg: fix obsolete function name in mem_cgroup_protection()
Commit 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from
protection checks") changed the function name but not the corresponding
comment.
Link: https://lkml.kernel.org/r/20230727115934.657787-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Thu, 27 Jul 2023 16:22:25 +0000 (12:22 -0400)]
mm: zswap: kill zswap_get_swap_cache_page()
The __read_swap_cache_async() interface isn't more difficult to understand
than what the helper abstracts. Save the indirection and a level of
indentation for the primary work of the writeback func.
Link: https://lkml.kernel.org/r/20230727162343.1415598-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Thu, 27 Jul 2023 16:22:24 +0000 (12:22 -0400)]
mm: zswap: tighten up entry invalidation
Removing a zswap entry from the tree is tied to an explicit operation
that's supposed to drop the base reference: swap invalidation, exclusive
load, duplicate store. Don't silently remove the entry on final put, but
instead warn if an entry is in tree without reference.
While in that diff context, convert a BUG_ON to a WARN_ON_ONCE. No need
to crash on a refcount underflow.
Link: https://lkml.kernel.org/r/20230727162343.1415598-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kemeng Shi [Tue, 18 Jul 2023 14:58:10 +0000 (22:58 +0800)]
mm/page_ext: add common function to get client data from page_ext
Patch series "add page_ext_data to get client data in page_ext".
Current clients get data from page_ext by adding offset which is auto
generated in page_ext core and exposes the data layout design inside
page_ext core. This series adds a page_ext_data() to hide this from
clients.
Benefits include:
1. Future clients can call page_ext_data directly instead of defining
a new function like get_page_owner to get the data.
2. There is no change to clients if the layout of page_ext data changes.
This patch (of 3):
Add common page_ext_data function to get client data. This could hide
offset which is auto generated in page_ext core and expose the desgin of
page_ext data layout.