Let's reuse our new MM ownership tracking infrastructure for large folios
to make folio_likely_mapped_shared() never return false negatives -- never
indicating "not mapped shared" although the folio *is* mapped shared.
With that, we can rename it to folio_maybe_mapped_shared() and get rid of
the dependency on the mapcount of the first folio page.
The semantics are now arguably clearer: no mixture of "false negatives"
and "false positives", only the remaining possibility for "false
positives".
Thoroughly document the new semantics. We might now detect that a large
folio is "maybe mapped shared" although it *no longer* is -- but once was.
Now, if more than two MMs mapped a folio at the same time, and the MM
mapping the folio exclusively at the end is not one tracked in the two
folio MM slots, we will detect the folio as "maybe mapped shared".
For anonymous folios, usually (except weird corner cases) all PTEs that
target a "maybe mapped shared" folio are R/O. As soon as a child process
would write to them (iow, actively use them), we would CoW and effectively
replace these PTEs. Most cases (below) are not expected to really matter
with large anonymous folios for this reason.
Most importantly, there will be no change at all for:
* small folios
* hugetlb folios
* PMD-mapped PMD-sized THPs (single mapping)
This change has the potential to affect existing callers of
folio_likely_mapped_shared() -> folio_maybe_mapped_shared():
(1) fs/proc/task_mmu.c: no change (hugetlb)
(2) khugepaged counts PTEs that target shared folios towards
max_ptes_shared (default: HPAGE_PMD_NR / 2), meaning we could skip a
collapse where we would have previously collapsed. This only applies
to anonymous folios and is not expected to matter in practice.
Worth noting that this change sorts out case (A) documented in
commit
1bafe96e89f0 ("mm/khugepaged: replace page_mapcount() check by
folio_likely_mapped_shared()") by removing the possibility for "false
negatives".
(3) MADV_COLD / MADV_PAGEOUT / MADV_FREE will not try splitting
PTE-mapped THPs that are considered shared but not fully covered by
the requested range, consequently not processing them.
PMD-mapped PMD-sized THP are not affected, or when all PTEs are
covered. These functions are usually only called on anon/file folios
that are exclusively mapped most of the time (no other file mappings
or no fork()), so the "false negatives" are not expected to matter in
practice.
(4) mbind() / migrate_pages() / move_pages() will refuse to migrate
shared folios unless MPOL_MF_MOVE_ALL is effective (requires
CAP_SYS_NICE). We will now reject some folios that could be migrated.
Similar to (3), especially with MPOL_MF_MOVE_ALL, so this is not
expected to matter in practice.
Note that cpuset_migrate_mm_workfn() calls do_migrate_pages() with
MPOL_MF_MOVE_ALL.
(5) NUMA hinting
mm/migrate.c:migrate_misplaced_folio_prepare() will skip file
folios that are probably shared libraries (-> "mapped shared" and
executable). This check would have detected it as a shared library at
some point (at least 3 MMs mapping it), so detecting it afterwards
does not sound wrong (still a shared library). Not expected to
matter.
mm/memory.c:numa_migrate_check() will indicate TNF_SHARED in
MAP_SHARED file mappings when encountering a shared folio. Similar
reasoning, not expected to matter.
mm/mprotect.c:change_pte_range() will skip folios detected as
shared in CoW mappings. Similarly, this is not expected to matter in
practice, but if it would ever be a problem we could relax that check
a bit (e.g., basing it on the average page-mapcount in a folio),
because it was only an optimization when many (e.g., 288) processes
were mapping the same folios -- see commit
859d4adc3415 ("mm: numa: do
not trap faults on shared data section pages.")
(6) mm/rmap.c:folio_referenced_one() will skip exclusive swapbacked
folios in dying processes. Applies to anonymous folios only. Without
"false negatives", we'll now skip all actually shared ones. Skipping
ones that are actually exclusive won't really matter, it's a pure
optimization, and is not expected to matter in practice.
In theory, one can detect the problematic scenario: folio_mapcount() > 0
and no folio MM slot is occupied ("state unknown"). One could reset the
MM slots while doing an rmap walk, which migration / folio split already
do when setting everything up. Further, when batching PTEs we might
naturally learn about a owner (e.g., folio_mapcount() == nr_ptes) and
could update the owner. However, we'll defer that until the scenarios
where it would really matter are clear.
Link: https://lkml.kernel.org/r/20250303163014.1128035-15-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
if (folio) {
/* We treat non-present entries as "maybe shared". */
- if (!present || folio_likely_mapped_shared(folio) ||
+ if (!present || folio_maybe_mapped_shared(folio) ||
hugetlb_pmd_shared(pte))
mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
else
if (!folio_test_anon(folio))
flags |= PM_FILE;
- if (!folio_likely_mapped_shared(folio) &&
+ if (!folio_maybe_mapped_shared(folio) &&
!hugetlb_pmd_shared(ptep))
flags |= PM_MMAP_EXCLUSIVE;
}
/**
- * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
- * tables of more than one MM
+ * folio_maybe_mapped_shared - Whether the folio is mapped into the page
+ * tables of more than one MM
* @folio: The folio.
*
- * This function checks if the folio is currently mapped into more than one
- * MM ("mapped shared"), or if the folio is only mapped into a single MM
- * ("mapped exclusively").
+ * This function checks if the folio maybe currently mapped into more than one
+ * MM ("maybe mapped shared"), or if the folio is certainly mapped into a single
+ * MM ("mapped exclusively").
*
* For KSM folios, this function also returns "mapped shared" when a folio is
* mapped multiple times into the same MM, because the individual page mappings
* are independent.
*
- * As precise information is not easily available for all folios, this function
- * estimates the number of MMs ("sharers") that are currently mapping a folio
- * using the number of times the first page of the folio is currently mapped
- * into page tables.
- *
* For small anonymous folios and anonymous hugetlb folios, the return
* value will be exactly correct: non-KSM folios can only be mapped at most once
* into an MM, and they cannot be partially mapped. KSM folios are
*
* For other folios, the result can be fuzzy:
* #. For partially-mappable large folios (THP), the return value can wrongly
- * indicate "mapped exclusively" (false negative) when the folio is
- * only partially mapped into at least one MM.
+ * indicate "mapped shared" (false positive) if a folio was mapped by
+ * more than two MMs at one point in time.
* #. For pagecache folios (including hugetlb), the return value can wrongly
* indicate "mapped shared" (false positive) when two VMAs in the same MM
* cover the same file range.
*
* Return: Whether the folio is estimated to be mapped into more than one MM.
*/
-static inline bool folio_likely_mapped_shared(struct folio *folio)
+static inline bool folio_maybe_mapped_shared(struct folio *folio)
{
int mapcount = folio_mapcount(folio);
if (!folio_test_large(folio) || unlikely(folio_test_hugetlb(folio)))
return mapcount > 1;
- /* A single mapping implies "mapped exclusively". */
- if (mapcount <= 1)
- return false;
-
- /* If any page is mapped more than once we treat it "mapped shared". */
- if (folio_entire_mapcount(folio) || mapcount > folio_nr_pages(folio))
+ /*
+ * vm_insert_page() without CONFIG_TRANSPARENT_HUGEPAGE ...
+ * simply assume "mapped shared", nobody should really care
+ * about this for arbitrary kernel allocations.
+ */
+ if (!IS_ENABLED(CONFIG_MM_ID))
return true;
- /* Let's guess based on the first subpage. */
- return atomic_read(&folio->_mapcount) > 0;
+ /*
+ * A single mapping implies "mapped exclusively", even if the
+ * folio flag says something different: it's easier to handle this
+ * case here instead of on the RMAP hot path.
+ */
+ if (mapcount <= 1)
+ return false;
+ return folio_test_large_maybe_mapped_shared(folio);
}
#ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
* If other processes are mapping this folio, we couldn't discard
* the folio unless they all do MADV_FREE so let's skip the folio.
*/
- if (folio_likely_mapped_shared(folio))
+ if (folio_maybe_mapped_shared(folio))
goto out;
if (!folio_trylock(folio))
VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
/* See hpage_collapse_scan_pmd(). */
- if (folio_likely_mapped_shared(folio)) {
+ if (folio_maybe_mapped_shared(folio)) {
++shared;
if (cc->is_khugepaged &&
shared > khugepaged_max_ptes_shared) {
/*
* We treat a single page as shared if any part of the THP
- * is shared. "False negatives" from
- * folio_likely_mapped_shared() are not expected to matter
- * much in practice.
+ * is shared.
*/
- if (folio_likely_mapped_shared(folio)) {
+ if (folio_maybe_mapped_shared(folio)) {
++shared;
if (cc->is_khugepaged &&
shared > khugepaged_max_ptes_shared) {
folio = pmd_folio(orig_pmd);
/* Do not interfere with other mappings of this folio */
- if (folio_likely_mapped_shared(folio))
+ if (folio_maybe_mapped_shared(folio))
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
if (nr < folio_nr_pages(folio)) {
int err;
- if (folio_likely_mapped_shared(folio))
+ if (folio_maybe_mapped_shared(folio))
continue;
if (pageout_anon_only_filter && !folio_test_anon(folio))
continue;
if (nr < folio_nr_pages(folio)) {
int err;
- if (folio_likely_mapped_shared(folio))
+ if (folio_maybe_mapped_shared(folio))
continue;
if (!folio_trylock(folio))
continue;
* Flag if the folio is shared between multiple address spaces. This
* is later used when determining whether to group tasks together
*/
- if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
+ if (folio_maybe_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
*flags |= TNF_SHARED;
/*
* For memory tiering mode, cpupid of slow memory page is used
* Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
* Choosing not to migrate a shared folio is not counted as a failure.
*
- * See folio_likely_mapped_shared() on possible imprecision when we
+ * See folio_maybe_mapped_shared() on possible imprecision when we
* cannot easily detect if a folio is shared.
*/
if ((flags & MPOL_MF_MOVE_ALL) ||
- (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
+ (!folio_maybe_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
if (!folio_isolate_hugetlb(folio, qp->pagelist))
qp->nr_failed++;
unlock:
* Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
* Choosing not to migrate a shared folio is not counted as a failure.
*
- * See folio_likely_mapped_shared() on possible imprecision when we
+ * See folio_maybe_mapped_shared() on possible imprecision when we
* cannot easily detect if a folio is shared.
*/
- if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
+ if ((flags & MPOL_MF_MOVE_ALL) || !folio_maybe_mapped_shared(folio)) {
if (folio_isolate_lru(folio)) {
list_add_tail(&folio->lru, foliolist);
node_stat_mod_folio(folio,
if (folio_nid(folio) == node)
return 0;
- if (folio_likely_mapped_shared(folio) && !migrate_all)
+ if (folio_maybe_mapped_shared(folio) && !migrate_all)
return -EACCES;
if (folio_test_hugetlb(folio)) {
* processes with execute permissions as they are probably
* shared libraries.
*
- * See folio_likely_mapped_shared() on possible imprecision
+ * See folio_maybe_mapped_shared() on possible imprecision
* when we cannot easily detect if a folio is shared.
*/
- if ((vma->vm_flags & VM_EXEC) &&
- folio_likely_mapped_shared(folio))
+ if ((vma->vm_flags & VM_EXEC) && folio_maybe_mapped_shared(folio))
return -EACCES;
/*
/* Also skip shared copy-on-write pages */
if (is_cow_mapping(vma->vm_flags) &&
(folio_maybe_dma_pinned(folio) ||
- folio_likely_mapped_shared(folio)))
+ folio_maybe_mapped_shared(folio)))
continue;
/*
if ((!atomic_read(&vma->vm_mm->mm_users) ||
check_stable_address_space(vma->vm_mm)) &&
folio_test_anon(folio) && folio_test_swapbacked(folio) &&
- !folio_likely_mapped_shared(folio)) {
+ !folio_maybe_mapped_shared(folio)) {
pra->referenced = -1;
page_vma_mapped_walk_done(&pvmw);
return false;