mm/hugetlb: document huge_pte_offset usage

author Peter Xu <peterx@redhat.com>

Fri, 16 Dec 2022 15:50:54 +0000 (10:50 -0500)

committer Andrew Morton <akpm@linux-foundation.org>

Thu, 19 Jan 2023 01:12:38 +0000 (17:12 -0800)
author Peter Xu <peterx@redhat.com>
Fri, 16 Dec 2022 15:50:54 +0000 (10:50 -0500)
committer Andrew Morton <akpm@linux-foundation.org>
Thu, 19 Jan 2023 01:12:38 +0000 (17:12 -0800)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h

index 551834cd529951e37a73cba4ebbf1840f5579db3..d755e2a7c0db0723a1acf9fe01becd6b19061836 100644 (file)
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
  
  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
                         unsigned long addr, unsigned long sz);
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * Since this function will walk all the pgtable pages (including not only
+ * high-level pgtable page, but also PUD entry that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible of its thread safety.  One can follow this rule:
+ *
+ *  (1) For private mappings: pmd unsharing is not possible, so holding the
+ *      mmap_lock for either read or write is sufficient. Most callers
+ *      already hold the mmap_lock, so normally, no special action is
+ *      required.
+ *
+ *  (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+ *      pgtable page can go away from under us!  It can be done by a pmd
+ *      unshare with a follow up munmap() on the other process), then we
+ *      need either:
+ *
+ *     (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+ *           won't happen upon the range (it also makes sure the pte_t we
+ *           read is the right and stable one), or,
+ *
+ *     (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make
+ *           sure even if unshare happened the racy unmap() will wait until
+ *           i_mmap_rwsem is released.
+ *
+ * Option (2.1) is the safest, which guarantees pte stability from pmd
+ * sharing pov, until the vma lock released.  Option (2.2) doesn't protect
+ * a concurrent pmd unshare, but it makes sure the pgtable page is safe to
+ * access.
+ */
  pte_t *huge_pte_offset(struct mm_struct *mm,
                        unsigned long addr, unsigned long sz);
  unsigned long hugetlb_mask_last_page(struct hstate *h);
author	Peter Xu <peterx@redhat.com>
	Fri, 16 Dec 2022 15:50:54 +0000 (10:50 -0500)
committer	Andrew Morton <akpm@linux-foundation.org>
	Thu, 19 Jan 2023 01:12:38 +0000 (17:12 -0800)