From: Mike Kravetz
Date: Thu, 19 Nov 2015 18:12:01 +0000 (-0800)
Subject: mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes
X-Git-Tag: v4.1.12-92~233^2~1
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=3005ce1b1586704e523a3b56dc9df35d63f56f0c;p=users%2Fjedix%2Flinux-maple.git

mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes

Orabug: 22220400

Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole
punch code.  These problems are in the routine remove_inode_hugepages and
mostly occur in the case where there are holes in the range of pages to be
removed.  These holes could be the result of a previous hole punch or
simply sparse allocation.  The current code could access pages outside the
specified range.

remove_inode_hugepages handles both hole punch and truncate operations.
Page index handling was fixed/cleaned up so that the loop index always
matches the page being processed.  The code now only makes a single pass
through the range of pages as it was determined page faults could not race
with truncate.  A cond_resched() was added after removing up to
PAGEVEC_SIZE pages.

Some totally unnecessary code in hugetlbfs_fallocate() that remained from
early development was also removed.

Tested with fallocate tests submitted here:

  http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/

And, some ftruncate tests under development

Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz
Acked-by: Hugh Dickins
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Davidlohr Bueso
Cc: "Hillf Danton"
Cc: [4.3]
Signed-off-by: Andrew Morton
(cherry picked from commit ef9a2b7a46755b6b2d4ab522c2ffa53c6e1a0729)
Signed-off-by: Mike Kravetz
---

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4247ecd8304b..a8143ddd688f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -369,10 +369,25 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			lookup_nr = end - next;
 
 		/*
-		 * This pagevec_lookup() may return pages past 'end',
-		 * so we must check for page->index > end.
+		 * When no more pages are found, take different action for
+		 * hole punch and truncate.
+		 *
+		 * For hole punch, this indicates we have removed each page
+		 * within the range and are done.  Note that pages may have
+		 * been faulted in after being removed in the hole punch case.
+		 * This is OK as long as each page in the range was removed
+		 * once.
+		 *
+		 * For truncate, we need to make sure all pages within the
+		 * range are removed when exiting this routine.  We could
+		 * have raced with a fault that brought in a page after it
+		 * was first removed.  Check the range again until no pages
+		 * are found.
 		 */
 		if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) {
+			if (!truncate_op)
+				break;
+
 			if (next == start)
 				break;
 			next = start;
@@ -383,19 +398,23 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			struct page *page = pvec.pages[i];
 			u32 hash;
 
+			/*
+			 * The page (index) could be beyond end.  This is
+			 * only possible in the punch hole case as end is
+			 * LLONG_MAX for truncate.
+			 */
+			if (page->index >= end) {
+				next = end;	/* we are done */
+				break;
+			}
+			next = page->index;
+
 			hash = hugetlb_fault_mutex_hash(h, current->mm,
 							&pseudo_vma,
 							mapping, next, 0);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 			lock_page(page);
-			if (page->index >= end) {
-				unlock_page(page);
-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-				next = end;	/* we are done */
-				break;
-			}
-
 			/*
 			 * If page is mapped, it was faulted in after being
 			 * unmapped.  Do nothing in this race case.  In the
@@ -424,15 +443,13 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 				}
 			}
 
-			if (page->index > next)
-				next = page->index;
-
 			++next;
 			unlock_page(page);
 
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		huge_pagevec_release(&pvec);
+		cond_resched();
 	}
 
 	if (truncate_op)
@@ -648,9 +665,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
 		i_size_write(inode, offset + len);
 	inode->i_ctime = CURRENT_TIME;
-	spin_lock(&inode->i_lock);
-	inode->i_private = NULL;
-	spin_unlock(&inode->i_lock);
 out:
 	mutex_unlock(&inode->i_mutex);
 	return error;
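
The index handling in the hole punch path can be modelled in user space. The sketch below is not kernel code and is not part of the patch: struct sparse_file, toy_lookup(), toy_remove(), toy_punch(), MAX_PAGES and BATCH are hypothetical names invented for illustration, with toy_lookup() standing in for pagevec_lookup() and BATCH for PAGEVEC_SIZE. It leaves out locking and the truncate rescan, and only shows the two points the patch relies on: the range check happens before a page is touched, and next is set from the page's own index so it always matches the page being processed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 64	/* capacity of the toy page cache (assumption) */
#define BATCH 4		/* stand-in for PAGEVEC_SIZE, kept small on purpose */

/* Toy page cache: sorted indices of the pages that are present. */
struct sparse_file {
	long idx[MAX_PAGES];
	size_t n;
};

/*
 * Hypothetical stand-in for pagevec_lookup(): copy up to 'max' present
 * indices >= 'start' into 'out'.  It knows nothing about the end of the
 * range, so it may return indices past it.
 */
size_t toy_lookup(const struct sparse_file *f, long start, size_t max,
		  long *out)
{
	size_t got = 0;

	for (size_t i = 0; i < f->n && got < max; i++)
		if (f->idx[i] >= start)
			out[got++] = f->idx[i];
	return got;
}

/* Drop one index from the toy page cache, keeping the array sorted. */
void toy_remove(struct sparse_file *f, long index)
{
	for (size_t i = 0; i < f->n; i++) {
		if (f->idx[i] != index)
			continue;
		for (size_t j = i + 1; j < f->n; j++)
			f->idx[j - 1] = f->idx[j];
		f->n--;
		return;
	}
}

/* Single-pass hole punch of [start, end); returns the pages removed. */
size_t toy_punch(struct sparse_file *f, long start, long end)
{
	long batch[BATCH];
	long next = start;
	size_t removed = 0;

	assert(start <= end);
	while (next < end) {
		size_t want = BATCH;

		if (end - next < (long)want)
			want = (size_t)(end - next);

		size_t got = toy_lookup(f, next, want, batch);

		if (got == 0)
			break;		/* hole punch: nothing left, done */

		for (size_t i = 0; i < got; i++) {
			/* Check the range before touching the page. */
			if (batch[i] >= end) {
				next = end;	/* we are done */
				break;
			}
			next = batch[i];	/* index of the page in hand */
			toy_remove(f, next);
			removed++;
			++next;
		}
	}
	return removed;
}
```

With pages present at indices 0, 1, 5, 9 and 20, punching [1, 10) in this model removes 1, 5 and 9 and leaves 0 and 20 alone, even though the lookup hands back index 20 in the same batch.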