users/willy/pagecache.git
Matthew Wilcox (Oracle) [Thu, 29 Oct 2020 18:07:47 +0000 (14:07 -0400)]
mm: Make page_index() THP-safe

You can't dereference page->index for tail pages, so use the head's
index to calculate the offset of this page in the file.  Also move
the function to pagemap.h since it's a pagecache function not used
by the rest of the kernel.  This also makes page_file_offset() work
when called for tail pages.
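
A minimal sketch of the head-based calculation, assuming the usual
compound-page helpers (not the exact patch):

  static inline pgoff_t page_index(struct page *page)
  {
          struct page *head = compound_head(page);

          /* Tail pages have no valid ->index; derive it from the head. */
          return head->index + (page - head);
  }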

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 1 Oct 2020 15:27:17 +0000 (11:27 -0400)]
selftests/vm/transhuge-stress: Support file-backed THPs

Add a -f <filename> option to test THPs on files

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 1 Oct 2020 17:21:18 +0000 (13:21 -0400)]
mm/filemap: Support VM_HUGEPAGE for file mappings

If the VM_HUGEPAGE flag is set, attempt to allocate PMD-sized THPs
during readahead, even if we have no history of readahead being
successful.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 17 Oct 2020 21:53:21 +0000 (17:53 -0400)]
mm/readahead: Switch to page_cache_ra_order

do_page_cache_ra() was being exposed for the benefit of
do_sync_mmap_readahead().  Switch it over to page_cache_ra_order()
partly because it's a better interface but mostly for the benefit of
the next patch.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
William Kucharski [Sun, 22 Sep 2019 12:43:15 +0000 (08:43 -0400)]
mm/readahead: Align THP mappings for non-DAX

When we have the opportunity to use transparent huge pages to map a
file, we want to follow the same rules as DAX.

Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 5 Feb 2020 16:27:01 +0000 (11:27 -0500)]
mm/readahead: Add THP readahead

If the filesystem supports THPs, allocate larger pages in the
readahead code when it seems worth doing.  The heuristic for choosing
larger page sizes will surely need some tuning, but this aggressive
ramp-up seems good for testing.
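
As a rough illustration, an aggressive ramp-up might grow the order on
each round until it reaches PMD size; this is an assumed heuristic, not
necessarily the one in the patch:

  /* Hypothetical ramp-up: bump the allocation order by two per round. */
  static unsigned int next_ra_order(unsigned int order)
  {
          order += 2;
          if (order > HPAGE_PMD_ORDER)
                  order = HPAGE_PMD_ORDER;
          return order;
  }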

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 30 Apr 2020 18:43:43 +0000 (14:43 -0400)]
mm/filemap: Allow PageReadahead to be set on head pages

Adjust the callers to only call PageReadahead on the head page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 23:28:51 +0000 (19:28 -0400)]
mm/vmscan: Optimise shrink_page_list for smaller THPs

A THP which is smaller than a PMD does not need to do the extra work
in try_to_unmap() of trying to split a PMD entry.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 5 Sep 2019 18:03:12 +0000 (14:03 -0400)]
mm/filemap: Allow THPs to be added to the page cache

We return -EEXIST if there are any non-shadow entries in the page
cache in the range covered by the THP.  If there are multiple
shadow entries in the range, we set *shadowp to one of them (currently
the one at the highest index).  If that turns out to be the wrong
answer, we can implement something more complex.  This is mostly
modelled after the equivalent function in the shmem code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 5 Sep 2019 17:58:41 +0000 (13:58 -0400)]
mm/filemap: Add __page_cache_alloc_order

This new function allows page cache pages to be allocated that are
larger than an order-0 page.
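
A sketch of what such an entry point might look like; the order-0
fallback and the use of thp_prep() (from elsewhere in this series) are
assumptions:

  static inline struct page *__page_cache_alloc_order(gfp_t gfp,
                  unsigned int order)
  {
          if (order == 0)
                  return __page_cache_alloc(gfp);
          /* Compound allocation, prepared as a THP. */
          return thp_prep(alloc_pages(gfp | __GFP_COMP, order));
  }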

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Christoph Hellwig [Sat, 31 Oct 2020 09:00:04 +0000 (10:00 +0100)]
mm/filemap: Simplify generic_file_read_iter

Avoid the pointless goto out just for returning retval.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Christoph Hellwig [Sat, 31 Oct 2020 09:00:03 +0000 (10:00 +0100)]
mm/filemap: Rename generic_file_buffered_read to filemap_read

Rename generic_file_buffered_read to match the naming of filemap_fault.
Also rename the written parameter to a more descriptive name and
improve the kerneldoc comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Mon, 2 Nov 2020 02:21:54 +0000 (21:21 -0500)]
mm/filemap: Don't relock the page after calling readpage

We don't need to get the page lock again; we just need to wait for
the I/O to finish, so use wait_on_page_locked_killable() like the
other callers of ->readpage.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Sun, 1 Nov 2020 03:19:06 +0000 (23:19 -0400)]
mm/filemap: Restructure filemap_get_pages

Remove the got_pages label, remove indentation, rename find_page to retry,
simplify error handling.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Mon, 2 Nov 2020 16:54:39 +0000 (11:54 -0500)]
mm/filemap: Split filemap_readahead out of filemap_get_pages

This simplifies the error handling.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Sun, 1 Nov 2020 17:48:06 +0000 (12:48 -0500)]
mm/filemap: Add filemap_range_uptodate

Move the complicated condition and the calculations out of
filemap_update_page() into its own function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Sun, 1 Nov 2020 02:52:20 +0000 (22:52 -0400)]
mm/filemap: Move the iocb checks into filemap_update_page

We don't need to give up when a non-blocking request sees a !Uptodate page.
We may be able to satisfy the read from a partially-uptodate page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Sun, 1 Nov 2020 02:41:31 +0000 (22:41 -0400)]
mm/filemap: Convert filemap_update_page to return an errno

Use AOP_TRUNCATED_PAGE to indicate that no error occurred, but the
page we looked up is no longer valid.  In this case, the reference
to the page will have been removed; if we hit any other error, the
caller will release the reference.
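
A sketch of the resulting caller-side pattern (simplified; the exact
signature is assumed):

  err = filemap_update_page(iocb, mapping, iter, page);
  if (err == AOP_TRUNCATED_PAGE)
          goto retry;       /* reference already dropped; redo the lookup */
  if (err)
          put_page(page);   /* any other error: caller drops the reference */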

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Mon, 2 Nov 2020 05:42:58 +0000 (00:42 -0500)]
mm/filemap: Change filemap_create_page calling conventions

By moving the iocb flag checks to the caller, we can pass the
file and the page index instead of the iocb.  It never needed the iter.
By passing the pagevec, we can return an errno (or AOP_TRUNCATED_PAGE)
instead of an ERR_PTR.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Sat, 31 Oct 2020 21:30:20 +0000 (17:30 -0400)]
mm/filemap: Change filemap_read_page calling conventions

Make this function more generic by passing the file instead of the iocb.
Check in the callers whether we should call readpage or not.  Also make
it return an errno / 0 / AOP_TRUNCATED_PAGE, and make calling put_page()
the caller's responsibility.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Sat, 31 Oct 2020 00:31:01 +0000 (20:31 -0400)]
mm/filemap: Don't call ->readpage if IOCB_WAITQ is set

The readpage operation can block in many (most?) filesystems, so we
should punt to a work queue instead of calling it.  This was the last
caller of lock_page_for_iocb(), so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Mon, 12 Oct 2020 20:42:00 +0000 (16:42 -0400)]
mm/filemap: Inline __wait_on_page_locked_async into caller

The previous patch removed wait_on_page_locked_async(), so inline
__wait_on_page_locked_async into __lock_page_async().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Mon, 12 Oct 2020 20:36:11 +0000 (16:36 -0400)]
mm/filemap: Support readpage splitting a page

For page splitting to succeed, the thread asking to split the
page has to be the only one with a reference to the page.  Calling
wait_on_page_locked() while holding a reference to the page will
effectively prevent this from happening with sufficient threads waiting
on the same page.  Use put_and_wait_on_page_locked() to sleep without
holding a reference to the page, then retry the page lookup after the
page is unlocked.

Since we now get the page lock a little earlier in filemap_update_page(),
we can eliminate a number of duplicate checks.  The original intent
(commit ebded02788b5 ("avoid unnecessary calls to lock_page when waiting
for IO to complete during a read")) behind getting the page lock later
was to avoid re-locking the page after it has been brought uptodate by
another thread.  We still avoid that because we go through the normal
lookup path again after the winning thread has brought the page uptodate.
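
A sketch of the lookup-and-retry pattern described above (simplified):

  if (!trylock_page(page)) {
          /* Sleep without holding a reference, then redo the lookup. */
          put_and_wait_on_page_locked(page, TASK_KILLABLE);
          goto repeat;
  }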

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Mon, 12 Oct 2020 15:12:45 +0000 (11:12 -0400)]
mm/filemap: Pass a sleep state to put_and_wait_on_page_locked

This is prep work for the next patch, but I think at least one of the
current callers would prefer a killable sleep to an uninterruptible one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Thu, 29 Oct 2020 13:32:28 +0000 (09:32 -0400)]
mm/filemap: Use THPs in generic_file_buffered_read

Add filemap_get_read_batch() which returns the THPs which represent a
contiguous array of bytes in the file.  It also stops when encountering
a page marked as Readahead or !Uptodate (but does return that page)
so it can be handled appropriately by filemap_get_pages().  That lets us
remove the loop in filemap_get_pages() and check only the last page.
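
A condensed sketch of the batch lookup; reference counting and several
corner cases are elided, so treat this as shape only:

  static void filemap_get_read_batch(struct address_space *mapping,
                  pgoff_t index, pgoff_t max, struct pagevec *pvec)
  {
          XA_STATE(xas, &mapping->i_pages, index);
          struct page *page;

          rcu_read_lock();
          for (page = xas_load(&xas); page && xas.xa_index <= max;
               page = xas_next(&xas)) {
                  if (xas_retry(&xas, page))
                          continue;
                  if (xa_is_value(page))
                          break;
                  if (!pagevec_add(pvec, page))
                          break;
                  /* Return a !Uptodate or Readahead page, but stop there. */
                  if (!PageUptodate(page) || PageReadahead(page))
                          break;
          }
          rcu_read_unlock();
  }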

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Wed, 4 Nov 2020 12:43:47 +0000 (07:43 -0500)]
mm/filemap: Convert filemap_get_pages to take a pagevec

Using a pagevec lets us keep the pages and the number of pages together
which simplifies a lot of the calling conventions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 4 Nov 2020 04:22:47 +0000 (23:22 -0500)]
mm/filemap: Remove dynamically allocated array from filemap_read

Increasing the batch size runs into diminishing returns.  It's probably
better to make, e.g., three calls to filemap_get_pages() than it is to
call into kmalloc().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 29 Oct 2020 12:38:49 +0000 (08:38 -0400)]
mm/filemap: Rename generic_file_buffered_read subfunctions

The recent split of generic_file_buffered_read() created some very
long function names which are hard to distinguish from each other.
Rename as follows:

generic_file_buffered_read_readpage -> filemap_read_page
generic_file_buffered_read_pagenotuptodate -> filemap_update_page
generic_file_buffered_read_no_cached_page -> filemap_create_page
generic_file_buffered_read_get_pages -> filemap_get_pages

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Matthew Wilcox (Oracle) [Mon, 28 Sep 2020 17:36:00 +0000 (13:36 -0400)]
mm: Change NR_FILE_THPS to account in base pages

With variable sized THPs, we have to account in number of base pages,
not in number of PMD-sized pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 30 May 2020 00:54:38 +0000 (20:54 -0400)]
mm: Support arbitrary THP sizes

Use the compound size of the page instead of assuming PTE or PMD size.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sun, 28 Jun 2020 02:19:08 +0000 (22:19 -0400)]
mm: Use multi-index entries in the page cache

We currently store order-N THPs as 2^N consecutive entries.  While this
consumes rather more memory than necessary, it also turns out to be buggy.
A writeback operation which starts in the middle of a dirty THP will not
notice that the page is dirty, as the dirty bit is only set on the head
index.  With multi-index entries, the dirty bit will be found no matter
where in the THP the iteration starts.

This does end up simplifying the page cache slightly, although not as
much as I had hoped.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Tue, 27 Oct 2020 01:43:25 +0000 (21:43 -0400)]
XArray: Expose xas_destroy

This proves to be useful functionality for the THP page cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 10 Oct 2020 15:47:55 +0000 (11:47 -0400)]
mm: Fix READ_ONLY_THP warning

These counters only exist if CONFIG_READ_ONLY_THP_FOR_FS is defined,
but we should not warn if the filesystem natively supports THPs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Zi Yan [Sat, 10 Oct 2020 15:42:32 +0000 (11:42 -0400)]
mm: Fix THP size assumption in mem_cgroup_split_huge_fixup

Ask the page what size it is instead of assuming PMD size.
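
A sketch of the fix; the original loop presumably iterated a fixed
HPAGE_PMD_NR times:

  for (i = 1; i < thp_nr_pages(head); i++)
          head[i].mem_cgroup = head->mem_cgroup;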

Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 30 Sep 2020 20:15:46 +0000 (16:15 -0400)]
mm/vmscan: Free non-shmem THPs without splitting them

We have to allocate memory in order to split file-backed pages, so it's
not a good idea to split them.  It also doesn't work for XFS because
pages have an extra reference (from page_has_private()) and
split_huge_page() expects that reference to have already been removed.
Unfortunately, we still have to split shmem THPs because we can't handle
swapping out an entire THP yet.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sun, 16 Aug 2020 17:32:34 +0000 (13:32 -0400)]
mm/truncate: Fix invalidate_complete_page2 for THPs

invalidate_complete_page2() currently open-codes page_cache_free_page(),
except for the part where it handles THP.  Rather than adding that,
call page_cache_free_page() from invalidate_complete_page2().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 6 Aug 2020 14:11:20 +0000 (10:11 -0400)]
mm/truncate: Make invalidate_inode_pages2_range work with THPs

If we're going to unmap a THP, we have to be sure to unmap the entire
page, not just the part of it which lies after the search index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Fri, 6 Sep 2019 18:23:41 +0000 (14:23 -0400)]
mm: Replace prep_transhuge_page with thp_prep

Since this is a THP function, move it into the thp_* function namespace.

By permitting NULL or order-0 pages as an argument, and returning the
argument, callers can write:

return thp_prep(alloc_pages(...));

instead of assigning the result to a temporary variable and conditionally
passing that to prep_transhuge_page().  It'd be even nicer to have a
thp_alloc() function, but there are a lot of different ways that THPs
get allocated, and replicating the alloc_pages() family of APIs is a
bit too much verbosity.
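
A sketch of thp_prep() under that calling convention (assumed):

  static inline struct page *thp_prep(struct page *page)
  {
          /* NULL and order-0 pages are returned untouched. */
          if (page && PageHead(page))
                  prep_transhuge_page(page);  /* the logic being renamed */
          return page;
  }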

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Matthew Wilcox (Oracle) [Sat, 17 Oct 2020 20:20:01 +0000 (16:20 -0400)]
mm: Return head pages from grab_cache_page_write_begin

This function is only called from filesystems on their own mapping,
so no caller will be surprised by getting back a head page when they
were expecting a tail page.  This lets us remove a call to thp_head()
in wait_for_stable_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 5 Sep 2020 03:35:43 +0000 (23:35 -0400)]
mm/page-flags: Allow accessing PageError on tail pages

We track whether a page has had a read error for the entire THP, just like
we track Uptodate and Dirty.  This lets, e.g., generic_file_buffered_read()
continue to work on the appropriate subpage of the THP without
modification.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Tue, 4 Aug 2020 15:23:25 +0000 (11:23 -0400)]
mm: Remove nrexceptional from inode

We no longer track anything in nrexceptional, so remove it, saving 8
bytes per inode.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Matthew Wilcox (Oracle) [Tue, 4 Aug 2020 15:22:31 +0000 (11:22 -0400)]
dax: Account DAX entries as nrpages

Simplify mapping_needs_writeback() by accounting DAX entries as
pages instead of exceptional entries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Matthew Wilcox (Oracle) [Tue, 4 Aug 2020 15:19:44 +0000 (11:19 -0400)]
mm: Stop accounting shadow entries

We no longer need to keep track of how many shadow entries are
present in a mapping.  This saves a few writes to the inode and
memory barriers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Matthew Wilcox (Oracle) [Tue, 4 Aug 2020 14:22:54 +0000 (10:22 -0400)]
mm: Introduce and use mapping_empty

Instead of checking the two counters (nrpages and nrexceptional), we
can just check whether i_pages is empty.
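
The helper is presumably a thin wrapper over the XArray (sketch):

  static inline bool mapping_empty(struct address_space *mapping)
  {
          return xa_empty(&mapping->i_pages);
  }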

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Matthew Wilcox (Oracle) [Wed, 31 Jul 2019 16:59:40 +0000 (12:59 -0400)]
xfs: Support THPs

There is one place which assumes the size of a page; fix it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 2 May 2020 04:26:24 +0000 (00:26 -0400)]
iomap: Handle tail pages in iomap_page_mkwrite

iomap_page_mkwrite() can be called with a tail page.  If it is, operate
on the head page, since we're treating the entire thing as a single unit
and the whole page is dirtied at the same time.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sun, 5 Jan 2020 03:28:24 +0000 (22:28 -0500)]
iomap: Inline data shouldn't see THPs

Assert that we're not seeing THPs in functions that read/write
inline data, rather than zeroing out the tail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Wed, 9 Sep 2020 13:48:29 +0000 (09:48 -0400)]
iomap: Support THP writeback

Use offset_in_thp() and similar helpers to submit THPs for writeback.
This simplifies the logic in iomap_do_writepage() around handling the
end of the file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 9 Sep 2020 11:45:09 +0000 (07:45 -0400)]
iomap: Handle THPs when writing to pages

If we come across a THP that is not uptodate when writing to the page
cache, this must be due to a readahead error, so behave the same way as
readpage and split it.  Make sure to flush the right page after completing
the write.  We still only copy up to a page boundary, so there's no need
to flush multiple pages at this time.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sun, 7 Jun 2020 02:13:02 +0000 (22:13 -0400)]
iomap: Change iomap_write_begin calling convention

Pass (up to) the remaining length of the extent to iomap_write_begin()
and have it return the number of bytes that will fit in the page.
That lets us copy more bytes per call to iomap_write_begin() if the page
cache has already allocated a THP (and will in future allow us to pass
a hint to the page cache that it should try to allocate a larger page
if there are none in the cache).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 14 Oct 2020 17:56:55 +0000 (13:56 -0400)]
iomap: Prepare page ops for larger I/Os

Just adjust the types for the page_prepare() and page_done() callbacks
to size_t from unsigned int.  While I don't see individual pages that
large in our near future, it's convenient to be able to pass in larger
lengths and has no cost.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 4 Jan 2020 22:35:18 +0000 (17:35 -0500)]
iomap: Support THPs in readahead

Use offset_in_thp() instead of offset_in_page() and change how we estimate
the number of bio_vecs we're going to need.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 9 Sep 2020 03:31:35 +0000 (23:31 -0400)]
iomap: Support THPs in readpage

The VFS only calls readpage if readahead has encountered an error.
Assume that any error requires the page to be split, and attempt to
do so.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 9 Sep 2020 03:23:04 +0000 (23:23 -0400)]
iomap: Support THPs in iomap_is_partially_uptodate

Factor iomap_range_uptodate() out of iomap_is_partially_uptodate() for use
by iomap_readpage() later.  iomap_is_partially_uptodate() can be called
on a tail page by generic_file_buffered_read(), so handle that correctly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 27 May 2020 21:12:44 +0000 (17:12 -0400)]
iomap: Support THPs in invalidatepage

If we're punching a hole in a THP, we need to remove the per-page
iomap data as the THP is about to be split and each page will need
its own.  This means that writepage can now come across a page with
no iop allocated, so remove the assertion that there is already one,
and just create one (with the uptodate bits set) if there isn't one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 4 Jan 2020 22:00:30 +0000 (17:00 -0500)]
iomap: Support THPs in iomap_adjust_read_range

Pass the struct page instead of the iomap_page so we can determine the
size of the page.  Use offset_in_thp() instead of offset_in_page() and
use thp_size() instead of PAGE_SIZE.  Convert the arguments to be size_t
instead of unsigned int, in case pages ever get larger than 2^31 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 9 Sep 2020 13:45:46 +0000 (09:45 -0400)]
iomap: Support THPs in BIO completion path

bio_for_each_segment_all() iterates once per regular sized page.
Use bio_for_each_bvec_all() to iterate once per bvec and handle
merged THPs ourselves, instead of teaching the block layer about THPs.
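
A sketch of the completion loop; iomap_finish_page_read() here is a
hypothetical per-range helper, not a real function:

  struct bio_vec *bvec;
  int i;

  bio_for_each_bvec_all(bvec, bio, i) {
          /* One bvec may now cover many base pages of a THP. */
          iomap_finish_page_read(bvec->bv_page, bvec->bv_offset,
                                 bvec->bv_len, error);
  }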

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 1 Feb 2020 06:43:24 +0000 (01:43 -0500)]
fs: Make page_mkwrite_check_truncate thp-aware

If the page is compound, check the last index in the page and return
the appropriate size.  Change the return type to ssize_t in case we ever
support pages larger than 2GB.
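
A simplified sketch of the compound-aware check (assumed; the real
function has to be more careful):

  static inline ssize_t page_mkwrite_check_truncate(struct page *page,
                  struct inode *inode)
  {
          loff_t size = i_size_read(inode);
          pgoff_t index = size >> PAGE_SHIFT;
          pgoff_t last = page->index + thp_nr_pages(page) - 1;
          unsigned long offset = offset_in_thp(page, size);

          if (page->mapping != inode->i_mapping)
                  return -EFAULT;
          if (last < index)
                  return thp_size(page);  /* page wholly inside EOF */
          if (page->index > index || !offset)
                  return -EFAULT;         /* page wholly past EOF */
          return offset;                  /* page straddles EOF */
  }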

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sun, 2 Aug 2020 01:59:48 +0000 (21:59 -0400)]
fs: Support THPs in vfs_dedupe_file_range

We may get tail pages returned from vfs_dedupe_get_page().  If we do,
we have to call page_mapping() instead of dereferencing page->mapping
directly.  We may also deadlock trying to lock the page twice if they're
subpages of the same THP, so compare the head pages instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 14 Nov 2020 17:33:03 +0000 (12:33 -0500)]
mm: Change type of thp_size to size_t

ARM defines size_t to be unsigned int, which has exactly as many bits as
unsigned long but causes a type mismatch when used with the min() macro.
Since it's a byte count, thp_size should be size_t.
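
With the change, the helper reads (sketch matching the description):

  static inline size_t thp_size(struct page *page)
  {
          return PAGE_SIZE << thp_order(page);
  }

so min() of thp_size(page) and another size_t byte count now type-checks
on 32-bit ARM.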

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Tue, 17 Nov 2020 15:45:18 +0000 (10:45 -0500)]
fix mm-truncateshmem-handle-truncates-that-split-thps.patch

Matthew Wilcox (Oracle) [Wed, 25 Nov 2020 01:19:02 +0000 (20:19 -0500)]
fix highmem

Stephen Rothwell [Tue, 24 Nov 2020 07:04:04 +0000 (18:04 +1100)]
Add linux-next specific files for 20201124

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Stephen Rothwell [Tue, 24 Nov 2020 06:12:33 +0000 (17:12 +1100)]
Merge branch 'akpm/master'

Jann Horn [Tue, 24 Nov 2020 05:45:20 +0000 (16:45 +1100)]
mm/gup: assert that the mmap lock is held in __get_user_pages()

After having cleaned up all GUP callers (except for the atomisp staging
driver, which currently gets mmap locking completely wrong [1]) to always
ensure that they hold the mmap lock when calling into GUP (unless the mm
is not yet globally visible), add an assertion to make sure it stays that
way going forward.

[1] https://lore.kernel.org/lkml/CAG48ez3tZAb9JVhw4T5e-i=h2_DUZxfNRTDsagSRCVazNXx5qA@mail.gmail.com/

Link: https://lkml.kernel.org/r/CAG48ez1GM==OnHpS=ghqZNJPn02FCDUEHc7GQmGRMXUD_aKudg@mail.gmail.com
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Jann Horn [Tue, 24 Nov 2020 05:45:20 +0000 (16:45 +1100)]
mmap locking API: don't check locking if the mm isn't live yet

In preparation for adding a mmap_assert_locked() check in
__get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
to operate on an mm without locking in the middle of execve() as long as
it hasn't been installed on a process yet.

Existing code paths that do this are (reverse callgraph):

  get_user_pages_remote
    get_arg_page
      copy_strings
      copy_string_kernel
      remove_arg_zero
    tomoyo_dump_page
      tomoyo_print_bprm
      tomoyo_scan_bprm
      tomoyo_environ

Link: https://lkml.kernel.org/r/CAG48ez03YJG9JU_6tGiMcaVjuTyRE_o4LEQ7901b5ZoCnNAjcg@mail.gmail.com
Signed-off-by: Jann Horn <jannh@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:20 +0000 (16:45 +1100)]
kasan: update documentation

This change updates the KASAN documentation to reflect the addition of boot
parameters and also reworks and clarifies some of the existing sections;
in particular, it defines what a memory granule is, mentions the quarantine,
and makes the KUnit section more readable.

Link: https://lkml.kernel.org/r/748daf013e17d925b0fe00c1c3b5dce726dd2430.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ib1f83e91be273264b25f42b04448ac96b858849f
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:20 +0000 (16:45 +1100)]
kasan, mm: allow cache merging with no metadata

The reason cache merging is disabled with KASAN is that KASAN puts its
metadata right after the allocated object.  When the merged caches have
slightly different sizes, the metadata ends up in different places, which
KASAN doesn't support.

It might be possible to adjust the metadata allocation algorithm and make
it friendly to the cache merging code.  Instead, this change takes a simpler
approach and allows merging caches when no metadata is present, which is
the case for hardware tag-based KASAN with kasan.mode=prod.

Link: https://lkml.kernel.org/r/37497e940bfd4b32c0a93a702a9ae4cf061d5392.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ia114847dfb2244f297d2cb82d592bf6a07455dba
Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:19 +0000 (16:45 +1100)]
kasan: sanitize objects when metadata doesn't fit

KASAN marks caches that are sanitized with the SLAB_KASAN cache flag.
Currently if the metadata that is appended after the object (stores e.g.
stack trace ids) doesn't fit into KMALLOC_MAX_SIZE (can only happen with
SLAB, see the comment in the patch), KASAN turns off sanitization
completely.

With this change sanitization of the object data is always enabled.
However, the metadata is only stored when it fits.  Instead of checking
for the SLAB_KASAN flag across the code to find out whether the metadata
is there, use cache->kasan_info.alloc/free_meta_offset.  As 0 can be a valid
value for free_meta_offset, introduce KASAN_NO_FREE_META as an indicator
that the free metadata is missing.

Without this change all sanitized KASAN objects would be put into
quarantine with generic KASAN.  With this change, only the objects that
have metadata (i.e.  when it fits) are put into quarantine, the rest is
freed right away.

Along the way rework __kasan_cache_create() and add clarifying comments.

Link: https://lkml.kernel.org/r/aee34b87a5e4afe586c2ac6a0b32db8dc4dcc2dc.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Icd947e2bea054cb5cfbdc6cf6652227d97032dcb
Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:19 +0000 (16:45 +1100)]
kasan: clarify comment in __kasan_kfree_large

Currently it says that the memory gets poisoned by page_alloc code.
Clarify this by mentioning the specific callback that poisons the memory.

Link: https://lkml.kernel.org/r/1c8380fe0332a3bcc720fe29f1e0bef2e2974416.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I1334dffb69b87d7986fab88a1a039cc3ea764725
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:19 +0000 (16:45 +1100)]
kasan: simplify assign_tag and set_tag calls

set_tag() already ignores the tag for the generic mode, so just call it
as is. Add a check for the generic mode to assign_tag(), and simplify its
call in ____kasan_kmalloc().

Link: https://lkml.kernel.org/r/121eeab245f98555862b289d2ba9269c868fbbcf.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I18905ca78fb4a3d60e1a34a4ca00247272480438
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:19 +0000 (16:45 +1100)]
kasan: don't round_up too much

For hardware tag-based mode kasan_poison_memory() already rounds up the
size. Do the same for software modes and remove round_up() from the common
code.

Link: https://lkml.kernel.org/r/47b232474f1f89dc072aeda0fa58daa6efade377.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ib397128fac6eba874008662b4964d65352db4aa4
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:19 +0000 (16:45 +1100)]
kasan, mm: rename kasan_poison_kfree

Rename kasan_poison_kfree() to kasan_slab_free_mempool() as it better
reflects what this annotation does. Also add a comment that explains the
PageSlab() check.

No functional changes.

Link: https://lkml.kernel.org/r/141675fb493555e984c5dca555e9d9f768c7bbaa.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I5026f87364e556b506ef1baee725144bb04b8810
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:18 +0000 (16:45 +1100)]
kasan, mm: check kasan_enabled in annotations

Declare the kasan_enabled static key in include/linux/kasan.h and in
include/linux/mm.h and check it in all kasan annotations.  This avoids
any slowdown caused by function calls when kasan_enabled is disabled.
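
A sketch of the pattern (names taken from the commit text where
possible; details assumed):

  DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled);

  static inline bool kasan_enabled(void)
  {
          return static_branch_likely(&kasan_flag_enabled);
  }

  static inline void kasan_poison_slab(struct page *page)
  {
          if (kasan_enabled())
                  __kasan_poison_slab(page);  /* out-of-line real work */
  }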

Link: https://lkml.kernel.org/r/9f90e3c0aa840dbb4833367c2335193299f69023.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I2589451d3c96c97abbcbf714baabe6161c6f153e
Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:18 +0000 (16:45 +1100)]
kasan: add and integrate kasan boot parameters

Hardware tag-based KASAN mode is intended to eventually be used in
production as a security mitigation.  Therefore there's a need for finer
control over KASAN features and for a kill switch.

This change adds a few boot parameters for hardware tag-based KASAN that
allow disabling or otherwise controlling particular KASAN features.

The features that can be controlled are:

1. Whether KASAN is enabled at all.
2. Whether KASAN collects and saves alloc/free stacks.
3. Whether KASAN panics on a detected bug or not.

With this change a new boot parameter, kasan.mode, allows choosing one of
three main modes:

- kasan.mode=off - KASAN is disabled, no tag checks are performed
- kasan.mode=prod - only essential production features are enabled
- kasan.mode=full - all KASAN features are enabled

The chosen mode provides default control values for the features mentioned
above. However it's also possible to override the default values by
providing:

- kasan.stacktrace=off/on - enable alloc/free stack collection
                            (default: on for mode=full, otherwise off)
- kasan.fault=report/panic - only report tag fault or also panic
                             (default: report)
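
For example, a production kernel that should still panic on a detected
tag fault could boot with:

  kasan.mode=prod kasan.fault=panic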

If kasan.mode parameter is not provided, it defaults to full when
CONFIG_DEBUG_KERNEL is enabled, and to prod otherwise.

It is essential that switching between these modes doesn't require
rebuilding the kernel with different configs, as this is required by
the Android GKI (Generic Kernel Image) initiative [1].

[1] https://source.android.com/devices/architecture/kernel/generic-kernel-image

Link: https://lkml.kernel.org/r/cb093613879d8d8841173f090133eddeb4c35f1f.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/If7d37003875b2ed3e0935702c8015c223d6416a4
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:18 +0000 (16:45 +1100)]
kasan: inline (un)poison_range and check_invalid_free

Using (un)poison_range() or check_invalid_free() currently results in
function calls. Move their definitions to mm/kasan/kasan.h and turn them
into static inline functions for hardware tag-based mode to avoid
unneeded function calls.

Link: https://lkml.kernel.org/r/7007955b69eb31b5376a7dc1e0f4ac49138504f2.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ia9d8191024a12d1374675b3d27197f10193f50bb
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:18 +0000 (16:45 +1100)]
kasan: open-code kasan_unpoison_slab

The external annotation kasan_unpoison_slab() is currently defined as
static inline and uses kasan_unpoison_range().  Open-code it in
mempool.c; otherwise, with an upcoming change, it would result in an
unnecessary function call.

Link: https://lkml.kernel.org/r/131a6694a978a9a8b150187e539eecc8bcbf759b.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ia7c8b659f79209935cbaab3913bf7f082cc43a0e
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:18 +0000 (16:45 +1100)]
kasan: inline random_tag for HW_TAGS

Using random_tag() currently results in a function call. Move its
definition to mm/kasan/kasan.h and turn it into a static inline function
for hardware tag-based mode to avoid unneeded function calls.

Link: https://lkml.kernel.org/r/be438471690e351e1d792e6bb432e8c03ccb15d3.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Iac5b2faf9a912900e16cca6834d621f5d4abf427
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:17 +0000 (16:45 +1100)]
kasan: inline kasan_reset_tag for tag-based modes

Using kasan_reset_tag() currently results in a function call. As it's
called quite often from the allocator code, this leads to a noticeable
slowdown. Move it to include/linux/kasan.h and turn it into a static
inline function. Also remove the now unneeded reset_tag() internal KASAN
macro and use kasan_reset_tag() instead.
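
The inlined helper is presumably along these lines (sketch):

  static inline void *kasan_reset_tag(const void *addr)
  {
          return (void *)arch_kasan_reset_tag(addr);
  }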

Link: https://lkml.kernel.org/r/6940383a3a9dfb416134d338d8fac97a9ebb8686.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I4d2061acfe91d480a75df00b07c22d8494ef14b5
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:17 +0000 (16:45 +1100)]
kasan: remove __kasan_unpoison_stack

There's no need for the __kasan_unpoison_stack() helper, as it's
currently only used in a single place.  Removing it also removes
unneeded arithmetic.

No functional changes.

Link: https://lkml.kernel.org/r/93e78948704a42ea92f6248ff8a725613d721161.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ie5ba549d445292fe629b4a96735e4034957bcc50
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:17 +0000 (16:45 +1100)]
kasan: allow VMAP_STACK for HW_TAGS mode

Even though hardware tag-based mode currently doesn't support checking
vmalloc allocations, it doesn't use shadow memory and works with
VMAP_STACK as is. Change VMAP_STACK definition accordingly.

Link: https://lkml.kernel.org/r/ecdb2a1658ebd88eb276dee2493518ac0e82de41.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I3552cbc12321dec82cd7372676e9372a2eb452ac
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:17 +0000 (16:45 +1100)]
kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK

There's a config option CONFIG_KASAN_STACK that has to be enabled for
KASAN to use stack instrumentation and perform validity checks for
stack variables.

There's no need to unpoison stack when CONFIG_KASAN_STACK is not enabled.
Only call kasan_unpoison_task_stack[_below]() when CONFIG_KASAN_STACK is
enabled.

Note that CONFIG_KASAN_STACK is an option that is currently always
defined when CONFIG_KASAN is enabled, and therefore has to be tested
with #if instead of #ifdef.
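
So the guard is written as, for example:

  #if CONFIG_KASAN_STACK  /* always defined, to 0 or 1 */
          kasan_unpoison_task_stack(tsk);
  #endif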

Link: https://lkml.kernel.org/r/d09dd3f8abb388da397fd11598c5edeaa83fe559.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/If8a891e9fe01ea543e00b576852685afec0887e3
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:17 +0000 (16:45 +1100)]
kasan: introduce set_alloc_info

Add set_alloc_info() helper and move kasan_set_track() into it. This will
simplify the code for one of the upcoming changes.

No functional changes.

Link: https://lkml.kernel.org/r/b2393e8f1e311a70fc3aaa2196461b6acdee7d21.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I0316193cbb4ecc9b87b7c2eee0dd79f8ec908c1a
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:16 +0000 (16:45 +1100)]
kasan: rename get_alloc/free_info

Rename get_alloc_info() and get_free_info() to kasan_get_alloc_meta() and
kasan_get_free_meta() to better reflect what those do and avoid confusion
with kasan_set_free_info().

No functional changes.

Link: https://lkml.kernel.org/r/27b7c036b754af15a2839e945f6d8bfce32b4c2f.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ib6e4ba61c8b12112b403d3479a9799ac8fff8de1
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Andrey Konovalov [Tue, 24 Nov 2020 05:45:16 +0000 (16:45 +1100)]
kasan: simplify quarantine_put call site

Patch series "kasan: boot parameters for hardware tag-based mode", v4.

=== Overview

Hardware tag-based KASAN mode [1] is intended to eventually be used in
production as a security mitigation.  Therefore there's a need for finer
control over KASAN features and for a kill switch.

This patchset adds a few boot parameters for hardware tag-based KASAN that
allow disabling or otherwise controlling particular KASAN features, and
provides some initial optimizations for running KASAN in production.

There's another planned patchset that will further optimize hardware
tag-based KASAN, provide proper benchmarking and tests, and fully enable
tag-based KASAN for production use.

Hardware tag-based KASAN relies on arm64 Memory Tagging Extension (MTE)
[2] to perform memory and pointer tagging. Please see [3] and [4] for
detailed analysis of how MTE helps to fight memory safety problems.

The features that can be controlled are:

1. Whether KASAN is enabled at all.
2. Whether KASAN collects and saves alloc/free stacks.
3. Whether KASAN panics on a detected bug or not.

The patch titled "kasan: add and integrate kasan boot parameters" of this
series adds a few new boot parameters.

kasan.mode allows choosing one of three main modes:

- kasan.mode=off - KASAN is disabled, no tag checks are performed
- kasan.mode=prod - only essential production features are enabled
- kasan.mode=full - all KASAN features are enabled

The chosen mode provides default control values for the features mentioned
above.  However, it's also possible to override the default values by
providing:

- kasan.stacktrace=off/on - whether to collect alloc/free stack traces
                            (default: on for mode=full, otherwise off)
- kasan.fault=report/panic - whether to only report a tag fault or also panic
                             (default: report)

If the kasan.mode parameter is not provided, it defaults to full when
CONFIG_DEBUG_KERNEL is enabled, and to prod otherwise.
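
For illustration, overriding the prod defaults to keep stack collection
and panic on the first fault would look like this on the kernel command
line (values chosen arbitrarily):

  kasan.mode=prod kasan.stacktrace=on kasan.fault=panic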

It is essential that switching between these modes doesn't require
rebuilding the kernel with different configs, as this is a requirement of
the Android GKI (Generic Kernel Image) initiative [5].

=== Benchmarks

For now I've only performed a few simple benchmarks such as measuring
kernel boot time and slab memory usage after boot. There's an upcoming
patchset which will optimize KASAN further and include more detailed
benchmarking results.

The benchmarks were performed in QEMU and the results below exclude the
slowdown caused by QEMU memory tagging emulation (as it's different from
the slowdown that will be introduced by hardware and is therefore
irrelevant).

KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory
impact compared to KASAN_HW_TAGS=n.

kasan.mode=prod (manually excluding tagging) introduces a 3% performance
impact and no memory impact (except the memory used by the hardware to
store tags) compared to kasan.mode=off.

kasan.mode=full adds about a 40% performance and 30% memory impact over
kasan.mode=prod.  Both come from alloc/free stack trace collection.

=== Notes

This patchset is available here:

https://github.com/xairy/linux/tree/up-boot-mte-v4

This patchset is based on v11 of "kasan: add hardware tag-based mode for
arm64" patchset [1].

For testing in QEMU, hardware tag-based KASAN requires:

1. QEMU built from master [6] (use "-machine virt,mte=on -cpu max" arguments
   to run).
2. GCC version 10.

[1] https://lore.kernel.org/linux-arm-kernel/cover.1606161801.git.andreyknvl@google.com/T/#t
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[3] https://arxiv.org/pdf/1802.09517.pdf
[4] https://github.com/microsoft/MSRC-Security-Research/blob/master/papers/2020/Security%20analysis%20of%20memory%20tagging.pdf
[5] https://source.android.com/devices/architecture/kernel/generic-kernel-image
[6] https://github.com/qemu/qemu

=== Tags

Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

This patch (of 19):

Move the get_free_info() call into quarantine_put() to simplify the call
site.

No functional changes.
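
Roughly, the call site changes along these lines (signatures approximate,
not the verbatim patch):

  /* Before: the caller fetched the free metadata itself. */
  quarantine_put(get_free_info(cache, object), cache);

  /* After: quarantine_put() looks the metadata up internally. */
  quarantine_put(cache, object);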

Link: https://lkml.kernel.org/r/cover.1606162397.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/312d0a3ef92cc6dc4fa5452cbc1714f9393ca239.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Iab0f04e7ebf8d83247024b7190c67c3c34c7940f
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokselftest/arm64: check GCR_EL1 after context switch
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:16 +0000 (16:45 +1100)]
kselftest/arm64: check GCR_EL1 after context switch

This test is specific to MTE and verifies that the GCR_EL1 register is
context switched correctly.

It spawns 1024 processes, and each process spawns 5 threads.  Each thread
writes a random setting of GCR_EL1 through the prctl() system call and
reads it back, verifying that it is the same.  If the values are not the
same, it reports a failure.

Note: The test has been extended to verify that the SYNC and ASYNC mode
settings are also preserved correctly across context switches.
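
The per-thread body of the test boils down to a prctl() round-trip along
these lines (a simplified sketch, not the actual selftest source; the
PR_MTE_* constants come from linux/prctl.h and need recent kernel
headers):

  #include <assert.h>
  #include <stdlib.h>
  #include <sys/prctl.h>

  static void check_gcr_roundtrip(void)
  {
          /* Pick a random include mask for the 16 possible tags. */
          unsigned long incl = rand() & 0xffff;
          unsigned long ctrl = PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
                               (incl << PR_MTE_TAG_SHIFT);

          /* Write the setting, then read it back after the scheduler has
             had a chance to context-switch the thread. */
          assert(prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0) == 0);
          assert(prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0) == (long)ctrl);
  }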

Link: https://lkml.kernel.org/r/b51a165426e906e7ec8a68d806ef3f8cd92581a6.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan: add documentation for hardware tag-based mode
Andrey Konovalov [Tue, 24 Nov 2020 05:45:16 +0000 (16:45 +1100)]
kasan: add documentation for hardware tag-based mode

Add documentation for hardware tag-based KASAN mode and also add some
clarifications for software tag-based mode.

Link: https://lkml.kernel.org/r/20ed1d387685e89fc31be068f890f070ef9fd5d5.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, arm64: enable CONFIG_KASAN_HW_TAGS
Andrey Konovalov [Tue, 24 Nov 2020 05:45:16 +0000 (16:45 +1100)]
kasan, arm64: enable CONFIG_KASAN_HW_TAGS

Hardware tag-based KASAN is now ready, enable the configuration option.

Link: https://lkml.kernel.org/r/a6fa50d3bb6b318e05c6389a44095be96442b8b0.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, mm: reset tags when accessing metadata
Andrey Konovalov [Tue, 24 Nov 2020 05:45:15 +0000 (16:45 +1100)]
kasan, mm: reset tags when accessing metadata

Kernel allocator code accesses metadata for slab objects that may lie out
of bounds of the object itself or be accessed after the object is freed.
Such accesses trigger tag faults and lead to false-positive reports with
hardware tag-based KASAN.

Software KASAN modes disable instrumentation for allocator code via the
KASAN_SANITIZE Makefile macro and rely on kasan_enable/disable_current()
annotations to ignore KASAN reports.

With hardware tag-based KASAN, neither of those options is available: it
doesn't use compiler instrumentation, no tag faults are ignored, and MTE
is disabled after the first reported fault.

Instead, reset tags when accessing metadata (currently only for SLUB).
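
Conceptually, each such access gets wrapped like this (a sketch; the
actual patch touches a number of SLUB helpers, and the offset/size names
here are illustrative):

  /* Strip the pointer tag before touching out-of-object metadata, so
     the access cannot cause a tag check fault. */
  void *meta = kasan_reset_tag(object);

  memset(meta + meta_offset, 0, meta_size);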

Link: https://lkml.kernel.org/r/a0f3cefbc49f34c843b664110842de4db28179d0.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, arm64: print report from tag fault handler
Andrey Konovalov [Tue, 24 Nov 2020 05:45:15 +0000 (16:45 +1100)]
kasan, arm64: print report from tag fault handler

Add error reporting for hardware tag-based KASAN.  When
CONFIG_KASAN_HW_TAGS is enabled, print KASAN report from the arm64 tag
fault handler.

SAS bits aren't set in ESR for all faults reported in EL1, so it's
impossible to find out the size of the access that caused the fault.
Adapt the KASAN reporting code to handle this case.

Link: https://lkml.kernel.org/r/b559c82b6a969afedf53b4694b475f0234067a1a.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, arm64: implement HW_TAGS runtime
Andrey Konovalov [Tue, 24 Nov 2020 05:45:15 +0000 (16:45 +1100)]
kasan, arm64: implement HW_TAGS runtime

Provide an implementation of the KASAN functions required for the
hardware tag-based mode.  These include core functions for memory and
pointer tagging (tags_hw.c) and bug reporting (report_tags_hw.c).  Also
adapt the common KASAN code to support the new mode.

Link: https://lkml.kernel.org/r/cfd0fbede579a6b66755c98c88c108e54f9c56bf.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, arm64: expand CONFIG_KASAN checks
Andrey Konovalov [Tue, 24 Nov 2020 05:45:15 +0000 (16:45 +1100)]
kasan, arm64: expand CONFIG_KASAN checks

Some #ifdef CONFIG_KASAN checks are only relevant for software KASAN modes
(either related to shadow memory or compiler instrumentation).  Expand
those into CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS.
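
That is, a guard of the form below changes as follows (sketch):

  /* Before: shadow-specific code compiled in for any KASAN mode. */
  #ifdef CONFIG_KASAN

  /* After: compiled in only for the software (shadow-based) modes. */
  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)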

Link: https://lkml.kernel.org/r/e6971e432dbd72bb897ff14134ebb7e169bdcf0c.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, x86, s390: update undef CONFIG_KASAN
Andrey Konovalov [Tue, 24 Nov 2020 05:45:15 +0000 (16:45 +1100)]
kasan, x86, s390: update undef CONFIG_KASAN

With the introduction of hardware tag-based KASAN, some kernel checks of
this kind:

  ifdef CONFIG_KASAN

will be updated to:

  if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)

x86 and s390 use a trick to #undef CONFIG_KASAN for some of the code
that isn't linked with KASAN runtime and shouldn't have any KASAN
annotations.

Also #undef CONFIG_KASAN_GENERIC alongside CONFIG_KASAN.
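
So the affected files presumably gain something along these lines:

  #undef CONFIG_KASAN
  #undef CONFIG_KASAN_GENERIC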

Link: https://lkml.kernel.org/r/9d84bfaaf8fabe0fc89f913c9e420a30bd31a260.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan: define KASAN_GRANULE_SIZE for HW_TAGS
Andrey Konovalov [Tue, 24 Nov 2020 05:45:14 +0000 (16:45 +1100)]
kasan: define KASAN_GRANULE_SIZE for HW_TAGS

Hardware tag-based KASAN has granules of MTE_GRANULE_SIZE.  Define
KASAN_GRANULE_SIZE to MTE_GRANULE_SIZE for CONFIG_KASAN_HW_TAGS.
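
A sketch of the resulting definition (the software-mode fallback shown
for contrast is an assumption based on the existing shadow-scale define):

  #ifdef CONFIG_KASAN_HW_TAGS
  #define KASAN_GRANULE_SIZE      MTE_GRANULE_SIZE
  #else
  #define KASAN_GRANULE_SIZE      (1UL << KASAN_SHADOW_SCALE_SHIFT)
  #endif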

Link: https://lkml.kernel.org/r/3d15794b3d1b27447fd7fdf862c073192ba657bd.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agoarm64: kasan: add arch layer for memory tagging helpers
Andrey Konovalov [Tue, 24 Nov 2020 05:45:14 +0000 (16:45 +1100)]
arm64: kasan: add arch layer for memory tagging helpers

This patch adds a set of arch_*() memory tagging helpers, currently only
defined for arm64 when hardware tag-based KASAN is enabled.  These helpers
will be used by the KASAN runtime to implement the hardware tag-based mode.

This arch-level indirection is introduced to simplify adding hardware
tag-based KASAN support to other architectures in the future: they only
need to define the appropriate arch_*() macros.
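
The indirection presumably follows the usual overridable-hook pattern,
roughly as below (hook names beyond the arch_*()/hw_*() convention
described above are illustrative):

  /* An architecture that supports tagging overrides the hooks ... */
  #ifndef arch_enable_tagging
  #define arch_enable_tagging()
  #endif
  #ifndef arch_get_random_tag
  #define arch_get_random_tag()   (0xFF)
  #endif

  /* ... and the KASAN core only ever calls the generic wrappers. */
  #define hw_enable_tagging()     arch_enable_tagging()
  #define hw_get_random_tag()     arch_get_random_tag()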

Link: https://lkml.kernel.org/r/fc9e5bb71201c03131a2fc00a74125723568dda9.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agoarm64: kasan: align allocations for HW_TAGS
Andrey Konovalov [Tue, 24 Nov 2020 05:45:14 +0000 (16:45 +1100)]
arm64: kasan: align allocations for HW_TAGS

Hardware tag-based KASAN uses the memory tagging approach, which requires
all allocations to be aligned to the memory granule size.  Align the
allocations to MTE_GRANULE_SIZE via ARCH_SLAB_MINALIGN when
CONFIG_KASAN_HW_TAGS is enabled.
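
Presumably along the lines of (in the arm64 cache header):

  #ifdef CONFIG_KASAN_HW_TAGS
  #define ARCH_SLAB_MINALIGN      MTE_GRANULE_SIZE
  #endif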

Link: https://lkml.kernel.org/r/fe64131606b1c2aabfd34ae99554c0d9df18eb19.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agokasan, mm: untag page address in free_reserved_area
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:14 +0000 (16:45 +1100)]
kasan, mm: untag page address in free_reserved_area

free_reserved_area() memsets the pages belonging to a given memory area.
As that memory hasn't been allocated via page_alloc, the KASAN tags of
those pages are 0x00.  As a result, the memset may cause a tag mismatch.

Untag the address to avoid spurious faults.
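
A sketch of the fix (the shape of the loop body is illustrative):

  void *addr = page_address(page);

  /* The page carries the default 0x00 tag; drop the tag from our
     pointer so the memset below cannot tag-fault. */
  addr = kasan_reset_tag(addr);
  memset(addr, poison, PAGE_SIZE);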

Link: https://lkml.kernel.org/r/ebef6425f4468d063e2f09c1b62ccbb2236b71d3.1606161801.git.andreyknvl@google.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agoarm64: mte: switch GCR_EL1 in kernel entry and exit
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:14 +0000 (16:45 +1100)]
arm64: mte: switch GCR_EL1 in kernel entry and exit

When MTE is present, the GCR_EL1 register contains the tag mask used to
exclude tags from random generation via the IRG instruction.

With the introduction of the new Tag-Based KASAN API that provides a
mechanism to reserve tags for special purposes, the MTE implementation
has to make sure that the GCR_EL1 setting for the kernel does not affect
userspace processes, and vice versa.

Save and restore the kernel/user mask in GCR_EL1 on kernel entry and
exit.
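
Expressed in C for clarity (the real switching lives in the assembly
entry/exit paths; the kernel mask variable name is illustrative):

  /* On kernel entry: stash the user mask, install the kernel one. */
  u64 user_gcr = read_sysreg_s(SYS_GCR_EL1);
  write_sysreg_s(gcr_kernel_excl, SYS_GCR_EL1);

  /* ... kernel code runs with the kernel's tag exclusions ... */

  /* On kernel exit: restore the user's mask. */
  write_sysreg_s(user_gcr, SYS_GCR_EL1);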

Link: https://lkml.kernel.org/r/578b03294708cc7258fad0dc9c2a2e809e5a8214.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agoarm64: mte: convert gcr_user into an exclude mask
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: mte: convert gcr_user into an exclude mask

The gcr_user mask is a per-thread mask that determines which tags are
excluded from random generation when the Memory Tagging Extension is
present and an 'irg' instruction is invoked.

gcr_user affects the behavior on EL0 only.

Currently that mask is stored as an include mask, controlled by the user
via prctl(), while GCR_EL1 expects an exclude mask.

Convert the include mask into an exclude one to simplify setting the
register.
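
The conversion itself is essentially a complement under the tag-mask
width, e.g. (a sketch; the mask-constant name is as in the arm64 sysreg
header):

  /* prctl() supplies an include mask; GCR_EL1 stores an exclude mask. */
  u64 excl = ~incl & SYS_GCR_EL1_EXCL_MASK;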

Note: This change will affect gcr_kernel (for EL1) introduced with a
future patch.

Link: https://lkml.kernel.org/r/946dd31be833b660334c4f93410acf6d6c4cf3c4.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
4 years agoarm64: kasan: allow enabling in-kernel MTE
Vincenzo Frascino [Tue, 24 Nov 2020 05:45:13 +0000 (16:45 +1100)]
arm64: kasan: allow enabling in-kernel MTE

Hardware tag-based KASAN relies on the Memory Tagging Extension (MTE)
feature and requires it to be enabled.  MTE supports both synchronous and
asynchronous tag check fault reporting modes.

This patch adds a new mte_enable_kernel() helper that enables MTE in
Synchronous mode at EL1 and is intended to be called from the KASAN
runtime during initialization.

The Tag Checking operation causes a synchronous data abort as a
consequence of a tag check fault when MTE is configured in synchronous
mode.

As part of this change, enable the match-all tag for EL1 to allow the
kernel to access user pages without faulting.  This is required because
the kernel does not know which tags the user has set on a page.

Note: For MTE, the TCF bit field in SCTLR_EL1 affects only EL1 in a
similar way as TCF0 affects EL0.

MTE is built on top of the Top Byte Ignore (TBI) feature, hence we enable
TBI as part of this patch as well.
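
The helper itself is small; a sketch consistent with the description
(register field names as in the arm64 sysreg definitions):

  void mte_enable_kernel(void)
  {
          /* Switch EL1 tag check faults to synchronous mode. */
          sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, SCTLR_ELx_TCF_SYNC);
          isb();
  }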

Link: https://lkml.kernel.org/r/7352b0a0899af65c2785416c8ca6bf3845b66fa1.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>