www.infradead.org Git - users/jedix/linux-maple.git/commit

author	David Hildenbrand <david@redhat.com>
	Wed, 7 Aug 2024 11:55:15 +0000 (13:55 +0200)
committer	Andrew Morton <akpm@linux-foundation.org>
	Sat, 17 Aug 2024 00:52:55 +0000 (17:52 -0700)
commit	65f328d29807b6191a1520d0142e2b752f065b85
tree	1c6f8e1dd1d1b155d7d538eff6f7e82cac2f6de2
parent	f17a6279de976aa77854bf54b7611bb9e2e2314c

mm/rmap: minimize folio->_nr_pages_mapped updates when batching PTE (un)mapping

It is not immediately obvious, but we can move the folio->_nr_pages_mapped
update out of the loop and reduce the number of atomic ops without
affecting the stats.

The important point to realize is that only removing the last PMD mapping
will result in _nr_pages_mapped going below ENTIRELY_MAPPED, not the
individual atomic_inc_return_relaxed() calls. Concurrent races with
removal of PMD mappings should be handled as expected, just like when we
would have such races right now on a single mapcount update.
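
To illustrate the idea outside the kernel, here is a minimal single-threaded
userspace sketch of the unmap side using C11 atomics. It is not the kernel
code: struct toy_folio, the toy_* helpers and TOY_NR_PAGES are hypothetical
names, and TOY_ENTIRELY_MAPPED is only assumed to mirror the kernel's
ENTIRELY_MAPPED bias. The per-page variant updates the folio-wide counter once
per page, the batched variant once per call.

```c
#include <stdatomic.h>

/* Assumed to mirror the kernel's ENTIRELY_MAPPED bias added per PMD mapping. */
#define TOY_ENTIRELY_MAPPED 0x800000
#define TOY_NR_PAGES 16

/* Hypothetical userspace stand-in for a large folio; not a kernel type. */
struct toy_folio {
	atomic_int nr_pages_mapped;        /* models folio->_nr_pages_mapped */
	atomic_int mapcount[TOY_NR_PAGES]; /* per-page mapcount, -1 == unmapped */
};

/* Start with every page PTE-mapped exactly once, plus nr_pmd PMD mappings. */
static void toy_folio_init(struct toy_folio *f, int nr_pmd)
{
	atomic_init(&f->nr_pages_mapped,
		    TOY_NR_PAGES + nr_pmd * TOY_ENTIRELY_MAPPED);
	for (int i = 0; i < TOY_NR_PAGES; i++)
		atomic_init(&f->mapcount[i], 0);
}

/* Per-page shape: one atomic update of the folio-wide counter per page. */
static int toy_unmap_per_page(struct toy_folio *f, int start, int nr_pages)
{
	int nr = 0;

	for (int i = start; i < start + nr_pages; i++) {
		/* An old value of 0 means this was the page's last mapping. */
		if (atomic_fetch_sub(&f->mapcount[i], 1) != 0)
			continue;
		int now = atomic_fetch_sub_explicit(&f->nr_pages_mapped, 1,
						    memory_order_relaxed) - 1;
		if (now < TOY_ENTIRELY_MAPPED)
			nr++;
	}
	return nr;
}

/* Batched shape: count last-unmaps in the loop, update the counter once. */
static int toy_unmap_batched(struct toy_folio *f, int start, int nr_pages)
{
	int last = 0;

	for (int i = start; i < start + nr_pages; i++)
		last += atomic_fetch_sub(&f->mapcount[i], 1) == 0;

	if (!last)
		return 0;
	int now = atomic_fetch_sub_explicit(&f->nr_pages_mapped, last,
					    memory_order_relaxed) - last;
	/* Only a PMD mapping can hold the counter at or above the bias. */
	return now < TOY_ENTIRELY_MAPPED ? last : 0;
}
```

With no PMD mapping, both variants report the same number of pages that became
unmapped. With a PMD mapping still present, both report zero, because the
counter stays at or above the bias no matter how the PTE decrements are
grouped. The sketch does not exercise the concurrent case.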

In a simple munmap() microbenchmark [1] on 1 GiB of memory backed by
PTE-mapped folios of a single size (mapped by only one process, so that they
get completely unmapped), this change yields roughly the following speedup
per folio size on an x86-64 Intel machine (positive is good; a bit of noise
expected):

* 16 KiB: +10%
* 32 KiB: +15%
* 64 KiB: +17%
* 128 KiB: +21%
* 256 KiB: +22%
* 512 KiB: +22%
* 1024 KiB: +23%
* 2048 KiB: +27%

[1] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/pte-mapped-folio-benchmarks.c

Link: https://lkml.kernel.org/r/20240807115515.1640951-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>