mm: multi-gen LRU: exploit locality in rmap

Searching the rmap for PTEs mapping each page on an LRU list (to test and
clear the accessed bit) can be expensive because pages from different VMAs
(PA space) are not cache friendly to the rmap (VA space). For workloads
mostly using mapped pages, the rmap has a high CPU cost in the reclaim
path.

This patch exploits spatial locality to reduce the trips into the rmap.
When shrink_page_list() walks the rmap and finds a young PTE, a new
function lru_gen_look_around() scans at most BITS_PER_LONG-1 adjacent
PTEs. On finding another young PTE, it clears the accessed bit and
updates the gen counter of the page mapped by this PTE to
(max_seq%MAX_NR_GENS)+1.
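
A minimal userspace sketch of the look-around step described above, not
the kernel code. struct toy_pte, toy_look_around() and gen_from_seq() are
hypothetical names; MAX_NR_GENS is assumed to be 4, and centering the
window on the young PTE and clamping it to the array are assumptions made
for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))
#define MAX_NR_GENS 4 /* assumed value, for illustration only */

/* Hypothetical stand-in for a PTE: only what the sketch needs. */
struct toy_pte {
	bool present;
	bool young;    /* the accessed bit */
	int *page_gen; /* gen counter of the page this PTE maps */
};

/* Value written to a page's gen counter, as stated in the changelog. */
static int gen_from_seq(unsigned long max_seq)
{
	return (int)(max_seq % MAX_NR_GENS) + 1;
}

/*
 * ptes[hit] is the young PTE found by the rmap walk. Scan a window of at
 * most BITS_PER_LONG PTEs that contains it, i.e. at most BITS_PER_LONG-1
 * neighbors. For every other young PTE, clear the accessed bit and
 * update the gen counter of the mapped page. Returns the number of
 * pages updated.
 */
static int toy_look_around(struct toy_pte *ptes, size_t nr, size_t hit,
			   unsigned long max_seq)
{
	size_t window = BITS_PER_LONG;
	size_t start, end, i;
	int updated = 0;

	assert(hit < nr);

	/* Assumption: center the window on the hit, clamp to the array. */
	start = hit > window / 2 ? hit - window / 2 : 0;
	end = start + window < nr ? start + window : nr;

	for (i = start; i < end; i++) {
		if (i == hit)
			continue; /* left to the caller that found it */
		if (!ptes[i].present || !ptes[i].young)
			continue;
		ptes[i].young = false;
		*ptes[i].page_gen = gen_from_seq(max_seq);
		updated++;
	}

	return updated;
}
```

In this sketch, one hit in a densely accessed range updates up to
BITS_PER_LONG-1 neighboring pages without a separate rmap walk for each
of them, which is the saving the profiles below reflect.
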

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[5.5, 7.5]%
                Ops/sec      KB/sec
      patch1-6: 1120643.70   43588.06
      patch1-7: 1193918.93   46438.15

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-6
      35.99%  lzo1x_1_do_compress (real work)
      19.40%  page_vma_mapped_walk
       6.31%  _raw_spin_unlock_irq
       3.95%  do_raw_spin_lock
       2.39%  anon_vma_interval_tree_iter_first
       2.25%  ptep_clear_flush
       1.92%  __anon_vma_interval_tree_subtree_search
       1.70%  folio_referenced_one
       1.68%  __zram_bvec_write
       1.43%  anon_vma_interval_tree_iter_next

    patch1-7
      45.90%  lzo1x_1_do_compress (real work)
       9.14%  page_vma_mapped_walk
       6.81%  _raw_spin_unlock_irq
       2.80%  ptep_clear_flush
       2.34%  __zram_bvec_write
       2.29%  do_raw_spin_lock
       1.84%  lru_gen_look_around
       1.78%  memmove
       1.74%  obj_malloc
       1.50%  free_unref_page_list

  Configurations:
    no change

Link: https://lkml.kernel.org/r/20220407031525.2368067-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>