mm: enable khugepaged anonymous collapse on non-writable regions
Patch series "Expand scope of khugepaged anonymous collapse", v2.
Currently khugepaged does not collapse an anonymous region which does not
have a single writable pte. This is wasteful since a region mapped with
non-writable ptes, for example, non-writable VMAs mapped by the
application, won't benefit from THP collapse.
An additional consequence of this constraint is that MADV_COLLAPSE does
not perform a collapse on a non-writable VMA, and this restriction is
nowhere to be found on the manpage - the restriction itself sounds wrong
to me since the user knows the protection of the memory it has mapped, so
collapsing read-only memory via madvise() should be a choice of the user
which shouldn't be overridden by the kernel.
Therefore, remove this constraint.
On an arm64 bare metal machine, comparing with vanilla 6.17-rc2, an
average of 5% improvement is seen on some mmtests benchmarks, particularly
hackbench, with a maximum improvement of 12%. In the following table, (I)
denotes statistically significant improvement, (R) denotes statistically
significant regression.
+-------------------------+--------------------------------+---------------+
| mmtests/hackbench | process-pipes-1 (seconds) | -0.06% |
| | process-pipes-4 (seconds) | -0.27% |
| | process-pipes-7 (seconds) | (I) -12.13% |
| | process-pipes-12 (seconds) | (I) -5.32% |
| | process-pipes-21 (seconds) | (I) -2.87% |
| | process-pipes-30 (seconds) | (I) -3.39% |
| | process-pipes-48 (seconds) | (I) -5.65% |
| | process-pipes-79 (seconds) | (I) -6.74% |
| | process-pipes-110 (seconds) | (I) -6.26% |
| | process-pipes-141 (seconds) | (I) -4.99% |
| | process-pipes-172 (seconds) | (I) -4.45% |
| | process-pipes-203 (seconds) | (I) -3.65% |
| | process-pipes-234 (seconds) | (I) -3.45% |
| | process-pipes-256 (seconds) | (I) -3.47% |
| | process-sockets-1 (seconds) | 2.13% |
| | process-sockets-4 (seconds) | 1.02% |
| | process-sockets-7 (seconds) | -0.26% |
| | process-sockets-12 (seconds) | -1.24% |
| | process-sockets-21 (seconds) | 0.01% |
| | process-sockets-30 (seconds) | -0.15% |
| | process-sockets-48 (seconds) | 0.15% |
| | process-sockets-79 (seconds) | 1.45% |
| | process-sockets-110 (seconds) | -1.64% |
| | process-sockets-141 (seconds) | (I) -4.27% |
| | process-sockets-172 (seconds) | 0.30% |
| | process-sockets-203 (seconds) | -1.71% |
| | process-sockets-234 (seconds) | -1.94% |
| | process-sockets-256 (seconds) | -0.71% |
| | thread-pipes-1 (seconds) | 0.66% |
| | thread-pipes-4 (seconds) | 1.66% |
| | thread-pipes-7 (seconds) | -0.17% |
| | thread-pipes-12 (seconds) | (I) -4.12% |
| | thread-pipes-21 (seconds) | (I) -2.13% |
| | thread-pipes-30 (seconds) | (I) -3.78% |
| | thread-pipes-48 (seconds) | (I) -5.77% |
| | thread-pipes-79 (seconds) | (I) -5.31% |
| | thread-pipes-110 (seconds) | (I) -6.12% |
| | thread-pipes-141 (seconds) | (I) -4.00% |
| | thread-pipes-172 (seconds) | (I) -3.01% |
| | thread-pipes-203 (seconds) | (I) -2.62% |
| | thread-pipes-234 (seconds) | (I) -2.00% |
| | thread-pipes-256 (seconds) | (I) -2.30% |
| | thread-sockets-1 (seconds) | (R) 2.39% |
+-------------------------+--------------------------------+---------------+
+-------------------------+------------------------------------------------+
| mmtests/sysbench-mutex | sysbenchmutex-1 (usec) | -0.02% |
| | sysbenchmutex-4 (usec) | -0.02% |
| | sysbenchmutex-7 (usec) | 0.00% |
| | sysbenchmutex-12 (usec) | 0.12% |
| | sysbenchmutex-21 (usec) | -0.40% |
| | sysbenchmutex-30 (usec) | 0.08% |
| | sysbenchmutex-48 (usec) | 2.59% |
| | sysbenchmutex-79 (usec) | -0.80% |
| | sysbenchmutex-110 (usec) | -3.87% |
| | sysbenchmutex-128 (usec) | (I) -4.46% |
+-------------------------+--------------------------------+---------------+
This patch (of 2):
Currently khugepaged does not collapse an anonymous region which does not
have a single writable pte. This is wasteful since a region mapped with
non-writable ptes, for example, non-writable VMAs mapped by the
application, won't benefit from THP collapse.
An additional consequence of this constraint is that MADV_COLLAPSE does
not perform a collapse on a non-writable VMA, and this restriction is
nowhere to be found on the manpage - the restriction itself sounds wrong
to me since the user knows the protection of the memory it has mapped, so
collapsing read-only memory via madvise() should be a choice of the user
which shouldn't be overridden by the kernel.
Therefore, remove this restriction by not honouring SCAN_PAGE_RO.
Link: https://lkml.kernel.org/r/20250908075028.38431-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20250908075028.38431-2-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>