From 0dea8b6286b64cc5c967a2d2ff1ca4da92251bdb Mon Sep 17 00:00:00 2001
From: cuishiwei
Date: Tue, 9 Sep 2025 09:21:41 +0800
Subject: [PATCH] mm: disable demotion during memory reclamation
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

I've found an issue while using CXL memory. My machine has one DRAM NUMA
node and one CXL NUMA node:

node 1 cpus: 96 97 98 99...   - DRAM NUMA node
node 1 size: 772048 MB
node 1 free: 759737 MB
node 3 cpus:                  - CXL memory NUMA node
node 3 size: 524288 MB
node 3 free: 524287 MB

1. Enable demotion:

echo 1 > /sys/kernel/mm/numa/demotion_enabled

2. Run a memory allocation program in a memcg (here it allocates 20G):

cgexec -g memory:test numactl -N 1 ./allocate_memory 20

numastat allocate_memory:
                 Node 0          Node 1          Node 3
        --------------- --------------- ---------------
Huge               0.00            0.00            0.00
Heap               0.00            0.00            0.00
Stack              0.00            0.01            0.00
Private            0.05        20481.56            0.01

3. Set the memcg's memory limit below its current usage:

echo 15G > /sys/fs/cgroup/test/memory.max

numastat allocate_memory:
                 Node 0          Node 1          Node 3
        --------------- --------------- ---------------
Huge               0.00            0.00            0.00
Heap               0.00            0.00            0.00
Stack              0.00            0.01            0.00
Private            0.00         4011.54        10560.00

This happens because demotion is enabled: when the memcg's memory limit
was exceeded, memory on the DRAM NUMA node was first migrated to the CXL
NUMA node. Only after that was memory reclaimed, so the migration was
unnecessary work.

When a memory cgroup exceeds its memory limit, the system reclaims its
cold memory. However, if /sys/kernel/mm/numa/demotion_enabled is set to
1, memory on fast memory nodes is also demoted to slow memory nodes. This
demotion contradicts the goal of reclaiming cold memory within the memcg.
At this point, demoting cold memory from fast to slow nodes is pointless:
the demoted pages remain charged to the memcg, so its memory usage does
not go down.

Therefore, we should set no_demotion when reclaiming memory in a memcg.

Link: https://lkml.kernel.org/r/20250909012141.1467-1-cuishw@inspur.com
Signed-off-by: cuishiwei
Cc: Axel Rasmussen
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Lorenzo Stoakes
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Wei Xu
Cc: Yuanchu Xie
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Muchun Song
Signed-off-by: Andrew Morton
---
 mm/vmscan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a44289975bc..f1fc36729ddd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6717,6 +6717,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.no_demotion = 1,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
-- 
2.51.0
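
The changelog does not include the source of ./allocate_memory. For anyone
trying to reproduce the numastat output, the core of such a program might
look like the sketch below. This is an assumption, not the author's code:
the name alloc_and_touch() and its signature are hypothetical. It allocates
a buffer and writes one byte per page, so that every page is actually
faulted in and charged to the calling task's memcg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch): allocate `bytes` of
 * anonymous memory and write one byte per page so each page becomes
 * resident and is charged to the caller's memcg.
 *
 * Returns the buffer (caller frees it), or NULL if malloc() fails.
 * *touched receives the number of pages written.
 */
char *alloc_and_touch(size_t bytes, size_t *touched)
{
	long psz = sysconf(_SC_PAGESIZE);
	/* Fall back to 4096 if the page size cannot be queried. */
	size_t page = psz > 0 ? (size_t)psz : 4096;
	size_t off, count = 0;
	char *buf;

	assert(touched != NULL);
	*touched = 0;

	buf = malloc(bytes);
	if (!buf)
		return NULL;

	/* Untouched pages are not backed by memory, so write to each one. */
	for (off = 0; off < bytes; off += page) {
		buf[off] = 1;
		count++;
	}

	*touched = count;
	return buf;
}
```

A main() that converts argv[1] from GiB to bytes, calls this helper and
then sleeps would keep the memory resident while memory.max is lowered and
numastat is sampled from another shell.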