From: cuishiwei
Date: Tue, 9 Sep 2025 01:21:41 +0000 (+0800)
Subject: mm: disable demotion during memory reclamation
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=0dea8b6286b64cc5c967a2d2ff1ca4da92251bdb;p=users%2Fjedix%2Flinux-maple.git

mm: disable demotion during memory reclamation

I've found an issue while using CXL memory.  My machine has one DRAM NUMA
node and one CXL NUMA node:

node 1 cpus: 96 97 98 99...    - DRAM NUMA node
node 1 size: 772048 MB
node 1 free: 759737 MB
node 3 cpus:                   - CXL memory NUMA node
node 3 size: 524288 MB
node 3 free: 524287 MB

1. Enable demotion

   echo 1 > /sys/kernel/mm/numa/demotion_enabled

2. Execute a memory allocation program in a memcg (allocates 20G of memory)

   cgexec -g memory:test numactl -N 1 ./allocate_memory 20

   numastat allocate_memory:
                     Node 0          Node 1          Node 3
            --------------- --------------- ---------------
   Huge                0.00            0.00            0.00
   Heap                0.00            0.00            0.00
   Stack               0.00            0.01            0.00
   Private             0.05        20481.56            0.01

3. Set the memcg's memory limit below its current usage, so that the limit
   is exceeded

   echo 15G > /sys/fs/cgroup/test/memory.max

   numastat allocate_memory:
                     Node 0          Node 1          Node 3
            --------------- --------------- ---------------
   Huge                0.00            0.00            0.00
   Heap                0.00            0.00            0.00
   Stack               0.00            0.01            0.00
   Private             0.00         4011.54        10560.00

This happens because demotion was enabled: when the memcg's memory limit
was exceeded, memory was first migrated from the DRAM NUMA node to the CXL
NUMA node, and only then was memory reclaimed.  The migration was
unnecessary.

When a memory cgroup exceeds its memory limit, the system reclaims its cold
memory.  However, if /sys/kernel/mm/numa/demotion_enabled is set to 1,
memory on fast memory nodes is also demoted to slow memory nodes.  This
demotion contradicts the goal of reclaiming cold memory within the memcg.
At this point, demoting cold memory from fast to slow nodes is pointless;
it doesn't reduce the memcg's memory usage.

Therefore, we should set no_demotion when reclaiming memory in a memcg.
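
As an illustration of the reasoning above, here is a minimal userspace
sketch, assuming a toy model in which a memcg's usage is simply the sum of
its charged memory on the DRAM and CXL nodes.  It is not kernel code: every
name in it (toy_memcg, toy_demote, toy_reclaim, toy_enforce_limit) is
hypothetical, and the amount demoted is an arbitrary input rather than
something the real reclaim path computes this way.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code.  All names here are hypothetical
 * and exist only to illustrate the accounting argument.
 */
struct toy_memcg {
	unsigned long dram_mb;	/* charged memory resident on the DRAM node */
	unsigned long cxl_mb;	/* charged memory resident on the CXL node */
};

/* memcg usage counts charged memory on every node. */
static unsigned long toy_usage(const struct toy_memcg *cg)
{
	assert(cg != NULL);
	return cg->dram_mb + cg->cxl_mb;
}

/* Demotion migrates memory from DRAM to CXL; the charge is unchanged. */
static void toy_demote(struct toy_memcg *cg, unsigned long mb)
{
	if (mb > cg->dram_mb)
		mb = cg->dram_mb;
	cg->dram_mb -= mb;
	cg->cxl_mb += mb;
}

/* Reclaim uncharges memory; this model takes it from DRAM first. */
static void toy_reclaim(struct toy_memcg *cg, unsigned long mb)
{
	unsigned long from_dram = mb < cg->dram_mb ? mb : cg->dram_mb;
	unsigned long rest = mb - from_dram;

	cg->dram_mb -= from_dram;
	cg->cxl_mb -= rest < cg->cxl_mb ? rest : cg->cxl_mb;
}

/*
 * Bring usage down to limit_mb.  If no_demotion is 0, first demote
 * demote_mb (an arbitrary amount chosen by the caller), then reclaim
 * whatever is still over the limit.  Returns the amount reclaimed.
 */
static unsigned long toy_enforce_limit(struct toy_memcg *cg,
				       unsigned long limit_mb,
				       int no_demotion,
				       unsigned long demote_mb)
{
	unsigned long over;

	if (!no_demotion)
		toy_demote(cg, demote_mb);

	over = toy_usage(cg) > limit_mb ? toy_usage(cg) - limit_mb : 0;
	toy_reclaim(cg, over);
	return over;
}
```

In this model, starting from a 20480 MB charge on DRAM with a 15360 MB
limit, toy_enforce_limit() has to reclaim the same 5120 MB whether
no_demotion is 0 or 1.  The only difference is that the no_demotion = 0 run
also moves memory to the CXL side (10560 MB if that is the demote_mb passed
in, a figure borrowed from the numastat output above purely for
illustration) without lowering usage at all.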

Link: https://lkml.kernel.org/r/20250909012141.1467-1-cuishw@inspur.com
Signed-off-by: cuishiwei
Cc: Axel Rasmussen
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Lorenzo Stoakes
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Wei Xu
Cc: Yuanchu Xie
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Muchun Song
Signed-off-by: Andrew Morton
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a44289975bc..f1fc36729ddd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6717,6 +6717,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.no_demotion = 1,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put