www.infradead.org Git - users/jedix/linux-maple.git/commit
mm: disable demotion during memory reclamation
author cuishiwei <cuishw@inspur.com>
Tue, 9 Sep 2025 01:21:41 +0000 (09:21 +0800)
committer Andrew Morton <akpm@linux-foundation.org>
Fri, 12 Sep 2025 00:26:04 +0000 (17:26 -0700)
commit 0dea8b6286b64cc5c967a2d2ff1ca4da92251bdb
tree 1bed585e3921a3cba39f33ad594e88a703e91a95
parent 3ca4060a3ad8c3f6e77c87f849309a112204a384

I've found an issue while using CXL memory.  My machine has one DRAM NUMA
node and one CXL NUMA node:

node 1 cpus: 96 97 98 99... - DRAM NUMA node
node 1 size: 772048 MB
node 1 free: 759737 MB
node 3 cpus: - CXL memory NUMA node
node 3 size: 524288 MB
node 3 free: 524287 MB

Steps to reproduce:

1. Enable demotion:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
2. Run a memory allocation program in a memcg (here allocating 20G of memory):
cgexec -g memory:test numactl -N 1 ./allocate_memory 20
numastat allocate_memory:
                           Node 0          Node 1          Node 3
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.00
Stack                        0.00            0.01            0.00
Private                      0.05        20481.56            0.01
3. Lower the memcg's memory limit below its current usage so that the limit is exceeded:
echo 15G > /sys/fs/cgroup/test/memory.max
numastat allocate_memory:
                           Node 0          Node 1          Node 3
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.00
Stack                        0.00            0.01            0.00
Private                      0.00         4011.54        10560.00
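
The allocate_memory program used in step 2 is not included in this message.
A minimal sketch of the kind of helper it might be built around, assuming it
simply allocates the requested amount and writes to it so the pages are
actually faulted in and charged to the memcg.  The names allocate_and_touch
and CHUNK_BYTES are hypothetical and not taken from the original program:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES ((size_t)1 << 30)	/* 1 GiB per chunk (assumed unit) */

/*
 * Hypothetical helper, not the author's program: allocate `chunks` buffers
 * of `chunk_bytes` each and write to every byte, so the memory is really
 * backed by pages rather than only reserved.  Pointers are stored in `out`.
 * Returns the number of chunks that were allocated and touched.
 */
static size_t allocate_and_touch(char **out, size_t chunks, size_t chunk_bytes)
{
	size_t done;

	for (done = 0; done < chunks; done++) {
		out[done] = malloc(chunk_bytes);
		if (!out[done])
			break;
		memset(out[done], 0xA5, chunk_bytes);
	}
	return done;
}
```

A main() for such a program would presumably parse the chunk count from
argv[1], call allocate_and_touch(bufs, count, CHUNK_BYTES), and then block
(for example with pause()) so the memory stays resident while numastat is run.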

This happens because demotion was enabled: when the memcg's memory limit
was exceeded, memory from the DRAM NUMA node was first migrated to the CXL
NUMA node, and only after that was memory reclaim performed.  The migration
was unnecessary.

When a memory cgroup exceeds its memory limit, the system reclaims its
cold memory.  However, if /sys/kernel/mm/numa/demotion_enabled is set to 1,
memory on fast memory nodes will also be demoted to slow memory nodes.

This demotion contradicts the goal of reclaiming cold memory within the
memcg.  At this point, demoting cold memory from fast to slow nodes is
pointless; it doesn't reduce the memcg's memory usage.  Therefore, we
should set no_demotion when reclaiming memory in a memcg.
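
The intended behaviour can be illustrated with a small user-space model.
This is only a sketch and not the actual change to mm/vmscan.c: the struct
and function names below are hypothetical stand-ins, and the only thing
borrowed from the text above is the idea of a no_demotion flag that is set
when reclaim is driven by a memcg limit:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the reclaim control state. */
struct reclaim_control_model {
	bool memcg_reclaim;	/* reclaim was triggered by a memcg limit */
	bool no_demotion;	/* never demote pages during this reclaim */
};

/*
 * Model of the decision "may this reclaim pass demote pages?".
 * Demotion needs the global knob to be on, a slower node to demote to,
 * and a reclaim context that has not opted out via no_demotion.
 */
static bool demotion_allowed(bool demotion_enabled, bool has_slower_node,
			     const struct reclaim_control_model *rc)
{
	if (!demotion_enabled)
		return false;
	if (rc->no_demotion)
		return false;
	return has_slower_node;
}

/* Reclaim on behalf of a memcg that hit its limit: opt out of demotion. */
static struct reclaim_control_model memcg_limit_reclaim(void)
{
	struct reclaim_control_model rc = {
		.memcg_reclaim = true,
		.no_demotion = true,
	};
	return rc;
}
```

In this model, a reclaim context built by memcg_limit_reclaim() never
demotes, even with demotion_enabled set to 1, so cold pages are reclaimed
directly instead of being moved to the CXL node while still being charged to
the memcg.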

Link: https://lkml.kernel.org/r/20250909012141.1467-1-cuishw@inspur.com
Signed-off-by: cuishiwei <cuishw@inspur.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/vmscan.c