www.infradead.org Git - users/jedix/linux-maple.git/commitdiff
mm: disable demotion during memory reclamation
author cuishiwei <cuishw@inspur.com>
Tue, 9 Sep 2025 01:21:41 +0000 (09:21 +0800)
committer Andrew Morton <akpm@linux-foundation.org>
Fri, 12 Sep 2025 00:26:04 +0000 (17:26 -0700)
I've found an issue while using CXL memory.  My machine has one DRAM NUMA
node and one CXL NUMA node:

node 1 cpus: 96 97 98 99... - DRAM NUMA node
node 1 size: 772048 MB
node 1 free: 759737 MB
node 3 cpus: - CXL memory NUMA node
node 3 size: 524288 MB
node 3 free: 524287 MB
1. Enable demotion
echo 1 > /sys/kernel/mm/numa/demotion_enabled
2. Run a memory allocation program in a memcg
cgexec -g memory:test numactl -N 1 ./allocate_memory 20 - allocate 20G memory
numastat allocate_memory:
                           Node 0          Node 1          Node 3
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.00
Stack                        0.00            0.01            0.00
Private                      0.05        20481.56            0.01
3. Lower the memcg's memory limit below its current usage
echo 15G > /sys/fs/cgroup/test/memory.max
numastat allocate_memory:
                           Node 0          Node 1          Node 3
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.00
Stack                        0.00            0.01            0.00
Private                      0.00         4011.54        10560.00
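
The source of ./allocate_memory is not part of this commit message. A minimal sketch of what its core might look like is given below, assuming it only needs to allocate the requested amount and touch every page so the memory is actually faulted in and charged to the memcg. The function name alloc_and_touch is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the core of ./allocate_memory; the real
 * program is not shown in the commit message.  Allocates `bytes` and
 * writes one byte per page so that every page is faulted in (and thus
 * charged to the memcg the process runs in).
 */
static char *alloc_and_touch(size_t bytes)
{
	long page = sysconf(_SC_PAGESIZE);
	char *buf;
	size_t off;

	if (page <= 0)
		return NULL;

	buf = malloc(bytes);
	if (!buf)
		return NULL;

	/* Touch the first byte of each page. */
	for (off = 0; off < bytes; off += (size_t)page)
		buf[off] = 1;

	return buf;
}
```

A main() that converts its first argument from GiB to bytes, calls alloc_and_touch(), and then sleeps would keep the memory resident while memory.max is lowered in step 3.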

This happens because demotion was enabled: when the memcg's memory limit
was exceeded, memory on the DRAM NUMA node was first migrated to the CXL
NUMA node, and only after that was memory actually reclaimed.  The
migration was unnecessary.
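
The effect can be checked with a little arithmetic on the Private rows of the two numastat outputs. The sketch below is only an illustration; the helper row_total_mb and the array names are made up here, and the values are copied from the tables in steps 2 and 3.

```c
#include <stddef.h>

/* Sum of one numastat row across nodes, in MB. */
static double row_total_mb(const double *node_mb, size_t nodes)
{
	double total = 0.0;
	size_t i;

	for (i = 0; i < nodes; i++)
		total += node_mb[i];
	return total;
}

/* "Private" rows from the numastat outputs: Node 0, Node 1, Node 3. */
static const double private_before[] = { 0.05, 20481.56, 0.01 };
static const double private_after[]  = { 0.00, 4011.54, 10560.00 };
```

Summing the rows gives about 20481.62 MB before and 14571.54 MB after, so resident memory dropped by roughly 5910 MB to get under the 15G (15360 MB) limit. The roughly 10560 MB now on node 3 still counts against the memcg, so moving it there did not help to meet the limit.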

When a memory cgroup exceeds its memory limit, the system reclaims its
cold memory.  However, if /sys/kernel/mm/numa/demotion_enabled is set to 1,
cold memory on fast memory nodes is also demoted to slow memory nodes.

This demotion contradicts the goal of reclaiming cold memory within the
memcg.  At this point, demoting cold memory from fast to slow nodes is
pointless; it doesn't reduce the memcg's memory usage.  Therefore, we
should set no_demotion when reclaiming memory in a memcg.
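
As a rough userspace model of the intended behaviour (not the kernel's actual code), the check that decides whether a cold page may be demoted can be pictured as follows. The struct, the global flag, and the function below are simplified, hypothetical stand-ins; only the no_demotion field name comes from the patch.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the kernel's struct scan_control. */
struct scan_control_model {
	bool no_demotion;	/* the patch sets this for memcg limit reclaim */
};

/* Models the /sys/kernel/mm/numa/demotion_enabled switch. */
static bool demotion_enabled_model = true;

/*
 * Model of the decision for a cold page on a fast node: demote it only
 * if demotion is enabled globally, this reclaim pass has not opted out,
 * and a slower node is available.  Otherwise the page is reclaimed.
 */
static bool may_demote_model(const struct scan_control_model *sc,
			     bool has_slower_node)
{
	if (!demotion_enabled_model)
		return false;
	if (sc && sc->no_demotion)
		return false;
	return has_slower_node;
}
```

In this model, a reclaim pass with no_demotion set never demotes, even with demotion_enabled at 1, so every page it processes is reclaimed and the memcg's usage actually shrinks.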

Link: https://lkml.kernel.org/r/20250909012141.1467-1-cuishw@inspur.com
Signed-off-by: cuishiwei <cuishw@inspur.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a44289975bc8aac52a39ad0864568ffcd73614c..f1fc36729ddd4a23bfab47567ef820ce6d2d5067 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6717,6 +6717,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
                .may_unmap = 1,
                .may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
                .proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+               .no_demotion = 1,
        };
        /*
         * Traverse the ZONELIST_FALLBACK zonelist of the current node to put