Liam R. Howlett [Sat, 30 Nov 2024 04:54:58 +0000 (23:54 -0500)]
maple_tree: Combine mas_parent_gap() into mas_update_gap()
mas_parent_gap() is used in only one location, and much of what it needs already
exists in the calling function. Inlining the function and dropping the
duplication simplifies the code and reduces the instruction count.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Liam R. Howlett [Wed, 23 Aug 2023 16:13:52 +0000 (12:13 -0400)]
tools/testing/radix-tree: Add maple tree fuzzer
This is the introduction of the maple tree fuzzer into the radix-tree
test suite, based heavily on the work of Vasily Gorbik sent in March
2022.
The tester uses the LLVM fuzzer to automate testing of the maple tree by
randomly inserting, storing, deleting, and resetting the tree. Testing
has been expanded to test both allocation and basic trees.
After building the fuzz-maple target with clang, just run the resulting
executable. The LLVM libFuzzer supports minimizing the steps needed to
reproduce a crash from a crash file.
Using V=1 on the minimized crash will result in a testcase that can be
added to the lib/test_maple_tree.c test suite. Using V=2 can help
figure out what is happening to cause the crash.
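For reference, libFuzzer drives a harness through a single fixed entry point; a minimal sketch of that shape is below. LLVMFuzzerTestOneInput() is libFuzzer's standard hook, while maple_tree_replay() is a hypothetical stand-in for the harness's operation decoder, not code from this patch.

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical helper: decode the fuzz input into maple tree operations
   * (store/insert/erase/reset) and apply them to a test tree. */
  void maple_tree_replay(const uint8_t *data, size_t len);

  int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
  {
          if (size < 2)
                  return 0;               /* too short to encode an operation */
          maple_tree_replay(data, size);
          return 0;                       /* non-zero return values are reserved */
  }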
Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Hamza Mahfooz [Mon, 20 Jan 2025 20:56:59 +0000 (15:56 -0500)]
mailmap: add an entry for Hamza Mahfooz
Map my previous work email to my current one.
Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhaoyang Huang [Tue, 21 Jan 2025 02:01:59 +0000 (10:01 +0800)]
mm: gup: fix infinite loop within __get_longterm_locked
We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU. This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.
We incorrectly update the "collected" variable even if nothing was
collected. Fix it by incrementing "collected" only when we isolated a
folio and added it to the list of folios to migrate.
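Below is a minimal sketch of the counting pattern being described, not the literal hunk; the helper name and the use of folio_isolate_lru() are assumptions for illustration.

  /* Only count folios that were actually isolated and queued for migration;
   * previously the counter was bumped even when isolation did nothing. */
  static unsigned long collect_one_folio(struct folio *folio,
                                         struct list_head *movable_folio_list)
  {
          unsigned long collected = 0;

          if (folio_isolate_lru(folio)) {
                  list_add_tail(&folio->lru, movable_folio_list);
                  collected++;
          }
          return collected;
  }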
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heming Zhao [Tue, 21 Jan 2025 11:22:03 +0000 (19:22 +0800)]
ocfs2: fix incorrect CPU endianness conversion causing mount failure
Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression. The blksz_bits value is already converted to CPU
endianness by the preceding code; therefore, the code should not apply
le32_to_cpu() to it again.
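A hedged sketch of the double-conversion pattern being removed; the variable names are illustrative, not the actual ocfs2 fields.

  __le32 disk_bits = sb_blocksize_bits_on_disk;     /* illustrative on-disk field */
  u32 blksz_bits = le32_to_cpu(disk_bits);          /* converted once: now CPU-endian */

  /* Wrong: le32_to_cpu() applied again to an already-native value. */
  if (le32_to_cpu(blksz_bits) != expected_bits)
          return -EINVAL;

  /* Right: compare the native value directly. */
  if (blksz_bits != expected_bits)
          return -EINVAL;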
Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As has_unmovable_pages() calls folio_hstate() without holding hugetlb_lock,
there is a race in which the HugeTLB page can be freed between PageHuge() and
folio_hstate(). There is no need to take hugetlb_lock here, as the HugeTLB
page can be freed from a lot of places; it is enough to unfold folio_hstate()
and add a check to avoid a NULL pointer dereference in
hugepage_migration_supported().
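As a hedged sketch of the idea (not the literal hunk), unfolding folio_hstate() and adding the check could look roughly like this:

  if (folio_test_hugetlb(folio)) {
          /* Open-coded folio_hstate(): without hugetlb_lock the folio may be
           * freed concurrently, so the hstate lookup can come back NULL. */
          struct hstate *h = size_to_hstate(folio_size(folio));

          if (h && !hugepage_migration_supported(h))
                  goto unmovable;
  }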
Link: https://lkml.kernel.org/r/20250122061151.578768-1-liushixin2@huawei.com Fixes: 464c7ffbcb16 ("mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Carlos Bilbao [Sat, 11 Jan 2025 16:11:06 +0000 (10:11 -0600)]
mailmap, docs: update email to carlos.bilbao@kernel.org
Update .mailmap to reflect my new (and final) primary email address,
carlos.bilbao@kernel.org. This ensures consistent attribution in Git
history. Also update my contact information in file
Documentation/translations/sp_SP/index.rst to help contributors reach out
for Spanish translations.
introduce local nr_demoted to fix nr_reclaimed double counting
Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Cc: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Li Zhijian [Fri, 10 Jan 2025 12:21:32 +0000 (20:21 +0800)]
mm/vmscan: accumulate nr_demoted for accurate demotion statistics
In shrink_folio_list(), demote_folio_list() can be called twice.
Currently, stat->nr_demoted only stores the last nr_demoted (the later
nr_demoted is always zero, so the earlier value is lost); as a result,
the number of demoted pages is not accurate.
Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.
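A minimal sketch of the fix, with the surrounding shrink_folio_list() code paraphrased:

  /* Before: the second demote_folio_list() call overwrote the first count. */
  stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);

  /* After: accumulate across both calls so the statistic stays accurate. */
  stat->nr_demoted += demote_folio_list(&demote_folios, pgdat);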
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yu Zhao [Wed, 8 Jan 2025 07:48:21 +0000 (00:48 -0700)]
mm/hugetlb_vmemmap: fix memory loads ordering
Using x86_64 as an example, for a 32KB struct page[] area describing a 2MB
hugeTLB, HVO reduces the area to 4KB by the following steps:
1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped
by PTE 0, and at the same time change the permission from r/w to
r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
to 4KB.
However, the following race can happen due to improper memory load
ordering:
CPU 1 (HVO)                          CPU 2 (speculative PFN walker)

                                     atomic_add_unless(&page->_refcount)
                                     XXX: try to modify r/o struct page[]
Specifically, page_is_fake_head() must be ordered after page_ref_count()
on CPU 2 so that it can only return true for this case, to avoid the later
attempt to modify r/o struct page[].
This patch adds the missing memory barrier and ensures that the tests on
page_is_fake_head() and page_ref_count() are done in the proper order.
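A hedged sketch of the required ordering on the PFN-walker side; smp_rmb() stands in for whatever barrier or acquire primitive the patch actually uses, and the surrounding logic is illustrative only.

  /* Speculative PFN walker (sketch): */
  if (!page_ref_count(page))      /* load _refcount first; zero means frozen */
          return false;

  smp_rmb();                      /* order the _refcount load before the next load */

  if (page_is_fake_head(page))    /* only now is this test meaningful */
          return false;           /* part of an HVO-optimized, r/o struct page[] */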
Link: https://lkml.kernel.org/r/20250108074822.722696-1-yuzhao@google.com Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/ Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
liuye [Tue, 19 Nov 2024 06:08:42 +0000 (14:08 +0800)]
mm/vmscan: fix hard LOCKUP in function isolate_lru_folios
This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim. If the LRU mostly contains ineligible folios, the scan can run long
enough to trigger the watchdog.
watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
_raw_spin_lock_irqsave+0x31/0x40
folio_lruvec_lock_irqsave+0x5f/0x90
folio_batch_move_lru+0x91/0x150
lru_add_drain_per_cpu+0x1c/0x40
process_one_work+0x17d/0x350
worker_thread+0x27b/0x3a0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30
Scenario:
User processes request a large amount of memory and keep the pages active.
A module then continuously requests memory from the ZONE_DMA32 area.
Memory reclaim is triggered because the ZONE_DMA32 watermark is reached.
However, pages in the LRU (active_anon) list are mostly from
the ZONE_NORMAL area.
About the Fixes:
Why did it take eight years to be discovered?
The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU (active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.
If the memory is not large enough, or if the use of ZONE_DMA32 memory is
reasonable, this problem is difficult to detect.
Notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.
Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <liuye@kylinos.cn> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syzkaller reported a UBSAN shift-out-of-bounds warning for (1UL << order)
in isolate_freepages_block(). The bogus compound_order can be any value
because it is in a union with flags. Add back the MAX_PAGE_ORDER check to fix
the warning.
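A hedged sketch of the kind of bound check being restored in isolate_freepages_block() (paraphrased, relying on && short-circuiting so the shift never sees a bogus order):

  const unsigned int order = compound_order(page);

  /* compound_order() can be garbage here because it overlaps page flags;
   * reject it before it is ever used in a (1UL << order) shift. */
  if (order <= MAX_PAGE_ORDER && blockpfn + (1UL << order) <= end_pfn) {
          /* ... treat as a valid free compound page ... */
  }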
Link: https://lkml.kernel.org/r/20250123021029.2826736-1-liushixin2@huawei.com Fixes: 3da0272a4c7d ("mm/compaction: correctly return failure with bogus compound_order in strict mode") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Gordeev [Thu, 23 Jan 2025 16:03:49 +0000 (17:03 +0100)]
s390/mm: add missing ctor/dtor on page table upgrade
Commit 78966b550289 ("s390: pgtable: add statistics for PUD and P4D level
page table") misses the call to pagetable_p4d_ctor() against a newly
allocated P4D table in crst_table_upgrade().
Commit 68c601de75d8 ("mm: introduce ctor/dtor at PGD level") misses the
call to pagetable_pgd_ctor() against a newly allocated PGD and the call to
pagetable_dtor() against a newly allocated P4D that is about to be freed
on the crst_table_upgrade() PGD upgrade failure path.
The missed constructors and destructor break (at least) the page table
accounting when a process memory space is upgraded.
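A hedged, heavily simplified sketch of the pairing described above; the real crst_table_upgrade() does considerably more and the exact placement differs.

  p4d = crst_table_alloc(mm);
  if (p4d)
          pagetable_p4d_ctor(virt_to_ptdesc(p4d));        /* missed by 78966b550289 */

  pgd = crst_table_alloc(mm);
  if (!pgd) {
          /* PGD upgrade fail path */
          pagetable_dtor(virt_to_ptdesc(p4d));            /* missed by 68c601de75d8 */
          crst_table_free(mm, p4d);
          goto err;
  }
  pagetable_pgd_ctor(virt_to_ptdesc(pgd));                /* missed by 68c601de75d8 */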
Link: https://lkml.kernel.org/r/20250123160349.200154-1-agordeev@linux.ibm.com Fixes: 78966b550289 ("s390: pgtable: add statistics for PUD and P4D level page table") Fixes: 68c601de75d8 ("mm: introduce ctor/dtor at PGD level") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reported-by: Heiko Carstens <hca@linux.ibm.com> Closes: https://lore.kernel.org/all/20250122074954.8685-A-hca@linux.ibm.com/ Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jim Zhao [Thu, 21 Nov 2024 10:05:39 +0000 (18:05 +0800)]
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
Address the feedback from 39ac99852fca ("mm/page-writeback: raise
wb_thresh to prevent write blocking with strictlimit"). The wb_thresh
bumping logic is scattered across wb_position_ratio, __wb_calc_thresh, and
wb_update_dirty_ratelimit. For consistency, consolidate all wb_thresh
bumping logic into __wb_calc_thresh.
Link: https://lkml.kernel.org/r/20241121100539.605818-1-jimzhao.ai@gmail.com Signed-off-by: Jim Zhao <jimzhao.ai@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuntao Wang [Wed, 15 Jan 2025 04:16:34 +0000 (12:16 +0800)]
mm/page_alloc: remove the incorrect and misleading comment
The comment removed in this patch originally belonged to the
build_zonelists_in_zone_order() function, which was introduced by commit f0c0b2b808f2 ("change zonelist order: zonelist order selection logic").
Later, commit c9bff3eebc09 ("mm, page_alloc: rip out ZONELIST_ORDER_ZONE")
removed build_zonelists_in_zone_order() but left its comment behind.
Subsequently, commit 9d3be21bf9c0 ("mm, page_alloc: simplify zonelist
initialization") moved the node_order variable into build_zonelists(),
making the comment that originally belonged to build_zonelists_in_zone_order()
appear as if it were part of build_zonelists().
Sergey Senozhatsky [Wed, 15 Jan 2025 07:19:16 +0000 (16:19 +0900)]
zram: remove zcomp_stream_put() from write_incompressible_page()
We cannot and should not put the per-CPU compression stream in
write_incompressible_page(), because that function never gets any
per-CPU stream in the first place. It is zram_write_page() that
puts the stream before it calls write_incompressible_page().
Byungchul Park [Thu, 8 Aug 2024 06:53:58 +0000 (15:53 +0900)]
mm: separate move/undo parts from migrate_pages_batch()
Functionally, no change. This is preparation for the luf mechanism, which
requires separate folio lists for its own handling during migration.
Refactor migrate_pages_batch() so that the move/undo parts are separated
out of it.
Thomas Weißschuh [Tue, 14 Jan 2025 16:06:48 +0000 (17:06 +0100)]
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps. However, not all mappings are valid to access
arbitrarily.
For example the vvar data used for virtual clocks on x86 [vvar_vclock] can
only be accessed if 1) the kernel configuration enables virtual clocks and
2) the hypervisor provided the data for it. Only the VDSO itself has the
necessary information to know this. Since commit e93d2521b27f ("x86/vdso:
Split virtual clock pages into dedicated mapping") the virtual clock data
was split out into its own mapping, leading to EFAULT from read() during
the validation.
Check for the VM_IO flag as a proxy. It is present for the VVAR mappings,
and MMIO ranges can be dangerous to access arbitrarily.
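For context, userspace can see VM_IO as the "io" mnemonic on the VmFlags line of /proc/self/smaps; a hedged sketch of that kind of check is below, showing one way to detect the flag rather than necessarily the exact mechanism the patch uses.

  #include <stdbool.h>
  #include <string.h>

  /* Return true if an smaps "VmFlags:" line lists the "io" mnemonic (VM_IO). */
  static bool vmflags_has_io(char *vmflags_line)
  {
          char *tok, *save = NULL;

          tok = strtok_r(vmflags_line, " \t\n", &save);   /* skip "VmFlags:" */
          while ((tok = strtok_r(NULL, " \t\n", &save)))
                  if (strcmp(tok, "io") == 0)
                          return true;
          return false;
  }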
Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-4-6fd7269934a5@linutronix.de Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Suggested-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/lkml/e97c2a5d-c815-4936-a767-ac42a3220a90@redhat.com/ Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thomas Weißschuh [Tue, 14 Jan 2025 16:06:46 +0000 (17:06 +0100)]
selftests/mm: virtual_address_range: unmap chunks after validation
For each accessed chunk a PTE is created. More than 1GiB of PTEs is used
in this way. Remove each PTE after validating a chunk to reduce peak
memory usage.
It is important to only unmap memory that was previously mmap()ed by the test,
as unmapping other mappings, like the stack, heap, or executable mappings,
will crash the process.
The mappings read from /proc/self/maps and the return values from mmap()
don't allow a simple correlation, due to merging and the lack of a guaranteed
order.
To correlate the pointers and mappings use prctl(PR_SET_VMA_ANON_NAME).
While it introduces a test dependency, other alternatives would introduce
runtime or development overhead.
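A minimal sketch of naming an anonymous mapping with prctl() so it can later be matched against /proc/self/maps; the name string is illustrative, and CONFIG_ANON_VMA_NAME must be enabled for the prctl() to succeed.

  #include <stddef.h>
  #include <sys/mman.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>

  int main(void)
  {
          size_t len = 1UL << 20;
          void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          /* The range then shows up as "[anon:va_range_test]" in /proc/self/maps,
           * which lets the test correlate it with its own mmap() return value. */
          prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                (unsigned long)p, len, (unsigned long)"va_range_test");

          munmap(p, len);         /* only unmap what we mmap()ed ourselves */
          return 0;
  }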
Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-2-6fd7269934a5@linutronix.de Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thomas Weißschuh [Tue, 14 Jan 2025 16:06:45 +0000 (17:06 +0100)]
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
Patch series "selftests/mm: virtual_address_range: Reduce memory", v4.
The selftest started failing after commit e93d2521b27f ("x86/vdso: Split
virtual clock pages into dedicated mapping") was merged. While debugging it,
I stumbled upon some memory usage optimizations.
With these, the test now runs on a VM with only 60MiB of memory.
This patch (of 4):
When mapping a chunk larger than the available physical memory with
PROT_WRITE while overcommit is disabled, the mapping will fail. This
prevents the test from running on systems with less than ~1GiB of memory
and triggers an inscrutable test failure. As the mappings are never
written to anyway, the flag can be removed.
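A hedged sketch of the change from the caller's point of view; read-only private anonymous mappings are not charged against the overcommit limit, so the large reservation succeeds even on small-memory systems.

  #include <stddef.h>
  #include <sys/mman.h>

  /* Reserve a large virtual range for probing only; with PROT_READ | PROT_WRITE
   * this mmap() can fail when overcommit is disabled and memory is small. */
  static void *reserve_chunk(size_t chunk_size)
  {
          return mmap(NULL, chunk_size, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }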
Jens Axboe [Fri, 20 Dec 2024 15:47:50 +0000 (08:47 -0700)]
mm: add FGP_DONTCACHE folio creation flag
Callers can pass this in for uncached folio creation, in which case if a
folio is newly created it gets marked as uncached. If a folio exists for
this index and lookup succeeds, then it will not get marked as uncached.
If an !uncached lookup finds a cached folio, clear the flag. In that
case, there are competing uncached and cached users of the folio, and it
should not get pruned.
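A hedged sketch of how a caller might request this behavior; the call shape follows __filemap_get_folio(), and FGP_DONTCACHE only exists in trees carrying this series.

  struct folio *folio;

  folio = __filemap_get_folio(mapping, index,
                              FGP_LOCK | FGP_CREAT | FGP_DONTCACHE,
                              mapping_gfp_mask(mapping));
  /* Newly created: marked uncached and pruned after use.
   * Already present in the page cache: left as a normal cached folio. */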
Link: https://lkml.kernel.org/r/20241220154831.1086649-13-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Fri, 20 Dec 2024 15:47:49 +0000 (08:47 -0700)]
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
When a buffered write with IOCB_DONTCACHE set has been successfully
submitted, call filemap_fdatawrite_range_kick() to kick off the IO. File
systems call generic_write_sync() for any successful buffered write
submission, hence add the logic here rather than needing to modify the
file system.
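A hedged sketch of the shape of that hook; the condition and range arithmetic are approximations of what the series does.

  /* in generic_write_sync(), after the existing iocb_is_dsync() handling: */
  } else if (count > 0 && (iocb->ki_flags & IOCB_DONTCACHE)) {
          struct address_space *mapping = iocb->ki_filp->f_mapping;

          /* kick off non-integrity writeback of the just-written range */
          filemap_fdatawrite_range_kick(mapping, iocb->ki_pos - count,
                                        iocb->ki_pos - 1);
  }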
Link: https://lkml.kernel.org/r/20241220154831.1086649-12-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Works like filemap_fdatawrite_range(), except it's a non-integrity data
writeback and hence only starts writeback on the specified range. Will
help facilitate generically starting uncached writeback from
generic_write_sync(), as header dependencies preclude doing this inline
from fs.h.
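A hedged approximation of what such a non-integrity range kick can look like, built on the existing filemap_fdatawrite_wbc() machinery; the helper itself is what this patch introduces, so treat this as a sketch rather than the final code.

  int filemap_fdatawrite_range_kick(struct address_space *mapping,
                                    loff_t start, loff_t end)
  {
          struct writeback_control wbc = {
                  .sync_mode   = WB_SYNC_NONE,    /* non-integrity writeback */
                  .nr_to_write = LONG_MAX,
                  .range_start = start,
                  .range_end   = end,
          };

          return filemap_fdatawrite_wbc(mapping, &wbc);
  }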
Link: https://lkml.kernel.org/r/20241220154831.1086649-11-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Fri, 20 Dec 2024 15:47:47 +0000 (08:47 -0700)]
mm/filemap: drop streaming/uncached pages when writeback completes
If the folio is marked as streaming, drop pages when writeback completes.
Intended to be used with RWF_DONTCACHE, to avoid needing sync writes for
uncached IO.
Link: https://lkml.kernel.org/r/20241220154831.1086649-10-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Fri, 20 Dec 2024 15:47:46 +0000 (08:47 -0700)]
mm/filemap: add read support for RWF_DONTCACHE
Add RWF_DONTCACHE as a read operation flag, which means that any data read
will be removed from the page cache upon completion. Uses the page cache
to synchronize, and simply prunes folios that were instantiated when the
operation completes. While it would be possible to use private pages for
this, using the page cache as synchronization is handy for a variety of
reasons:
1) No special truncate magic is needed
2) Async buffered reads need some place to serialize, using the page
cache is a lot easier than writing extra code for this
3) The pruning cost is pretty reasonable
and the code to support this is much simpler as a result.
You can think of uncached buffered IO as being the much more attractive
cousin of O_DIRECT - it has none of the restrictions of O_DIRECT. Yes, it
will copy the data, but unlike regular buffered IO, it doesn't run into
the unpredictability of the page cache in terms of reclaim. As an
example, on a test box with 32 drives, reading them with buffered IO looks
as follows:
where it's quite easy to see where the page cache filled up, and
performance went from good to erratic, and finally settles at a much
lower rate. Looking at top while this is ongoing, we see:
which is just chugging along at ~155GB/sec of read performance. Looking
at top, we see:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7961 root 20 0 267004 0 0 S 3180 0.0 5:37.95 uncached
8024 axboe 20 0 14292 4096 0 R 1.0 0.0 0:00.13 top
where just the test app is using CPU, no reclaim is taking place outside
of the main thread. Not only is performance 65% better, it's also using
half the CPU to do it.
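As a hedged illustration from the userspace side, such a read would be issued with preadv2(); the RWF_DONTCACHE constant and its value here are assumptions that only hold on kernels and headers carrying this series.

  #define _GNU_SOURCE
  #include <sys/uio.h>
  #include <unistd.h>

  #ifndef RWF_DONTCACHE
  #define RWF_DONTCACHE 0x00000080        /* assumed value; prefer the uapi header */
  #endif

  ssize_t read_dontcache(int fd, void *buf, size_t len, off_t off)
  {
          struct iovec iov = { .iov_base = buf, .iov_len = len };

          /* The data still flows through the page cache for synchronization,
           * but the folios are pruned once the read completes. */
          return preadv2(fd, &iov, 1, off, RWF_DONTCACHE);
  }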
Link: https://lkml.kernel.org/r/20241220154831.1086649-9-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Fri, 20 Dec 2024 15:47:45 +0000 (08:47 -0700)]
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set FOP_DONTCACHE
and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted
on a file system that does not support it, the request fails with -EOPNOTSUPP.
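A hedged sketch of the gate described above; the placement (per-IO flag setup) and field usage are approximations, with FOP_DONTCACHE/RWF_DONTCACHE coming from this series and fop_flags being the existing file_operations flag word.

  /* e.g. while translating RWF_* flags into IOCB_* flags for a kiocb */
  if (flags & RWF_DONTCACHE) {
          /* only honored when the filesystem explicitly opted in */
          if (!(ki->ki_filp->f_op->fop_flags & FOP_DONTCACHE))
                  return -EOPNOTSUPP;
          kiocb_flags |= IOCB_DONTCACHE;
  }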
Link: https://lkml.kernel.org/r/20241220154831.1086649-8-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Fri, 20 Dec 2024 15:47:44 +0000 (08:47 -0700)]
mm/truncate: add folio_unmap_invalidate() helper
Add a folio_unmap_invalidate() helper, which unmaps and invalidates a
given folio. The caller must already have locked the folio. Embed the
old invalidate_complete_folio2() helper in there as well, as nobody else
calls it.
Use this new helper in invalidate_inode_pages2_range(), rather than
duplicating the code there.
In preparation for using this elsewhere as well, have it take a gfp_t mask
rather than assuming GFP_KERNEL is the right choice. This bubbles back to
invalidate_complete_folio2() as well.
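A hedged sketch of the helper's contract as described above (signature and usage approximated from the text, not copied from the patch):

  /* Caller must hold the folio lock: unmap the folio from page tables,
   * then invalidate it from the page cache, honoring the given gfp mask. */
  int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio,
                             gfp_t gfp);

  /* e.g. from a locked-folio context in invalidate_inode_pages2_range(): */
  ret = folio_unmap_invalidate(mapping, folio, GFP_KERNEL);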
Link: https://lkml.kernel.org/r/20241220154831.1086649-7-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Brian Foster <bfoster@redhat.com> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>