]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
2 years agomaple_tree: fix get wrong data_end in mtree_lookup_walk()
Peng Zhang [Tue, 14 Mar 2023 12:42:01 +0000 (20:42 +0800)]
maple_tree: fix get wrong data_end in mtree_lookup_walk()

if (likely(offset > end))
max = pivots[offset];

The above code should be changed to if (likely(offset < end)), which is
correct.  This affects the correctness of ma_data_end().  Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.

Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Rongwei Wang [Tue, 4 Apr 2023 15:47:16 +0000 (23:47 +0800)]
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()

The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.

It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g.  stress-ng-swap).

However, in the worst case, panic can be caused by the above scene.  In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap.  This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonilfs2: fix sysfs interface lifetime
Ryusuke Konishi [Thu, 30 Mar 2023 20:55:15 +0000 (05:55 +0900)]
nilfs2: fix sysfs interface lifetime

The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.

Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.

Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.

In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".

Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking.  This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called.  Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".

This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.

Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: take a page reference when removing device exclusive entries
Alistair Popple [Thu, 30 Mar 2023 01:25:19 +0000 (12:25 +1100)]
mm: take a page reference when removing device exclusive entries

Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device.  Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access.  When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.

The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present.  However the fault
handling code does not hold the PTL when taking the page lock.  This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.

This can lead to threads locking or waiting on a folio with a zero
refcount.  Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration.  This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.

Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there.  It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.

Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: vmalloc: avoid warn_alloc noise caused by fatal signal
Yafang Shao [Thu, 30 Mar 2023 16:26:25 +0000 (16:26 +0000)]
mm: vmalloc: avoid warn_alloc noise caused by fatal signal

There're some suspicious warn_alloc on my test serer, for example,

[13366.518837] warn_alloc: 81 callbacks suppressed
[13366.518841] test_verifier: vmalloc error: size 4096, page order 0, failed to allocate pages, mode:0x500dc2(GFP_HIGHUSER|__GFP_ZERO|__GFP_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0-1
[13366.522240] CPU: 30 PID: 722463 Comm: test_verifier Kdump: loaded Tainted: G        W  O       6.2.0+ #638
[13366.524216] Call Trace:
[13366.524702]  <TASK>
[13366.525148]  dump_stack_lvl+0x6c/0x80
[13366.525712]  dump_stack+0x10/0x20
[13366.526239]  warn_alloc+0x119/0x190
[13366.526783]  ? alloc_pages_bulk_array_mempolicy+0x9e/0x2a0
[13366.527470]  __vmalloc_area_node+0x546/0x5b0
[13366.528066]  __vmalloc_node_range+0xc2/0x210
[13366.528660]  __vmalloc_node+0x42/0x50
[13366.529186]  ? bpf_prog_realloc+0x53/0xc0
[13366.529743]  __vmalloc+0x1e/0x30
[13366.530235]  bpf_prog_realloc+0x53/0xc0
[13366.530771]  bpf_patch_insn_single+0x80/0x1b0
[13366.531351]  bpf_jit_blind_constants+0xe9/0x1c0
[13366.531932]  ? __free_pages+0xee/0x100
[13366.532457]  ? free_large_kmalloc+0x58/0xb0
[13366.533002]  bpf_int_jit_compile+0x8c/0x5e0
[13366.533546]  bpf_prog_select_runtime+0xb4/0x100
[13366.534108]  bpf_prog_load+0x6b1/0xa50
[13366.534610]  ? perf_event_task_tick+0x96/0xb0
[13366.535151]  ? security_capable+0x3a/0x60
[13366.535663]  __sys_bpf+0xb38/0x2190
[13366.536120]  ? kvm_clock_get_cycles+0x9/0x10
[13366.536643]  __x64_sys_bpf+0x1c/0x30
[13366.537094]  do_syscall_64+0x38/0x90
[13366.537554]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[13366.538107] RIP: 0033:0x7f78310f8e29
[13366.538561] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48
[13366.540286] RSP: 002b:00007ffe2a61fff8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[13366.541031] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f78310f8e29
[13366.541749] RDX: 0000000000000080 RSI: 00007ffe2a6200b0 RDI: 0000000000000005
[13366.542470] RBP: 00007ffe2a620010 R08: 00007ffe2a6202a0 R09: 00007ffe2a6200b0
[13366.543183] R10: 00000000000f423e R11: 0000000000000206 R12: 0000000000407800
[13366.543900] R13: 00007ffe2a620540 R14: 0000000000000000 R15: 0000000000000000
[13366.544623]  </TASK>
[13366.545260] Mem-Info:
[13366.546121] active_anon:81319 inactive_anon:20733 isolated_anon:0
 active_file:69450 inactive_file:5624 isolated_file:0
 unevictable:0 dirty:10 writeback:0
 slab_reclaimable:69649 slab_unreclaimable:48930
 mapped:27400 shmem:12868 pagetables:4929
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:15870308 free_pcp:142935 free_cma:0
[13366.551886] Node 0 active_anon:224836kB inactive_anon:33528kB active_file:175692kB inactive_file:13752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:59248kB dirty:32kB writeback:0kB shmem:18252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4616kB pagetables:10664kB sec_pagetables:0kB all_unreclaimable? no
[13366.555184] Node 1 active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:50352kB dirty:8kB writeback:0kB shmem:33220kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3896kB pagetables:9052kB sec_pagetables:0kB all_unreclaimable? no
[13366.558262] Node 0 DMA free:15360kB boost:0kB min:304kB low:380kB high:456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[13366.560821] lowmem_reserve[]: 0 2735 31873 31873 31873
[13366.561981] Node 0 DMA32 free:2790904kB boost:0kB min:56028kB low:70032kB high:84036kB reserved_highatomic:0KB active_anon:1936kB inactive_anon:20kB active_file:396kB inactive_file:344kB unevictable:0kB writepending:0kB present:3129200kB managed:2801520kB mlocked:0kB bounce:0kB free_pcp:5188kB local_pcp:0kB free_cma:0kB
[13366.565148] lowmem_reserve[]: 0 0 29137 29137 29137
[13366.566168] Node 0 Normal free:28533824kB boost:0kB min:596740kB low:745924kB high:895108kB reserved_highatomic:28672KB active_anon:222900kB inactive_anon:33508kB active_file:175296kB inactive_file:13408kB unevictable:0kB writepending:32kB present:30408704kB managed:29837172kB mlocked:0kB bounce:0kB free_pcp:295724kB local_pcp:0kB free_cma:0kB
[13366.569485] lowmem_reserve[]: 0 0 0 0 0
[13366.570416] Node 1 Normal free:32141144kB boost:0kB min:660504kB low:825628kB high:990752kB reserved_highatomic:69632KB active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB writepending:8kB present:33554432kB managed:33025372kB mlocked:0kB bounce:0kB free_pcp:270880kB local_pcp:46860kB free_cma:0kB
[13366.573403] lowmem_reserve[]: 0 0 0 0 0
[13366.574015] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
[13366.575474] Node 0 DMA32: 782*4kB (UME) 756*8kB (UME) 736*16kB (UME) 745*32kB (UME) 694*64kB (UME) 653*128kB (UME) 595*256kB (UME) 552*512kB (UME) 454*1024kB (UME) 347*2048kB (UME) 246*4096kB (UME) = 2790904kB
[13366.577442] Node 0 Normal: 33856*4kB (UMEH) 51815*8kB (UMEH) 42418*16kB (UMEH) 36272*32kB (UMEH) 22195*64kB (UMEH) 10296*128kB (UMEH) 7238*256kB (UMEH) 5638*512kB (UEH) 5337*1024kB (UMEH) 3506*2048kB (UMEH) 1470*4096kB (UME) = 28533784kB
[13366.580460] Node 1 Normal: 15776*4kB (UMEH) 37485*8kB (UMEH) 29509*16kB (UMEH) 21420*32kB (UMEH) 14818*64kB (UMEH) 13051*128kB (UMEH) 9918*256kB (UMEH) 7374*512kB (UMEH) 5397*1024kB (UMEH) 3887*2048kB (UMEH) 2002*4096kB (UME) = 32141240kB
[13366.583027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.584380] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.585702] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.587042] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.588372] 87386 total pagecache pages
[13366.589266] 0 pages in swap cache
[13366.590327] Free swap  = 0kB
[13366.591227] Total swap = 0kB
[13366.592142] 16777082 pages RAM
[13366.593057] 0 pages HighMem/MovableOnly
[13366.594037] 357226 pages reserved
[13366.594979] 0 pages hwpoisoned

This failure really confuse me as there're still lots of available pages.
Finally I figured out it was caused by a fatal signal.  When a process is
allocating memory via vm_area_alloc_pages(), it will break directly even
if it hasn't allocated the requested pages when it receives a fatal
signal.  In that case, we shouldn't show this warn_alloc, as it is
useless.  We only need to show this warning when there're really no enough
pages.

Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
Tetsuo Handa [Sun, 26 Mar 2023 15:21:46 +0000 (00:21 +0900)]
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field

nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing
"struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when
being passed to CRC function.

Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com
Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Ryusuke Konishi [Mon, 27 Mar 2023 17:53:18 +0000 (02:53 +0900)]
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()

The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.

At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.

However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.

This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info.  Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.

Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agozsmalloc: document freeable stats
Sergey Senozhatsky [Sat, 25 Mar 2023 02:46:31 +0000 (11:46 +0900)]
zsmalloc: document freeable stats

When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation.  Fix that.

Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agozsmalloc: document new fullness grouping
Sergey Senozhatsky [Sat, 25 Mar 2023 02:46:30 +0000 (11:46 +0900)]
zsmalloc: document new fullness grouping

Patch series "zsmalloc: minor documentation updates".

Two minor patches that bring zsmalloc documentation up to date.

This patch (of 2):

Update documentation and reflect new zspages fullness grouping (we don't
use almost_empty and almost_full anymore).

Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats")
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofsdax: force clear dirty mark if CoW
Shiyang Ruan [Fri, 24 Mar 2023 10:28:00 +0000 (10:28 +0000)]
fsdax: force clear dirty mark if CoW

XFS allows CoW on non-shared extents to combat fragmentation[1].  The old
non-shared extent could be mwrited before, its dax entry is marked dirty.

This results in a WARNing:

[   28.512349] ------------[ cut here ]------------
[   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
[   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
[   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b
[   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff
[   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8
[   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155
[   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8
[   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
[   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0
[   28.520863] PKRU: 55555554
[   28.521043] Call Trace:
[   28.521219]  <TASK>
[   28.521368]  dax_fault_iter+0x196/0x390
[   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
[   28.521852]  __xfs_filemap_fault+0x234/0x2b0
[   28.522116]  __do_fault+0x30/0x130
[   28.522334]  do_fault+0x193/0x340
[   28.522586]  __handle_mm_fault+0x2d3/0x690
[   28.522975]  handle_mm_fault+0xe6/0x2c0
[   28.523259]  do_user_addr_fault+0x1bc/0x6f0
[   28.523521]  exc_page_fault+0x60/0x140
[   28.523763]  asm_exc_page_fault+0x22/0x30
[   28.524001] RIP: 0033:0x7f14a0b589ca
[   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
[   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046
[   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000
[   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000
[   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c
[   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0
[   28.527449]  </TASK>
[   28.527600] ---[ end trace 0000000000000000 ]---

To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().

[1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/hugetlb: fix uffd wr-protection for CoW optimization path
Peter Xu [Tue, 21 Mar 2023 19:18:40 +0000 (15:18 -0400)]
mm/hugetlb: fix uffd wr-protection for CoW optimization path

This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set.  It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.

Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need.  However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.

This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected.  The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g.  when a present pte was
wr-protected) will resolve the uffd-wp bit already.  Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.

This patch will be needed for v5.19+ hence copy stable.

[peterx@redhat.com: v2]
Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3]
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: enable maple tree RCU mode by default
Liam R. Howlett [Mon, 27 Feb 2023 17:36:07 +0000 (09:36 -0800)]
mm: enable maple tree RCU mode by default

Use the maple tree in RCU mode for VMA tracking.

The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock.  This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.

It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs.  syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space.  Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.

Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.

[Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: add RCU lock checking to rcu callback functions
Liam R. Howlett [Mon, 27 Feb 2023 17:36:06 +0000 (09:36 -0800)]
maple_tree: add RCU lock checking to rcu callback functions

Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain.  Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.

Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree.  Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking.  It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.

Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: add smp_rmb() to dead node detection
Liam R. Howlett [Mon, 27 Feb 2023 17:36:05 +0000 (09:36 -0800)]
maple_tree: add smp_rmb() to dead node detection

Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.

The is necessary for RCU mode.

Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: fix write memory barrier of nodes once dead for RCU mode
Liam R. Howlett [Mon, 27 Feb 2023 17:36:04 +0000 (09:36 -0800)]
maple_tree: fix write memory barrier of nodes once dead for RCU mode

During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes.  To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.

There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead.  Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.

Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.

This is necessary for the RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: remove extra smp_wmb() from mas_dead_leaves()
Liam Howlett [Mon, 27 Feb 2023 17:36:03 +0000 (09:36 -0800)]
maple_tree: remove extra smp_wmb() from mas_dead_leaves()

The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed.  This is an optimization for the RCU mode
of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: fix freeing of nodes in rcu mode
Liam Howlett [Mon, 27 Feb 2023 17:36:02 +0000 (09:36 -0800)]
maple_tree: fix freeing of nodes in rcu mode

The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types.  This is necessary for the
RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: detect dead nodes in mas_start()
Liam Howlett [Mon, 27 Feb 2023 17:36:01 +0000 (09:36 -0800)]
maple_tree: detect dead nodes in mas_start()

When initially starting a search, the root node may already be in the
process of being replaced in RCU mode.  Detect and restart the walk if
this is the case.  This is necessary for RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: be more cautious about dead nodes
Liam Howlett [Mon, 27 Feb 2023 17:36:00 +0000 (09:36 -0800)]
maple_tree: be more cautious about dead nodes

Patch series "Fix VMA tree modification under mmap read lock".

Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree.  The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.

This only happens for stack VMAs.  We had thought this was safe as it only
modifies a single pivot in the tree.  Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA.  This causes us to delete the NULL entry between the two VMAs
and rewrite the node.

We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node.  We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.

These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset.  They
were also sent as part of the "Per-VMA locks" v4 patch series.  Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.

Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes.  This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.

This patch (of 8):

ma_pivots() and ma_data_end() may be called with a dead node.  Ensure to
that the node isn't dead before using the returned values.

This is necessary for RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: freak07 <michalechner92@googlemail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: add an entry for Leonard Crestez
Florian Fainelli [Fri, 24 Mar 2023 13:07:36 +0000 (06:07 -0700)]
mailmap: add an entry for Leonard Crestez

Link: https://lkml.kernel.org/r/20230324130737.3360169-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Leonard Crestez <cdleonard@gmail.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: kfence: fix handling discontiguous page
Muchun Song [Thu, 23 Mar 2023 02:50:03 +0000 (10:50 +0800)]
mm: kfence: fix handling discontiguous page

The struct pages could be discontiguous when the kfence pool is allocated
via alloc_contig_pages() with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.

This may result in setting PG_slab and memcg_data to a arbitrary
address (may be not used as a struct page), which in the worst case
might corrupt the kernel.

So the iteration should use nth_page().

Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: kfence: fix PG_slab and memcg_data clearing
Muchun Song [Mon, 20 Mar 2023 03:00:59 +0000 (11:00 +0800)]
mm: kfence: fix PG_slab and memcg_data clearing

It does not reset PG_slab and memcg_data when KFENCE fails to initialize
kfence pool at runtime.  It is reporting a "Bad page state" message when
kfence pool is freed to buddy.  The checking of whether it is a compound
head page seems unnecessary since we already guarantee this when
allocating kfence pool.   Remove the check to simplify the code.

Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofsdax: dedupe should compare the min of two iters' length
Shiyang Ruan [Wed, 22 Mar 2023 07:25:58 +0000 (07:25 +0000)]
fsdax: dedupe should compare the min of two iters' length

In an dedupe comparison iter loop, the length of iomap_iter decreases
because it implies the remaining length after each iteration.

The dedupe command will fail with -EIO if the range is larger than one
page size and not aligned to the page size.  Also report warning in dmesg:

[ 4338.498374] ------------[ cut here ]------------
[ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16
...

The compare function should use the min length of the current iters,
not the total length.

Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agofsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
Shiyang Ruan [Wed, 22 Mar 2023 11:11:09 +0000 (11:11 +0000)]
fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN

unshare copies data from source to destination.  But if the source is
HOLE or UNWRITTEN extents, we should zero the destination, otherwise
the HOLE or UNWRITTEN part will be user-visible old data of the new
allocated extent.

Found by running generic/649 while mounting with -o dax=always on pmem.

Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agolib/Kconfig.debug: correct help info of LOCKDEP_STACK_TRACE_HASH_BITS
Tiezhu Yang [Tue, 21 Mar 2023 06:35:08 +0000 (14:35 +0800)]
lib/Kconfig.debug: correct help info of LOCKDEP_STACK_TRACE_HASH_BITS

We can see the following definition in kernel/locking/lockdep_internals.h:

  #define STACK_TRACE_HASH_SIZE (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)

CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS is related with STACK_TRACE_HASH_SIZE
instead of MAX_STACK_TRACE_ENTRIES, fix it.

Link: https://lkml.kernel.org/r/1679380508-20830-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 5dc33592e955 ("lockdep: Allow tuning tracing capacity constants.")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoKconfig.debug: fix SCHED_DEBUG dependency
ye xingchen [Sun, 29 Jan 2023 03:10:09 +0000 (11:10 +0800)]
Kconfig.debug: fix SCHED_DEBUG dependency

The path for SCHED_DEBUG is /sys/kernel/debug/sched.  So, SCHED_DEBUG
should depend on DEBUG_FS, not PROC_FS.

Link: https://lkml.kernel.org/r/202301291110098787982@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years ago.mailmap: add entry for Leonard Göhrs
Leonard Göhrs [Tue, 21 Mar 2023 14:55:25 +0000 (15:55 +0100)]
.mailmap: add entry for Leonard Göhrs

My very first kernel commit:

  e4e1d47c7906 ("ALSA: ppc: remove redundant checks in PS3 driver probe")

was sent with the umlaut in my last name transcribed (Göhrs -> Goehrs).

Add a mailmap entry so all my commits use the same name.

Link: https://lkml.kernel.org/r/20230321145525.1317230-1-l.goehrs@pengutronix.de
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoLinux 6.3-rc4
Linus Torvalds [Sun, 26 Mar 2023 21:40:20 +0000 (14:40 -0700)]
Linux 6.3-rc4

2 years agoMerge tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 26 Mar 2023 17:22:44 +0000 (10:22 -0700)]
Merge tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt driver fixes from Greg KH:
 "Here are a small set of USB and Thunderbolt driver fixes for reported
  problems and a documentation update, for 6.3-rc4.

  Included in here are:

   - documentation update for uvc gadget driver

   - small thunderbolt driver fixes

   - cdns3 driver fixes

   - dwc3 driver fixes

   - dwc2 driver fixes

   - chipidea driver fixes

   - typec driver fixes

   - onboard_usb_hub device id updates

   - quirk updates

  All of these have been in linux-next with no reported problems"

* tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (30 commits)
  usb: dwc2: fix a race, don't power off/on phy for dual-role mode
  usb: dwc2: fix a devres leak in hw_enable upon suspend resume
  usb: chipidea: core: fix possible concurrent when switch role
  usb: chipdea: core: fix return -EINVAL if request role is the same with current role
  thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
  thunderbolt: Disable interrupt auto clear for rings
  thunderbolt: Use const qualifier for `ring_interrupt_index`
  usb: gadget: Use correct endianness of the wLength field for WebUSB
  uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
  usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
  usb: cdns3: Fix issue with using incorrect PCI device function
  usb: cdnsp: Fixes issue with redundant Status Stage
  MAINTAINERS: make me a reviewer of USB/IP
  thunderbolt: Use scale field when allocating USB3 bandwidth
  thunderbolt: Limit USB3 bandwidth of certain Intel USB4 host routers
  thunderbolt: Call tb_check_quirks() after initializing adapters
  thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
  thunderbolt: Fix memory leak in margining
  usb: dwc2: drd: fix inconsistent mode if role-switch-default-mode="host"
  docs: usb: Add documentation for the UVC Gadget
  ...

2 years agoMerge tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Mar 2023 16:18:30 +0000 (09:18 -0700)]
Merge tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

 - Fix a corner case where vruntime of a task is not being sanitized

* tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Sanitize vruntime of entity being migrated

2 years agoMerge tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Mar 2023 16:13:35 +0000 (09:13 -0700)]
Merge tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Properly clear perf event status tracking in the AMD perf event
   overflow handler

* tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/core: Always clear status for idx

2 years agoMerge tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Mar 2023 16:06:20 +0000 (09:06 -0700)]
Merge tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fixes from Borislav Petkov:

 - Do the delayed RCU wakeup for kthreads in the proper order so that
   former doesn't get ignored

 - A noinstr warning fix

* tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
  entry: Fix noinstr warning in __enter_from_user_mode()

2 years agoMerge tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Mar 2023 16:01:24 +0000 (09:01 -0700)]
Merge tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a AMX ptrace self test

 - Prevent a false-positive warning when retrieving the (invalid)
   address of dynamic FPU features in their init state which are not
   saved in init_fpstate at all

 - Randomize per-CPU entry areas only when KASLR is enabled

* tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/amx: Add a ptrace test
  x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
  x86/mm: Do not shuffle CPU entry areas without KASLR

2 years agoMerge tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 26 Mar 2023 15:56:09 +0000 (08:56 -0700)]
Merge tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Twelve cifs/smb3 client fixes (most also for stable)

   - forced umount fix

   - fix for two perf regressions

   - reconnect fixes

   - small debugging improvements

   - multichannel fixes"

* tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix unusable share after force unmount failure
  cifs: fix dentry lookups in directory handle cache
  smb3: lower default deferred close timeout to address perf regression
  cifs: fix missing unload_nls() in smb2_reconnect()
  cifs: avoid race conditions with parallel reconnects
  cifs: append path to open_enter trace event
  cifs: print session id while listing open files
  cifs: dump pending mids for all channels in DebugData
  cifs: empty interface list when server doesn't support query interfaces
  cifs: do not poll server interfaces too regularly
  cifs: lock chan_lock outside match_session
  cifs: check only tcon status on tcon related functions

2 years agoMerge tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 25 Mar 2023 20:32:43 +0000 (13:32 -0700)]
Merge tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix a crash when using NFS with krb5p

* tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix a crash in gss_krb5_checksum()

2 years agoMerge tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 25 Mar 2023 20:12:36 +0000 (13:12 -0700)]
Merge tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull yet more xfs bug fixes from Darrick Wong:
 "The first bugfix addresses a longstanding problem where we use the
  wrong file mapping cursors when trying to compute the speculative
  preallocation quantity. This has been causing sporadic crashes when
  alwayscow mode is engaged.

  The other two fixes correct minor problems in more recent changes.

   - Fix the new allocator tracepoints because git am mismerged the
     changes such that the trace_XXX got rebased to be in function YYY
     instead of XXX

   - Ensure that the perag AGFL_RESET state is consistent with whatever
     we've just read off the disk

   - Fix a bug where we used the wrong iext cursor during a write begin"

* tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix mismerged tracepoints
  xfs: clear incore AGFL_RESET state if it's not needed
  xfs: pass the correct cursor to xfs_iomap_prealloc_size

2 years agoMerge tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 25 Mar 2023 19:57:34 +0000 (12:57 -0700)]
Merge tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs percpu counter fixes from Darrick Wong:
 "We discovered a filesystem summary counter corruption problem that was
  traced to cpu hot-remove racing with the call to percpu_counter_sum
  that sets the free block count in the superblock when writing it to
  disk. The root cause is that percpu_counter_sum doesn't cull from
  dying cpus and hence misses those counter values if the cpu shutdown
  hooks have not yet run to merge the values.

  I'm hoping this is a fairly painless fix to the problem, since the
  dying cpu mask should generally be empty. It's been in for-next for a
  week without any complaints from the bots.

   - Fix a race in the percpu counters summation code where the
     summation failed to add in the values for any CPUs that were dying
     but not yet dead. This fixes some minor discrepancies and incorrect
     assertions when running generic/650"

* tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  pcpcntr: remove percpu_counter_sum_all()
  fork: remove use of percpu_counter_sum_all
  pcpcntrs: fix dying cpu summation race
  cpumask: introduce for_each_cpu_or

2 years agoMerge tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 25 Mar 2023 19:51:25 +0000 (12:51 -0700)]
Merge tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "This batch started with some debugging enhancements to the new
  allocator refactoring that we put in 6.3-rc1 to assist developers in
  rebasing their dev branches.

  As for more serious code changes -- there's a bug fix to make the
  lockless allocator scan the whole filesystem before resorting to the
  locking allocator. We're also adding a selftest for the venerable
  directory/xattr hash function to make sure that it produces consistent
  results so that we can address any fallout as soon as possible.

   - Add a few debugging assertions so that people (me) trying to port
     code to the new allocator functions don't mess up the caller
     requirements

   - Relax some overly cautious lock ordering enforcement in the new
     allocator code, which means that file allocations will locklessly
     scan for the best space they can get before backing off to the
     traditional lock-and-really-get-it behavior

   - Add tracepoints to make it easier to trace the xfs allocator
     behavior

   - Actually test the dir/xattr hash algorithm to make sure it produces
     consistent results across all the platforms XFS supports"

* tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: test dir/attr hash when loading module
  xfs: add tracepoints for each of the externally visible allocators
  xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
  xfs: try to idiot-proof the allocators

2 years agoMerge tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Sat, 25 Mar 2023 17:27:27 +0000 (10:27 -0700)]
Merge tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - it87: Fix voltage scaling for chips with 10.9mV ADCs

 - xgene: Fix ioremap and memremap leak

 - peci/cputemp: Fix miscalculated DTS temperature for SKX

 - hwmon core: fix potential sensor registration failure with thermal
   subsystem if of_node is missing

* tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon (it87): Fix voltage scaling for chips with 10.9mV  ADCs
  hwmon: (xgene) Fix ioremap and memremap leak
  hwmon: fix potential sensor registration fail if of_node is missing
  hwmon: (peci/cputemp) Fix miscalculated DTS for SKX

2 years agoMerge tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 25 Mar 2023 01:06:11 +0000 (18:06 -0700)]
Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder
  are for other subsystems"

* tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  mm: mmap: remove newline at the end of the trace
  mailmap: add entries for Richard Leitner
  kcsan: avoid passing -g for test
  kfence: avoid passing -g for test
  mm: kfence: fix using kfence_metadata without initialization in show_object()
  lib: dhry: fix unstable smp_processor_id(_) usage
  mailmap: add entry for Enric Balletbo i Serra
  mailmap: map Sai Prakash Ranjan's old address to his current one
  mailmap: map Rajendra Nayak's old address to his current one
  Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
  mailmap: add entry for Tobias Klauser
  kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
  mm/ksm: fix race with VMA iteration and mm_struct teardown
  kselftest: vm: fix unused variable warning
  mm: fix error handling for map_deny_write_exec
  mm: deduplicate error handling for map_deny_write_exec
  checksyscalls: ignore fstat to silence build warning on LoongArch
  nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
  test_maple_tree: add more testing for mas_empty_area()
  maple_tree: fix mas_skip_node() end slot detection
  ...

2 years agoMerge tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sat, 25 Mar 2023 00:59:00 +0000 (17:59 -0700)]
Merge tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:

 - return less confusing messages on unsupported dialects
   (STATUS_NOT_SUPPORTED instead of I/O error)

 - fix for overly frequent inactive session termination

 - fix refcount leak

 - fix bounds check problems found by static checkers

 - fix to advertise named stream support correctly

 - Fix AES256 signing bug when connected to from MacOS

* tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: return unsupported error on smb1 mount
  ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect
  ksmbd: don't terminate inactive sessions after a few seconds
  ksmbd: fix possible refcount leak in smb2_open()
  ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES
  ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA
  ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
  ksmbd: fix wrong signingkey creation when encryption is AES256

2 years agoMerge tag 'arm-fixes-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 24 Mar 2023 22:38:13 +0000 (15:38 -0700)]
Merge tag 'arm-fixes-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "As usual, most of the bug fixes address issues in the devicetree
  files, and out of these, most are for the Qualcomm and NXP platforms,
  including:

   - A missing 'reserved-memory' property on LG G Watch R that is needed
     to prevent clashing with firmware

   - Annotations for cache coherency on multiple machines

   - Corrections for pinctrl, regulator, clock, iommu and power domain
     properties for i.MX and Qualcomm to correctly reflect the hardware
     settings

   - Firmware file names on multiple machines SA8540P Ride board

   - An incompatible change to the qcom vadc driver requires adding
     individual labels

   - Fix EQoS PHY reset GPIO by dropping the deprecated/wrong property
     and switch to the new bindings.

   - A fix for PCI bus address translation Tegra194 and Tegra234.

  There are also a couple of device driver fixes, addressing:

   - A race condition in the amdtee driver

   - A performance regression in the Qualcomm 'llcc' driver

   - An unitialized variable use NXP i.MX 'weim' driver

   - Error handling issues in Qualcomm 'rmtfs', and 'scm' drivers and
     the Arm scmi firmware driver"

* tag 'arm-fixes-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (48 commits)
  arm64: dts: qcom: sc8280xp-x13s: mark bob regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s12b regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s10b regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s11b regulator as always-on
  arm64: dts: imx93: add missing #address-cells and #size-cells to i2c nodes
  bus: imx-weim: fix branch condition evaluates to a garbage value
  arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes
  ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl
  ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl
  ARM: dts: imx6sll: e70k02: fix usbotg1 pinctrl
  arm64: dts: imx93: Fix eqos properties
  arm64: dts: imx8mp: Fix LCDIF2 node clock order
  arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock name
  arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio
  firmware: qcom: scm: fix bogus irq error at probe
  arm64: dts: qcom: sm8550: Mark UFS controller as cache coherent
  arm64: dts: qcom: sa8540p-ride: correct name of remoteproc_nsp0 firmware
  arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent
  arm64: dts: qcom: sm8350: Mark UFS controller as cache coherent
  arm64: dts: qcom: sm8550: fix LPASS pinctrl slew base address
  ...

2 years agoMerge tag 'for-v6.3-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Fri, 24 Mar 2023 22:32:01 +0000 (15:32 -0700)]
Merge tag 'for-v6.3-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - rk817: Fix compiler warning

 - cros_usbpd-charger: Fix excessive error printing

 - axp288_fuel_gauge: handle platform_get_irq error

 - bq24190 and da9150: Fix race condition in remove path

* tag 'for-v6.3-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition
  power: supply: bq24190: Fix use after free bug in bq24190_remove due to race condition
  power: supply: axp288_fuel_gauge: Added check for negative values
  power: supply: cros_usbpd: reclassify "default case!" as debug
  power: supply: rk817: Fix unsigned comparison with less than zero

2 years agoMerge tag 'drm-fixes-2023-03-24' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 24 Mar 2023 22:21:24 +0000 (15:21 -0700)]
Merge tag 'drm-fixes-2023-03-24' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:

 - usual pile of fixes for amdgpu & i915

 - probe error handling fixes for meson, lt8912b bridge

 - the host1x patch from Arnd

 - panel-orientation fix for Lenovo Book X90F

* tag 'drm-fixes-2023-03-24' of git://anongit.freedesktop.org/drm/drm: (23 commits)
  gpu: host1x: fix uninitialized variable use
  drm/amd/display: Set dcn32 caps.seamless_odm
  drm/amd/display: fix wrong index used in dccg32_set_dpstreamclk
  drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
  drm/amd/display: remove outdated 8bpc comments
  drm/amdgpu/gfx: set cg flags to enter/exit safe mode
  drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
  drm/amdgpu: add mes resume when do gfx post soft reset
  drm/amdgpu: skip ASIC reset for APUs when go to S4
  drm/amdgpu: reposition the gpu reset checking for reuse
  drm/bridge: lt8912b: return EPROBE_DEFER if bridge is not found
  drm/meson: fix missing component unbind on bind errors
  drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
  Revert "drm/i915/hwmon: Enable PL1 power limit"
  drm/i915: Update vblank timestamping stuff on seamless M/N change
  drm/i915: Fix format for perf_limit_reasons
  drm/i915/gt: perform uc late init after probe error injection
  drm/i915/active: Fix missing debug object activation
  drm/i915/guc: Fix missing ecodes
  drm/i915/mtl: Disable MC6 for MTL A step
  ...

2 years agoMerge tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 24 Mar 2023 21:20:48 +0000 (14:20 -0700)]
Merge tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM thin to work as a swap device by using 'limit_swap_bios' DM
   target flag (initially added to allow swap to dm-crypt) to throttle
   the amount of outstanding swap bios.

 - Fix DM crypt soft lockup warnings by calling cond_resched() from the
   cpu intensive loop in dmcrypt_write().

 - Fix DM crypt to not access an uninitialized tasklet. This fix allows
   for consistent handling of IO completion, by _not_ needlessly punting
   to a workqueue when tasklets are not needed.

 - Fix DM core's alloc_dev() initialization for DM stats to check for
   and propagate alloc_percpu() failure.

* tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm stats: check for and propagate alloc_percpu failure
  dm crypt: avoid accessing uninitialized tasklet
  dm crypt: add cond_resched() to dmcrypt_write()
  dm thin: fix deadlock when swapping to thin device

2 years agoMerge tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 24 Mar 2023 21:10:39 +0000 (14:10 -0700)]
Merge tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
     - Send Identify with CNS 06h only to I/O controllers (Martin
       George)
     - Fix nvme_tcp_term_pdu to match spec (Caleb Sander)

 - Pass in issue_flags for uring_cmd, so the end_io handlers don't need
   to assume what the right context is (me)

 - Fix for ublk, marking it as LIVE before adding it to avoid races on
   the initial IO (Ming)

* tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux:
  nvme-tcp: fix nvme_tcp_term_pdu to match spec
  nvme: send Identify with CNS 06h only to I/O controllers
  block/io_uring: pass in issue_flags for uring_cmd task_work handling
  block: ublk_drv: mark device as LIVE before adding disk

2 years agoMerge tag 'io_uring-6.3-2023-03-24' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 24 Mar 2023 21:01:01 +0000 (14:01 -0700)]
Merge tag 'io_uring-6.3-2023-03-24' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix an issue with repeated -ECONNREFUSED on a socket (me)

 - Fix a NULL pointer deference due to a stale lookup cache for
   allocating direct descriptors (Savino)

* tag 'io_uring-6.3-2023-03-24' of git://git.kernel.dk/linux:
  io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get()
  io_uring/net: avoid sending -ECONNABORTED on repeated connection requests

2 years agoMerge tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 24 Mar 2023 20:45:58 +0000 (13:45 -0700)]
Merge tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These address two recent regressions related to thermal control.

  Specifics:

   - Restore the thermal core behavior regarding zero-temperature trip
     points to avoid a driver regression (Ido Schimmel)

   - Fix a recent regression in the ACPI processor driver preventing it
     from changing the number of CPU cooling device states exposed via
     sysfs after the given CPU cooling device has been registered
     (Rafael Wysocki)"

* tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Restore behavior regarding invalid trip points
  ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  thermal: core: Introduce thermal_cooling_device_update()
  thermal: core: Introduce thermal_cooling_device_present()
  ACPI: processor: Reorder acpi_processor_driver_init()

2 years agoMerge tag 'acpi-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 24 Mar 2023 20:29:44 +0000 (13:29 -0700)]
Merge tag 'acpi-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These add new ACPI IRQ override and backlight detection quirks.

  Specifics:

   - Add backlight=native DMI quirk for Acer Aspire 3830TG to the ACPI
     backlight driver (Hans de Goede)

   - Add an ACPI IRQ override quirk for Medion S17413 (Aymeric Wibo)"

* tag 'acpi-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Add Medion S17413 to IRQ override quirk
  ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG

2 years agoxfs: fix mismerged tracepoints
Darrick J. Wong [Fri, 24 Mar 2023 20:14:48 +0000 (13:14 -0700)]
xfs: fix mismerged tracepoints

At some point in between sending this patch to the list and merging it
into for-next, the tracepoints got all mixed up because I've
over-reliant on automated tools not sucking.  The end result is that the
tracepoints are all wrong, so fix them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2 years agosmb3: fix unusable share after force unmount failure
Steve French [Thu, 23 Mar 2023 21:20:02 +0000 (16:20 -0500)]
smb3: fix unusable share after force unmount failure

If user does forced unmount ("umount -f") while files are still open
on the share (as was seen in a Kubernetes example running on SMB3.1.1
mount) then we were marking the share as "TID_EXITING" in umount_begin()
which caused all subsequent operations (except write) to fail ... but
unfortunately when umount_begin() is called we do not know yet that
there are open files or active references on the share that would prevent
unmount from succeeding.  Kubernetes had example when they were doing
umount -f when files were open which caused the share to become
unusable until the files were closed (and the umount retried).

Fix this so that TID_EXITING is not set until we are about to send
the tree disconnect (not at the beginning of forced umounts in
umount_begin) so that if "umount -f" fails (due to open files or
references) the mount is still usable.

Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agocifs: fix dentry lookups in directory handle cache
Paulo Alcantara [Fri, 24 Mar 2023 16:56:33 +0000 (13:56 -0300)]
cifs: fix dentry lookups in directory handle cache

Get rid of any prefix paths in @path before lookup_positive_unlocked()
as it will call ->lookup() which already adds those prefix paths
through build_path_from_dentry().

This has caused a performance regression when mounting shares with a
prefix path where readdir(2) would end up retrying several times to
open bad directory names that contained duplicate prefix paths.

Fix this by skipping any prefix paths in @path before calling
lookup_positive_unlocked().

Fixes: e4029e072673 ("cifs: find and use the dentry for cached non-root directories also")
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agosmb3: lower default deferred close timeout to address perf regression
Steve French [Thu, 23 Mar 2023 20:10:26 +0000 (15:10 -0500)]
smb3: lower default deferred close timeout to address perf regression

Performance tests with large number of threads noted that the change
of the default closetimeo (deferred close timeout between when
close is done by application and when client has to send the close
to the server), to 5 seconds from 1 second, significantly degraded
perf in some cases like this (in the filebench example reported,
the stats show close requests on the wire taking twice as long,
and 50% regression in filebench perf). This is stil configurable
via mount parm closetimeo, but to be safe, decrease default back
to its previous value of 1 second.

Reported-by: Yin Fengwei <fengwei.yin@intel.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/lkml/997614df-10d4-af53-9571-edec36b0e2f3@intel.com/
Fixes: 5efdd9122eff ("smb3: allow deferred close timeout to be configurable")
Cc: stable@vger.kernel.org # 6.0+
Tested-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agocifs: fix missing unload_nls() in smb2_reconnect()
Paulo Alcantara [Fri, 24 Mar 2023 19:05:19 +0000 (16:05 -0300)]
cifs: fix missing unload_nls() in smb2_reconnect()

Make sure to unload_nls() @nls_codepage if we no longer need it.

Fixes: bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agoMerge tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 Mar 2023 17:12:14 +0000 (10:12 -0700)]
Merge tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:
 "A single build fix for a corner case configuration that is apparently
  possible to achieve on some arches, from Geert"

* tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP

2 years agoMerge tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 Mar 2023 17:07:38 +0000 (10:07 -0700)]
Merge tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Set the NX compat flag for arm64 and zboot, to ensure compatibility
   with EFI firmware that complies with tightening requirements imposed
   across the ecosystem.

 - Improve identification of Ampere Altra systems based on SMBIOS data.

 - Fix some issues related to the EFI framebuffer that were introduced
   as a result from some refactoring related to zboot and the merge with
   sysfb.

 - Makefile tweak to avoid rebuilding vmlinuz unnecessarily.

 - Fix efi_random_alloc() return value on out of memory condition.

* tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/libstub: randomalloc: Return EFI_OUT_OF_RESOURCES on failure
  efi/libstub: Use relocated version of kernel's struct screen_info
  efi/libstub: zboot: Add compressed image to make targets
  efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
  efi: sysfb_efi: Fix DMI quirks not working for simpledrm
  efi/libstub: smbios: Drop unused 'recsize' parameter
  arm64: efi: Use SMBIOS processor version to key off Ampere quirk
  efi/libstub: smbios: Use length member instead of record struct size
  efi: earlycon: Reprobe after parsing config tables
  arm64: efi: Set NX compat flag in PE/COFF header
  efi/libstub: arm64: Remap relocated image with strict permissions
  efi/libstub: zboot: Mark zboot EFI application as NX compatible

2 years agoMerge tag 'qcom-driver-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 24 Mar 2023 17:06:28 +0000 (18:06 +0100)]
Merge tag 'qcom-driver-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/fixes

Qualcomm driver fixes for v6.3

Support for the secure world interrupting the SCM driver drive the wait
queue mechanism was recently introduced, but most platforms doesn't have
this mechanism and an error should not be printed in the log.

The rmtfs_mem driver recently gained support for assigning the region to
multiple VMIDs, but accidentally removed the support for running without
assignment. A couple of changes are introducd to correct this.

The SC8280XP LLCC slice configuration is wrong, reslting in incorrect
configuration of the hardware. The table is corrected, based on the
datasheet.

* tag 'qcom-driver-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  firmware: qcom: scm: fix bogus irq error at probe
  soc: qcom: rmtfs: handle optional qcom,vmid correctly
  soc: qcom: rmtfs: fix error handling reading qcom,vmid
  soc: qcom: llcc: Fix slice configuration values for SC8280XP

Link: https://lore.kernel.org/r/20230323142505.1086072-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2 years agoMerge tag 'qcom-dts-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 24 Mar 2023 17:06:16 +0000 (18:06 +0100)]
Merge tag 'qcom-dts-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/fixes

Qualcomm ARM32 Devicetree fixes for v6.3

This introduces missing reserved-memory ranges on LG G Watch R,
resolving stability issues caused by Linux reusing memory used by
firmware.

* tag 'qcom-dts-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  ARM: dts: qcom: apq8026-lg-lenok: add missing reserved memory

Link: https://lore.kernel.org/r/20230323141922.1085875-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2 years agoMerge tag 'qcom-arm64-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 24 Mar 2023 17:05:35 +0000 (18:05 +0100)]
Merge tag 'qcom-arm64-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/fixes

Qualcomm ARM64 Devicetree fixes for v6.3

This correct SIM card selection on the two newly introduced
MSM8916-based USB modems.

The firmware-name for the first CDSP is corrected on the SA8540P Ride
board.

The PCIe controller in SC7280 is marked cache-coherent, which resolves
seen data corruption issues.

Labels are added to the vadc channel nodes on SC8280XP, as the Linux
driver was updated to not include the unit address when generating
device names and collisions thereby prevented registration of the
channels. Audio clocks and routing is corrected and a few regulators are
marked always-on for the Lenovo Thinkpad X13s, as their clients are not
fully described at this point.

SPI5 was accidentally enabled by default on SM6115, and is disabled
again.

CDSP on SM6375 is provided its power-domains, to appropriately vote for
during power up for the DSP.

The iommu mask for the PCIe controllers in SM8150 is updated, to match
what the hypervisor expects.

Th Venus firmware path is corrected on Xiaomi Mi Pad 5 Pro.

The UFS controller is marked cache coherent on SM8350 and SM8450.

The clocks for the second WSA macro on SM8450 is corrected, and given
its own clocks.

The bias-pull-up value for I2C pins are corrected on SM8550, to trigger
the selection of the strong pull. CPU compatibles and the base address
of the LPASS TLMM block are corrected.

* tag 'qcom-arm64-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits)
  arm64: dts: qcom: sc8280xp-x13s: mark bob regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s12b regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s10b regulator as always-on
  arm64: dts: qcom: sc8280xp-x13s: mark s11b regulator as always-on
  arm64: dts: qcom: sm8550: Mark UFS controller as cache coherent
  arm64: dts: qcom: sa8540p-ride: correct name of remoteproc_nsp0 firmware
  arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent
  arm64: dts: qcom: sm8350: Mark UFS controller as cache coherent
  arm64: dts: qcom: sm8550: fix LPASS pinctrl slew base address
  arm64: dts: qcom: sc8280xp-x13s: fix va dmic dai links and routing
  arm64: dts: qcom: sc8280xp-x13s: fix dmic sample rate
  arm64: dts: qcom: sc8280xp: fix lpass tx macro clocks
  arm64: dts: qcom: sc8280xp: fix rx frame shapping info
  arm64: dts: qcom: sm8450: correct WSA2 assigned clocks
  arm64: dts: qcom: sc7280: Mark PCIe controller as cache coherent
  arm64: dts: qcom: msm8916-ufi: Fix sim card selection pinctrl
  arm64: dts: qcom: sm8250-xiaomi-elish: Correct venus firmware path
  arm64: dts: qcom: sm8550: Use correct CPU compatibles
  arm64: dts: qcom: sm8550: Add bias pull up value to tlmm i2c data clk states
  arm64: dts: qcom: sm6375: Add missing power-domain-named to CDSP
  ...

Link: https://lore.kernel.org/r/20230323141642.1085684-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2 years agoMerge tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 Mar 2023 16:52:26 +0000 (09:52 -0700)]
Merge tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to match the CSR ASID masking rules when passing ASIDs to
   firmware

 - Force GCC to use ISA 2.2, to avoid a host of compatibily issues
   between toolchains

* tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Handle zicsr/zifencei issues between clang and binutils
  riscv: mm: Fix incorrect ASID argument when flushing TLB

2 years agoMerge tag 'for-linus-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 Mar 2023 16:44:43 +0000 (09:44 -0700)]
Merge tag 'for-linus-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - fix build warning

 - avoid concurrent accesses to the Xen PV console ring page

* tag 'for-linus-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/PVH: avoid 32-bit build warning when obtaining VGA console info
  hvc/xen: prevent concurrent accesses to the shared ring

2 years agoMerge branch 'thermal-acpi'
Rafael J. Wysocki [Fri, 24 Mar 2023 16:11:27 +0000 (17:11 +0100)]
Merge branch 'thermal-acpi'

Merge a fix for a recent thermal-related regression in the ACPI
processor driver.

* thermal-acpi:
  ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  thermal: core: Introduce thermal_cooling_device_update()
  thermal: core: Introduce thermal_cooling_device_present()
  ACPI: processor: Reorder acpi_processor_driver_init()

2 years agoMerge branch 'acpi-video'
Rafael J. Wysocki [Fri, 24 Mar 2023 16:08:52 +0000 (17:08 +0100)]
Merge branch 'acpi-video'

Merge an ACPI backlight quirk for Acer Aspire 3830TG (Hans de Goede).

* acpi-video:
  ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG

2 years agoACPI: resource: Add Medion S17413 to IRQ override quirk
Aymeric Wibo [Sun, 19 Mar 2023 02:12:05 +0000 (03:12 +0100)]
ACPI: resource: Add Medion S17413 to IRQ override quirk

Add DMI info of the Medion S17413 (board M1xA) to the IRQ override
quirk table. This fixes the keyboard not working on these laptops.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=213031
Signed-off-by: Aymeric Wibo <obiwac@gmail.com>
[ rjw: Fixed up white space ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2 years agoMerge tag 'tag-chrome-platform-fixes-for-v6.3-rc4' of git://git.kernel.org/pub/scm...
Linus Torvalds [Fri, 24 Mar 2023 16:05:25 +0000 (09:05 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform fix from Tzung-Bi Shih:
 "Fix a kernel data leak vulnerability"

* tag 'tag-chrome-platform-fixes-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_chardev: fix kernel data leak from ioctl

2 years agoMerge tag 'i2c-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 24 Mar 2023 16:02:24 +0000 (09:02 -0700)]
Merge tag 'i2c-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A set of regular driver fixes"

* tag 'i2c-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer()
  i2c: hisi: Only use the completion interrupt to finish the transfer
  i2c: hisi: Avoid redundant interrupts
  i2c: mxs: ensure that DMA buffers are safe for DMA
  i2c: imx-lpi2c: check only for enabled interrupt flags
  i2c: imx-lpi2c: clean rx/tx buffers upon new message

2 years agoMerge tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 24 Mar 2023 15:48:12 +0000 (08:48 -0700)]
Merge tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi and bluetooth.

  Current release - regressions:

   - wifi: mt76: mt7915: add back 160MHz channel width support for
     MT7915

   - libbpf: revert poisoning of strlcpy, it broke uClibc-ng

  Current release - new code bugs:

   - bpf: improve the coverage of the "allow reads from uninit stack"
     feature to fix verification complexity problems

   - eth: am65-cpts: reset PPS genf adj settings on enable

  Previous releases - regressions:

   - wifi: mac80211: serialize ieee80211_handle_wake_tx_queue()

   - wifi: mt76: do not run mt76_unregister_device() on unregistered hw,
     fix null-deref

   - Bluetooth: btqcomsmd: fix command timeout after setting BD address

   - eth: igb: revert rtnl_lock() that causes a deadlock

   - dsa: mscc: ocelot: fix device specific statistics

  Previous releases - always broken:

   - xsk: add missing overflow check in xdp_umem_reg()

   - wifi: mac80211:
      - fix QoS on mesh interfaces
      - fix mesh path discovery based on unicast packets

   - Bluetooth:
      - ISO: fix timestamped HCI ISO data packet parsing
      - remove "Power-on" check from Mesh feature

   - usbnet: more fixes to drivers trusting packet length

   - wifi: iwlwifi: mvm: fix mvmtxq->stopped handling

   - Bluetooth: btintel: iterate only bluetooth device ACPI entries

   - eth: iavf: fix inverted Rx hash condition leading to disabled hash

   - eth: igc: fix the validation logic for taprio's gate list

   - dsa: tag_brcm: legacy: fix daisy-chained switches

  Misc:

   - bpf: adjust insufficient default bpf_jit_limit to account for
     growth of BPF use over the last 5 years

   - xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating
     no driver support"

* tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  Bluetooth: HCI: Fix global-out-of-bounds
  Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
  Bluetooth: L2CAP: Fix responding with wrong PDU type
  Bluetooth: btqcomsmd: Fix command timeout after setting BD address
  Bluetooth: btinel: Check ACPI handle for NULL before accessing
  net: mdio: thunder: Add missing fwnode_handle_put()
  net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
  net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
  net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
  net: asix: fix modprobe "sysfs: cannot create duplicate filename"
  gve: Cache link_speed value from device
  tools: ynl: Fix genlmsg header encoding formats
  net: enetc: fix aggregate RMON counters not showing the ranges
  Bluetooth: Remove "Power-on" check from Mesh feature
  Bluetooth: Fix race condition in hci_cmd_sync_clear
  Bluetooth: btintel: Iterate only bluetooth device ACPI entries
  Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
  Bluetooth: btusb: Remove detection of ISO packets over bulk
  Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
  ...

2 years agoxfs: clear incore AGFL_RESET state if it's not needed
Darrick J. Wong [Tue, 21 Mar 2023 23:33:20 +0000 (16:33 -0700)]
xfs: clear incore AGFL_RESET state if it's not needed

Prior to commit 7ac2ff8bb371, when we loaded the incore perag structure
with information from the AGF header, we would set or clear the
pagf_agfl_reset field based on whether or not the AGFL list was
misaligned within the block.  IOWs, it's an incore state bit that's
supposed to cache something in the ondisk metadata.  Therefore, the code
still needs to support clearing the incore bit if (somehow) the AGFL
were to correct itself.

It turns out that xfs_repair does exactly this -- phase 4 loads the AGF
to scan the rmapbt for corrupt records, which can set NEEDS_AGFL_RESET.
The scan unsets AGF_INIT but doesn't unset NEEDS_AGFL_RESET.  Phase 5
totally rewrites the AGFL and fixes the alignment problem, didn't clear
NEEDS_AGFL_RESET historically, and reloads the perag state to fix the
freelist.  This results in the AGFL being reset based on stale data,
which then causes the new AGFL blocks to be leaked.  A subsequent
xfs_repair -n then complains about the leaks.

One could argue that phase 5 ought to clear this bit directly when it
reloads the perag AGF data after rewriting the AGFL, but libxfs used to
handle this for us, so it should go back to doing that.

Found by fuzzing flfirst = ones in xfs/352.

Fixes: 7ac2ff8bb371 ("xfs: perags need atomic operational state")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2 years agoxfs: pass the correct cursor to xfs_iomap_prealloc_size
Darrick J. Wong [Sun, 19 Mar 2023 03:58:40 +0000 (20:58 -0700)]
xfs: pass the correct cursor to xfs_iomap_prealloc_size

In xfs_buffered_write_iomap_begin, @icur is the iext cursor for the data
fork and @ccur is the cursor for the cow fork.  Pass in whichever cursor
corresponds to allocfork, because otherwise the xfs_iext_prev_extent
call can use the data fork cursor to walk off the end of the cow fork
structure.  Best case it returns the wrong results, worst case it does
this:

stack segment: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 3141909 Comm: fsstress Tainted: G        W          6.3.0-rc2-xfsx #6.3.0-rc2 7bf5cc2e98997627cae5c930d890aba3aeec65dd
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
RIP: 0010:xfs_iext_prev+0x71/0x150 [xfs]
RSP: 0018:ffffc90002233aa8 EFLAGS: 00010297
RAX: 000000000000000f RBX: 000000000000000e RCX: 000000000000000c
RDX: 0000000000000002 RSI: 000000000000000e RDI: ffff8883d0019ba0
RBP: 989642409af8a7a7 R08: ffffea0000000001 R09: 0000000000000002
R10: 0000000000000000 R11: 000000000000000c R12: ffffc90002233b00
R13: ffff8883d0019ba0 R14: 989642409af8a6bf R15: 000ffffffffe0000
FS:  00007fdf8115f740(0000) GS:ffff88843fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdf8115e000 CR3: 0000000357256000 CR4: 00000000003506e0
Call Trace:
 <TASK>
 xfs_iomap_prealloc_size.constprop.0.isra.0+0x1a6/0x410 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
 xfs_buffered_write_iomap_begin+0xa87/0xc60 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
 iomap_iter+0x132/0x2f0
 iomap_file_buffered_write+0x92/0x330
 xfs_file_buffered_write+0xb1/0x330 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
 vfs_write+0x2eb/0x410
 ksys_write+0x65/0xe0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Found by xfs/538 in alwayscow mode, but this doesn't seem particular to
that test.

Fixes: 590b16516ef3 ("xfs: refactor xfs_iomap_prealloc_size")
Actually-Fixes: 66ae56a53f0e ("xfs: introduce an always_cow mode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2 years agoMerge tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 24 Mar 2023 15:32:10 +0000 (08:32 -0700)]
Merge tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes, the zoned accounting fix is spread across a few
  patches, preparatory and the actual fixes:

   - zoned mode:
      - fix accounting of unusable zone space
      - fix zone activation condition for DUP profile
      - preparatory patches

   - improved error handling of missing chunks

   - fix compiler warning"

* tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: drop space_info->active_total_bytes
  btrfs: zoned: count fresh BG region as zone unusable
  btrfs: use temporary variable for space_info in btrfs_update_block_group
  btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
  btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
  btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
  btrfs: handle missing chunk mapping more gracefully

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 24 Mar 2023 15:27:13 +0000 (08:27 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes, three in drivers.

  The core fix adds a UFS device to an existing quirk to avoid a huge
  delay on boot"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
  scsi: qla2xxx: Synchronize the IOCB count to be in order
  scsi: qla2xxx: Perform lockless command completion in abort path
  scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR

2 years agocifs: avoid race conditions with parallel reconnects
Shyam Prasad N [Mon, 20 Mar 2023 06:08:19 +0000 (06:08 +0000)]
cifs: avoid race conditions with parallel reconnects

When multiple processes/channels do reconnects in parallel
we used to return success immediately
negotiate/session-setup/tree-connect, causing race conditions
between processes that enter the function in parallel.
This caused several errors related to session not found to
show up during parallel reconnects.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agocifs: append path to open_enter trace event
Shyam Prasad N [Fri, 17 Mar 2023 12:51:17 +0000 (12:51 +0000)]
cifs: append path to open_enter trace event

We do not dump the file path for smb3_open_enter ftrace
calls, which is a severe handicap while debugging
using ftrace evens. This change adds that info.

Unfortunately, we're not updating the path in open params
in many places; which I had to do as a part of this change.
SMB2_open gets path in utf16 format, but it's easier of
path is supplied as char pointer in oparms.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agogpu: host1x: fix uninitialized variable use
Arnd Bergmann [Fri, 27 Jan 2023 22:14:00 +0000 (23:14 +0100)]
gpu: host1x: fix uninitialized variable use

The error handling for platform_get_irq() failing no longer
works after a recent change, clang now points this out with
a warning:

drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized]
        if (syncpt_irq < 0)
            ^~~~~~~~~~

Fix this by removing the variable and checking the correct
error status.

Fixes: 625d4ffb438c ("gpu: host1x: Rewrite syncpoint interrupt handling")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127221418.2522612-1-arnd@kernel.org
2 years agoMerge tag 'amd-drm-fixes-6.3-2023-03-23' of https://gitlab.freedesktop.org/agd5f...
Daniel Vetter [Fri, 24 Mar 2023 09:23:28 +0000 (10:23 +0100)]
Merge tag 'amd-drm-fixes-6.3-2023-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-23:

amdgpu:
- S4 fix
- Soft reset fixes
- SR-IOV fix
- Remove an out of date comment in the DC code
- ASPM fix
- DCN 3.2 fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323161939.7751-1-alexander.deucher@amd.com
2 years agoMerge tag 'drm-intel-fixes-2023-03-23' of git://anongit.freedesktop.org/drm/drm-intel...
Daniel Vetter [Fri, 24 Mar 2023 09:18:43 +0000 (10:18 +0100)]
Merge tag 'drm-intel-fixes-2023-03-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.3-rc4:
- Fix an MTL workaround
- Fix fbdev obj locking before vma pin
- Fix state inheritance tracking in initial commit
- Fix missing GuC error capture codes
- Fix missing debug object activation
- Fix uc init late order relative to probe error injection
- Fix perf limit reasons formatting
- Fix vblank timestamp update on seamless M/N changes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/878rfn7njw.fsf@intel.com
2 years agoMerge tag 'drm-misc-fixes-2023-03-23' of git://anongit.freedesktop.org/drm/drm-misc...
Daniel Vetter [Fri, 24 Mar 2023 08:33:03 +0000 (09:33 +0100)]
Merge tag 'drm-misc-fixes-2023-03-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * fixes for bind and probing error handling for meson, lt8912b bridge
 * panel-orientation fixes for Lenovo Book X90F

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323082401.GA8371@linux-uq9g
2 years agoksmbd: return unsupported error on smb1 mount
Namjae Jeon [Thu, 23 Mar 2023 12:15:52 +0000 (21:15 +0900)]
ksmbd: return unsupported error on smb1 mount

ksmbd disconnect connection when mounting with vers=smb1.
ksmbd should send smb1 negotiate response to client for correct
unsupported error return. This patch add needed SMB1 macros and fill
NegProt part of the response for smb1 negotiate response.

Cc: stable@vger.kernel.org
Reported-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 years agoplatform/chrome: cros_ec_chardev: fix kernel data leak from ioctl
Tzung-Bi Shih [Fri, 24 Mar 2023 01:06:58 +0000 (09:06 +0800)]
platform/chrome: cros_ec_chardev: fix kernel data leak from ioctl

It is possible to peep kernel page's data by providing larger `insize`
in struct cros_ec_command[1] when invoking EC host commands.

Fix it by using zeroed memory.

[1]: https://elixir.bootlin.com/linux/v6.2/source/include/linux/platform_data/cros_ec_proto.h#L74

Fixes: eda2e30c6684 ("mfd / platform: cros_ec: Miscellaneous character device to talk with the EC")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20230324010658.1082361-1-tzungbi@kernel.org
2 years agomm: mmap: remove newline at the end of the trace
Minwoo Im [Fri, 10 Mar 2023 23:18:00 +0000 (08:18 +0900)]
mm: mmap: remove newline at the end of the trace

We already have newline in TP_printk so remove the redundant newline
character at the end of the mmap trace.

<...>-345     [006] .....    95.589290: exit_mmap: mt_mod ...

<...>-345     [006] .....    95.589413: vm_unmapped_area: addr=...

<...>-345     [006] .....    95.589571: vm_unmapped_area: addr=...

<...>-345     [006] .....    95.589606: vm_unmapped_area: addr=...

to

<...>-336     [006] .....    44.762506: exit_mmap: mt_mod ...
<...>-336     [006] .....    44.762654: vm_unmapped_area: addr=...
<...>-336     [006] .....    44.762794: vm_unmapped_area: addr=...
<...>-336     [006] .....    44.762835: vm_unmapped_area: addr=...

Link: https://lkml.kernel.org/r/ZAu6qDsNPmk82UjV@minwoo-desktop
FIxes: df529cabb7a25 ("mm: mmap: add trace point of vm_unmapped_area")
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: add entries for Richard Leitner
Richard Leitner [Thu, 16 Mar 2023 10:25:25 +0000 (11:25 +0100)]
mailmap: add entries for Richard Leitner

Map all my old email addresses to my current address.

Link: https://lkml.kernel.org/r/20230316-my-mailmap-v1-1-76bc3a36ba41@linux.dev
Signed-off-by: Richard Leitner <richard.leitner@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agokcsan: avoid passing -g for test
Marco Elver [Thu, 16 Mar 2023 22:47:05 +0000 (23:47 +0100)]
kcsan: avoid passing -g for test

Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5, the assembler will complain with:

  Error: non-constant .uleb128 is not supported

This is because `-g` defaults to the compiler debug info default. If the
assembler does not support some of the directives used, the above errors
occur. To fix, remove the explicit passing of `-g`.

All the test wants is that stack traces print valid function names, and
debug info is not required for that. (I currently cannot recall why I
added the explicit `-g`.)

Link: https://lkml.kernel.org/r/20230316224705.709984-2-elver@google.com
Fixes: 1fe84fd4a402 ("kcsan: Add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agokfence: avoid passing -g for test
Marco Elver [Thu, 16 Mar 2023 22:47:04 +0000 (23:47 +0100)]
kfence: avoid passing -g for test

Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5:

  $ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
LLVM=1 LLVM_IAS=0 O=build \
mrproper allmodconfig mm/kfence/kfence_test.o
  /tmp/kfence_test-08a0a0.s: Assembler messages:
  /tmp/kfence_test-08a0a0.s:14627: Error: non-constant .uleb128 is not supported
  /tmp/kfence_test-08a0a0.s:14628: Error: non-constant .uleb128 is not supported
  /tmp/kfence_test-08a0a0.s:14632: Error: non-constant .uleb128 is not supported
  /tmp/kfence_test-08a0a0.s:14633: Error: non-constant .uleb128 is not supported
  /tmp/kfence_test-08a0a0.s:14639: Error: non-constant .uleb128 is not supported
  ...

This is because `-g` defaults to the compiler debug info default.  If the
assembler does not support some of the directives used, the above errors
occur.  To fix, remove the explicit passing of `-g`.

All the test wants is that stack traces print valid function names, and
debug info is not required for that.  (I currently cannot recall why I
added the explicit `-g`.)

Link: https://lkml.kernel.org/r/20230316224705.709984-1-elver@google.com
Fixes: bc8fbc5f305a ("kfence: add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: kfence: fix using kfence_metadata without initialization in show_object()
Muchun Song [Wed, 15 Mar 2023 03:44:41 +0000 (11:44 +0800)]
mm: kfence: fix using kfence_metadata without initialization in show_object()

The variable kfence_metadata is initialized in kfence_init_pool(), then,
it is not initialized if kfence is disabled after booting.  In this case,
kfence_metadata will be used (e.g.  ->lock and ->state fields) without
initialization when reading /sys/kernel/debug/kfence/objects.  There will
be a warning if you enable CONFIG_DEBUG_SPINLOCK.  Fix it by creating
debugfs files when necessary.

Link: https://lkml.kernel.org/r/20230315034441.44321-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agolib: dhry: fix unstable smp_processor_id(_) usage
Geert Uytterhoeven [Wed, 15 Mar 2023 14:28:17 +0000 (15:28 +0100)]
lib: dhry: fix unstable smp_processor_id(_) usage

When running the in-kernel Dhrystone benchmark with
CONFIG_DEBUG_PREEMPT=y:

    BUG: using smp_processor_id() in preemptible [00000000] code: bash/938

Fix this by not using smp_processor_id() directly, but instead wrapping
the whole benchmark inside a get_cpu()/put_cpu() pair.  This makes sure
the whole benchmark is run on the same CPU core, and the reported values
are consistent.

Link: https://lkml.kernel.org/r/b0d29932bb24ad82cea7f821e295c898e9657be0.1678890070.git.geert+renesas@glider.be
Fixes: d5528cc16893f1f6 ("lib: add Dhrystone benchmark test")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: Tobias Klausmann <klausman@schwarzvogel.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217179
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: add entry for Enric Balletbo i Serra
Enric Balletbo i Serra [Tue, 14 Mar 2023 11:54:55 +0000 (12:54 +0100)]
mailmap: add entry for Enric Balletbo i Serra

Map Enric's old corporate addresses to his kernel.org address.

Link: https://lkml.kernel.org/r/20230314115455.188818-1-eballetbo@kernel.org
Signed-off-by: Enric Balletbo i Serra <eballetbo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: map Sai Prakash Ranjan's old address to his current one
Konrad Dybcio [Tue, 14 Mar 2023 12:56:03 +0000 (13:56 +0100)]
mailmap: map Sai Prakash Ranjan's old address to his current one

Sai's old email is still picked up by the likes of get_maintainer.pl and
keeps bouncing like all other @codeaurora.org addresses.  Map it to his
current one.

Link: https://lkml.kernel.org/r/20230314125604.2734146-1-konrad.dybcio@linaro.org
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: map Rajendra Nayak's old address to his current one
Konrad Dybcio [Mon, 13 Mar 2023 09:03:43 +0000 (10:03 +0100)]
mailmap: map Rajendra Nayak's old address to his current one

Rajendra's old email is still picked up by the likes of get_maintainer.pl
and keeps bouncing like all other @codeaurora.org addresses.  Map it to
his current one.

Link: https://lkml.kernel.org/r/20230313090343.2148346-1-konrad.dybcio@linaro.org
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Rajendra Nayak <quic_rjendra@quicinc.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoRevert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
Peter Collingbourne [Fri, 10 Mar 2023 04:29:13 +0000 (20:29 -0800)]
Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"

This reverts commit 487a32ec24be819e747af8c2ab0d5c515508086a.

should_skip_kasan_poison() reads the PG_skip_kasan_poison flag from
page->flags.  However, this line of code in free_pages_prepare():

page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

clears most of page->flags, including PG_skip_kasan_poison, before calling
should_skip_kasan_poison(), which meant that it would never return true as
a result of the page flag being set.  Therefore, fix the code to call
should_skip_kasan_poison() before clearing the flags, as we were doing
before the reverted patch.

This fixes a measurable performance regression introduced in the reverted
commit, where munmap() takes longer than intended if HW tags KASAN is
supported and enabled at runtime.  Without this patch, we see a
single-digit percentage performance regression in a particular
mmap()-heavy benchmark when enabling HW tags KASAN, and with the patch,
there is no statistically significant performance impact when enabling HW
tags KASAN.

Link: https://lkml.kernel.org/r/20230310042914.3805818-2-pcc@google.com
Fixes: 487a32ec24be ("kasan: drop skip_kasan_poison variable in free_pages_prepare")
Link: https://linux-review.googlesource.com/id/Ic4f13affeebd20548758438bb9ed9ca40e312b79
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomailmap: add entry for Tobias Klauser
Tobias Klauser [Fri, 10 Mar 2023 12:35:08 +0000 (13:35 +0100)]
mailmap: add entry for Tobias Klauser

Map my old email addresses to the current address.

Link: https://lkml.kernel.org/r/20230310123508.22079-1-tklauser@distanz.ch
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agokasan, powerpc: don't rename memintrinsics if compiler adds prefixes
Marco Elver [Mon, 27 Feb 2023 09:47:27 +0000 (10:47 +0100)]
kasan, powerpc: don't rename memintrinsics if compiler adds prefixes

With appropriate compiler support [1], KASAN builds use __asan prefixed
meminstrinsics, and KASAN no longer overrides memcpy/memset/memmove.

If compiler support is detected (CC_HAS_KASAN_MEMINTRINSIC_PREFIX), define
memintrinsics normally (do not prefix '__').

On powerpc, KASAN is the only user of __mem functions, which are used to
define instrumented memintrinsics.  Alias the normal versions for KASAN to
use in its implementation.

Link: https://lore.kernel.org/all/20230224085942.1791837-1-elver@google.com/
Link: https://lore.kernel.org/oe-kbuild-all/202302271348.U5lvmo0S-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230227094726.3833247-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/ksm: fix race with VMA iteration and mm_struct teardown
Liam R. Howlett [Wed, 8 Mar 2023 22:03:10 +0000 (17:03 -0500)]
mm/ksm: fix race with VMA iteration and mm_struct teardown

exit_mmap() will tear down the VMAs and maple tree with the mmap_lock held
in write mode.  Ensure that the maple tree is still valid by checking
ksm_test_exit() after taking the mmap_lock in read mode, but before the
for_each_vma() iterator dereferences a destroyed maple tree.

Since the maple tree is destroyed, the flags telling lockdep to check an
external lock has been cleared.  Skip the for_each_vma() iterator to avoid
dereferencing a maple tree without the external lock flag, which would
create a lockdep warning.

Link: https://lkml.kernel.org/r/20230308220310.3119196-1-Liam.Howlett@oracle.com
Fixes: a5f18ba07276 ("mm/ksm: use vma iterators instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lore.kernel.org/lkml/ZAdUUhSbaa6fHS36@xpf.sh.intel.com/
Reported-by: syzbot+2ee18845e89ae76342c5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=64a3e95957cd3deab99df7cd7b5a9475af92c93e
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <heng.su@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agokselftest: vm: fix unused variable warning
Peter Xu [Wed, 8 Mar 2023 19:04:22 +0000 (19:04 +0000)]
kselftest: vm: fix unused variable warning

Remove unused variable from the MDWE test.

[joey.gouly@arm.com: add commit message]
Link: https://lkml.kernel.org/r/20230308190423.46491-4-joey.gouly@arm.com
Fixes: 4cf1fe34fd18 ("kselftest: vm: add tests for memory-deny-write-execute")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: nd <nd@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: fix error handling for map_deny_write_exec
Joey Gouly [Wed, 8 Mar 2023 19:04:21 +0000 (19:04 +0000)]
mm: fix error handling for map_deny_write_exec

Commit 4a18419f71cd ("mm/mprotect: use mmu_gather") changed 'goto out;' to
'break' in the loop.

This wasn't noticed while rebasing the MDWE patches, so fix it now.

Link: https://lkml.kernel.org/r/20230308190423.46491-3-joey.gouly@arm.com
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Link: https://lore.kernel.org/linux-arm-kernel/8408d8901e9d7ee6b78db4c6cba04b78@ispras.ru/
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: nd <nd@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: deduplicate error handling for map_deny_write_exec
Joey Gouly [Wed, 8 Mar 2023 19:04:20 +0000 (19:04 +0000)]
mm: deduplicate error handling for map_deny_write_exec

Patch series "Fixes for MDWE prctl"

These are four small fixes for the recent memory-write-deny-execute prctl
patches [1].  Two reported by Alexey about error handling and two tooling
fixes by Peter.

This patch (of 4):

Commit cc8d1b097de7 ("mmap: clean up mmap_region() unrolling")
deduplicated the error handling, do the same for the return value of
`map_deny_write_exec`.

Link: https://lkml.kernel.org/r/20230308190423.46491-1-joey.gouly@arm.com
Link: https://lkml.kernel.org/r/20230308190423.46491-2-joey.gouly@arm.com
Link: https://lore.kernel.org/linux-arm-kernel/20230119160344.54358-1-joey.gouly@arm.com/
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Link: https://lore.kernel.org/linux-arm-kernel/8408d8901e9d7ee6b78db4c6cba04b78@ispras.ru/
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: nd <nd@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agochecksyscalls: ignore fstat to silence build warning on LoongArch
Tiezhu Yang [Tue, 7 Mar 2023 07:59:00 +0000 (15:59 +0800)]
checksyscalls: ignore fstat to silence build warning on LoongArch

fstat is replaced by statx on the new architecture, so an exception is
added to the checksyscalls script to silence the following build warning
on LoongArch:

  CALL    scripts/checksyscalls.sh
<stdin>:569:2: warning: #warning syscall fstat not implemented [-Wcpp]

Link: https://lkml.kernel.org/r/1678175940-20872-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Suggested-by: WANG Xuerui <kernel@xen0n.name>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
Ryusuke Konishi [Tue, 7 Mar 2023 08:55:48 +0000 (17:55 +0900)]
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()

The ioctl helper function nilfs_ioctl_wrap_copy(), which exchanges a
metadata array to/from user space, may copy uninitialized buffer regions
to user space memory for read-only ioctl commands NILFS_IOCTL_GET_SUINFO
and NILFS_IOCTL_GET_CPINFO.

This can occur when the element size of the user space metadata given by
the v_size member of the argument nilfs_argv structure is larger than the
size of the metadata element (nilfs_suinfo structure or nilfs_cpinfo
structure) on the file system side.

KMSAN-enabled kernels detect this issue as follows:

 BUG: KMSAN: kernel-infoleak in instrument_copy_to_user
 include/linux/instrumented.h:121 [inline]
 BUG: KMSAN: kernel-infoleak in _copy_to_user+0xc0/0x100 lib/usercopy.c:33
  instrument_copy_to_user include/linux/instrumented.h:121 [inline]
  _copy_to_user+0xc0/0x100 lib/usercopy.c:33
  copy_to_user include/linux/uaccess.h:169 [inline]
  nilfs_ioctl_wrap_copy+0x6fa/0xc10 fs/nilfs2/ioctl.c:99
  nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline]
  nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290
  nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343
  __do_compat_sys_ioctl fs/ioctl.c:968 [inline]
  __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910
  __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910
  do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
  __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
  do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
  do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
  entry_SYSENTER_compat_after_hwframe+0x70/0x82

 Uninit was created at:
  __alloc_pages+0x9f6/0xe90 mm/page_alloc.c:5572
  alloc_pages+0xab0/0xd80 mm/mempolicy.c:2287
  __get_free_pages+0x34/0xc0 mm/page_alloc.c:5599
  nilfs_ioctl_wrap_copy+0x223/0xc10 fs/nilfs2/ioctl.c:74
  nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline]
  nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290
  nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343
  __do_compat_sys_ioctl fs/ioctl.c:968 [inline]
  __se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910
  __ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910
  do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
  __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
  do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
  do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
  entry_SYSENTER_compat_after_hwframe+0x70/0x82

 Bytes 16-127 of 3968 are uninitialized
 ...

This eliminates the leak issue by initializing the page allocated as
buffer using get_zeroed_page().

Link: https://lkml.kernel.org/r/20230307085548.6290-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+132fdd2f1e1805fdc591@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000a5bd2d05f63f04ae@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agotest_maple_tree: add more testing for mas_empty_area()
Liam R. Howlett [Tue, 7 Mar 2023 18:02:47 +0000 (13:02 -0500)]
test_maple_tree: add more testing for mas_empty_area()

Test robust filling of an entire area of the tree, then test one beyond.
This is to test the walking back up the tree at the end of nodes and error
condition.  Test inspired by the reproducer code provided by Snild Dolkow.

The last test in the function tests for the case of a corrupted maple
state caused by the incorrect limits set during mas_skip_node().  There
needs to be a gap in the second last child and last child, but the search
must rule out the second last child's gap.  This would avoid correcting
the maple state to the correct max limit and return an error.

Link: https://lkml.kernel.org/r/20230307180247.2220303-3-Liam.Howlett@oracle.com
Cc: Snild Dolkow <snild@sony.com>
Link: https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/
Fixes: e15e06a83923 ("lib/test_maple_tree: add testing for maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomaple_tree: fix mas_skip_node() end slot detection
Liam R. Howlett [Tue, 7 Mar 2023 18:02:46 +0000 (13:02 -0500)]
maple_tree: fix mas_skip_node() end slot detection

Patch series "Fix mas_skip_node() for mas_empty_area()", v2.

mas_empty_area() was incorrectly returning an error when there was room.
The issue was tracked down to mas_skip_node() using the incorrect
end-of-slot count.  Instead of using the nodes hard limit, the limit of
data should be used.

mas_skip_node() was also setting the min and max to that of the child
node, which was unnecessary.  Within these limits being set, there was
also a bug that corrupted the maple state's max if the offset was set to
the maximum node pivot.  The bug was without consequence unless there was
a sufficient gap in the next child node which would cause an error to be
returned.

This patch set fixes these errors by removing the limit setting from
mas_skip_node() and uses the mas_data_end() for slot limits, and adds
tests for all failures discovered.

This patch (of 2):

mas_skip_node() is used to move the maple state to the node with a higher
limit.  It does this by walking up the tree and increasing the slot count.
Since slot count may not be able to be increased, it may need to walk up
multiple times to find room to walk right to a higher limit node.  The
limit of slots that was being used was the node limit and not the last
location of data in the node.  This would cause the maple state to be
shifted outside actual data and enter an error state, thus returning
-EBUSY.

The result of the incorrect error state means that mas_awalk() would
return an error instead of finding the allocation space.

The fix is to use mas_data_end() in mas_skip_node() to detect the nodes
data end point and continue walking the tree up until it is safe to move
to a node with a higher limit.

The walk up the tree also sets the maple state limits so remove the buggy
code from mas_skip_node().  Setting the limits had the unfortunate side
effect of triggering another bug if the parent node was full and the there
was no suitable gap in the second last child, but room in the next child.

mas_skip_node() may also be passed a maple state in an error state from
mas_anode_descend() when no allocations are available.  Return on such an
error state immediately.

Link: https://lkml.kernel.org/r/20230307180247.2220303-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230307180247.2220303-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Snild Dolkow <snild@sony.com>
Link: https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/
Tested-by: Snild Dolkow <snild@sony.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm, vmalloc: fix high order __GFP_NOFAIL allocations
Michal Hocko [Mon, 6 Mar 2023 08:15:17 +0000 (09:15 +0100)]
mm, vmalloc: fix high order __GFP_NOFAIL allocations

Gao Xiang has reported that the page allocator complains about high order
__GFP_NOFAIL request coming from the vmalloc core:

 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
 vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
 __vmalloc_area_node mm/vmalloc.c:3057 [inline]
 __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
 kvmalloc_node+0x156/0x1a0 mm/util.c:606
 kvmalloc include/linux/slab.h:737 [inline]
 kvmalloc_array include/linux/slab.h:755 [inline]
 kvcalloc include/linux/slab.h:760 [inline]

it seems that I have completely missed high order allocation backing
vmalloc areas case when implementing __GFP_NOFAIL support.  This means
that [k]vmalloc at al.  can allocate higher order allocations with
__GFP_NOFAIL which can trigger OOM killer for non-costly orders easily or
cause a lot of reclaim/compaction activity if those requests cannot be
satisfied.

Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
requests if the high order request fails.

Link: https://lkml.kernel.org/r/ZAXynvdNqcI0f6Us@dhcp22.suse.cz
Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>