Mel Gorman reported an issue with mas_walk() occasionally returning 0xE,
causing a NULL pointer dereference:
[ 3611.890629][T12506] BUG: kernel NULL pointer dereference, address: 000000000000009e
[ 3611.898293][T12506] #PF: supervisor read access in kernel mode
[ 3611.904127][T12506] #PF: error_code(0x0000) - not-present page
[ 3611.909961][T12506] PGD 5022e39067 P4D 5022e39067 PUD 502344b067 PMD 0
[ 3611.916571][T12506] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 3611.921629][T12506] CPU: 35 PID: 12506 Comm: pft Tainted: G E 6.1.0-rc6-mm-pervmalock-v2r3 #1 5eb359771d3fe61fee3afacf8357ad1393b861b8
[ 3611.935095][T12506] Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.8.2 08/27/2020
[ 3611.943266][T12506] RIP: 0010:lock_vma_under_rcu+0x70/0x140 # faddr2line matches mm/memory.c:5268
The code causing the crash is the vma_is_anonymous() call, which
accesses vma->vm_ops. With vm_ops at offset 0x90 in vm_area_struct, the
vma returned by mas_walk() must be 0xE (0x9e - 0x90).
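The offset arithmetic above can be checked with a small userspace sketch.
This is only an illustration: struct mock_vma and both helper functions
are hypothetical names, and the struct models nothing of the real
vm_area_struct except the assumption that vm_ops sits at offset 0x90.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct vm_area_struct. Only the position of
 * vm_ops (offset 0x90, as stated in the analysis) is modelled; the
 * padding replaces every field that precedes it in the real struct.
 */
struct mock_vma {
	unsigned char pad[0x90];
	const void *vm_ops;
};

/* Address that a read of base->vm_ops would touch. */
static inline uintptr_t vm_ops_access_addr(uintptr_t base)
{
	return base + offsetof(struct mock_vma, vm_ops);
}

/* Recover the pointer value from the faulting address in the oops. */
static inline uintptr_t base_from_fault_addr(uintptr_t fault_addr)
{
	return fault_addr - offsetof(struct mock_vma, vm_ops);
}
```

Calling base_from_fault_addr(0x9e) yields 0xE, and
vm_ops_access_addr(0xE) yields 0x9e, which matches the faulting address
reported in the oops.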
Work around this issue for testing until we figure out why this is
happening.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
 	if (!vma)
 		goto inval;
+	/* A known issue to be investigated */
+	if (WARN_ON(vma <= (struct vm_area_struct *)0xFF)) {
+		trace_printk("RECEIVED VMA 0x%lx for addr 0x%lx\n",
+			     (unsigned long)vma, address);
+		printk("GOT AN INVALID VMA 0x%lx for address 0x%lx\n",
+		       (unsigned long)vma, address);
+#ifdef CONFIG_DEBUG_MAPLE_TREE
+		mt_dump(&mm->mm_mt);
+#endif
+		goto inval;
+	}
+
 	/* Only anonymous vmas are supported for now */
 	if (!vma_is_anonymous(vma))
 		goto inval;