The root cause is that unpoison_memory() tries to check the PG_HWPoison
flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
triggered. This can be reproduced by below steps:
This scenario can be identified by pfn_to_online_page() returning NULL.
And ZONE_DEVICE pages are never expected, so we can simply fail if
pfn_to_online_page() == NULL to fix the bug.
Link: https://lkml.kernel.org/r/20250828024618.1744895-1-linmiaohe@huawei.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Carlos Llamas [Thu, 28 Aug 2025 14:26:56 +0000 (14:26 +0000)]
mm/mremap: fix regression in vrm->new_addr check
Commit 3215eaceca87 ("mm/mremap: refactor initial parameter sanity
checks") moved the sanity check for vrm->new_addr from mremap_to() to
check_mremap_params().
However, this caused a regression as vrm->new_addr is now checked even
when MREMAP_FIXED and MREMAP_DONTUNMAP flags are not specified. In this
case, vrm->new_addr can be garbage and create unexpected failures.
Fix this by moving the new_addr check after the vrm_implies_new_addr()
guard. This ensures that the new_addr is only checked when the user has
specified one explicitly.
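The ordering described above can be sketched in userspace C. This is only a rough model, not the kernel code: struct remap_model, the model_* helpers, the MODEL_* constants and the 4096-byte page size are assumptions made for illustration, and the alignment test merely stands in for whatever new_addr checks the kernel performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values chosen to match the MREMAP_* flags in the Linux UAPI headers. */
#define MODEL_MREMAP_MAYMOVE   1
#define MODEL_MREMAP_FIXED     2
#define MODEL_MREMAP_DONTUNMAP 4

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/* Hypothetical, reduced stand-in for the kernel's remap state. */
struct remap_model {
	unsigned long new_addr;
	unsigned long flags;
};

/* True only when userspace actually supplied a destination address. */
static bool model_implies_new_addr(const struct remap_model *vrm)
{
	return vrm->flags & (MODEL_MREMAP_FIXED | MODEL_MREMAP_DONTUNMAP);
}

/* Returns 0 if the parameters pass, -1 (standing in for -EINVAL) otherwise. */
static int model_check_params(const struct remap_model *vrm)
{
	/* Without an explicit destination, new_addr may be garbage: skip it. */
	if (!model_implies_new_addr(vrm))
		return 0;

	/* Illustrative check only: the destination must be page aligned. */
	if (vrm->new_addr & (MODEL_PAGE_SIZE - 1))
		return -1;

	return 0;
}
```

With the guard first, a plain MREMAP_MAYMOVE request is accepted whatever new_addr happens to hold, while a misaligned new_addr is still rejected when MREMAP_FIXED is set.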
Link: https://lkml.kernel.org/r/20250828142657.770502-1-cmllamas@google.com
Fixes: 3215eaceca87 ("mm/mremap: refactor initial parameter sanity checks")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vlad Dumitrescu [Fri, 22 Aug 2025 22:55:16 +0000 (15:55 -0700)]
percpu: fix race on alloc failed warning limit
The 'allocation failed, ...' warning messages can cause unlimited log
spam, contrary to the implementation's intent.
The warn_limit variable is accessed without synchronization. If more than
<warn_limit> threads enter the warning path at the same time, the variable
will get decremented past 0. Once it becomes negative, the non-zero check
will always return true leading to unlimited log spam.
Use an atomic operation to access warn_limit and change the condition to
test for non-negative (>= 0) - atomic_dec_if_positive() will return -1 once
warn_limit becomes 0. Continue to print the disable message alongside the
last warning.
While the change cited in Fixes is only adjacent, the warning limit
implementation was correct before it. Only non-atomic allocations were
considered for warnings, and those happened to hold pcpu_alloc_mutex while
accessing warn_limit.
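The limit logic described above can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions, not the percpu code: model_dec_if_positive() imitates the documented return convention of the kernel's atomic_dec_if_positive() (old value minus one, whether or not the decrement happened), and warn_limit_model and model_should_warn() are hypothetical names with an arbitrary starting budget of 3.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of atomic_dec_if_positive(): decrement only if the
 * result stays >= 0, and return the old value minus one either way.
 */
static int model_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			break;
	}
	return old - 1;
}

static atomic_int warn_limit_model = 3;	/* hypothetical starting budget */

/* Returns true if this caller is still allowed to print a warning. */
static bool model_should_warn(bool *is_last)
{
	int remaining = model_dec_if_positive(&warn_limit_model);

	/* The caller that consumes the last slot also prints the disable note. */
	*is_last = (remaining == 0);
	return remaining >= 0;
}
```

Because the counter never drops below zero, any number of concurrent callers arriving after the budget is spent see -1 and stay silent.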
[vdumitrescu@nvidia.com: prevent warn_limit from going negative, per Christoph Lameter]
Link: https://lkml.kernel.org/r/ee87cc59-2717-4dbb-8052-1d2692c5aaaa@nvidia.com
Link: https://lkml.kernel.org/r/ab22061a-a62f-4429-945b-744e5cc4ba35@nvidia.com
Fixes: f7d77dfc91f7 ("mm/percpu.c: print error message too if atomic alloc failed")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kyle Meyer [Thu, 28 Aug 2025 18:38:20 +0000 (13:38 -0500)]
mm/memory-failure: fix redundant updates for already poisoned pages
Duplicate memory errors can be reported by multiple sources.
Passing an already poisoned page to action_result() causes issues:
* The amount of hardware corrupted memory is incorrectly updated.
* Per NUMA node MF stats are incorrectly updated.
* Redundant "already poisoned" messages are printed.
Avoid those issues by:
* Skipping hardware corrupted memory updates for already poisoned pages.
* Skipping per NUMA node MF stats updates for already poisoned pages.
* Dropping redundant "already poisoned" messages.
Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and
make calls to action_result() consistent for already poisoned normal pages
and huge pages.
Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com
Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Wed, 27 Aug 2025 10:42:23 +0000 (03:42 -0700)]
s390: kexec: initialize kexec_buf struct
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.
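As a rough illustration of why initializing at the declaration helps, consider the sketch below. struct kbuf_model is a hypothetical, trimmed-down stand-in for struct kexec_buf, the field name "random" is an assumption made for the example, and the model_* helpers are invented. The sketch uses "{ 0 }" so that it builds with pre-C23 compilers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct kexec_buf. */
struct kbuf_model {
	void *buffer;
	size_t bufsz;
	unsigned long mem;
	bool random;	/* models the later-added field that is always read */
};

/* Stand-in for arch code that fills in only the fields it knows about. */
static void model_arch_fill(struct kbuf_model *kbuf, void *data, size_t len)
{
	kbuf->buffer = data;
	kbuf->bufsz = len;
	/* 'mem' and 'random' are deliberately left untouched here. */
}

static bool model_load_segment(void *data, size_t len)
{
	/* Zero-initialize at declaration so untouched fields read as 0/false. */
	struct kbuf_model kbuf = { 0 };

	model_arch_fill(&kbuf, data, len);
	return kbuf.random;	/* well defined: false, not stack garbage */
}
```

Without the initializer, the final read would load an indeterminate byte into a bool, which is the kind of access the UBSAN report above flags.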
Link: https://lkml.kernel.org/r/20250827-kbuf_all-v1-3-1df9882bb01a@debian.org
Fixes: bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Wed, 27 Aug 2025 10:42:22 +0000 (03:42 -0700)]
riscv: kexec: initialize kexec_buf struct
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.
Link: https://lkml.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a@debian.org
Fixes: bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Wed, 27 Aug 2025 10:42:21 +0000 (03:42 -0700)]
arm64: kexec: initialize kexec_buf struct in load_other_segments()
Patch series "kexec: Fix invalid field access".
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are cleanly
set, preventing future instances of uninitialized memory being used.
An initial fix has already landed for arm64[0], and this patchset fixes
the problem on the remaining arm64 code and on riscv, as raised by Mark.
Discussions about this problem can be found at [1][2].
This patch (of 3):
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.
Quanmin Yan [Wed, 27 Aug 2025 11:58:58 +0000 (19:58 +0800)]
mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
When creating a new scheme of DAMON_RECLAIM, the calculation of
'min_age_region' uses 'aggr_interval' as the divisor, which may lead to
division-by-zero errors. Fix it by directly returning -EINVAL when such a
case occurs.
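A minimal sketch of the validation pattern, assuming a simplified attribute struct: struct attrs_model and model_min_age_region() are hypothetical names and do not reproduce the DAMON_RECLAIM code. Only the shape of the check, rejecting a zero divisor with -EINVAL before dividing, follows the description above.

```c
#include <errno.h>

/* Hypothetical stand-in for the DAMON monitoring attributes. */
struct attrs_model {
	unsigned long sample_interval;	/* microseconds */
	unsigned long aggr_interval;	/* microseconds */
};

/*
 * Convert a minimum age in microseconds into a number of aggregation
 * intervals, rejecting a zero divisor instead of dividing by it.
 */
static int model_min_age_region(const struct attrs_model *attrs,
				unsigned long min_age_us,
				unsigned long *min_age_region)
{
	if (!attrs->aggr_interval)
		return -EINVAL;

	*min_age_region = min_age_us / attrs->aggr_interval;
	return 0;
}
```

The output parameter is written only on success, so a caller that ignores the error still does not observe a half-computed value.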
Link: https://lkml.kernel.org/r/20250827115858.1186261-3-yanquanmin1@huawei.com
Fixes: f5a79d7c0c87 ("mm/damon: introduce struct damos_access_pattern")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Quanmin Yan [Wed, 27 Aug 2025 11:58:57 +0000 (19:58 +0800)]
mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters
application".
DAMON's RECLAIM and LRU_SORT modules perform no validation on
user-configured parameters during application, which may lead to
division-by-zero errors.
Avoid the divide-by-zero by adding validation checks when DAMON modules
attempt to apply the parameters.
This patch (of 2):
During the calculation of 'hot_thres' and 'cold_thres', either
'sample_interval' or 'aggr_interval' is used as the divisor, which may
lead to division-by-zero errors. Fix it by directly returning -EINVAL
when such a case occurs. Additionally, since 'aggr_interval' is already
required to be set no smaller than 'sample_interval' in damon_set_attrs(),
only the case where 'sample_interval' is zero needs to be checked.
Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com
Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org> [6.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sang-Heon Jeon [Fri, 22 Aug 2025 02:50:57 +0000 (11:50 +0900)]
mm/damon/core: set quota->charged_from to jiffies at first charge window
The kernel initializes the "jiffies" timer to 5 minutes below zero, as shown
in include/linux/jiffies.h:
  /*
   * Have the 32 bit jiffies value wrap 5 minutes after boot
   * so jiffies wrap bugs show up earlier.
   */
  #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
And the jiffies comparison helper functions cast the unsigned value to
signed to cover wraparound:
  #define time_after_eq(a,b) \
   (typecheck(unsigned long, a) && \
    typecheck(unsigned long, b) && \
    ((long)((a) - (b)) >= 0))
When quota->charged_from is initialized to 0, time_after_eq() can
incorrectly return FALSE even after reset_interval has elapsed. This
occurs when (jiffies - reset_interval) produces a value with MSB=1, which
is interpreted as negative in signed arithmetic.
This issue primarily affects 32-bit systems because:
On 64-bit systems: MSB=1 values occur after ~292 million years from boot
(assuming HZ=1000), almost impossible.
On 32-bit systems: MSB=1 values occur during the first 5 minutes after
boot, and the second half of every jiffies wraparound cycle, starting from
day 25 (assuming HZ=1000).
When this unexpected FALSE return from time_after_eq() occurs, the
charging window will not reset. The user impact depends on the esz value
at that time.
If esz is 0, the scheme ignores configured quotas and runs without any
limits.
If esz is not 0, the scheme stops working once the quota is exhausted. It
remains stopped until the charging window finally resets.
So, set quota->charged_from to jiffies in damos_adjust_quota() when it
is considered to be the first charge window. With this change, we avoid
the unexpected FALSE return from time_after_eq().
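The wraparound effect can be reproduced in userspace with fixed-width types. The sketch below is a 32-bit model only: HZ_MODEL, INITIAL_JIFFIES_MODEL and the model_* helpers are names made up for illustration, HZ=1000 is assumed, and the unsigned-to-signed cast relies on the usual two's-complement behaviour of GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ_MODEL 1000u	/* assumed tick rate */

/* 32-bit model of INITIAL_JIFFIES: five minutes before the counter wraps. */
#define INITIAL_JIFFIES_MODEL ((uint32_t)(-300 * (int32_t)HZ_MODEL))

/* 32-bit model of time_after_eq(a, b): true when a is at or after b. */
static bool model_time_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Has reset_interval elapsed since charged_from, as seen at 'now'? */
static bool model_window_elapsed(uint32_t now, uint32_t charged_from,
				 uint32_t reset_interval)
{
	return model_time_after_eq(now, charged_from + reset_interval);
}
```

Two seconds after boot, model_window_elapsed(now, 0, HZ_MODEL) is false because "now" still has its top bit set, whereas starting the window from the boot-time counter value gives true for the same one-second interval.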
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com
Fixes: 2b8a248d5873 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeongjun Park [Sat, 23 Aug 2025 18:21:15 +0000 (03:21 +0900)]
mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
When restoring a reservation for an anonymous page, we need to check
whether a surplus page should be freed. However, __unmap_hugepage_range()
causes a data race because it reads h->surplus_huge_pages without the
protection of hugetlb_lock.
And adjust_reservation is a boolean variable that indicates whether
reservations for anonymous pages in each folio should be restored.
Therefore, it should be initialized to false for each round of the loop.
However, this variable is currently set to false only where it is defined,
not at the start of each round.
This means that once adjust_reservation is set to true even once within
the loop, reservations for anonymous pages will be restored
unconditionally in all subsequent rounds, regardless of the folio's state.
To fix this, we need to add the missing hugetlb_lock, unlock the
page_table_lock earlier so that we don't lock the hugetlb_lock inside the
page_table_lock lock, and initialize adjust_reservation to false on each
round within the loop.
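The per-round reset can be shown in isolation with a small sketch. Only the flag handling is modelled here, the locking half of the fix is not; struct folio_model and model_unmap_range() are hypothetical names and the loop does not mirror the real __unmap_hugepage_range() body.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-folio input for the sketch. */
struct folio_model {
	bool anon_needs_restore;
};

/*
 * Count how many folios get their reservation restored. The flag is
 * re-initialized on every iteration so that one folio's state cannot
 * leak into the handling of the next one.
 */
static size_t model_unmap_range(const struct folio_model *folios, size_t n)
{
	size_t restored = 0;

	for (size_t i = 0; i < n; i++) {
		bool adjust_reservation = false;	/* reset each round */

		if (folios[i].anon_needs_restore)
			adjust_reservation = true;

		if (adjust_reservation)
			restored++;
	}
	return restored;
}
```

If the flag were instead declared once before the loop, a single folio needing restoration would cause every later folio to be counted as well.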
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Besides, do_migrate_range() may be called between memory_failure() setting
the hwpoison flag and isolating the folio from the LRU, so remove the
WARN_ON(). In other places, unmap_poisoned_folio() is called when the folio
is isolated; follow the same rule in do_migrate_range() too.
[david@redhat.com: don't abort offlining, fixed typo, add comment]
Link: https://lkml.kernel.org/r/3c214dff-9649-4015-840f-10de0e03ebe4@redhat.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Raghav <kernel@pankajraghav.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Fri, 22 Aug 2025 06:33:18 +0000 (06:33 +0000)]
mm/khugepaged: fix the address passed to notifier on testing young
Commit 8ee53820edfd ("thp: mmu_notifier_test_young") introduced
mmu_notifier_test_young(), but we are passing the wrong address.
In xxx_scan_pmd(), the actual iteration address is "_address", not
"address". We seem to have misused the variable from the very beginning.
Change it to the right one.
[akpm@linux-foundation.org fix whitespace, per everyone]
Link: https://lkml.kernel.org/r/20250822063318.11644-1-richard.weiyang@gmail.com
Fixes: 8ee53820edfd ("thp: mmu_notifier_test_young")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'sound-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small changes including a few regression fixes:
- Regression fix for Intel SKL/KBL HD-audio bindings
- Regression fix for missing Nvidia HDMI codec entries after the
recent code reorganization
- A few TAS2781 codec regression fixes
- Fix for ASoC component lookup breakage
- Usual HD-audio, USB-audio and SOF quirk entries"
* tag 'sound-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: Add pin fix for another HP EliteDesk 800 G4 model
ALSA: usb-audio: Allow Focusrite devices to use low samplerates
ALSA: hda: tas2781: reorder tas2563 calibration variables
ALSA: hda: tas2781: fix tas2563 EFI data endianness
ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported
ALSA: docs: Add documents for recently changes in snd-usb-audio
ALSA: usb-audio: Add mute TLV for playback volumes on more devices
ASoC: SOF: Intel: WCL: Add the sdw_process_wakeen op
ALSA: hda: Avoid binding with SOF for SKL/KBL platforms
ASoC: rsnd: tidyup direction name on rsnd_dai_connect()
ALSA: hda/tas2781: Fix EFI name for calibration beginning with 1 instead of 0
ALSA: usb-audio: move mixer_quirks' min_mute into common quirk
ALSA: hda/realtek: Fix headset mic for TongFang X6[AF]R5xxY
ALSA: hda/hdmi: Restore missing HDMI codec entries
ASoC: codecs: idt821034: fix wrong log in idt821034_chip_direction_output()
ASoC: soc-core: tidyup snd_soc_lookup_component_nolocked()
ASoC: soc-core: care NULL dirver name on snd_soc_lookup_component_nolocked()
ALSA: hda: intel-dsp-config: Select SOF driver on MTL Chromebooks
ALSA: usb-audio: Add mute TLV for playback volumes on some devices
Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 13 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 11 of these
fixes are for MM.
This includes a three-patch series from Harry Yoo which fixes an
intermittent boot failure which can occur on x86 systems. And a
two-patch series from Alexander Gordeev which fixes a KASAN crash on
S390 systems"
* tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: fix possible deadlock in kmemleak
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
mm: introduce and use {pgd,p4d}_populate_kernel()
mm: move page table sync declarations to linux/pgtable.h
proc: fix missing pde_set_flags() for net proc files
mm: fix accounting of memmap pages
mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
kexec: add KEXEC_FILE_NO_CMA as a legal flag
kasan: fix GCC mem-intrinsic prefix with sw tags
mm/kasan: avoid lazy MMU mode hazards
mm/kasan: fix vmalloc shadow memory (de-)population races
kunit: kasan_test: disable fortify string checker on kasan_strings() test
selftests/mm: fix FORCE_READ to read input value correctly
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
ocfs2: prevent release journal inode after journal shutdown
rust: mm: mark VmaNew as transparent
of_numa: fix uninitialized memory nodes causing kernel panic
Merge tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix a few races related to inode link count
- fix inode leak on failure to add link to inode
- move transaction aborts closer to where they happen
* tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid load/store tearing races when checking if an inode was logged
btrfs: fix race between setting last_dir_index_offset and inode logging
btrfs: fix race between logging inode and checking if it was logged before
btrfs: simplify error handling logic for btrfs_link()
btrfs: fix inode leak on failure to add link to inode
btrfs: abort transaction on failure to add link to inode
To solve this problem, switch to printk_safe mode before printing the
warning message. This redirects all printk()-s to a special per-CPU buffer,
which is flushed later from a safe context (irq work), so the deadlock is
avoided. The proper API to use is
printk_deferred_enter()/printk_deferred_exit() [2]. Another way is to
print the warning after the kmemleak lock is released.
ALSA: hda/hdmi: Add pin fix for another HP EliteDesk 800 G4 model
It was reported that HP EliteDesk 800 G4 DM 65W (SSID 103c:845a) needs
a similar quirk for enabling HDMI outputs, too. This patch adds the
corresponding quirk entry.
Tina Wuest [Mon, 1 Sep 2025 09:20:24 +0000 (12:20 +0300)]
ALSA: usb-audio: Allow Focusrite devices to use low samplerates
Commit 05f254a6369ac020fc0382a7cbd3ef64ad997c92 ("ALSA: usb-audio:
Improve filtering of sample rates on Focusrite devices") changed the
check for max_rate in a way which was overly restrictive, forcing
devices to use very high samplerates if they support them, despite
support existing for lower rates as well.
This maintains the intended outcome (ensuring samplerates selected are
supported) while allowing devices with higher maximum samplerates to be
opened at all supported samplerates.
This patch was tested with a Clarett+ 8Pre USB
Fixes: 05f254a6369a ("ALSA: usb-audio: Improve filtering of sample rates on Focusrite devices")
Signed-off-by: Tina Wuest <tina@wuest.me>
Link: https://patch.msgid.link/20250901092024.140993-1-tina@wuest.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Sun, 31 Aug 2025 16:20:17 +0000 (09:20 -0700)]
Merge tag 'x86_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Convert the SSB mitigation to the attack vector controls which got
forgotten at the time
- Prevent the CPUID topology hierarchy detection on AMD from
overwriting the correct initial APIC ID
- Fix the case of a machine shipping without microcode in the BIOS, in
the AMD microcode loader
- Correct the Pentium 4 model range which has a constant TSC
* tag 'x86_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Add attack vector controls for SSB
x86/cpu/topology: Use initial APIC ID from XTOPOLOGY leaf on AMD/HYGON
x86/microcode/AMD: Handle the case of no BIOS microcode
x86/cpu/intel: Fix the constant_tsc model check for Pentium 4
Linus Torvalds [Sun, 31 Aug 2025 16:13:00 +0000 (09:13 -0700)]
Merge tag 'sched_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Fix a stall on the CPU offline path due to mis-counting a deadline
server task twice as part of the runqueue's running tasks count
- Fix a realtime tasks starvation case where failure to enqueue a timer
whose expiration time is already in the past would cause repeated
attempts to re-enqueue a deadline server task which leads to starving
the former, realtime one
- Prevent a delayed deadline server task stop from breaking the
per-runqueue bandwidth tracking
- Have a function checking whether the deadline server task has
stopped, return the correct value
* tag 'sched_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Don't count nr_running for dl_server proxy tasks
sched/deadline: Fix RT task potential starvation when expiry time passed
sched/deadline: Always stop dl-server before changing parameters
sched/deadline: Fix dl_server_stopped()
Linus Torvalds [Sun, 31 Aug 2025 16:07:37 +0000 (09:07 -0700)]
Merge tag 'irq_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Remove unnecessary and noisy WARN_ONs in gic-v5's init path
- Avoid a kmemleak false positive for the gic-v5's L2 IST table entries
- Fix a retval check in mvebu-gicp's probe function
- Fix a wrong conversion to guards in atmel-aic[5] irqchip
* tag 'irq_urgent_for_v6.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v5: Remove undue WARN_ON()s in the IRS affinity parsing
irqchip/gic-v5: Fix kmemleak L2 IST table entries false positives
irqchip/mvebu-gicp: Fix an IS_ERR() vs NULL check in probe()
irqchip/atmel-aic[5]: Fix incorrect lock guard conversion
Linus Torvalds [Sun, 31 Aug 2025 15:56:45 +0000 (08:56 -0700)]
Merge tag 'hardening-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- ARM: stacktrace: include asm/sections.h in asm/stacktrace.h (Arnd
Bergmann)
- ubsan: Fix incorrect hand-side used in handle (Junhui Pei)
- hardening: Require clang 20.1.0 for __counted_by (Nathan Chancellor)
* tag 'hardening-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
hardening: Require clang 20.1.0 for __counted_by
ARM: stacktrace: include asm/sections.h in asm/stacktrace.h
ubsan: Fix incorrect hand-side used in handle
Linus Torvalds [Sun, 31 Aug 2025 15:49:55 +0000 (08:49 -0700)]
Merge tag 'gpio-fixes-for-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an off-by-one bug in interrupt handling in gpio-timberdale
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Change Altera-PIO driver maintainer
gpio: timberdale: fix off-by-one in IRQ type boundary check
Linus Torvalds [Sat, 30 Aug 2025 17:43:53 +0000 (10:43 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- CFI failure due to kpti_ng_pgd_alloc() signature mismatch
- Underallocation bug in the SVE ptrace kselftest
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature
Mark Brown [Tue, 12 Aug 2025 14:49:27 +0000 (15:49 +0100)]
kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
In fp-ptrace, when allocating a buffer to write SVE register data, we open
code the addition of the header size to the VL dependent register data
size, which led to an underallocation bug when we cut'n'pasted the code
for FPSIMD format writes. Use the SVE_PT_SIZE() macro that the kernel
UAPI provides for this.
The tasdev_load_calibrated_data() function expects the calibration data
values in the cali_data buffer as R0, R0Low, InvR0, Power, TLim which
is not the same as what tas2563_save_calibration() writes to the buffer.
Reorder the EFI variables in the tas2563_save_calibration() function
to put the values in the buffer in the correct order.
Fixes: 4fe238513407 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Link: https://patch.msgid.link/20250829160450.66623-2-soyer@irl.hu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Gergo Koteles [Fri, 29 Aug 2025 16:04:49 +0000 (18:04 +0200)]
ALSA: hda: tas2781: fix tas2563 EFI data endianness
Before conversion to unify the calibration data management, the
tas2563_apply_calib() function performed the big endian conversion and
wrote the calibration data to the device. The writing is now done by the
common tasdev_load_calibrated_data() function, but without conversion.
Put the values into the calibration data buffer with the expected
endianness.
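Storing values with a fixed byte order can be sketched as follows. This is a portable userspace illustration, not the driver code: model_put_be32() and model_pack_calibration() are hypothetical helpers, and the assumption that each calibration value occupies four bytes is made only for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store a 32-bit value into buf in big-endian byte order, regardless of
 * the host's endianness.
 */
static void model_put_be32(uint8_t *buf, uint32_t val)
{
	buf[0] = (uint8_t)(val >> 24);
	buf[1] = (uint8_t)(val >> 16);
	buf[2] = (uint8_t)(val >> 8);
	buf[3] = (uint8_t)val;
}

/* Pack 'count' calibration words into the buffer, 4 bytes each. */
static void model_pack_calibration(uint8_t *cali_data, const uint32_t *vals,
				   size_t count)
{
	for (size_t i = 0; i < count; i++)
		model_put_be32(&cali_data[i * 4], vals[i]);
}
```

Doing the conversion while filling the buffer means the code that later writes the buffer to the device can copy bytes without knowing the host byte order.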
Fixes: 4fe238513407 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Link: https://patch.msgid.link/20250829160450.66623-1-soyer@irl.hu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Sakamoto [Fri, 29 Aug 2025 23:37:49 +0000 (08:37 +0900)]
ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported
The ALSA HwDep character device of the firewire-motu driver incorrectly
returns EPOLLOUT in poll(2), even though the driver implements no operation
for write(2). This misleads userspace applications to believe write() is
allowed, potentially resulting in unnecessary wakeups.
This issue dates back to the driver's initial code added by commit
71c3797779d3 ("ALSA: firewire-motu: add hwdep interface"), and persisted
when POLLOUT was updated to EPOLLOUT by commit a9a08845e9ac ("vfs: do
bulk POLL* -> EPOLL* replacement").
Linus Torvalds [Fri, 29 Aug 2025 20:54:26 +0000 (13:54 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
RISC-V:
- Fix pte settings within kvm_riscv_gstage_ioremap()
- Fix comments in kvm_riscv_check_vcpu_requests()
- Fix stack overrun when setting vlenb via ONE_REG
x86:
- Use array_index_nospec() to sanitize the target vCPU ID when
handling PV IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV
memory, as the common case, by far, is that at least one CPU will
have entered the VM, and wbnoinvd_on_cpus_mask() will naturally
handle the rare case where the set of have_run_cpus is empty.
Selftests (not KVM):
- Rename the is_signed_type() macro in kselftest_harness.h to
is_signed_var() to fix a collision with linux/overflow.h. The
collision generates compiler warnings due to the two macros having
different meaning"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm64: nv: Fix ATS12 handling of single-stage translation
KVM: arm64: Remove __vcpu_{read,write}_sys_reg_{from,to}_cpu()
KVM: arm64: Fix vcpu_{read,write}_sys_reg() accessors
KVM: arm64: Simplify sysreg access on exception delivery
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit state
RISC-V: KVM: fix stack overrun when loading vlenb
RISC-V: KVM: Correct kvm_riscv_check_vcpu_requests() comment
RISC-V: KVM: Fix pte settings within kvm_riscv_gstage_ioremap()
KVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regs
KVM: arm64: Get rid of ARM64_FEATURE_MASK()
KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable
KVM: arm64: Make ID_AA64PFR0_EL1.RAS writable
KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2
KVM: arm64: Handle RASv1p1 registers
arm64: Add capability denoting FEAT_RASv1p1
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
KVM: arm64: Split kvm_pgtable_stage2_destroy()
selftests: harness: Rename is_signed_type() to avoid collision with overflow.h
KVM: SEV: don't check have_run_cpus in sev_writeback_caches()
KVM: arm64: Correctly populate FAR_EL2 on nested SEA injection
...
Nathan Chancellor [Thu, 7 Aug 2025 21:36:28 +0000 (14:36 -0700)]
hardening: Require clang 20.1.0 for __counted_by
After an innocuous change in -next that modified a structure that
contains __counted_by, clang-19 started crashing when building certain
files in drivers/gpu/drm/xe. When assertions are enabled, the more
descriptive failure is:
clang: clang/lib/AST/RecordLayoutBuilder.cpp:3335: const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.
According to a reverse bisect, a tangential change to the LLVM IR
generation phase of clang during the LLVM 20 development cycle [1]
resolves this problem. Bump the version of clang that enables
CONFIG_CC_HAS_COUNTED_BY to 20.1.0 to ensure that this issue cannot be
hit.
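A minimal sketch of the version gate, assuming the usual Kconfig encoding in which CLANG_VERSION is major * 10000 + minor * 100 + patch (so 20.1.0 becomes 200100). The helper names version_code() and clang_allows_counted_by() are hypothetical and only model the comparison; the real check lives in Kconfig, not in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Encode a compiler version as major * 10000 + minor * 100 + patch. */
int version_code(int major, int minor, int patch)
{
	assert(minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
	return major * 10000 + minor * 100 + patch;
}

/* Hypothetical helper: true if this clang version may enable __counted_by. */
bool clang_allows_counted_by(int clang_version)
{
	return clang_version >= version_code(20, 1, 0);
}
```

Under this encoding every clang-19 release compares below 200100, so the option stays off for the affected compilers.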
Paolo Bonzini [Fri, 29 Aug 2025 16:57:31 +0000 (12:57 -0400)]
Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
Paolo Bonzini [Fri, 29 Aug 2025 16:57:18 +0000 (12:57 -0400)]
Merge tag 'kvm-riscv-fixes-6.17-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.17, take #1
- Fix pte settings within kvm_riscv_gstage_ioremap()
- Fix comments in kvm_riscv_check_vcpu_requests()
- Fix stack overrun when setting vlenb via ONE_REG
Linus Torvalds [Fri, 29 Aug 2025 16:15:46 +0000 (09:15 -0700)]
Merge tag 'efi-fixes-for-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Assorted fixes for the OP-TEE based pseudo-EFI variable store
- Fix for an OOB access when looking up the same non-existing efivarfs
entry multiple times in parallel
* tag 'efi-fixes-for-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
efi: stmm: Drop unneeded null pointer check
efi: stmm: Drop unused EFI error from setup_mm_hdr arguments
efi: stmm: Do not return EFI_OUT_OF_RESOURCES on internal errors
efi: stmm: Fix incorrect buffer allocation method
Linus Torvalds [Fri, 29 Aug 2025 15:09:34 +0000 (08:09 -0700)]
Merge tag 'xfs-fixes-6.17-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"The highlight I'd like to point here is related to the XFS_RT
Kconfig, which has been updated to be enabled by default now if
CONFIG_BLK_DEV_ZONED is enabled.
This also contains a few fixes for zoned devices support in XFS,
especially related to swapon requests in inodes belonging to the zoned
FS.
A null-ptr dereference fix in the xattr data, due to a mishandling of
medium errors generated by block devices is also included"
* tag 'xfs-fixes-6.17-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: do not propagate ENODATA disk errors into xattr code
xfs: reject swapon for inodes on a zoned file system earlier
xfs: kick off inodegc when failing to reserve zoned blocks
xfs: remove xfs_last_used_zone
xfs: Default XFS_RT to Y if CONFIG_BLK_DEV_ZONED is enabled
Linus Torvalds [Fri, 29 Aug 2025 14:37:21 +0000 (07:37 -0700)]
Merge tag 'regulator-fix-v6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple fix for the pm8008 driver for poor error handling,
switching to use a helper which does the right thing in the
affected case"
* tag 'regulator-fix-v6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pm8008: fix probe failure due to negative voltage selector
Linus Torvalds [Fri, 29 Aug 2025 14:29:17 +0000 (07:29 -0700)]
Merge tag 'ata-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Damien Le Moal:
- Fix the type of return values to be signed in the ahci_xgene driver
(Qianfeng)
- Add the mask_port_ext module parameter to the ahci driver.
This is to allow a user to ignore ports that are advertized as
external (hotplug capable) in favor of lower link power management
policies instead of the default max_performance for these ports.
This is useful to allow e.g. laptops to go into low power states when
hooked up to a docking station with SATA slots, connected with an
external port for hotplug (me)
* tag 'ata-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: ahci_xgene: Use int type for 'rc' to store error codes
ata: ahci: Allow ignoring the external/hotplug capability of ports
Takashi Iwai [Fri, 29 Aug 2025 09:13:09 +0000 (11:13 +0200)]
Merge tag 'asoc-fix-v6.17-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.17
The main fixes here are for some of the cleanups done in the core in
this release, we had broken component lookup in the case with a single
bus and DMA controller. Otherwise it's driver specific changes, the
shortlogs for the Intel WCL and rsnd drivers look like minor cleanups
but are actually bugfixes (adding an op needed for correct functionality
and reverting an inappropriate helper usage).
Linus Torvalds [Fri, 29 Aug 2025 02:56:32 +0000 (19:56 -0700)]
Merge tag 'drm-fixes-2025-08-29' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, feels a bit big.
The major piece is msm fixes, then the usual amdgpu/xe along with some
mediatek and nouveau fixes and a tegra revert.
gpuvm:
- fix some typos
xe:
- Fix user-fence race issue
- Couple xe_vm fixes
- Don't trigger rebind on initial dma-buf validation
- Fix a build issue related to basename() posix vs gnu discrepancy
nouveau:
- fix linear modifier
- remove some dead code
msm:
- Core/GPU:
- fix comment doc warning in gpuvm
- fix build with KMS disabled
- fix pgtable setup/teardown race
- global fault counter fix
- various error path fixes
- GPU devcoredump snapshot fixes
- handle in-place VM_BIND remaps to solve turnip vm update race
- skip re-emitting IBs for unusable VMs
- Don't use %pK through printk
- moved display snapshot init earlier, fixing a crash
- DPU:
- Fixed crash in virtual plane checking code
- Fixed mode comparison in virtual plane checking code
- DSI:
- Adjusted width of resolution-related registers
- Fixed locking issue on 14nm PLLs
- UBWC (per Bjorn's ack)
- Added UBWC configuration for several missing platforms (fixing
regression)
mediatek:
- Add error handling for old state CRTC in atomic_disable
- Fix DSI host and panel bridge pre-enable order
- Fix device/node reference count leaks in mtk_drm_get_all_drm_priv
- mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls
tegra:
- revert dma-buf change"
* tag 'drm-fixes-2025-08-29' of https://gitlab.freedesktop.org/drm/kernel: (56 commits)
drm/mediatek: mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls
drm/amdgpu/userq: fix error handling of invalid doorbell
drm/amdgpu: update firmware version checks for user queue support
drm/amd/amdgpu: disable hwmon power1_cap* for gfx 11.0.3 on vf mode
Revert "drm/amdgpu: fix incorrect vm flags to map bo"
drm/amdgpu/gfx12: set MQD as appriopriate for queue types
drm/amdgpu/gfx11: set MQD as appriopriate for queue types
drm/xe: switch to local xbasename() helper
drm/xe: Don't trigger rebind on initial dma-buf validation
drm/xe/vm: Clear the scratch_pt pointer on error
drm/xe/vm: Don't pin the vm_resv during validation
drm/xe/xe_sync: avoid race during ufence signaling
Revert "drm/tegra: Use dma_buf from GEM object instance"
soc: qcom: use no-UBWC config for MSM8956/76
soc: qcom: add configuration for MSM8929
soc: qcom: ubwc: add more missing platforms
soc: qcom: ubwc: use no-uwbc config for MSM8917
drm/msm/dpu: Add a null ptr check for dpu_encoder_needs_modeset
dt-bindings: display/msm: qcom,mdp5: drop lut clock
drm/gpuvm: fix various typos in .c and .h gpuvm file
...
Linus Torvalds [Fri, 29 Aug 2025 01:51:28 +0000 (18:51 -0700)]
Merge tag 'block-6.17-20250828' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix a lockdep spotted issue on recursive locking for zoned writes, in
case of errors
- Update bcache MAINTAINERS entry address for Coly
- Fix for a ublk release issue, with selftests
- Fix for a regression introduced in this cycle, where it assumed
q->rq_qos was always set if the bio flag indicated that
- Fix for a regression introduced in this cycle, where loop retrieving
block device sizes got broken
* tag 'block-6.17-20250828' of git://git.kernel.dk/linux:
bcache: change maintainer's email address
ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device
ublk: avoid ublk_io_release() called after ublk char dev is closed
block: validate QoS before calling __rq_qos_done_bio()
blk-zoned: Fix a lockdep complaint about recursive locking
loop: fix zero sized loop for block special file
Linus Torvalds [Fri, 29 Aug 2025 01:41:53 +0000 (18:41 -0700)]
Merge tag 'io_uring-6.17-20250828' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Use the proper type for min_t() in getting the min of the leftover
bytes and the buffer length.
- As good practice, use READ_ONCE() consistently for reading ring
provided buffer lengths. Additionally, stop looping for incremental
commits if a zero sized buffer is hit, as no further progress can be
made at that point.
* tag 'io_uring-6.17-20250828' of git://git.kernel.dk/linux:
io_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengths
io_uring/kbuf: fix signedness in this_len calculation
Linus Torvalds [Fri, 29 Aug 2025 00:35:51 +0000 (17:35 -0700)]
Merge tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth.
Current release - regressions:
- ipv4: fix regression in local-broadcast routes
- vsock: fix error-handling regression introduced in v6.17-rc1
Previous releases - regressions:
- bluetooth:
- mark connection as closed during suspend disconnect
- fix set_local_name race condition
- eth:
- ice: fix NULL pointer dereference on reset
- mlx5: fix memory leak in hws_pool_buddy_init error path
- bnxt_en: fix stats context reservation logic
- hv: fix loss of receive events from host during channel open
Previous releases - always broken:
- page_pool: fix incorrect mp_ops error handling
- sctp: initialize more fields in sctp_v6_from_sk()
- eth:
- octeontx2-vf: fix max packet length errors
- idpf: fix Tx flow scheduling to avoid Tx timeouts
- bnxt_en: fix memory corruption during ifdown
- ice: fix incorrect counter for buffer allocation failures
- mlx5: fix lockdep assertion on sync reset unload event
- fbnic: fixup rtnl_lock and devl_lock handling
- xgmac: do not enable RX FIFO overflow interrupts
- phy: mscc: fix when PTP clock is register and unregister
Misc:
- add Telit Cinterion LE910C4-WWX new compositions"
* tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
net: ipv4: fix regression in local-broadcast routes
net: macb: Disable clocks once
fbnic: Move phylink resume out of service_task and into open/close
fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
net: rose: fix a typo in rose_clear_routes()
l2tp: do not use sock_hold() in pppol2tp_session_get_sock()
sctp: initialize more fields in sctp_v6_from_sk()
MAINTAINERS: rmnet: Update email addresses
net: rose: include node references in rose_neigh refcount
net: rose: convert 'use' field to refcount_t
net: rose: split remove and free operations in rose_remove_neigh()
net: hv_netvsc: fix loss of early receive events from host during channel open.
net: stmmac: Set CIC bit only for TX queues with COE
net: stmmac: xgmac: Correct supported speed modes
net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts
net/mlx5e: Set local Xoff after FW update
net/mlx5e: Update and set Xon/Xoff upon port speed set
net/mlx5e: Update and set Xon/Xoff upon MTU set
net/mlx5: Prevent flow steering mode changes in switchdev mode
net/mlx5: Nack sync reset when SFs are present
...
1. Add error handling for old state CRTC in atomic_disable
2. Fix DSI host and panel bridge pre-enable order
3. Fix device/node reference count leaks in mtk_drm_get_all_drm_priv
4. mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls
Linus Torvalds [Thu, 28 Aug 2025 23:34:32 +0000 (16:34 -0700)]
Merge tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Add missing locking annotations to two recently introduced
list_for_each_entry_rcu() loops in the core device suspend/resume
code (Johannes Berg)"
* tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: annotate RCU list iterations
Louis-Alexis Eyraud [Mon, 18 Aug 2025 14:17:52 +0000 (16:17 +0200)]
drm/mediatek: mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls
In mtk_hdmi driver, a recent change replaced custom register access
function calls by regmap ones, but two replacements by regmap_update_bits
were done incorrectly, because original offset and mask parameters were
inverted, so fix them.
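Why the argument order matters can be shown with a toy model of the read-modify-write that regmap_update_bits(map, reg, mask, val) performs. This is a userspace sketch: struct toy_regmap and toy_update_bits() are hypothetical stand-ins, and reg is a plain array index here rather than a byte offset.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 4

/* Toy register file standing in for a regmap. */
struct toy_regmap {
	uint32_t regs[NREGS];
};

/* Same argument order as regmap_update_bits(): reg, then mask, then val. */
void toy_update_bits(struct toy_regmap *map, unsigned int reg,
		     uint32_t mask, uint32_t val)
{
	assert(reg < NREGS);
	/* Only the bits selected by mask are replaced by val. */
	map->regs[reg] = (map->regs[reg] & ~mask) | (val & mask);
}
```

Passing the mask where the register is expected would address a different register and use the offset as a bit mask, which is the kind of mix-up the fix above corrects.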
Linus Torvalds [Thu, 28 Aug 2025 23:04:14 +0000 (16:04 -0700)]
Merge tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
- another small fix for arm64 systems with memory encryption (Shanker
Donthineni)
- fix for arm32 systems with non-standard CMA configuration (Oreoluwa
Babatunde)
* tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted
of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()
Linus Torvalds [Thu, 28 Aug 2025 22:46:06 +0000 (15:46 -0700)]
Merge tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
- printk cleanups in memblock and numa_memblks
- update kernel-doc for MEMBLOCK_RSRV_NOINIT to be more accurate and
detailed
* tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: fix kernel-doc for MEMBLOCK_RSRV_NOINIT
mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion
mm/numa_memblks: Use pr_debug instead of printk(KERN_DEBUG)
Dave Airlie [Thu, 28 Aug 2025 22:44:10 +0000 (08:44 +1000)]
Merge tag 'drm-misc-fixes-2025-08-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Several nouveau fixes to remove unused code, fix an error path and be
less restrictive with the formats it accepts. A fix for amdgpu to pin
vmapped dma-buf, and a revert for tegra for a regression in the dma-buf
/ GEM code.
Linus Torvalds [Thu, 28 Aug 2025 22:39:06 +0000 (15:39 -0700)]
Merge tag 'powerpc-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Merge two CONFIG_POWERPC64_CPU entries in Kconfig.cputype
- Replace extra-y to always-y in Makefile
- Cleanup to use dev_fwnode helper
- Fix misleading comment in kvmppc_prepare_to_enter()
- misc cleanup and fixes
Thanks to Amit Machhiwal, Andrew Donnellan, Christophe Leroy, Gautam
Menghani, Jiri Slaby (SUSE), Masahiro Yamada, Shrikanth Hegde, Stephen
Rothwell, Venkat Rao Bagalkote, and Xichao Zhao
* tag 'powerpc-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/boot/install.sh: Fix shellcheck warnings
powerpc/prom_init: Fix shellcheck warnings
powerpc/kvm: Fix ifdef to remove build warning
powerpc: unify two CONFIG_POWERPC64_CPU entries in the same choice block
powerpc: use always-y instead of extra-y in Makefiles
powerpc/64: Drop unnecessary 'rc' variable
powerpc: Use dev_fwnode()
KVM: PPC: Fix misleading interrupts comment in kvmppc_prepare_to_enter()
Marc Zyngier [Sat, 9 Aug 2025 14:48:10 +0000 (15:48 +0100)]
KVM: arm64: nv: Fix ATS12 handling of single-stage translation
Volodymyr reports that using a Xen DomU as a nested guest (where
HCR_EL2.E2H == 0), ATS12 results in a translation that stops at
the L2's S1, which isn't something you'd normally expect.
Comparing the code against the spec proves to be illuminating,
and suggests that the author of such code must have been tired,
cross-eyed, drunk, or maybe all of the above.
The gist of it is that, apart from HCR_EL2.VM or HCR_EL2.DC being
0, only the use of the EL2&0 translation regime limits the walk
to S1 only, and that we must finish the S2 walk in any other case.
Which solves the above issue, as E2H==0 indicates that ATS12 walks
the EL1&0 translation regime.
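The rule stated above can be written as a small predicate. This sketch reads "HCR_EL2.VM or HCR_EL2.DC being 0" as both bits being clear, which is an assumption; the enum and the function name ats12_needs_s2_walk() are hypothetical and not taken from the KVM sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_VM (UINT64_C(1) << 0)	/* HCR_EL2.VM: stage-2 translation enable */
#define HCR_DC (UINT64_C(1) << 12)	/* HCR_EL2.DC: default cacheability */

enum trans_regime { TR_EL10, TR_EL20 };

/* Hypothetical predicate: must an ATS12 walk continue into stage 2? */
bool ats12_needs_s2_walk(enum trans_regime regime, uint64_t hcr_el2)
{
	assert(regime == TR_EL10 || regime == TR_EL20);

	/* EL2&0 has a single stage of translation. */
	if (regime == TR_EL20)
		return false;
	/* Assumed reading: with VM and DC both clear there is no stage 2. */
	if (!(hcr_el2 & (HCR_VM | HCR_DC)))
		return false;
	return true;
}
```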
Ajye Huang [Tue, 26 Aug 2025 15:40:40 +0000 (23:40 +0800)]
ASoC: SOF: Intel: WCL: Add the sdw_process_wakeen op
Add the missing op in the device description to avoid issues with jack
detection.
Fixes: 6b04629ae97a ("ASoC: SOF: Intel: add initial support for WCL") Acked-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Ajye Huang <ajye_huang@compal.corp-partner.google.com>
Message-ID: <20250826154040.2723998-1-ajye_huang@compal.corp-partner.google.com> Signed-off-by: Mark Brown <broonie@kernel.org>
Volodymyr reports (again!) that under some circumstances (E2H==0,
walking S1 PTs), PAR_EL1 doesn't report the value of the latest
walk in the CPU register, but that instead the value is written to
the backing store.
Further investigation indicates that the root cause of this is
that a group of registers (PAR_EL1, TPIDR*_EL{0,1}, the *32_EL2 dregs)
should always be considered as "on CPU", as they are not remapped
between EL1 and EL2.
We fail to treat them accordingly, and end up considering that
the register (PAR_EL1 in this example) should be written to memory
instead of in the register.
While it would be possible to quickly work around it, it is obvious
that the way we track these things at the moment is pretty horrible,
and could do with some improvement.
Revamp the whole thing by:
- defining a location for a register (memory, cpu), potentially
depending on the state of the vcpu
- defining a transformation for this register (mapped register, potential
translation, special register needing some particular attention)
- conveying this information in a structure that can be easily passed
around
As a result, the accessors themselves become much simpler, as the
state is explicit instead of being driven by hard-to-understand
conventions.
We get rid of the "pure EL2 register" notion, which wasn't very
useful, and add sanitisation of the values by applying the RESx
masks as required, something that was missing so far.
And of course, we add the missing registers to the list, with the
indication that they are always loaded.
Marc Zyngier [Sun, 17 Aug 2025 12:19:24 +0000 (13:19 +0100)]
KVM: arm64: Simplify sysreg access on exception delivery
Distinguishing between NV and VHE is slightly pointless, and only
serves as an extra complication, or a way to introduce bugs, such
as the way SPSR_EL1 gets written without checking for the state
being resident.
Get rid of this silly distinction, and fix the bug in one go.
Marc Zyngier [Sun, 17 Aug 2025 12:19:23 +0000 (13:19 +0100)]
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit state
Just like c6e35dff58d3 ("KVM: arm64: Check for SYSREGS_ON_CPU before
accessing the CPU state") fixed the 64bit state access, add a check
for the 32bit state actually being on the CPU before writing it.
Takashi Iwai [Thu, 28 Aug 2025 14:11:00 +0000 (16:11 +0200)]
ALSA: hda: Avoid binding with SOF for SKL/KBL platforms
For Intel SKL and KBL platforms, it may be bound with one of three
HD-audio drivers (AVS, SOF and legacy). AVS is the preferred one when
DMIC is detected, and that's how it's defined in the snd-intel-dspcfg
config table.
But, when AVS driver is disabled (CONFIG_SND_SOC_INTEL_AVS=n), the
device may be bound freely with either SOF or legacy driver.
Before 6.17, the legacy driver took it first, but on 6.17, likely
due to the recent code shuffling, the SOF driver seems to take it first
and fails to probe. To avoid the regression, we should force these
platforms to bind with the legacy HD-audio driver when AVS is disabled.
This patch adds the extra two entries in intel-dspcfg table that are
applied only when CONFIG_SND_SOC_INTEL_AVS=n, for binding with the
legacy driver.
Note that there are entries for APL in that config table block, but
APL may be supported by SOF for certain setups, so the choice can't be
exclusive. Hence this patch includes only SKL and KBL.
Ming Lei [Wed, 27 Aug 2025 12:16:00 +0000 (20:16 +0800)]
ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device
Add a new command line option --no_ublk_fixed_fd that excludes the ublk
control device (/dev/ublkcN) from io_uring's registered files array.
When this option is used, only backing files are registered starting
from index 1, while the ublk control device is accessed using its raw
file descriptor.
Add ublk_get_registered_fd() helper function that returns the appropriate
file descriptor for use with io_uring operations.
Key optimizations implemented:
- Cache UBLKS_Q_NO_UBLK_FIXED_FD flag in ublk_queue.flags to avoid
reading dev->no_ublk_fixed_fd in fast path
- Cache ublk char device fd in ublk_queue.ublk_fd for fast access
- Update ublk_get_registered_fd() to use ublk_queue * parameter
- Update io_uring_prep_buf_register/unregister() to use ublk_queue *
- Replace ublk_device * access with ublk_queue * access in fast paths
Also pass --no_ublk_fixed_fd to test_stress_04.sh for covering
plain ublk char device mode.
Ming Lei [Wed, 27 Aug 2025 12:15:59 +0000 (20:15 +0800)]
ublk: avoid ublk_io_release() called after ublk char dev is closed
When running test_stress_04.sh, the following warning is triggered:
WARNING: CPU: 1 PID: 135 at drivers/block/ublk_drv.c:1933 ublk_ch_release+0x423/0x4b0 [ublk_drv]
This happens when the daemon is abruptly killed:
- some references may still be held, because registering IO buffer
doesn't grab ublk char device reference
OR
- io->task_registered_buffers won't be cleared because io buffer is
released from non-daemon context
For zero-copy and auto buffer register modes, I/O reference crosses
syscalls, so IO reference may not be dropped naturally when ublk server is
killed abruptly. However, when the io_uring context is released, the reference
is guaranteed to be dropped eventually; see io_sqe_buffers_unregister() from
io_ring_ctx_free().
Fix this by adding ublk_drain_io_references() that:
- Waits asynchronously for active I/O references to be dropped by
scheduling a work function, which avoids a release dependency between
the ublk device and the io_uring file
- Reinitializes io->ref and io->task_registered_buffers to clean state
This ensures the reference count state is clean when ublk_queue_reinit()
is called, preventing the warning and potential use-after-free.
Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec") Fixes: 1ceeedb59749 ("ublk: optimize UBLK_IO_UNREGISTER_IO_BUF on daemon task") Fixes: 8a8fe42d765b ("ublk: optimize UBLK_IO_REGISTER_IO_BUF on daemon task") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250827121602.2619736-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 27 Aug 2025 21:27:30 +0000 (15:27 -0600)]
io_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengths
Since the buffers are mapped from userspace, it is prudent to use
READ_ONCE() to read the value into a local variable, and use that for
any other actions taken. Having a stable read of the buffer length
avoids worrying about it changing after checking, or being read multiple
times.
Similarly, the buffer may well change in between it being picked and
being committed. Ensure the looping for incremental ring buffer commit
stops if it hits a zero sized buffer, as no further progress can be made
at that point.
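Both points can be sketched in userspace C. This is an illustration under stated assumptions, not the io_uring code: READ_ONCE_U32(), struct ring_buf and inc_commit() are hypothetical names, and a volatile access stands in for the kernel's READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for READ_ONCE() on a 32-bit field. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

struct ring_buf {
	uint64_t addr;
	uint32_t len;	/* shared with userspace, may change at any time */
};

/*
 * Consume len bytes from the ring starting at *head.
 * Returns true if every touched buffer was fully consumed.
 */
bool inc_commit(struct ring_buf *ring, uint32_t mask, uint32_t *head,
		uint32_t len)
{
	assert((mask & (mask + 1)) == 0);	/* ring size is a power of two */

	while (len) {
		struct ring_buf *buf = &ring[*head & mask];
		uint32_t buf_len = READ_ONCE_U32(buf->len);	/* one stable read */
		uint32_t this_len = len < buf_len ? len : buf_len;

		buf_len -= this_len;
		/* Partially consumed, or a zero-sized buffer: stop here. */
		if (buf_len || !this_len) {
			buf->addr += this_len;
			buf->len = buf_len;
			return false;
		}
		buf->len = 0;
		(*head)++;
		len -= this_len;
	}
	return true;
}
```

Without the !this_len check, a zero-length buffer would leave len unchanged on every iteration while head keeps advancing, so the loop would never terminate.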
Kuninori Morimoto [Tue, 26 Aug 2025 06:30:01 +0000 (06:30 +0000)]
ASoC: rsnd: tidyup direction name on rsnd_dai_connect()
Commit 2c6b6a3e8b93 ("ASoC: rsnd: use snd_pcm_direction_name()") uses
snd_pcm_direction_name() instead of the original method to get the string
"Playback" or "Capture". But io->substream might be NULL at this point.
Let's reuse the original method.
Fixes: 2c6b6a3e8b93 ("ASoC: rsnd: use snd_pcm_direction_name()") Reported-by: Thuan Nguyen <thuan.nguyen-hong@banvien.com.vn> Tested-by: Thuan Nguyen <thuan.nguyen-hong@banvien.com.vn> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Message-ID: <87zfbmwq6v.wl-kuninori.morimoto.gx@renesas.com> Signed-off-by: Mark Brown <broonie@kernel.org>
Oscar Maes [Wed, 27 Aug 2025 06:23:21 +0000 (08:23 +0200)]
net: ipv4: fix regression in local-broadcast routes
Commit 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
introduced a regression where local-broadcast packets would have their
gateway set in __mkroute_output, which was caused by fi = NULL being
removed.
Fix this by resetting the fib_info for local-broadcast packets. This
preserves the intended changes for directed-broadcast packets.
Cc: stable@vger.kernel.org Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes") Reported-by: Brett A C Sheffield <bacs@librecast.net> Closes: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net Signed-off-by: Oscar Maes <oscmaes92@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250827062322.4807-1-oscmaes92@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Neil Mandir [Tue, 26 Aug 2025 14:30:22 +0000 (10:30 -0400)]
net: macb: Disable clocks once
When the driver is removed the clocks are disabled twice: once in
macb_remove and a second time by runtime pm. Disable wakeup in remove so
all the clocks are disabled and skip the second call to macb_clks_disable.
Always suspend the device as we always set it active in probe.
Fixes: d54f89af6cc4 ("net: macb: Add pm runtime support") Signed-off-by: Neil Mandir <neil.mandir@seco.com> Co-developed-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250826143022.935521-1-sean.anderson@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If dentry->d_name.len < EFI_VARIABLE_GUID_LEN, 'guid' can become
negative, leading to an out-of-bounds (oob) access. The issue can be
triggered by parallel lookups using an invalid filename:
T1                              T2
lookup_open
 ->lookup
  simple_lookup
   d_add
   // invalid dentry is added to hash list

                                lookup_open
                                 d_alloc_parallel
                                  __d_lookup_rcu
                                   __d_lookup_rcu_op_compare
                                    hlist_bl_for_each_entry_rcu
                                    // invalid dentry can be retrieved
                                     ->d_compare
                                      efivarfs_d_compare
                                      // oob
Fix it by checking 'guid' before cmp.
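The guard can be sketched as a standalone comparison function. This is a userspace model, not the efivarfs code: efivar_name_cmp() is a hypothetical name, the GUID length of 36 is the textual GUID length, and rejecting guid <= 0 (rather than only negative values) is a choice made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define EFI_VARIABLE_GUID_LEN 36	/* length of a textual GUID */

/* Returns 0 on match, 1 otherwise (d_compare convention). */
int efivar_name_cmp(const char *dname, size_t dlen,
		    const char *name, size_t len)
{
	/* Length of the variable-name part that precedes the GUID. */
	long guid = (long)dlen - EFI_VARIABLE_GUID_LEN;

	assert(dname && name);

	/* Too short to hold a GUID: reject before using 'guid' as a length. */
	if (guid <= 0)
		return 1;
	if (dlen != len)
		return 1;
	/* The name part is compared case-sensitively... */
	if (memcmp(dname, name, (size_t)guid))
		return 1;
	/* ...while the GUID part is case-insensitive. */
	for (size_t i = (size_t)guid; i < len; i++)
		if (tolower((unsigned char)dname[i]) !=
		    tolower((unsigned char)name[i]))
			return 1;
	return 0;
}
```

Without the early return, a negative guid converted to size_t would become a huge length for memcmp(), which is the out-of-bounds read described above.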
Fixes: da27a24383b2 ("efivarfs: guid part of filenames are case-insensitive") Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Harry Yoo [Mon, 18 Aug 2025 02:02:06 +0000 (11:02 +0900)]
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure
page tables are properly synchronized when calling p*d_populate_kernel().
For 5-level paging, synchronization is performed via
pgd_populate_kernel(). In 4-level paging, pgd_populate() is a no-op, so
synchronization is instead performed at the P4D level via
p4d_populate_kernel().
This fixes intermittent boot failures on systems using 4-level paging and
a large amount of persistent memory:
Harry Yoo [Mon, 18 Aug 2025 02:02:05 +0000 (11:02 +0900)]
mm: introduce and use {pgd,p4d}_populate_kernel()
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space. These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner. For
example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").
However, this approach has proven fragile for the following reasons:
1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
sub-pmd ranges") caused crashes by accessing the vmemmap area
before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables. These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.
They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().
This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
In theory, PUD and PMD level helpers can be added later if needed by other
architectures. For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.
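The idea behind the _kernel() helpers can be modelled in a few lines of userspace C. This is a toy under stated assumptions: the tables are plain arrays, the bit value of PGTBL_PGD_MODIFIED is arbitrary, and arch_sync_kernel_mappings() takes an index here instead of the address range the kernel hook uses.

```c
#include <assert.h>
#include <stdint.h>

#define PGTBL_PGD_MODIFIED	(1u << 0)	/* bit value chosen for this sketch */
#define NR_TASKS		3
#define NR_PGD			8

/* Define as 0 to model an architecture that needs no top-level sync. */
#ifndef ARCH_PAGE_TABLE_SYNC_MASK
#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PGD_MODIFIED
#endif

uint64_t init_pgd[NR_PGD];		/* reference table (init_mm) */
uint64_t task_pgd[NR_TASKS][NR_PGD];	/* per-task top-level tables */

/* Toy arch hook: copy one reference entry into every task's table. */
void arch_sync_kernel_mappings(unsigned int idx)
{
	for (int t = 0; t < NR_TASKS; t++)
		task_pgd[t][idx] = init_pgd[idx];
}

/* Populate the reference entry, then sync if the arch asked for it. */
void pgd_populate_kernel(unsigned int idx, uint64_t entry)
{
	assert(idx < NR_PGD);
	init_pgd[idx] = entry;
	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
		arch_sync_kernel_mappings(idx);
}
```

Because the sync is tied to the populate call, a caller in common code cannot forget it, and with the mask left at 0 the compiler can drop the branch entirely.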
[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()] Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: bibo mao <maobibo@loongson.cn> Cc: Borislav Betkov <bp@alien8.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vincenzo Frascino 
<vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Harry Yoo [Mon, 18 Aug 2025 02:02:04 +0000 (11:02 +0900)]
mm: move page table sync declarations to linux/pgtable.h
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
    if (vmemmap_can_optimize(altmap, pgmap))
        // does not sync top level page tables
        r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
    else
        // sync top level page tables in x86
        r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that the current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync code more robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating the kernel portion of the page
tables and allowing each architecture to explicitly perform synchronization
when installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses the existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
    static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                           p4d_t *p4d)
    {
        pgd_populate(&init_mm, pgd, p4d);
        if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
            arch_sync_kernel_mappings(addr, addr);
    }
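The effect of pairing the populate step with the sync step can be sketched in userspace C. This is only a toy model, not kernel code: the `toy_*` names, the fixed table sizes, and the `TOY_SYNC_ON_PGD_MODIFIED` switch (standing in for `ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED`) are all assumptions made for illustration.

```c
#include <stddef.h>

#define TOY_NR_PGD_ENTRIES 4
#define TOY_NR_TASKS 3
/* Stand-in for (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED). */
#define TOY_SYNC_ON_PGD_MODIFIED 1

struct toy_pgd {
    void *entry[TOY_NR_PGD_ENTRIES];
};

struct toy_pgd toy_init_pgd;                /* plays the role of init_mm.pgd */
struct toy_pgd toy_task_pgd[TOY_NR_TASKS];  /* top-level tables of other tasks */

/* Copy one top-level entry from the reference table into every task table. */
void toy_sync_kernel_mappings(size_t idx)
{
    for (size_t t = 0; t < TOY_NR_TASKS; t++)
        toy_task_pgd[t].entry[idx] = toy_init_pgd.entry[idx];
}

/* Install an entry in the reference table only; no task sees it yet. */
void toy_pgd_populate(size_t idx, void *p4d)
{
    toy_init_pgd.entry[idx] = p4d;
}

/* Install an entry and, if the "architecture" asks for it, sync right away. */
void toy_pgd_populate_kernel(size_t idx, void *p4d)
{
    toy_pgd_populate(idx, p4d);
    if (TOY_SYNC_ON_PGD_MODIFIED)
        toy_sync_kernel_mappings(idx);
}
```

In this model, a caller that uses `toy_pgd_populate()` leaves the task tables stale, which mirrors the optimized vmemmap path described above, while `toy_pgd_populate_kernel()` cannot forget the sync because it is part of the helper.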
It is worth noting that vmalloc() and apply_to_range() carefully
synchronize page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by this patch
series.
This series was hugely inspired by Dave Hansen's suggestion and hence
added Suggested-by: Dave Hansen.
Cc stable because lack of this series opens the door to intermittent
boot failures.
This patch (of 3):
Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
linux/pgtable.h so that they can be used outside of vmalloc and ioremap.
Link: https://lkml.kernel.org/r/20250818020206.4517-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-2-harry.yoo@oracle.com Link: https://lore.kernel.org/linux-mm/20250220064105.808339-1-gwan-gyeong.mun@intel.com Link: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@intel.com Link: https://lore.kernel.org/linux-mm/d1da214c-53d3-45ac-a8b6-51821c5416e4@intel.com Link: https://lore.kernel.org/linux-mm/4d800744-7b88-41aa-9979-b245e8bf794b@intel.com Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: bibo mao <maobibo@loongson.cn> Cc: Borislav Betkov <bp@alien8.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng 
<zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
wangzijie [Mon, 18 Aug 2025 12:31:02 +0000 (20:31 +0800)]
proc: fix missing pde_set_flags() for net proc files
To avoid potential UAF issues during module removal races, we use
pde_set_flags() to save proc_ops flags in PDE itself before
proc_register(), and then use pde_has_proc_*() helpers instead of directly
dereferencing pde->proc_ops->*.
However, the pde_set_flags() call was missing when creating net related
proc files. This omission caused incorrect behavior in which FMODE_LSEEK
was cleared inappropriately in proc_reg_open() for net proc files. Lars
reported it in this link[1].
Fix this by ensuring pde_set_flags() is called when registering a proc
entry, and add a NULL check for proc_ops in pde_set_flags().
[wangzijie1@honor.com: stash pde->proc_ops in a local const variable, per Christian] Link: https://lkml.kernel.org/r/20250821105806.1453833-1-wangzijie1@honor.com Link: https://lkml.kernel.org/r/20250818123102.959595-1-wangzijie1@honor.com Link: https://lore.kernel.org/all/20250815195616.64497967@chagall.paradoxon.rec/ Fixes: ff7ec8dc1b64 ("proc: use the same treatment to check proc_lseek as ones for proc_read_iter et.al") Signed-off-by: wangzijie <wangzijie1@honor.com> Reported-by: Lars Wendler <polynomial-c@gmx.de> Tested-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Petr Vaněk <pv@excello.cz>
Tested-by: Lars Wendler <polynomial-c@gmx.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: wangzijie <wangzijie1@honor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sumanth Korikkar [Thu, 7 Aug 2025 18:35:45 +0000 (20:35 +0200)]
mm: fix accounting of memmap pages
For !CONFIG_SPARSEMEM_VMEMMAP, memmap page accounting is currently done
upfront in sparse_buffer_init(). However, sparse_buffer_alloc() may
return NULL in a failure scenario.
Also, memmap pages may be allocated either from the memblock allocator
during early boot or from the buddy allocator. When removed via
arch_remove_memory(), accounting of memmap pages must reflect the original
allocation source.
To ensure correctness:
* Account memmap pages after successful allocation in sparse_init_nid()
and section_activate().
* Account memmap pages in section_deactivate() based on allocation
source.
Quanmin Yan [Thu, 21 Aug 2025 12:55:55 +0000 (20:55 +0800)]
mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
On 32-bit systems, the throughput calculation in
damos_set_effective_quota() is prone to unnecessary multiplication
overflow. Use mult_frac() to fix it.
Andrew Paniakin also recently found and privately reported this issue on
64-bit systems. There, the overflow happens once the charged size exceeds
~17 TiB. On systems running for a long time in production, this issue can
actually happen.
More specifically, when a DAMOS scheme with a time quota runs for a long
time, the throughput calculation can overflow and set esz too small. As a
result, the scheme becomes unexpectedly slow.
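The overflow and the mult_frac() style of avoiding it can be illustrated with a small userspace sketch. The function names, the 64-bit types, and the scaling constant 1000000 are assumptions made for illustration. `toy_mult_frac()` follows the usual split of x into quotient and remainder by d, which is the idea behind the kernel's mult_frac() macro, but it is not the kernel macro itself.

```c
#include <stdint.h>

/* Naive form: the multiplication can wrap before the division happens. */
uint64_t toy_throughput_naive(uint64_t charged_sz, uint64_t charged_ns)
{
    return charged_sz * 1000000ULL / charged_ns;
}

/*
 * Compute x * n / d as (x / d) * n + ((x % d) * n) / d.
 * The intermediate products stay small as long as the final result
 * and (d * n) both fit in 64 bits.
 */
uint64_t toy_mult_frac(uint64_t x, uint64_t n, uint64_t d)
{
    uint64_t q = x / d;
    uint64_t r = x % d;

    return q * n + r * n / d;
}

/* Same quantity as the naive form, without the early overflow. */
uint64_t toy_throughput_mult_frac(uint64_t charged_sz, uint64_t charged_ns)
{
    return toy_mult_frac(charged_sz, 1000000ULL, charged_ns);
}
```

With a charged size of 20 TiB and 10^12 charged nanoseconds, the naive product exceeds 2^64 and wraps, so the two functions disagree. With small inputs they return the same value.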
Link: https://lkml.kernel.org/r/20250821125555.3020951-1-yanquanmin1@huawei.com Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota") Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reported-by: Andrew Paniakin <apanyaki@amazon.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: ze zuo <zuoze1@huawei.com> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Brian Mak [Tue, 5 Aug 2025 21:15:26 +0000 (14:15 -0700)]
kexec: add KEXEC_FILE_NO_CMA as a legal flag
Commit 07d24902977e ("kexec: enable CMA based contiguous allocation")
introduces logic to use CMA-based allocation in kexec by default. As part
of the changes, it introduces a kexec_file_load flag to disable the use of
CMA allocations from userspace. However, this flag is broken since it is
missing from the list of legal flags for kexec_file_load. kexec_file_load
returns EINVAL when attempting to use the flag.
Fix this by adding the KEXEC_FILE_NO_CMA flag to the list of legal flags
for kexec_file_load.
Without this fix, the kexec_file_load syscall fails and returns
'-EINVAL' when KEXEC_FILE_NO_CMA is specified.
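The failure mode is the usual "reject any bit outside the legal mask" check. A minimal sketch follows, assuming flag values that mirror the uapi header. The values and the `TOY_LEGAL_FLAGS_*` and `toy_check_flags()` names are illustrative assumptions, not copied from the patch.

```c
#include <errno.h>

/* Assumed flag values, for illustration only. */
#define KEXEC_FILE_UNLOAD        0x00000001UL
#define KEXEC_FILE_ON_CRASH      0x00000002UL
#define KEXEC_FILE_NO_INITRAMFS  0x00000004UL
#define KEXEC_FILE_DEBUG         0x00000008UL
#define KEXEC_FILE_NO_CMA        0x00000010UL

/* Legal mask before the fix: the new flag was defined but never listed. */
#define TOY_LEGAL_FLAGS_BEFORE \
    (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
     KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_DEBUG)

/* Legal mask after the fix. */
#define TOY_LEGAL_FLAGS_AFTER (TOY_LEGAL_FLAGS_BEFORE | KEXEC_FILE_NO_CMA)

/* Return 0 if every requested bit is legal, -EINVAL otherwise. */
int toy_check_flags(unsigned long flags, unsigned long legal)
{
    if (flags != (flags & legal))
        return -EINVAL;
    return 0;
}
```

Against the "before" mask, a request carrying only KEXEC_FILE_NO_CMA is rejected. Against the "after" mask it passes, while unknown bits are still refused.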
Link: https://lkml.kernel.org/r/20250805211527.122367-2-makb@juniper.net Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Brian Mak <makb@juniper.net> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ada Couprie Diaz [Thu, 21 Aug 2025 12:07:35 +0000 (13:07 +0100)]
kasan: fix GCC mem-intrinsic prefix with sw tags
GCC doesn't support "hwasan-kernel-mem-intrinsic-prefix", only
"asan-kernel-mem-intrinsic-prefix"[0], while LLVM supports both. This is
already taken into account when checking
"CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX", but not in the KASAN Makefile
adding those parameters when "CONFIG_KASAN_SW_TAGS" is enabled.
Replace the version check with "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX",
which already validates that the mem-intrinsic prefix parameter can be used,
and choose the correct name depending on the compiler.
GCC 13 and above trigger "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX" which
prevents `mem{cpy,move,set}()` being redefined in "mm/kasan/shadow.c"
since commit 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in
uninstrumented files"), as we expect the compiler to prefix those calls
with `__(hw)asan_` instead. But as the option passed to GCC has been
incorrect, the compiler has not been emitting those prefixes, effectively
never calling the instrumented versions of `mem{cpy,move,set}()` with
"CONFIG_KASAN_SW_TAGS" enabled.
If "CONFIG_FORTIFY_SOURCE" is enabled, this issue would be mitigated as
it redefines `mem{cpy,move,set}()` and properly aliases the
`__underlying_mem*()` that will be called to the instrumented versions.
Link: https://lkml.kernel.org/r/20250821120735.156244-1-ada.coupriediaz@arm.com Link: https://gcc.gnu.org/onlinedocs/gcc-13.4.0/gcc/Optimize-Options.html Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Fixes: 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files") Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Gordeev [Mon, 18 Aug 2025 16:39:13 +0000 (18:39 +0200)]
mm/kasan: avoid lazy MMU mode hazards
Functions __kasan_populate_vmalloc() and __kasan_depopulate_vmalloc() use
apply_to_pte_range(), which enters lazy MMU mode. In that mode updating
PTEs may not be observed until the mode is left.
That may lead to a situation in which otherwise correct reads and writes
to a PTE using ptep_get(), set_pte(), pte_clear() and other access
primitives bring wrong results when the vmalloc shadow memory is being
(de-)populated.
To avoid these hazards, leave the lazy MMU mode before each PTE
manipulation and re-enter it afterwards.
Link: https://lkml.kernel.org/r/0d2efb7ddddbff6b288fbffeeb10166e90771718.1755528662.git.agordeev@linux.ibm.com Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
While working on the lazy MMU mode enablement for s390 I hit pretty
curious issues in the kasan code.
The first is related to a custom kasan-based sanitizer aimed at catching
invalid accesses to PTEs and is inspired by the conversation in [1]. KASAN
complains about valid PTE accesses, while the shadow memory is reported as
unpoisoned:
This series fixes the above issues and is a pre-requisite for the s390
lazy MMU mode implementation.
test_vmalloc was used to stress-test the fixes.
This patch (of 2):
When vmalloc shadow memory is established the modification of the
corresponding page tables is not protected by any locks. Instead, the
locking is done per-PTE. This scheme however has defects.
kasan_populate_vmalloc_pte() - while the ptep_get() read is atomic, the
sequence pte_none(ptep_get()) is not. Doing that outside of the lock
might lead to a concurrent PTE update and what could be seen as a shadow
memory corruption as a result.
kasan_depopulate_vmalloc_pte() - the page address is extracted from a
ptep_get() read and cached in a local variable outside of the lock, so by
the time the page is about to be freed it could actually have been freed
already.
To avoid these, put ptep_get() itself and the code that manipulates the
result of the read under the lock. In addition, move freeing of the page
out of the atomic context.
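The pattern described here (read and act on the entry under one lock, free outside it) can be sketched in userspace C with a pthread mutex standing in for the page table lock. The `toy_slot` type and both helpers are hypothetical and only model the locking discipline, not the KASAN code.

```c
#include <pthread.h>
#include <stddef.h>

/* One "PTE slot" guarded by a lock; page == NULL plays the role of pte_none(). */
struct toy_slot {
    pthread_mutex_t lock;
    void *page;
};

/*
 * Check-and-install as one critical section.
 * Returns 1 if the page was installed, 0 if the slot was already populated.
 * In the 0 case the caller still owns the page and releases it after return,
 * that is, outside the lock.
 */
int toy_slot_populate(struct toy_slot *s, void *page)
{
    int installed = 0;

    pthread_mutex_lock(&s->lock);
    if (s->page == NULL) {
        s->page = page;
        installed = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return installed;
}

/*
 * Read and clear as one critical section, so two callers can never both
 * obtain the same page. The returned page is freed by the caller, outside
 * the lock.
 */
void *toy_slot_depopulate(struct toy_slot *s)
{
    void *page;

    pthread_mutex_lock(&s->lock);
    page = s->page;
    s->page = NULL;
    pthread_mutex_unlock(&s->lock);
    return page;
}
```

Because the read and the update sit in the same critical section, a second depopulate call on the same slot gets NULL rather than a stale pointer to a page that has already been handed back.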
Zi Yan [Tue, 5 Aug 2025 17:51:40 +0000 (13:51 -0400)]
selftests/mm: fix FORCE_READ to read input value correctly
FORCE_READ() converts input value x to its pointer type then reads from
address x. This is wrong. If x is a non-pointer, it would be caught
easily. But all FORCE_READ() callers are trying to read from a pointer,
so FORCE_READ() basically reads a pointer to a pointer instead of the
original typed pointer. Almost no access violation was found, except the
one from split_huge_page_test.
Fix it by implementing a simplified READ_ONCE() instead.
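A simplified READ_ONCE()-style macro takes the address of its argument and reads through a volatile-qualified pointer of the argument's own type. The definition below is a sketch under that assumption; it uses the GNU `__typeof__` extension, and the two helper functions are hypothetical callers added for illustration.

```c
/*
 * Read x exactly once through a volatile pointer of x's own type.
 * The argument is the object to read (for example *ptr), not its address.
 */
#define FORCE_READ(x) (*(const volatile __typeof__(x) *)&(x))

/* Touch the first byte of a buffer so the access cannot be optimized away. */
char toy_read_first_byte(const char *buf)
{
    return FORCE_READ(*buf);
}

/* Same idea for an int-sized object. */
int toy_read_int(const int *p)
{
    return FORCE_READ(*p);
}
```

With this shape, a caller holding a `char *` passes `*ptr`, and the read has the width of `char`. A macro that casts the pointer value itself would instead read a pointer-sized object at that address.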
Link: https://lkml.kernel.org/r/20250805175140.241656-1-ziy@nvidia.com Fixes: 3f6bfd4789a0 ("selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));"") Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403b8 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.
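The LIFO requirement can be modelled with a small stack in userspace C. The `toy_kmap_*` names and the fixed depth are assumptions made for illustration; the real kmap_local implementation differs, but the ordering rule it enforces is the same.

```c
#include <stddef.h>

#define TOY_KMAP_DEPTH 8

/* Per-context stack of active local mappings. */
struct toy_kmap_stack {
    const void *slot[TOY_KMAP_DEPTH];
    size_t top;
};

/* Push a mapping; returns 0 on success, -1 if the stack is full. */
int toy_kmap_local(struct toy_kmap_stack *s, const void *page)
{
    if (s->top == TOY_KMAP_DEPTH)
        return -1;
    s->slot[s->top++] = page;
    return 0;
}

/*
 * Pop a mapping; only the most recently mapped page may be unmapped.
 * Returns 0 on success, -1 on an out-of-order (non-LIFO) unmap.
 */
int toy_kunmap_local(struct toy_kmap_stack *s, const void *page)
{
    if (s->top == 0 || s->slot[s->top - 1] != page)
        return -1;
    s->top--;
    return 0;
}
```

Mapping dst and then src, and unmapping dst first, fails in this model. Unmapping src and then dst succeeds, which is the order the fix adopts.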
Link: https://lkml.kernel.org/r/20250731144431.773923-1-sashal@kernel.org Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Sasha Levin <sashal@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Edward Adam Davis [Tue, 19 Aug 2025 13:41:02 +0000 (21:41 +0800)]
ocfs2: prevent release journal inode after journal shutdown
Before calling ocfs2_delete_osb(), ocfs2_journal_shutdown() has already
been executed in ocfs2_dismount_volume(), so osb->journal must be NULL.
Therefore, the following calltrace will inevitably fail when it reaches
jbd2_journal_release_jbd_inode().
Adding osb->journal checks will prevent null-ptr-deref during the above
execution path.
Link: https://lkml.kernel.org/r/tencent_357489BEAEE4AED74CBD67D246DBD2C4C606@qq.com Fixes: da5e7c87827e ("ocfs2: cleanup journal init and shutdown") Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reported-by: syzbot+47d8cb2f2cc1517e515a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=47d8cb2f2cc1517e515a Tested-by: syzbot+47d8cb2f2cc1517e515a@syzkaller.appspotmail.com Reviewed-by: Mark Tinguely <mark.tinguely@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baptiste Lepers [Tue, 12 Aug 2025 13:26:56 +0000 (15:26 +0200)]
rust: mm: mark VmaNew as transparent
Unsafe code in VmaNew's methods assumes that the type has the same layout
as the inner `bindings::vm_area_struct`. This is not guaranteed by the
default struct representation in Rust, but requires specifying the
`transparent` representation.
Link: https://lkml.kernel.org/r/20250812132712.61007-1-baptiste.lepers@gmail.com Fixes: dcb81aeab406 ("mm: rust: add VmaNew for f_ops->mmap()") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In of_numa_parse_cpu_nodes, numa_nodes_parsed gets updated only for nodes
containing CPUs. Memory-only nodes should have been updated in
of_numa_parse_memory_nodes, but they weren't.
Subsequently, when free_area_init() attempts to access NODE_DATA() for
these uninitialized memory nodes, the kernel panics due to NULL pointer
dereference.
This can be reproduced on ARM64 QEMU with 1 CPU and 2 memory nodes:
Linus Torvalds [Thu, 28 Aug 2025 02:18:51 +0000 (19:18 -0700)]
Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf-tools fixes from Namhyung Kim:
"A number of kernel header sync changes and two build-id fixes"
* tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf symbol: Add blocking argument to filename__read_build_id
perf symbol-minimal: Fix ehdr reading in filename__read_build_id
tools headers: Sync uapi/linux/vhost.h with the kernel source
tools headers: Sync uapi/linux/prctl.h with the kernel source
tools headers: Sync uapi/linux/fs.h with the kernel source
tools headers: Sync uapi/linux/fcntl.h with the kernel source
tools headers: Sync syscall tables with the kernel source
tools headers: Sync powerpc headers with the kernel source
tools headers: Sync arm64 headers with the kernel source
tools headers: Sync x86 headers with the kernel source
tools headers: Sync linux/cfi_types.h with the kernel source
tools headers: Sync linux/bits.h with the kernel source
tools headers: Sync KVM headers with the kernel source
perf test: Fix a build error in x86 topdown test
Jakub Kicinski [Thu, 28 Aug 2025 01:57:13 +0000 (18:57 -0700)]
Merge branch 'locking-fixes-for-fbnic-driver'
Alexander Duyck says:
====================
Locking fixes for fbnic driver
Address a few locking issues that were reported on the fbnic driver.
Specifically in one case we were seeing locking leaks due to us not
releasing the locks in certain exception paths. In another case we were
using phylink_resume outside of a section in which we held the RTNL mutex
and as a result we were throwing an assert.
====================
A bit of digging showed that we were invoking the phylink_resume as a part
of the fbnic_up path when we were enabling the service task while not
holding the RTNL lock. We should be enabling this sooner as a part of the
ndo_open path and then just letting the service task come online later.
This will help to enforce the correct locking and brings the phylink
interface online at the same time as the network interface, instead of at a
later time.
I tested this on QEMU by putting the guest to sleep with
"echo mem > /sys/power/state" and then waking it with the "system_wakeup"
command in the QEMU monitor.
Alexander Duyck [Mon, 25 Aug 2025 22:56:06 +0000 (15:56 -0700)]
fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
The exception handling path for the __fbnic_pm_resume function had a bug in
that it was taking the devlink lock and then exiting to exception handling
instead of waiting until after it released the lock to do so. In order to
handle that I am swapping the placement of the unlock and the exception
handling jump to label so that we don't trigger a deadlock by holding the
lock longer than we need to.
In addition this change applies the same ordering to the rtnl_lock/unlock
calls in the same function as it should make the code easier to follow if
it adheres to a consistent pattern.
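The ordering described above (release the lock first, then jump to the error label) can be sketched as follows. The `toy_dev` structure, the lock-depth counter, and `toy_resume()` are hypothetical stand-ins and do not reflect the driver's real code.

```c
#include <errno.h>

/* Toy device: lock_depth counts how many times the lock is currently held. */
struct toy_dev {
    int lock_depth;
    int torn_down;
};

void toy_lock(struct toy_dev *d)   { d->lock_depth++; }
void toy_unlock(struct toy_dev *d) { d->lock_depth--; }

/*
 * step_err simulates the result of the work done under the lock.
 * The unlock comes before the error check, so the error path never
 * runs (or returns) with the lock still held.
 */
int toy_resume(struct toy_dev *d, int step_err)
{
    int err;

    toy_lock(d);
    err = step_err;     /* stand-in for the locked section's work */
    toy_unlock(d);
    if (err)
        goto err_teardown;

    return 0;

err_teardown:
    d->torn_down = 1;   /* cleanup runs without the lock held */
    return err;
}
```

If the check and the unlock were swapped, a failing step would reach `err_teardown` with `lock_depth` still at 1, which is the leak the commit message describes.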
Eric Dumazet [Tue, 26 Aug 2025 13:44:35 +0000 (13:44 +0000)]
l2tp: do not use sock_hold() in pppol2tp_session_get_sock()
pppol2tp_session_get_sock() is using RCU, so it must be ready
for sk_refcnt being zero.
Commit ee40fb2e1eb5 ("l2tp: protect sock pointer of
struct pppol2tp_session with RCU") was correct because it
had a call_rcu(..., pppol2tp_put_sk) which was later removed in blamed commit.
pppol2tp_recv() can use pppol2tp_session_get_sock() as well.
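Being "ready for sk_refcnt being zero" usually means taking the reference with an increment-unless-zero operation instead of an unconditional increment. The sketch below shows that idea with C11 atomics. `toy_refcount_inc_not_zero()` is a hypothetical userspace stand-in, not the kernel's refcount API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive.
 * Returns true and increments the count when it was non-zero;
 * returns false and leaves it untouched when it had already dropped to zero.
 */
bool toy_refcount_inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

An unconditional increment, which is what sock_hold() amounts to, would revive an object whose count already reached zero. With the conditional form, an RCU reader that loses the race sees a failure and can treat the socket as gone.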
Fixes: c5cbaef992d6 ("l2tp: refactor ppp socket/session relationship") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250826134435.1683435-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>