]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
2 months agoMerge tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libat...
Linus Torvalds [Wed, 6 Aug 2025 05:52:59 +0000 (08:52 +0300)]
Merge tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Damien Le Moal:

 - Cleanup whitespace in messages in libata-core and the pata_pdc2027x,
   pata_macio drivers (Colin)

 - Fix ata_to_sense_error() to avoid seeing nonsensical sense data for
   rare cases where we fail to get sense data from the drive. The
   complementary fix to this is to ensure that we always return the
   generic "ABORTED COMMAND" sense data for a failed command for which
   we have no status or error fields

 - The recent changes to link power management (LPM) which now prevent
   the user from attempting to set an LPM policy through the
   link_power_management_policy caused some regressions in test
   environments because of the error that is now returned when writing
   to that attribute when LPM is not supported. To allow users to not
   trip on this, introduce the new link_power_management_supported
   attribute to allow simple testing of a port/device LPM support (me)

* tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: pata_pdc2027x: Remove space before newline and abbreviations
  ata: pata_macio: Remove space before newline
  ata: libata-core: Remove space before newline
  ata: libata-sata: Add link_power_management_supported sysfs attribute
  ata: libata-scsi: Return aborted command when missing sense and result TF
  ata: libata-scsi: Fix ata_to_sense_error() status handling

2 months agoMerge tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Wed, 6 Aug 2025 04:32:52 +0000 (07:32 +0300)]
Merge tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:
 "This is the last pull request from me.

  I'm grateful to have been able to continue as a maintainer for eight
  years. From the next cycle, Nathan and Nicolas will maintain Kbuild.

   - Fix a shortcut key issue in menuconfig

   - Fix missing rebuild of kheaders

   - Sort the symbol dump generated by gendwarfsyms

   - Support zboot extraction in scripts/extract-vmlinux

   - Migrate gconfig to GTK 3

   - Add TAR variable to allow overriding the default tar command

   - Hand over Kbuild maintainership"

* tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits)
  MAINTAINERS: hand over Kbuild maintenance
  kheaders: make it possible to override TAR
  kbuild: userprogs: use correct linker when mixing clang and GNU ld
  kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
  kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
  kconfig: gconf: refactor text_insert_help()
  kconfig: gconf: remove unneeded variable in text_insert_msg
  kconfig: gconf: use hyphens in signals
  kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem
  kconfig: gconf: Fix Back button behavior
  kconfig: gconf: fix single view to display dependent symbols correctly
  scripts: add zboot support to extract-vmlinux
  gendwarfksyms: order -T symtypes output by name
  gendwarfksyms: use preferred form of sizeof for allocation
  kconfig: qconf: confine {begin,end}Group to constructor and destructor
  kconfig: qconf: fix ConfigList::updateListAllforAll()
  kconfig: add a function to dump all menu entries in a tree-like format
  kconfig: gconf: show GTK version in About dialog
  kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned
  kconfig: gconf: replace GdkColor with GdkRGBA
  ...

2 months agomedia: venus: Fix OPP table error handling
Sasha Levin [Tue, 5 Aug 2025 12:58:20 +0000 (08:58 -0400)]
media: venus: Fix OPP table error handling

The venus driver fails to check if dev_pm_opp_find_freq_{ceil,floor}()
returns an error pointer before calling dev_pm_opp_put(). This causes
a crash when OPP tables are not present in device tree.

Unable to handle kernel access to user memory outside uaccess routines
at virtual address 000000000000002e
...
pc : dev_pm_opp_put+0x1c/0x4c
lr : core_clks_enable+0x4c/0x16c [venus_core]

Add IS_ERR() checks before calling dev_pm_opp_put() to avoid
dereferencing error pointers.

Fixes: b179234b5e59 ("media: venus: pm_helpers: use opp-table for the frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip...
Linus Torvalds [Wed, 6 Aug 2025 01:41:21 +0000 (04:41 +0300)]
Merge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

Pull perf fixes from Thomas Gleixner:
 "Perf fixes for perf_mmap() reference counting to prevent potential
  reference count leaks which are caused by:

   - VMA splits, which change the offset or size of a mapping, which
     causes perf_mmap_close() to ignore the unmap or unmap the wrong
     buffer.

   - Several internal issues of perf_mmap(), which can cause reference
     count leaks in the perf mmap, corrupt accounting or cause leaks in
     perf drivers.

  The main fix is to prevent VMA splits by implementing the
  [may_]split() callback for vm operations.

  The other issues are addressed by rearranging code, early returns on
  failure and invocation of cleanups.

  Also provide a selftest to validate the fixes.

  The reference counting should be converted to refcount_t, but that
  requires larger refactoring of the code and will be done once these
  fixes are upstream"

* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
  selftests/perf_events: Add a mmap() correctness test
  perf/core: Prevent VMA split of buffer mappings
  perf/core: Handle buffer mapping fail correctly in perf_mmap()
  perf/core: Exit early on perf_mmap() fail
  perf/core: Don't leak AUX buffer refcount on allocation failure
  perf/core: Preserve AUX buffer allocation failure result

2 months agonet: usbnet: Fix the wrong netif_carrier_on() call
Ammar Faizi [Wed, 6 Aug 2025 00:31:05 +0000 (07:31 +0700)]
net: usbnet: Fix the wrong netif_carrier_on() call

The commit referenced in the Fixes tag causes usbnet to malfunction
(identified via git bisect). Post-commit, my external RJ45 LAN cable
fails to connect. Linus also reported the same issue after pulling that
commit.

The code has a logic error: netif_carrier_on() is only called when the
link is already on. Fix this by moving the netif_carrier_on() call
outside the if-statement entirely. This ensures it is always called
when EVENT_LINK_CARRIER_ON is set and properly clears it regardless
of the link state.

Cc: stable@vger.kernel.org
Cc: Armando Budianto <sprite@gnuweeb.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjqL4uF0MG_c8+xHX1Vv8==sPYQrtzbdA3kzi96284nuQ@mail.gmail.com
Closes: https://lore.kernel.org/netdev/CAHk-=wjKh8X4PT_mU1kD4GQrbjivMfPn-_hXa6han_BTDcXddw@mail.gmail.com
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb.org
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMAINTAINERS: hand over Kbuild maintenance
Masahiro Yamada [Mon, 4 Aug 2025 14:20:07 +0000 (23:20 +0900)]
MAINTAINERS: hand over Kbuild maintenance

I'm stepping down as the maintainer of Kbuild/Kconfig.
It was enjoyable to refactor and improve the kernel build system,
but due to personal reasons, I believe it's difficult for me to
continue in this role any further.

I discussed this off-list with Nathan and Nicolas, and they have
kindly agreed to take over the maintenance of Kbuild with Odd Fixes.
I'm grateful to them for stepping in.

As for Kconfig, there are currently no designated reviewers, so the
maintainer position will remain vacant for now. I hope someone will
step up to take on the role.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Nicolas Schier <nicolas@fjasle.eu>
2 months agokheaders: make it possible to override TAR
Michał Górny [Tue, 29 Jul 2025 13:24:55 +0000 (15:24 +0200)]
kheaders: make it possible to override TAR

Commit 86cdd2fdc4e3 ("kheaders: make headers archive reproducible")
introduced a number of options specific to GNU tar to the `tar`
invocation in `gen_kheaders.sh` script. This causes the script to fail
to work on systems where `tar` is not GNU tar. This can occur e.g.
on recent Gentoo Linux installations that support using bsdtar from
libarchive instead.

Add a `TAR` make variable to make it possible to override the tar
executable used, e.g. by specifying:

  make TAR=gtar

Link: https://bugs.gentoo.org/884061
Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2 months agokbuild: userprogs: use correct linker when mixing clang and GNU ld
Thomas Weißschuh [Mon, 28 Jul 2025 13:47:37 +0000 (15:47 +0200)]
kbuild: userprogs: use correct linker when mixing clang and GNU ld

The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.

Relax the check around --ld-path so it gets used for all linkers.

Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2 months agokconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
Suchit Karunakaran [Sun, 27 Jul 2025 16:44:33 +0000 (22:14 +0530)]
kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c

strcpy() performs no bounds checking and can lead to buffer overflows if
the input string exceeds the destination buffer size. This patch replaces
it with strncpy(), and null terminates the input string.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Reviewed-by: Nicolas Schier <nicolas.schier@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2 months agokconfig: lxdialog: replace strcpy with snprintf in print_autowrap
Suchit Karunakaran [Sat, 26 Jul 2025 19:43:07 +0000 (01:13 +0530)]
kconfig: lxdialog: replace strcpy with snprintf in print_autowrap

strcpy() does not perform bounds checking and can lead to buffer overflows
if the source string exceeds the destination buffer size. In
print_autowrap(), replace strcpy() with snprintf() to safely copy the
prompt string into the fixed-size tempstr buffer.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2 months agoselftests/perf_events: Add a mmap() correctness test
Lorenzo Stoakes [Sat, 2 Aug 2025 20:55:35 +0000 (22:55 +0200)]
selftests/perf_events: Add a mmap() correctness test

Exercise various mmap(), munmap() and mremap() invocations, which might
cause a perf buffer mapping to be split or truncated.

To avoid hard coding the perf event and having dependencies on
architectures and configuration options, scan through event types in sysfs
and try to open them. On success, try to mmap() and if that succeeds try to
mmap() the AUX buffer.

In case that no AUX buffer supporting event is found, only test the base
buffer mapping. If no mappable event is found or permissions are not
sufficient, skip the tests.

Reserve a PROT_NONE region for both rb and aux tests to allow testing the
case where mremap unmaps beyond the end of a mapped VMA to prevent it from
unmapping unrelated mappings.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
2 months agoperf/core: Prevent VMA split of buffer mappings
Thomas Gleixner [Wed, 30 Jul 2025 21:01:21 +0000 (23:01 +0200)]
perf/core: Prevent VMA split of buffer mappings

The perf mmap code is careful about mmap()'ing the user page with the
ringbuffer and additionally the auxiliary buffer, when the event supports
it. Once the first mapping is established, subsequent mapping have to use
the same offset and the same size in both cases. The reference counting for
the ringbuffer and the auxiliary buffer depends on this being correct.

Though perf does not prevent that a related mapping is split via mmap(2),
munmap(2) or mremap(2). A split of a VMA results in perf_mmap_open() calls,
which take reference counts, but then the subsequent perf_mmap_close()
calls are not longer fulfilling the offset and size checks. This leads to
reference count leaks.

As perf already has the requirement for subsequent mappings to match the
initial mapping, the obvious consequence is that VMA splits, caused by
resizing of a mapping or partial unmapping, have to be prevented.

Implement the vm_operations_struct::may_split() callback and return
unconditionally -EINVAL.

That ensures that the mapping offsets and sizes cannot be changed after the
fact. Remapping to a different fixed address with the same size is still
possible as it takes the references for the new mapping and drops those of
the old mapping.

Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27504
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
2 months agoperf/core: Handle buffer mapping fail correctly in perf_mmap()
Thomas Gleixner [Sat, 2 Aug 2025 10:48:55 +0000 (12:48 +0200)]
perf/core: Handle buffer mapping fail correctly in perf_mmap()

After successful allocation of a buffer or a successful attachment to an
existing buffer perf_mmap() tries to map the buffer read only into the page
table. If that fails, the already set up page table entries are zapped, but
the other perf specific side effects of that failure are not handled.  The
calling code just cleans up the VMA and does not invoke perf_mmap_close().

This leaks reference counts, corrupts user->vm accounting and also results
in an unbalanced invocation of event::event_mapped().

Cure this by moving the event::event_mapped() invocation before the
map_range() call so that on map_range() failure perf_mmap_close() can be
invoked without causing an unbalanced event::event_unmapped() call.

perf_mmap_close() undoes the reference counts and eventually frees buffers.

Fixes: b709eb872e19 ("perf: map pages in advance")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
2 months agoperf/core: Exit early on perf_mmap() fail
Thomas Gleixner [Sat, 2 Aug 2025 10:49:48 +0000 (12:49 +0200)]
perf/core: Exit early on perf_mmap() fail

When perf_mmap() fails to allocate a buffer, it still invokes the
event_mapped() callback of the related event. On X86 this might increase
the perf_rdpmc_allowed reference counter. But nothing undoes this as
perf_mmap_close() is never called in this case, which causes another
reference count leak.

Return early on failure to prevent that.

Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
2 months agoperf/core: Don't leak AUX buffer refcount on allocation failure
Thomas Gleixner [Sat, 2 Aug 2025 10:39:39 +0000 (12:39 +0200)]
perf/core: Don't leak AUX buffer refcount on allocation failure

Failure of the AUX buffer allocation leaks the reference count.

Set the reference count to 1 only when the allocation succeeds.

Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
2 months agoperf/core: Preserve AUX buffer allocation failure result
Thomas Gleixner [Mon, 4 Aug 2025 20:22:09 +0000 (22:22 +0200)]
perf/core: Preserve AUX buffer allocation failure result

A recent overhaul sets the return value to 0 unconditionally after the
allocations, which causes reference count leaks and corrupts the user->vm
accounting.

Preserve the AUX buffer allocation failure return value, so that the
subsequent code works correctly.

Fixes: 0983593f32c4 ("perf/core: Lift event->mmap_mutex in perf_mmap()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
2 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Linus Torvalds [Tue, 5 Aug 2025 13:55:03 +0000 (16:55 +0300)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM update from Russell King:
 "Just one development update this time:

   - Finish removing Coresight support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9449/1: coresight: Finish removal of Coresight support in arch/arm/kernel

2 months agoMerge tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linki...
Linus Torvalds [Tue, 5 Aug 2025 13:37:05 +0000 (16:37 +0300)]
Merge tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter.
   It will fix an issue where fdatasync would be set incorrectly.

 - Fix potential infinite loop by the self-linked chain.

* tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: add cluster chain loop check for dir
  exfat: fdatasync flag should be same like generic_write_sync()

2 months agoMerge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 5 Aug 2025 13:02:07 +0000 (16:02 +0300)]
Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "mseal cleanups" (Lorenzo Stoakes)

     Some mseal cleaning with no intended functional change.

   - "Optimizations for khugepaged" (David Hildenbrand)

     Improve khugepaged throughput by batching PTE operations for large
     folios. This gain is mainly for arm64.

   - "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)

     A bugfix, additional debug code and cleanups to the execmem code.

   - "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)

     Bugfixes, cleanups and performance improvememnts to the mTHP swapin
     code"

* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
  mm: mempool: fix crash in mempool_free() for zero-minimum pools
  mm: correct type for vmalloc vm_flags fields
  mm/shmem, swap: fix major fault counting
  mm/shmem, swap: rework swap entry and index calculation for large swapin
  mm/shmem, swap: simplify swapin path and result handling
  mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
  mm/shmem, swap: tidy up swap entry splitting
  mm/shmem, swap: tidy up THP swapin checks
  mm/shmem, swap: avoid redundant Xarray lookup during swapin
  x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
  execmem: drop writable parameter from execmem_fill_trapping_insns()
  execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
  execmem: move execmem_force_rw() and execmem_restore_rox() before use
  execmem: rework execmem_cache_free()
  execmem: introduce execmem_alloc_rw()
  execmem: drop unused execmem_update_copy()
  mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
  mm/rmap: add anon_vma lifetime debug check
  mm: remove mm/io-mapping.c
  ...

2 months agoMerge tag 'i2c-for-6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 4 Aug 2025 23:37:29 +0000 (16:37 -0700)]
Merge tag 'i2c-for-6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:
 "A few more patches from I2C. Some are fixes which would be nice to
  have in rc1 already, some patches have nearly been fallen through the
  cracks, some just needed a bit more testing.

   - acpi: enable 100kHz workaround for DLL0945

   - apple: add support for Apple A7–A11, T2 chips; Kconfig update

   - mux: mule: fix error handling path

   - qcom-geni: fix controller frequency mapping

   - stm32f7: add DMA-safe transfer support

   - tegra: use controller reset if device reset is missing

   - tegra: remove unnecessary dma_sync*() calls"

* tag 'i2c-for-6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: muxes: mule: Fix an error handling path in mule_i2c_mux_probe()
  i2c: Force DLL0945 touchpad i2c freq to 100khz
  i2c: apple: Drop default ARCH_APPLE in Kconfig
  i2c: qcom-geni: fix I2C frequency table to achieve accurate bus rates
  dt-bindings: i2c: apple,i2c: Document Apple A7-A11, T2 compatibles
  i2c: tegra: Remove dma_sync_*() calls
  i2c: tegra: Use internal reset when reset property is not available
  i2c: stm32f7: support i2c_*_dma_safe_msg_buf APIs

2 months agoMerge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeu...
Linus Torvalds [Mon, 4 Aug 2025 23:27:21 +0000 (16:27 -0700)]
Merge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "Three main updates: folio conversion by Matthew, switch to a new mount
  API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
  with finer granularity by Daeho.

  There are also patches to address bugs and issues in the existing
  features such as GCs, file pinning, write-while-dio-read, contingous
  block allocation, and memory access violations.

  Enhancements:
   - switch to new mount API and folio conversion
   - add sysfs nodes to controle F2FS GCs for ZUFS
   - improve performance on the nat entry cache
   - drop inode from the donation list when the last file is closed
   - avoid splitting bio when reading multiple pages

  Bug fixes:
   - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
   - make sure zoned device GC to use FG_GC in shortage of free section
   - fix to calculate dirty data during has_not_enough_free_secs()
   - fix to update upper_p in __get_secs_required() correctly
   - wait for inflight dio completion, excluding pinned files read using dio
   - don't break allocation when crossing contiguous sections
   - vm_unmap_ram() may be called from an invalid context
   - fix to avoid out-of-boundary access in dnode page
   - fix to avoid panic in f2fs_evict_inode
   - fix to avoid UAF in f2fs_sync_inode_meta()
   - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
   - fix UAF of f2fs_inode_info in f2fs_free_dic
   - fix to avoid invalid wait context issue
   - fix bio memleak when committing super block
   - handle nat.blkaddr corruption in f2fs_get_node_info()

  In addition, there are also clean-ups and minor bug fixes"

* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
  f2fs: drop inode from the donation list when the last file is closed
  f2fs: add gc_boost_gc_greedy sysfs node
  f2fs: add gc_boost_gc_multiple sysfs node
  f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
  f2fs: fix to calculate dirty data during has_not_enough_free_secs()
  f2fs: fix to update upper_p in __get_secs_required() correctly
  f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
  f2fs: avoid redundant clean nat entry move in lru list
  f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
  f2fs: ignore valid ratio when free section count is low
  f2fs: don't break allocation when crossing contiguous sections
  f2fs: remove unnecessary tracepoint enabled check
  f2fs: merge the two conditions to avoid code duplication
  f2fs: vm_unmap_ram() may be called from an invalid context
  f2fs: fix to avoid out-of-boundary access in dnode page
  f2fs: switch to the new mount api
  f2fs: introduce fs_context_operation structure
  f2fs: separate the options parsing and options checking
  f2fs: Add f2fs_fs_context to record the mount options
  f2fs: Allow sbi to be NULL in f2fs_printk
  ...

2 months agoMerge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk...
Linus Torvalds [Mon, 4 Aug 2025 17:54:36 +0000 (10:54 -0700)]
Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Add new "hash_pointers=[auto|always|never]" boot parameter to force
   the hashing even with "slab_debug" enabled

 - Allow to stop CPU, after losing nbcon console ownership during
   panic(), even without proper NMI

 - Allow to use the printk kthread immediately even for the 1st
   registered nbcon

 - Compiler warning removal

* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: nbcon: Allow reacquire during panic
  printk: Allow to use the printk kthread immediately even for 1st nbcon
  slab: Decouple slab_debug and no_hash_pointers
  vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
  compiler-gcc.h: Introduce __diag_GCC_all

2 months agosched/psi: Fix psi_seq initialization
Peter Zijlstra [Tue, 15 Jul 2025 19:11:14 +0000 (15:11 -0400)]
sched/psi: Fix psi_seq initialization

With the seqcount moved out of the group into a global psi_seq,
re-initializing the seqcount on group creation is causing seqcount
corruption.

Fixes: 570c8efd5eb7 ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
Reported-by: Chris Mason <clm@meta.com>
Suggested-by: Beata Michalska <beata.michalska@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 4 Aug 2025 15:58:53 +0000 (08:58 -0700)]
Merge tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - fix checking for request-based stackable devices (dm-table)

 - fix corrupt_bio_byte setup checks (dm-flakey)

 - add support for resync w/o metadata devices (dm raid)

 - small code simplification (dm, dm-mpath, vm-vdo, dm-raid)

 - remove support for asynchronous hashes (dm-verity)

 - close smatch warning (dm-zoned-target)

 - update the documentation and enable inline-crypto passthrough
   (dm-thin)

* tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: set DM_TARGET_PASSES_CRYPTO feature for dm-thin
  dm-thin: update the documentation
  dm-raid: do not include dm-core.h
  vdo: omit need_resched() before cond_resched()
  md: dm-zoned-target: Initialize return variable r to avoid uninitialized use
  dm-verity: remove support for asynchronous hashes
  dm-mpath: don't print the "loaded" message if registering fails
  dm-mpath: make dm_unregister_path_selector return void
  dm: ima: avoid extra calls to strlen()
  dm: Simplify dm_io_complete()
  dm: Remove unnecessary return in dm_zone_endio()
  dm raid: add support for resync w/o metadata devices
  dm-flakey: Fix corrupt_bio_byte setup checks
  dm-table: fix checking for rq stackable devices

2 months agoMerge tag 'for-linus' of https://github.com/openrisc/linux
Linus Torvalds [Mon, 4 Aug 2025 15:37:46 +0000 (08:37 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

 - Replace __ASSEMBLY__ with __ASSEMBLER__ in headers (Thomas Huth)

* tag 'for-linus' of https://github.com/openrisc/linux:
  openrisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
  openrisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers

2 months agoMerge tag 'apparmor-pr-2025-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 4 Aug 2025 15:17:28 +0000 (08:17 -0700)]
Merge tag 'apparmor-pr-2025-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
 "This has one major feature, it pulls in a cleaned up version of
  af_unix mediation that Ubuntu has been carrying for years. It is
  placed behind a new abi to ensure that it does cause policy
  regressions. With pulling in the af_unix mediation there have been
  cleanups and some refactoring of network socket mediation. This
  accounts for the majority of the changes in the diff.

  In addition there are a few improvements providing minor code
  optimizations. several code cleanups, and bug fixes.

  Features:
   - improve debug printing
   - carry mediation check on label (optimization)
   - improve ability for compiler to optimize
     __begin_current_label_crit_section
   - transition for a linked list of rulesets to a vector of rulesets
   - don't hardcode profile signal, allow it to be set by policy
   - ability to mediate caps via the state machine instead of lut
   - Add Ubuntu af_unix mediation, put it behind new v9 abi

  Cleanups:
   - fix typos and spelling errors
   - cleanup kernel doc and code inconsistencies
   - remove redundant checks/code
   - remove unused variables
   - Use str_yes_no() helper function
   - mark tables static where appropriate
   - make all generated string array headers const char *const
   - refactor to doc semantics of file_perm checks
   - replace macro calls to network/socket fns with explicit calls
   - refactor/cleanup socket mediation code preparing for finer grained
     mediation of different network families
   - several updates to kernel doc comments

  Bug fixes:
   - fix incorrect profile->signal range check
   - idmap mount fixes
   - policy unpack unaligned access fixes
   - kfree_sensitive() where appropriate
   - fix oops when freeing policy
   - fix conflicting attachment resolution
   - fix exec table look-ups when stacking isn't first
   - fix exec auditing
   - mitigate userspace generating overly large xtables"

* tag 'apparmor-pr-2025-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: (60 commits)
  apparmor: fix: oops when trying to free null ruleset
  apparmor: fix Regression on linux-next (next-20250721)
  apparmor: fix test error: WARNING in apparmor_unix_stream_connect
  apparmor: Remove the unused variable rules
  apparmor: fix: accept2 being specifie even when permission table is presnt
  apparmor: transition from a list of rules to a vector of rules
  apparmor: fix documentation mismatches in val_mask_to_str and socket functions
  apparmor: remove redundant perms.allow MAY_EXEC bitflag set
  apparmor: fix kernel doc warnings for kernel test robot
  apparmor: Fix unaligned memory accesses in KUnit test
  apparmor: Fix 8-byte alignment for initial dfa blob streams
  apparmor: shift uid when mediating af_unix in userns
  apparmor: shift ouid when mediating hard links in userns
  apparmor: make sure unix socket labeling is correctly updated.
  apparmor: fix regression in fs based unix sockets when using old abi
  apparmor: fix AA_DEBUG_LABEL()
  apparmor: fix af_unix auditing to include all address information
  apparmor: Remove use of the double lock
  apparmor: update kernel doc comments for xxx_label_crit_section
  apparmor: make __begin_current_label_crit_section() indicate whether put is needed
  ...

2 months agoMerge branch 'rework/fixes' into for-linus
Petr Mladek [Mon, 4 Aug 2025 12:18:01 +0000 (14:18 +0200)]
Merge branch 'rework/fixes' into for-linus

2 months agoMerge branch 'rework/optimizations' into for-linus
Petr Mladek [Mon, 4 Aug 2025 12:17:17 +0000 (14:17 +0200)]
Merge branch 'rework/optimizations' into for-linus

2 months agoMerge branch 'for-6.17-hash_pointers' into for-linus
Petr Mladek [Mon, 4 Aug 2025 12:16:21 +0000 (14:16 +0200)]
Merge branch 'for-6.17-hash_pointers' into for-linus

2 months agoMerge branch 'for-6.15-printf-attribute' into for-linus
Petr Mladek [Mon, 4 Aug 2025 12:15:14 +0000 (14:15 +0200)]
Merge branch 'for-6.15-printf-attribute' into for-linus

2 months agoapparmor: fix: oops when trying to free null ruleset
John Johansen [Sat, 2 Aug 2025 03:36:06 +0000 (20:36 -0700)]
apparmor: fix: oops when trying to free null ruleset

profile allocation is wrongly setting the number of entries on the
rules vector before any ruleset is assigned. If profile allocation
fails between ruleset allocation and assigning the first ruleset,
free_ruleset() will be called with a null pointer resulting in an
oops.

[  107.350226] kernel BUG at mm/slub.c:545!
[  107.350912] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  107.351447] CPU: 1 UID: 0 PID: 27 Comm: ksoftirqd/1 Not tainted 6.14.6-hwe-rlee287-dev+ #5
[  107.353279] Hardware name:[   107.350218] -QE-----------[ cutMU here ]--------- Ub---
[  107.3502untu26] kernel BUG a 24t mm/slub.c:545.!04 P
[  107.350912]C ( Oops: invalid oi4pcode: 0000 [#1]40 PREEMPT SMP NOPFXTI
 + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  107.356054] RIP: 0010:__slab_free+0x152/0x340
[  107.356444] Code: 00 4c 89 ff e8 0f ac df 00 48 8b 14 24 48 8b 4c 24 20 48 89 44 24 08 48 8b 03 48 c1 e8 09 83 e0 01 88 44 24 13 e9 71 ff ff ff <0f> 0b 41 f7 44 24 08 87 04 00 00 75 b2 eb a8 41 f7 44 24 08 87 04
[  107.357856] RSP: 0018:ffffad4a800fbbb0 EFLAGS: 00010246
[  107.358937] RAX: ffff97ebc2a88e70 RBX: ffffd759400aa200 RCX: 0000000000800074
[  107.359976] RDX: ffff97ebc2a88e60 RSI: ffffd759400aa200 RDI: ffffad4a800fbc20
[  107.360600] RBP: ffffad4a800fbc50 R08: 0000000000000001 R09: ffffffff86f02cf2
[  107.361254] R10: 0000000000000000 R11: 0000000000000000 R12: ffff97ecc0049400
[  107.361934] R13: ffff97ebc2a88e60 R14: ffff97ecc0049400 R15: 0000000000000000
[  107.362597] FS:  0000000000000000(0000) GS:ffff97ecfb200000(0000) knlGS:0000000000000000
[  107.363332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  107.363784] CR2: 000061c9545ac000 CR3: 0000000047aa6000 CR4: 0000000000750ef0
[  107.364331] PKRU: 55555554
[  107.364545] Call Trace:
[  107.364761]  <TASK>
[  107.364931]  ? local_clock+0x15/0x30
[  107.365219]  ? srso_alias_return_thunk+0x5/0xfbef5
[  107.365593]  ? kfree_sensitive+0x32/0x70
[  107.365900]  kfree+0x29d/0x3a0
[  107.366144]  ? srso_alias_return_thunk+0x5/0xfbef5
[  107.366510]  ? local_clock_noinstr+0xe/0xd0
[  107.366841]  ? srso_alias_return_thunk+0x5/0xfbef5
[  107.367209]  kfree_sensitive+0x32/0x70
[  107.367502]  aa_free_profile.part.0+0xa2/0x400
[  107.367850]  ? rcu_do_batch+0x1e6/0x5e0
[  107.368148]  aa_free_profile+0x23/0x60
[  107.368438]  label_free_switch+0x4c/0x80
[  107.368751]  label_free_rcu+0x1c/0x50
[  107.369038]  rcu_do_batch+0x1e8/0x5e0
[  107.369324]  ? rcu_do_batch+0x157/0x5e0
[  107.369626]  rcu_core+0x1b0/0x2f0
[  107.369888]  rcu_core_si+0xe/0x20
[  107.370156]  handle_softirqs+0x9b/0x3d0
[  107.370460]  ? smpboot_thread_fn+0x26/0x210
[  107.370790]  run_ksoftirqd+0x3a/0x70
[  107.371070]  smpboot_thread_fn+0xf9/0x210
[  107.371383]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  107.371746]  kthread+0x10d/0x280
[  107.372010]  ? __pfx_kthread+0x10/0x10
[  107.372310]  ret_from_fork+0x44/0x70
[  107.372655]  ? __pfx_kthread+0x10/0x10
[  107.372974]  ret_from_fork_asm+0x1a/0x30
[  107.373316]  </TASK>
[  107.373505] Modules linked in: af_packet_diag mptcp_diag tcp_diag udp_diag raw_diag inet_diag snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore qrtr binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm irqbypass polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd i2c_piix4 i2c_smbus input_leds joydev sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 hid_generic usbhid hid psmouse serio_raw floppy bochs pata_acpi
[  107.379086] ---[ end trace 0000000000000000 ]---

Don't set the count until a ruleset is actually allocated and
guard against free_ruleset() being called with a null pointer.

Reported-by: Ryan Lee <ryan.lee@canonical.com>
Fixes: 217af7e2f4de ("apparmor: refactor profile rules and attachments")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2 months agoMerge tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Mon, 4 Aug 2025 03:17:34 +0000 (20:17 -0700)]
Merge tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Support for a new RTC in an existing driver and all the drivers
  exposing clocks using the common clock framework have been converted
  to determine_rate(). Summary:

  Subsystem:
   - Convert drivers exposing a clock from round_rate() to determine_rate()

  Drivers:
   - ds1307: oscillator stop flag handling for ds1341
   - pcf85063: add support for RV8063"

* tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits)
  rtc: ds1685: Update Joshua Kinard's email address.
  rtc: rv3032: convert from round_rate() to determine_rate()
  rtc: rv3028: convert from round_rate() to determine_rate()
  rtc: pcf8563: convert from round_rate() to determine_rate()
  rtc: pcf85063: convert from round_rate() to determine_rate()
  rtc: nct3018y: convert from round_rate() to determine_rate()
  rtc: max31335: convert from round_rate() to determine_rate()
  rtc: m41t80: convert from round_rate() to determine_rate()
  rtc: hym8563: convert from round_rate() to determine_rate()
  rtc: ds1307: convert from round_rate() to determine_rate()
  rtc: rv3028: fix incorrect maximum clock rate handling
  rtc: pcf8563: fix incorrect maximum clock rate handling
  rtc: pcf85063: fix incorrect maximum clock rate handling
  rtc: nct3018y: fix incorrect maximum clock rate handling
  rtc: hym8563: fix incorrect maximum clock rate handling
  rtc: ds1307: fix incorrect maximum clock rate handling
  rtc: pcf85063: scope pcf85063_config structures
  rtc: Optimize calculations in rtc_time64_to_tm()
  dt-bindings: rtc: amlogic,a4-rtc: Add compatible string for C3
  rtc: ds1307: handle oscillator stop flag (OSF) for ds1341
  ...

2 months agoMerge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Mon, 4 Aug 2025 02:15:04 +0000 (19:15 -0700)]
Merge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

 - Fixes for several issues in the powernv PCI hotplug path

 - Fix htmldoc generation for htm.rst in toctree

 - Add jit support for load_acquire and store_release in ppc64 bpf jit

Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar
Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar

* tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc64/bpf: Add jit support for load_acquire and store_release
  docs: powerpc: add htm.rst to toctree
  PCI: pnv_php: Enable third attention indicator state
  PCI: pnv_php: Fix surprise plug detection and recovery
  powerpc/eeh: Make EEH driver device hotplug safe
  powerpc/eeh: Export eeh_unfreeze_pe()
  PCI: pnv_php: Work around switches with broken presence detection
  PCI: pnv_php: Clean up allocated IRQs on unplug

2 months agoMerge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 3 Aug 2025 23:23:09 +0000 (16:23 -0700)]
Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...

2 months agoMerge tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 3 Aug 2025 22:03:04 +0000 (15:03 -0700)]
Merge tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull more tracing updates from Steven Rostedt:

 - Remove unneeded goto out statements

   Over time, the logic was restructured but left a "goto out" where the
   out label simply did a "return ret;". Instead of jumping to this out
   label, simply return immediately and remove the out label.

 - Add guard(ring_buffer_nest)

   Some calls to the tracing ring buffer can happen when the ring buffer
   is already being written to at the same context (for example, a
   trace_printk() in between a ring_buffer_lock_reserve() and a
   ring_buffer_unlock_commit()).

   In order to not trigger the recursion detection, these functions use
   ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard()
   for these functions so that their use cases can be simplified and not
   need to use goto for the release.

 - Clean up the tracing code with guard() and __free() logic

   There were several locations that were prime candidates for using
   guard() and __free() helpers. Switch them over to use them.

 - Fix output of function argument traces for unsigned int values

   The function tracer with "func-args" option set will record up to 6
   argument registers and then use BTF to format them for human
   consumption when the trace file is read. There are several arguments
   that are "unsigned long" and even "unsigned int" that are either and
   address or a mask. It is easier to understand if they were printed
   using hexadecimal instead of decimal. The old method just printed all
   non-pointer values as signed integers, which made it even worse for
   unsigned integers.

   For instance, instead of:

     __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs

   show:

     __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs"

* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have unsigned int function args displayed as hexadecimal
  ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
  tracing: Use __free(kfree) in trace.c to remove gotos
  tracing: Add guard() around locks and mutexes in trace.c
  tracing: Add guard(ring_buffer_nest)
  tracing: Remove unneeded goto out logic

2 months agoMerge tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules...
Linus Torvalds [Sun, 3 Aug 2025 21:16:52 +0000 (14:16 -0700)]
Merge tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull module updates from Daniel Gomez:
 "This is a small set of changes for modules, primarily to extend module
  users to use the module data structures in combination with the
  already no-op stub module functions, even when support for modules is
  disabled in the kernel configuration. This change follows the kernel's
  coding style for conditional compilation and allows kunit code to drop
  all CONFIG_MODULES ifdefs, which is also part of the changes. This
  should allow others part of the kernel to do the same cleanup.

  The remaining changes include a fix for module name length handling
  which could potentially lead to the removal of an incorrect module,
  and various cleanups"

* tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN
  tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN
  module: Restore the moduleparam prefix length check
  module: Remove unnecessary +1 from last_unloaded_module::name size
  module: Prevent silent truncation of module name in delete_module(2)
  kunit: test: Drop CONFIG_MODULE ifdeffery
  module: make structure definitions always visible
  module: move 'struct module_use' to internal.h

2 months agoMerge tag 'i3c/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sun, 3 Aug 2025 21:12:02 +0000 (14:12 -0700)]
Merge tag 'i3c/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "New driver:
   - Renesas I3C controller

  Subsystem:
   - use adapter timeout value for I2C transfers
   - don't fail if GETHDRCAP is unsupported
   - replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP

  Drivers:
   - svc: Fix npcm845 FIFO_EMPTY quirk"

* tag 'i3c/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (25 commits)
  i3c: add missing include to internal header
  i3c: dw: Remove redundant pm_runtime_mark_last_busy() calls
  i3c: master: svc: Remove redundant pm_runtime_mark_last_busy() calls
  i3c: master: svc: Fix npcm845 FIFO_EMPTY quirk
  i3c: master: Add basic driver for the Renesas I3C controller
  dt-bindings: i3c: Add Renesas I3C controller
  i3c: Add more parameters for controllers to the header
  i3c: Standardize defines for specification parameters
  i3c: fix module_i3c_i2c_driver() with I3C=n
  i3c: master: cdns: Simplify handling clocks in probe()
  i3c: Fix i3c_device_do_priv_xfers() kernel-doc indentation
  i3c: master: dw: Use i3c_writel_fifo() and i3c_readl_fifo()
  i3c: master: cdns: Use i3c_writel_fifo() and i3c_readl_fifo()
  i3c: master: Add inline i3c_readl_fifo() and i3c_writel_fifo()
  i3c: prefix hexadecimal entries in sysfs
  i3c: master: cdns: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
  i3c: dw: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
  i3c: master: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
  i3c: don't fail if GETHDRCAP is unsupported
  i3c: add patchwork entry to MAINTAINERS
  ...

2 months agoMerge tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Linus Torvalds [Sun, 3 Aug 2025 20:49:10 +0000 (13:49 -0700)]
Merge tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness',
     'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and
     'ref_as_ptr'

     These are intended to avoid type casts with the 'as' operator,
     which are quite powerful, into restricted variants that are less
     powerful and thus should help to avoid mistakes

   - Remove the 'author' key now that most instances were moved to the
     plural one in the previous cycle

  'kernel' crate:

   - New 'bug' module: add 'warn_on!' macro which reuses the existing
     'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and
     kernel parameters:

         warn_on!(value == 42);

     To avoid duplicating the assembly code, the same strategy is
     followed as for the static branch code in order to share the
     assembly between both C and Rust

     This required a few rearrangements on C arch headers -- the
     existing C macros should still generate the same outputs, thus no
     functional change expected there

   - 'workqueue' module: add delayed work items, including a
     'DelayedWork' struct, a 'impl_has_delayed_work!' macro and an
     'enqueue_delayed' method, e.g.:

         /// Enqueue the struct for execution on the system workqueue,
         /// where its value will be printed 42 jiffies later.
         fn print_later(value: Arc<MyStruct>) {
             let _ = workqueue::system().enqueue_delayed(value, 42);
         }

   - New 'bits' module: add support for 'bit' and 'genmask' functions,
     with runtime- and compile-time variants, e.g.:

         static_assert!(0b00010000 == bit_u8(4));
         static_assert!(0b00011110 == genmask_u8(1..=4));

         assert!(checked_bit_u32(u32::BITS).is_none());

   - 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which
     reads NUL-terminated strings from userspace into a '&CStr'

     Introduce 'UserPtr' newtype, similar in purpose to '__user' in C,
     to minimize mistakes handling userspace pointers, including mixing
     them up with integers and leaking them via the 'Debug' trait. Add
     it to the prelude, too

   - Start preparations for the replacement of our custom 'CStr' type
     with the analogous type in the 'core' standard library. This will
     take place across several cycles to make it easier. For this one,
     it includes a new 'fmt' module, using upstream method names and
     some other cleanups

     Replace 'fmt!' with a re-export, which helps Clippy lint properly,
     and clean up the found 'uninlined-format-args' instances

   - 'dma' module:

      - Clarify wording and be consistent in 'coherent' nomenclature

      - Convert the 'read!()' and 'write!()' macros to return a 'Result'

      - Add 'as_slice()', 'write()' methods in 'CoherentAllocation'

      - Expose 'count()' and 'size()' in 'CoherentAllocation' and add
        the corresponding type invariants

      - Implement 'CoherentAllocation::dma_handle_with_offset()'

   - 'time' module:

      - Make 'Instant' generic over clock source. This allows the
        compiler to assert that arithmetic expressions involving the
        'Instant' use 'Instants' based on the same clock source

      - Make 'HrTimer' generic over the timer mode. 'HrTimer' timers
        take a 'Duration' or an 'Instant' when setting the expiry time,
        depending on the timer mode. With this change, the compiler can
        check the type matches the timer mode

      - Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep
        function that will select an appropriate sleep method depending
        on the requested sleep time

      - Avoid 64-bit divisions on 32-bit hardware when calculating
        timestamps

      - Seal the 'HrTimerMode' trait. This prevents users of the
        'HrTimerMode' from implementing the trait on their own types

      - Pass the correct timer mode ID to 'hrtimer_start_range_ns()'

   - 'list' module: remove 'OFFSET' constants, allowing to remove
     pointer arithmetic; now 'impl_list_item!' invokes
     'impl_has_list_links!' or 'impl_has_list_links_self_ptr!'. Other
     simplifications too

   - 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a
     constant, which avoids exposing the type of the opaque pointer, and
     require 'into_foreign' to return non-null

     Remove the 'Either<L, R>' type as well. It is unused, and we want
     to encourage the use of custom enums for concrete use cases

   - 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types
     to allow them to be used in generic APIs

   - 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>';
     and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>'

   - 'Opaque' type: add 'cast_from' method to perform a restricted cast
     that cannot change the inner type and use it in callers of
     'container_of!'. Rename 'raw_get' to 'cast_into' to match it

   - 'rbtree' module: add 'is_empty' method

   - 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and
     'ARef', which are moved from the too general 'types' module which
     we want to reduce or eventually remove. Also fix a safety comment
     in 'static_lock_class'

  'pin-init' crate:

   - Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are
     now (pin-)initializers

   - Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()'

   - New 'zeroed()', a safe version of 'mem::zeroed()' and also provide
     it via 'Zeroable::zeroed()'

   - Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for
     'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for
     '"Rust"' and '"C"' ABIs and up to 20 arguments

   - Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E>
     [Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T'

   - Renamed 'zeroed()' to 'init_zeroed()'

   - Upstream dev news: improve CI more to deny warnings, use
     '--all-targets'. Check the synchronization status of the two
     '-next' branches in upstream and the kernel

  MAINTAINERS:

   - Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo
     Stoakes as reviewers (thanks everyone)

  And a few other cleanups and improvements"

* tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (76 commits)
  rust: Add warn_on macro
  arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  riscv/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  rust: kernel: move ARef and AlwaysRefCounted to sync::aref
  rust: sync: fix safety comment for `static_lock_class`
  rust: types: remove `Either<L, R>`
  rust: kernel: use `core::ffi::CStr` method names
  rust: str: add `CStr` methods matching `core::ffi::CStr`
  rust: str: remove unnecessary qualification
  rust: use `kernel::{fmt,prelude::fmt!}`
  rust: kernel: add `fmt` module
  rust: kernel: remove `fmt!`, fix clippy::uninlined-format-args
  scripts: rust: emit path candidates in panic message
  scripts: rust: replace length checks with match
  rust: list: remove nonexistent generic parameter in link
  rust: bits: add support for bits/genmask macros
  rust: list: remove OFFSET constants
  rust: list: add `impl_list_item!` examples
  rust: list: use fully qualified path
  ...

2 months agoi2c: muxes: mule: Fix an error handling path in mule_i2c_mux_probe()
Christophe JAILLET [Wed, 30 Jul 2025 19:38:02 +0000 (21:38 +0200)]
i2c: muxes: mule: Fix an error handling path in mule_i2c_mux_probe()

If an error occurs in the loop that creates the device adapters, then a
reference to 'dev' still needs to be released.

Use for_each_child_of_node_scoped() to both fix the issue and save one line
of code.

Fixes: d0f8e97866bf ("i2c: muxes: add support for tsd,mule-i2c multiplexer")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2 months agoi2c: Force DLL0945 touchpad i2c freq to 100khz
fangzhong.zhou [Sat, 2 Aug 2025 23:15:54 +0000 (07:15 +0800)]
i2c: Force DLL0945 touchpad i2c freq to 100khz

This patch fixes an issue where the touchpad cursor movement becomes
slow on the Dell Precision 5560. Force the touchpad freq to 100khz
as a workaround.

Tested on Dell Precision 5560 with 6.14 to 6.14.6. Cursor movement
is now smooth and responsive.

Signed-off-by: fangzhong.zhou <myth5@myth5.com>
[wsa: kept sorting and removed unnecessary parts from commit msg]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2 months agoi2c: apple: Drop default ARCH_APPLE in Kconfig
Sven Peter [Thu, 12 Jun 2025 21:11:29 +0000 (21:11 +0000)]
i2c: apple: Drop default ARCH_APPLE in Kconfig

When the first driver for Apple Silicon was upstreamed we accidentally
included `default ARCH_APPLE` in its Kconfig which then spread to almost
every subsequent driver. As soon as ARCH_APPLE is set to y this will
pull in many drivers as built-ins which is not what we want.
Thus, drop `default ARCH_APPLE` from Kconfig.

Signed-off-by: Sven Peter <sven@kernel.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2 months agoMerge tag 'i2c-host-6.17-pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi...
Wolfram Sang [Sun, 3 Aug 2025 20:25:12 +0000 (22:25 +0200)]
Merge tag 'i2c-host-6.17-pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow

i2c-host for v6.17, part 2

- apple: add support for Apple A7–A11, T2 chips
- qcom-geni: fix controller frequency mapping
- stm32f7: add DMA-safe transfer support
- tegra: use controller reset if device reset is missing
- tegra: remove unnecessary dma_sync*() calls

2 months agortc: ds1685: Update Joshua Kinard's email address.
Joshua Kinard [Mon, 21 Jul 2025 17:00:51 +0000 (13:00 -0400)]
rtc: ds1685: Update Joshua Kinard's email address.

I am switching my address to a personal domain, so need to update the
driver's files and the entry in MAINTAINERS.

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Link: https://lore.kernel.org/r/20250721170051.32407-1-kumba@gentoo.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: rv3032: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:35 +0000 (11:20 -0400)]
rtc: rv3032: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-15-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: rv3028: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:34 +0000 (11:20 -0400)]
rtc: rv3028: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-14-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: pcf8563: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:33 +0000 (11:20 -0400)]
rtc: pcf8563: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-13-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: pcf85063: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:32 +0000 (11:20 -0400)]
rtc: pcf85063: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-12-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: nct3018y: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:31 +0000 (11:20 -0400)]
rtc: nct3018y: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-11-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: max31335: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:30 +0000 (11:20 -0400)]
rtc: max31335: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-10-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: m41t80: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:29 +0000 (11:20 -0400)]
rtc: m41t80: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-9-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: hym8563: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:28 +0000 (11:20 -0400)]
rtc: hym8563: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-8-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: ds1307: convert from round_rate() to determine_rate()
Brian Masney [Thu, 10 Jul 2025 15:20:27 +0000 (11:20 -0400)]
rtc: ds1307: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-7-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: rv3028: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:26 +0000 (11:20 -0400)]
rtc: rv3028: fix incorrect maximum clock rate handling

When rv3028_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: f583c341a515f ("rtc: rv3028: add clkout support")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-6-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: pcf8563: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:25 +0000 (11:20 -0400)]
rtc: pcf8563: fix incorrect maximum clock rate handling

When pcf8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: a39a6405d5f94 ("rtc: pcf8563: add CLKOUT to common clock framework")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-5-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: pcf85063: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:24 +0000 (11:20 -0400)]
rtc: pcf85063: fix incorrect maximum clock rate handling

When pcf85063_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: 8c229ab6048b7 ("rtc: pcf85063: Add pcf85063 clkout control to common clock framework")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-4-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: nct3018y: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:23 +0000 (11:20 -0400)]
rtc: nct3018y: fix incorrect maximum clock rate handling

When nct3018y_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: 5adbaed16cc63 ("rtc: Add NCT3018Y real time clock driver")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-3-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: hym8563: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:22 +0000 (11:20 -0400)]
rtc: hym8563: fix incorrect maximum clock rate handling

When hym8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: dcaf038493525 ("rtc: add hym8563 rtc-driver")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-2-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agortc: ds1307: fix incorrect maximum clock rate handling
Brian Masney [Thu, 10 Jul 2025 15:20:21 +0000 (11:20 -0400)]
rtc: ds1307: fix incorrect maximum clock rate handling

When ds3231_clk_sqw_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.

Fixes: 6c6ff145b3346 ("rtc: ds1307: add clock provider support for DS3231")
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-1-33140bb2278e@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 months agoMerge tag 'pinctrl-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 2 Aug 2025 19:07:09 +0000 (12:07 -0700)]
Merge tag 'pinctrl-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "Nothing stands out, apart from maybe the interesting Eswin EIC7700, a
  RISC-V SoC I've never seen before.

  Core changes:

   - Open code PINCTRL_FUNCTION_DESC() instead of defining a complex
     macro only used in one place

   - Add pinmux_generic_add_pinfunction() helper and use this in a few
     drivers

  New drivers:

   - Amlogic S7, S7D and S6 pin control support

   - Eswin EIC7700 pin control support

   - Qualcomm PMIV0104, PM7550 and Milos pin control support

     Because of unhelpful numbering schemes, the Qualcomm driver now
     needs to start to rely on SoC codenames

   - STM32 HDP pin control support

   - Mediatek MT8189 pin control support

  Improvements:

   - Switch remaining pin control drivers over to the new GPIO set
     callback that provides a return value

   - Support RSVD (reserved) pins in the STM32 driver

   - Move many fixed assignments over to pinctrl_desc definitions

   - Handle multiple TLMM regions in the Qualcomm driver"

* tag 'pinctrl-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (105 commits)
  pinctrl: mediatek: Add pinctrl driver for mt8189
  dt-bindings: pinctrl: mediatek: Add support for mt8189
  pinctrl: aspeed-g6: Add PCIe RC PERST pin group
  pinctrl: ingenic: use pinmux_generic_add_pinfunction()
  pinctrl: keembay: use pinmux_generic_add_pinfunction()
  pinctrl: mediatek: moore: use pinmux_generic_add_pinfunction()
  pinctrl: airoha: use pinmux_generic_add_pinfunction()
  pinctrl: equilibrium: use pinmux_generic_add_pinfunction()
  pinctrl: provide pinmux_generic_add_pinfunction()
  pinctrl: pinmux: open-code PINCTRL_FUNCTION_DESC()
  pinctrl: ma35: use new GPIO line value setter callbacks
  MAINTAINERS: add Clément Le Goffic as STM32 HDP maintainer
  pinctrl: stm32: Introduce HDP driver
  dt-bindings: pinctrl: stm32: Introduce HDP
  pinctrl: qcom: Add Milos pinctrl driver
  dt-bindings: pinctrl: document the Milos Top Level Mode Multiplexer
  pinctrl: qcom: spmi: Add PM7550
  dt-bindings: pinctrl: qcom,pmic-gpio: Add PM7550 support
  pinctrl: qcom: spmi: Add PMIV0104
  dt-bindings: pinctrl: qcom,pmic-gpio: Add PMIV0104 support
  ...

2 months agomm: mempool: fix crash in mempool_free() for zero-minimum pools
Yadan Fan [Thu, 31 Jul 2025 18:14:45 +0000 (02:14 +0800)]
mm: mempool: fix crash in mempool_free() for zero-minimum pools

The mempool wake-up fix introduced in commit a5867a218d7c ("mm: mempool:
fix wake-up edge case bug for zero-minimum pools") inlined the
add_element() logic in mempool_free() to return the element to the
zero-minimum pool:

pool->elements[pool->curr_nr++] = element;

This causes crash, because mempool_init_node() does not initialize with
real allocation for zero-minimum pool, it only returns ZERO_SIZE_PTR to
the elements array which is unable to be dereferenced, and the
pre-allocation of this array never happened since the while test:

while (pool->curr_nr < pool->min_nr)

can never be satisfied as min_nr is zero, so the pool does not actually
reserve any buffer, the only way so far is to call alloc_fn() to get
buffer from SLUB, but if the memory is under high pressure the alloc_fn()
could never get any buffer, the waiting thread would be in an indefinite
loop of wake-sleep in a period until there is free memory to get.

This patch changes mempool_init_node() to allocate 1 element for the
elements array of zero-minimum pool, so that the pool will have reserved
buffer to use.  This will fix the crash issue and let the waiting thread
can get the reserved element when alloc_fn() failed to get buffer under
high memory pressure.

Also modify add_element() to support zero-minimum pool with simplifying
codes of zero-minimum handling in mempool_free().

Link: https://lkml.kernel.org/r/e01f00f3-58d9-4ca7-af54-bfa42fec9527@suse.com
Fixes: a5867a218d7c ("mm: mempool: fix wake-up edge case bug for zero-minimum pools")
Signed-off-by: Yadan Fan <ydfan@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: correct type for vmalloc vm_flags fields
Lorenzo Stoakes [Tue, 29 Jul 2025 11:49:06 +0000 (12:49 +0100)]
mm: correct type for vmalloc vm_flags fields

Several functions refer to the unfortunately named 'vm_flags' field when
referencing vmalloc flags, which happens to be the precise same name used
for VMA flags.

As a result these were erroneously changed to use the vm_flags_t type
(which currently is a typedef equivalent to unsigned long).

Currently this has no impact, but in future when vm_flags_t changes this
will result in issues, so change the type to unsigned long to account for
this.

[lorenzo.stoakes@oracle.com: fixup very disguised vmalloc flags parameter]
Link: https://lkml.kernel.org/r/e74dd8de-7e60-47ab-8a45-2c851f3c5d26@lucifer.local
Link: https://lkml.kernel.org/r/20250729114906.55347-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Harry Yoo <harry.yoo@oracle.com>
Closes: https://lore.kernel.org/all/aIgSpAnU8EaIcqd9@hyeyoo/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: fix major fault counting
Kairui Song [Mon, 28 Jul 2025 07:53:06 +0000 (15:53 +0800)]
mm/shmem, swap: fix major fault counting

If the swapin failed, don't update the major fault count.  There is a long
existing comment for doing it this way, now with previous cleanups, we can
finally fix it.

Link: https://lkml.kernel.org/r/20250728075306.12704-9-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: rework swap entry and index calculation for large swapin
Kairui Song [Mon, 28 Jul 2025 07:53:05 +0000 (15:53 +0800)]
mm/shmem, swap: rework swap entry and index calculation for large swapin

Instead of calculating the swap entry differently in different swapin
paths, calculate it early before the swap cache lookup and use that for
the lookup and later swapin.  And after swapin have brought a folio,
simply round it down against the size of the folio.

This is simple and effective enough to verify the swap value.  A folio's
swap entry is always aligned by its size.  Any kind of parallel split or
race is acceptable because the final shmem_add_to_page_cache ensures that
all entries covered by the folio are correct, and thus there will be no
data corruption.

This also prevents false positive cache lookup.  If a shmem read request's
index points to the middle of a large swap entry, previously, shmem will
try the swap cache lookup using the large swap entry's starting value
(which is the first sub swap entry of this large entry).  This will lead
to false positive lookup results if only the first few swap entries are
cached but the actual requested swap entry pointed by the index is
uncached.  This is not a rare event, as swap readahead always tries to
cache order 0 folios when possible.

And this shouldn't cause any increased repeated faults.  Instead, no
matter how the shmem mapping is split in parallel, as long as the mapping
still contains the right entries, the swapin will succeed.

The final object size and stack usage are also reduced due to simplified
code:

./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145)
Function                                     old     new   delta
shmem_swapin_folio                          4056    3911    -145
Total: Before=33242, After=33097, chg -0.44%

Stack usage (Before vs After):
mm/shmem.c:2314:12:shmem_swapin_folio   264     static
mm/shmem.c:2314:12:shmem_swapin_folio   256     static

And while at it, round down the index too if swap entry is round down.
The index is used either for folio reallocation or confirming the mapping
content.  In either case, it should be aligned with the swap folio.

Link: https://lkml.kernel.org/r/20250728075306.12704-8-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: simplify swapin path and result handling
Kairui Song [Mon, 28 Jul 2025 07:53:04 +0000 (15:53 +0800)]
mm/shmem, swap: simplify swapin path and result handling

Slightly tidy up the different handling of swap in and error handling for
SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices.  Now swapin will
always use either shmem_swap_alloc_folio or shmem_swapin_cluster, then
check the result.

Simplify the control flow and avoid a redundant goto label.

Link: https://lkml.kernel.org/r/20250728075306.12704-7-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
Kairui Song [Mon, 28 Jul 2025 07:53:03 +0000 (15:53 +0800)]
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO

For SWP_SYNCHRONOUS_IO devices, if a cache bypassing THP swapin failed due
to reasons like memory pressure, partially conflicting swap cache or ZSWAP
enabled, shmem will fallback to cached order 0 swapin.

Right now the swap cache still has a non-trivial overhead, and readahead
is not helpful for SWP_SYNCHRONOUS_IO devices, so we should always skip
the readahead and swap cache even if the swapin falls back to order 0.

So handle the fallback logic without falling back to the cached read.

Link: https://lkml.kernel.org/r/20250728075306.12704-6-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: tidy up swap entry splitting
Kairui Song [Mon, 28 Jul 2025 07:53:02 +0000 (15:53 +0800)]
mm/shmem, swap: tidy up swap entry splitting

Instead of keeping different paths of splitting the entry before the swap
in start, move the entry splitting after the swapin has put the folio in
swap cache (or set the SWAP_HAS_CACHE bit).  This way we only need one
place and one unified way to split the large entry.  Whenever swapin
brought in a folio smaller than the shmem swap entry, split the entry and
recalculate the entry and index for verification.

This removes duplicated codes and function calls, reduces LOC, and the
split is less racy as it's guarded by swap cache now.  So it will have a
lower chance of repeated faults due to raced split.  The compiler is also
able to optimize the coder further:

bloat-o-meter results with GCC 14:

With DEBUG_SECTION_MISMATCH (-fno-inline-functions-called-once):
./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-143 (-143)
Function                                     old     new   delta
shmem_swapin_folio                          2358    2215    -143
Total: Before=32933, After=32790, chg -0.43%

With !DEBUG_SECTION_MISMATCH:
add/remove: 0/1 grow/shrink: 1/0 up/down: 1069/-749 (320)
Function                                     old     new   delta
shmem_swapin_folio                          2871    3940   +1069
shmem_split_large_entry.isra                 749       -    -749
Total: Before=32806, After=33126, chg +0.98%

Since shmem_split_large_entry is only called in one place now. The
compiler will either generate more compact code, or inlined it for
better performance.

Link: https://lkml.kernel.org/r/20250728075306.12704-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: tidy up THP swapin checks
Kairui Song [Mon, 28 Jul 2025 07:53:01 +0000 (15:53 +0800)]
mm/shmem, swap: tidy up THP swapin checks

Move all THP swapin related checks under CONFIG_TRANSPARENT_HUGEPAGE, so
they will be trimmed off by the compiler if not needed.

And add a WARN if shmem sees a order > 0 entry when
CONFIG_TRANSPARENT_HUGEPAGE is disabled, that should never happen unless
things went very wrong.

There should be no observable feature change except the new added WARN.

Link: https://lkml.kernel.org/r/20250728075306.12704-4-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/shmem, swap: avoid redundant Xarray lookup during swapin
Kairui Song [Mon, 28 Jul 2025 07:53:00 +0000 (15:53 +0800)]
mm/shmem, swap: avoid redundant Xarray lookup during swapin

Patch series "mm/shmem, swap: bugfix and improvement of mTHP swap in", v6.

The current THP swapin path have several problems.  It may potentially
hang, may cause redundant faults due to false positive swap cache lookup,
and it issues redundant Xarray walks.  !CONFIG_TRANSPARENT_HUGEPAGE builds
may also contain unnecessary THP checks.

This series fixes all of the mentioned issues, the code should be more
robust and prepared for the swap table series.  Now 4 walks is reduced to
3 (get order & confirm, confirm, insert folio),
!CONFIG_TRANSPARENT_HUGEPAGE build overhead is also minimized, and comes
with a sanity check now.

The performance is slightly better after this series, sequential swap in
of 24G data from ZRAM, using transparent_hugepage_tmpfs=always (24 samples
each):

Before:         avg: 10.66s, stddev: 0.04
After patch 1:  avg: 10.58s, stddev: 0.04
After patch 2:  avg: 10.65s, stddev: 0.05
After patch 3:  avg: 10.65s, stddev: 0.04
After patch 4:  avg: 10.67s, stddev: 0.04
After patch 5:  avg: 9.79s,  stddev: 0.04
After patch 6:  avg: 9.79s,  stddev: 0.05
After patch 7:  avg: 9.78s,  stddev: 0.05
After patch 8:  avg: 9.79s,  stddev: 0.04

Several patches improve the performance by a little, which is about ~8%
faster in total.

Build kernel test showed very slightly improvement, testing with make -j48
with defconfig in a 768M memcg also using ZRAM as swap, and
transparent_hugepage_tmpfs=always (6 test runs):

Before:         avg: 3334.66s, stddev: 43.76
After patch 1:  avg: 3349.77s, stddev: 18.55
After patch 2:  avg: 3325.01s, stddev: 42.96
After patch 3:  avg: 3354.58s, stddev: 14.62
After patch 4:  avg: 3336.24s, stddev: 32.15
After patch 5:  avg: 3325.13s, stddev: 22.14
After patch 6:  avg: 3285.03s, stddev: 38.95
After patch 7:  avg: 3287.32s, stddev: 26.37
After patch 8:  avg: 3295.87s, stddev: 46.24

This patch (of 7):

Currently shmem calls xa_get_order to get the swap radix entry order,
requiring a full tree walk.  This can be easily combined with the swap
entry value checking (shmem_confirm_swap) to avoid the duplicated lookup
and abort early if the entry is gone already.  Which should improve the
performance.

Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250728075306.12704-3-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agox86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:30 +0000 (10:17 +0300)]
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations

For the most part ftrace uses text poking and can handle ROX memory.  The
only place that requires writable memory is create_trampoline() that
updates the allocated memory and in the end makes it ROX.

Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
for EXECMEM_FTRACE when configuration and CPU features allow that.

Link: https://lkml.kernel.org/r/20250713071730.4117334-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agox86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:29 +0000 (10:17 +0300)]
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations

x86::alloc_insn_page() always allocates ROX memory.

Instead of overriding this method, add EXECMEM_KPROBES entry in
execmem_info with pgprot set to PAGE_KERNEL_ROX and use ROX cache when
configuration and CPU features allow it.

Link: https://lkml.kernel.org/r/20250713071730.4117334-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: drop writable parameter from execmem_fill_trapping_insns()
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:28 +0000 (10:17 +0300)]
execmem: drop writable parameter from execmem_fill_trapping_insns()

After update of execmem_cache_free() that made memory writable before
updating it, there is no need to update read only memory, so the writable
parameter to execmem_fill_trapping_insns() is not needed.  Drop it.

Link: https://lkml.kernel.org/r/20250713071730.4117334-7-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:27 +0000 (10:17 +0300)]
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)

When execmem populates ROX cache it uses vmalloc(VM_ALLOW_HUGE_VMAP).
Although vmalloc falls back to allocating base pages if high order
allocation fails, it may happen that it still cannot allocate enough
memory.

Right now ROX cache is only used by modules and in majority of cases the
allocations happen at boot time when there's plenty of free memory, but
upcoming enabling ROX cache for ftrace and kprobes would mean that execmem
allocations can happen when the system is under memory pressure and a
failure to allocate large page worth of memory becomes more likely.

Fallback to regular vmalloc() if vmalloc(VM_ALLOW_HUGE_VMAP) fails.

Link: https://lkml.kernel.org/r/20250713071730.4117334-6-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: move execmem_force_rw() and execmem_restore_rox() before use
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:26 +0000 (10:17 +0300)]
execmem: move execmem_force_rw() and execmem_restore_rox() before use

to avoid static declarations.

Link: https://lkml.kernel.org/r/20250713071730.4117334-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: rework execmem_cache_free()
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:25 +0000 (10:17 +0300)]
execmem: rework execmem_cache_free()

Currently execmem_cache_free() ignores potential allocation failures that
may happen in execmem_cache_add().  Besides, it uses text poking to fill
the memory with trapping instructions before returning it to cache
although it would be more efficient to make that memory writable, update
it using memcpy and then restore ROX protection.

Rework execmem_cache_free() so that in case of an error it will defer
freeing of the memory to a delayed work.

With this the happy fast path will now change permissions to RW, fill the
memory with trapping instructions using memcpy, restore ROX permissions,
add the memory back to the free cache and clear the relevant entry in
busy_areas.

If any step in the fast path fails, the entry in busy_areas will be marked
as pending_free.  These entries will be handled by a delayed work and
freed asynchronously.

To make the fast path faster, use __GFP_NORETRY for memory allocations and
let asynchronous handler try harder with GFP_KERNEL.

Link: https://lkml.kernel.org/r/20250713071730.4117334-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: introduce execmem_alloc_rw()
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:24 +0000 (10:17 +0300)]
execmem: introduce execmem_alloc_rw()

Some callers of execmem_alloc() require the memory to be temporarily
writable even when it is allocated from ROX cache.  These callers use
execemem_make_temp_rw() right after the call to execmem_alloc().

Wrap this sequence in execmem_alloc_rw() API.

Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoexecmem: drop unused execmem_update_copy()
Mike Rapoport (Microsoft) [Sun, 13 Jul 2025 07:17:23 +0000 (10:17 +0300)]
execmem: drop unused execmem_update_copy()

Patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes", v3.

These patches enable use of EXECMEM_ROX_CACHE for ftrace and kprobes
allocations on x86.

They also include some ground work in execmem.

Since the execmem model for caching large ROX pages changed from the
initial assumption that the memory that is allocated from ROX cache is
always ROX to the current state where memory can be temporarily made RW
and then restored to ROX, we can stop using text poking to update it.
This also saves the hassle of trying lock text_mutex in
execmem_cache_free() when kprobes already hold that mutex.

This patch (of 8):

The execmem_update_copy() that used text poking was required when memory
allocated from ROX cache was always read-only.  Since now its permissions
can be switched to read-write there is no need in a function that updates
memory with text poking.

Remove it.

Link: https://lkml.kernel.org/r/20250713071730.4117334-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250713071730.4117334-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
Suren Baghdasaryan [Mon, 28 Jul 2025 17:53:55 +0000 (10:53 -0700)]
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped

By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.

Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock().  At that point, the VMA may be concurrently freed, and it
can be recycled by another process.  vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.

In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put().  vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.

The diagram depicting the race:
T1         T2         T3
==         ==         ==
lock_vma_under_rcu
  mas_walk
          <VMA gets removed from mm>
                      mmap
                        <the same VMA is reallocated>
  vma_start_read
    __refcount_inc_not_zero_limited_acquire
                      munmap
                        __vma_enter_locked
                          refcount_add_not_zero
  vma_end_read
    vma_refcount_put
      __refcount_dec_and_test
                          rcuwait_wait_event
                            <finish operation>
      rcuwait_wake_up [UAF]

Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1.  At this point T3 can exit and free the mm causing
UAF in T1.

To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.

[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+m2tA@mail.gmail.com/
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/rmap: add anon_vma lifetime debug check
Jann Horn [Fri, 25 Jul 2025 12:16:24 +0000 (14:16 +0200)]
mm/rmap: add anon_vma lifetime debug check

If an anon folio is mapped into userspace, its anon_vma must be alive,
otherwise rmap walks can hit UAF.

There have been syzkaller reports a few months ago[1][2] of UAF in rmap
walks that seems to indicate that there can be pages with elevated
mapcount whose anon_vma has already been freed, but I think we never
figured out what the cause is; and syzkaller only hit these UAFs when
memory pressure randomly caused reclaim to rmap-walk the affected pages,
so it of course didn't manage to create a reproducer.

Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous folios
to hopefully catch such issues more reliably.

[1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
[2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com

Link: https://lkml.kernel.org/r/20250725-anonvma-uaf-debug-v2-1-bc3c7e5ba5b1@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: remove mm/io-mapping.c
Lorenzo Stoakes [Fri, 25 Jul 2025 14:29:01 +0000 (15:29 +0100)]
mm: remove mm/io-mapping.c

This is dead code, which was used from commit b739f125e4eb ("i915: use
io_mapping_map_user") but reverted a month later by commit 0e4fe0c9f2f9
("Revert "i915: use io_mapping_map_user"") back in 2021.

Since then nobody has used it, so remove it.

[akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil]
Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agokhugepaged: optimize collapse_pte_mapped_thp() by PTE batching
Dev Jain [Thu, 24 Jul 2025 05:23:01 +0000 (10:53 +0530)]
khugepaged: optimize collapse_pte_mapped_thp() by PTE batching

Use PTE batching to batch process PTEs mapping the same large folio. An
improvement is expected due to batching mapcount manipulation on the
folios, and for arm64 which supports contig mappings, the number of
TLB flushes is also reduced.

Note that we do not need to make a change to the check
"if (folio_page(folio, i) != page)"; if i'th page of the folio is equal
to the first page of our batch, then i + 1, .... i + nr_batch_ptes - 1
pages of the folio will be equal to the corresponding pages of our
batch mapping consecutive pages.

Link: https://lkml.kernel.org/r/20250724052301.23844-4-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agokhugepaged: optimize __collapse_huge_page_copy_succeeded() by PTE batching
Dev Jain [Thu, 24 Jul 2025 05:23:00 +0000 (10:53 +0530)]
khugepaged: optimize __collapse_huge_page_copy_succeeded() by PTE batching

Use PTE batching to batch process PTEs mapping the same large folio. An
improvement is expected due to batching refcount-mapcount manipulation on
the folios, and for arm64 which supports contig mappings, the number of
TLB flushes is also reduced.

Link: https://lkml.kernel.org/r/20250724052301.23844-3-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: add get_and_clear_ptes() and clear_ptes()
David Hildenbrand [Thu, 24 Jul 2025 05:22:59 +0000 (10:52 +0530)]
mm: add get_and_clear_ptes() and clear_ptes()

Patch series "Optimizations for khugepaged", v4.

If the underlying folio mapped by the ptes is large, we can process those
ptes in a batch using folio_pte_batch().

For arm64 specifically, this results in a 16x reduction in the number of
ptep_get() calls, since on a contig block, ptep_get() on arm64 will
iterate through all 16 entries to collect a/d bits.  Next, ptep_clear()
will cause a TLBI for every contig block in the range via
contpte_try_unfold().  Instead, use clear_ptes() to only do the TLBI at
the first and last contig block of the range.

For split folios, there will be no pte batching; the batch size returned
by folio_pte_batch() will be 1.  For pagetable split folios, the ptes will
still point to the same large folio; for arm64, this results in the
optimization described above, and for other arches, a minor improvement is
expected due to a reduction in the number of function calls and batching
atomic operations.

This patch (of 3):

Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if
we are about to tear down a full MM.

Use get_and_clear_ptes() in existing code, clear_ptes() users will
be added next.

Link: https://lkml.kernel.org/r/20250724052301.23844-2-dev.jain@arm.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mincore: hold PTL in mincore_hugetlb
Jinjiang Tu [Thu, 24 Jul 2025 09:09:58 +0000 (17:09 +0800)]
mm/mincore: hold PTL in mincore_hugetlb

Hold PTL in mincore_hugetlb() to avoid operating on stale page, as
mincore_pte_range() have done.

Link: https://lkml.kernel.org/r/20250724090958.455887-4-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/memory-failure: hold PTL in hwpoison_hugetlb_range
Jinjiang Tu [Fri, 25 Jul 2025 03:31:12 +0000 (11:31 +0800)]
mm/memory-failure: hold PTL in hwpoison_hugetlb_range

Hold PTL in hwpoison_hugetlb_range() to avoid operating on stale page, as
hwpoison_pte_range() have done.

This change is not known to address any issues which users have
experienced.

Link: https://lkml.kernel.org/r/20250725033112.2690158-1-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mseal: rework mseal apply logic
Lorenzo Stoakes [Fri, 25 Jul 2025 08:29:45 +0000 (09:29 +0100)]
mm/mseal: rework mseal apply logic

The logic can be simplified - firstly by renaming the inconsistently named
apply_mm_seal() to mseal_apply().

We then wrap mseal_fixup() into the main loop as the logic is simple
enough to not require it, equally it isn't a hugely pleasant pattern in
mprotect() etc.  so it's not something we want to perpetuate.

We eliminate the need for invoking vma_iter_end() on each loop by directly
determining if the VMA was merged - the only thing we need concern
ourselves with is whether the start/end of the (gapless) range are offset
into VMAs.

This refactoring also avoids the rather horrid 'pass pointer to prev
around' pattern used in mprotect() et al.

No functional change intended.

Link: https://lkml.kernel.org/r/ddfa4376ce29f19a589d7dc8c92cb7d4f7605a4c.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Xu <jeffxu@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mseal: simplify and rename VMA gap check
Lorenzo Stoakes [Fri, 25 Jul 2025 08:29:44 +0000 (09:29 +0100)]
mm/mseal: simplify and rename VMA gap check

The check_mm_seal() function is doing something general - checking whether
a range contains only VMAs (or rather that it does NOT contain any
unmapped regions).

So rename this function to range_contains_unmapped().

Additionally simplify the logic, we are simply checking whether the last
vma->vm_end has either a VMA starting after it or ends before the end
parameter.

This check is rather dubious, so it is sensible to keep it local to
mm/mseal.c as at a later stage it may be removed, and we don't want any
other mm code to perform such a check.

No functional change intended.

[lorenzo.stoakes@oracle.com: add comment explaining why we disallow gaps on mseal()]
Link: https://lkml.kernel.org/r/d85b3d55-09dc-43ba-8204-b48267a96751@lucifer.local
Link: https://lkml.kernel.org/r/dd50984eff1e242b5f7f0f070a3360ef760e06b8.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mseal: small cleanups
Lorenzo Stoakes [Fri, 25 Jul 2025 08:29:43 +0000 (09:29 +0100)]
mm/mseal: small cleanups

Drop the wholly unnecessary set_vma_sealed() helper(), which is used only
once, and place VMA_ITERATOR() declarations in the correct place.

Retain vma_is_sealed(), and use it instead of the confusingly named
can_modify_vma(), so it's abundantly clear what's being tested, rather
then a nebulous sense of 'can the VMA be modified'.

No functional change intended.

Link: https://lkml.kernel.org/r/98cf28d04583d632a6eb698e9ad23733bb6af26b.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Xu <jeffxu@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mseal: update madvise() logic
Lorenzo Stoakes [Fri, 25 Jul 2025 08:29:42 +0000 (09:29 +0100)]
mm/mseal: update madvise() logic

The madvise() logic is inexplicably performed in mm/mseal.c - this ought
to be located in mm/madvise.c.

Additionally can_modify_vma_madv() is inconsistently named and, in
combination with is_ro_anon(), is very confusing logic.

Put a static function in mm/madvise.c instead - can_madvise_modify() -
that spells out exactly what's happening.  Also explicitly check for an
anon VMA.

Also add commentary to explain what's going on.

Essentially - we disallow discarding of data in mseal()'d mappings in
instances where the user couldn't otherwise write to that data.

We retain the existing behaviour here regarding MAP_PRIVATE mappings of
file-backed mappings, which entails some complexity - while this, strictly
speaking - appears to violate mseal() semantics, it may interact badly
with users which expect to be able to madvise(MADV_DONTNEED) .text
mappings for instance.

We may revisit this at a later date.

No functional change intended.

Link: https://lkml.kernel.org/r/492a98d9189646e92c8f23f4cce41ed323fe01df.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/mseal: always define VM_SEALED
Lorenzo Stoakes [Fri, 25 Jul 2025 08:29:41 +0000 (09:29 +0100)]
mm/mseal: always define VM_SEALED

Patch series "mseal cleanups", v4.

Perform a number of cleanups to the mseal logic.  Firstly, VM_SEALED is
treated differently from every other VMA flag, it really doesn't make
sense to do this, so we start by making this consistent with everything
else.

Next we place the madvise logic where it belongs - in mm/madvise.c.  It
really makes no sense to abstract this elsewhere.  In doing so, we go to
great lengths to explain very clearly the previously very confusing logic
as to what sealed mappings are impacted here.

In doing so, we retain existing logic regarding treatment of madvise()
discard operations for a sealed, read-only MAP_PRIVATE file-backed
mapping.  This is something we likely need to revisit.

We then abstract out and explain the 'are there are any gaps in this range
in the mm?' check being performed as a prerequisite to mseal being
performed.

Finally, we simplify the actual mseal logic which is really quite
straightforward.

No functional change is intended.

This patch (of 4):

There is no reason to treat VM_SEALED in a special way, in each other case
in which a VMA flag is unavailable due to configuration, we simply assign
that flag to VM_NONE, so make VM_SEALED consistent with all other VMA
flags in this respect.

Additionally, use the next available bit for VM_SEALED, 42, rather than
arbitrarily putting it at 63 and update the declaration to match all other
VMA flags.

No functional change intended.

Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon/vaddr: skip isolating folios already in destination nid
Bijan Tabatabai [Fri, 25 Jul 2025 16:33:00 +0000 (11:33 -0500)]
mm/damon/vaddr: skip isolating folios already in destination nid

damos_va_migrate_dests_add() determines the node a folio should be in
based on the struct damos_migrate_dests associated with the migration
scheme and adds the folio to the linked list corresponding to that node so
it can be migrated later.  Currently, folios are isolated and added to the
list even if they are already in the node they should be in.

In using damon weighted interleave more, I've found that the overhead of
needlessly adding these folios to the migration lists can be quite high.
The overhead comes from isolating folios and placing them in the migration
lists inside of damos_va_migrate_dests_add(), as well as the cost of
handling those folios in damon_migrate_pages().  This patch eliminates
that overhead by simply avoiding the addition of folios that are already
in their intended location to the migration list.

To show the benefit of this patch, we start the test workload and start a
DAMON instance attached to that workload with a migrate_hot scheme that
has one dest field sending data to the local node.  This way, we are only
measuring the overheads of the scheme, and not the cost of migrating
pages, since data will be allocated to the local node by default.  I
tested with two workloads: the embedding reduction workload used in [1]
and a microbenchmark that allocates 20GB of data then sleeps, which is
similar to the memory usage of the embedding reduction workload.

The time taken in damos_va_migrate_dests_add() and damon_migrate_pages()
each aggregation interval is shown below.

Before this patch:
                       damos_va_migrate_dests_add damon_migrate_pages
microbenchmark                   ~2ms                      ~3ms
embedding reduction              ~1s                       ~3s

After this patch:
                       damos_va_migrate_dests_add damon_migrate_pages
microbenchmark                    0us                      ~40us
embedding reduction               0us                      ~100us

I did not do an in depth analysis for why things are much slower in the
embedding reduction workload than the microbenchmark.  However, I assume
it's because the embedding reduction workload oversaturates the bandwidth
of the local memory node, increasing the memory access latency, and in
turn making the pointer chasing involved in iterating through a linked
list much slower.  Regardless of that, this patch results in a significant
speedup.

[1] https://lore.kernel.org/damon/20250709005952.17776-1-bijan311@gmail.com/

Link: https://lkml.kernel.org/r/20250725163300.4602-1-bijan311@gmail.com
Fixes: 19c1dc15c859 ("mm/damon/vaddr: use damos->migrate_dests in migrate_{hot,cold}")
Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoselftests: cachestat: add tests for mmap, refactor and enhance mmap test for cachesta...
Suresh K C [Wed, 9 Jul 2025 17:46:57 +0000 (23:16 +0530)]
selftests: cachestat: add tests for mmap, refactor and enhance mmap test for cachestat validation

Add a cohesive test case that verifies cachestat behavior with
memory-mapped files using mmap().  Also refactor the test logic to reduce
redundancy, improve error reporting, and clarify failure messages for both
shmem and mmap file types.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20250709174657.6916-1-suresh.k.chandrappa@gmail.com
Signed-off-by: Suresh K C <suresh.k.chandrappa@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Tested-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: add process info to bad rss-counter warning
Xuanye Liu [Wed, 23 Jul 2025 10:09:00 +0000 (18:09 +0800)]
mm: add process info to bad rss-counter warning

Enhance the debugging information in check_mm() by including the process
name and PID when reporting bad rss-counter states.  This helps identify
which process is associated with the memory accounting issue.

Link: https://lkml.kernel.org/r/20250723100901.1909683-1-liuqiye2025@163.com
Signed-off-by: Xuanye Liu <liuqiye2025@163.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agokasan: skip quarantine if object is still accessible under RCU
Jann Horn [Wed, 23 Jul 2025 14:59:19 +0000 (16:59 +0200)]
kasan: skip quarantine if object is still accessible under RCU

Currently, enabling KASAN masks bugs where a lockless lookup path gets a
pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
recycled and is insufficiently careful about handling recycled objects:
KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
queues, even when it can't actually detect UAF in these objects, and the
quarantine prevents fast recycling.

When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
after an RCU grace period and put them on the quarantine, while disabling
CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
but that hasn't actually been working.

I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
could only trigger this bug in a KASAN build by disabling
CONFIG_SLUB_RCU_DEBUG and applying this patch.

Link: https://lkml.kernel.org/r/20250723-kasan-tsbrcu-noquarantine-v1-1-846c8645976c@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page-flags: remove folio_start_writeback_keepwrite()
Joanne Koong [Tue, 22 Jul 2025 18:22:30 +0000 (11:22 -0700)]
mm/page-flags: remove folio_start_writeback_keepwrite()

Commit cd57b77197a4 ("ext4: Convert ext4_bio_write_page() to use a folio)
removed set_page_writeback_keepwrite() which was the last/only caller of
folio_start_writeback_keepwrite().

Link: https://lkml.kernel.org/r/20250722182230.2114587-1-joannelkoong@gmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoselftests/mm: add process_madvise() tests
wang lian [Mon, 21 Jul 2025 11:46:14 +0000 (19:46 +0800)]
selftests/mm: add process_madvise() tests

Add tests for process_madvise(), focusing on verifying behavior under
various conditions including valid usage and error cases.

[lianux.mm@gmail.com: v7]
Link: https://lkml.kernel.org/r/20250729113109.12272-1-lianux.mm@gmail.com
Link: https://lkml.kernel.org/r/20250729113109.12272-1-lianux.mm@gmail.com
Link: https://lkml.kernel.org/r/20250721114614.40996-1-lianux.mm@gmail.com
Signed-off-by: wang lian <lianux.mm@gmail.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Mark Brown <broonie@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: shmem: fix the shmem large folio allocation for the i915 driver
Baolin Wang [Thu, 31 Jul 2025 01:53:43 +0000 (09:53 +0800)]
mm: shmem: fix the shmem large folio allocation for the i915 driver

After commit acd7ccb284b8 ("mm: shmem: add large folio support for
tmpfs"), we extend the 'huge=' option to allow any sized large folios for
tmpfs, which means tmpfs will allow getting a highest order hint based on
the size of write() and fallocate() paths, and then will try each
allowable large order.

However, when the i915 driver allocates shmem memory, it doesn't provide
hint information about the size of the large folio to be allocated,
resulting in the inability to allocate PMD-sized shmem, which in turn
affects GPU performance.

Patryk added:

: In my tests, the performance drop ranges from a few percent up to 13%
: in Unigine Superposition under heavy memory usage on the CPU Core Ultra
: 155H with the Xe 128 EU GPU.  Other users have reported performance
: impact up to 30% on certain workloads.  Please find more in the
: regressions reports:
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
:
: I believe the change should be backported to all active kernel branches
: after version 6.12.

To fix this issue, we can use the inode's size as a write size hint in
shmem_read_folio_gfp() to help allocate PMD-sized large folios.

Link: https://lkml.kernel.org/r/f7e64e99a3a87a8144cc6b2f1dddf7a89c12ce44.1753926601.git.baolin.wang@linux.alibaba.com
Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
Suggested-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agotools/getdelays: add backward compatibility for taskstats version
Fan Yu [Thu, 31 Jul 2025 14:53:26 +0000 (22:53 +0800)]
tools/getdelays: add backward compatibility for taskstats version

Add version checks to print_delayacct() to handle differences in struct
taskstats across kernel versions.  Field availability depends on taskstats
version (t->version), corresponding to TASKSTATS_VERSION in kernel headers
(see include/uapi/linux/taskstats.h).

Version feature mapping:
- version >= 11  - supports COMPACT statistics
- version >= 13  - supports WPCOPY statistics
- version >= 14  - supports IRQ statistics
- version >= 16  - supports *_max and *_min delay statistics

This ensures the tool works correctly with both older and newer kernel
versions by conditionally printing fields based on the reported version.

eg.1
bash# grep -r "#define TASKSTATS_VERSION" /usr/include/linux/taskstats.h
"#define TASKSTATS_VERSION       10"
bash# ./getdelays -d -p 1
CPU                 count     real total  virtual total    delay total  delay average
                     7481     3786181709     3807098291       36393725          0.005ms
IO                  count    delay total  delay average
                      369     1116046035          3.025ms
SWAP                count    delay total  delay average
                        0              0          0.000ms
RECLAIM             count    delay total  delay average
                        0              0          0.000ms
THRASHING           count    delay total  delay average
                        0              0          0.000ms

eg.2
bash# grep -r "#define TASKSTATS_VERSION" /usr/include/linux/taskstats.h
"#define TASKSTATS_VERSION       14"
bash# ./getdelays -d -p 1
CPU                 count     real total  virtual total    delay total  delay average
                    68862   163474790046   174584722267    19962496806          0.290ms
IO                  count    delay total  delay average
                        0              0          0.000ms
SWAP                count    delay total  delay average
                        0              0          0.000ms
RECLAIM             count    delay total  delay average
                        0              0          0.000ms
THRASHING           count    delay total  delay average
                        0              0          0.000ms
COMPACT             count    delay total  delay average
                        0              0          0.000ms
WPCOPY              count    delay total  delay average
                        0              0          0.000ms
IRQ                 count    delay total  delay average
                        0              0          0.000ms

Link: https://lkml.kernel.org/r/20250731225326549CttJ7g9NfjTlaqBwl015T@zte.com.cn
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agokho: add test for kexec handover
Mike Rapoport (Microsoft) [Sun, 27 Jul 2025 08:37:33 +0000 (11:37 +0300)]
kho: add test for kexec handover

Testing kexec handover requires a kernel driver that will generate some
data and preserve it with KHO on the first boot and then restore that data
and verify it was preserved properly after kexec.

To facilitate such test, along with the kernel driver responsible for data
generation, preservation and restoration add a script that runs a kernel
in a VM with a minimal /init.  The /init enables KHO, loads a kernel image
for kexec and runs kexec reboot.  After the boot of the kexeced kernel,
the driver verifies that the data was properly preserved.

[rppt@kernel.org: fix section mismatch]
Link: https://lkml.kernel.org/r/aIiRC8fXiOXKbPM_@kernel.org
Link: https://lkml.kernel.org/r/20250727083733.2590139-1-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agodelaytop: enhance error logging and add PSI feature description
fan.yu9@zte.com.cn [Mon, 28 Jul 2025 08:28:34 +0000 (16:28 +0800)]
delaytop: enhance error logging and add PSI feature description

This patch improves error diagnostics and documentation for delaytop:

1) Enhanced error logging:
   - Added explicit error messages in critical failure paths
   - Implemented BOOL_FPRINT macro for robust output handling

2) PSI feature documentation:
   - Updated header comment to reflect PSI monitoring capability
   - Improved output formatting for PSI information

System Pressure Information: (avg10/avg60/avg300/total)
CPU some:       0.0%/   0.0%/   0.0%/     345(ms)
CPU full:       0.0%/   0.0%/   0.0%/       0(ms)
Memory full:    0.0%/   0.0%/   0.0%/       0(ms)
Memory some:    0.0%/   0.0%/   0.0%/       0(ms)
IO full:        0.0%/   0.0%/   0.0%/      65(ms)
IO some:        0.0%/   0.0%/   0.0%/      79(ms)
IRQ full:       0.0%/   0.0%/   0.0%/       0(ms)

Link: https://lkml.kernel.org/r/202507281628341752gMXCMN7S-Vz_LHYHum9r@zte.com.cn
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Acked-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agosamples: Kconfig: fix spelling mistake "instancess" -> "instances"
Colin Ian King [Thu, 24 Jul 2025 11:17:15 +0000 (12:17 +0100)]
samples: Kconfig: fix spelling mistake "instancess" -> "instances"

There is a spelling mistake in the SAMPLE_TRACE_ARRAY config. Fix it.

Link: https://lkml.kernel.org/r/20250724111715.141826-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>