www.infradead.org Git - nvme.git/log

Merge tag 'mm-nonmm-stable-2025-04-02-22-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more non-MM updates from Andrew Morton:
"One bugfix and a couple of small late-arriving updates"

* tag 'mm-nonmm-stable-2025-04-02-22-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
  lib/sort.c: add _nonatomic() variants with cond_resched()
  mailmap: add an entry for Nicolas Schier

Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

- The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

- The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

- The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

- The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

- Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...

Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM hotfixes from Andrew Morton:
"Five hotfixes. Three are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are for MM"

* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
  mm/hugetlb: move hugetlb_sysctl_init() to the __init section
  mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
  mm/hugetlb_vmemmap: fix memory loads ordering
  mm/userfaultfd: fix release hang over concurrent GUP

Merge tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.

- scx_select_cpu_dfl() fixes and cleanups.

- Synchronize tool/sched_ext with external scheduler repo. While this
   isn't a fix. There's no risk to the kernel and it's better if they
   stay synced closer.

* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: Sync with scx repo
  sched_ext: initialize built-in idle state before ops.init()
  sched_ext: create_dsq: Return -EEXIST on duplicate request
  sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
  sched_ext: idle: Fix return code of scx_select_cpu_dfl()

Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled

   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is
   enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
   configs as the function argument tracing requires. Just have the
   function argument tracing depend on PROBE_EVENTS_BTF_ARGS.

- Free module_delta for persistent ring buffer instance

   When an instance holds the persistent ring buffer, it allocates a
   helper array to hold the deltas between where modules are loaded on
   the last boot and the current boot. This array needs to be freed when
   the instance is freed.

- Add cond_resched() to loop in ftrace_graph_set_hash()

   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This loop is
   known to trigger soft lockup warnings and requires a cond_resched().
   The loop in ftrace_graph_set_hash() was missing it.

- Fix the event format verifier to include "%*p.." arguments

   To prevent events from dereferencing stale pointers that can happen
   if a trace event uses a dereferece pointer to something that was not
   copied into the ring buffer and can be freed by the time the trace is
   read, a verifier is called. At boot or module load, the verifier
   scans the print format string for pointers that can be dereferenced
   and it checks the arguments to make sure they do not contain
   something that can be freed. The "%*p" was not handled, which would
   add another argument and cause the verifier to not only not verify
   this pointer, but it will look at the wrong argument for every
   pointer after that.

- Fix mcount sorttable building for different endian type target

   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianess of the file and the host is used to
   determine if the bytes need to be swapped when calculations are done.
   A change was made to the sorting of the mcount_loc that read the
   values from the ELF file into an array and the swap happened on the
   filling of the array. But one of the calculations of the array still
   did the swap when it did not need to. This caused building on a
   little endian machine for a big endian target to not find the mcount
   function in the 'nm' table and it zeroed it out, causing there to be
   no functions available to trace.

- Add goto out_unlock jump to rv_register_monitor() on error path

   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.

* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix missing unlock on double nested monitors return path
  scripts/sorttable: Fix endianness handling in build-time mcount sort
  tracing: Verify event formats that have "%*p.."
  ftrace: Add cond_resched() to ftrace_graph_set_hash()
  tracing: Free module_delta on freeing of persistent ring buffer
  ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS

Merge tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire update from Takashi Sakamoto:
"A single commit to use the common helper function for on-stack
  trailing array to enqueue any isochronous packet by the requests
  from userspace applications"

* tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: avoid -Wflex-array-member-not-at-end warning

Merge tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

- dm-crypt: switch to using the crc32 library

- dm-verity, dm-integrity, dm-crypt: documentation improvement

- dm-vdo fixes

- dm-stripe: enable inline crypto passthrough

- dm-integrity: set ti->error on memory allocation failure

- dm-bufio: remove unused return value

- dm-verity: do forward error correction on metadata I/O errors

- dm: fix unconditional IO throttle caused by REQ_PREFLUSH

- dm cache: prevent BUG_ON by blocking retries on failed device resumes

- dm cache: support shrinking the origin device

- dm: restrict dm device size to 2^63-512 bytes

- dm-delay: support zoned devices

- dm-verity: support block number limits for different ioprio classes

- dm-integrity: fix non-constant-time tag verification (security bug)

- dm-verity, dm-ebs: fix prefetch-vs-suspend race

* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm-ebs: fix prefetch-vs-suspend race
  dm-verity: fix prefetch-vs-suspend race
  dm-integrity: fix non-constant-time tag verification
  dm-verity: support block number limits for different ioprio classes
  dm-delay: support zoned devices
  dm: restrict dm device size to 2^63-512 bytes
  dm cache: support shrinking the origin device
  dm cache: prevent BUG_ON by blocking retries on failed device resumes
  dm vdo indexer: reorder uds_request to reduce padding
  dm: fix unconditional IO throttle caused by REQ_PREFLUSH
  dm vdo: rework processing of loaded refcount byte arrays
  dm vdo: remove remaining ring references
  dm-verity: do forward error correction on metadata I/O errors
  dm-bufio: remove unused return value
  dm-integrity: set ti->error on memory allocation failure
  dm: Enable inline crypto passthrough for striped target
  dm vdo slab-depot: read refcount blocks in large chunks at load time
  dm vdo vio-pool: allow variable-sized metadata vios
  dm vdo vio-pool: support pools with multiple data blocks per vio
  dm vdo vio-pool: add a pool pointer to pooled_vio
  ...

Merge tag 'libnvdimm-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Ira Weiny:
"Most of the code changes are to remove dead code.

  The bug fixes are minor, Syzkaller and one for broken devices which
  are unlikely to be in the field. So no need to backport them.

   - two patches to remove dead code: nd_attach_ndns() and
     nd_region_conflict() have not been used since 2017 and 2019
     respectively

   - Fix divide-by-0 if device returns a broken LSA value

   - Fix Syzkaller reported bug"

* tag 'libnvdimm-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/labels: Fix divide error in nd_label_data_init()
  libnvdimm: Remove unused nd_attach_ndns
  libnvdimm: Remove unused nd_region_conflict
  acpi: nfit: fix narrowing conversion in acpi_nfit_ctl

Merge tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL)  updates from Dave Jiang:

- Add support for Global Persistent Flush (GPF)

- Cleanup of DPA partition metadata handling:
     - Remove the CXL_DECODER_MIXED enum that's not needed anymore
     - Introduce helpers to access resource and perf meta data
     - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
     - Make cxl_dpa_alloc() DPA partition number agnostic
     - Remove cxl_decoder_mode
     - Cleanup partition size and perf helpers

- Remove unused CXL partition values

- Add logging support for CXL CPER endpoint and port protocol errors:
     - Prefix protocol error struct and function names with cxl_
     - Move protocol error definitions and structures to a common location
     - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
     - Add support in GHES to process CXL CPER protocol errors
     - Process CXL CPER protocol errors
     - Add trace logging for CXL PCIe port RAS errors

- Remove redundant gp_port init

- Add validation of cxl device serial number

- CXL ABI documentation updates/fixups

- A series that uses guard() to clean up open coded mutex lockings and
   remove gotos for error handling.

- Some followup patches to support dirty shutdown accounting:
     - Add helper to retrieve DVSEC offset for dirty shutdown registers
     - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
     - Add support for dirty shutdown count via sysfs
     - cxl_test support for dirty shutdown

- A series to support CXL mailbox Features commands.

   Mostly in preparation for CXL EDAC code to utilize the Features
   commands. It's also in preparation for CXL fwctl support to utilize
   the CXL Features. The commands include "Get Supported Features", "Get
   Feature", and "Set Feature".

- A series to support extended linear cache support described by the
   ACPI HMAT table.

   The addition helps enumerate the cache and also provides additional
   RAS reporting support for configuration with extended linear cache.
   (and related fixes for the series).

- An update to cxl_test to support a 3-way capable CFMWS

- A documentation fix to remove unused "mixed mode"

* tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits)
  cxl/region: Fix the first aliased address miscalculation
  cxl/region: Quiet some dev_warn()s in extended linear cache setup
  cxl/Documentation: Remove 'mixed' from sysfs mode doc
  cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems
  cxl/test: Define a CFMWS capable of a 3 way HB interleave
  cxl/mem: Do not return error if CONFIG_CXL_MCE unset
  tools/testing/cxl: Set Shutdown State support
  cxl/pmem: Export dirty shutdown count via sysfs
  cxl/pmem: Rename cxl_dirty_shutdown_state()
  cxl/pci: Introduce cxl_gpf_get_dvsec()
  cxl/pci: Support Global Persistent Flush (GPF)
  cxl: Document missing sysfs files
  cxl: Plug typos in ABI doc
  cxl/pmem: debug invalid serial number data
  cxl/cdat: Remove redundant gp_port initialization
  cxl/memdev: Remove unused partition values
  cxl/region: Drop goto pattern of construct_region()
  cxl/region: Drop goto pattern in cxl_dax_region_alloc()
  cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()
  cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()
  ...

Merge tag 'usb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt driver updates for
  6.15-rc1. Included in here are:

   - Thunderbolt driver and core api updates for new hardware and
     features

   - usb-storage const array cleanups

   - typec driver updates

   - dwc3 driver updates

   - xhci driver updates and bugfixes

   - small USB documentation updates

   - usb cdns3 driver updates

   - usb gadget driver updates

   - other small driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (92 commits)
  thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer
  thunderbolt: Scan retimers after device router has been enumerated
  usb: host: cdns3: forward lost power information to xhci
  usb: host: xhci-plat: allow upper layers to signal power loss
  usb: xhci: change xhci_resume() parameters to explicit the desired info
  usb: cdns3-ti: run HW init at resume() if HW was reset
  usb: cdns3-ti: move reg writes to separate function
  usb: cdns3: call cdns_power_is_lost() only once in cdns_resume()
  usb: cdns3: rename hibernated argument of role->resume() to lost_power
  usb: xhci: tegra: rename `runtime` boolean to `is_auto_runtime`
  usb: host: xhci-plat: mvebu: use ->quirks instead of ->init_quirk() func
  usb: dwc3: Don't use %pK through printk
  usb: core: Don't use %pK through printk
  usb: gadget: aspeed: Add NULL pointer check in ast_vhub_init_dev()
  dt-bindings: usb: qcom,dwc3: Synchronize minItems for interrupts and -names
  usb: common: usb-conn-gpio: switch psy_cfg from of_node to fwnode
  usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running
  usb: xhci: Don't change the status of stalled TDs on failed Stop EP
  xhci: Avoid queuing redundant Stop Endpoint command for stalled endpoint
  xhci: Handle spurious events on Etron host isoc enpoints
  ...

Merge tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
"Here is the big set of serial and tty driver updates for 6.15-rc1.
  Include in here are the following:

   - more great tty layer cleanups from Jiri. Someday this will be done,
     but that's not going to be any year soon...

   - kdb debug driver reverts to fix a reported issue

   - lots of .dts binding updates for different devices with serial
     devices

   - lots of tiny updates and tweaks and a few bugfixes for different
     serial drivers.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  tty: serial: fsl_lpuart: Fix unused variable 'sport' build warning
  serial: stm32: do not deassert RS485 RTS GPIO prematurely
  serial: 8250: add driver for NI UARTs
  dt-bindings: serial: snps-dw-apb-uart: document RZ/N1 binding without DMA
  serial: icom: fix code format problems
  serial: sh-sci: Save and restore more registers
  tty: serial: pl011: remove incorrect of_match_ptr annotation
  dt-bindings: serial: snps-dw-apb-uart: Add support for rk3562
  tty: serial: lpuart: only disable CTS instead of overwriting the whole UARTMODIR register
  tty: caif: removed unused function debugfs_tx()
  serial: 8250_dma: terminate correct DMA in tx_dma_flush()
  tty: serial: fsl_lpuart: rename register variables more specifically
  tty: serial: fsl_lpuart: use port struct directly to simply code
  tty: serial: fsl_lpuart: Use u32 and u8 for register variables
  tty: serial: fsl_lpuart: disable transmitter before changing RS485 related registers
  tty: serial: 8250: Add Brainboxes XC devices
  dt-bindings: serial: fsl-lpuart: support i.MX94
  tty: serial: 8250: Add some more device IDs
  dt-bindings: serial: samsung: add exynos7870-uart compatible
  serial: 8250_dw: Comment possible corner cases in serial_out() implementation
  ...

Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
"Here is the big set of staging driver cleanups and updates for
  6.15-rc1.

  As expected, with the introduction of the gpib drivers, loads of
  cleanups and fixes showed up, with the huge majority of changes being
  for that chunk of drivers. This is good and shows that the community
  can fix up things in public when asked to. Also included in here are:

   - small sm750fb cleanups

   - tiny rtl8723bs cleanups

   - more vchiq_arm cleanups and changes, hopefully this will get out of
     staging soon

  All of these have been in linux-next for almost 2 weeks now with no
  reported issues"

* tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (76 commits)
  staging: rtl8723bs: fixed a unnecessary parentheses coding style issue
  staging: vchiq_arm: Improve initial VCHIQ connect
  staging: vchiq_arm: Create keep-alive thread during probe
  staging: vchiq_arm: Stop kthreads if vchiq cdev register fails
  staging: vchiq_arm: Fix possible NPR of keep-alive thread
  staging: vchiq_arm: Register debugfs after cdev
  staging: vchiq_arm: Don't use %pK through printk
  staging: rtl8723bs: select CONFIG_CRYPTO_LIB_AES
  staging: rtl8723bs: Remove some unused functions, macros, and structs
  staging: gpib: change return type of t1_delay function to report errors
  staging: gpib: remove commented-out lines
  staging: gpib: fix kernel-doc section for usb_gpib_line_status() function
  staging: gpib: fix kernel-doc section for function usb_gpib_interface_clear()
  staging: gpib: fix kernel-doc section for write_loop() function
  staging: gpib: Removing typedef for gpib_board
  staging: gpib: struct typing for gpib_gboard_t
  staging: gpib: tnt4882: struct gpib_board
  staging: gpib: tms9914: struct gpib_board
  staging: gpib: pc2: struct gpib_board
  staging: gpib: ni_usb_gpib: struct gpib_board
  ...

Merge tag 'char-misc-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
"Here are two counter driver fixes that I realized I never sent to you
  for 6.14-final.

  They have been in my for weeks, as well as linux-next, my fault for
  not sending them earlier. They are:

   - bugfix for stm32-lptimer-cnt counter driver

   - bugfix for microchip-tcb-capture counter driver

  Again, these have been in linux-next for weeks with no reported
  issues"

* tag 'char-misc-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  counter: microchip-tcb-capture: Fix undefined counter channel state on probe
  counter: stm32-lptimer-cnt: fix error handling when enabling

Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Bugfixes:

   - Three fixes for looping in the NFSv4 state manager delegation code

   - Fix for the NFSv4 state XDR code (Neil Brown)

   - Fix a leaked reference in nfs_lock_and_join_requests()

   - Fix a use-after-free in the delegation return code

  Features:

   - Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to
     allow monitoring of an in-progress copy

   - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
     getdents() call

   - SUNRPC now allows some basic management of an existing RPC client's
     connections using sysfs

   - Improvements to the automated teardown of a NFS client when the
     container it was initiated from gets killed

   - Improvements to prevent tasks from getting stuck in a killable wait
     state after calling exit_signals()"

* tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
  nfs: Add missing release on error in nfs_lock_and_join_requests()
  NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
  NFS: Don't allow waiting for exiting tasks
  SUNRPC: Don't allow waiting for exiting tasks
  NFSv4: Treat ENETUNREACH errors as fatal for state recovery
  NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
  NFSv4: Further cleanups to shutdown loops
  NFS: Shut down the nfs_client only after all the superblocks
  SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
  SUNRPC: rpcbind should never reset the port to the value '0'
  pNFS/flexfiles: Report ENETDOWN as a connection error
  pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
  NFS: Treat ENETUNREACH errors as fatal in containers
  NFS: Add a mount option to make ENETUNREACH errors fatal
  sunrpc: Add a sysfs file for one-step xprt deletion
  sunrpc: Add a sysfs file for adding a new xprt
  sunrpc: Add a sysfs files for rpc_clnt information
  sunrpc: Add a sysfs attr for xprtsec
  NFS: Add implid to sysfs
  NFS: Extend rdirplus mount option with "force|none"
  ...

Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

- Allow connection to server to time out (Joanne Koong)

- If server doesn't support creating a hard link, return EPERM rather
   than ENOSYS (Matt Johnston)

- Allow file names longer than 1024 chars (Bernd Schubert)

- Fix a possible race if request on io_uring queue is interrupted
   (Bernd Schubert)

- Misc fixes and cleanups

* tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unneeded atomic set in uring creation
  fuse: fix uring race condition for null dereference of fc
  fuse: Increase FUSE_NAME_MAX to PATH_MAX
  fuse: Allocate only namelen buf memory in fuse_notify_
  fuse: add default_request_timeout and max_request_timeout sysctls
  fuse: add kernel-enforced timeout option for requests
  fuse: optmize missing FUSE_LINK support
  fuse: Return EPERM rather than ENOSYS from link()
  fuse: removed unused function fuse_uring_create() from header
  fuse: {io-uring} Fix a possible req cancellation race

Merge tag 'ntfs3_for_6.15' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:

- Fix integer overflows on 32-bit systems and in hdr_first_de()

- Fix 'proc_info_root' leak on NTFS initialization failure

- Remove unused functions ni_load_attr, ntfs_sb_read, ntfs_flush_inodes

- update inode->i_mapping->a_ops on compression state

- ensure atomicity of write operations

- refactor ntfs_{create/remove}_{procdir,proc_root}()

* tag 'ntfs3_for_6.15' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Remove unused ntfs_flush_inodes
  fs/ntfs3: Remove unused ntfs_sb_read
  fs/ntfs3: Remove unused ni_load_attr
  fs/ntfs3: Prevent integer overflow in hdr_first_de()
  fs/ntfs3: Fix a couple integer overflows on 32bit systems
  fs/ntfs3: Update inode->i_mapping->a_ops on compression state
  fs/ntfs3: Fix WARNING in ntfs_extend_initialized_size
  fs/ntfs3: Fix 'proc_info_root' leak when init ntfs failed
  fs/ntfs3: Factor out ntfs_{create/remove}_proc_root()
  fs/ntfs3: Factor out ntfs_{create/remove}_procdir()
  fs/ntfs3: Keep write operations atomic

Merge tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Add a new maintainer for configfs

- Fix exportfs module description

- Place flexible array memeber at the end of an internal struct in the
   mount code

- Add new maintainer for netfslib as Jeff Layton is stepping down as
   current co-maintainer

- Fix error handling in cachefiles_get_directory()

- Cleanup do_notify_pidfd()

- Fix syscall number definitions in pidfd selftests

- Fix racy usage of fs_struct->in exec during multi-threaded exec

- Ensure correct exit code is reported when pidfs_exit() is called from
   release_task() for a delayed thread-group leader exit

- Fix conflicting iomap flag definitions

* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix conflicting values of iomap flags
  fs: namespace: Avoid -Wflex-array-member-not-at-end warning
  MAINTAINERS: configfs: add Andreas Hindborg as maintainer
  exportfs: add module description
  exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
  netfs: add Paulo as maintainer and remove myself as Reviewer
  cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
  exec: fix the racy usage of fs_struct->in_exec
  selftests/pidfd: fixes syscall number defines
  pidfs: cleanup the usage of do_notify_pidfd()

Merge tag 'pm-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Prevent cpufreq_update_limits() from crashing the kernel due to a NULL
  pointer dereference when it is called before registering a cpufreq
  driver, for instance as a result of a notification triggered by the
  platform firmware (Rafael Wysocki)"

* tag 'pm-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Reference count policy in cpufreq_update_limits()

Merge tag 'acpi-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
"These are fixes and cleanups on top of the previous ACPI material for
  6.15-rc1 merged recently:

   - Extend the Lenovo Yoga Tab 3 ACPI quirk to skip GPIO event-handlers
     along with ACPI AC and battery which makes it work with Linux when
     started in the Windows mode (Hans de Goede)

   - Prevent the ACPI processor idle driver from being used on systems
     without _CST and with invalid C2/C3 in FADT in order to restore its
     previous (and expected) behavior that has been altered
     inadvertently by a recent code change (Giovanni Gherdovich)

   - Skip ACPI IRQ override on ASUS Vivobook 14 X1404VAP to make the
     internal keyboard work on it (Paul Menzel)

   - Make the ACPI backlight driver handle fetching EDID passed as
     ACPI_TYPE_PACKAGE which is not specification-compliant, but has
     been encountered in the field (Gergo Koteles)

   - Simplify the aggregation of choices in the ACPI platform-profile
     driver which has been unlocked by recent modifications of that
     driver (Kurt Borja)

   - Use str_enabled_disabled() instead of hardcoded strings in the ACPI
     code related to NUMA (Thorsten Blum)

   - Add Intel OC Watchdog device IDs to non-PNP device list to prevent
     PNP from claiming the devices that carry these IDs in which case
     non-PNP drivers cannot bind to them (Diogo Ivo)"

* tag 'acpi-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Handle fetching EDID as ACPI_TYPE_PACKAGE
  ACPI: processor: idle: Return an error if both P_LVL{2,3} idle states are invalid
  ACPI: platform_profile: Optimize _aggregate_choices()
  ACPI: x86: Extend Lenovo Yoga Tab 3 quirk with skip GPIO event-handlers
  ACPI: resource: Skip IRQ override on ASUS Vivobook 14 X1404VAP
  ACPI: PNP: Add Intel OC Watchdog IDs to non-PNP device list
  ACPI: NUMA: Use str_enabled_disabled() helper function

Merge tag 'powerpc-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:

- Remove the IBM CAPI (cxl) driver

Thanks to Andrew Donnellan.

* tag 'powerpc-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
docs: Fix references to IBM CAPI (cxl) removal version
cxl: Remove driver

Merge tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Johannes Berg:

- proper nofault accesses and read-only rodata

- hostfs fix for host inode number reuse

- fixes for host errno handling

- various cleanups/small fixes

* tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Rewrite the sigio workaround based on epoll and tgkill
  um: Prohibit the VM_CLONE flag in run_helper_thread()
  um: Switch to the pthread-based helper in sigio workaround
  um: ubd: Switch to the pthread-based helper
  um: Add pthread-based helper support
  um: x86: clean up elf specific definitions
  um: Store full CSGSFS and SS register from mcontext
  um: virt-pci: Refactor virtio_pcidev into its own module
  um: work around sched_yield not yielding in time-travel mode
  um/locking: Remove semicolon from "lock" prefix
  um: Update min_low_pfn to match changes in uml_reserved
  um: use str_yes_no() to remove hardcoded "yes" and "no"
  um: hostfs: avoid issues on inode number reuse by host
  um: Allocate vdso page pointer statically
  um: remove copy_from_kernel_nofault_allowed
  um: mark rodata read-only and implement _nofault accesses
  um: Pass the correct Rust target and options with gcc

Merge tag 'loongarch-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

- Always select HAVE_VIRT_CPU_ACCOUNTING_GEN

- Enable UBSAN (Undefined Behavior Sanitizer)

- Increase MAX_IO_PICS up to 8

- Increase ARCH_DMA_MINALIGN up to 16

- Fix and improve BPF JIT

- Fix and improve vDSO implementation

- Update the default config file

- Some bug fixes and other small changes

* tag 'loongarch-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Update Loongson-3 default config file
  LoongArch: vDSO: Make use of the t8 register for vgetrandom-chacha
  LoongArch: vDSO: Remove --hash-style=sysv
  LoongArch: BPF: Don't override subprog's return value
  LoongArch: BPF: Use move_addr() for BPF_PSEUDO_FUNC
  LoongArch: BPF: Fix off-by-one error in build_prologue()
  LoongArch: Rework the arch_kgdb_breakpoint() implementation
  LoongArch: Fix device node refcount leak in fdt_cpu_clk_init()
  LoongArch: Increase ARCH_DMA_MINALIGN up to 16
  LoongArch: Increase MAX_IO_PICS up to 8
  LoongArch: Fix help text of CMDLINE_EXTEND in Kconfig
  LoongArch: Enable UBSAN (Undefined Behavior Sanitizer)
  LoongArch: Always select HAVE_VIRT_CPU_ACCOUNTING_GEN
  rust: Fix enabling Rust and building with GCC for LoongArch

tools/sched_ext: Sync with scx repo

Synchronize with https://github.com/sched-ext/scx at dc44584874f0 ("kernel:
Synchronize with kernel tools/sched_ext").

- READ/WRITE_ONCE() is made more proper and READA_ONCE_ARENA() is dropped.

- scale_by_task_weight[_inverse]() helpers added.

- Enum defs expanded to cover more and new enums.

- Don't trigger fatal error when some enums are missing from kernel BTF.

Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 TDX updates from Dave Hansen:
"Avoid direct HLT instruction execution in TDX guests.

  TDX guests aren't expected to use the HLT instruction directly. It
  causes a virtualization exception (#VE). While the #VE _can_ be
  handled, the current handling is slow and buggy and the easiest thing
  is just to avoid HLT in the first place. Plus, the kernel already has
  paravirt infrastructure that makes it relatively painless.

  Make TDX guests require paravirt and add some TDX-specific paravirt
  handlers which avoid HLT in the normal halt routines. Also add a
  warning in case another HLT sneaks in.

  There was a report that this leads to a "major performance
  improvement" on specjbb2015, probably because of the extra #VE
  overhead or missed wakeups from the buggy HLT handling"

* tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Emit warning if IRQs are enabled during HLT #VE handling
  x86/tdx: Fix arch_safe_halt() execution for TDX VMs
  x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT

Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
"These are objtool fixes and updates by Josh Poimboeuf, centered around
  the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which,
  despite its default-off nature, increased the profile/impact of
  objtool warnings:

   - Improve error handling and the presentation of warnings/errors

   - Revert the new summary warning line that some test-bot tools
     interpreted as new regressions

   - Fix a number of objtool warnings in various drivers, core kernel
     code and architecture code. About half of them are potential
     problems related to out-of-bounds accesses or potential undefined
     behavior, the other half are additional objtool annotations

   - Update objtool to latest (known) compiler quirks and objtool bugs
     triggered by compiler code generation

   - Misc fixes"

* tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  objtool/loongarch: Add unwind hints in prepare_frametrace()
  rcu-tasks: Always inline rcu_irq_work_resched()
  context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()
  sched/smt: Always inline sched_smt_active()
  objtool: Fix verbose disassembly if CROSS_COMPILE isn't set
  objtool: Change "warning:" to "error: " for fatal errors
  objtool: Always fail on fatal errors
  Revert "objtool: Increase per-function WARN_FUNC() rate limit"
  objtool: Append "()" to function name in "unexpected end of section" warning
  objtool: Ignore end-of-section jumps for KCOV/GCOV
  objtool: Silence more KCOV warnings, part 2
  objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC
  objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions
  objtool: Fix segfault in ignore_unreachable_insn()
  objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv()
  objtool, lkdtm: Obfuscate the do_nothing() pointer
  objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc()
  objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler()
  objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store()
  objtool, panic: Disable SMAP in __stack_chk_fail()
  ...

Merge tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull more printk updates from Petr Mladek:

- Silence warnings about candidates for ‘gnu_print’ format attribute

* tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: Silence false positive GCC warning for va_format()
  vsnprintf: Drop unused const char fmt * in va_format()
  vsnprintf: Mark binary printing functions with __printf() attribute
  tracing: Mark binary printing functions with __printf() attribute
  seq_file: Mark binary printing functions with __printf() attribute
  seq_buf: Mark binary printing functions with __printf() attribute

Merge tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Boqun Feng:

- srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT

* tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT

Merge tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
"Two cleanups this cycle. The larger of which is the removal of a
  private allocator within kdb and replacing it with regular memory
  allocation. The other adopts the simplified version of strscpy() in a
  couple of places in kdb"

* tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Remove optional size arguments from strscpy() calls
  kdb: remove usage of static environment buffer

mm/page_alloc: Fix try_alloc_pages

Fix an obvious bug. try_alloc_pages() should set_page_refcounted.

[ Not so obvious: it was probably correct at the time it was written but
  was at some point then rebased on top of v6.14-rc1.

  And at that point there was a semantic conflict with commit
  efabfe1420f5 ("mm/page_alloc: move set_page_refcounted() to callers
  of get_page_from_freelist()") and became buggy.
- Linus ]

Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil BAbka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'v6.15-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:

- revert the multibuffer hash testing as it is buggy

* tag 'v6.15-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "crypto: testmgr - Add multibuffer hash testing"

nfs: Add missing release on error in nfs_lock_and_join_requests()

Call nfs_release_request() on this error path before returning.

Fixes: c3f2235782c3 ("nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/3aaaa3d5-1c8a-41e4-98c7-717801ddd171@stanley.mountain
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

rv: Fix missing unlock on double nested monitors return path

RV doesn't support nested monitors having children monitors themselves
and exits with the EINVAL code. However, it returns without unlocking
the rv_interface_lock.

Unlock the lock before returning from the initialisation function.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250402071351.19864-2-gmonaco@redhat.com
Fixes: cb85c660fcd4 ("rv: Add option for nested monitors and include sched")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202503310200.UBXGitB4-lkp@intel.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

scripts/sorttable: Fix endianness handling in build-time mcount sort

Kernel cross-compilation with BUILDTIME_MCOUNT_SORT produces zeroed
mcount values if the build-host endianness does not match the ELF
file endianness.

The mcount values array is converted from ELF file
endianness to build-host endianness during initialization in
fill_relocs()/fill_addrs(). Avoid extra conversion of these values during
weak-function zeroing; otherwise, they do not match nm-parsed addresses
and all mcount values are zeroed out.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/patch.git-dca31444b0f1.your-ad-here.call-01743554658-ext-8692@work.hours
Fixes: ef378c3b8233 ("scripts/sorttable: Zero out weak functions in mcount_loc table")
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/your-ad-here.call-01743522822-ext-4975@work.hours/
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Verify event formats that have "%*p.."

The trace event verifier checks the formats of trace events to make sure
that they do not point at memory that is not in the trace event itself or
in data that will never be freed. If an event references data that was
allocated when the event triggered and that same data is freed before the
event is read, then the kernel can crash by reading freed memory.

The verifier runs at boot up (or module load) and scans the print formats
of the events and checks their arguments to make sure that dereferenced
pointers are safe. If the format uses "%*p.." the verifier will ignore it,
and that could be dangerous. Cover this case as well.

Also add to the sample code a use case of "%*pbl".

Link: https://lore.kernel.org/all/bcba4d76-2c3f-4d11-baf0-02905db953dd@oracle.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Link: https://lore.kernel.org/20250327195311.2d89ec66@gandalf.local.home
Reported-by: Libo Chen <libo.chen@oracle.com>
Reviewed-by: Libo Chen <libo.chen@oracle.com>
Tested-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Add cond_resched() to ftrace_graph_set_hash()

When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.

Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.

This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced.

Cc: stable@vger.kernel.org
Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
Link: https://lore.kernel.org/tencent_3E06CE338692017B5809534B9C5C03DA7705@qq.com
Signed-off-by: zhoumin <teczm@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Free module_delta on freeing of persistent ring buffer

If a persistent ring buffer is created, a "module_delta" array is also
allocated to hold the module deltas of loaded modules that match modules
in the scratch area. If this buffer gets freed, the module_delta array is
not freed and causes a memory leak.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250401124525.1f9ac02a@gandalf.local.home
Fixes: 35a380ddbc65 ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS

The option PROBE_EVENTS_BTF_ARGS enables the functions
btf_find_func_proto() and btf_get_func_param() which are used by the
function argument tracing code. The option FUNCTION_TRACE_ARGS was
dependent on the same configs that PROBE_EVENTS_BTF_ARGS was dependent on,
but it was also dependent on PROBE_EVENTS_BTF_ARGS. In fact, if
PROBE_EVENTS_BTF_ARGS is supported then FUNCTION_TRACE_ARGS is supported.

Just make FUNCTION_TRACE_ARGS depend on PROBE_EVENTS_BTF_ARGS.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250401113601.17fa1129@gandalf.local.home
Fixes: 533c20b062d7c ("ftrace: Add print_function_args()")
Closes: https://lore.kernel.org/all/DB9PR08MB75820599801BAD118D123D7D93AD2@DB9PR08MB7582.eurprd08.prod.outlook.com/
Reported-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge branches 'acpi-video', 'acpi-platform-profile' and 'acpi-misc'

Merge an ACPI backlight (video) driver fix, an ACPI platform-profile
driver optimization, and a miscellaneous ACPI-related cleanup for
6.15-rc1:

- Make the ACPI backlight driver handle fetching EDID as
   ACPI_TYPE_PACKAGE which is not specification-compliant, but
   has been encountered in the field (Gergo Koteles).

- Simplify the aggregation of choices in the ACPI platform-profile
   driver which has become possible after recent modifications of that
   driver (Kurt Borja).

- Use str_enabled_disabled() instead of hardcoded strings in the ACPI
   code related to NUMA (Thorsten Blum).

* acpi-video:
  ACPI: video: Handle fetching EDID as ACPI_TYPE_PACKAGE

* acpi-platform-profile:
  ACPI: platform_profile: Optimize _aggregate_choices()

* acpi-misc:
  ACPI: NUMA: Use str_enabled_disabled() helper function

Merge branches 'acpi-x86', 'acpi-processor' and 'acpi-resource'

Merge an x86-specific ACPI fix, an ACPI processor driver fix, and a new
ACPI resources management quirk for 6.15-rc1:

- Extend the Lenovo Yoga Tab 3 ACPI quirk to skip GPIO event-handlers
   along with ACPI AC and battery which makes it work with Linux when
   started in the Windows mode (Hans de Goede).

- Prevent the ACPI processor idle driver from being used on systems
   without _CST and with invalid C2/C3 in FADT in order to restore its
   previous (and expected) behavior that has been altered inadvertently
   by a recent code change (Giovanni Gherdovich).

- Skip ACPI IRQ override on ASUS Vivobook 14 X1404VAP to make the
   internal keyboard work on it (Paul Menzel).

* acpi-x86:
  ACPI: x86: Extend Lenovo Yoga Tab 3 quirk with skip GPIO event-handlers

* acpi-processor:
  ACPI: processor: idle: Return an error if both P_LVL{2,3} idle states are invalid

* acpi-resource:
  ACPI: resource: Skip IRQ override on ASUS Vivobook 14 X1404VAP

docs: Fix references to IBM CAPI (cxl) removal version

The IBM CAPI (cxl) driver was removed in 6.15, not 6.14.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Rather tiny pull request, mostly so that we can get into our trees
  your fix to the x86 Makefile.

  Current release - regressions:

   - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
     queue accounting was missed

  Current release - new code bugs:

   - 5 fixes for the netdevice instance locking work

  Previous releases - regressions:

   - usbnet: restore usb%d name exception for local mac addresses

  Previous releases - always broken:

   - rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
     spurious GET_LINK failures

   - eth: mana: Switch to page pool for jumbo frames

   - phy: broadcom: Correct BCM5221 PHY model detection

  Misc:

   - selftests: drv-net: replace helpers for referring to other files"

* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
  Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
  bnxt_en: bring back rtnl lock in bnxt_shutdown
  eth: gve: add missing netdev locks on reset and shutdown paths
  selftests: mptcp: ignore mptcp_diag binary
  selftests: mptcp: close fd_in before returning in main_loop
  selftests: mptcp: fix incorrect fd checks in main_loop
  mptcp: fix NULL pointer in can_accept_new_subflow
  octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
  octeontx2-af: Fix mbox INTR handler when num VFs > 64
  net: fix use-after-free in the netdev_nl_sock_priv_destroy()
  selftests: net: use Path helpers in ping
  selftests: net: use the dummy bpf from net/lib
  selftests: drv-net: replace the rpath helper with Path objects
  net: lapbether: use netdev_lockdep_set_classes() helper
  net: phy: broadcom: Correct BCM5221 PHY model detection
  net: usb: usbnet: restore usb%d name exception for local mac addresses
  net/mlx5e: SHAMPO, Make reserved size independent of page size
  net: mana: Switch to page pool for jumbo frames
  MAINTAINERS: Add dedicated entries for phy_link_topology
  net: move replay logic to tc_modify_qdisc
  ...

Merge tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Relax IGD support code to match display class device rather than
   specifically requiring a VGA device (Tomita Moeko)

- Accelerate DMA mapping of device MMIO by iterating at PMD and PUD
   levels to take advantage of huge pfnmap support added in v6.12
   (Alex Williamson)

- Extend virtio vfio-pci variant driver to include migration support
   for block devices where enabled by the PF (Yishai Hadas)

- Virtualize INTx PIN register for devices where the platform does not
   route legacy PCI interrupts for the device and the interrupt is
   reported as IRQ_NOTCONNECTED (Alex Williamson)

* tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Handle INTx IRQ_NOTCONNECTED
  vfio/virtio: Enable support for virtio-block live migration
  vfio/type1: Use mapping page mask for pfnmaps
  mm: Provide address mask in struct follow_pfnmap_args
  vfio/type1: Use consistent types for page counts
  vfio/type1: Use vfio_batch for vaddr_get_pfns()
  vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch
  vfio/type1: Catch zero from pin_user_pages_remote()
  vfio/pci: match IGD devices in display controller class

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place:

   - shutdown has been reworked to reset devices

   - virtio fs is now allowed in vduse

   - vhost-scsi memory use has been reduced

   - cleanups, fixes all over the place

  A couple more fixes are being tested and will be merged after rc1"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: Reduce response iov mem use
  vhost-scsi: Allocate iov_iter used for unaligned copies when needed
  vhost-scsi: Stop duplicating se_cmd fields
  vhost-scsi: Dynamically allocate scatterlists
  vhost-scsi: Return queue full for page alloc failures during copy
  vhost-scsi: Add better resource allocation failure handling
  vhost-scsi: Allocate T10 PI structs only when enabled
  vhost-scsi: Reduce mem use by moving upages to per queue
  vduse: add virtio_fs to allowed dev id
  sound/virtio: Fix cancel_sync warnings on uninitialized work_structs
  vdpa/mlx5: Fix oversized null mkey longer than 32bit
  vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines
  vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint
  tools: virtio/linux/module.h add MODULE_DESCRIPTION() define.
  tools: virtio/linux/compiler.h: Add data_race() define.
  tools/virtio: Add DMA_MAPPING_ERROR and sg_dma_len api define for virtio test
  virtio: break and reset virtio devices on device_shutdown()

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
"Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...

Merge tag 'edac_urgent_for_v6.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC documentation fix from Borislav Petkov:

- A single fix making sure the EDAC subtree is included in the
documentation table of contents

* tag 'edac_urgent_for_v6.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
Documentation/EDAC: Fix warning document isn't included in any toctree

Merge branch 'topic/cxl' into next

This merges in the removal of the IBM CAPI "cxl" driver.

Merge tag 'thermal-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more thermal control updates from Rafael Wysocki:
"These are mostly assorted updates of thermal drivers used on ARM
  platforms:

   - Use dev_err_probe() helpers to simplify the init code in the Qoriq
     thermal driver (Frank Li)

   - Power down the Qoriq's TMU at suspend time (Alice Guo)

   - Add ipq5332, ipq5424 compatible to the QCom's tsens thermal driver
     and TSENS enable / calibration support for V2 (Praveenkumar I)

   - Add missing rk3328 mapping entry (Trevor Woerner)

   - Remove duplicate struct declaration from the thermal core header
     file (Xueqin Luo)

   - Disable the monitoring mode during suspend in the LVTS Mediatek
     driver to prevent temperature acquisition glitches (Nícolas F. R.
     A. Prado)

   - Disable Stage 3 thermal threshold in the LVTS Mediatek driver
     because it disables the suspend ability and does not have an
     interrupt handler (Nícolas F. R. A. Prado)

   - Fix low temperature offset interrupt in the LVTS Mediatek driver to
     prevent multiple interrupts from triggering when the system is at
     its normal functionning temperature (Nícolas F. R. A. Prado)

   - Enable interrupts in the LVTS Mediatek driver only on sensors that
     are in use (Nícolas F. R. A. Prado)

   - Add the BCM74110 compatible DT binding and the corresponding code
     to support a chip based on a different process node than previous
     chips (Florian Fainelli)

   - Correct indentation and style in DTS example (Krzysztof Kozlowski)

   - Unify hexadecimal annotatation in the rcar_gen3 driver (Niklas
     Söderlund)

   - Factor out the code logic to read fuses on Gen3 and Gen4 in the
     rcar_gen3 thermal driver (Niklas Söderlund)

   - Drop unused driver data from the QCom's spmi temperature alarm
     driver (Johan Hovold)"

* tag 'thermal-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/drivers/qcom-spmi-temp-alarm: Drop unused driver data
  thermal: rcar_gen3: Reuse logic to read fuses on Gen3 and Gen4
  thermal: rcar_gen3: Use lowercase hex constants
  dt-bindings: thermal: Correct indentation and style in DTS example
  thermal/drivers/brcmstb_thermal: Add support for BCM74110
  dt-bindings: thermal: Update for BCM74110
  thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors
  thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
  thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold
  thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
  thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
  thermal: core: Remove duplicate struct declaration
  thermal/drivers/rockchip: Add missing rk3328 mapping entry
  thermal/drivers/tsens: Add TSENS enable and calibration support for V2
  dt-bindings: thermal: tsens: Add ipq5332, ipq5424 compatible
  thermal/drivers/qoriq: Power down TMU on system suspend
  thermal/drivers/qoriq: Use dev_err_probe() simplify the code

Merge tag 'i3c/for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
"The silvaco driver gets support for the integration of the IP in the
  Nuvoton npcm845 SoC. There is also a fix for a possible NULL pointer
  dereference that can happen with early IBIs. Summary:

  Core:

    - Fix a possible NULL pointer dereference due to IBI coming when the
      target driver is not yet probed.

  Drivers:

    - mipi-i3c-hci: Use I2C DMA-safe api

    - svc: add Nuvoton npcm845 support"

* tag 'i3c/for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: Add NULL pointer check in i3c_master_queue_ibi()
  i3c: master: Drop duplicate check before calling OF APIs
  i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work()
  i3c: master: svc: Fix missing STOP for master request
  i3c: master: svc: Use readsb helper for reading MDB
  i3c: master: svc: Fix missing the IBI rules
  i3c: master: svc: Fix i3c_master_get_free_addr return check
  i3c: master: svc: Fix npcm845 DAA process corruption
  i3c: master: svc: Fix npcm845 invalid slvstart event
  i3c: master: svc: Fix npcm845 FIFO empty issue
  i3c: master: svc: Add support for Nuvoton npcm845 i3c
  dt-bindings: i3c: silvaco: Add npcm845 compatible string
  dt-bindings: i3c: dw: Add power-domains
  i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA)
  i3c: mipi-i3c-hci: Use I2C DMA-safe api
  i3c: Remove the const qualifier from i2c_msg pointer in i2c_xfers API
  MAINTAINERS: Add Frank Li to Silvaco I3C
  MAINTAINERS: Remove Conor Culhane from Silvaco I3C

Merge tag 'linux-watchdog-6.15-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

- Add watchdog driver for Lenovo SE30 platform

- Add support for Allwinner A523

- Add i.MX94 support

- watchdog framework: Convert to use device property

- renesas,wdt: Document RZ/G3E support

- Various other fixes and improvemenents

* tag 'linux-watchdog-6.15-rc1' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: sunxi_wdt: Add support for Allwinner A523
  dt-bindings: watchdog: sunxi: add Allwinner A523 compatible string
  watchdog: aspeed: fix 64-bit division
  watchdog: npcm: Remove unnecessary NULL check before clk_prepare_enable/clk_disable_unprepare
  dt-bindings: watchdog: renesas,wdt: Document RZ/G3E support
  watchdog: Convert to use device property
  watchdog: lenovo_se30_wdt: include io.h for devm_ioremap()
  dt-bindings: watchdog: fsl-imx7ulp-wdt: Add i.MX94 support
  watchdog: nic7018_wdt: tidy up ACPI ID table
  watchdog: s3c2410_wdt: Fix PMU register bits for ExynosAutoV920 SoC
  watchdog: lenovo_se30_wdt: Watchdog driver for Lenovo SE30 platform
  watchdog: Enable RZV2HWDT driver depend on ARCH_RENESAS
  watchdog: cros-ec: Add newlines to printks
  watchdog: aspeed: Update bootstatus handling

lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets

The split_sg_phys function was incorrectly setting the offsets of all
scatterlist entries (except the first) to 0.  Only the first scatterlist
entry's offset and length needs to be modified to account for the skip.
Setting the rest entries' offsets to 0 could lead to incorrect data
access.

I am using this function in a crypto driver that I'm currently developing
(not yet sent to mailing list).  During testing, it was observed that the
output scatterlists (except the first one) contained incorrect garbage
data.

I narrowed this issue down to the call of sg_split().  Upon debugging
inside this function, I found that this resetting of offset is the cause
of the problem, causing the subsequent scatterlists to point to incorrect
memory locations in a page.  By removing this code, I am obtaining
expected data in all the split output scatterlists.  Thus, this was indeed
causing observable runtime effects!

This patch removes the offending code, ensuring that the page offsets in
the input scatterlist are preserved in the output scatterlist.

Link: https://lkml.kernel.org/r/20250319111437.1969903-1-t-pratham@ti.com
Fixes: f8bcbe62acd0 ("lib: scatterlist: add sg splitting function")
Signed-off-by: T Pratham <t-pratham@ti.com>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kamlesh Gurudasani <kamlesh@ti.com>
Cc: Praneeth Bajjuri <praneeth@ti.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/sort.c: add _nonatomic() variants with cond_resched()

bcachefs calls sort() during recovery to sort all keys it found in the
journal, and this may be very large - gigabytes on large machines.

This has been causing "task blocked" warnings, so needs a
cond_resched().

[kent.overstreet@linux.dev: fix kerneldoc]
Link: https://lkml.kernel.org/r/cgsr5a447pxqomc4gvznsp5yroqmif4omd7o5lsr2swifjhoic@yzjjrx2bvrq7
Link: https://lkml.kernel.org/r/20250326152606.2594920-1-kent.overstreet@linux.dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: add an entry for Nicolas Schier

Add mapping to linux.dev address as mail usage at work is limited and my
mail provider for private mails is suffering from its own domain name
block lists.

Link: https://lkml.kernel.org/r/20250326-mailmap-add-entry-for-nicolas-v1-1-3c26579a7fdf@fjasle.eu
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: add arch-support txt

Add Documentation/features/core/mseal_sys_mappings/arch-support.txt

N/A: the arch is 32bits only and mseal is not supported in 32 bits,
therefore N/A (until mseal is available in 32 bits kernel).

[jeffxu@chromium.org: update to v3]
Link: https://lkml.kernel.org/r/20250324151537.1106542-2-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250321032627.4147562-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Eric Dumaze <edumazet@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Meghana Malladi <m-malladi@ti.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: enable s390

Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on s390, covering the
vdso.

[hca@linux.ibm.com: update supported architectures]
Link: https://lkml.kernel.org/r/20250317131917.1332402-1-hca@linux.ibm.com
Link: https://lkml.kernel.org/r/20250311123326.2686682-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftest: test system mappings are sealed

Add sysmap_is_sealed.c to test system mappings are sealed.

Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
config file.

Link: https://lkml.kernel.org/r/20250305021711.3867874-8-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: update mseal.rst

Update memory sealing documentation to include details about system
mappings.

Link: https://lkml.kernel.org/r/20250305021711.3867874-7-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: uprobe mapping

Provide support to mseal the uprobe mapping.

Unlike other system mappings, the uprobe mapping is not established during
program startup. However, its lifetime is the same as the process's
lifetime. It could be sealed from creation.

Test was done with perf tool, and observe the uprobe mapping is sealed.

Link: https://lkml.kernel.org/r/20250305021711.3867874-6-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: enable arm64

Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on arm64, covering the
vdso, vvar, and compat-mode vectors and sigpage mappings.

Production release testing passes on Android and Chrome OS.

Link: https://lkml.kernel.org/r/20250305021711.3867874-5-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: enable x86-64

Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the
vdso, vvar, vvar_vclock.

Production release testing passes on Android and Chrome OS.

Link: https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: generic vdso vvar mapping

With the introduction of the generic vdso data storage the VM_SEALED_SYSMAP
vm flag must be moved from the architecture specific
_install_special_mapping() call [1] [2] which maps the vvar mapping to
generic code.

[1] https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com
[2] https://lkml.kernel.org/r/20250305021711.3867874-5-jeffxu@google.com

Link: https://lkml.kernel.org/r/20250311123326.2686682-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests: x86: test_mremap_vdso: skip if vdso is msealed

Add code to detect if the vdso is memory sealed, skip the test if it is.

Link: https://lkml.kernel.org/r/20250305021711.3867874-3-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mseal sysmap: kernel config and header change

Patch series "mseal system mappings", v9.

As discussed during mseal() upstream process [1], mseal() protects the
VMAs of a given virtual memory range against modifications, such as the
read/write (RW) and no-execute (NX) bits.  For complete descriptions of
memory sealing, please see mseal.rst [2].

The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.

The system mappings are readonly only, memory sealing can protect them
from ever changing to writable or unmmap/remapped as different attributes.

System mappings such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode), are created by the kernel during
program initialization, and could be sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not established
during program startup.  However, its lifetime is the same as the
process's lifetime [3].  It could be sealed from creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000), which
is outside the mm managed range.  This means mprotect, munmap, and mremap
won't work on the vsyscall.  Since sealing doesn't enhance the vsyscall's
security, it is skipped in this patch.  If we ever seal the vsyscall, it
is probably only for decorative purpose, i.e.  showing the 'sl' flag in
the /proc/pid/smaps.  For this patch, it is ignored.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations.  UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems.  As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag.  Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime.  The existence of this flag for an architecture implies that it
does not require the remapping of thest system mappings during process
lifetime, so sealing these mappings is safe from a kernel perspective.

This version covers x86-64 and arm64 archiecture as minimum viable feature.

While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel.  Other
architectures can choose whether or not to adopt this feature.  Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace.  The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4].  The UML kernel can add this
support when KUnit tests require it [5].

In this version, we've improved the handling of system mapping sealing
from previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it.  This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.

Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker.  This approach revealed several
issues:

- The PT_LOAD header may report an incorrect length for vdso, (smaller
  than its actual size).  The dynamic linker, which relies on PT_LOAD
  information to determine mapping size, would then split and partially
  seal the vdso mapping.  Since each architecture has its own vdso/vvar
  code, fixing this in the kernel would require going through each
  archiecture.  Our initial goal was to enable sealing readonly mappings,
  e.g.  .text, across all architectures, sealing vdso from kernel since
  creation appears to be simpler than sealing vdso at glibc.

- The [vvar] mapping header only contains address information, not
  length information.  Similar issues might exist for other special
  mappings.

- Mappings like uprobe are not covered by the dynamic linker, and there
  is no effective solution for them.

This feature's security enhancements will benefit ChromeOS, Android, and
other high security systems.

Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
  i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
  Android to ensure the sealing doesn't affect the functionality of
  Chromebook and Android phone.

I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
  normal operations such as browsing the web, open/edit doc are OK.

Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/
This patch (of 7):

Provide infrastructure to mseal system mappings.  Establish two kernel
configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future
patches.

Link: https://lkml.kernel.org/r/20250305021711.3867874-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250305021711.3867874-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: pgtable: remove tlb_remove_page_ptdesc()

The tlb_remove_ptdesc()/tlb_remove_table() is specially designed for page
table pages, and now all architectures have been converted to use it to
remove page table pages. So let's remove tlb_remove_page_ptdesc(), it
currently has no users and should not be used for page table pages.

Link: https://lkml.kernel.org/r/3df04c8494339073b71be4acb2d92e108ecd1b60.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

x86: pgtable: convert to use tlb_remove_ptdesc()

The x86 has already been converted to use struct ptdesc, so convert it to
use tlb_remove_ptdesc() instead of tlb_remove_table().

Link: https://lkml.kernel.org/r/36ad56b7e06fa4b17fb23c4fc650e8e0d72bb3cd.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

riscv: pgtable: unconditionally use tlb_remove_ptdesc()

To support fast gup, the commit 69be3fb111e7 ("riscv: enable
MMU_GATHER_RCU_TABLE_FREE for SMP && MMU") did the following:

1) use tlb_remove_page_ptdesc() for those platforms which use IPI to
   perform TLB shootdown

2) use tlb_remove_ptdesc() for those platforms which use SBI to perform
   TLB shootdown

The tlb_remove_page_ptdesc() is the wrapper of the tlb_remove_page().  By
design, the tlb_remove_page() should be used to remove a normal page from
a page table entry, and should not be used for page table pages.

The tlb_remove_ptdesc() is the wrapper of the tlb_remove_table(), which is
designed specifically for freeing page table pages.  If the
CONFIG_MMU_GATHER_TABLE_FREE is enabled, the tlb_remove_table() will use
semi RCU to free page table pages, that is:

- batch table freeing: asynchronous free by RCU
- single table freeing: IPI + synchronous free

If the CONFIG_MMU_GATHER_TABLE_FREE is disabled, the tlb_remove_table()
will fall back to pagetable_dtor() + tlb_remove_page().

For case 1), since we need to perform TLB shootdown before freeing the
page table page, the local_irq_save() in fast gup can block the freeing
and protect the fast gup page walker.  Therefore we can ensure safety by
just using tlb_remove_page_ptdesc().  In addition, we can also the
tlb_remove_ptdesc()/tlb_remove_table() to achieve it, and it doesn't
matter whether CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected.  And in
theory, the performance of freeing pages asynchronously via RCU will not
be lower than synchronous free.

For case 2), since local_irq_save() only disable S-privilege IPI irq but
not M-privilege's, which is used by the SBI implementation to perform TLB
shootdown, so we must select CONFIG_MMU_GATHER_RCU_TABLE_FREE and use
tlb_remove_ptdesc() to ensure safety.  The riscv selects this config for
SMP && MMU, the CONFIG_RISCV_SBI is dependent on MMU.  Therefore, only the
UP system may have the situation where CONFIG_MMU_GATHER_RCU_TABLE_FREE is
disabled but CONFIG_RISCV_SBI is enabled.  But there is no freeing vs fast
gup race in the UP system.

So, in summary, we can use tlb_remove_ptdesc() to support fast gup in all
cases, and this interface is specifically designed for page table pages.
So let's use it unconditionally.

Link: https://lkml.kernel.org/r/9025595e895515515c95e48db54b29afa489c41d.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: pgtable: convert some architectures to use tlb_remove_ptdesc()

Now, the nine architectures of csky, hexagon, loongarch, m68k, mips,
nios2, openrisc, sh and um do not select CONFIG_MMU_GATHER_RCU_TABLE_FREE,
and just call pagetable_dtor() + tlb_remove_page_ptdesc() (the wrapper of
tlb_remove_page()). This is the same as the implementation of
tlb_remove_{ptdesc|table}() under !CONFIG_MMU_GATHER_TABLE_FREE, so
convert these architectures to use tlb_remove_ptdesc().

The ultimate goal is to make the architecture only use tlb_remove_ptdesc()
or tlb_remove_table() for page table pages.

[zhengqi.arch@bytedance.com: v2]
Link: https://lkml.kernel.org/r/20250303072603.45423-1-zhengqi.arch@bytedance.com
[akpm@linux-foundation.org: remove trailing semi in arch/loongarch/include/asm/pgalloc.h]
Link: https://lkml.kernel.org/r/19db3e8673b67bad2f1df1ab37f1c89d99eacfea.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*

All callers of tlb_remove_ptdesc() pass it a pointer of struct ptdesc, so
let's change the pt parameter from void * to struct ptdesc * to perform a
type safety check.

Link: https://lkml.kernel.org/r/60bb44299cf2d731df6592e446e7f694054d0dbe.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: pgtable: make generic tlb_remove_table() use struct ptdesc

Patch series "remove tlb_remove_page_ptdesc()", v2.

As suggested by Peter Zijlstra below [1], this series aims to remove
tlb_remove_page_ptdesc().

: Fundamentally tlb_remove_page() is about removing *pages* as from a PTE,
: there should not be a page-table anywhere near here *ever*.
:
: Yes, some architectures use tlb_remove_page() for page-tables too, but
: that is more or less an implementation detail that can be fixed.

After this series, all architectures use tlb_remove_table() or
tlb_remove_ptdesc() to remove the page table pages.  In the future, once
all architectures using tlb_remove_table() have also converted to using
struct ptdesc (eg.  powerpc), it may be possible to use only
tlb_remove_ptdesc().

[1] https://lore.kernel.org/linux-mm/20250103111457.GC22934@noisy.programming.kicks-ass.net/

This patch (of 6):

Now only arm will call tlb_remove_ptdesc()/tlb_remove_table() when
CONFIG_MMU_GATHER_TABLE_FREE is disabled.  In this case, the type of the
table parameter is actually struct ptdesc * instead of struct page *.

Since struct ptdesc still overlaps with struct page and has not been
separated from it, forcing the table parameter to struct page * will not
cause any problems at this time.  But this is definitely incorrect and
needs to be fixed.  So just like the generic __tlb_remove_table(), let
generic tlb_remove_table() use struct ptdesc by default when
CONFIG_MMU_GATHER_TABLE_FREE is disabled.

Link: https://lkml.kernel.org/r/cover.1740454179.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/5be8c3ab7bd68510bf0db4cf84010f4dfe372917.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

microblaze/mm: put mm_cmdline_setup() in .init.text section

As reported by lkp, there is a section mismatch of mm_cmdline_setup() and
memblock. The reason is we don't specify the section of
mm_cmdline_setup() and gcc put it into .text.unlikely.

As mm_cmdline_setup() is only used in mmu_init(), which is in .init.text
section, put mm_cmdline_setup() into it too.

Link: https://lkml.kernel.org/r/20250328010136.13139-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202503241259.kJV3U7Xj-lkp@intel.com/
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range

We triggered the below BUG:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
page_type: f4(hugetlb)
page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
------------[ cut here ]------------
kernel BUG at ./include/linux/page-flags.h:310!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : const_folio_flags+0x3c/0x58
lr : const_folio_flags+0x3c/0x58
Call trace:
  const_folio_flags+0x3c/0x58 (P)
  do_migrate_range+0x164/0x720
  offline_pages+0x63c/0x6fc
  memory_subsys_offline+0x190/0x1f4
  device_offline+0xc0/0x13c
  state_store+0x90/0xd8
  dev_attr_store+0x18/0x2c
  sysfs_kf_write+0x44/0x54
  kernfs_fop_write_iter+0x120/0x1cc
  vfs_write+0x240/0x378
  ksys_write+0x70/0x108
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x48/0x10c
  el0_svc_common.constprop.0+0x40/0xe0

When allocating a hugetlb folio, between the folio is taken from buddy and
prep_compound_page() is called, start_isolate_page_range() and
do_migrate_range() is called.  When do_migrate_range() scans the head page
of the hugetlb folio, the compound_head field isn't set, so scans the tail
page next.  And at this time, the compound_head field of tail page is set,
folio_test_large() is called by tail page, thus triggers VM_BUG_ON().

To fix it, get folio refcount before calling folio_test_large().

Link: https://lkml.kernel.org/r/20250324131750.1551884-1-tujinjiang@huawei.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Fixes: b62b51d2d159 ("mm: memory_hotplug: remove head variable in do_migrate_range()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: mm: add entry for secretmem

Link: https://lkml.kernel.org/r/20250326215541.1809379-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: mm: add entry for numa memblocks and numa emulation

Link: https://lkml.kernel.org/r/20250326215541.1809379-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: mm: add entry for execmem

Link: https://lkml.kernel.org/r/20250326215541.1809379-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: fixup USERFAULTFD entry

Patch series "MAINTAINERS: add my isub-entries to MM part."

Following discussion at LSF/MM/BPF I'm adding execmem, secretmem and
numa memblocks sub-entries for MEMORY MANAGEMENT in MAINTAINERS.

This patch (of 4):

Change title to "MEMORY MANAGEMENT - USERFAULTFD" and make it sub-topic in
memory management and add missing include/linux/userfaultfd_k.h and
mailing list

Link: https://lkml.kernel.org/r/20250326215541.1809379-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250326215541.1809379-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Willaims <dan.j.williams@intel.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftest/mm: va_high_addr_switch: add ppc64 support check

Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing
check_supported_ppc64().  The function verifies:

  - 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config
  - Radix MMU (required for PPC64 5-level translation)
  - HugePages availability (needed for some tests)

If any check fails, the test is skipped (ksft_skip).  This ensures
compatibility with Power9/Power10 systems running in Radix MMU mode.

Avoid failures on 4-level paging system:

  # mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED

Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

memblock: don't release high memory to page allocator when HIGHMEM is off

Nathan Chancellor reports the following crash on a MIPS system with
CONFIG_HIGHMEM=n:

  Linux version 6.14.0-rc6-00359-g6faea3422e3b (nathan@ax162) (mips-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Fri Mar 21 08:12:02 MST 2025
  earlycon: uart8250 at I/O port 0x3f8 (options '38400n8')
  printk: legacy bootconsole [uart8250] enabled
  Config serial console: console=ttyS0,38400n8r
  CPU0 revision is: 00019300 (MIPS 24Kc)
  FPU revision is: 00739300
  MIPS: machine is mti,malta
  Software DMA cache coherency enabled
  Initial ramdisk at: 0x8fad0000 (5360128 bytes)
  OF: reserved mem: Reserved memory: No reserved-memory node in the DT
  Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes.
  Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x0000000000ffffff]
    Normal   [mem 0x0000000001000000-0x000000001fffffff]
  Movable zone start for each node
  Early memory node ranges
    node   0: [mem 0x0000000000000000-0x000000000fffffff]
    node   0: [mem 0x0000000090000000-0x000000009fffffff]
  Initmem setup node 0 [mem 0x0000000000000000-0x000000009fffffff]
  On node 0, zone Normal: 16384 pages in unavailable ranges
  random: crng init done
  percpu: Embedded 3 pages/cpu s18832 r8192 d22128 u49152
  Kernel command line: rd_start=0xffffffff8fad0000 rd_size=5360128  console=ttyS0,38400n8r
  printk: log buffer data + meta data: 32768 + 102400 = 135168 bytes
  Dentry cache hash table entries: 65536 (order: 4, 262144 bytes, linear)
  Inode-cache hash table entries: 32768 (order: 3, 131072 bytes, linear)
  Writing ErrCtl register=00000000
  Readback ErrCtl register=00000000
  Built 1 zonelists, mobility grouping on.  Total pages: 16384
  mem auto-init: stack:all(zero), heap alloc:off, heap free:off
  Unhandled kernel unaligned access[#1]:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00359-g6faea3422e3b #1
  Hardware name: mti,malta
  $ 0   : 00000000 00000001 81cb0880 00129027
  $ 4   : 00000001 0000000a 00000002 00129026
  $ 8   : ffffdfff 80101e00 00000002 00000000
  $12   : 81c9c224 81c63e68 00000002 00000000
  $16   : 805b1e00 00025800 81cb0880 00000002
  $20   : 00000000 81c63e64 0000000a 81f10000
  $24   : 81c63e64 81c63e60
  $28   : 81c60000 81c63de0 00000001 81cc9d20
  Hi    : 00000000
  Lo    : 00000000
  epc   : 814a227c __free_pages_ok+0x144/0x3c0
  ra    : 81cc9d20 memblock_free_all+0x1d4/0x27c
  Status: 10000002        KERNEL EXL
  Cause : 00800410 (ExcCode 04)
  BadVA : 00129026
  PrId  : 00019300 (MIPS 24Kc)
  Modules linked in:
  Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 81f10000 805a9e00 81c80000 00000000 00000002 814aa240 000003ff 00000400
          00000000 81f10000 81c9c224 00003b1f 81c80000 81c63e60 81ca0000 81c63e64
          81f10000 0000000a 0000001f 81cc9d20 81f10000 81cc96d8 00000000 81c80000
          81c9c224 81c63e60 81c63e64 00000000 81f10000 00024000 00028000 00025c00
          90000000 a0000000 00000002 00000017 00000000 00000000 81f10000 81f10000
          ...
  Call Trace:
  [<814a227c>] __free_pages_ok+0x144/0x3c0
  [<81cc9d20>] memblock_free_all+0x1d4/0x27c
  [<81cc6764>] mm_core_init+0x100/0x138
  [<81cb4ba4>] start_kernel+0x4a0/0x6e4

  Code: 1080ffd5  02003825  2467ffff <8ce30000> 7c630500  1060ffd4  00000000  8ce30000  7c630180

The crash happens because commit 6faea3422e3b ("arch, mm: streamline
HIGHMEM freeing") too eagerly frees high memory to the page allocator even
when HIGHMEM is disabled.

Make sure that when CONFIG_HIGHMEM=n the high memory is not released to the
page allocator.

Link: https://lore.kernel.org/all/20250323190647.GA1009914@ax162
Link: https://lkml.kernel.org/r/20250325114928.1791109-3-rppt@kernel.org
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/mm_init: init holes in the end of the memory map for FLATMEM

Patch series "mm: fixes for fallouts from mem_init() cleanup".

These are the fixes for fallouts from mem_init() cleanup reported by
Nathan Chancellor and kbuild.  The details are in the commit messages.

This patch (of 2):

Kernel test robot reports the following crash on 32-bit system with
FLATMEM and DEBUG_VM_PGFLAGS enabled:

[    0.478822][    T0] kernel BUG at include/linux/page-flags.h:536!
[    0.479312][    T0] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
[    0.479768][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00357-g8268af309d07 #1
[    0.480470][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.481260][ T0] EIP: reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.481683][ T0] Code: 5d c3 01 f1 89 c8 ba e1 38 f4 c3 e8 1e 37 8e fc 0f 0b b8 90 e2 62 c4 e8 e2 05 5e fc 01 f1 89 c8 ba be 85 f7 c3 e8 04 37 8e fc <0f> 0b b8 80 e2 62 c4 e8 c8 05 5e fc 55 89 e5 53 57 56 83 ec 10 89
[    0.483177][    T0] EAX: 00000000 EBX: c425df50 ECX: 00000000 EDX: 00000000
[    0.483712][    T0] ESI: 017ffc00 EDI: ffffffff EBP: c425df34 ESP: c425df2c
[    0.484248][    T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
[    0.484846][    T0] CR0: 80050033 CR2: 00000000 CR3: 04b48000 CR4: 00000090
[    0.485376][    T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.485907][    T0] DR6: fffe0ff0 DR7: 00000400
[    0.486253][    T0] Call Trace:
[ 0.486494][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478)
[ 0.486822][ T0] ? die (arch/x86/kernel/dumpstack.c:?)
[ 0.487099][ T0] ? do_trap (arch/x86/kernel/traps.c:? arch/x86/kernel/traps.c:197)
[ 0.487409][ T0] ? do_error_trap (arch/x86/kernel/traps.c:217)
[ 0.487752][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.488153][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.488490][ T0] ? handle_invalid_op (arch/x86/kernel/traps.c:254)
[ 0.488869][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.489271][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:316)
[ 0.489619][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1055)
[ 0.489996][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.490332][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.490733][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.491068][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.491470][ T0] memmap_init_reserved_pages (mm/memblock.c:2203)
[ 0.491887][ T0] free_low_memory_core_early (mm/memblock.c:?)
[ 0.492302][ T0] memblock_free_all (mm/memblock.c:2272 include/linux/atomic/atomic-arch-fallback.h:546 include/linux/atomic/atomic-long.h:123 include/linux/atomic/atomic-instrumented.h:3261 include/linux/mm.h:67 mm/memblock.c:2273)
[ 0.492659][ T0] mem_init (arch/x86/mm/init_32.c:735)
[ 0.492952][ T0] mm_core_init (mm/mm_init.c:2730)
[ 0.493271][ T0] start_kernel (init/main.c:958)
[ 0.493604][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79)
[ 0.493969][ T0] startup_32_smp (arch/x86/kernel/head_32.S:292)

The crash happens because after commit 8268af309d07 ("arch, mm: set
max_mapnr when allocating memory map for FLATMEM") max_mapnr is rounded up
to MAX_ORDER_NR_PAGES and the pages in the end of the memory map are
passing pfn_valid() check in reserve_bootmem_region().

Make sure that that pages in the end of the memory map are initialized,
just like the pages in the end of the last section for SPARSEMEM.

Link: https://lkml.kernel.org/r/20250325114928.1791109-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250325114928.1791109-2-rppt@kernel.org
Fixes: 8268af309d07 ("arch, mm: set max_mapnr when allocating memory map for FLATMEM")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503241424.d16223ec-lkp@intel.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add peterx as userfaultfd reviewer

Add an entry for userfaultfd and make myself a reviewer of it, just in
case it helps people manage the cc list.

I named it MEMORY USERFAULTFD, could be a bad name, but then it can be
together with the MEMORY* entries when everything is in alphabetic order,
which is definitely a benefit.

The line may not change much on how I'd work with userfaultfd; I think
I'll do the same as before.. But maybe it still, more or less, adds some
responsibility on top, indeed.

[akpm@linux-foundation.org: add include/linux/userfaultfd_k.h, per Mike]
[akpm@linux-foundation.org: fix misordering]
Link: https://lkml.kernel.org/r/20250322002124.131736-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/page_alloc: replace flag check with PageHWPoison() in check_new_page_bad()

This patch replaces the direct check for the __PG_HWPOISON flag with the
PageHWPoison() macro, improving code readability and maintaining
consistency with other parts of the memory management code.

Link: https://lkml.kernel.org/r/20250320063346.489030-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: simplify control flow in damon_register_ops()

The function logic is not complex, so using goto is unnecessary. Replace
it with a straightforward if-else to simplify control flow and improve
readability.

Link: https://lkml.kernel.org/r/Z9vxcPCw8tDsjKw1@OneApple
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/kasan: use SLAB_NO_MERGE flag instead of an empty constructor

Use SLAB_NO_MERGE flag to prevent merging instead of providing an empty
constructor. Using an empty constructor in this manner is an abuse of
slab interface.

The SLAB_NO_MERGE flag should be used with caution, but in this case, it
is acceptable as the cache is intended solely for debugging purposes.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250318015926.1629748-1-harry.yoo@oracle.com
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: fix defrag_mode's retry & OOM path

Brendan points out that defrag_mode doesn't properly clear
ALLOC_NOFRAGMENT on its last-ditch attempt to allocate.  But looking
closer, the problem is actually more severe: it doesn't actually *check*
whether it's already retried, and keeps looping.  This means the OOM path
is never taken, and the thread can loop indefinitely.

This is verified with an intentional OOM test on defrag_mode=1, which
results in the machine hanging.  After this patch, it triggers the OOM
kill reliably and recovers.

Clear ALLOC_NOFRAGMENT properly, and only retry once.

Link: https://lkml.kernel.org/r/20250401041231.GA2117727@cmpxchg.org
Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/mremap: do not set vrm->vma NULL immediately prior to checking it

This seems rather unwise. If we cannot merge, extend, then we need to
recall the original VMA to see if we need to uncharge.

If we do need to, do so.

Link: https://lkml.kernel.org/r/b2fb6b9c-376d-4e9b-905e-26d847fd3865@lucifer.local
Fixes: d5c8aec0542e ("mm/mremap: initial refactor of move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-=by: "Lai, Yi" <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/linux-mm/Z+lcvEIHMLiKVR1i@ly-workstation/
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()

Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex.  crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).

On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
crypto_scomp_init_tfm()), and then allocates memory.  If the allocation
results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.

The above dependencies can cause an ABBA deadlock.  For example in the
following scenario:

(1) Task A running on CPU #1:
    crypto_alloc_acomp_node()
      Holds scomp_lock
      Enters reclaim
      Reads per_cpu_ptr(pool->acomp_ctx, 1)

(2) Task A is descheduled

(3) CPU #1 goes offline
    zswap_cpu_comp_dead(CPU #1)
      Holds per_cpu_ptr(pool->acomp_ctx, 1))
      Calls crypto_free_acomp()
      Waits for scomp_lock

(4) Task A running on CPU #2:
      Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
      DEADLOCK

Since there is no requirement to call crypto_free_acomp() with the per-CPU
acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
unlocked.  Also move the acomp_request_free() and kfree() calls for
consistency and to avoid any potential sublte locking dependencies in the
future.

With this, only setting acomp_ctx fields to NULL occurs with the mutex
held.  This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.

Opportunistically, move the NULL check on acomp_ctx so that it takes place
before the mutex dereference.

Link: https://lkml.kernel.org/r/20250226185625.2672936-1-yosry.ahmed@linux.dev
Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reported-by: syzbot+1a517ccfcbc6a7ab0f82@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcea51.050a0220.bbfd1.0096.GAE@google.com/
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Nhat Pham <nphamcs@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: move hugetlb_sysctl_init() to the __init section

hugetlb_sysctl_init() is only invoked once by an __init function and is
merely a wrapper around another __init function so there is not reason to
keep it.

Fixes the following warning when toning down some GCC inline options:

WARNING: modpost: vmlinux: section mismatch in reference:
hugetlb_sysctl_init+0x1b (section: .text) ->
__register_sysctl_init (section: .init.text)

Link: https://lkml.kernel.org/r/20250319060041.2737320-1-marc.herbert@linux.intel.com
Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock

I found a NULL pointer dereference as followed:

BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 5 UID: 0 PID: 5964 Comm: sh Kdump: loaded Not tainted 6.13.0-dirty #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.
RIP: 0010:has_unmovable_pages+0x184/0x360
...
Call Trace:
  <TASK>
  set_migratetype_isolate+0xd1/0x180
  start_isolate_page_range+0xd2/0x170
  alloc_contig_range_noprof+0x101/0x660
  alloc_contig_pages_noprof+0x238/0x290
  alloc_gigantic_folio.isra.0+0xb6/0x1f0
  only_alloc_fresh_hugetlb_folio.isra.0+0xf/0x60
  alloc_pool_huge_folio+0x80/0xf0
  set_max_huge_pages+0x211/0x490
  __nr_hugepages_store_common+0x5f/0xe0
  nr_hugepages_store+0x77/0x80
  kernfs_fop_write_iter+0x118/0x200
  vfs_write+0x23c/0x3f0
  ksys_write+0x62/0xe0
  do_syscall_64+0x5b/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

As has_unmovable_pages() call folio_hstate() without hugetlb_lock, there
is a race to free the HugeTLB page between PageHuge() and folio_hstate().
There is no need to add hugetlb_lock here as the HugeTLB page can be freed
in lot of places.  So it's enough to unfold folio_hstate() and add a check
to avoid NULL pointer dereference for hugepage_migration_supported().

Link: https://lkml.kernel.org/r/20250122061151.578768-1-liushixin2@huawei.com
Fixes: 464c7ffbcb16 ("mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb_vmemmap: fix memory loads ordering

Using x86_64 as an example, for a 32KB struct page[] area describing a 2MB
hugeTLB, HVO reduces the area to 4KB by the following steps:

1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped
   by PTE 0, and at the same time change the permission from r/w to
   r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
   to 4KB.

However, the following race can happen due to improperly memory loads
ordering:
  CPU 1 (HVO)                     CPU 2 (speculative PFN walker)

  page_ref_freeze()
  synchronize_rcu()
                                  rcu_read_lock()
                                  page_is_fake_head() is false
  vmemmap_remap_pte()
  XXX: struct page[] becomes r/o

  page_ref_unfreeze()
                                  page_ref_count() is not zero

                                  atomic_add_unless(&page->_refcount)
                                  XXX: try to modify r/o struct page[]

Specifically, page_is_fake_head() must be ordered after page_ref_count()
on CPU 2 so that it can only return true for this case, to avoid the later
attempt to modify r/o struct page[].

This patch adds the missing memory barrier and makes the tests on
page_is_fake_head() and page_ref_count() done in the proper order.

Link: https://lkml.kernel.org/r/20250108074822.722696-1-yuzhao@google.com
Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/userfaultfd: fix release hang over concurrent GUP

This patch should fix a possible userfaultfd release() hang during
concurrent GUP.

This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a firecracker use case.  Firecracker has a separate process
handling page faults remotely, and when the process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.

A similar problem was reported recently again by Jinjiang Tu in March 2025
[2], even though the race happened this time with a mlockall() operation,
which does GUP in a similar fashion.

In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") was trying to fix this issue.  AFAIU,
that fixes well the fault paths but may not work yet for GUP.  In GUP, the
issue is NOPAGE will be almost treated the same as "page fault resolved"
in faultin_page(), then the GUP will follow page again, seeing page
missing, and it'll keep going into a live lock situation as reported.

This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock.  This should
guarantee the other release thread make progress on taking the write lock
and avoid the live lock even for GUP.

When at it, rearrange the comments to make sure it's uptodate.

[1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr
[2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com

Link: https://lkml.kernel.org/r/20250312145131.1143062-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Dimitris Siakavaras <jimsiak@cslab.ece.ntua.gr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
"i2c-core updates (collected by Wolfram):
   - remove last user and unexport i2c_of_match_device()
   - irq usage cleanup from Jiri

  i2c-host updates (collected by Andi):

  Refactoring and cleanups:
   - octeon, cadence, i801, pasemi, mlxbf, bcm-iproc: general
     refactorings
   - octeon: remove 10-bit address support

  Improvements:
   - amd-asf: improved error handling
   - designware: use guard(mutex)
   - amd-asf, designware: update naming to follow latest specs
   - cadence: fix cleanup path in probe
   - i801: use MMIO and I/O mapping helpers to access registers
   - pxa: handle error after clk_prepare_enable

  New features:
   - added i2c_10bit_addr_*_from_msg() and updated multiple drivers
   - omap: added multiplexer state handling
   - qcom-geni: update frequency configuration
   - qup: introduce DMA usage policy

  New hardware support:
   - exynos: add support for Samsung exynos7870
   - k1: add support for spacemit k1 (new driver)
   - imx: add support for i.mx94 lpi2c
   - rk3x: add support for rk3562
   - designware: add support for Renesas RZ/N1D

  Multiplexers:
   - ltc4306, reg: fix assignment in platform_driver structure

  at24 eeprom updates (collected by Bartosz):
   - add two new compatible entries to the DT binding document
   - drop of_match_ptr() and ACPI_PTR() macros"

* tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (50 commits)
  dt-bindings: i2c: snps,designware-i2c: describe Renesas RZ/N1D variant
  irqdomain: i2c: Switch to irq_find_mapping()
  i2c: iproc: Refactor prototype and remove redundant error checks
  i2c: qcom-geni: Update i2c frequency table to match hardware guidance
  i2c: mlxbf: Use readl_poll_timeout_atomic() for polling
  i2c: pasemi: Add registers bits and switch to BIT()
  i2c: k1: Initialize variable before use
  i2c: spacemit: add support for SpacemiT K1 SoC
  dt-bindings: i2c: spacemit: add support for K1 SoC
  i2c: omap: Add support for setting mux
  dt-bindings: i2c: omap: Add mux-states property
  i2c: octeon: remove 10-bit addressing support
  i2c: octeon: fix return commenting
  i2c: i801: Use MMIO if available
  i2c: i801: Switch to iomapped register access
  i2c: i801: Improve too small kill wait time in i801_check_post
  i2c: i801: Move i801_wait_intr and i801_wait_byte_done in the code
  i2c: i801: Cosmetic improvements
  i2c: cadence: Move reset_control_assert after pm_runtime_set_suspended in probe error path
  i2c: cadence: Simplify using devm_clk_get_enabled()
  ...

Documentation/EDAC: Fix warning document isn't included in any toctree

Fix the build (htmldocs) warning:

Documentation/edac/index.rst: WARNING: document isn't included in any toctree.

Fixes: db99ea5f2c03 ("EDAC: Add support for EDAC device features control")
Closes: https://lore.kernel.org/all/20250228185102.15842f8b@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250401115823.573-1-shiju.jose@huawei.com

Merge tag 'dmaengine-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
"The dmaengine subsystem updates for this cycle consist of a new driver
  (Microchip) along with couple of yaml binding conversions, core api
  updates and bunch of driver updates etc.

  New HW support:

   - Microchip sama7d65 dma controller

   - Yaml conversion of atmel dma binding and Freescale Elo DMA
     Controller binding

  Core:

   - Remove device_prep_dma_imm_data() API as users are removed

   - Reduce scope of some less frequently used DMA request channel APIs
     with aim to cleanup these in future

  Updates:

   - Drop Fenghua Yu from idxd maintainers, as he changed jobs

   - AMD ptdma support for multiqueue and ae4dma deprecated PCI IDs
     removal"

* tag 'dmaengine-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (29 commits)
  dmaengine: ptdma: Utilize the AE4DMA engine's multi-queue functionality
  dmaengine: ae4dma: Use the MSI count and its corresponding IRQ number
  dmaengine: ae4dma: Remove deprecated PCI IDs
  dmaengine: Remove device_prep_dma_imm_data from struct dma_device
  dmaengine: ti: edma: support sw triggered chans in of_edma_xlate()
  dmaengine: ti: k3-udma: Enable second resource range for BCDMA and PKTDMA
  dmaengine: fsl-edma: free irq correctly in remove path
  dmaengine: fsl-edma: cleanup chan after dma_async_device_unregister
  dt-bindings: dma: snps,dw-axi-dmac: Allow devices to be marked as noncoherent
  dmaengine: dmatest: Fix dmatest waiting less when interrupted
  dt-bindings: dma: Convert fsl,elo*-dma to YAML
  dt-bindings: dma: fsl-mxs-dma: Add compatible string for i.MX8 chips
  dmaengine: Fix typo in comment
  dmaengine: ti: k3-udma-glue: Drop skip_fdq argument from k3_udma_glue_reset_rx_chn
  dmaengine: bcm2835-dma: fix warning when CONFIG_PM=n
  dt-bindings: dma: fsl,edma: Add i.MX94 support
  dt-bindings: dma: atmel: add microchip,sama7d65-dma
  dmaengine: img-mdc: remove incorrect of_match_ptr annotation
  dmaengine: idxd: Delete unnecessary NULL check
  dmaengine: pxa: Enable compile test
  ...

Merge tag 'phy-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy updates from Vinod Koul:
"A fairly moderate sized request for the generic phy subsystem with
  some new device and driver support along with driver updates with
  Samsung and Qualcomm ones being major ones.

  New HW Support:

   - Qualcomm X1P42100 PCIe Gen4x4, QCS615 qmp usbc, PCIe UNIPHY 28LP
     driver, SM8750 QMP UFS PHY

   - Rockchip rk3576 hdptx, rk3562 naneng-combo support

   - Samsung MIPI D-/C-PHY driver, ExynosAutov920 ufs phy driver

  Updates:

   - Samsung USB3 Type-C lane orientation detection and configuration
     for Google gs101

   - Qualcomm support for dual lane PHY support for QCS8300 SoC"

* tag 'phy-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (47 commits)
  phy: rockchip-naneng-combo: Support rk3562
  dt-bindings: phy: rockchip: Add rk3562 naneng-combophy compatible
  phy: rockchip: Add Samsung MIPI D-/C-PHY driver
  dt-bindings: phy: Add Rockchip MIPI C-/D-PHY schema
  phy: qcom: uniphy-28lp: add COMMON_CLK dependency
  phy: rockchip: usbdp: Remove unnecessary bool conversion
  phy: rockchip: usbdp: Avoid call hpd_event_trigger in dp_phy_init
  phy: rockchip: usbdp: Only verify link rates/lanes/voltage when the corresponding set flags are set
  phy: qcom-qmp-pcie: add dual lane PHY support for QCS8300
  dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Document the QCS8300 QMP PCIe PHY Gen4 x2
  phy: qcom-qmp-ufs: Add PHY Configuration support for sm8750
  dt-bindings: phy: qcom,sc8280xp-qmp-ufs-phy: document the SM8750 QMP UFS PHY
  phy: qcom: Introduce PCIe UNIPHY 28LP driver
  dt-bindings: phy: qcom,uniphy-pcie: Document PCIe uniphy
  phy: qcom: qmp-usbc: Add qmp configuration for QCS615
  phy: freescale: imx8m-pcie: assert phy reset and perst in power off
  phy: freescale: imx8m-pcie: cleanup reset logic
  phy: core: Remove unused phy_pm_runtime_(allow|forbid)
  dt-bindings: phy: document Allwinner A523 USB-2.0 PHY
  phy: phy-rockchip-samsung-hdptx: Add support for RK3576
  ...

Merge tag 'soundwire-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire updates from Vinod Koul:

- Support for SoundWire Bulk Register Access (BRA) protocol in core
   along with Intel driver support and ASoC bits required

- AMD driver updates and support for ACP 7.0 and 7.1 platforms

* tag 'soundwire-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (28 commits)
  soundwire: take in count the bandwidth of a prepared stream
  ASoC: rt711-sdca: add DP0 support
  soundwire: debugfs: add interface for BPT/BRA transfers
  ASoC: SOF: Intel: hda-sdw-bpt: add CHAIN_DMA support
  soundwire: intel_ace2x: add BPT send_async/wait callbacks
  soundwire: intel: add BPT context definition
  ASoC: SOF: Intel: hda-sdw-bpt: add helpers for SoundWire BPT DMA
  soundwire: intel_auxdevice: add indirection for BPT send_async/wait
  soundwire: cadence: add BTP/BRA helpers to format data
  soundwire: bus: add bpt_stream pointer
  soundwire: bus: add send_async/wait APIs for BPT protocol
  soundwire: stream: reuse existing code for BPT stream
  soundwire: stream: special-case the bus compute_params() routine
  soundwire: stream: extend sdw_alloc_stream() to take 'type' parameter
  soundwire: extend sdw_stream_type to BPT
  soundwire: cadence: add BTP support for DP0
  Documentation: driver: add SoundWire BRA description
  soundwire: amd: change the log level for command response log
  soundwire: slave: fix an OF node reference leak in soundwire slave device
  soundwire: Use str_enable_disable-like helpers
  ...

Merge tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc / IIO driver updates from Greg KH:
"Here is the big set of char, misc, iio, and other smaller driver
  subsystems for 6.15-rc1. Lots of stuff in here, including:

   - loads of IIO changes and driver updates

   - counter driver updates

   - w1 driver updates

   - faux conversions for some drivers that were abusing the platform
     bus interface

   - coresight driver updates

   - rust miscdevice binding updates based on real-world-use

   - other minor driver updates

  All of these have been in linux-next with no reported issues for quite
  a while"

* tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
  samples: rust_misc_device: fix markup in top-level docs
  Coresight: Fix a NULL vs IS_ERR() bug in probe
  misc: lis3lv02d: convert to use faux_device
  tlclk: convert to use faux_device
  regulator: dummy: convert to use the faux device interface
  bus: mhi: host: Fix race between unprepare and queue_buf
  coresight: configfs: Constify struct config_item_type
  doc: iio: ad7380: describe offload support
  iio: ad7380: add support for SPI offload
  iio: light: Add check for array bounds in veml6075_read_int_time_ms
  iio: adc: ti-ads7924 Drop unnecessary function parameters
  staging: iio: ad9834: Use devm_regulator_get_enable()
  staging: iio: ad9832: Use devm_regulator_get_enable()
  iio: gyro: bmg160_spi: add of_match_table
  dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
  iio: adc: ad7768-1: remove unnecessary locking
  Documentation: ABI: add wideband filter type to sysfs-bus-iio
  iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
  iio: adc: ad7768-1: Fix conversion result sign
  iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
  ...

Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
  happened this development cycle, including:

   - kernfs scaling changes to make it even faster thanks to rcu

   - bin_attribute constify work in many subsystems

   - faux bus minor tweaks for the rust bindings

   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers. These are all
     due to people actually trying to use the bindings that were in
     6.14.

   - make Rafael and Danilo full co-maintainers of the driver core
     codebase

   - other minor fixes and updates"

* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
  rust: platform: require Send for Driver trait implementers
  rust: pci: require Send for Driver trait implementers
  rust: platform: impl Send + Sync for platform::Device
  rust: pci: impl Send + Sync for pci::Device
  rust: platform: fix unrestricted &mut platform::Device
  rust: pci: fix unrestricted &mut pci::Device
  rust: device: implement device context marker
  rust: pci: use to_result() in enable_device_mem()
  MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
  rust/kernel/faux: mark Registration methods inline
  driver core: faux: only create the device if probe() succeeds
  rust/faux: Add missing parent argument to Registration::new()
  rust/faux: Drop #[repr(transparent)] from faux::Registration
  rust: io: fix devres test with new io accessor functions
  rust: io: rename `io::Io` accessors
  kernfs: Move dput() outside of the RCU section.
  efi: rci2: mark bin_attribute as __ro_after_init
  rapidio: constify 'struct bin_attribute'
  firmware: qemu_fw_cfg: constify 'struct bin_attribute'
  powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
  ...

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

- The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

- The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

- The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

- The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

- The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

- The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

- The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

- The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

- Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...

cpufreq: Reference count policy in cpufreq_update_limits()

Since acpi_processor_notify() can be called before registering a cpufreq
driver or even in cases when a cpufreq driver is not registered at all,
cpufreq_update_limits() needs to check if a cpufreq driver is present
and prevent it from being unregistered.

For this purpose, make it call cpufreq_cpu_get() to obtain a cpufreq
policy pointer for the given CPU and reference count the corresponding
policy object, if present.

Fixes: 5a25e3f7cc53 ("cpufreq: intel_pstate: Driver-specific handling of _PPC updates")
Closes: https://lore.kernel.org/linux-acpi/Z-ShAR59cTow0KcR@mail-itl
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/1928789.tdWV9SEqCh@rjwysocki.net

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

- The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

- The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

- The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

- The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

- The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

- The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

- The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

- The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

- The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

- The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

- The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

- The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

- The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

- The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

- The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

- The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

- The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

- The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

- The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

- The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

- The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

- The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

- The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

- The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

- The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

- The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

- The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

- The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

- The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

- The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

- The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

- The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

- The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

- The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

- The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

- The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

- The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

- The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...

objtool/loongarch: Add unwind hints in prepare_frametrace()

If 'regs' points to a local stack variable, prepare_frametrace() stores
all registers to the stack.  This confuses objtool as it expects them to
be restored from the stack later.

The stores don't affect stack tracing, so use unwind hints to hide them
from objtool.

Fixes the following warnings:

  arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[22]=-1+0 reg2[22]=-2-160
  arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[23]=-1+0 reg2[23]=-2-152

Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/270cadd8040dda74db2307f23497bb68e65db98d.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202503280703.OARM8SrY-lkp@intel.com/

rcu-tasks: Always inline rcu_irq_work_resched()

Thanks to CONFIG_DEBUG_SECTION_MISMATCH, empty functions can be
generated out of line. rcu_irq_work_resched() can be called from
noinstr code, so make sure it's always inlined.

Fixes: 564506495ca9 ("rcu/context-tracking: Move deferred nocb resched to context tracking")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/e84f15f013c07e4c410d972e75620c53b62c1b3e.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/d1eca076-fdde-484a-b33e-70e0d167c36d@infradead.org