]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
3 months agoiommu/arm-smmu-v3-iommufd: Add vsmmu_size/type and vsmmu_init impl ops
Nicolin Chen [Thu, 10 Jul 2025 05:59:15 +0000 (22:59 -0700)]
iommu/arm-smmu-v3-iommufd: Add vsmmu_size/type and vsmmu_init impl ops

An impl driver might want to allocate its own type of vIOMMU object or the
standard IOMMU_VIOMMU_TYPE_ARM_SMMUV3 by setting up its own SW/HW bits, as
the tegra241-cmdqv driver will add IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV.

Add vsmmu_size/type and vsmmu_init to struct arm_smmu_impl_ops. Prioritize
them in arm_smmu_get_viommu_size() and arm_vsmmu_init().

Link: https://patch.msgid.link/r/375ac2b056764534bb7c10ecc4f34a0bae82b108.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/selftest: Update hw_info coverage for an input data_type
Nicolin Chen [Thu, 10 Jul 2025 05:59:14 +0000 (22:59 -0700)]
iommufd/selftest: Update hw_info coverage for an input data_type

Test both IOMMU_HW_INFO_TYPE_DEFAULT and IOMMU_HW_INFO_TYPE_SELFTEST, and
add a negative test for an unsupported type.

Also drop the unused mask in test_cmd_get_hw_capabilities() as checkpatch
is complaining.

Link: https://patch.msgid.link/r/f01a1e50cd7366f217cbf192ad0b2b79e0eb89f0.1752126748.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd: Allow an input data_type via iommu_hw_info
Nicolin Chen [Thu, 10 Jul 2025 05:59:13 +0000 (22:59 -0700)]
iommufd: Allow an input data_type via iommu_hw_info

The iommu_hw_info can output via the out_data_type field the vendor data
type from a driver, but this only allows driver to report one data type.

Now, with SMMUv3 having a Tegra241 CMDQV implementation, it has two sets
of types and data structs to report.

One way to support that is to use the same type field bidirectionally.

Reuse the same field by adding an "in_data_type", allowing user space to
request for a specific type and to get the corresponding data.

For backward compatibility, since the ioctl handler has never checked an
input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the
old output-only field and the new bidirectional field.

Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommu: Allow an input type in hw_info op
Nicolin Chen [Thu, 10 Jul 2025 05:59:12 +0000 (22:59 -0700)]
iommu: Allow an input type in hw_info op

The hw_info uAPI will support a bidirectional data_type field that can be
used as an input field for user space to request for a specific info data.

To prepare for the uAPI update, change the iommu layer first:
 - Add a new IOMMU_HW_INFO_TYPE_DEFAULT as an input, for which driver can
   output its only (or firstly) supported type
 - Update the kdoc accordingly
 - Roll out the type validation in the existing drivers

Link: https://patch.msgid.link/r/00f4a2d3d930721f61367014717b3ba2d1e82a81.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoDocumentation: userspace-api: iommufd: Update HW QUEUE
Nicolin Chen [Thu, 10 Jul 2025 05:59:11 +0000 (22:59 -0700)]
Documentation: userspace-api: iommufd: Update HW QUEUE

With the introduction of the new object and its infrastructure, update the
doc to reflect that.

Link: https://patch.msgid.link/r/caa3ddc0d9bacf05c5b3e02c5f306ff3172cc54d.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/selftest: Add coverage for the new mmap interface
Nicolin Chen [Thu, 10 Jul 2025 05:59:10 +0000 (22:59 -0700)]
iommufd/selftest: Add coverage for the new mmap interface

Extend the loopback test to a new mmap page.

Link: https://patch.msgid.link/r/b02b1220c955c3cf9ea5dd9fe9349ab1b4f8e20b.1752126748.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd: Add mmap interface
Nicolin Chen [Thu, 10 Jul 2025 05:59:09 +0000 (22:59 -0700)]
iommufd: Add mmap interface

For vIOMMU passing through HW resources to user space (VMs), allowing a VM
to control the passed through HW directly by accessing hardware registers,
add an mmap infrastructure to map the physical MMIO pages to user space.

Maintain a maple tree per ictx as a translation table managing mmappable
regions, from an allocated for-user mmap offset to an iommufd_mmap struct,
where it stores the real physical address range for io_remap_pfn_range().

Keep track of the lifecycle of the mmappable region by taking refcount of
its owner, so as to enforce user space to unmap the region first before it
can destroy its owner object.

To allow an IOMMU driver to add and delete mmappable regions onto/from the
maple tree, add iommufd_viommu_alloc/destroy_mmap helpers.

Link: https://patch.msgid.link/r/9a888a326b12aa5fe940083eae1156304e210fe0.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC
Nicolin Chen [Thu, 10 Jul 2025 05:59:08 +0000 (22:59 -0700)]
iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC

Some simple tests for IOMMUFD_CMD_HW_QUEUE_ALLOC infrastructure covering
the new iommufd_hw_queue_depend/undepend() helpers.

Link: https://patch.msgid.link/r/e8a194d187d7ef445f43e4a3c04fb39472050afd.1752126748.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers
Nicolin Chen [Thu, 10 Jul 2025 05:59:07 +0000 (22:59 -0700)]
iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers

NVIDIA Virtual Command Queue is one of the iommufd users exposing vIOMMU
features to user space VMs. Its hardware has a strict rule when mapping
and unmapping multiple global CMDQVs to/from a VM-owned VINTF, requiring
mappings in ascending order and unmappings in descending order.

The tegra241-cmdqv driver can apply the rule for a mapping in the LVCMDQ
allocation handler. However, it can't do the same for an unmapping since
user space could start random destroy calls breaking the rule, while the
destroy op in the driver level can't reject a destroy call as it returns
void.

Add iommufd_hw_queue_depend/undepend for-driver helpers, allowing LVCMDQ
allocator to refcount_inc() a sibling LVCMDQ object and LVCMDQ destroyer
to refcount_dec(), so that iommufd core will help block a random destroy
call that breaks the rule.

This is a bit of compromise, because a driver might end up with abusing
the API that deadlocks the objects. So restrict the API to a dependency
between two driver-allocated objects of the same type, as iommufd would
unlikely build any core-level dependency in this case. And encourage to
use the macro version that currently supports the HW QUEUE objects only.

Link: https://patch.msgid.link/r/2735c32e759c82f2e6c87cb32134eaf09b7589b5.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl
Nicolin Chen [Thu, 10 Jul 2025 05:59:06 +0000 (22:59 -0700)]
iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl

Introduce a new IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl for user space to allocate
a HW QUEUE object for a vIOMMU specific HW-accelerated queue, e.g.:
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers

Since this is introduced with NVIDIA's VCMDQs that access the guest memory
in the physical address space, add an iommufd_hw_queue_alloc_phys() helper
that will create an access object to the queue memory in the IOAS, to avoid
the mappings of the guest memory from being unmapped, during the life cycle
of the HW queue object.

AMD's HW will need an hw_queue_init op that is mutually exclusive with the
hw_queue_init_phys op, and their case will bypass the access part, i.e. no
iommufd_hw_queue_alloc_phys() call.

Link: https://patch.msgid.link/r/dab4ace747deb46c1fe70a5c663307f46990ae56.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct
Nicolin Chen [Thu, 10 Jul 2025 05:59:05 +0000 (22:59 -0700)]
iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct

Add IOMMUFD_OBJ_HW_QUEUE with an iommufd_hw_queue structure, representing
a HW-accelerated queue type of IOMMU's physical queue that can be passed
through to a user space VM for direct hardware control, such as:
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers

Add new viommu ops for iommufd to communicate with IOMMU drivers to fetch
supported HW queue structure size and to forward user space ioctls to the
IOMMU drivers for initialization/destroy.

As the existing HWs, NVIDIA's VCMDQs access the guest memory via physical
addresses, while AMD's Buffers access the guest memory via guest physical
addresses (i.e. iova of the nesting parent HWPT). Separate two mutually
exclusive hw_queue_init and hw_queue_init_phys ops to indicate whether a
vIOMMU HW accesses the guest queue in the guest physical space (via iova)
or the host physical space (via pa).

In a latter case, the iommufd core will validate the physical pages of a
given guest queue, to ensure the underlying physical pages are contiguous
and pinned.

Since this is introduced with NVIDIA's VCMDQs, add hw_queue_init_phys for
now, and leave some notes for hw_queue_init in the near future (for AMD).

Either NVIDIA's or AMD's HW is a multi-queue model: NVIDIA's will be only
one type in enum iommu_hw_queue_type, while AMD's will be three different
types (two of which will have multi queues). Compared to letting the core
manage multiple queues with three types per vIOMMU object, it'd be easier
for the driver to manage that by having three different driver-structure
arrays per vIOMMU object. Thus, pass in the index to the init op.

Link: https://patch.msgid.link/r/6939b73699e278e60ce167e911b3d9be68882bad.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/viommu: Add driver-defined vDEVICE support
Nicolin Chen [Thu, 10 Jul 2025 05:59:04 +0000 (22:59 -0700)]
iommufd/viommu: Add driver-defined vDEVICE support

NVIDIA VCMDQ driver will have a driver-defined vDEVICE structure and do
some HW configurations with that.

To allow IOMMU drivers to define their own vDEVICE structures, move the
struct iommufd_vdevice to the public header and provide a pair of viommu
ops, similar to get_viommu_size and viommu_init.

Doing this, however, creates a new window between the vDEVICE allocation
and its driver-level initialization, during which an abort could happen
but it can't invoke a driver destroy function from the struct viommu_ops
since the driver structure isn't initialized yet. vIOMMU object doesn't
have this problem, since its destroy op is set via the viommu_ops by the
driver viommu_init function. Thus, vDEVICE should do something similar:
add a destroy function pointer inside the struct iommufd_vdevice instead
of the struct iommufd_viommu_ops.

Note that there is unlikely a use case for a type dependent vDEVICE, so
a static vdevice_size is probably enough for the near term instead of a
get_vdevice_size function op.

Link: https://patch.msgid.link/r/1e751c01da7863c669314d8e27fdb89eabcf5605.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/access: Bypass access->ops->unmap for internal use
Nicolin Chen [Thu, 10 Jul 2025 05:59:03 +0000 (22:59 -0700)]
iommufd/access: Bypass access->ops->unmap for internal use

The access object has been used externally by VFIO mdev devices, allowing
them to pin/unpin physical pages (via needs_pin_pages). Meanwhile, a racy
unmap can occur in this case, so these devices usually implement an unmap
handler, invoked by iommufd_access_notify_unmap().

The new HW queue object will need the same pin/unpin feature, although it
(unlike the mdev case) wants to reject any unmap attempt, during its life
cycle. Instead, it would not implement an unmap handler. Thus, bypass any
access->ops->unmap access call when the access is marked as internal.

Also, an area being pinned by an internal access should reject any unmap
request. This cannot be done inside iommufd_access_notify_unmap() as it's
a per-iopt action. Add a "num_locks" counter in the struct iopt_area, set
that in iopt_area_add_access() when the caller is an internal access.

Link: https://patch.msgid.link/r/6df9a43febf79c0379091ec59747276ce9d2493b.1752126748.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/access: Add internal APIs for HW queue to use
Nicolin Chen [Thu, 10 Jul 2025 05:59:02 +0000 (22:59 -0700)]
iommufd/access: Add internal APIs for HW queue to use

The new HW queue object, as an internal iommufd object, wants to reuse the
struct iommufd_access to pin some iova range in the iopt.

However, an access generally takes the refcount of an ictx. So, in such an
internal case, a deadlock could happen when the release of the ictx has to
wait for the release of the access first when releasing a hw_queue object,
which could wait for the release of the ictx that is refcounted:
    ictx --releases--> hw_queue --releases--> access
      ^                                         |
      |_________________releases________________v

To address this, add a set of lightweight internal APIs to unlink the ictx
and the access, i.e. no ictx refcounting by the access:
    ictx --releases--> hw_queue --releases--> access

Then, there's no point in setting the access->ictx. So simply define !ictx
as an flag for an internal use and add an inline helper.

Link: https://patch.msgid.link/r/d8d84bf99cbebec56034b57b966a3d431385b90d.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/selftest: Add coverage for viommu data
Nicolin Chen [Thu, 10 Jul 2025 05:59:01 +0000 (22:59 -0700)]
iommufd/selftest: Add coverage for viommu data

Extend the existing test_cmd/err_viommu_alloc helpers to accept optional
user data. And add a TEST_F for a loopback test.

Link: https://patch.msgid.link/r/8ceb64d30e9953f29270a7d341032ca439317271.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/selftest: Support user_data in mock_viommu_alloc
Nicolin Chen [Thu, 10 Jul 2025 05:59:00 +0000 (22:59 -0700)]
iommufd/selftest: Support user_data in mock_viommu_alloc

Add a simple user_data for an input-to-output loopback test.

Link: https://patch.msgid.link/r/cae4632bb3d98a1efb3b77488fbf81814f2041c6.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/viommu: Allow driver-specific user data for a vIOMMU object
Nicolin Chen [Thu, 10 Jul 2025 05:58:59 +0000 (22:58 -0700)]
iommufd/viommu: Allow driver-specific user data for a vIOMMU object

The new type of vIOMMU for tegra241-cmdqv driver needs a driver-specific
user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass
it in via the viommu_init iommu op.

Link: https://patch.msgid.link/r/2315b0e164b355746387e960745ac9154caec124.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommu: Pass in a driver-level user data structure to viommu_init op
Nicolin Chen [Thu, 10 Jul 2025 05:58:58 +0000 (22:58 -0700)]
iommu: Pass in a driver-level user data structure to viommu_init op

The new type of vIOMMU for tegra241-cmdqv allows user space VM to use one
of its virtual command queue HW resources exclusively. This requires user
space to mmap the corresponding MMIO page from kernel space for direct HW
control.

To forward the mmap info (offset and length), iommufd should add a driver
specific data structure to the IOMMUFD_CMD_VIOMMU_ALLOC ioctl, for driver
to output the info during the vIOMMU initialization back to user space.

Similar to the existing ioctls and their IOMMU handlers, add a user_data
to viommu_init op to bridge between iommufd and drivers.

Link: https://patch.msgid.link/r/90bd5637dab7f5507c7a64d2c4826e70431e45a4.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommu: Add iommu_copy_struct_to_user helper
Nicolin Chen [Thu, 10 Jul 2025 05:58:57 +0000 (22:58 -0700)]
iommu: Add iommu_copy_struct_to_user helper

Similar to the iommu_copy_struct_from_user helper receiving data from the
user space, add an iommu_copy_struct_to_user helper to report output data
back to the user space data pointer.

Link: https://patch.msgid.link/r/fa292c2a730aadd77085ec3a8272360c96eabb9c.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommu: Use enum iommu_hw_info_type for type in hw_info op
Nicolin Chen [Thu, 10 Jul 2025 05:58:56 +0000 (22:58 -0700)]
iommu: Use enum iommu_hw_info_type for type in hw_info op

Replace u32 to make it clear. No functional changes.

Also simplify the kdoc since the type itself is clear enough.

Link: https://patch.msgid.link/r/651c50dee8ab900f691202ef0204cd5a43fdd6a2.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd/viommu: Explicitly define vdev->virt_id
Nicolin Chen [Thu, 10 Jul 2025 05:58:55 +0000 (22:58 -0700)]
iommufd/viommu: Explicitly define vdev->virt_id

The "id" is too general to get its meaning easily. Rename it explicitly to
"virt_id" and update the kdocs for readability. No functional changes.

Link: https://patch.msgid.link/r/1fac22d645e6ee555675726faf3798a68315b044.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd: Correct virt_id kdoc at struct iommu_vdevice_alloc
Nicolin Chen [Thu, 10 Jul 2025 05:58:54 +0000 (22:58 -0700)]
iommufd: Correct virt_id kdoc at struct iommu_vdevice_alloc

The userspace-api iommufd.rst has described it correctly but the uAPI doc
was remained uncorrected. Thus, fix it.

Link: https://patch.msgid.link/r/2cdcecaf2babee16fda7545ccad4e5bed7a5032d.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 months agoiommufd: Report unmapped bytes in the error path of iopt_unmap_iova_range
Nicolin Chen [Thu, 10 Jul 2025 05:58:53 +0000 (22:58 -0700)]
iommufd: Report unmapped bytes in the error path of iopt_unmap_iova_range

There are callers that read the unmapped bytes even when rc != 0. Thus, do
not forget to report it in the error path too.

Fixes: 8d40205f6093 ("iommufd: Add kAPI toward external drivers for kernel access")
Link: https://patch.msgid.link/r/e2b61303bbc008ba1a4e2d7c2a2894749b59fdac.1752126748.git.nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Apply the new iommufd_object_alloc_ucmd helper
Nicolin Chen [Sat, 14 Jun 2025 06:35:26 +0000 (23:35 -0700)]
iommufd: Apply the new iommufd_object_alloc_ucmd helper

Now the new ucmd-based object allocator eases the finalize/abort routine,
apply this to all existing allocators that aren't protected by any lock.

Upgrade the for-driver vIOMMU alloctor too, and pass down to all existing
viommu_alloc op accordingly.

Note that __iommufd_object_alloc_ucmd() builds in some static tests that
cover both static_asserts in the iommufd_viommu_alloc(). Thus drop them.

Link: https://patch.msgid.link/r/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Introduce iommufd_object_alloc_ucmd helper
Nicolin Chen [Sat, 14 Jun 2025 06:35:25 +0000 (23:35 -0700)]
iommufd: Introduce iommufd_object_alloc_ucmd helper

An object allocator needs to call either iommufd_object_finalize() upon a
success or iommufd_object_abort_and_destroy() upon an error code.

To reduce duplication, store a new_obj in the struct iommufd_ucmd and call
iommufd_object_finalize/iommufd_object_abort_and_destroy() accordingly in
the main function.

Similar to iommufd_object_alloc() and __iommufd_object_alloc(), add a pair
of helpers: __iommufd_object_alloc_ucmd() and iommufd_object_alloc_ucmd().

Link: https://patch.msgid.link/r/e7206d4227844887cc8dbf0cc7b0242580fafd9d.1749882255.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Move _iommufd_object_alloc out of driver.c
Nicolin Chen [Sat, 14 Jun 2025 06:35:24 +0000 (23:35 -0700)]
iommufd: Move _iommufd_object_alloc out of driver.c

Now, all driver structures will be allocated by the core, i.e. no longer a
need of driver calling _iommufd_object_alloc. Thus, move it back.

Before:
   text    data     bss     dec     hex filename
   3024     180       0    3204     c84 drivers/iommu/iommufd/driver.o
   9074     610      64    9748    2614 drivers/iommu/iommufd/main.o
After:
   text    data     bss     dec     hex filename
   2665     164       0    2829     b0d drivers/iommu/iommufd/driver.o
   9410     618      64   10092    276c drivers/iommu/iommufd/main.o

Link: https://patch.msgid.link/r/79e630c7b911930cf36e3c8a775a04e66c528d65.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommu: Deprecate viommu_alloc op
Nicolin Chen [Sat, 14 Jun 2025 06:35:23 +0000 (23:35 -0700)]
iommu: Deprecate viommu_alloc op

To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops
are introduced. Now, those existing vIOMMU supported drivers implemented
these two ops, replacing the viommu_alloc one. So, there is no use of it.

Remove it from the headers and the viommu core.

Link: https://patch.msgid.link/r/5b32d4499d7ed02a63e57a293c11b642d226ef8d.1749882255.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommu/arm-smmu-v3: Replace arm_vsmmu_alloc with arm_vsmmu_init
Nicolin Chen [Sat, 14 Jun 2025 06:35:22 +0000 (23:35 -0700)]
iommu/arm-smmu-v3: Replace arm_vsmmu_alloc with arm_vsmmu_init

To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops
are introduced.

Sanitize the inputs and report the size of struct arm_vsmmu on success, in
arm_smmu_get_viommu_size().

Place the type sanity at the last, becase there will be soon an impl level
get_viommu_size op, which will require the same sanity tests prior. It can
simply insert a piece of code in front of the IOMMU_VIOMMU_TYPE_ARM_SMMUV3
sanity.

The core will ensure the viommu_type is set to the core vIOMMU object, and
pass in the same dev pointer, so arm_vsmmu_init() won't need to repeat the
same sanity tests but to simply init the arm_vsmmu struct.

Remove the arm_vsmmu_alloc, completing the replacement.

Link: https://patch.msgid.link/r/64e4b4c33acd26e1bd676e077be80e00fb63f17c.1749882255.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd/selftest: Replace mock_viommu_alloc with mock_viommu_init
Nicolin Chen [Sat, 14 Jun 2025 06:35:21 +0000 (23:35 -0700)]
iommufd/selftest: Replace mock_viommu_alloc with mock_viommu_init

To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops
are introduced.

Sanitize the inputs and report the size of struct mock_viommu on success,
in mock_get_viommu_size().

The core will ensure the viommu_type is set to the core vIOMMU object, so
simply init the driver part in mock_viommu_init().

Remove the mock_viommu_alloc, completing the replacement.

Link: https://patch.msgid.link/r/993beabbb0bc9705d979a92801ea5ed5996a34eb.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd/selftest: Drop parent domain from mock_iommu_domain_nested
Nicolin Chen [Sat, 14 Jun 2025 06:35:20 +0000 (23:35 -0700)]
iommufd/selftest: Drop parent domain from mock_iommu_domain_nested

There is no use of this parent domain. Delete the dead code.

Note that the s2_parent in struct mock_viommu will be a deadcode too. Yet,
keep it because it will be soon used by HW queue objects, i.e. no point in
adding it back and forth in such a short window. Besides, keeping it could
cover the majority of vIOMMU use cases where a driver-level structure will
be larger in size than the core structure.

Link: https://patch.msgid.link/r/0f155a7cd71034a498448fe4828fb4aaacdabf95.1749882255.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd/viommu: Support get_viommu_size and viommu_init ops
Nicolin Chen [Sat, 14 Jun 2025 06:35:19 +0000 (23:35 -0700)]
iommufd/viommu: Support get_viommu_size and viommu_init ops

To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops
are introduced to replace the viommu_init op.

Let the new viommu_init pathway coexist with the old viommu_alloc one.

Since the viommu_alloc op and its pathway will be soon deprecated, try to
minimize the code difference between them by adding a tentative jump tag.

Note that this fails a !viommu->ops case from now on with a WARN_ON_ONCE
since a vIOMMU is expected to support an alloc_domain_nested op for now,
or some sort of a viommu op in the foreseeable future. This WARN_ON_ONCE
can be lifted, if some day there is a use case wanting !viommu->ops.

Link: https://patch.msgid.link/r/35c5fa5926be45bda82f5fc87545cd3180ad4c9c.1749882255.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommu: Introduce get_viommu_size and viommu_init ops
Nicolin Chen [Sat, 14 Jun 2025 06:35:18 +0000 (23:35 -0700)]
iommu: Introduce get_viommu_size and viommu_init ops

So far, a vIOMMU object has been allocated by IOMMU driver and initialized
with the driver-level structure, before it returns to the iommufd core for
core-level structure initialization. It has been requiring iommufd core to
expose some core structure/helpers in its driver.c file, which result in a
size increase of this driver module.

Meanwhile, IOMMU drivers are now requiring more vIOMMU-base structures for
some advanced feature, such as the existing vDEVICE and a future HW_QUEUE.
Initializing a core-structure later than driver-structure gives for-driver
helpers some trouble, when they are used by IOMMU driver assuming that the
new structure (including core) are fully initialized, for example:

core: viommu = ops->viommu_alloc();
driver: // my_viommu is successfully allocated
driver: my_viommu = iommufd_viommu_alloc(...);
driver: // This may crash if it reads viommu->ictx
driver: new = iommufd_new_viommu_helper(my_viommu->core ...);
core: viommu->ictx = ucmd->ictx;
core: ...

To ease such a condition, allow the IOMMU driver to report the size of its
vIOMMU structure, let the core allocate a vIOMMU object and initialize the
core-level structure first, and then hand it over the driver to initialize
its driver-level structure.

Thus, this requires two new iommu ops, get_viommu_size and viommu_init, so
iommufd core can communicate with drivers to replace the viommu_alloc op:

core: viommu = ops->get_viommu_size();
driver: return VIOMMU_STRUCT_SIZE();
core: viommu->ictx = ucmd->ictx; // and others
core: rc = ops->viommu_init();
driver: // This is safe now as viommu->ictx is inited
driver: new = iommufd_new_viommu_helper(my_viommu->core ...);
core: ...

This also adds a VIOMMU_STRUCT_SIZE macro, for drivers to use, which would
statically sanitize the driver structure.

Link: https://patch.msgid.link/r/3ab52c5b622dad476c43b1b1f1636c8b902f1692.1749882255.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Return EOPNOTSUPP for failures due to driver bugs
Nicolin Chen [Sat, 14 Jun 2025 06:35:17 +0000 (23:35 -0700)]
iommufd: Return EOPNOTSUPP for failures due to driver bugs

It's more accurate to report EOPNOTSUPP when an ioctl failed due to driver
bug, since there is nothing wrong with the user space side.

Link: https://patch.msgid.link/r/623bb6f0e8fdd7b9c5745a2f99f280163f9f1f5a.1749882255.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Use enum iommu_veventq_type for type in struct iommufd_veventq
Nicolin Chen [Sat, 14 Jun 2025 06:35:16 +0000 (23:35 -0700)]
iommufd: Use enum iommu_veventq_type for type in struct iommufd_veventq

Replace unsigned int, to make it clear. No functional changes.

Link: https://patch.msgid.link/r/208a260c100a00667d3799feaad1260745f96c6b.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Use enum iommu_viommu_type for type in struct iommufd_viommu
Nicolin Chen [Sat, 14 Jun 2025 06:35:15 +0000 (23:35 -0700)]
iommufd: Use enum iommu_viommu_type for type in struct iommufd_viommu

Replace unsigned int, to make it clear. No functional changes.

The viommu_alloc iommu op will be deprecated, so don't change that.

Link: https://patch.msgid.link/r/6c6ba5c0cd381594f17ae74355872d78d7a022c0.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Drop unused ictx in struct iommufd_vdevice
Nicolin Chen [Sat, 14 Jun 2025 06:35:14 +0000 (23:35 -0700)]
iommufd: Drop unused ictx in struct iommufd_vdevice

The core code can always get the ictx pointer via vdev->viommu->ictx, thus
drop this unused one.

Link: https://patch.msgid.link/r/6cbb65e8df433de45b6c3a4bb2c5df09faca8a7c.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoiommufd: Apply obvious cosmetic fixes
Nicolin Chen [Sat, 14 Jun 2025 06:35:13 +0000 (23:35 -0700)]
iommufd: Apply obvious cosmetic fixes

Run clang-format but exclude those not so obvious ones, which leaves us:
 - Align indentations
 - Add missing spaces
 - Remove unnecessary spaces
 - Remove unnecessary line wrappings

Link: https://patch.msgid.link/r/9132e1ab45690ab1959c66bbb51ac5536a635388.1749882255.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 months agoLinux 6.16-rc1
Linus Torvalds [Sun, 8 Jun 2025 20:44:43 +0000 (13:44 -0700)]
Linux 6.16-rc1

4 months agoMerge tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 8 Jun 2025 18:44:41 +0000 (11:44 -0700)]
Merge tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Add initial DMR support, which required smarter RAPL probe

 - Fix AMD MSR RAPL energy reporting

 - Add RAPL power limit configuration output

 - Minor fixes

* tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.06.08
  tools/power turbostat: Add initial support for BartlettLake
  tools/power turbostat: Add initial support for DMR
  tools/power turbostat: Dump RAPL sysfs info
  tools/power turbostat: Avoid probing the same perf counters
  tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
  tools/power turbostat: Clean up add perf/msr counter logic
  tools/power turbostat: Introduce add_msr_counter()
  tools/power turbostat: Remove add_msr_perf_counter_()
  tools/power turbostat: Remove add_cstate_perf_counter_()
  tools/power turbostat: Remove add_rapl_perf_counter_()
  tools/power turbostat: Quit early for unsupported RAPL counters
  tools/power turbostat: Always check rapl_joules flag
  tools/power turbostat: Fix AMD package-energy reporting
  tools/power turbostat: Fix RAPL_GFX_ALL typo
  tools/power turbostat: Add Android support for MSR device handling
  tools/power turbostat.8: pm_domain wording fix
  tools/power turbostat.8: fix typo: idle_pct should be pct_idle

4 months agoMerge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 8 Jun 2025 18:33:00 +0000 (11:33 -0700)]
Merge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanup from Thomas Gleixner:
 "The delayed from_timer() API cleanup:

  The renaming to the timer_*() namespace was delayed due massive
  conflicts against Linux-next. Now that everything is upstream finish
  the conversion"

* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide, timers: Rename from_timer() to timer_container_of()

4 months agoMerge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 8 Jun 2025 18:27:20 +0000 (11:27 -0700)]
Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - Cure IO bitmap inconsistencies

     A failed fork cleans up all resources of the newly created thread
     via exit_thread(). exit_thread() invokes io_bitmap_exit() which
     does the IO bitmap cleanups, which unfortunately assume that the
     cleanup is related to the current task, which is obviously bogus.

     Make it work correctly

   - A lockdep fix in the resctrl code removed the clearing of the
     command buffer in two places, which keeps stale error messages
     around. Bring them back.

   - Remove unused trace events"

* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
  x86/iopl: Cure TIF_IO_BITMAP inconsistencies
  x86/fpu: Remove unused trace events

4 months agoMerge tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 8 Jun 2025 18:25:13 +0000 (11:25 -0700)]
Merge tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "Add the missing seq_file forward declaration in the timer namespace
  header"

* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timens: Add struct seq_file forward declaration

4 months agotools/power turbostat: version 2025.06.08
Len Brown [Sun, 8 Jun 2025 16:31:59 +0000 (12:31 -0400)]
tools/power turbostat: version 2025.06.08

Add initial DMR support, which required smarter RAPL probe
Fix AMD MSR RAPL energy reporting
Add RAPL power limit configuration output
Minor fixes

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Add initial support for BartlettLake
Zhang Rui [Fri, 18 Apr 2025 06:04:26 +0000 (14:04 +0800)]
tools/power turbostat: Add initial support for BartlettLake

Add initial support for BartlettLake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Add initial support for DMR
Zhang Rui [Mon, 4 Mar 2024 06:54:40 +0000 (14:54 +0800)]
tools/power turbostat: Add initial support for DMR

Add initial support for DMR.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Dump RAPL sysfs info
Zhang Rui [Fri, 30 May 2025 06:01:31 +0000 (14:01 +0800)]
tools/power turbostat: Dump RAPL sysfs info

for example:

intel-rapl:1: psys 28.0s:100W 976.0us:100W
intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W
intel-rapl:0/intel-rapl:0:0: core disabled
intel-rapl:0/intel-rapl:0:1: uncore disabled
intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W

[lenb: simplified format]

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
squish me

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Avoid probing the same perf counters
Zhang Rui [Fri, 30 May 2025 00:09:28 +0000 (08:09 +0800)]
tools/power turbostat: Avoid probing the same perf counters

For the RAPL package energy status counter, Intel and AMD share the same
perf_subsys and perf_name, but with different MSR addresses.

Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are
introduced to describe this counter for different Vendors.

As a result, the perf counter is probed twice, and causes a failure in
in get_rapl_counters() because expected_read_size and actual_read_size
don't match.

Fix the problem by skipping the already probed counter.

Note, this is not a perfect fix. For example, if different
vendors/platforms use the same MSR value for different purpose, the code
can be fooled when it probes a rapl_counter_arch_infos[] entry that does
not belong to the running Vendor/Platform.

In a long run, better to put rapl_counter_arch_infos[] into the
platform_features so that this becomes Vendor/Platform specific.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
Zhang Rui [Sat, 17 May 2025 09:44:50 +0000 (17:44 +0800)]
tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared

platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.

Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.

With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Clean up add perf/msr counter logic
Zhang Rui [Sat, 17 May 2025 09:35:17 +0000 (17:35 +0800)]
tools/power turbostat: Clean up add perf/msr counter logic

Increase the code readability by moving the no_perf/no_msr flag and the
cai->perf_name/cai->msr sanity checks into the counter probe functions.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Introduce add_msr_counter()
Zhang Rui [Sat, 17 May 2025 07:58:51 +0000 (15:58 +0800)]
tools/power turbostat: Introduce add_msr_counter()

probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.

Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Remove add_msr_perf_counter_()
Zhang Rui [Sat, 17 May 2025 09:40:08 +0000 (17:40 +0800)]
tools/power turbostat: Remove add_msr_perf_counter_()

As the only caller of add_msr_perf_counter_(), add_msr_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.

Remove add_msr_perf_counter_() and move all the logic to
add_msr_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Remove add_cstate_perf_counter_()
Zhang Rui [Sat, 17 May 2025 07:43:59 +0000 (15:43 +0800)]
tools/power turbostat: Remove add_cstate_perf_counter_()

As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.

Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Remove add_rapl_perf_counter_()
Zhang Rui [Sat, 17 May 2025 04:06:22 +0000 (12:06 +0800)]
tools/power turbostat: Remove add_rapl_perf_counter_()

As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.

Remove add_rapl_perf_counter_() and move all the logic to
add_rapl_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Quit early for unsupported RAPL counters
Zhang Rui [Sat, 17 May 2025 02:26:14 +0000 (10:26 +0800)]
tools/power turbostat: Quit early for unsupported RAPL counters

Quit early for unsupported RAPL counters.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Always check rapl_joules flag
Zhang Rui [Fri, 30 May 2025 06:00:33 +0000 (14:00 +0800)]
tools/power turbostat: Always check rapl_joules flag

rapl_joules bit should always be checked even if
platform_features->rapl_msrs is not set or no_msr flag is used.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Fix AMD package-energy reporting
Gautham R. Shenoy [Thu, 29 May 2025 11:48:25 +0000 (17:18 +0530)]
tools/power turbostat: Fix AMD package-energy reporting

commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.

However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.

Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.

Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.

Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Fix RAPL_GFX_ALL typo
Kaushlendra Kumar [Fri, 23 May 2025 08:06:59 +0000 (13:36 +0530)]
tools/power turbostat: Fix RAPL_GFX_ALL typo

Fix typo in the currently unused RAPL_GFX_ALL macro definition.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat: Add Android support for MSR device handling
Kaushlendra Kumar [Thu, 22 May 2025 08:49:46 +0000 (14:19 +0530)]
tools/power turbostat: Add Android support for MSR device handling

It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat.8: pm_domain wording fix
Len Brown [Fri, 18 Apr 2025 21:54:39 +0000 (17:54 -0400)]
tools/power turbostat.8: pm_domain wording fix

turbostat.8: clarify that uncore "domains" are Power Management domains,
aka pm_domains.

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agotools/power turbostat.8: fix typo: idle_pct should be pct_idle
Len Brown [Wed, 9 Apr 2025 04:06:24 +0000 (00:06 -0400)]
tools/power turbostat.8: fix typo: idle_pct should be pct_idle

idle_pct should be pct_idle

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agoMerge tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 8 Jun 2025 18:07:33 +0000 (11:07 -0700)]
Merge tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fix from Thomas Gleixner:
 "A single fix for the x86 performance counters on Intel CPUs:

  The MSR offset calculations for fixed performance counters are stored
  at the wrong index in the configuration array causing the general
  purpose counter MSR offset to be overwritten, so both the general
  purpose and the fixed counters offsets are incorrect.

  Correct the array index calculation to fix that"

* tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()

4 months agoMerge tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 8 Jun 2025 18:02:53 +0000 (11:02 -0700)]
Merge tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the PCI/MSI code:

  The conversion to per device MSI domains created a MSI domain with
  size 1 instead of sizing it to the maximum possible number of MSI
  interrupts for the device. This "worked" as the subsequent allocations
  resized the domain, but the recent change to move the prepare() call
  into the domain creation path broke this works by chance mechanism.

  Size the domain properly at creation time"

* tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  PCI/MSI: Size device MSI domain with the maximum number of vectors

4 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 8 Jun 2025 17:35:12 +0000 (10:35 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mount fixes from Al Viro:
 "Various mount-related bugfixes:

   - split the do_move_mount() checks in subtree-of-our-ns and
     entire-anon cases and adapt detached mount propagation selftest for
     mount_setattr

   - allow clone_private_mount() for a path on real rootfs

   - fix a race in call of has_locked_children()

   - fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP

   - make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right
     userns

   - avoid false negatives in path_overmount()

   - don't leak MNT_LOCKED from parent to child in finish_automount()

   - do_change_type(): refuse to operate on unmounted/not ours mounts"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_change_type(): refuse to operate on unmounted/not ours mounts
  clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
  selftests/mount_setattr: adapt detached mount propagation test
  do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
  fs: allow clone_private_mount() for a path on real rootfs
  fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  finish_automount(): don't leak MNT_LOCKED from parent to child
  path_overmount(): avoid false negatives
  fs/fhandle.c: fix a race in call of has_locked_children()

4 months agoMerge tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 8 Jun 2025 17:20:21 +0000 (10:20 -0700)]
Merge tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

 - multichannel/reconnect fixes

 - move smbdirect (smb over RDMA) defines to fs/smb/common so they will
   be able to be used in the future more broadly, and a documentation
   update explaining setting up smbdirect mounts

 - update email address for Paulo

* tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  MAINTAINERS, mailmap: Update Paulo Alcantara's email address
  cifs: add documentation for smbdirect setup
  cifs: do not disable interface polling on failure
  cifs: serialize other channels when query server interfaces is pending
  cifs: deal with the channel loading lag while picking channels
  smb: client: make use of common smbdirect_socket_parameters
  smb: smbdirect: introduce smbdirect_socket_parameters
  smb: client: make use of common smbdirect_socket
  smb: smbdirect: add smbdirect_socket.h
  smb: client: make use of common smbdirect.h
  smb: smbdirect: add smbdirect.h with public structures
  smb: client: make use of common smbdirect_pdu.h
  smb: smbdirect: add smbdirect_pdu.h with protocol definitions

4 months agoMerge tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 8 Jun 2025 15:19:01 +0000 (08:19 -0700)]
Merge tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull more tracing fixes from Steven Rostedt:

 - Fix regression of waiting a long time on updating trace event filters

   When the faultable trace points were added, it needed task trace RCU
   synchronization.

   This was added to the tracepoint_synchronize_unregister() function.
   The filter logic always called this function whenever it updated the
   trace event filters before freeing the old filters. This increased
   the time of "trace-cmd record" from taking 13 seconds to running over
   2 minutes to complete.

   Move the freeing of the filters to call_rcu*() logic, which brings
   the time back down to 13 seconds.

 - Fix ring_buffer_subbuf_order_set() error path lock protection

   The error path of the ring_buffer_subbuf_order_set() released the
   mutex too early and allowed subsequent accesses to setting the
   subbuffer size to corrupt the data and cause a bug.

   By moving the mutex locking to the end of the error path, it prevents
   the reentrant access to the critical data and also allows the
   function to convert the taking of the mutex over to the guard()
   logic.

 - Remove unused power management clock events

   The clock events were added in 2010 for power management. In 2011 arm
   used them. In 2013 the code they were used in was removed. These
   events have been wasting memory since then.

 - Fix sparse warnings

   There was a few places that sparse warned about trace_events_filter.c
   where file->filter was referenced directly, but it is annotated with
   an __rcu tag. Use the helper functions and fix them up to use
   rcu_dereference() properly.

* tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Add rcu annotation around file->filter accesses
  tracing: PM: Remove unused clock events
  ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()
  tracing: Fix regression of filter waiting a long time on RCU synchronization

4 months agotreewide, timers: Rename from_timer() to timer_container_of()
Ingo Molnar [Fri, 9 May 2025 05:51:14 +0000 (07:51 +0200)]
treewide, timers: Rename from_timer() to timer_container_of()

Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
4 months agoMerge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sat, 7 Jun 2025 17:05:35 +0000 (10:05 -0700)]
Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
   exports a symbol only to specified modules

 - Improve ABI handling in gendwarfksyms

 - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n

 - Add checkers for redundant or missing <linux/export.h> inclusion

 - Deprecate the extra-y syntax

 - Fix a genksyms bug when including enum constants from *.symref files

* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  genksyms: Fix enum consts from a reference affecting new values
  arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
  kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
  efi/libstub: use 'targets' instead of extra-y in Makefile
  module: make __mod_device_table__* symbols static
  scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
  scripts/misc-check: check missing #include <linux/export.h> when W=1
  scripts/misc-check: add double-quotes to satisfy shellcheck
  kbuild: move W=1 check for scripts/misc-check to top-level Makefile
  scripts/tags.sh: allow to use alternative ctags implementation
  kconfig: introduce menu type enum
  docs: symbol-namespaces: fix reST warning with literal block
  kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
  tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
  docs/core-api/symbol-namespaces: drop table of contents and section numbering
  modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
  kbuild: move kbuild syntax processing to scripts/Makefile.build
  Makefile: remove dependency on archscripts for header installation
  Documentation/kbuild: Add new gendwarfksyms kABI rules
  Documentation/kbuild: Drop section numbers
  ...

4 months agoMerge tag 'sh-for-v6.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubi...
Linus Torvalds [Sat, 7 Jun 2025 17:00:03 +0000 (10:00 -0700)]
Merge tag 'sh-for-v6.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:

 - replace the __ASSEMBLY__ with __ASSEMBLER__ macro in all headers
   since the latter is now defined automatically by both GCC and Clang
   when compiling assembly code (Thomas Huth)

 - set the default SPI mode for the ecovec24 board which became
   necessary after a new mode member as added to the sh_msiof_spi_info
   struct in cf9e4784f3bd ("spi: sh-msiof: Add slave mode support")
   (Geert Uytterhoeven)

 - remove unused variables in the kprobes code in
   kprobe_exceptions_notify() (Mike Rapoport)

* tag 'sh-for-v6.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: kprobes: Remove unused variables in kprobe_exceptions_notify()
  sh: ecovec24: Make SPI mode explicit
  sh: Replace __ASSEMBLY__ with __ASSEMBLER__ in all headers

4 months agoMerge tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuaca...
Linus Torvalds [Sat, 7 Jun 2025 16:56:18 +0000 (09:56 -0700)]
Merge tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Adjust the 'make install' operation

 - Support SCHED_MC (Multi-core scheduler)

 - Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS

 - Enable HAVE_ARCH_STACKLEAK

 - Increase max supported CPUs up to 2048

 - Introduce the numa_memblks conversion

 - Add PWM controller nodes in dts

 - Some bug fixes and other small changes

* tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  platform/loongarch: laptop: Unregister generic_sub_drivers on exit
  platform/loongarch: laptop: Add backlight power control support
  platform/loongarch: laptop: Get brightness setting from EC on probe
  LoongArch: dts: Add PWM support to Loongson-2K2000
  LoongArch: dts: Add PWM support to Loongson-2K1000
  LoongArch: dts: Add PWM support to Loongson-2K0500
  LoongArch: vDSO: Correctly use asm parameters in syscall wrappers
  LoongArch: Fix panic caused by NULL-PMD in huge_pte_offset()
  LoongArch: Preserve firmware configuration when desired
  LoongArch: Avoid using $r0/$r1 as "mask" for csrxchg
  LoongArch: Introduce the numa_memblks conversion
  LoongArch: Increase max supported CPUs up to 2048
  LoongArch: Enable HAVE_ARCH_STACKLEAK
  LoongArch: Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
  LoongArch: Add SCHED_MC (Multi-core scheduler) support
  LoongArch: Add some annotations in archhelp
  LoongArch: Using generic scripts/install.sh in `make install`
  LoongArch: Add a default install.sh

4 months agoMerge tag 'sound-fix-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 7 Jun 2025 16:40:08 +0000 (09:40 -0700)]
Merge tag 'sound-fix-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of fix patches for the 6.16-rc1 merge window.

  Most of changes are about ASoC, especially lots of AVS driver fixes.
  Larger LOCs are seen in TAS571x codec drivers, but the changes are
  trivial and safe. The rest are all device-specific small fixes"

* tag 'sound-fix-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
  ASoC: Intel: avs: boards: Fix rt5663 front end name
  ASoC: Intel: avs: Simplify verification of parse_int_array() result
  ALSA: usb-audio: Add implicit feedback quirk for RODE AI-1
  ALSA: hda: Ignore unsol events for cards being shut down
  ALSA: hda: Add new pci id for AMD GPU display HD audio controller
  ALSA: hda: cs35l41: Constify regmap_irq_chip
  ALSA: usb-audio: Add a quirk for Lenovo Thinkpad Thunderbolt 3 dock
  ASoC: ti: omap-hdmi: Re-add dai_link->platform to fix card init
  ASoC: pcm: Do not open FEs with no BEs connected
  ASoC: rt1320: fix speaker noise when volume bar is 100%
  ASoC: Intel: avs: Include missing string.h
  ASoC: Intel: avs: Verify content returned by parse_int_array()
  ASoC: Intel: avs: Verify kcalloc() status when setting constraints
  ASoC: Intel: avs: Fix paths in MODULE_FIRMWARE hints
  ASoC: Intel: avs: Fix possible null-ptr-deref when initing hw
  ASoC: Intel: avs: Fix PPLCxFMT calculation
  ASoC: Intel: avs: Fix deadlock when the failing IPC is SET_D0IX
  ASoC: codecs: hda: Fix RPM usage count underflow
  ASoC: amd: yc: Add support for Lenovo Yoga 7 16ARP8
  ASoC: tas571x: fix tas5733 num_controls
  ...

4 months agotracing: Add rcu annotation around file->filter accesses
Steven Rostedt [Sat, 7 Jun 2025 14:28:21 +0000 (10:28 -0400)]
tracing: Add rcu annotation around file->filter accesses

Running sparse on trace_events_filter.c triggered several warnings about
file->filter being accessed directly even though it's annotated with __rcu.

Add rcu_dereference() around it and shuffle the logic slightly so that
it's always referenced via accessor functions.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250607102821.6c7effbf@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoMerge tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 7 Jun 2025 14:24:07 +0000 (07:24 -0700)]
Merge tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2 and UBIFS fixes from Richard Weinberger:
 "JFFS2:
   - Correctly check return code of jffs2_prealloc_raw_node_refs()

  UBIFS:
   - Spelling fixes"

* tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: check jffs2_prealloc_raw_node_refs() result in few other places
  jffs2: check that raw node were preallocated before writing summary
  ubifs: Fix grammar in error message

4 months agosh: kprobes: Remove unused variables in kprobe_exceptions_notify()
Mike Rapoport [Sat, 17 May 2025 09:30:48 +0000 (12:30 +0300)]
sh: kprobes: Remove unused variables in kprobe_exceptions_notify()

kbuild reports the following warning:

   arch/sh/kernel/kprobes.c: In function 'kprobe_exceptions_notify':
>> arch/sh/kernel/kprobes.c:412:24: warning: variable 'p' set but not used [-Wunused-but-set-variable]
     412 |         struct kprobe *p = NULL;
         |                        ^

The variable 'p' is indeed unused since the commit fa5a24b16f94
("sh/kprobes: Don't call the ->break_handler() in SH kprobes code")

Remove that variable along with 'kprobe_opcode_t *addr' which also
becomes unused after 'p' is removed.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505151341.EuRFR22l-lkp@intel.com/
Fixes: fa5a24b16f94 ("sh/kprobes: Don't call the ->break_handler() in SH kprobes code")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: ecovec24: Make SPI mode explicit
Geert Uytterhoeven [Fri, 2 May 2025 11:13:36 +0000 (13:13 +0200)]
sh: ecovec24: Make SPI mode explicit

Commit cf9e4784f3bde3e4 ("spi: sh-msiof: Add slave mode support") added
a new mode member to the sh_msiof_spi_info structure, but did not update
any board files.  Hence all users in board files rely on the default
being host mode.

Make this unambiguous by configuring host mode explicitly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: Replace __ASSEMBLY__ with __ASSEMBLER__ in all headers
Thomas Huth [Fri, 14 Mar 2025 07:10:03 +0000 (08:10 +0100)]
sh: Replace __ASSEMBLY__ with __ASSEMBLER__ in all headers

While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.

This is a completely mechanical patch (done with a simple "sed -i"
statement).

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agogenksyms: Fix enum consts from a reference affecting new values
Petr Pavlu [Tue, 3 Jun 2025 13:02:09 +0000 (15:02 +0200)]
genksyms: Fix enum consts from a reference affecting new values

Enumeration constants read from a symbol reference file can incorrectly
affect new enumeration constants parsed from an actual input file.

Example:

 $ cat test.c
 enum { E_A, E_B, E_MAX };
 struct bar { int mem[E_MAX]; };
 int foo(struct bar *a) {}
 __GENKSYMS_EXPORT_SYMBOL(foo);

 $ cat test.c | ./scripts/genksyms/genksyms -T test.0.symtypes
 #SYMVER foo 0x070d854d

 $ cat test.0.symtypes
 E#E_MAX 2
 s#bar struct bar { int mem [ E#E_MAX ] ; }
 foo int foo ( s#bar * )

 $ cat test.c | ./scripts/genksyms/genksyms -T test.1.symtypes -r test.0.symtypes
 <stdin>:4: warning: foo: modversion changed because of changes in enum constant E_MAX
 #SYMVER foo 0x9c9dfd81

 $ cat test.1.symtypes
 E#E_MAX ( 2 ) + 3
 s#bar struct bar { int mem [ E#E_MAX ] ; }
 foo int foo ( s#bar * )

The __add_symbol() function includes logic to handle the incrementation of
enumeration values, but this code is also invoked when reading a reference
file. As a result, the variables last_enum_expr and enum_counter might be
incorrectly set after reading the reference file, which later affects
parsing of the actual input.

Fix the problem by splitting the logic for the incrementation of
enumeration values into a separate function process_enum() and call it from
__add_symbol() only when processing non-reference data.

Fixes: e37ddb825003 ("genksyms: Track changes to enum constants")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 months agoarch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
Masahiro Yamada [Mon, 2 Jun 2025 18:12:54 +0000 (03:12 +0900)]
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds

The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN),
which behaves equivalently.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
4 months agodo_change_type(): refuse to operate on unmounted/not ours mounts
Al Viro [Wed, 4 Jun 2025 16:27:08 +0000 (12:27 -0400)]
do_change_type(): refuse to operate on unmounted/not ours mounts

Ensure that propagation settings can only be changed for mounts located
in the caller's mount namespace. This change aligns permission checking
with the rest of mount(2).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 07b20889e305 ("beginning of the shared-subtree proper")
Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agokbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
Masahiro Yamada [Mon, 2 Jun 2025 18:12:53 +0000 (03:12 +0900)]
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}

KBUILD_BUILTIN is set to 1 unless you are building only modules.

KBUILD_MODULES is set to 1 when you are building only modules
(a typical use case is "make modules").

It is more useful to set them to 'y' instead, so we can do
something like:

    always-$(KBUILD_BUILTIN) += vmlinux.lds

This works equivalently to:

    extra-y                  += vmlinux.lds

This allows us to deprecate extra-y. extra-y and always-y are quite
similar, and we do not need both.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
4 months agoclone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
Al Viro [Mon, 2 Jun 2025 00:11:06 +0000 (20:11 -0400)]
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns

What we want is to verify there is that clone won't expose something
hidden by a mount we wouldn't be able to undo.  "Wouldn't be able to undo"
may be a result of MNT_LOCKED on a child, but it may also come from
lacking admin rights in the userns of the namespace mount belongs to.

clone_private_mnt() checks the former, but not the latter.

There's a number of rather confusing CAP_SYS_ADMIN checks in various
userns during the mount, especially with the new mount API; they serve
different purposes and in case of clone_private_mnt() they usually,
but not always end up covering the missing check mentioned above.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
Fixes: 427215d85e8d ("ovl: prevent private clone if bind mount is not allowed")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agoMerge tag 'mm-stable-2025-06-06-16-09' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 7 Jun 2025 05:06:57 +0000 (22:06 -0700)]
Merge tag 'mm-stable-2025-06-06-16-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:
 "The series 'Fix uprobe pte be overwritten when expanding vma' fixes a
  longstanding and quite obscure bug related to the vma merging of the
  uprobe mmap page"

* tag 'mm-stable-2025-06-06-16-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftests/mm: add test about uprobe pte be orphan during vma merge
  selftests/mm: extract read_sysfs and write_sysfs into vm_util
  mm: expose abnormal new_pte during move_ptes
  mm: fix uprobe pte be overwritten when expanding vma
  mm/damon: s/primitives/code/ on comments

4 months agoMerge tag 'mm-hotfixes-stable-2025-06-06-16-02' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 7 Jun 2025 04:45:45 +0000 (21:45 -0700)]
Merge tag 'mm-hotfixes-stable-2025-06-06-16-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "13 hotfixes.

  6 are cc:stable and the remainder address post-6.15 issues or aren't
  considered necessary for -stable kernels. 11 are for MM"

* tag 'mm-hotfixes-stable-2025-06-06-16-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  MAINTAINERS: add mm swap section
  kmsan: test: add module description
  MAINTAINERS: add tlb trace events to MMU GATHER AND TLB INVALIDATION
  mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race
  mm/hugetlb: unshare page tables during VMA split, not before
  MAINTAINERS: add Alistair as reviewer of mm memory policy
  iov_iter: use iov_offset for length calculation in iov_iter_aligned_bvec
  mm/mempolicy: fix incorrect freeing of wi_kobj
  alloc_tag: handle module codetag load errors as module load failures
  mm/madvise: handle madvise_lock() failure during race unwinding
  mm: fix vmstat after removing NR_BOUNCE
  KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY

4 months agoselftests/mount_setattr: adapt detached mount propagation test
Christian Brauner [Thu, 5 Jun 2025 12:50:54 +0000 (14:50 +0200)]
selftests/mount_setattr: adapt detached mount propagation test

Make sure that detached trees don't receive mount propagation.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agodo_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
Al Viro [Fri, 6 Jun 2025 22:31:03 +0000 (18:31 -0400)]
do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases

... and fix the breakage in anon-to-anon case.  There are two cases
acceptable for do_move_mount() and mixing checks for those is making
things hard to follow.

One case is move of a subtree in caller's namespace.
        * source and destination must be in caller's namespace
* source must be detachable from parent
Another is moving the entire anon namespace elsewhere
* source must be the root of anon namespace
* target must either in caller's namespace or in a suitable
  anon namespace (see may_use_mount() for details).
* target must not be in the same namespace as source.

It's really easier to follow if tests are *not* mixed together...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 3b5260d12b1f ("Don't propagate mounts into detached trees")
Reported-by: Allison Karlitskaya <lis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agofs: allow clone_private_mount() for a path on real rootfs
KONDO KAZUMA(近藤 和真) [Thu, 15 May 2025 12:18:30 +0000 (12:18 +0000)]
fs: allow clone_private_mount() for a path on real rootfs

Mounting overlayfs with a directory on real rootfs (initramfs)
as upperdir has failed with following message since commit
db04662e2f4f ("fs: allow detached mounts in clone_private_mount()").

  [    4.080134] overlayfs: failed to clone upperpath

Overlayfs mount uses clone_private_mount() to create internal mount
for the underlying layers.

The commit made clone_private_mount() reject real rootfs because
it does not have a parent mount and is in the initial mount namespace,
that is not an anonymous mount namespace.

This issue can be fixed by modifying the permission check
of clone_private_mount() following [1].

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()")
Link: https://lore.kernel.org/all/20250514190252.GQ2023217@ZenIV/
Link: https://lore.kernel.org/all/20250506194849.GT2023217@ZenIV/
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agofix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
Al Viro [Tue, 3 Jun 2025 21:57:27 +0000 (17:57 -0400)]
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)

9ffb14ef61ba "move_mount: allow to add a mount into an existing group"
breaks assertions on ->mnt_share/->mnt_slave.  For once, the data structures
in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
        All vfsmounts in a peer group have the same ->mnt_master.  If it is
non-NULL, they form a contiguous (ordered) segment of slave list.

do_set_group() puts a mount into the same place in propagation graph
as the old one.  As the result, if old mount gets events from somewhere
and is not a pure event sink, new one needs to be placed next to the
old one in the slave list the old one's on.  If it is a pure event
sink, we only need to make sure the new one doesn't end up in the
middle of some peer group.

"move_mount: allow to add a mount into an existing group" ends up putting
the new one in the beginning of list; that's definitely not going to be
in the middle of anything, so that's fine for case when old is not marked
shared.  In case when old one _is_ marked shared (i.e. is not a pure event
sink), that breaks the assumptions of propagation graph iterators.

Put the new mount next to the old one on the list - that does the right thing
in "old is marked shared" case and is just as correct as the current behaviour
if old is not marked shared (kudos to Pavel for pointing that out - my original
suggested fix changed behaviour in the "nor marked" case, which complicated
things for no good reason).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 9ffb14ef61ba ("move_mount: allow to add a mount into an existing group")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agofinish_automount(): don't leak MNT_LOCKED from parent to child
Al Viro [Sun, 4 May 2025 17:28:37 +0000 (13:28 -0400)]
finish_automount(): don't leak MNT_LOCKED from parent to child

Intention for MNT_LOCKED had always been to protect the internal
mountpoints within a subtree that got copied across the userns boundary,
not the mountpoint that tree got attached to - after all, it _was_
exposed before the copying.

For roots of secondary copies that is enforced in attach_recursive_mnt() -
MNT_LOCKED is explicitly stripped for those.  For the root of primary
copy we are almost always guaranteed that MNT_LOCKED won't be there,
so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
chain got overlooked - triggering e.g. NFS referral will have the
submount inherit the public flags from parent; that's fine for such
things as read-only, nosuid, etc., but not for MNT_LOCKED.

This is particularly pointless since the mount attached by finish_automount()
is usually expirable, which makes any protection granted by MNT_LOCKED
null and void; just wait for a while and that mount will go away on its own.

Include MNT_LOCKED into the set of flags to be ignored by do_add_mount() - it
really is an internal flag.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agopath_overmount(): avoid false negatives
Al Viro [Sun, 1 Jun 2025 18:02:26 +0000 (14:02 -0400)]
path_overmount(): avoid false negatives

Holding namespace_sem is enough to make sure that result remains valid.
It is *not* enough to avoid false negatives from __lookup_mnt().  Mounts
can be unhashed outside of namespace_sem (stuck children getting detached
on final mntput() of lazy-umounted mount) and having an unrelated mount
removed from the hash chain while we traverse it may end up with false
negative from __lookup_mnt().  We need to sample and recheck the seqlock
component of mount_lock...

Bug predates the introduction of path_overmount() - it had come from
the code in finish_automount() that got abstracted into that helper.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 26df6034fdb2 ("fix automount/automount race properly")
Fixes: 6ac392815628 ("fs: allow to mount beneath top mount")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agofs/fhandle.c: fix a race in call of has_locked_children()
Al Viro [Sun, 1 Jun 2025 18:23:52 +0000 (14:23 -0400)]
fs/fhandle.c: fix a race in call of has_locked_children()

may_decode_fh() is calling has_locked_children() while holding no locks.
That's an oopsable race...

The rest of the callers are safe since they are holding namespace_sem and
are guaranteed a positive refcount on the mount in question.

Rename the current has_locked_children() to __has_locked_children(), make
it static and switch the fs/namespace.c users to it.

Make has_locked_children() a wrapper for __has_locked_children(), calling
the latter under read_seqlock_excl(&mount_lock).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 months agoplatform/loongarch: laptop: Unregister generic_sub_drivers on exit
Yao Zi [Thu, 5 Jun 2025 12:34:46 +0000 (20:34 +0800)]
platform/loongarch: laptop: Unregister generic_sub_drivers on exit

Without correct unregisteration, ACPI notify handlers and the platform
drivers installed by generic_subdriver_init() will become dangling
references after removing the loongson_laptop module, triggering various
kernel faults when a hotkey is sent or at kernel shutdown.

Cc: stable@vger.kernel.org
Fixes: 6246ed09111f ("LoongArch: Add ACPI-based generic laptop driver")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoplatform/loongarch: laptop: Add backlight power control support
Yao Zi [Thu, 5 Jun 2025 12:34:46 +0000 (20:34 +0800)]
platform/loongarch: laptop: Add backlight power control support

loongson_laptop_turn_{on,off}_backlight() are designed for controlling
the power of the backlight, but they aren't really used in the driver
previously.

Unify these two functions since they only differ in arguments passed to
ACPI method, and wire up loongson_laptop_backlight_update() to update
the power state of the backlight as well. Tested on the TongFang L860-T2
Loongson-3A5000 laptop.

Cc: stable@vger.kernel.org
Fixes: 6246ed09111f ("LoongArch: Add ACPI-based generic laptop driver")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 7 Jun 2025 03:02:51 +0000 (20:02 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Mostly trivial updates and bug fixes (core update is a comment
  spelling fix).

  The bigger UFS update is the clock scaling and frequency fixes"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Prevent calling phy_exit() before phy_init()
  scsi: ufs: qcom: Call ufs_qcom_cfg_timers() in clock scaling path
  scsi: ufs: qcom: Map devfreq OPP freq to UniPro Core Clock freq
  scsi: ufs: qcom: Check gear against max gear in vop freq_to_gear()
  scsi: aacraid: Remove useless code
  scsi: core: devinfo: Fix typo in comment
  scsi: ufs: core: Don't perform UFS clkscaling during host async scan

4 months agoMerge tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 7 Jun 2025 01:05:18 +0000 (18:05 -0700)]
Merge tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the FWFT SBI extension, which is part of SBI 3.0 and a
   dependency for many new SBI and ISA extensions

 - Support for getrandom() in the VDSO

 - Support for mseal

 - Optimized routines for raid6 syndrome and recovery calculations

 - kexec_file() supports loading Image-formatted kernel binaries

 - Improvements to the instruction patching framework to allow for
   atomic instruction patching, along with rules as to how systems need
   to behave in order to function correctly

 - Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
   some SiFive vendor extensions

 - Various fixes and cleanups, including: misaligned access handling,
   perf symbol mangling, module loading, PUD THPs, and improved uaccess
   routines

* tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits)
  riscv: uaccess: Only restore the CSR_STATUS SUM bit
  RISC-V: vDSO: Wire up getrandom() vDSO implementation
  riscv: enable mseal sysmap for RV64
  raid6: Add RISC-V SIMD syndrome and recovery calculations
  riscv: mm: Add support for Svinval extension
  RISC-V: Documentation: Add enough title underlines to CMODX
  riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
  MAINTAINERS: Update Atish's email address
  riscv: uaccess: do not do misaligned accesses in get/put_user()
  riscv: process: use unsigned int instead of unsigned long for put_user()
  riscv: make unsafe user copy routines use existing assembly routines
  riscv: hwprobe: export Zabha extension
  riscv: Make regs_irqs_disabled() more clear
  perf symbols: Ignore mapping symbols on riscv
  RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
  riscv: module: Optimize PLT/GOT entry counting
  riscv: Add support for PUD THP
  riscv: xchg: Prefetch the destination word for sc.w
  riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
  riscv: Add support for Zicbop
  ...

4 months agoMerge tag 's390-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 7 Jun 2025 01:02:37 +0000 (18:02 -0700)]
Merge tag 's390-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - Add missing select CRYPTO_ENGINE to CRYPTO_PAES_S390

 - Fix secure storage access exception handling when fault handling is
   disabled

* tag 's390-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Fix in_atomic() handling in do_secure_storage_access()
  s390/crypto: Select crypto engine in Kconfig when PAES is chosen

4 months agoMerge tag 'tomoyo-pr-20250606' of git://git.code.sf.net/p/tomoyo/tomoyo
Linus Torvalds [Sat, 7 Jun 2025 01:00:36 +0000 (18:00 -0700)]
Merge tag 'tomoyo-pr-20250606' of git://git.code.sf.net/p/tomoyo/tomoyo

Pull tomoyo update from Tetsuo Handa:
 "Update mailing list address"

* tag 'tomoyo-pr-20250606' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: update mailing lists

4 months agoMerge tag 'ceph-for-6.16-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Sat, 7 Jun 2025 00:56:19 +0000 (17:56 -0700)]
Merge tag 'ceph-for-6.16-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:

 - a one-liner that leads to a startling (but also very much rational)
   performance improvement in cases where an IMA policy with rules that
   are based on fsmagic matching is enforced

 - an encryption-related fixup that addresses generic/397 and other
   fstest failures

 - a couple of cleanups in CephFS

* tag 'ceph-for-6.16-rc1' of https://github.com/ceph/ceph-client:
  ceph: fix variable dereferenced before check in ceph_umount_begin()
  ceph: set superblock s_magic for IMA fsmagic matching
  ceph: cleanup hardcoded constants of file handle size
  ceph: fix possible integer overflow in ceph_zero_objects()
  ceph: avoid kernel BUG for encrypted inode with unaligned file size

4 months agoMerge tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overl...
Linus Torvalds [Sat, 7 Jun 2025 00:54:09 +0000 (17:54 -0700)]
Merge tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix a regression in getting the path of an open file (e.g. in
   /proc/PID/maps) for a nested overlayfs setup (André Almeida)

 - Support data-only layers and verity in a user namespace (unprivileged
   composefs use case)

 - Fix a gcc warning (Kees)

 - Cleanups

* tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: Annotate struct ovl_entry with __counted_by()
  ovl: Replace offsetof() with struct_size() in ovl_stack_free()
  ovl: Replace offsetof() with struct_size() in ovl_cache_entry_new()
  ovl: Check for NULL d_inode() in ovl_dentry_upper()
  ovl: Use str_on_off() helper in ovl_show_options()
  ovl: don't require "metacopy=on" for "verity"
  ovl: relax redirect/metacopy requirements for lower -> data redirect
  ovl: make redirect/metacopy rejection consistent
  ovl: Fix nested backing file paths

4 months agotracing: PM: Remove unused clock events
Steven Rostedt [Thu, 5 Jun 2025 20:21:06 +0000 (16:21 -0400)]
tracing: PM: Remove unused clock events

The events clock_enable, clock_disable, and clock_set_rate were added back
in 2010. In 2011 they were used by the arm architecture but removed in
2013. These events add around 7K of memory which was wasted for the last 12
years.

Remove them.

Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kajetan Puchalski <kajetan.puchalski@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/20250605162106.1a459dad@gandalf.local.home
Fixes: 74704ac6ea402 ("tracing, perf: Add more power related events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()
Dmitry Antipov [Fri, 6 Jun 2025 11:22:42 +0000 (14:22 +0300)]
ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()

Enlarge the critical section in ring_buffer_subbuf_order_set() to
ensure that error handling takes place with per-buffer mutex held,
thus preventing list corruption and other concurrency-related issues.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Link: https://lore.kernel.org/20250606112242.1510605-1-dmantipov@yandex.ru
Reported-by: syzbot+05d673e83ec640f0ced9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=05d673e83ec640f0ced9
Fixes: f9b94daa542a8 ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agotracing: Fix regression of filter waiting a long time on RCU synchronization
Steven Rostedt [Sat, 7 Jun 2025 00:20:20 +0000 (20:20 -0400)]
tracing: Fix regression of filter waiting a long time on RCU synchronization

When faultable trace events were added, a trace event may no longer use
normal RCU to synchronize but instead used synchronize_rcu_tasks_trace().
This synchronization takes a much longer time to synchronize.

The filter logic would free the filters by calling
tracepoint_synchronize_unregister() after it unhooked the filter strings
and before freeing them. With this function now calling
synchronize_rcu_tasks_trace() this increased the time to free a filter
tremendously. On a PREEMPT_RT system, it was even more noticeable.

 # time trace-cmd record -p function sleep 1
 [..]
 real 2m29.052s
 user 0m0.244s
 sys 0m20.136s

As trace-cmd would clear out all the filters before recording, it could
take up to 2 minutes to do a recording of "sleep 1".

To find out where the issues was:

 ~# trace-cmd sqlhist -e -n sched_stack  select start.prev_state as state, end.next_comm as comm, TIMESTAMP_DELTA_USECS as delta,  start.STACKTRACE as stack from sched_switch as start join sched_switch as end on start.prev_pid = end.next_pid

Which will produce the following commands (and -e will also execute them):

 echo 's:sched_stack s64 state; char comm[16]; u64 delta; unsigned long stack[];' >> /sys/kernel/tracing/dynamic_events
 echo 'hist:keys=prev_pid:__arg_18057_2=prev_state,__arg_18057_4=common_timestamp.usecs,__arg_18057_7=common_stacktrace' >> /sys/kernel/tracing/events/sched/sched_switch/trigger
 echo 'hist:keys=next_pid:__state_18057_1=$__arg_18057_2,__comm_18057_3=next_comm,__delta_18057_5=common_timestamp.usecs-$__arg_18057_4,__stack_18057_6=$__arg_18057_7:onmatch(sched.sched_switch).trace(sched_stack,$__state_18057_1,$__comm_18057_3,$__delta_18057_5,$__stack_18057_6)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

The above creates a synthetic event that creates a stack trace when a task
schedules out and records it with the time it scheduled back in. Basically
the time a task is off the CPU. It also records the state of the task when
it left the CPU (running, blocked, sleeping, etc). It also saves the comm
of the task as "comm" (needed for the next command).

~# echo 'hist:keys=state,stack.stacktrace:vals=delta:sort=state,delta if comm == "trace-cmd" &&  state & 3' > /sys/kernel/tracing/events/synthetic/sched_stack/trigger

The above creates a histogram with buckets per state, per stack, and the
value of the total time it was off the CPU for that stack trace. It filters
on tasks with "comm == trace-cmd" and only the sleeping and blocked states
(1 - sleeping, 2 - blocked).

~# trace-cmd record -p function sleep 1

~# cat /sys/kernel/tracing/events/synthetic/sched_stack/hist | tail -18
{ state:          2, stack.stacktrace         __schedule+0x1545/0x3700
         schedule+0xe2/0x390
         schedule_timeout+0x175/0x200
         wait_for_completion_state+0x294/0x440
         __wait_rcu_gp+0x247/0x4f0
         synchronize_rcu_tasks_generic+0x151/0x230
         apply_subsystem_event_filter+0xa2b/0x1300
         subsystem_filter_write+0x67/0xc0
         vfs_write+0x1e2/0xeb0
         ksys_write+0xff/0x1d0
         do_syscall_64+0x7b/0x420
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
} hitcount:        237  delta:   99756288  <<--------------- Delta is 99 seconds!

Totals:
    Hits: 525
    Entries: 21
    Dropped: 0

This shows that this particular trace waited for 99 seconds on
synchronize_rcu_tasks() in apply_subsystem_event_filter().

In fact, there's a lot of places in the filter code that spends a lot of
time waiting for synchronize_rcu_tasks_trace() in order to free the
filters.

Add helper functions that will use call_rcu*() variants to asynchronously
free the filters. This brings the timings back to normal:

 # time trace-cmd record -p function sleep 1
 [..]
 real 0m14.681s
 user 0m0.335s
 sys 0m28.616s

And the histogram also shows this:

~# cat /sys/kernel/tracing/events/synthetic/sched_stack/hist | tail -21
{ state:          2, stack.stacktrace         __schedule+0x1545/0x3700
         schedule+0xe2/0x390
         schedule_timeout+0x175/0x200
         wait_for_completion_state+0x294/0x440
         __wait_rcu_gp+0x247/0x4f0
         synchronize_rcu_normal+0x3db/0x5c0
         tracing_reset_online_cpus+0x8f/0x1e0
         tracing_open+0x335/0x440
         do_dentry_open+0x4c6/0x17a0
         vfs_open+0x82/0x360
         path_openat+0x1a36/0x2990
         do_filp_open+0x1c5/0x420
         do_sys_openat2+0xed/0x180
         __x64_sys_openat+0x108/0x1d0
         do_syscall_64+0x7b/0x420
} hitcount:          2  delta:      77044

Totals:
    Hits: 55
    Entries: 28
    Dropped: 0

Where the total waiting time of synchronize_rcu_tasks_trace() is 77
milliseconds.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Andreas Ziegler <ziegler.andreas@siemens.com>
Cc: Felix MOESSBAUER <felix.moessbauer@siemens.com>
Link: https://lore.kernel.org/20250606201936.1e3d09a9@batman.local.home
Reported-by: "Flot, Julien" <julien.flot@siemens.com>
Tested-by: Julien Flot <julien.flot@siemens.com>
Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults")
Closes: https://lore.kernel.org/all/240017f656631c7dd4017aa93d91f41f653788ea.camel@siemens.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>