]> www.infradead.org Git - linux.git/log
linux.git
11 months agoiommu/arm-smmu-v3: Use S2FWB for NESTED domains
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:54 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Use S2FWB for NESTED domains

Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field
works. When S2FWB is supported and enabled the IOPTE will force cachable
access to IOMMU_CACHE memory when nesting with a S1 and deny cachable
access when !IOMMU_CACHE.

When using a single stage of translation, a simple S2 domain, it doesn't
change things for PCI devices as it is just a different encoding for the
existing mapping of the IOMMU protection flags to cachability attributes.
For non-PCI it also changes the combining rules when incoming transactions
have inconsistent attributes.

However, when used with a nested S1, FWB has the effect of preventing the
guest from choosing a MemAttr in it's S1 that would cause ordinary DMA to
bypass the cache. Consistent with KVM we wish to deny the guest the
ability to become incoherent with cached memory the hypervisor believes is
cachable so we don't have to flush it.

Allow NESTED domains to be created if the SMMU has S2FWB support and use
S2FWB for NESTING_PARENTS. This is an additional option to CANWBS.

Link: https://patch.msgid.link/r/10-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:53 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED

For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting
as the parent and a user provided STE fragment that defines the CD table
and related data with addresses translated by the S2 iommu_domain.

The kernel only permits userspace to control certain allowed bits of the
STE that are safe for user/guest control.

IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2
translation, but there is no way of knowing which S1 entries refer to a
range of S2.

For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to
flush all ASIDs from the VMID after flushing the S2 on any change to the
S2.

The IOMMU_DOMAIN_NESTED can only be created from inside a VIOMMU as the
invalidation path relies on the VIOMMU to translate virtual stream ID used
in the invalidation commands for the CD table and ATS.

Link: https://patch.msgid.link/r/9-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
Nicolin Chen [Thu, 31 Oct 2024 00:20:52 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC

Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement
an arm_vsmmu_alloc().

As an initial step, copy the VMID from s2_parent. A followup series is
required to give the VIOMMU object it's own VMID that will be used in all
nesting configurations.

Link: https://patch.msgid.link/r/8-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoMerge branch 'iommufd/arm-smmuv3-nested' of iommu/linux into iommufd for-next
Jason Gunthorpe [Tue, 12 Nov 2024 17:47:28 +0000 (13:47 -0400)]
Merge branch 'iommufd/arm-smmuv3-nested' of iommu/linux into iommufd for-next

Common SMMUv3 patches for the following patches adding nesting, shared
branch with the iommu tree.

* 'iommufd/arm-smmuv3-nested' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
  iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
  iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
  iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
  ACPI/IORT: Support CANWBS memory access flag
  ACPICA: IORT: Update for revision E.f
  vfio: Remove VFIO_TYPE1_NESTING_IOMMU
  ...

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoDocumentation: userspace-api: iommufd: Update vDEVICE
Nicolin Chen [Tue, 5 Nov 2024 20:05:18 +0000 (12:05 -0800)]
Documentation: userspace-api: iommufd: Update vDEVICE

With the introduction of the new object and its infrastructure, update the
doc and the vIOMMU graph to reflect that.

Link: https://patch.msgid.link/r/e1ff278b7163909b2641ae04ff364bb41d2a2a2e.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
Nicolin Chen [Tue, 5 Nov 2024 20:05:17 +0000 (12:05 -0800)]
iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl

Add a viommu_cache test function to cover vIOMMU invalidations using the
updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU
via its hwpt_id field.

Link: https://patch.msgid.link/r/f317f902041f3d05deaee4ca3fdd8ef4b8297361.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
Nicolin Chen [Tue, 5 Nov 2024 20:05:16 +0000 (12:05 -0800)]
iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command

Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb,
IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.

Link: https://patch.msgid.link/r/cd4082079d75427bd67ed90c3c825e15b5720a5f.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add mock_viommu_cache_invalidate
Nicolin Chen [Tue, 5 Nov 2024 20:05:15 +0000 (12:05 -0800)]
iommufd/selftest: Add mock_viommu_cache_invalidate

Similar to the coverage of cache_invalidate_user for iotlb invalidation,
add a device cache and a viommu_cache_invalidate function to test it out.

Link: https://patch.msgid.link/r/a29c7c23d7cd143fb26ab68b3618e0957f485fdb.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/viommu: Add iommufd_viommu_find_dev helper
Nicolin Chen [Tue, 5 Nov 2024 20:05:14 +0000 (12:05 -0800)]
iommufd/viommu: Add iommufd_viommu_find_dev helper

This avoids a bigger trouble of exposing struct iommufd_device and struct
iommufd_vdevice in the public header.

Link: https://patch.msgid.link/r/84fa7c624db4d4508067ccfdf42059533950180a.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommu: Add iommu_copy_struct_from_full_user_array helper
Jason Gunthorpe [Tue, 5 Nov 2024 20:05:13 +0000 (12:05 -0800)]
iommu: Add iommu_copy_struct_from_full_user_array helper

The iommu_copy_struct_from_user_array helper can be used to copy a single
entry from a user array which might not be efficient if the array is big.

Add a new iommu_copy_struct_from_full_user_array to copy the entire user
array at once. Update the existing iommu_copy_struct_from_user_array kdoc
accordingly.

Link: https://patch.msgid.link/r/5cd773d9c26920c5807d232b21d415ea79172e49.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
Nicolin Chen [Tue, 5 Nov 2024 20:05:12 +0000 (12:05 -0800)]
iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE

With a vIOMMU object, use space can flush any IOMMU related cache that can
be directed via a vIOMMU object. It is similar to the IOMMU_HWPT_INVALIDATE
uAPI, but can cover a wider range than IOTLB, e.g. device/desciprtor cache.

Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id,
and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers
can define different structures for vIOMMU invalidations v.s. HWPT ones.

Since both the HWPT-based and vIOMMU-based invalidation pathways check own
cache invalidation op, remove the WARN_ON_ONCE in the allocator.

Update the uAPI, kdoc, and selftest case accordingly.

Link: https://patch.msgid.link/r/b411e2245e303b8a964f39f49453a5dff280968f.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommu/viommu: Add cache_invalidate to iommufd_viommu_ops
Nicolin Chen [Tue, 5 Nov 2024 20:05:11 +0000 (12:05 -0800)]
iommu/viommu: Add cache_invalidate to iommufd_viommu_ops

This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op
in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI
ATC invaldiations).

Link: https://patch.msgid.link/r/90138505850fa6b165135e78a87b4cc7022869a4.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
Nicolin Chen [Tue, 5 Nov 2024 20:05:10 +0000 (12:05 -0800)]
iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage

Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of
IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F
to cover the IOMMU_VDEVICE_ALLOC ioctl.

Link: https://patch.msgid.link/r/4b9607e5b86726c8baa7b89bd48123fb44104a23.1730836308.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
Nicolin Chen [Tue, 5 Nov 2024 20:05:09 +0000 (12:05 -0800)]
iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct
device) against a vIOMMU (struct iommufd_viommu) object in a VM.

This vDEVICE object (and its structure) holds all the infos and attributes
in the VM, regarding the device related to the vIOMMU.

As an initial patch, add a per-vIOMMU virtual ID. This can be:
 - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
 - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
 - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table
Potentially, this vDEVICE structure would hold some vData for Confidential
Compute Architecture (CCA). Use this virtual ID to index an "vdevs" xarray
that belongs to a vIOMMU object.

Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection
of a device object and an iommufd_viommu object, take two refcounts in the
ioctl handler.

Link: https://patch.msgid.link/r/cda8fd2263166e61b8191a3b3207e0d2b08545bf.1730836308.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoDocumentation: userspace-api: iommufd: Update vIOMMU
Nicolin Chen [Tue, 5 Nov 2024 20:04:29 +0000 (12:04 -0800)]
Documentation: userspace-api: iommufd: Update vIOMMU

With the introduction of the new object and its infrastructure, update the
doc to reflect that and add a new graph.

Link: https://patch.msgid.link/r/7e4302064e0d02137c1b1e139342affc0485ed3f.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
Nicolin Chen [Tue, 5 Nov 2024 20:04:28 +0000 (12:04 -0800)]
iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage

Add a new iommufd_viommu FIXTURE and setup it up with a vIOMMU object.

Any new vIOMMU feature will be added as a TEST_F under that.

Link: https://patch.msgid.link/r/abe267c9d004b29cb1712ceba2f378209d4b7e01.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
Nicolin Chen [Tue, 5 Nov 2024 20:04:27 +0000 (12:04 -0800)]
iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST

Implement the viommu alloc/free functions to increase/reduce refcount of
its dependent mock iommu device. User space can verify this loop via the
IOMMU_VIOMMU_TYPE_SELFTEST.

Link: https://patch.msgid.link/r/9d755a215a3007d4d8d1c2513846830332db62aa.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add refcount to mock_iommu_device
Nicolin Chen [Tue, 5 Nov 2024 20:04:26 +0000 (12:04 -0800)]
iommufd/selftest: Add refcount to mock_iommu_device

For an iommu_dev that can unplug (so far only this selftest does so), the
viommu->iommu_dev pointer has no guarantee of its life cycle after it is
copied from the idev->dev->iommu->iommu_dev.

Track the user count of the iommu_dev. Postpone the exit routine using a
completion, if refcount is unbalanced. The refcount inc/dec will be added
in the following patch.

Link: https://patch.msgid.link/r/33f28d64841b497eebef11b49a571e03103c5d24.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Prepare for mock_viommu_alloc_domain_nested()
Nicolin Chen [Tue, 5 Nov 2024 20:04:25 +0000 (12:04 -0800)]
iommufd/selftest: Prepare for mock_viommu_alloc_domain_nested()

A nested domain now can be allocated for a parent domain or for a vIOMMU
object. Rework the existing allocators to prepare for the latter case.

Link: https://patch.msgid.link/r/f62894ad8ccae28a8a616845947fe4b76135d79b.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/selftest: Add container_of helpers
Nicolin Chen [Tue, 5 Nov 2024 20:04:24 +0000 (12:04 -0800)]
iommufd/selftest: Add container_of helpers

Use these inline helpers to shorten those container_of lines.

Note that one of them goes back and forth between iommu_domain and
mock_iommu_domain, which isn't necessary. So drop its container_of.

Link: https://patch.msgid.link/r/518ec64dae2e814eb29fd9f170f58a3aad56c81c.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
Nicolin Chen [Tue, 5 Nov 2024 20:04:23 +0000 (12:04 -0800)]
iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC

Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like
that nesting parent HWPT to allocate a nested HWPT.

Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.

Also, add an iommufd_viommu_alloc_hwpt_nested helper to allocate a nested
HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's
refcount already, increase the refcount of the vIOMMU only.

Link: https://patch.msgid.link/r/a0f24f32bfada8b448d17587adcaedeeb50a67ed.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Add alloc_domain_nested op to iommufd_viommu_ops
Nicolin Chen [Tue, 5 Nov 2024 20:04:22 +0000 (12:04 -0800)]
iommufd: Add alloc_domain_nested op to iommufd_viommu_ops

Allow IOMMU driver to use a vIOMMU object that holds a nesting parent
hwpt/domain to allocate a nested domain.

Link: https://patch.msgid.link/r/2dcdb5e405dc0deb68230564530d989d285d959c.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
Nicolin Chen [Tue, 5 Nov 2024 20:04:21 +0000 (12:04 -0800)]
iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

Add a new ioctl for user space to do a vIOMMU allocation. It must be based
on a nesting parent HWPT, so take its refcount.

IOMMU driver wanting to support vIOMMUs must define its IOMMU_VIOMMU_TYPE_
in the uAPI header and implement a viommu_alloc op in its iommu_ops.

Link: https://patch.msgid.link/r/dc2b8ba9ac935007beff07c1761c31cd097ed780.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Verify object in iommufd_object_finalize/abort()
Nicolin Chen [Tue, 5 Nov 2024 20:04:20 +0000 (12:04 -0800)]
iommufd: Verify object in iommufd_object_finalize/abort()

To support driver-allocated vIOMMU objects, it's required for IOMMU driver
to call the provided iommufd_viommu_alloc helper to embed the core struct.
However, there is no guarantee that every driver will call it and allocate
objects properly.

Make the iommufd_object_finalize/abort functions more robust to verify if
the xarray slot indexed by the input obj->id is having an XA_ZERO_ENTRY,
which is the reserved value stored by xa_alloc via iommufd_object_alloc.

Link: https://patch.msgid.link/r/334bd4dde8e0a88eb30fa67eeef61827cdb546f9.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
Nicolin Chen [Tue, 5 Nov 2024 20:04:19 +0000 (12:04 -0800)]
iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct

Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent
a slice of physical IOMMU device passed to or shared with a user space VM.
This slice, now a vIOMMU object, is a group of virtualization resources of
a physical IOMMU's, such as:
 - Security namespace for guest owned ID, e.g. guest-controlled cache tags
 - Non-device-affiliated event reporting, e.g. invalidation queue errors
 - Access to a sharable nesting parent pagetable across physical IOMMUs
 - Virtualization of various platforms IDs, e.g. RIDs and others
 - Delivery of paravirtualized invalidation
 - Direct assigned invalidation queues
 - Direct assigned interrupts

Add a new viommu_alloc op in iommu_ops, for drivers to allocate their own
vIOMMU structures. And this allocation also needs a free(), so add struct
iommufd_viommu_ops.

To simplify a vIOMMU allocation, provide a iommufd_viommu_alloc() helper.
It's suggested that a driver should embed a core-level viommu structure in
its driver-level viommu struct and call the iommufd_viommu_alloc() helper,
meanwhile the driver can also implement a viommu ops:
    struct my_driver_viommu {
        struct iommufd_viommu core;
        /* driver-owned properties/features */
        ....
    };

    static const struct iommufd_viommu_ops my_driver_viommu_ops = {
        .free = my_driver_viommu_free,
        /* future ops for virtualization features */
        ....
    };

    static struct iommufd_viommu my_driver_viommu_alloc(...)
    {
        struct my_driver_viommu *my_viommu =
                iommufd_viommu_alloc(ictx, my_driver_viommu, core,
                                     my_driver_viommu_ops);
        /* Init my_viommu and related HW feature */
        ....
        return &my_viommu->core;
    }

    static struct iommu_domain_ops my_driver_domain_ops = {
        ....
        .viommu_alloc = my_driver_viommu_alloc,
    };

Link: https://patch.msgid.link/r/64685e2b79dea0f1dc56f6ede04809b72d578935.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Move _iommufd_object_alloc helper to a sharable file
Nicolin Chen [Tue, 5 Nov 2024 20:04:18 +0000 (12:04 -0800)]
iommufd: Move _iommufd_object_alloc helper to a sharable file

The following patch will add a new vIOMMU allocator that will require this
_iommufd_object_alloc to be sharable with IOMMU drivers (and iommufd too).

Add a new driver.c file that will be built with CONFIG_IOMMUFD_DRIVER_CORE
selected by CONFIG_IOMMUFD, and put the CONFIG_DRIVER under that remaining
to be selectable for drivers to build the existing iova_bitmap.c file.

Link: https://patch.msgid.link/r/2f4f6e116dc49ffb67ff6c5e8a7a8e789ab9e98e.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Move struct iommufd_object to public iommufd header
Nicolin Chen [Tue, 5 Nov 2024 20:04:17 +0000 (12:04 -0800)]
iommufd: Move struct iommufd_object to public iommufd header

Prepare for an embedded structure design for driver-level iommufd_viommu
objects:
    // include/linux/iommufd.h
    struct iommufd_viommu {
        struct iommufd_object obj;
        ....
    };

    // Some IOMMU driver
    struct iommu_driver_viommu {
        struct iommufd_viommu core;
        ....
    };

It has to expose struct iommufd_object and enum iommufd_object_type from
the core-level private header to the public iommufd header.

Link: https://patch.msgid.link/r/54a43b0768089d690104530754f499ca05ce0074.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Allow fault reporting for non-PRI PCI devices
Zhangfei Gao [Thu, 7 Nov 2024 04:37:11 +0000 (04:37 +0000)]
iommufd: Allow fault reporting for non-PRI PCI devices

iommufd_fault_iopf_enable has limitation to PRI on PCI/SRIOV VFs because
the PRI might be a shared resource and current iommu subsystem is not
ready to support enabling/disabling PRI on a VF without any impact on
others.

However, we have devices that appear as PCI but are actually on the AMBA
bus. These fake PCI devices have PASID capability, support stall as well
as SRIOV, so remove the limitation for these devices.

Link: https://patch.msgid.link/r/20241107043711.116-1-zhangfei.gao@linaro.org
Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommu/arm-smmu-v3: Expose the arm_smmu_attach interface
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:51 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Expose the arm_smmu_attach interface

The arm-smmuv3-iommufd.c file will need to call these functions too.
Remove statics and put them in the header file. Remove the kunit
visibility protections from arm_smmu_make_abort_ste() and
arm_smmu_make_s2_domain_ste().

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoiommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:50 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT

For SMMUv3 the parent must be a S2 domain, which can be composed
into a IOMMU_DOMAIN_NESTED.

In future the S2 parent will also need a VMID linked to the VIOMMU and
even to KVM.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoiommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
Nicolin Chen [Thu, 31 Oct 2024 00:20:49 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info

For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU
instance need to be available to the VMM so it can construct an
appropriate vSMMUv3 that reflects the correct HW capabilities.

For userspace page tables these values are required to constrain the valid
values within the CD table and the IOPTEs.

The kernel does not sanitize these values. If building a VMM then
userspace is required to only forward bits into a VM that it knows it can
implement. Some bits will also require a VMM to detect if appropriate
kernel support is available such as for ATS and BTM.

Start a new file and kconfig for the advanced iommufd support. This lets
it be compiled out for kernels that are not intended to support
virtualization, and allows distros to leave it disabled until they are
shipping a matching qemu too.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoiommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:48 +0000 (21:20 -0300)]
iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS

HW with CANWBS is always cache coherent and ignores PCI No Snoop requests
as well. This meets the requirement for IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
so let's return it.

Implement the enforce_cache_coherency() op to reject attaching devices
that don't have CANWBS.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoACPI/IORT: Support CANWBS memory access flag
Nicolin Chen [Thu, 31 Oct 2024 00:20:47 +0000 (21:20 -0300)]
ACPI/IORT: Support CANWBS memory access flag

The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory
Access Flag field in the Memory Access Properties table, mainly for a PCI
Root Complex.

This CANWBS defines the coherency of memory accesses to be not marked IOWB
cacheable/shareable. Its value further implies the coherency impact from a
pair of mismatched memory attributes (e.g. in a nested translation case):
  0x0: Use of mismatched memory attributes for accesses made by this
       device may lead to a loss of coherency.
  0x1: Coherency of accesses made by this device to locations in
       Conventional memory are ensured as follows, even if the memory
       attributes for the accesses presented by the device or provided by
       the SMMU are different from Inner and Outer Write-back cacheable,
       Shareable.

Note that the loss of coherency on a CANWBS-unsupported HW typically could
occur to an SMMU that doesn't implement the S2FWB feature where additional
cache flush operations would be required to prevent that from happening.

Add a new ACPI_IORT_MF_CANWBS flag and set IOMMU_FWSPEC_PCI_RC_CANWBS upon
the presence of this new flag.

CANWBS and S2FWB are similar features, in that they both guarantee the VM
can not violate coherency, however S2FWB can be bypassed by PCI No Snoop
TLPs, while CANWBS cannot. Thus CANWBS meets the requirements to set
IOMMU_CAP_ENFORCE_CACHE_COHERENCY.

Architecturally ARM has expected that VFIO would disable No Snoop through
PCI Config space, if this is done then the two would have the same
protections.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoACPICA: IORT: Update for revision E.f
Nicolin Chen [Thu, 31 Oct 2024 00:20:46 +0000 (21:20 -0300)]
ACPICA: IORT: Update for revision E.f

ACPICA commit c4f5c083d24df9ddd71d5782c0988408cf0fc1ab

The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory
Access Flag field in the Memory Access Properties table, mainly for a PCI
Root Complex.

This CANWBS defines the coherency of memory accesses to be not marked IOWB
cacheable/shareable. Its value further implies the coherency impact from a
pair of mismatched memory attributes (e.g. in a nested translation case):
  0x0: Use of mismatched memory attributes for accesses made by this
       device may lead to a loss of coherency.
  0x1: Coherency of accesses made by this device to locations in
       Conventional memory are ensured as follows, even if the memory
       attributes for the accesses presented by the device or provided by
       the SMMU are different from Inner and Outer Write-back cacheable,
       Shareable.

Link: https://github.com/acpica/acpica/commit/c4f5c083
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agovfio: Remove VFIO_TYPE1_NESTING_IOMMU
Jason Gunthorpe [Thu, 31 Oct 2024 00:20:45 +0000 (21:20 -0300)]
vfio: Remove VFIO_TYPE1_NESTING_IOMMU

This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no significant visible impact to the VFIO
user. Further qemu never implemented this and no other userspace user is
known.

The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 and manage
invalidation was never completed, or at least never upstreamed, rendering
this part useless dead code.

Upstream has now settled on iommufd as the uAPI for controlling nested
translation. Choosing the stage 2 implementation should be done by through
the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation.

Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
enable_nesting iommu_domain_op.

Just in-case there is some userspace using this continue to treat
requesting it as a NOP, but do not advertise support any more.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
11 months agoiommufd: Selftest coverage for IOMMU_IOAS_MAP_FILE
Steve Sistare [Fri, 25 Oct 2024 13:11:59 +0000 (06:11 -0700)]
iommufd: Selftest coverage for IOMMU_IOAS_MAP_FILE

Add test cases to exercise IOMMU_IOAS_MAP_FILE.

Link: https://patch.msgid.link/r/1729861919-234514-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: File mappings for mdev
Steve Sistare [Fri, 25 Oct 2024 13:11:58 +0000 (06:11 -0700)]
iommufd: File mappings for mdev

Support file mappings for mediated devices, aka mdevs.  Access is
initiated by the vfio_pin_pages() and vfio_dma_rw() kernel interfaces.

Link: https://patch.msgid.link/r/1729861919-234514-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Add IOMMU_IOAS_MAP_FILE
Steve Sistare [Fri, 25 Oct 2024 13:11:57 +0000 (06:11 -0700)]
iommufd: Add IOMMU_IOAS_MAP_FILE

Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to
register memory by passing a memfd plus offset and length.  Implement it
using the memfd_pin_folios() kAPI.

Link: https://patch.msgid.link/r/1729861919-234514-8-git-send-email-steven.sistare@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: pfn_reader for file mappings
Steve Sistare [Fri, 25 Oct 2024 13:11:56 +0000 (06:11 -0700)]
iommufd: pfn_reader for file mappings

Extend pfn_reader_user() to pin file mappings, by calling
memfd_pin_folios().  Repin at small page granularity, and fill the batch
from folios.  Expand folios to upages for the iopt_pages_fill() path.

Link: https://patch.msgid.link/r/1729861919-234514-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Folio subroutines
Steve Sistare [Fri, 25 Oct 2024 13:11:55 +0000 (06:11 -0700)]
iommufd: Folio subroutines

Add subroutines for copying folios to a batch.

Link: https://patch.msgid.link/r/1729861919-234514-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: pfn_reader local variables
Steve Sistare [Fri, 25 Oct 2024 13:11:54 +0000 (06:11 -0700)]
iommufd: pfn_reader local variables

Add local variables for common sub-expressions needed by a subsequent
patch.  No functional change.

Link: https://patch.msgid.link/r/1729861919-234514-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Generalize iopt_pages address
Steve Sistare [Fri, 25 Oct 2024 13:11:53 +0000 (06:11 -0700)]
iommufd: Generalize iopt_pages address

The starting address in iopt_pages is currently a __user *uptr.
Generalize to allow other types of addresses.  Refactor iopt_alloc_pages()
and iopt_map_user_pages() into address-type specific and common functions.

Link: https://patch.msgid.link/r/1729861919-234514-4-git-send-email-steven.sistare@oracle.com
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agoiommufd: Rename uptr in iopt_alloc_iova()
Steve Sistare [Fri, 25 Oct 2024 13:11:52 +0000 (06:11 -0700)]
iommufd: Rename uptr in iopt_alloc_iova()

iopt_alloc_iova() takes a uptr argument but only checks for its alignment.
Generalize this to an unsigned address, which can be the offset from the
start of a file in a subsequent patch.  No functional change.

Link: https://patch.msgid.link/r/1729861919-234514-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 months agomm/gup: Add folio_add_pins()
Steve Sistare [Fri, 25 Oct 2024 13:11:51 +0000 (06:11 -0700)]
mm/gup: Add folio_add_pins()

Export a function that adds pins to an already-pinned huge-page folio.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
folio_add_pins could be unpinned via unpin_user_page(s).

Link: https://patch.msgid.link/r/1729861919-234514-2-git-send-email-steven.sistare@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
12 months agoLinux 6.12-rc3 v6.12-rc3
Linus Torvalds [Sun, 13 Oct 2024 21:33:32 +0000 (14:33 -0700)]
Linux 6.12-rc3

12 months agoMerge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 13 Oct 2024 17:52:39 +0000 (10:52 -0700)]
Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Two fixes for Windows symlink handling"

* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix creating native symlinks pointing to current or parent directory
  cifs: Improve creating native symlinks pointing to directory

12 months agoMerge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 13 Oct 2024 16:21:36 +0000 (09:21 -0700)]
Merge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for some reported problems for 6.12-rc3.
  Include in here is:

   - fix for yurex driver that was caused in -rc1

   - build error fix for usbg network filesystem code

   - onboard_usb_dev build fix

   - dwc3 driver fixes for reported errors

   - gadget driver fix

   - new USB storage driver quirk

   - xhci resume bugfix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  net/9p/usbg: Fix build error
  USB: yurex: kill needless initialization in yurex_read
  Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
  usb: xhci: Fix problem with xhci resume from suspend
  usb: misc: onboard_usb_dev: introduce new config symbol for usb5744 SMBus support
  usb: dwc3: core: Stop processing of pending events if controller is halted
  usb: dwc3: re-enable runtime PM after failed resume
  usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
  usb: gadget: core: force synchronous registration

12 months agoMerge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 13 Oct 2024 16:10:52 +0000 (09:10 -0700)]
Merge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here is a single driver core fix, and a .mailmap update.

  The fix is for the rust driver core bindings, turned out that the
  from_raw binding wasn't a good idea (don't want to pass a pointer to a
  reference counted object without actually incrementing the pointer.)
  So this change fixes it up as the from_raw binding came in in -rc1.

  The other change is a .mailmap update.

  Both have been in linux-next for a while with no reported issues"

* tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  mailmap: update mail for Fiona Behrens
  rust: device: change the from_raw() function

12 months agoMerge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 13 Oct 2024 00:16:21 +0000 (17:16 -0700)]
Merge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

 - Fix crash in memcpy on 8xx due to dcbz workaround since recent
   changes

Thanks to Christophe Leroy.

* tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Fix kernel DTLB miss on dcbz

12 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 12 Oct 2024 16:24:13 +0000 (09:24 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes, three in drivers and one in the FC transport class
  to add idempotence to state setting"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_fc: Allow setting rport state to current state
  scsi: wd33c93: Don't use stale scsi_pointer value
  scsi: fnic: Move flush_work initialization out of if block
  scsi: ufs: Use pre-calculated offsets in ufshcd_init_lrb()

12 months agoMerge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Oct 2024 16:09:04 +0000 (09:09 -0700)]
Merge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Add missing dependencies on REGMAP_I2C for several drivers

 - Fix memory leak in adt7475 driver

 - Relabel Columbiaville temperature sensor in intel-m10-bmc-hwmon
   driver to match other sensor labels

* tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (max1668) Add missing dependency on REGMAP_I2C
  hwmon: (ltc2991) Add missing dependency on REGMAP_I2C
  hwmon: (adt7470) Add missing dependency on REGMAP_I2C
  hwmon: (adm9240) Add missing dependency on REGMAP_I2C
  hwmon: (mc34vr500) Add missing dependency on REGMAP_I2C
  hwmon: (tmp513) Add missing dependency on REGMAP_I2C
  hwmon: (adt7475) Fix memory leak in adt7475_fan_pwm_config()
  hwmon: intel-m10-bmc-hwmon: relabel Columbiaville to CVL Die Temperature

12 months agoMerge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 11 Oct 2024 23:12:45 +0000 (16:12 -0700)]
Merge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fixes for build, run-time errors, and reporting errors:

   - ftrace: regression test for a kernel crash when running function
     graph tracing and then enabling function profiler.

   - rseq: fix for mm_cid test failure.

   - vDSO:
      - fixes to reporting skip and other error conditions
      - changes unconditionally build chacha and getrandom tests on all
        architectures to make it easier for them to run in CIs
      - build error when sched.h to bring in CLONE_NEWTIME define"

* tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  ftrace/selftest: Test combination of function_graph tracer and function profiler
  selftests/rseq: Fix mm_cid test failure
  selftests: vDSO: Explicitly include sched.h
  selftests: vDSO: improve getrandom and chacha error messages
  selftests: vDSO: unconditionally build getrandom test
  selftests: vDSO: unconditionally build chacha test

12 months agoMerge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 23:07:15 +0000 (16:07 -0700)]
Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Disable kunit tests for arm64+ACPI

 - Fix refcount issue in kunit tests

 - Drop constraints on non-conformant 'interrupt-map' in fsl,ls-extirq

 - Drop type ref on 'msi-parent in fsl,qoriq-mc binding

 - Move elgin,jg10309-01 to its own binding from trivial-devices

* tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Skip kunit tests when arm64+ACPI doesn't populate root node
  of: Fix unbalanced of node refcount and memory leaks
  dt-bindings: interrupt-controller: fsl,ls-extirq: workaround wrong interrupt-map number
  dt-bindings: misc: fsl,qoriq-mc: remove ref for msi-parent
  dt-bindings: display: elgin,jg10309-01: Add own binding

12 months agoMerge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/delle...
Linus Torvalds [Fri, 11 Oct 2024 22:56:02 +0000 (15:56 -0700)]
Merge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev platform driver fix from Helge Deller:
 "Switch fbdev drivers back to struct platform_driver::remove()

  Now that 'remove()' has been converted to the sane new API, there's
  no reason for the 'remove_new()' use, so this converts back to the
  traditional and simpler name.

  See commits

     5c5a7680e67b ("platform: Provide a remove callback that returns no value")
     0edb555a65d1 ("platform: Make platform_driver::remove() return void")

  for background to this all"

* tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: Switch back to struct platform_driver::remove()

12 months agoMerge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 22:42:26 +0000 (15:42 -0700)]
Merge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix clock handle leak in probe() error path in gpio-aspeed

 - add a dummy register read to ensure the write actually completed

* tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: aspeed: Use devm_clk api to manage clock source
  gpio: aspeed: Add the flush write to ensure the write complete.

12 months agoMerge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 11 Oct 2024 22:37:15 +0000 (15:37 -0700)]
Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Localio Bugfixes:
   - remove duplicated include in localio.c
   - fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
   - fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
   - fix nfsd_file tracepoints to handle NULL rqstp pointers

  Other Bugfixes:
   - fix program selection loop in svc_process_common
   - fix integer overflow in decode_rc_list()
   - prevent NULL-pointer dereference in nfs42_complete_copies()
   - fix CB_RECALL performance issues when using a large number of
     delegations"

* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: remove revoked delegation from server's delegation list
  nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
  nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
  nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
  NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
  SUNRPC: Fix integer overflow in decode_rc_list()
  sunrpc: fix prog selection loop in svc_process_common
  nfs: Remove duplicated include in localio.c

12 months agoMerge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu...
Linus Torvalds [Fri, 11 Oct 2024 21:42:27 +0000 (14:42 -0700)]
Merge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Neeraj Upadhyay:
 "Fix rcuog kthread wakeup invocation from softirq context on a CPU
  which has been marked offline.

  This can happen when new callbacks are enqueued from a softirq on an
  offline CPU before it calls rcutree_report_cpu_dead(). When this
  happens on NOCB configuration, the rcuog wake-up is deferred through
  an IPI to an online CPU. This is done to avoid call into the scheduler
  which can risk arming the RT-bandwidth after hrtimers have been
  migrated out and disabled.

  However, doing IPI call from softirq is not allowed: Fix this by
  forcing deferred rcuog wakeup through the NOCB timer when the CPU is
  offline"

* tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  rcu/nocb: Fix rcuog wake-up from offline softirq

12 months agoMerge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 21:34:18 +0000 (14:34 -0700)]
Merge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A fix for topology information of Xen PV guests"

* tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE

12 months agoftrace/selftest: Test combination of function_graph tracer and function profiler
Steven Rostedt [Thu, 10 Oct 2024 20:52:35 +0000 (16:52 -0400)]
ftrace/selftest: Test combination of function_graph tracer and function profiler

Masami reported a bug when running function graph tracing then the
function profiler. The following commands would cause a kernel crash:

  # cd /sys/kernel/tracing/
  # echo function_graph > current_tracer
  # echo 1 > function_profile_enabled

In that order. Create a test to test this two to make sure this does not
come back as a regression.

Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2
Link: https://lore.kernel.org/all/20241010165235.35122877@gandalf.local.home/
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 months agoselftests/rseq: Fix mm_cid test failure
Mathieu Desnoyers [Wed, 9 Oct 2024 01:28:01 +0000 (21:28 -0400)]
selftests/rseq: Fix mm_cid test failure

Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by:

glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)")

Without this fix, rseq selftests for mm_cid fail:

./run_param_test.sh
Default parameters
Running test spinlock
Running compare-twice test spinlock
Running mm_cid test spinlock
Error: cpu id getter unavailable

Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: linux-kselftest@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 months agoMerge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 11 Oct 2024 19:00:21 +0000 (12:00 -0700)]
Merge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Explicitly have a mshot_finished condition for IORING_OP_RECV in
   multishot mode, similarly to what IORING_OP_RECVMSG has. This doesn't
   fix a bug right now, but it makes it harder to actually have a bug
   here if a request takes multiple iterations to finish.

 - Fix handling of retry of read/write of !FMODE_NOWAIT files. If they
   are pollable, that's all we need.

* tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux:
  io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT
  io_uring/rw: fix cflags posting for single issue multishot read

12 months agoMerge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 11 Oct 2024 18:41:20 +0000 (11:41 -0700)]
Merge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These address two issues in the TPMI module of the Intel RAPL power
  capping driver and one issue in the processor part of the Intel
  int340x thermal driver, update a CPU ID list and register definitions
  needed for RAPL PL4 support and remove some unused code.

  Specifics:

   - Fix the TPMI_RAPL_REG_DOMAIN_INFO register offset in the TPMI part
     of the Intel RAPL power capping driver, make it ignore minor
     hardware version mismatches (which only indicate exposing
     additional features) and update register definitions in it to
     enable PL4 support (Zhang Rui)

   - Add Arrow Lake-U to the list of processors supporting PL4 in the
     MSR part of the Intel RAPL power capping driver (Sumeet Pawnikar)

   - Remove excess pci_disable_device() calls from the processor part of
     the int340x thermal driver to address a warning triggered during
     module unload and remove unused CPU hotplug code related to RAPL
     support from it (Zhang Rui)"

* tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: processor: Add MMIO RAPL PL4 support
  thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support
  powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U
  powercap: intel_rapl_tpmi: Ignore minor version change
  thermal: intel: int340x: processor: Fix warning during module unload
  powercap: intel_rapl_tpmi: Fix bogus register reading

12 months agoMerge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 11 Oct 2024 18:35:30 +0000 (11:35 -0700)]
Merge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Address possible use-after-free scenarios during the processing of
  thermal netlink commands and during thermal zone removal (Rafael
  Wysocki)"

* tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Free tzp copy along with the thermal zone
  thermal: core: Reference count the zone in thermal_zone_get_by_id()

12 months agoMerge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 11 Oct 2024 18:32:10 +0000 (11:32 -0700)]
Merge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "Reduce the number of ACPI IRQ override DMI quirks by combining quirks
  that cover similar systems while making them cover additional models
  at the same time (Hans de Goede)"

* tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together
  ACPI: resource: Fold Asus ExpertBook B1402C* and B1502C* DMI quirks together
  ACPI: resource: Make Asus ExpertBook B2502 matches cover more models
  ACPI: resource: Make Asus ExpertBook B2402 matches cover more models

12 months agoMerge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Fri, 11 Oct 2024 18:26:15 +0000 (11:26 -0700)]
Merge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
 "pmdomain core:
   - Fix alloc/free in dev_pm_domain_attach|detach_list()

  pmdomain providers:
   - qcom: Fix the return of uninitialized variable

  pmdomain consumers:
   - drm/tegra/gr3d: Revert conversion to dev_pm_domain_attach|detach_list()

  OPP core:
   - Fix error code in dev_pm_opp_set_config()"

* tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
  Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
  pmdomain: qcom-cpr: Fix the return of uninitialized variable
  OPP: fix error code in dev_pm_opp_set_config()

12 months agoMerge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 11 Oct 2024 18:23:21 +0000 (11:23 -0700)]
Merge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Prevent splat from warning when setting maximum DMA segment

  MMC host:
   - mvsdio: Drop sg_miter support for PIO as it didn't work
   - sdhci-of-dwcmshc: Prevent stale interrupt for the T-Head 1520
     variant"

* tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
  Revert "mmc: mvsdio: Use sg_miter for PIO"
  mmc: core: Only set maximum DMA segment size if DMA is supported

12 months agoMerge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata...
Linus Torvalds [Fri, 11 Oct 2024 18:18:31 +0000 (11:18 -0700)]
Merge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

 - Fix a hibernate regression where the disk was needlessly spun down
   and then immediately spun up both when entering and when resuming
   from hibernation (me)

 - Update the MAINTAINERS file to remove remnants from Jens
   maintainership of libata (Damien)

* tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Update MAINTAINERS file
  ata: libata: avoid superfluous disk spin down + spin up during hibernation

12 months agoMerge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 11 Oct 2024 18:13:05 +0000 (11:13 -0700)]
Merge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes haul for drm, lots of small fixes all over, amdgpu, xe
  lead the way, some minor nouveau and radeon fixes, and then a bunch of
  misc all over.

  Nothing too scary or out of the unusual.

  sched:
   - Avoid leaking lockdep map

  fbdev-dma:
   - Only clean up deferred I/O if instanciated

  amdgpu:
   - Fix invalid UBSAN warnings
   - Fix artifacts in MPO transitions
   - Hibernation fix

  amdkfd:
   - Fix an eviction fence leak

  radeon:
   - Add late register for connectors
   - Always set GEM function pointers

  i915:
   - HDCP refcount fix

  nouveau:
   - dmem: Fix privileged error in copy engine channel; Fix possible
     data leak in migrate_to_ram()
   - gsp: Fix coding style

  v3d:
   - Stop active perfmon before destroying it

  vc4:
   - Stop active perfmon before destroying it

  xe:
   - Drop GuC submit_wq pool
   - Fix error checking with xa_store()
   - Fix missing freq restore on GSC load error
   - Fix wedged_mode file permission
   - Fix use-after-free in ct communication"

* tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel:
  drm/fbdev-dma: Only cleanup deferred I/O if necessary
  drm/xe: Make wedged_mode debugfs writable
  drm/xe: Restore GT freq on GSC load error
  drm/xe/guc_submit: fix xa_store() error checking
  drm/xe/ct: fix xa_store() error checking
  drm/xe/ct: prevent UAF in send_recv()
  drm/radeon: always set GEM function pointer
  nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error
  nouveau/dmem: Fix privileged error in copy engine channel
  drm/amd/display: fix hibernate entry for DCN35+
  drm/amd/display: Clear update flags after update has been applied
  drm/amdgpu: partially revert powerplay `__counted_by` changes
  drm/radeon: add late_register for connector
  drm/amdkfd: Fix an eviction fence leak
  drm/vc4: Stop the active perfmon before being destroyed
  drm/v3d: Stop the active perfmon before being destroyed
  drm/i915/hdcp: fix connector refcounting
  drm/nouveau/gsp: remove extraneous ; after mutex
  drm/xe: Drop GuC submit_wq pool
  drm/sched: Use drm sched lockdep map for submit_wq

12 months agopowerpc/8xx: Fix kernel DTLB miss on dcbz
Christophe Leroy [Sat, 5 Oct 2024 08:53:29 +0000 (10:53 +0200)]
powerpc/8xx: Fix kernel DTLB miss on dcbz

Following OOPS is encountered while loading test_bpf module
on powerpc 8xx:

[  218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000
[  218.842473] Faulting instruction address: 0xc0017a80
[  218.847451] Oops: Kernel access of bad area, sig: 11 [#1]
[  218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885
[  218.857207] SAF3000 DIE NOTIFICATION
[  218.860713] Modules linked in: test_bpf(+) test_module
[  218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280
[  218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885
[  218.880521] NIP:  c0017a80 LR: beab859c CTR: 000101d4
[  218.885584] REGS: cac2bc90 TRAP: 0300   Not tainted  (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty)
[  218.894308] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 55005555  XER: a0007100
[  218.901290] DAR: cb000000 DSISR: c2000000
[  218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000
[  218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369
[  218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369
[  218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c
[  218.941087] NIP [c0017a80] memcpy+0x88/0xec
[  218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf]
[  218.951476] Call Trace:
[  218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable)
[  218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc
[  218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360
[  218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0
[  218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0
[  218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c
[  218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28

This happens in the main loop of memcpy()

  ==> c0017a80: 7c 0b 37 ec  dcbz    r11,r6
c0017a84: 80 e4 00 04  lwz     r7,4(r4)
c0017a88: 81 04 00 08  lwz     r8,8(r4)
c0017a8c: 81 24 00 0c  lwz     r9,12(r4)
c0017a90: 85 44 00 10  lwzu    r10,16(r4)
c0017a94: 90 e6 00 04  stw     r7,4(r6)
c0017a98: 91 06 00 08  stw     r8,8(r6)
c0017a9c: 91 26 00 0c  stw     r9,12(r6)
c0017aa0: 95 46 00 10  stwu    r10,16(r6)
c0017aa4: 42 00 ff dc  bdnz    c0017a80 <memcpy+0x88>

Commit ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") relies on re-reading DAR register to know if an error is
due to a missing copy of a PMD entry in task's PGDIR, allthough DAR
was already read in the exception prolog and copied into thread
struct. This is because is it done very early in the exception and
there are not enough registers available to keep a pointer to thread
struct.

However, dcbz instruction is buggy and doesn't update DAR register on
fault. That is detected and generates a call to FixupDAR workaround
which updates DAR copy in thread struct but doesn't fix DAR register.

Let's fix DAR in addition to the update of DAR copy in thread struct.

Fixes: ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu
12 months agoMerge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 11 Oct 2024 03:54:05 +0000 (13:54 +1000)]
Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix error checking with xa_store() (Matthe Auld)
- Fix missing freq restore on GSC load error (Vinay)
- Fix wedged_mode file permission (Matt Roper)
- Fix use-after-free in ct communication (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/jri65tmv3bjbhqhxs5smv45nazssxzhtwphojem4uufwtjuliy@gsdhlh6kzsdy
12 months agoMerge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 10 Oct 2024 23:03:20 +0000 (09:03 +1000)]
Merge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

fbdev-dma:
- Only clean up deferred I/O if instanciated

nouveau:
- dmem: Fix privileged error in copy engine channel; Fix possible
data leak in migrate_to_ram()
- gsp: Fix coding style

sched:
- Avoid leaking lockdep map

v3d:
- Stop active perfmon before destroying it

vc4:
- Stop active perfmon before destroying it

xe:
- Drop GuC submit_wq pool

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241010133708.GA461532@localhost.localdomain
12 months agoMerge tag 'drm-intel-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 10 Oct 2024 22:55:26 +0000 (08:55 +1000)]
Merge tag 'drm-intel-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- HDCP refcount fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zwd78Tnw8t3w9F16@jlahtine-mobl.ger.corp.intel.com
12 months agoMerge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 10 Oct 2024 19:36:35 +0000 (12:36 -0700)]
Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - dsa: sja1105: fix reception from VLAN-unaware bridges

   - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
     enabled"

   - eth: fec: don't save PTP state if PTP is unsupported

  Current release - new code bugs:

   - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref

   - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop

   - phy: aquantia: AQR115c fix up PMA capabilities

  Previous releases - regressions:

   - tcp: 3 fixes for retrans_stamp and undo logic

  Previous releases - always broken:

   - net: do not delay dst_entries_add() in dst_release()

   - netfilter: restrict xtables extensions to families that are safe,
     syzbot found a way to combine ebtables with extensions that are
     never used by userspace tools

   - sctp: ensure sk_state is set to CLOSED if hashing fails in
     sctp_listen_start

   - mptcp: handle consistently DSS corruption, and prevent corruption
     due to large pmtu xmit"

* tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  MAINTAINERS: Add headers and mailing list to UDP section
  MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
  slip: make slhc_remember() more robust against malicious packets
  net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
  ppp: fix ppp_async_encode() illegal access
  docs: netdev: document guidance on cleanup patches
  phonet: Handle error of rtnl_register_module().
  mpls: Handle error of rtnl_register_module().
  mctp: Handle error of rtnl_register_module().
  bridge: Handle error of rtnl_register_module().
  vxlan: Handle error of rtnl_register_module().
  rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
  net: do not delay dst_entries_add() in dst_release()
  mptcp: pm: do not remove closing subflows
  mptcp: fallback when MPTCP opts are dropped after 1st data
  tcp: fix mptcp DSS corruption due to large pmtu xmit
  mptcp: handle consistently DSS corruption
  net: netconsole: fix wrong warning
  net: dsa: refuse cross-chip mirroring operations
  net: fec: don't save PTP state if PTP is unsupported
  ...

12 months agoMerge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 10 Oct 2024 19:25:32 +0000 (12:25 -0700)]
Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
  callbacks

  When a ring buffer is mapped to memory assigned at boot, it also
  splits it up evenly between the possible CPUs. But the allocation code
  still attached a CPU notifier callback to this ring buffer. When a CPU
  is added, the callback will happen and another per-cpu buffer is
  created for the ring buffer.

  But for boot mapped buffers, there is no room to add another one (as
  they were all created already). The result of calling the CPU hotplug
  notifier on a boot mapped ring buffer is unpredictable and could lead
  to a system crash.

  If the ring buffer is boot mapped simply do not attach the CPU
  notifier to it"

* tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

12 months agoof: Skip kunit tests when arm64+ACPI doesn't populate root node
Stephen Boyd [Wed, 9 Oct 2024 20:41:31 +0000 (13:41 -0700)]
of: Skip kunit tests when arm64+ACPI doesn't populate root node

A root node is required to apply DT overlays. A root node is usually
present after commit 7b937cc243e5 ("of: Create of_root if no dtb
provided by firmware"), except for on arm64 systems booted with ACPI
tables. In that case, the root node is intentionally not populated
because it would "allow DT devices to be instantiated atop an ACPI base
system"[1].

Introduce an OF function that skips the kunit test if the root node
isn't populated. Limit the test to when both CONFIG_ARM64 and
CONFIG_ACPI are set, because otherwise the lack of a root node is a bug.
Make the function private and take a kunit test parameter so that it
can't be abused to test for the presence of the root node in non-test
code.

Use this function to skip tests that require the root node. Currently
that's the DT tests and any tests that apply overlays.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/6cd337fb-38f0-41cb-b942-5844b84433db@roeck-us.net
Link: https://lore.kernel.org/r/Zd4dQpHO7em1ji67@FVFF77S0Q05N.cambridge.arm.com
Fixes: 893ecc6d2d61 ("of: Add KUnit test to confirm DTB is loaded")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241009204133.1169931-1-sboyd@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
12 months agoMerge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 10 Oct 2024 17:02:59 +0000 (10:02 -0700)]
Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - update fstrim loop and add more cancellation points, fix reported
   delayed or blocked suspend if there's a huge chunk queued

 - fix error handling in recent qgroup xarray conversion

 - in zoned mode, fix warning printing device path without RCU
   protection

 - again fix invalid extent xarray state (6252690f7e1b), lost due to
   refactoring

* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
  btrfs: zoned: fix missing RCU locking in error message when loading zone info
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: add cancellation points to trim loops
  btrfs: split remaining space to discard in chunks

12 months agoMerge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Thu, 10 Oct 2024 16:52:49 +0000 (09:52 -0700)]
Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix NFSD bring-up / shutdown

 - Fix a UAF when releasing a stateid

* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix possible badness in FREE_STATEID
  nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
  NFSD: Mark filecache "down" if init fails

12 months agorcu/nocb: Fix rcuog wake-up from offline softirq
Frederic Weisbecker [Thu, 10 Oct 2024 16:36:09 +0000 (18:36 +0200)]
rcu/nocb: Fix rcuog wake-up from offline softirq

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        <IRQ>
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        </IRQ>

Fix this with forcing deferred rcuog wake up through the NOCB timer when
the CPU is offline. The actual wake up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
12 months agoMerge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 10 Oct 2024 16:45:45 +0000 (09:45 -0700)]
Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - A few small typo fixes

 - fstests xfs/538 DEBUG-only fix

 - Performance fix on blockgc on COW'ed files, by skipping trims on
   cowblock inodes currently opened for write

 - Prevent cowblocks to be freed under dirty pagecache during unshare

 - Update MAINTAINERS file to quote the new maintainer

* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a typo
  xfs: don't free cowblocks from under dirty pagecache on unshare
  xfs: skip background cowblock trims on inodes open for write
  xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
  xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
  xfs: don't ifdef around the exact minlen allocations
  xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
  xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
  xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
  xfs: return bool from xfs_attr3_leaf_add
  xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
  xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
  xfs: scrub: convert comma to semicolon
  xfs: Remove empty declartion in header file
  MAINTAINERS: add Carlos Maiolino as XFS release manager

12 months agoMerge branch 'maintainers-networking-file-coverage-updates'
Jakub Kicinski [Thu, 10 Oct 2024 16:35:50 +0000 (09:35 -0700)]
Merge branch 'maintainers-networking-file-coverage-updates'

Simon Horman says:

====================
MAINTAINERS: Networking file coverage updates

The aim of this proposal is to make the handling of some files,
related to Networking and Wireless, more consistently. It does so by:

1. Adding some more headers to the UDP section, making it consistent
   with the TCP section.

2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
   making their handling consistent with other files related to
   Wireless.

The aim of this is to make things more consistent.  And for MAINTAINERS
to better reflect the situation on the ground.  I am more than happy to
be told that the current state of affairs is fine. Or for other ideas to
be discussed.

v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
====================

Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMAINTAINERS: Add headers and mailing list to UDP section
Simon Horman [Wed, 9 Oct 2024 08:47:23 +0000 (09:47 +0100)]
MAINTAINERS: Add headers and mailing list to UDP section

Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
Simon Horman [Wed, 9 Oct 2024 08:47:22 +0000 (09:47 +0100)]
MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]

We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.

Many of the following wireless-related sections MAINTAINERS
are already not included in the NETWORKING [GENERAL] section.
For consistency, exclude those that are.

* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoslip: make slhc_remember() more robust against malicious packets
Eric Dumazet [Wed, 9 Oct 2024 09:11:32 +0000 (09:11 +0000)]
slip: make slhc_remember() more robust against malicious packets

syzbot found that slhc_remember() was missing checks against
malicious packets [1].

slhc_remember() only checked the size of the packet was at least 20,
which is not good enough.

We need to make sure the packet includes the IPv4 and TCP header
that are supposed to be carried.

Add iph and th pointers to make the code more readable.

[1]

BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
  ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
  ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
  ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4091 [inline]
  slab_alloc_node mm/slub.c:4134 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
D. Wythe [Wed, 9 Oct 2024 06:55:16 +0000 (14:55 +0800)]
net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC

Eric report a panic on IPPROTO_SMC, and give the facts
that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.

Bug: Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x0000000086000005
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
[0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
pud=0000000000000000
Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/06/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
sp : ffff80009b887a90
x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
Call trace:
0x0
netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
security_socket_post_create+0x94/0xd4 security/security.c:4425
__sock_create+0x4c8/0x884 net/socket.c:1587
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x134/0x340 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1718
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: ???????? ???????? ???????? ???????? (????????)
---[ end trace 0000000000000000 ]---

This patch add a toy implementation that performs a simple return to
prevent such panic. This is because MSS can be set in sock_create_kern
or smc_setsockopt, similar to how it's done in AF_SMC. However, for
AF_SMC, there is currently no way to synchronize MSS within
__sys_connect_file. This toy implementation lays the groundwork for us
to support such feature for IPPROTO_SMC in the future.

Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoppp: fix ppp_async_encode() illegal access
Eric Dumazet [Wed, 9 Oct 2024 18:58:02 +0000 (18:58 +0000)]
ppp: fix ppp_async_encode() illegal access

syzbot reported an issue in ppp_async_encode() [1]

In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.

BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
 BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
  ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
  ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
  ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4092 [inline]
  slab_alloc_node mm/slub.c:4135 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agodocs: netdev: document guidance on cleanup patches
Simon Horman [Wed, 9 Oct 2024 09:12:19 +0000 (10:12 +0100)]
docs: netdev: document guidance on cleanup patches

The purpose of this section is to document what is the current practice
regarding clean-up patches which address checkpatch warnings and similar
problems. I feel there is a value in having this documented so others
can easily refer to it.

Clearly this topic is subjective. And to some extent the current
practice discourages a wider range of patches than is described here.
But I feel it is best to start somewhere, with the most well established
part of the current practice.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'rtnetlink-handle-error-of-rtnl_register_module'
Paolo Abeni [Thu, 10 Oct 2024 13:39:37 +0000 (15:39 +0200)]
Merge branch 'rtnetlink-handle-error-of-rtnl_register_module'

Kuniyuki Iwashima says:

====================
rtnetlink: Handle error of rtnl_register_module().

While converting phonet to per-netns RTNL, I found a weird comment

  /* Further rtnl_register_module() cannot fail */

that was true but no longer true after commit addf9b90de22 ("net:
rtnetlink: use rcu to free rtnl message handlers").

Many callers of rtnl_register_module() just ignore the returned
value but should handle them properly.

This series introduces two helpers, rtnl_register_many() and
rtnl_unregister_many(), to do that easily and fix such callers.

All rtnl_register() and rtnl_register_module() will be converted
to _many() variant and some rtnl_lock() will be saved in _many()
later in net-next.

Changes:
  v4:
    * Add more context in changelog of each patch

  v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/
    * Move module *owner to struct rtnl_msg_handler
    * Make struct rtnl_msg_handler args/vars const
    * Update mctp goto labels

  v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/
    * Remove __exit from mctp_neigh_exit().

  v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agophonet: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:37 +0000 (11:47 -0700)]
phonet: Handle error of rtnl_register_module().

Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl
message handlers"), once the first rtnl_register_module() allocated
rtnl_msg_handlers[PF_PHONET], the following calls never failed.

However, after the commit, rtnl_register_module() could fail silently
to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
handling for each call.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's use rtnl_register_many() to handle the errors easily.

Fixes: addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agompls: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:36 +0000 (11:47 -0700)]
mpls: Handle error of rtnl_register_module().

Since introduced, mpls_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 03c0566542f4 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agomctp: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:35 +0000 (11:47 -0700)]
mctp: Handle error of rtnl_register_module().

Since introduced, mctp has been ignoring the returned value of
rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 583be982d934 ("mctp: Add device handling and netlink interface")
Fixes: 831119f88781 ("mctp: Add neighbour netlink interface")
Fixes: 06d2f4c583a7 ("mctp: Add netlink route management")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agobridge: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:34 +0000 (11:47 -0700)]
bridge: Handle error of rtnl_register_module().

Since introduced, br_vlan_rtnl_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support")
Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support")
Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agovxlan: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:33 +0000 (11:47 -0700)]
vxlan: Handle error of rtnl_register_module().

Since introduced, vxlan_vnifilter_init() has been ignoring the
returned value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agortnetlink: Add bulk registration helpers for rtnetlink message handlers.
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:32 +0000 (11:47 -0700)]
rtnetlink: Add bulk registration helpers for rtnetlink message handlers.

Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message
handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
rtnl_register_module() for the same protocol never failed.

However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to
be allocated in each rtnl_register_module(), so each call could fail.

Many callers of rtnl_register_module() do not handle the returned error,
and we need to add many error handlings.

To handle that easily, let's add wrapper functions for bulk registration
of rtnetlink message handlers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoPM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
Ulf Hansson [Wed, 2 Oct 2024 12:22:23 +0000 (14:22 +0200)]
PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()

The dev_pm_domain_attach|detach_list() functions are not resource managed,
hence they should not use devm_* helpers to manage allocation/freeing of
data. Let's fix this by converting to the traditional alloc/free functions.

Fixes: 161e16a5e50a ("PM: domains: Add helper functions to attach/detach multiple PM domains")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20241002122232.194245-3-ulf.hansson@linaro.org
12 months agoRevert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
Ulf Hansson [Wed, 2 Oct 2024 12:22:22 +0000 (14:22 +0200)]
Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"

This reverts commit f790b5c09665cab0d51dfcc84832d79d2b1e6c0e.

The reverted commit was not ready to be applied due to dependency on other
OPP/pmdomain changes that didn't make it for the last release cycle. Let's
revert it to fix the behaviour.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20241002122232.194245-2-ulf.hansson@linaro.org
12 months agoMerge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 10 Oct 2024 11:50:55 +0000 (13:50 +0200)]
Merge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Restrict xtables extensions to families that are safe, syzbot found
   a way to combine ebtables with extensions that are never used by
   userspace tools. From Florian Westphal.

2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup
   mismatch, also from Florian.

netfilter pull request 24-10-09

* tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: conntrack_vrf.sh: add fib test case
  netfilter: fib: check correct rtable in vrf setups
  netfilter: xtables: avoid NFPROTO_UNSPEC where needed
====================

Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agommc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
Michal Wilczynski [Tue, 8 Oct 2024 10:03:27 +0000 (12:03 +0200)]
mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling

While working with the T-Head 1520 LicheePi4A SoC, certain conditions
arose that allowed me to reproduce a race issue in the sdhci code.

To reproduce the bug, you need to enable the sdio1 controller in the
device tree file
`arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:

&sdio1 {
bus-width = <4>;
max-frequency = <100000000>;
no-sd;
no-mmc;
broken-cd;
cap-sd-highspeed;
post-power-on-delay-ms = <50>;
status = "okay";
wakeup-source;
keep-power-in-suspend;
};

When resetting the SoC using the reset button, the following messages
appear in the dmesg log:

[    8.164898] mmc2: Got command interrupt 0x00000001 even though no
command operation was in progress.
[    8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[    8.180503] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000005
[    8.186950] mmc2: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[    8.193395] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[    8.199841] mmc2: sdhci: Present:   0x03da0000 | Host ctl: 0x00000000
[    8.206287] mmc2: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    8.212733] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x0000decf
[    8.219178] mmc2: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
[    8.225622] mmc2: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
[    8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[    8.238513] mmc2: sdhci: Caps:      0x3f69c881 | Caps_1:   0x08008177
[    8.244959] mmc2: sdhci: Cmd:       0x00000502 | Max curr: 0x00191919
[    8.254115] mmc2: sdhci: Resp[0]:   0x00001009 | Resp[1]:  0x00000000
[    8.260561] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[    8.267005] mmc2: sdhci: Host ctl2: 0x00001000
[    8.271453] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
0x0000000000000000
[    8.278594] mmc2: sdhci: ============================================

I also enabled some traces to better understand the problem:

     kworker/3:1-62      [003] .....     8.163538: mmc_request_start:
mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
retune_period=0
          <idle>-0       [000] d.h2.     8.164816: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
intmask_p=0x18000
     irq/24-mmc2-96      [000] .....     8.164840: sdhci_thread_irq:
msg=
     irq/24-mmc2-96      [000] d.h2.     8.164896: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
intmask_p=0x1
     irq/24-mmc2-96      [000] .....     8.285142: mmc_request_done:
mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
hold_retune=1 retune_period=0

Here's what happens: the __mmc_start_request function is called with
opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
is triggered. Depending on the exact timing, these conditions can
trigger the following race problem:

1) The sdhci_cmd_irq top half handles the command as an error. It sets
   host->cmd to NULL and host->pending_reset to true.
2) The sdhci_thread_irq bottom half is scheduled next and executes faster
   than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
   host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
   a code path that prints: "mmc2: Got command interrupt 0x00000001 even
   though no command operation was in progress."

To solve this issue, we need to clear pending interrupts when resetting
host->pending_reset. This ensures that after sdhci_threaded_irq restores
interrupts, there are no pending stale interrupts.

The behavior observed here is non-compliant with the SDHCI standard.
Place the code in the sdhci-of-dwcmshc driver to account for a
hardware-specific quirk instead of the core SDHCI code.

Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008100327.4108895-1-m.wilczynski@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
12 months agonet: do not delay dst_entries_add() in dst_release()
Eric Dumazet [Tue, 8 Oct 2024 14:31:10 +0000 (14:31 +0000)]
net: do not delay dst_entries_add() in dst_release()

dst_entries_add() uses per-cpu data that might be freed at netns
dismantle from ip6_route_net_exit() calling dst_entries_destroy()

Before ip6_route_net_exit() can be called, we release all
the dsts associated with this netns, via calls to dst_release(),
which waits an rcu grace period before calling dst_destroy()

dst_entries_add() use in dst_destroy() is racy, because
dst_entries_destroy() could have been called already.

Decrementing the number of dsts must happen sooner.

Notes:

1) in CONFIG_XFRM case, dst_destroy() can call
   dst_release_immediate(child), this might also cause UAF
   if the child does not have DST_NOCOUNT set.
   IPSEC maintainers might take a look and see how to address this.

2) There is also discussion about removing this count of dst,
   which might happen in future kernels.

Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()")
Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoata: libata: Update MAINTAINERS file
Damien Le Moal [Thu, 10 Oct 2024 02:01:17 +0000 (11:01 +0900)]
ata: libata: Update MAINTAINERS file

Modify the entry for the ahci_platform driver (LIBATA SATA
AHCI PLATFORM devices support) in the MAINTAINERS file to remove Jens
as maintainer. Also remove all references to Jens block tree from the
various LIBATA driver entries as the tree reference for these is defined
by the LIBATA SUBSYSTEM entry.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20241010020117.416333-1-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
12 months agodrm/fbdev-dma: Only cleanup deferred I/O if necessary
Janne Grunau [Sun, 6 Oct 2024 17:49:45 +0000 (19:49 +0200)]
drm/fbdev-dma: Only cleanup deferred I/O if necessary

Commit 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
necessary") initializes deferred I/O only if it is used.
drm_fbdev_dma_fb_destroy() however calls fb_deferred_io_cleanup()
unconditionally with struct fb_info.fbdefio == NULL. KASAN with the
out-of-tree Apple silicon display driver posts following warning from
__flush_work() of a random struct work_struct instead of the expected
NULL pointer derefs.

[   22.053799] ------------[ cut here ]------------
[   22.054832] WARNING: CPU: 2 PID: 1 at kernel/workqueue.c:4177 __flush_work+0x4d8/0x580
[   22.056597] Modules linked in: uhid bnep uinput nls_ascii ip6_tables ip_tables i2c_dev loop fuse dm_multipath nfnetlink zram hid_magicmouse btrfs xor xor_neon brcmfmac_wcc raid6_pq hci_bcm4377 bluetooth brcmfmac hid_apple brcmutil nvmem_spmi_mfd simple_mfd_spmi dockchannel_hid cfg80211 joydev regmap_spmi nvme_apple ecdh_generic ecc macsmc_hid rfkill dwc3 appledrm snd_soc_macaudio macsmc_power nvme_core apple_isp phy_apple_atc apple_sart apple_rtkit_helper apple_dockchannel tps6598x macsmc_hwmon snd_soc_cs42l84 videobuf2_v4l2 spmi_apple_controller nvmem_apple_efuses videobuf2_dma_sg apple_z2 videobuf2_memops spi_nor panel_summit videobuf2_common asahi videodev pwm_apple apple_dcp snd_soc_apple_mca apple_admac spi_apple clk_apple_nco i2c_pasemi_platform snd_pcm_dmaengine mc i2c_pasemi_core mux_core ofpart adpdrm drm_dma_helper apple_dart apple_soc_cpufreq leds_pwm phram
[   22.073768] CPU: 2 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.11.2-asahi+ #asahi-dev
[   22.075612] Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
[   22.077032] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   22.078567] pc : __flush_work+0x4d8/0x580
[   22.079471] lr : __flush_work+0x54/0x580
[   22.080345] sp : ffffc000836ef820
[   22.081089] x29: ffffc000836ef880 x28: 0000000000000000 x27: ffff80002ddb7128
[   22.082678] x26: dfffc00000000000 x25: 1ffff000096f0c57 x24: ffffc00082d3e358
[   22.084263] x23: ffff80004b7862b8 x22: dfffc00000000000 x21: ffff80005aa1d470
[   22.085855] x20: ffff80004b786000 x19: ffff80004b7862a0 x18: 0000000000000000
[   22.087439] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000005
[   22.089030] x14: 1ffff800106ddf0a x13: 0000000000000000 x12: 0000000000000000
[   22.090618] x11: ffffb800106ddf0f x10: dfffc00000000000 x9 : 1ffff800106ddf0e
[   22.092206] x8 : 0000000000000000 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000001
[   22.093790] x5 : ffffc000836ef728 x4 : 0000000000000000 x3 : 0000000000000020
[   22.095368] x2 : 0000000000000008 x1 : 00000000000000aa x0 : 0000000000000000
[   22.096955] Call trace:
[   22.097505]  __flush_work+0x4d8/0x580
[   22.098330]  flush_delayed_work+0x80/0xb8
[   22.099231]  fb_deferred_io_cleanup+0x3c/0x130
[   22.100217]  drm_fbdev_dma_fb_destroy+0x6c/0xe0 [drm_dma_helper]
[   22.101559]  unregister_framebuffer+0x210/0x2f0
[   22.102575]  drm_fb_helper_unregister_info+0x48/0x60
[   22.103683]  drm_fbdev_dma_client_unregister+0x4c/0x80 [drm_dma_helper]
[   22.105147]  drm_client_dev_unregister+0x1cc/0x230
[   22.106217]  drm_dev_unregister+0x58/0x570
[   22.107125]  apple_drm_unbind+0x50/0x98 [appledrm]
[   22.108199]  component_del+0x1f8/0x3a8
[   22.109042]  dcp_platform_shutdown+0x24/0x38 [apple_dcp]
[   22.110357]  platform_shutdown+0x70/0x90
[   22.111219]  device_shutdown+0x368/0x4d8
[   22.112095]  kernel_restart+0x6c/0x1d0
[   22.112946]  __arm64_sys_reboot+0x1c8/0x328
[   22.113868]  invoke_syscall+0x78/0x1a8
[   22.114703]  do_el0_svc+0x124/0x1a0
[   22.115498]  el0_svc+0x3c/0xe0
[   22.116181]  el0t_64_sync_handler+0x70/0xc0
[   22.117110]  el0t_64_sync+0x190/0x198
[   22.117931] ---[ end trace 0000000000000000 ]---

Signed-off-by: Janne Grunau <j@jannau.net>
Fixes: 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/ZwLNuZL-8Gh5UUQb@robin