]> www.infradead.org Git - users/griffoul/linux.git/log
users/griffoul/linux.git
3 years agocxl/memdev: Add numa_node attribute
Dan Williams [Mon, 24 Jan 2022 00:31:24 +0000 (16:31 -0800)]
cxl/memdev: Add numa_node attribute

While CXL memory targets will have their own memory target node,
individual memory devices may be affinitized like other PCI devices.
Emit that attribute for memdevs.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298428430.3018233.16409089892707993289.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Emit device serial number
Dan Williams [Mon, 31 Jan 2022 21:56:11 +0000 (13:56 -0800)]
cxl/pci: Emit device serial number

Per the CXL specification (8.1.12.2 Memory Device PCIe Capabilities and
Extended Capabilities) the Device Serial Number capability is mandatory.
Emit it for user tooling to identify devices.

It is reasonable to ask whether the attribute should be added to the
list of PCI sysfs device attributes. The PCI layer can optionally emit
it too, but the CXL subsystem is aiming to preserve its independence and
the possibility of CXL topologies with non-PCI devices in it. To date
that has only proven useful for the 'cxl_test' model, but as can be seen
with seen with ACPI0016 devices, sometimes all that is needed is a
platform firmware table to point to CXL Component Registers in MMIO
space to define a "CXL" device.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164366608838.196598.16856227191534267098.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Implement wait for media active
Ben Widawsky [Mon, 24 Jan 2022 00:31:13 +0000 (16:31 -0800)]
cxl/pci: Implement wait for media active

CXL 2.0 8.1.3.8.2 states:

  Memory_Active: When set, indicates that the CXL Range 1 memory is
  fully initialized and available for software use. Must be set within
  Range 1. Memory_Active_Timeout of deassertion of reset to CXL device
  if CXL.mem HwInit Mode=1

Unfortunately, Memory_Active can take quite a long time depending on
media size (up to 256s per 2.0 spec). Provide a callback for the
eventual establishment of CXL.mem operations via the 'cxl_mem' driver
the 'struct cxl_memdev'. The implementation waits for 60s by default for
now and can be overridden by the mbox_ready_time module parameter.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: switch to sleeping wait]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298427373.3018233.9309741847039301834.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Retrieve CXL DVSEC memory info
Ben Widawsky [Tue, 1 Feb 2022 23:48:56 +0000 (15:48 -0800)]
cxl/pci: Retrieve CXL DVSEC memory info

Before CXL 2.0 HDM Decoder Capability mechanisms can be utilized in a
device the driver must determine that the device is ready for CXL.mem
operation and that platform firmware, or some other agent, has
established an active decode via the legacy CXL 1.1 decoder mechanism.

This legacy mechanism is defined in the CXL DVSEC as a set of range
registers and status bits that take time to settle after a reset.

Validate the CXL memory decode setup via the DVSEC and cache it for
later consideration by the cxl_mem driver (to be added). Failure to
validate is not fatal to the cxl_pci driver since that is only providing
CXL command support over PCI.mmio, and might be needed to rectify CXL
DVSEC validation problems.

Any potential ranges that the device is already claiming via DVSEC need
to be reconciled with the dynamic provisioning ranges provided by
platform firmware (like ACPI CEDT.CFMWS). Leave that reconciliation to
the cxl_mem driver.

[djbw: shorten defines]
[djbw: change precise spin wait to generous msleep]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: clarify changelog]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164375911821.559935.7375160041663453400.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Cache device DVSEC offset
Ben Widawsky [Tue, 1 Feb 2022 22:06:32 +0000 (14:06 -0800)]
cxl/pci: Cache device DVSEC offset

The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
be implemented by CXL 2.0 endpoint devices. In preparation for consuming
this information in a new cxl_mem driver, retrieve the CXL DVSEC
position and warn about the implications of not finding it. Allow for
mailbox operation even if the CXL DVSEC is missing.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164375309615.513620.7874131241128599893.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Store component register base in cxlds
Ben Widawsky [Tue, 1 Feb 2022 21:28:53 +0000 (13:28 -0800)]
cxl/pci: Store component register base in cxlds

In preparation for defining a cxl_port object to represent the decoder
resources of a memory expander capture the component register base
address.

The port driver uses the component register base to enumerate the HDM
Decoder Capability structure. Unlike other cxl_port objects the endpoint
port decodes from upstream SPA to downstream DPA rather than upstream
port to downstream port.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: clarify changelog]
Link: https://lore.kernel.org/r/164375084181.484304.3919737667590006795.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/port: Remove @host argument for dport + decoder enumeration
Dan Williams [Tue, 1 Feb 2022 21:23:14 +0000 (13:23 -0800)]
cxl/core/port: Remove @host argument for dport + decoder enumeration

Now that dport and decoder enumeration is centralized in the port
driver, the @host argument for these helpers can be made implicit. For
the root port the host is the port's uport device (ACPI0017 for
cxl_acpi), and for all other descendant ports the devm context is the
parent of @port.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164375043390.484143.17617734732003230076.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/port: Add a driver for 'struct cxl_port' objects
Ben Widawsky [Tue, 1 Feb 2022 21:07:51 +0000 (13:07 -0800)]
cxl/port: Add a driver for 'struct cxl_port' objects

The need for a CXL port driver and a dedicated cxl_bus_type is driven by
a need to simultaneously support 2 independent physical memory decode
domains (cache coherent CXL.mem and uncached PCI.mmio) that also
intersect at a single PCIe device node. A CXL Port is a device that
advertises a  CXL Component Register block with an "HDM Decoder
Capability Structure".

>From Documentation/driver-api/cxl/memory-devices.rst:

    Similar to how a RAID driver takes disk objects and assembles them into
    a new logical device, the CXL subsystem is tasked to take PCIe and ACPI
    objects and assemble them into a CXL.mem decode topology. The need for
    runtime configuration of the CXL.mem topology is also similar to RAID in
    that different environments with the same hardware configuration may
    decide to assemble the topology in contrasting ways. One may choose
    performance (RAID0) striping memory across multiple Host Bridges and
    endpoints while another may opt for fault tolerance and disable any
    striping in the CXL.mem topology.

The port driver identifies whether an endpoint Memory Expander is
connected to a CXL topology. If an active (bound to the 'cxl_port'
driver) CXL Port is not found at every PCIe Switch Upstream port and an
active "root" CXL Port then the device is just a plain PCIe endpoint
only capable of participating in PCI.mmio and DMA cycles, not CXL.mem
coherent interleave sets.

The 'cxl_port' driver lets the CXL subsystem leverage driver-core
infrastructure for setup and teardown of register resources and
communicating device activation status to userspace. The cxl_bus_type
can rendezvous the async arrival of platform level CXL resources (via
the 'cxl_acpi' driver) with the asynchronous enumeration of Memory
Expander endpoints, while also implementing a hierarchical locking model
independent of the associated 'struct pci_dev' locking model. The
locking for dport and decoder enumeration is now handled in the core
rather than callers.

For now the port driver only enumerates and registers CXL resources
(downstream port metadata and decoder resources) later it will be used
to take action on its decoders in response to CXL.mem region
provisioning requests.

Note1: cxlpci.h has long depended on pci.h, but port.c was the first to
not include pci.h. Carry that dependency in cxlpci.h.

Note2: cxl port enumeration and probing complicates CXL subsystem init
to the point that it helps to have centralized debug logging of probe
events in cxl_bus_probe().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core: Emit modalias for CXL devices
Dan Williams [Mon, 24 Jan 2022 00:30:41 +0000 (16:30 -0800)]
cxl/core: Emit modalias for CXL devices

In order to enable libkmod lookups for CXL device objects to their
corresponding module, add 'modalias' to the base attribute of CXL
devices.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298424120.3018233.15611905873808708542.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/hdm: Add CXL standard decoder enumeration to the core
Dan Williams [Tue, 1 Feb 2022 20:24:30 +0000 (12:24 -0800)]
cxl/core/hdm: Add CXL standard decoder enumeration to the core

Unlike the decoder enumeration for "root decoders" described by platform
firmware, standard decoders can be enumerated from the component
registers space once the base address has been identified (via PCI,
ACPI, or another mechanism).

Add common infrastructure for HDM (Host-managed-Device-Memory) Decoder
enumeration and share it between host-bridge, upstream switch port, and
cxl_test defined decoders.

The locking model for switch level decoders is to hold the port lock
over the enumeration. This facilitates moving the dport and decoder
enumeration to a 'port' driver. For now, the only enumerator of decoder
resources is the cxl_acpi root driver.

Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164374688404.395335.9239248252443123526.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core: Generalize dport enumeration in the core
Dan Williams [Tue, 1 Feb 2022 02:10:04 +0000 (18:10 -0800)]
cxl/core: Generalize dport enumeration in the core

The core houses infrastructure for decoder resources. A CXL port's
dports are more closely related to decoder infrastructure than topology
enumeration. Implement generic PCI based dport enumeration in the core,
i.e. arrange for existing root port enumeration from cxl_acpi to share
code with switch port enumeration which just amounts to a small
difference in a pci_walk_bus() invocation once the appropriate 'struct
pci_bus' has been retrieved.

Set the convention that decoder objects are registered after all dports
are enumerated. This enables userspace to know when the CXL core is
finished establishing 'dportX' links underneath the 'portX' object.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164368114191.354031.5270501846455462665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Rename pci.h to cxlpci.h
Dan Williams [Mon, 24 Jan 2022 00:30:25 +0000 (16:30 -0800)]
cxl/pci: Rename pci.h to cxlpci.h

Similar to the mem.h rename, if the core wants to reuse definitions from
drivers/cxl/pci.h it is unable to use <pci.h> as that collides with
archs that have an arch/$arch/include/asm/pci.h, like MIPS.

Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298422510.3018233.14693126572756675563.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/port: Up-level cxl_add_dport() locking requirements to the caller
Dan Williams [Tue, 1 Feb 2022 01:07:38 +0000 (17:07 -0800)]
cxl/port: Up-level cxl_add_dport() locking requirements to the caller

In preparation for moving dport enumeration into the core, require the
port device lock to be acquired by the caller.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164367759016.324231.105551648350470000.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pmem: Introduce a find_cxl_root() helper
Dan Williams [Tue, 1 Feb 2022 00:34:40 +0000 (16:34 -0800)]
cxl/pmem: Introduce a find_cxl_root() helper

In preparation for switch port enumeration while also preserving the
potential for multi-domain / multi-root CXL topologies. Introduce a
'struct device' generic mechanism for retrieving a root CXL port, if one
is registered. Note that the only known multi-domain CXL configurations
are running the cxl_test unit test on a system that also publishes an
ACPI0017 device.

With this in hand the nvdimm-bridge lookup can be with
device_find_child() instead of bus_find_device() + custom mocked lookup
infrastructure in cxl_test.

The mechanism looks for a 2nd level port since the root level topology
is platform-firmware specific and the 2nd level down follows standard
PCIe topology expectations. The cxl_acpi 2nd level is associated with a
PCIe Root Port.

Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164367562182.225521.9488555616768096049.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/port: Introduce cxl_port_to_pci_bus()
Dan Williams [Mon, 31 Jan 2022 16:44:52 +0000 (08:44 -0800)]
cxl/port: Introduce cxl_port_to_pci_bus()

Add a helper for converting a PCI enumerated cxl_port into the pci_bus
that hosts its dports. For switch ports this is trivial, but for root
ports there is no generic way to go from a platform defined host bridge
device, like ACPI0016 to its corresponding pci_bus. Rather than spill
ACPI goop outside of the cxl_acpi driver, just arrange for it to
register an xarray translation from the uport device to the
corresponding pci_bus.

This is in preparation for centralizing dport enumeration in the core.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164364745633.85488.9744017377155103992.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/port: Use dedicated lock for decoder target list
Dan Williams [Mon, 31 Jan 2022 23:35:18 +0000 (15:35 -0800)]
cxl/core/port: Use dedicated lock for decoder target list

Lockdep reports:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.16.0-rc1+ #142 Tainted: G           OE
 ------------------------------------------------------
 cxl/1220 is trying to acquire lock:
 ffff979b85475460 (kn->active#144){++++}-{0:0}, at: __kernfs_remove+0x1ab/0x1e0

 but task is already holding lock:
 ffff979b87ab38e8 (&dev->lockdep_mutex#2/4){+.+.}-{3:3}, at: cxl_remove_ep+0x50c/0x5c0 [cxl_core]

...where cxl_remove_ep() is a helper that wants to delete ports while
holding a lock on the host device for that port. That sets up a lockdep
violation whereby target_list_show() can not rely holding the decoder's
device lock while walking the target_list. Switch to a dedicated seqlock
for this purpose.

Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164367209095.208169.1171673319121271280.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl: Prove CXL locking
Dan Williams [Mon, 31 Jan 2022 19:50:09 +0000 (11:50 -0800)]
cxl: Prove CXL locking

When CONFIG_PROVE_LOCKING is enabled the 'struct device' definition gets
an additional mutex that is not clobbered by
lockdep_set_novalidate_class() like the typical device_lock(). This
allows for local annotation of subsystem locks with mutex_lock_nested()
per the subsystem's object/lock hierarchy. For CXL, this primarily needs
the ability to lock ports by depth and child objects of ports by their
parent parent-port lock.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164365853422.99383.1052399160445197427.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core: Track port depth
Ben Widawsky [Mon, 24 Jan 2022 00:29:53 +0000 (16:29 -0800)]
cxl/core: Track port depth

In preparation for proving CXL subsystem usage of the device_lock()
order track the depth of ports with the expectation that  shallower port
locks can be held over deeper port locks.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298419321.3018233.4469731547378993606.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/port: Make passthrough decoder init implicit
Ben Widawsky [Mon, 24 Jan 2022 00:29:47 +0000 (16:29 -0800)]
cxl/core/port: Make passthrough decoder init implicit

Unused CXL decoders, or ports which use a passthrough decoder (no HDM
decoder registers) are expected to be initialized in a specific way.
Since upcoming drivers will want the same initialization, and it was
already a requirement to have consumers of the API configure the decoder
specific to their needs, initialize to this passthrough state by
default.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298418778.3018233.13573986275832546547.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core: Fix cxl_probe_component_regs() error message
Dan Williams [Mon, 24 Jan 2022 00:29:42 +0000 (16:29 -0800)]
cxl/core: Fix cxl_probe_component_regs() error message

Fix a '\n' vs '/n' typo.

Fixes: 08422378c4ad ("cxl/pci: Add HDM decoder capabilities")
Acked-by: Ben Widawsky <ben.widawsky@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298418268.3018233.17790073375430834911.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/port: Clarify decoder creation
Ben Widawsky [Mon, 31 Jan 2022 21:33:13 +0000 (13:33 -0800)]
cxl/core/port: Clarify decoder creation

Add wrappers for the creation of decoder objects at the root level and
switch level, and keep the core helper private to cxl/core/port.c. Root
decoders are static descriptors conveyed from platform firmware (e.g.
ACPI CFMWS). Switch decoders are CXL standard decoders enumerated via
the HDM decoder capability structure. The base address for the HDM
decoder capability structure may be conveyed either by PCIe or platform
firmware (ACPI CEDT.CHBS).

Additionally, the kdoc descriptions for these helpers and their
dependencies is updated.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: fixup changelog, clarify kdoc]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164366463014.111117.9714595404002687111.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core: Convert decoder range to resource
Ben Widawsky [Mon, 24 Jan 2022 00:29:31 +0000 (16:29 -0800)]
cxl/core: Convert decoder range to resource

CXL decoders manage address ranges in a hierarchical fashion whereby a
leaf is a unique subregion of its parent decoder (midlevel or root). It
therefore makes sense to use the resource API for handling this.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298417191.3018233.5201055578165414714.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/decoder: Hide physical address information from non-root
Dan Williams [Mon, 24 Jan 2022 00:29:26 +0000 (16:29 -0800)]
cxl/decoder: Hide physical address information from non-root

Just like /proc/iomem, CXL physical address information is reserved for
root only.

Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298416650.3018233.450720006145238709.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/core/port: Rename bus.c to port.c
Dan Williams [Mon, 24 Jan 2022 00:29:21 +0000 (16:29 -0800)]
cxl/core/port: Rename bus.c to port.c

Given it is dominated by port infrastructure, and will only acquire
more, rename bus.c to port.c.

Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298416136.3018233.15442880970000855425.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl: Introduce module_cxl_driver
Ben Widawsky [Mon, 24 Jan 2022 00:29:15 +0000 (16:29 -0800)]
cxl: Introduce module_cxl_driver

Many CXL drivers simply want to register and unregister themselves.
module_driver already supported this. A simple wrapper around that
reduces a decent amount of boilerplate in upcoming patches.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298415591.3018233.13608495220547681412.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/acpi: Map component registers for Root Ports
Ben Widawsky [Mon, 24 Jan 2022 00:29:10 +0000 (16:29 -0800)]
cxl/acpi: Map component registers for Root Ports

This implements the TODO in cxl_acpi for mapping component registers.
cxl_acpi becomes the second consumer of CXL register block enumeration
(cxl_pci being the first). Moving the functionality to cxl_core allows
both of these drivers to use the functionality. Equally importantly it
allows cxl_core to use the functionality in the future.

CXL 2.0 root ports are similar to CXL 2.0 Downstream Ports with the main
distinction being they're a part of the CXL 2.0 host bridge. While
mapping their component registers is not immediately useful for the CXL
drivers, the movement of register block enumeration into core is a vital
step towards HDM decoder programming.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: fix cxl_regmap_to_base() failure cases]
Link: https://lore.kernel.org/r/164298415080.3018233.14694957480228676592.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Add new DVSEC definitions
Ben Widawsky [Mon, 24 Jan 2022 00:29:05 +0000 (16:29 -0800)]
cxl/pci: Add new DVSEC definitions

In preparation for properly supporting memory active timeout, and later
on, other attributes obtained from DVSEC fields, add the full list of
DVSEC identifiers from the CXL 2.0 specification.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298414567.3018233.12005290051592771878.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl: Flesh out register names
Ben Widawsky [Mon, 24 Jan 2022 00:29:00 +0000 (16:29 -0800)]
cxl: Flesh out register names

Get a better naming scheme in place for upcoming additions. By dropping
redundant usages of CXL and DVSEC where appropriate we can get more
concise and also more grepable defines.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298414022.3018233.15522855498759815097.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Defer mailbox status checks to command timeouts
Dan Williams [Mon, 24 Jan 2022 00:28:54 +0000 (16:28 -0800)]
cxl/pci: Defer mailbox status checks to command timeouts

Device status can change without warning at any point in time. This
effectively means that no amount of status checking before a command is
submitted can guarantee that the device is not in an error condition
when the command is later submitted. The clearest signal that a device
is not able to process commands is if it fails to process commands.

With the above understanding in hand, update cxl_pci_setup_mailbox() to
validate the readiness of the mailbox once at the beginning of time, and
then use timeouts and busy sequencing errors as the only occasions to
report status.

Just as before, unless and until the driver gains a reset recovery path,
doorbell clearing failures by the device are fatal to mailbox
operations.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298413480.3018233.9643395389297971819.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl/pci: Implement Interface Ready Timeout
Ben Widawsky [Mon, 31 Jan 2022 23:51:45 +0000 (15:51 -0800)]
cxl/pci: Implement Interface Ready Timeout

The original driver implementation used the doorbell timeout for the
Mailbox Interface Ready bit to piggy back off of, since the latter does
not have a defined timeout. This functionality, introduced in commit
8adaf747c9f0 ("cxl/mem: Find device capabilities"), needs improvement as
the recent "Add Mailbox Ready Time" ECN timeout indicates that the
mailbox ready time can be significantly longer that 2 seconds.

While the specification limits the maximum timeout to 256s, the cxl_pci
driver gives up on the mailbox after 60s. This value corresponds with
important timeout values already present in the kernel. A module
parameter is provided as an emergency override and represents the
default Linux policy for all devices.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: add modparam, drop check_device_status()]
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/164367306565.208548.1932299464604450843.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agocxl: Rename CXL_MEM to CXL_PCI
Ben Widawsky [Mon, 24 Jan 2022 00:28:44 +0000 (16:28 -0800)]
cxl: Rename CXL_MEM to CXL_PCI

The cxl_mem module was renamed cxl_pci in commit 21e9f76733a8 ("cxl:
Rename mem to pci"). In preparation for adding an ancillary driver for
cxl_memdev devices (registered on the cxl bus by cxl_pci), go ahead and
rename CONFIG_CXL_MEM to CONFIG_CXL_PCI. Free up the CXL_MEM name for
that new driver to manage CXL.mem endpoint operations.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298412409.3018233.12407355692407890752.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agoLinux 5.17-rc2 v5.17-rc2
Linus Torvalds [Sun, 30 Jan 2022 13:37:07 +0000 (15:37 +0200)]
Linux 5.17-rc2

3 years agoMerge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 13:12:02 +0000 (15:12 +0200)]
Merge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Drop an unused private data field in the AIC driver

 - Various fixes to the realtek-rtl driver

 - Make the GICv3 ITS driver compile again in !SMP configurations

 - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

 - Yet another kfree/bitmap_free conversion

 - Various DT updates (Renesas, SiFive)

* tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
  dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
  dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
  irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
  irqchip/gic-v3-its: Fix build for !SMP
  irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
  irqchip/realtek-rtl: Service all pending interrupts
  irqchip/realtek-rtl: Fix off-by-one in routing
  irqchip/realtek-rtl: Map control data to virq
  irqchip/apple-aic: Drop unused ipi_hwirq field

3 years agoMerge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 13:02:32 +0000 (15:02 +0200)]
Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent accesses to the per-CPU cgroup context list from another CPU
   except the one it belongs to, to avoid list corruption

 - Make sure parent events are always woken up to avoid indefinite hangs
   in the traced workload

* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix cgroup event list management
  perf: Always wake the parent event

3 years agoMerge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sun, 30 Jan 2022 11:09:00 +0000 (13:09 +0200)]
Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:
 "Make sure the membarrier-rseq fence commands are part of the reported
  set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"

* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask

3 years agoMerge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jan 2022 10:55:06 +0000 (12:55 +0200)]
Merge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add another Intel CPU model to the list of CPUs supporting the
   processor inventory unique number

 - Allow writing to MCE thresholding sysfs files again - a previous
   change had accidentally disabled it and no one noticed. Goes to show
   how much is this stuff used

* tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
  x86/MCE/AMD: Allow thresholding interface updates after init

3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 30 Jan 2022 09:21:50 +0000 (11:21 +0200)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "12 patches.

  Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
  (memory-failure, folios, kasan, and psi), selftests, and ocfs2"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: fix a deadlock when commit trans
  jbd2: export jbd2_journal_[grab|put]_journal_head
  psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
  psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
  mm, kasan: use compare-exchange operation to set KASAN page tag
  kasan: test: fix compatibility with FORTIFY_SOURCE
  tools/testing/scatterlist: add missing defines
  mm: page->mapping folio->mapping should have the same offset
  memory-failure: fetch compound_head after pgmap_pfn_valid()
  ia64: make IA64_MCA_RECOVERY bool instead of tristate
  binfmt_misc: fix crash when load/unload module
  include/linux/sysctl.h: fix register_sysctl_mount_point() return type

3 years agoocfs2: fix a deadlock when commit trans
Joseph Qi [Sat, 29 Jan 2022 21:41:27 +0000 (13:41 -0800)]
ocfs2: fix a deadlock when commit trans

commit 6f1b228529ae introduces a regression which can deadlock as
follows:

  Task1:                              Task2:
  jbd2_journal_commit_transaction     ocfs2_test_bg_bit_allocatable
  spin_lock(&jh->b_state_lock)        jbd_lock_bh_journal_head
  __jbd2_journal_remove_checkpoint    spin_lock(&jh->b_state_lock)
  jbd2_journal_put_journal_head
  jbd_lock_bh_journal_head

Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
order, which finally result in a deadlock.

So use jbd2_journal_[grab|put]_journal_head instead in
ocfs2_test_bg_bit_allocatable() to fix it.

Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
Fixes: 6f1b228529ae ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agojbd2: export jbd2_journal_[grab|put]_journal_head
Joseph Qi [Sat, 29 Jan 2022 21:41:23 +0000 (13:41 -0800)]
jbd2: export jbd2_journal_[grab|put]_journal_head

Patch series "ocfs2: fix a deadlock case".

This fixes a deadlock case in ocfs2.  We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.

This patch (of 2):

This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g.  ocfs2.

Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agopsi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
Suren Baghdasaryan [Sat, 29 Jan 2022 21:41:20 +0000 (13:41 -0800)]
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n

When CONFIG_PROC_FS is disabled psi code generates the following
warnings:

  kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
      1364 | static const struct proc_ops psi_cpu_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
      1355 | static const struct proc_ops psi_memory_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
      1346 | static const struct proc_ops psi_io_proc_ops = {
           |                              ^~~~~~~~~~~~~~~

Make definitions of these structures and related functions conditional
on CONFIG_PROC_FS config.

Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agopsi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
Suren Baghdasaryan [Sat, 29 Jan 2022 21:41:17 +0000 (13:41 -0800)]
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n

When CONFIG_CGROUPS is disabled psi code generates the following
warnings:

  kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
      1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
           |                     ^~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
      1182 | void psi_trigger_destroy(struct psi_trigger *t)
           |      ^~~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
      1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
           |          ^~~~~~~~~~~~~~~~

Change the declarations of these functions in the header to provide the
prototypes even when they are unused.

Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm, kasan: use compare-exchange operation to set KASAN page tag
Peter Collingbourne [Sat, 29 Jan 2022 21:41:14 +0000 (13:41 -0800)]
mm, kasan: use compare-exchange operation to set KASAN page tag

It has been reported that the tag setting operation on newly-allocated
pages can cause the page flags to be corrupted when performed
concurrently with other flag updates as a result of the use of
non-atomic operations.

Fix the problem by using a compare-exchange loop to update the tag.

Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0cec63101
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokasan: test: fix compatibility with FORTIFY_SOURCE
Marco Elver [Sat, 29 Jan 2022 21:41:11 +0000 (13:41 -0800)]
kasan: test: fix compatibility with FORTIFY_SOURCE

With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks using __builtin_object_size(ptr), which when failed will
panic the kernel.

Because the KASAN test deliberately performs out-of-bounds operations,
the kernel panics with FORTIFY_SOURCE, for example:

 | kernel BUG at lib/string_helpers.c:910!
 | invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 | CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G    B             5.16.0-rc3+ #3
 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 | RIP: 0010:fortify_panic+0x19/0x1b
 | ...
 | Call Trace:
 |  kmalloc_oob_in_memset.cold+0x16/0x16
 |  ...

Fix it by also hiding `ptr` from the optimizer, which will ensure that
__builtin_object_size() does not return a valid size, preventing
fortified string functions from panicking.

Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nico Pache <npache@redhat.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools/testing/scatterlist: add missing defines
Maor Gottlieb [Sat, 29 Jan 2022 21:41:07 +0000 (13:41 -0800)]
tools/testing/scatterlist: add missing defines

The cited commits replaced preemptible with pagefault_disabled and
flush_kernel_dcache_page with flush_dcache_page respectively, hence need
to update the corresponding defines in the test.

  scatterlist.c: In function â€˜sg_miter_stop’:
  scatterlist.c:919:4: warning: implicit declaration of function â€˜flush_dcache_page’ [-Wimplicit-function-declaration]
      flush_dcache_page(miter->page);
      ^~~~~~~~~~~~~~~~~
  In file included from linux/scatterlist.h:8:0,
                   from scatterlist.c:9:
  scatterlist.c:922:18: warning: implicit declaration of function â€˜pagefault_disabled’ [-Wimplicit-function-declaration]
      WARN_ON_ONCE(!pagefault_disabled());
                    ^
  linux/mm.h:23:25: note: in definition of macro â€˜WARN_ON_ONCE’
    int __ret_warn_on = !!(condition);                      \
                           ^~~~~~~~~

Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com
Fixes: 723aca208516 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()")
Fixes: 0e84f5dbf8d6 ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: page->mapping folio->mapping should have the same offset
Wei Yang [Sat, 29 Jan 2022 21:41:04 +0000 (13:41 -0800)]
mm: page->mapping folio->mapping should have the same offset

As with the other members of folio, the offset of page->mapping and
folio->mapping must be the same.  The compile-time check was
inadvertently removed during development.  Add it back.

[willy@infradead.org: changelog redo]

Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomemory-failure: fetch compound_head after pgmap_pfn_valid()
Joao Martins [Sat, 29 Jan 2022 21:41:01 +0000 (13:41 -0800)]
memory-failure: fetch compound_head after pgmap_pfn_valid()

memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
dax_lock_page()).  For devmap with compound pages fetch the
compound_head in case a tail page memory failure is being handled.

Currently this is a nop, but in the advent of compound pages in
dev_pagemap it allows memory_failure_dev_pagemap() to keep working.

Without this fix memory-failure handling (i.e.  MCEs on pmem) with
device-dax configured namespaces will regress (and crash).

Link: https://lkml.kernel.org/r/20211202204422.26777-2-joao.m.martins@oracle.com
Reported-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoia64: make IA64_MCA_RECOVERY bool instead of tristate
Randy Dunlap [Sat, 29 Jan 2022 21:40:58 +0000 (13:40 -0800)]
ia64: make IA64_MCA_RECOVERY bool instead of tristate

In linux-next, IA64_MCA_RECOVERY uses the (new) function
make_task_dead(), which is not exported for use by modules.  Instead of
exporting it for one user, convert IA64_MCA_RECOVERY to be a bool
Kconfig symbol.

In a config file from "kernel test robot <lkp@intel.com>" for a
different problem, this linker error was exposed when
CONFIG_IA64_MCA_RECOVERY=m.

Fixes this build error:

  ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined!

Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobinfmt_misc: fix crash when load/unload module
Tong Zhang [Sat, 29 Jan 2022 21:40:55 +0000 (13:40 -0800)]
binfmt_misc: fix crash when load/unload module

We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again.  Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.

Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.

To reproduce:
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc

resulting in

  modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory

and an unhappy kernel:

  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  BUG: unable to handle page fault for address: fffffbfff8004802
  Call Trace:
    init_misc_binfmt+0x2d/0x1000 [binfmt_misc]

Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Christian Brauner<brauner@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinclude/linux/sysctl.h: fix register_sysctl_mount_point() return type
Andrew Morton [Sat, 29 Jan 2022 21:40:52 +0000 (13:40 -0800)]
include/linux/sysctl.h: fix register_sysctl_mount_point() return type

The CONFIG_SYSCTL=n stub returns the wrong type.

Fixes: ee9efac48a082 ("sysctl: add helper to register a sysctl mount point")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'irqchip-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Thomas Gleixner [Sat, 29 Jan 2022 20:03:20 +0000 (21:03 +0100)]
Merge tag 'irqchip-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

  - Drop an unused private data field in the AIC driver

  - Various fixes to the realtek-rtl driver

  - Make the GICv3 ITS driver compile again in !SMP configurations

  - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

  - Yet another kfree/bitmap_free conversion

  - Various DT updates (Renesas, SiFive)

Link: https://lore.kernel.org/r/20220128174217.517041-1-maz@kernel.org
3 years agoMerge tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Sat, 29 Jan 2022 17:05:47 +0000 (19:05 +0200)]
Merge tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

 - Fix compilation warnings in new mt7621 driver (Sergio Paracuellos)

 - Restore the sysfs "rom" file for VGA shadow ROMs, which was broken
   when converting "rom" to be a static attribute (Bjorn Helgaas)

* tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/sysfs: Find shadow ROM before static attribute initialization
  PCI: mt7621: Remove unused function pcie_rmw()
  PCI: mt7621: Drop of_match_ptr() to avoid unused variable

3 years agoMerge tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 29 Jan 2022 13:45:33 +0000 (15:45 +0200)]
Merge tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "Two fixes for the gpio-simulator:

   - fix a bug with hogs not being set-up in gpio-sim when user-space
     sets the chip label to an empty string

   - include the gpio-sim documentation in the index"

* tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: add doc file to index file
  gpio: sim: check the label length when setting up device properties

3 years agoMerge tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 29 Jan 2022 13:34:04 +0000 (15:34 +0200)]
Merge tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are two small char/misc driver fixes for 5.17-rc2 that fix some
  reported issues. They are:

   - fix up a merge issue in the at25.c driver that ended up dropping
     some lines in the driver. The removed lines ended being needed, so
     this restores it and the driver works again.

   - counter core fix where the wrong error was being returned, NULL
     should be the correct error for when memory is gone here, like the
     kmalloc() core does.

  Both of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  counter: fix an IS_ERR() vs NULL bug
  eeprom: at25: Restore missing allocation

3 years agoMerge tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 29 Jan 2022 13:23:13 +0000 (15:23 +0200)]
Merge tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small bug fixes and reverts for reported problems with
  the tty core and drivers. They include:

   - revert the fifo use for the 8250 console mode. It caused too many
     regressions and problems, and had a bug in it as well. This is
     being reworked and should show up in a later -rc1 release, but it's
     not ready for 5.17

   - rpmsg tty race fix

   - restore the cyclades.h uapi header file. Turns out a compiler test
     suite used it for some unknown reason. Bring it back just for the
     parts that are used by the builder test so they continue to build.
     No functionality is restored as no one actually has this hardware
     anymore, nor is it really tested.

   - stm32 driver fixes

   - n_gsm flow control fixes

   - pl011 driver fix

   - rs485 initialization fix

  All of these have been in linux-next this week with no reported
  problems"

* tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  kbuild: remove include/linux/cyclades.h from header file check
  serial: core: Initialize rs485 RTS polarity already on probe
  serial: pl011: Fix incorrect rs485 RTS polarity on set_mctrl
  serial: stm32: fix software flow control transfer
  serial: stm32: prevent TDR register overwrite when sending x_char
  tty: n_gsm: fix SW flow control encoding/handling
  serial: 8250: of: Fix mapped region size when using reg-offset property
  tty: rpmsg: Fix race condition releasing tty port
  tty: Partially revert the removal of the Cyclades public API
  tty: Add support for Brainboxes UC cards.
  Revert "tty: serial: Use fifo in 8250 console driver"

3 years agoMerge tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 29 Jan 2022 13:17:20 +0000 (15:17 +0200)]
Merge tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes for 5.17-rc2 that resolve a
  number of reported problems. These include:

   - typec driver fixes

   - xhci platform driver fixes for suspending

   - ulpi core fix

   - role.h build fix

   - new device ids

   - syzbot-reported bugfixes

   - gadget driver fixes

   - dwc3 driver fixes

   - other small fixes

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdnsp: Fix segmentation fault in cdns_lost_power function
  usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
  usb: gadget: at91_udc: fix incorrect print type
  usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
  usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
  usb: xhci-plat: fix crash when suspend if remote wake enable
  usb: common: ulpi: Fix crash in ulpi_match()
  usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
  ucsi_ccg: Check DEV_INT bit only when starting CCG4
  USB: core: Fix hang in usb_kill_urb by adding memory barriers
  usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
  usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
  usb: typec: tcpm: Do not disconnect while receiving VBUS off
  usb: typec: Don't try to register component master without components
  usb: typec: Only attempt to link USB ports if there is fwnode
  usb: typec: tcpci: don't touch CC line if it's Vconn source
  usb: roles: fix include/linux/usb/role.h compile issue

3 years agoMerge tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 29 Jan 2022 13:01:08 +0000 (15:01 +0200)]
Merge tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request
      - add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu
        Zheng)
      - remove the unneeded ret variable in nvmf_dev_show (Changcheng
        Deng)

 - Fix for a hang regression introduced with a patch in the merge
   window, where low queue depth devices would not always get woken
   correctly (Laibin)

 - Small series fixing an IO accounting issue with bio backed dm devices
   (Mike, Yu)

* tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
  dm: properly fix redundant bio-based IO accounting
  dm: revert partial fix for redundant bio-based IO accounting
  block: add bio_start_io_acct_time() to control start_time
  blk-mq: Fix wrong wakeup batch configuration which will cause hang
  nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show
  nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
  blk-mq: fix missing blk_account_io_done() in error path
  block: fix memory leak in disk_register_independent_access_ranges

3 years agoMerge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 29 Jan 2022 12:53:07 +0000 (14:53 +0200)]
Merge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Just two small fixes this time:

   - Fix a bug that can lead to node registration taking 1 second, when
     it should finish much quicker (Dylan)

   - Remove an unused argument from a function (Usama)"

* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
  io_uring: remove unused argument from io_rsrc_node_alloc
  io_uring: fix bug in slow unregistering of nodes

3 years agoMerge tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 29 Jan 2022 12:46:19 +0000 (14:46 +0200)]
Merge tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix VM debug warnings on boot triggered via __set_fixmap().

 - Fix a debug warning in the 64-bit Book3S PMU handling code.

 - Fix nested guest HFSCR handling with multiple vCPUs on Power9 or
   later.

 - Fix decrementer storm caused by a recent change, seen with some
   configs.

Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Fabiano Rosas, Maxime Bizon, Nicholas Piggin, and Sachin Sant.

* tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/interrupt: Fix decrementer storm
  KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
  powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
  powerpc/fixmap: Fix VM debug warning on unmap

3 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Sat, 29 Jan 2022 06:57:22 +0000 (08:57 +0200)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Errata workarounds for Cortex-A510: broken hardware dirty bit
   management, detection code for the TRBE (tracing) bugs with the
   actual fixes going in via the CoreSight tree.

 - Cortex-X2 errata handling for TRBE (inheriting the workarounds from
   Cortex-A710).

 - Fix ex_handler_load_unaligned_zeropad() to use the correct struct
   members.

 - A couple of kselftest fixes for FPSIMD.

 - Silence the vdso "no previous prototype" warning.

 - Mark start_backtrace() notrace and NOKPROBE_SYMBOL.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpufeature: List early Cortex-A510 parts as having broken dbm
  kselftest/arm64: Correct logging of FPSIMD register read via ptrace
  kselftest/arm64: Skip VL_INHERIT tests for unsupported vector types
  arm64: errata: Add detection for TRBE trace data corruption
  arm64: errata: Add detection for TRBE invalid prohibited states
  arm64: errata: Add detection for TRBE ignored system register writes
  arm64: Add Cortex-A510 CPU part definition
  arm64: extable: fix load_unaligned_zeropad() reg indices
  arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
  arm64: errata: Update ARM64_ERRATUM_[2119858|2224489] with Cortex-X2 ranges
  arm64: Add Cortex-X2 CPU part definition
  arm64: vdso: Fix "no previous prototype" warning

3 years agoMerge tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 29 Jan 2022 06:52:27 +0000 (08:52 +0200)]
Merge tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security sybsystem fix from James Morris:
 "Fix NULL pointer crash in LSM via Ceph, from Vivek Goyal"

* tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security, lsm: dentry_init_security() Handle multi LSM registration

3 years agoMerge tag 'docs-5.17-3' of git://git.lwn.net/linux
Linus Torvalds [Sat, 29 Jan 2022 06:27:28 +0000 (08:27 +0200)]
Merge tag 'docs-5.17-3' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A few documentation fixes for 5.17"

* tag 'docs-5.17-3' of git://git.lwn.net/linux:
  docs/vm: Fix typo in *harden*
  Documentation: arm: marvell: Extend Avanta list
  docs: fix typo in Documentation/kernel-hacking/locking.rst
  docs: Hook the RTLA documents into the kernel docs build

3 years agodm: properly fix redundant bio-based IO accounting
Mike Snitzer [Fri, 28 Jan 2022 15:58:41 +0000 (10:58 -0500)]
dm: properly fix redundant bio-based IO accounting

Record the start_time for a bio but defer the starting block core's IO
accounting until after IO is submitted using bio_start_io_acct_time().

This approach avoids the need to mess around with any of the
individual IO stats in response to a bio_split() that follows bio
submission.

Reported-by: Bud Brown <bubrown@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Depends-on: e45c47d1f94e ("block: add bio_start_io_acct_time() to control start_time")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agodm: revert partial fix for redundant bio-based IO accounting
Mike Snitzer [Fri, 28 Jan 2022 15:58:40 +0000 (10:58 -0500)]
dm: revert partial fix for redundant bio-based IO accounting

Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that
need splitting") because it was too narrow in scope (only addressed
redundant 'sectors[]' accounting and not ios, nsecs[], etc).

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: add bio_start_io_acct_time() to control start_time
Mike Snitzer [Fri, 28 Jan 2022 15:58:39 +0000 (10:58 -0500)]
block: add bio_start_io_acct_time() to control start_time

bio_start_io_acct_time() interface is like bio_start_io_acct() that
allows start_time to be passed in. This gives drivers the ability to
defer starting accounting until after IO is issued (but possibily not
entirely due to bio splitting).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 28 Jan 2022 19:17:58 +0000 (21:17 +0200)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Sixteen patches, mostly minor fixes and updates; however there are
  substantive driver bug fixes in pm8001, bnx2fc, zfcp, myrs and qedf"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: myrs: Fix crash in error case
  scsi: 53c700: Remove redundant assignment to pointer SCp
  scsi: ufs: Treat link loss as fatal error
  scsi: ufs: Use generic error code in ufshcd_set_dev_pwr_mode()
  scsi: bfa: Remove useless DMA-32 fallback configuration
  scsi: hisi_sas: Remove useless DMA-32 fallback configuration
  scsi: 3w-sas: Remove useless DMA-32 fallback configuration
  scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
  scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
  scsi: pm8001: Fix bogus FW crash for maxcpus=1
  scsi: qedf: Change context reset messages to ratelimited
  scsi: qedf: Fix refcount issue when LOGO is received during TMF
  scsi: qedf: Add stag_work to all the vports
  scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
  scsi: target: iscsi: Make sure the np under each tpg is unique
  scsi: elx: efct: Don't use GFP_KERNEL under spin lock

3 years agoMerge tag 'efi-urgent-for-v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 28 Jan 2022 19:12:07 +0000 (21:12 +0200)]
Merge tag 'efi-urgent-for-v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - avoid UEFI v2.00+ runtime services on Apple Mac systems, as they have
   been reported to cause crashes, and most Macs claim to be EFI v1.10
   anyway

 - avoid a spurious boot time warning on arm64 systems with 64k pages

* tag 'efi-urgent-for-v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
  efi/libstub: arm64: Fix image check alignment at entry

3 years agosecurity, lsm: dentry_init_security() Handle multi LSM registration
Vivek Goyal [Wed, 26 Jan 2022 20:35:14 +0000 (15:35 -0500)]
security, lsm: dentry_init_security() Handle multi LSM registration

A ceph user has reported that ceph is crashing with kernel NULL pointer
dereference. Following is the backtrace.

/proc/version: Linux version 5.16.2-arch1-1 (linux@archlinux) (gcc (GCC)
11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 20 Jan 2022
16:18:29 +0000
distro / arch: Arch Linux / x86_64
SELinux is not enabled
ceph cluster version: 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)

relevant dmesg output:
[   30.947129] BUG: kernel NULL pointer dereference, address:
0000000000000000
[   30.947206] #PF: supervisor read access in kernel mode
[   30.947258] #PF: error_code(0x0000) - not-present page
[   30.947310] PGD 0 P4D 0
[   30.947342] Oops: 0000 [#1] PREEMPT SMP PTI
[   30.947388] CPU: 5 PID: 778 Comm: touch Not tainted 5.16.2-arch1-1 #1
86fbf2c313cc37a553d65deb81d98e9dcc2a3659
[   30.947486] Hardware name: Gigabyte Technology Co., Ltd. B365M
DS3H/B365M DS3H, BIOS F5 08/13/2019
[   30.947569] RIP: 0010:strlen+0x0/0x20
[   30.947616] Code: b6 07 38 d0 74 16 48 83 c7 01 84 c0 74 05 48 39 f7 75
ec 31 c0 31 d2 89 d6 89 d7 c3 48 89 f8 31 d2 89 d6 89 d7 c3 0
f 1f 40 00 <80> 3f 00 74 12 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31
ff
[   30.947782] RSP: 0018:ffffa4ed80ffbbb8 EFLAGS: 00010246
[   30.947836] RAX: 0000000000000000 RBX: ffffa4ed80ffbc60 RCX:
0000000000000000
[   30.947904] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[   30.947971] RBP: ffff94b0d15c0ae0 R08: 0000000000000000 R09:
0000000000000000
[   30.948040] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[   30.948106] R13: 0000000000000001 R14: ffffa4ed80ffbc60 R15:
0000000000000000
[   30.948174] FS:  00007fc7520f0740(0000) GS:ffff94b7ced40000(0000)
knlGS:0000000000000000
[   30.948252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.948308] CR2: 0000000000000000 CR3: 0000000104a40001 CR4:
00000000003706e0
[   30.948376] Call Trace:
[   30.948404]  <TASK>
[   30.948431]  ceph_security_init_secctx+0x7b/0x240 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[   30.948582]  ceph_atomic_open+0x51e/0x8a0 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[   30.948708]  ? get_cached_acl+0x4d/0xa0
[   30.948759]  path_openat+0x60d/0x1030
[   30.948809]  do_filp_open+0xa5/0x150
[   30.948859]  do_sys_openat2+0xc4/0x190
[   30.948904]  __x64_sys_openat+0x53/0xa0
[   30.948948]  do_syscall_64+0x5c/0x90
[   30.948989]  ? exc_page_fault+0x72/0x180
[   30.949034]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   30.949091] RIP: 0033:0x7fc7521e25bb
[   30.950849] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00
00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 0
0 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14
25

Core of the problem is that ceph checks for return code from
security_dentry_init_security() and if return code is 0, it assumes
everything is fine and continues to call strlen(name), which crashes.

Typically SELinux LSM returns 0 and sets name to "security.selinux" and
it is not a problem. Or if selinux is not compiled in or disabled, it
returns -EOPNOTSUP and ceph deals with it.

But somehow in this configuration, 0 is being returned and "name" is
not being initialized and that's creating the problem.

Our suspicion is that BPF LSM is registering a hook for
dentry_init_security() and returns hook default of 0.

LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,...)

I have not been able to reproduce it just by doing CONFIG_BPF_LSM=y.
Stephen has tested the patch though and confirms it solves the problem
for him.

dentry_init_security() is written in such a way that it expects only one
LSM to register the hook. Atleast that's the expectation with current code.

If another LSM returns a hook and returns default, it will simply return
0 as of now and that will break ceph.

Hence, suggestion is that change semantics of this hook a bit. If there
are no LSMs or no LSM is taking ownership and initializing security context,
then return -EOPNOTSUP. Also allow at max one LSM to initialize security
context. This hook can't deal with multiple LSMs trying to init security
context. This patch implements this new behavior.

Reported-by: Stephen Muth <smuth4@gmail.com>
Tested-by: Stephen Muth <smuth4@gmail.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: <stable@vger.kernel.org> # 5.16.0
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
3 years agoMerge tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 28 Jan 2022 18:44:07 +0000 (20:44 +0200)]
Merge tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These make the buffer handling in pm_show_wakelocks() more robust and
  drop an unused hibernation-related function.

  Specifics:

   - Make the buffer handling in pm_show_wakelocks() more robust by
     using sysfs_emit_at() in it to generate output (Greg
     Kroah-Hartman).

   - Drop register_nosave_region_late() which is not used (Amadeusz
     SÅ‚awiÅ„ski)"

* tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: Remove register_nosave_region_late()
  PM: wakeup: simplify the output logic of pm_show_wakelocks()

3 years agoMerge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 28 Jan 2022 17:30:35 +0000 (19:30 +0200)]
Merge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pulltracing fixes from Steven Rostedt:

 - Limit mcount build time sorting to only those archs that we know it
   works for.

 - Fix memory leak in error path of histogram setup

 - Fix and clean up rel_loc array out of bounds issue

 - tools/rtla documentation fixes

 - Fix issues with histogram logic

* tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't inc err_log entry count if entry allocation fails
  tracing: Propagate is_signed to expression
  tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
  tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
  tools/tracing: Update Makefile to build rtla
  rtla: Make doc build optional
  tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro
  tracing: Avoid -Warray-bounds warning for __rel_loc macro
  tracing/histogram: Fix a potential memory leak for kstrdup()
  ftrace: Have architectures opt-in for mcount build time sorting

3 years agodt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
Geert Uytterhoeven [Fri, 28 Jan 2022 09:03:58 +0000 (10:03 +0100)]
dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples

To improve human readability and enable automatic validation, the tuples
in "interrupts-extended" properties should be grouped using angle
brackets.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/211705e74a2ce77de43d036c5dea032484119bf7.1643360419.git.geert@linux-m68k.org
3 years agodt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
Geert Uytterhoeven [Fri, 28 Jan 2022 09:03:57 +0000 (10:03 +0100)]
dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts

The number of interrupts lacks an upper bound, thus assuming one,
causing properly grouped "interrupts-extended" properties to be flagged
as an error by "make dtbs_check".

Fix this by adding the missing "maxItems", using the architectural
maximum of 15872 interrupts.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/f73a0aead89e1426b146c4c64f797aa035868bf0.1643360419.git.geert@linux-m68k.org
3 years agoMerge branch 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm...
Linus Torvalds [Fri, 28 Jan 2022 17:25:24 +0000 (19:25 +0200)]
Merge branch 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ucount rlimit fix from Eric Biederman.

Make sure the ucounts have a reference to the user namespace it refers
to, so that users that themselves don't carry such a reference around
can safely use the ucount functions.

* 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucount:  Make get_ucount a safe get_user replacement

3 years agodt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
Geert Uytterhoeven [Wed, 26 Jan 2022 12:32:05 +0000 (13:32 +0100)]
dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support

Document support for the Interrupt Controller for External Devices
(INT-EC) in the Renesas R-Car V3U (r8a779a0) SoC.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/85b246cc0792663c72c1bb12a8576bd23d2299d3.1643200256.git.geert+renesas@glider.be
3 years agoMerge tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 28 Jan 2022 17:19:22 +0000 (19:19 +0200)]
Merge tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
 "This fixes a brown-paper-bag bug in RCU tasks that causes things like
  BPF and ftrace to fail miserably on systems with non-power-of-two
  numbers of CPUs.

  It fixes a math error added in 7a30871b6a27 ("rcu-tasks: Introduce
  ->percpu_enqueue_shift for dynamic queue selection') during the v5.17
  merge window. This commit works correctly only on systems with a
  power-of-two number of CPUs, which just so happens to be the kind that
  rcutorture always uses by default.

  This pull request fixes the math so that things also work on systems
  that don't happen to have a power-of-two number of CPUs"

* tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Fix computation of CPU-to-list shift counts

3 years agoMerge tag 'hyperv-fixes-signed-20220128' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 28 Jan 2022 17:06:11 +0000 (19:06 +0200)]
Merge tag 'hyperv-fixes-signed-20220128' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix screen resolution for hyperv framebuffer (Michael Kelley)

 - Fix packet header accounting for balloon driver (Yanming Liu)

* tag 'hyperv-fixes-signed-20220128' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  video: hyperv_fb: Fix validation of screen resolution
  Drivers: hv: balloon: account for vmbus packet header in max_pkt_size

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 28 Jan 2022 17:00:26 +0000 (19:00 +0200)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Two larger x86 series:

   - Redo incorrect fix for SEV/SMAP erratum

   - Windows 11 Hyper-V workaround

  Other x86 changes:

   - Various x86 cleanups

   - Re-enable access_tracking_perf_test

   - Fix for #GP handling on SVM

   - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID

   - Fix for ICEBP in interrupt shadow

   - Avoid false-positive RCU splat

   - Enable Enlightened MSR-Bitmap support for real

  ARM:

   - Correctly update the shadow register on exception injection when
     running in nVHE mode

   - Correctly use the mm_ops indirection when performing cache
     invalidation from the page-table walker

   - Restrict the vgic-v3 workaround for SEIS to the two known broken
     implementations

  Generic code changes:

   - Dead code cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: eventfd: Fix false positive RCU usage warning
  KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  KVM: nVMX: Rename vmcs_to_field_offset{,_table}
  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
  KVM: x86: add system attribute to retrieve full set of supported xsave states
  KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
  selftests: kvm: move vm_xsave_req_perm call to amx_test
  KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
  KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
  KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
  KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
  KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
  KVM: x86: Check .flags in kvm_cpuid_check_equal() too
  KVM: x86: Forcibly leave nested virt when SMM state is toggled
  KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()
  KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real
  ...

3 years agoMerge tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 28 Jan 2022 16:53:45 +0000 (18:53 +0200)]
Merge tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS build fix from Thomas Bogendoerfer:
 "Fix for allmodconfig build"

* tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix build error due to PTR used in more places

3 years agoMerge tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 28 Jan 2022 16:50:05 +0000 (18:50 +0200)]
Merge tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix loading of modules with lots of relocations and add a regression
   test for it.

 - Fix machine check handling for vector validity and guarded storage
   validity failures in KVM guests.

 - Fix hypervisor performance data to include z/VM guests with access
   control group set.

 - Fix z900 build problem in uaccess code.

 - Update defconfigs.

* tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/hypfs: include z/VM guests with access control group set
  s390: update defconfigs
  s390/module: test loading modules with a lot of relocations
  s390/module: fix loading modules with a lot of relocations
  s390/uaccess: fix compile error
  s390/nmi: handle vector validity failures for KVM guests
  s390/nmi: handle guarded storage validity failures for KVM guests

3 years agoMerge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 28 Jan 2022 16:36:42 +0000 (18:36 +0200)]
Merge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
  creates interacting with pool namespace-constrained OSD permissions
  from Jeff (marked for stable)"

* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
  ceph: set pool_ns in new inode layout for async creates
  ceph: properly put ceph_string reference after async create attempt
  ceph: put the requests/sessions when it fails to alloc memory

3 years agoarm64: cpufeature: List early Cortex-A510 parts as having broken dbm
James Morse [Tue, 25 Jan 2022 15:40:40 +0000 (15:40 +0000)]
arm64: cpufeature: List early Cortex-A510 parts as having broken dbm

Versions of Cortex-A510 before r0p3 are affected by a hardware erratum
where the hardware update of the dirty bit is not correctly ordered.

Add these cpus to the cpu_has_broken_dbm list.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20220125154040.549272-3-james.morse@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoocfs2: fix subdirectory registration with register_sysctl()
Linus Torvalds [Fri, 28 Jan 2022 08:00:29 +0000 (10:00 +0200)]
ocfs2: fix subdirectory registration with register_sysctl()

The kernel test robot reports that commit c42ff46f97c1 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like

  sysctl table check failed: fs/ocfs2/nm Not a file
  sysctl table check failed: fs/ocfs2/nm No proc_handler
  sysctl table check failed: fs/ocfs2/nm bogus .mode 0555

and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report.  Possibly that original report
only ever made it to the lkp list.

The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.

So take that last step.

Fixes: c42ff46f97c1 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git...
Catalin Marinas [Fri, 28 Jan 2022 16:14:06 +0000 (16:14 +0000)]
Merge tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into for-next/fixes

coresight: trbe: Workaround Cortex-A510 erratas

This pull request is providing arm64 definitions to support
TRBE Cortex-A510 erratas.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
* tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux:
  arm64: errata: Add detection for TRBE trace data corruption
  arm64: errata: Add detection for TRBE invalid prohibited states
  arm64: errata: Add detection for TRBE ignored system register writes
  arm64: Add Cortex-A510 CPU part definition

3 years agoMerge tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 28 Jan 2022 15:51:31 +0000 (17:51 +0200)]
Merge tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify fixes from Jan Kara:
 "Fixes for userspace breakage caused by fsnotify changes ~3 years ago
  and one fanotify cleanup"

* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: fix fsnotify hooks in pseudo filesystems
  fsnotify: invalidate dcache before IN_DELETE event
  fanotify: remove variable set but not used

3 years agoMerge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack...
Linus Torvalds [Fri, 28 Jan 2022 15:19:49 +0000 (17:19 +0200)]
Merge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf and quota fixes from Jan Kara:
 "Fixes for crashes in UDF when inode expansion fails and one quota
  cleanup"

* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: cleanup double word in comment
  udf: Restore i_lenAlloc when inode expansion fails
  udf: Fix NULL ptr deref when converting from inline format

3 years agoMerge tag 'kvmarm-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Fri, 28 Jan 2022 12:45:15 +0000 (07:45 -0500)]
Merge tag 'kvmarm-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #1

- Correctly update the shadow register on exception injection when
  running in nVHE mode

- Correctly use the mm_ops indirection when performing cache invalidation
  from the page-table walker

- Restrict the vgic-v3 workaround for SEIS to the two known broken
  implementations

3 years agoKVM: eventfd: Fix false positive RCU usage warning
Hou Wenlong [Thu, 27 Jan 2022 06:54:49 +0000 (14:54 +0800)]
KVM: eventfd: Fix false positive RCU usage warning

Fix the following false positive warning:
 =============================
 WARNING: suspicious RCU usage
 5.16.0-rc4+ #57 Not tainted
 -----------------------------
 arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

 stack backtrace:
 CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:34 +0000 (18:01 +0100)]
KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use

Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
Enlightened VMCS interface is in use:

"Any VMREAD or VMWRITE instructions while an enlightened VMCS is
active is unsupported and can result in unexpected behavior.""

Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
0x4404 ("VM-exit interruption information") are observed. Failing
these attempts with nested_vmx_failInvalid() makes such guests
unbootable.

Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
eventually but for the time being we need a workaround. (Temporary) allow
VMREAD to get data from the currently loaded Enlightened VMCS.

Note: VMWRITE instructions remain forbidden, it is not clear how to
handle them properly and hopefully won't ever be needed.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:33 +0000 (18:01 +0100)]
KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()

In preparation to allowing reads from Enlightened VMCS from
handle_vmread(), implement evmcs_field_offset() to get the correct
read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V,
is almost what's needed but a few things need to be adjusted. First,
WARN_ON() is unacceptable for handle_vmread() as any field can (in
theory) be supplied by the guest and not all fields are defined in
eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields).
It also sounds like a good idea to WARN_ON() if such fields are ever
accessed by KVM-on-Hyper-V.

Implement dedicated evmcs_field_offset() helper.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Rename vmcs_to_field_offset{,_table}
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:32 +0000 (18:01 +0100)]
KVM: nVMX: Rename vmcs_to_field_offset{,_table}

vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
blob which is not supposed to be accessed directly. In fact,
vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.

Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity.

No functional change intended.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:31 +0000 (18:01 +0100)]
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER

Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.

Note, none of the currently existing Windows/Hyper-V versions are known
to enable 'save VMX-preemption timer value' when eVMCS is in use, the
change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:30 +0000 (18:01 +0100)]
KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS

Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
on 'true' MSR data.

Note, none of the currently existing Windows/Hyper-V versions are known
to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoselftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
Paolo Bonzini [Wed, 26 Jan 2022 12:51:00 +0000 (07:51 -0500)]
selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP

Provide coverage for the new API.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: add system attribute to retrieve full set of supported xsave states
Paolo Bonzini [Wed, 26 Jan 2022 12:49:45 +0000 (07:49 -0500)]
KVM: x86: add system attribute to retrieve full set of supported xsave states

Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled.  Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use.  In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.

Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
Sean Christopherson [Thu, 27 Jan 2022 15:31:53 +0000 (07:31 -0800)]
KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr

Add a helper to handle converting the u64 userspace address embedded in
struct kvm_device_attr into a userspace pointer, it's all too easy to
forget the intermediate "unsigned long" cast as well as the truncation
check.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agokselftest/arm64: Correct logging of FPSIMD register read via ptrace
Mark Brown [Mon, 24 Jan 2022 17:55:27 +0000 (17:55 +0000)]
kselftest/arm64: Correct logging of FPSIMD register read via ptrace

There's a cut'n'paste error in the logging for our test for reading register
state back via ptrace, correctly say that we did a read instead of a write.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220124175527.3260234-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokselftest/arm64: Skip VL_INHERIT tests for unsupported vector types
Mark Brown [Mon, 24 Jan 2022 17:55:26 +0000 (17:55 +0000)]
kselftest/arm64: Skip VL_INHERIT tests for unsupported vector types

Currently we unconditionally test the ability to set the vector length
inheritance flag via ptrace meaning that we generate false failures on
systems that don't support SVE when we attempt to set the vector length
there. Check the hwcap and mark the tests as skipped when it's not present.

Fixes: 0ba1ce1e8605 ("selftests: arm64: Add coverage of ptrace flags for SVE VL inheritance")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220124175527.3260234-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoMerge tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 28 Jan 2022 09:47:05 +0000 (11:47 +0200)]
Merge tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ATA fix from Damien Le Moal:
 "A single fix for 5.17-rc2, adding a missing resource allocation error
  check in the pata_platform driver, from Zhou"

* tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_platform: Fix a NULL pointer dereference in __pata_platform_probe()

3 years agoMerge tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 28 Jan 2022 07:48:20 +0000 (09:48 +0200)]
Merge tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fix crash in nct6775 driver

 - Prevent divide by zero in adt7470 driver

 - Fix conditional compile warning in pmbus/ir38064 driver

 - Various minor fixes in lm90 driver

* tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix crash in clear_caseopen
  hwmon: (adt7470) Prevent divide by zero in adt7470_fan_write()
  hwmon: (pmbus/ir38064) Mark ir38064_of_match as __maybe_unused
  hwmon: (lm90) Fix sysfs and udev notifications
  hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
  hwmon: (lm90) Mark alert as broken for MAX6680
  hwmon: (lm90) Mark alert as broken for MAX6654
  hwmon: (lm90) Re-enable interrupts after alert clears
  hwmon: (lm90) Reduce maximum conversion rate for G781

3 years agoMerge tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 28 Jan 2022 07:43:00 +0000 (09:43 +0200)]
Merge tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This week's regular normal fixes. amdgpu and msm make up the bulk of
  it, with a scattering of fixes elsewhere.

  atomic:
   - fix CRTC handling during modeset

  privcy-screen:
   - honor acpi=off

  ttm:
   - build fix for um

  panel:
   - add orientation quirk for 1NetBook OneXPlayer

  amdgpu:
   - Proper fix for otg synchronization logic regression
   - DCN3.01 fixes
   - Filter out secondary radeon PCI IDs
   - udelay fixes
   - Fix a memory leak in an error path

  msm:
   - parameter check fixes
   - put_device balancing
   - idle/suspend fixes

  etnaviv:
   - relax submit size checks

  vc4:
   - fix potential deadlock in DSI code

  ast:
   - revert 1600x900 mode change"

* tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm: (25 commits)
  drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen
  Revert "drm/ast: Support 1600x900 with 108MHz PCLK"
  drm/amdgpu/display: Remove t_srx_delay_us.
  drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.
  drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.
  drm/amd/display/dc/calcs/dce_calcs: Fix a memleak in calculate_bandwidth()
  drm/amdgpu/display: use msleep rather than udelay for long delays
  drm/amdgpu/display: adjust msleep limit in dp_wait_for_training_aux_rd_interval
  drm/amdgpu: filter out radeon secondary ids as well
  drm/amd/display: change FIFO reset condition to embedded display only
  drm/amd/display: Correct MPC split policy for DCN301
  drm/amd/display: Fix for otg synchronization logic
  drm/etnaviv: relax submit size limits
  drm/msm/gpu: Cancel idle/boost work on suspend
  drm/msm/gpu: Wait for idle before suspending
  drm/atomic: Add the crtc to affected crtc only if uapi.enable = true
  drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
  drm/msm/a6xx: Add missing suspend_count increment
  drm/msm: Fix wrong size calculation
  drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
  ...

3 years agoMerge tag 'amd-drm-fixes-5.17-2022-01-26' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 28 Jan 2022 04:59:44 +0000 (14:59 +1000)]
Merge tag 'amd-drm-fixes-5.17-2022-01-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.17-2022-01-26:

amdgpu:
- Proper fix for otg synchronization logic regression
- DCN3.01 fixes
- Filter out secondary radeon PCI IDs
- udelay fixes
- Fix a memory leak in an error path

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127041006.5695-1-alexander.deucher@amd.com