- Use dev_err_probe() in dw-rockchip probe error path so the failures
aren't silent (Uwe Kleine-König)
- Sleep PCIE_T_PVPERL_MS (100ms) before deasserting PERST# (Damien Le Moal)
- Sleep PCIE_T_RRS_READY_MS (100ms) after conventional reset, before a
config access (Damien Le Moal)
- Request the PERST# GPIO with GPIOD_OUT_LOW so it matches the POR value,
which avoids a spurious PERST# assertion and fixes a Qcom modem firmware
crash and issues with WLAN controllers, e.g., RTL8822CE (Manivannan
Sadhasivam for rockchip, Niklas Cassel for dw-rockchip)
- Refactor dw-rockchip and add support for Endpoint mode for rk3568 and
rk3588 (Niklas Cassel)
* pci/controller/rockchip:
PCI: dw-rockchip: Use pci_epc_init_notify() directly
PCI: dw-rockchip: Add endpoint mode support
PCI: dw-rockchip: Refactor the driver to prepare for EP mode
PCI: dw-rockchip: Add rockchip_pcie_get_ltssm() helper
PCI: dw-rockchip: Fix weird indentation
PCI: dw-rockchip: Fix initial PERST# GPIO value
PCI: dw-rockchip: Add error messages in .probe() error paths
PCI: rockchip: Use GPIOD_OUT_LOW flag while requesting ep_gpio
PCI: rockchip-host: Wait 100ms after reset before starting configuration
PCI: rockchip-host: Fix rockchip_pcie_host_init_port() PERST# handling
- Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to avoid
excessive warnings when the driver is confused about link state when
resuming (Marek Vasut)
* pci/controller/rcar:
PCI: rcar: Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup()
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay Sarkar)
- Refactor dw_pcie_edma_find_chip() to enable adding support for Hyper DMA
(HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't auto
detect this (Manivannan Sadhasivam)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory corruption
(Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if the
link is already disabled (Manivannan Sadhasivam)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays active
even if other drivers don't vote for it (Krishna chaitanya chundru)
- Add Operating Performance Points (OPP) to scale performance state based
on aggregate link bandwidth to improve SoC power efficiency (Krishna
chaitanya chundru)
- Return failure instead of success if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Avoid an error pointer dereference if dev_pm_opp_find_freq_exact() fails
(Dan Carpenter)
- Prevent use of uninitialized data in qcom_pcie_suspend_noirq() (Dan
Carpenter)
* pci/controller/qcom:
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: qcom: Add OPP support to scale performance
PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()
PCI: qcom: Add ICC bandwidth vote for CPU to PCIe path
PCI: qcom-ep: Disable resources unconditionally during PERST# assert
PCI: qcom-ep: Override NO_SNOOP attribute for SA8775P EP
PCI: qcom: Override NO_SNOOP attribute for SA8775P RC
PCI: epf-mhi: Enable HDMA for SA8775P SoC
PCI: qcom-ep: Add HDMA support for SA8775P SoC
PCI: dwc: Pass the eDMA mapping format flag directly from glue drivers
PCI: dwc: Skip finding eDMA channels count for HDMA platforms
PCI: dwc: Refactor dw_pcie_edma_find_chip() API
PCI: qcom-ep: Add support for SA8775P SOC
dt-bindings: PCI: qcom-ep: Add support for SA8775P SoC
PCI: qcom: Use devm_clk_bulk_get_all() API
- Move PLDA XpressRICH generic DT binding properties to
plda,xpressrich3-axi-common.yaml where they can be shared across
PLDA-based drivers (Minda Chen)
- Create a drivers/pci/controller/plda/ directory for PLDA-based drivers
and move pcie-microchip-host.c there (Minda Chen)
- Move PLDA generic macros to pcie-plda.h where they can be shared across
drivers (Minda Chen)
- Extract PLDA generic structures from pcie-microchip-host.c, rename them
to be generic, and move them to pcie-plda-host.c where they can be shared
across drivers (Minda Chen)
- Add a .request_event_irq() callback for requesting device-specific
interrupts in addition to PLDA-generic interrupts (Minda Chen)
- Add DT binding and driver for the StarFive JH7110 SoC, based on PLDA IP
(Minda Chen)
* pci/controller/microchip:
PCI: starfive: Add JH7110 PCIe controller
dt-bindings: PCI: Add StarFive JH7110 PCIe controller
PCI: Add PCIE_RESET_CONFIG_DEVICE_WAIT_MS waiting time value
PCI: plda: Pass pci_host_bridge to plda_pcie_setup_iomems()
PCI: plda: Add host init/deinit and map bus functions
PCI: plda: Add event bitmap field to struct plda_pcie_rp
PCI: microchip: Move IRQ functions to pcie-plda-host.c
PCI: microchip: Add event irqchip field to host port and add PLDA irqchip
PCI: microchip: Add get_events() callback and PLDA get_event()
PCI: microchip: Add INTx and MSI event num to struct plda_event
PCI: microchip: Add request_event_irq() callback function
PCI: microchip: Add num_events field to struct plda_pcie_rp
PCI: microchip: Rename interrupt related functions
PCI: microchip: Move PLDA functions to pcie-plda-host.c
PCI: microchip: Rename PLDA functions to be generic
PCI: microchip: Move PLDA structures to plda-pcie.h
PCI: microchip: Rename PLDA structures to be generic
PCI: microchip: Add bridge_addr field to struct mc_pcie
PCI: microchip: Move PLDA IP register macros to pcie-plda.h
PCI: microchip: Move pcie-microchip-host.c to PLDA directory
dt-bindings: PCI: Add PLDA XpressRICH PCIe host common properties
- Enable BAR 0 only for v3.65a to avoid Completion Timeouts that
cause a 45 second boot delay on the v4.90a-based AM654x SoC (Siddharth
Vadapalli)
- Avoid a NULL pointer dereference if DT failed to provide a host bridge
memory window (Aleksandr Mishin)
* pci/controller/keystone:
PCI: keystone: Add workaround for Errata #i2037 (AM65x SR 1.0)
PCI: keystone: Fix NULL pointer dereference in case of DT error in ks_pcie_setup_rc_app_regs()
PCI: keystone: Don't enable BAR 0 for AM654x
PCI: keystone: Relocate ks_pcie_set/clear_dbi_mode()
- Use msleep() in DWC core instead of usleep_range() for ~100 ms sleep
(Konrad Dybcio)
- Fix iATU slot management to avoid using the wrong slot after PERST#
assert/deassert, which could potentially cause DMA to go to the wrong
place (Frank Li)
- Consolidate dw_pcie_prog_outbound_atu() arguments into a struct to ease
adding new functionality like initiating Message TLPs (Yoshihiro Shimoda)
- Add support for endpoints to initiate PCIe messages (Yoshihiro Shimoda)
- Add #defines for PCIe INTx messages (Yoshihiro Shimoda)
- Add support for endpoints to initiate PCIe PME_Turn_Off messages for
system suspend (Frank Li)
- Add dw_pcie_ep_linkdown() to reinitialize registers that are lost when
the link goes down (Manivannan Sadhasivam)
- Use dw_pcie_ep_linkdown() to reinitialize qcom non-sticky registers that
are lost when the link goes down (Manivannan Sadhasivam)
- Enforce DWC limitation that 64-bit BARs must start with an even-numbered
BAR (Niklas Cassel)
* pci/controller/dwc:
PCI: dwc: ep: Enforce DWC specific 64-bit BAR limitation
PCI: layerscape-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event
PCI: qcom-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event
PCI: dwc: ep: Remove dw_pcie_ep_init_notify() wrapper
PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down event
PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend
PCI: Add PCIE_MSG_CODE_PME_TURN_OFF message macro
PCI: Add PCIE_MSG_CODE_ASSERT_INTx message macros
PCI: dwc: Add outbound MSG TLPs support
PCI: dwc: Consolidate args of dw_pcie_prog_outbound_atu() into a structure
PCI: dwc: Fix index 0 incorrectly being interpreted as a free ATU slot
PCI: dwc: Use msleep() in dw_pcie_wait_for_link()
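
The 64-bit BAR restriction listed above can be sketched as a small check. A 64-bit BAR occupies two consecutive BAR slots, so with six slots it can only start at 0, 2, or 4. The function and macro names here are hypothetical, and this is a userspace model rather than the DWC driver code:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_BARS 6	/* standard endpoint header: BAR 0..5 */

/* Returns 0 if the BAR request is acceptable, -1 otherwise. */
int toy_check_bar(int bar, bool is_64bit)
{
	assert(bar >= 0 && bar < TOY_NUM_BARS);

	/*
	 * A 64-bit BAR uses slots 'bar' and 'bar + 1', so under the
	 * restriction described above it must start on an even slot.
	 */
	if (is_64bit && (bar & 1))
		return -1;

	return 0;
}
```

With this model, a 64-bit BAR at index 1 is rejected while a 32-bit BAR at the same index is accepted.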
- Use cached epc_features instead of pci_epc_get_features() to avoid having
to check for failure (potential NULL pointer dereference) (Manivannan
Sadhasivam)
- Drop pointless local msix_capable variable in pci_epf_test_alloc_space()
(Manivannan Sadhasivam)
- Rename struct pci_epc_event_ops.core_init to .epc_init, since "core" is
no longer meaningful here (Manivannan Sadhasivam)
- Rename pci_epc_bme_notify() and pci_epf_mhi_bme() to spell out
"bus_master_enable" instead of "bme" (Manivannan Sadhasivam)
- Factor pci_epf_test_clear_bar() and pci_epf_test_free_space() out of
pci_epf_test_unbind() so they can be reused elsewhere (Manivannan
Sadhasivam)
- Move DMA initialization to the pci_epf_mhi_epc_init() callback so
endpoint drivers do this uniformly (Manivannan Sadhasivam)
- Add endpoint testing for Link Down events (Manivannan Sadhasivam)
- Add 'epc_deinit' event so endpoints that can be reset via PERST# (qcom,
tegra194) can notify EPF drivers when this happens (Manivannan
Sadhasivam)
- Make pci_epc_class constant (Greg Kroah-Hartman)
- Fix vpci_scan_bus() error checking to print error for failure (not
success) and clean up after failure (Dan Carpenter)
- Fix epf_ntb_epc_cleanup() error handling to clean up scratchpad BARs and
clean up in mirror order of allocation (Dan Carpenter)
- Add rk3588, which requires 64KB BAR alignment, to pci_endpoint_test
(Niklas Cassel)
- Use memcpy_toio()/memcpy_fromio() for endpoint BAR tests to improve
performance (Niklas Cassel)
- Set DMA mask to 48 bits always to simplify endpoint test, since
there's no need to check for error or to fall back to 32 bits (Frank Li)
- Suggest using programmable Vendor/Device ID (when supported) to use
pci_endpoint_test without having to add new entries (Yoshihiro Shimoda)
- Remove 'linkup' and add 'add_cfs' to the endpoint function driver 'ops'
documentation to match the code (Alexander Stein)
* pci/endpoint:
Documentation: PCI: pci-endpoint: Fix EPF ops list
misc: pci_endpoint_test: Remove unused pci_endpoint_test_bar_{readl,writel} functions
misc: pci_endpoint_test: Document policy about adding pci_device_id
misc: pci_endpoint_test: Refactor dma_set_mask_and_coherent() logic
misc: pci_endpoint_test: Use memcpy_toio()/memcpy_fromio() for BAR tests
misc: pci_endpoint_test: Add support for Rockchip rk3588
PCI: endpoint: Fix error handling in epf_ntb_epc_cleanup()
PCI: endpoint: Clean up error handling in vpci_scan_bus()
PCI: endpoint: Make pci_epc_class struct constant
PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers
PCI: endpoint: pci-epf-test: Handle Link Down event
PCI: endpoint: pci-epf-{mhi/test}: Move DMA initialization to EPC init callback
PCI: endpoint: pci-epf-test: Refactor pci_epf_test_unbind() function
PCI: endpoint: Rename BME to Bus Master Enable
PCI: endpoint: Rename core_init() callback in 'struct pci_epc_event_ops' to epc_init()
PCI: endpoint: pci-epf-test: Use 'msix_capable' flag directly in pci_epf_test_alloc_space()
PCI: endpoint: pci-epf-test: Make use of cached 'epc_features' in pci_epf_test_core_init()
PCI: endpoint: Remove unused field in struct pci_epf_group
- Add "apb", "sys", "pmc", "msg", "err" for Endpoint descriptions as well
as for Root Complexes (Niklas Cassel)
- Add "tx_inta", "tx_intb", "tx_intc", "tx_intd" for interrupt signals
triggered in response to PCIe Assert_INTx messages (Niklas Cassel)
- Refactor rockchip-dw-pcie binding to move generic properties to a new
rockchip-dw-pcie-common binding that can be shared by both RC and EP mode
(Niklas Cassel)
- Fix rockchip-dw-pcie description of INTx signals (Niklas Cassel)
- Add rockchip-dw-pcie description of Endpoint controller (Niklas Cassel)
- Avoid xilinx-versal-cpm overlapping of bridge registers and 32-bit BAR
addresses (Thippeswamy Havalige)
- Rename find_resource() to find_resource_space() to make it more
descriptive for exporting outside resource.c (Ilpo Järvinen)
- Document find_resource_space() and the resource_constraint struct it uses
(Ilpo Järvinen)
- Add typedef resource_alignf to make it simpler to declare allocation
constraint alignf callbacks (Ilpo Järvinen)
- Open-code the no-constraint simple alignment case to make the
simple_align_resource() default callback unnecessary (Ilpo Järvinen)
- Export find_resource_space() because PCI bridge window allocation needs
to learn whether there's space for a window (Ilpo Järvinen)
- Fix a double-counting problem in PCI calculate_memsize() that led to
allocating larger windows each time a bus was removed and rescanned (Ilpo
Järvinen)
- When we don't have space to allocate larger bridge windows, allocate
windows only large enough for the downstream devices to prevent cases
where a device worked originally, but not after being removed and
re-added (Ilpo Järvinen)
* pci/resource:
PCI: Relax bridge window tail sizing rules
PCI: Make minimum bridge window alignment reference more obvious
PCI: Fix resource double counting on remove & rescan
resource: Export find_resource_space()
resource: Handle simple alignment inside __find_resource_space()
resource: Use typedef for alignf callback
resource: Document find_resource_space() and resource_constraint
resource: Rename find_resource() to find_resource_space()
- Detect if a device was removed or replaced during system sleep so we
don't assume a new device is the one that used to be there. This uses
Vendor/Device/Subsystem/Class/Revision and Device Serial Number (if
implemented), so it's not fool-proof and drivers may know how to detect
more cases (Lukas Wunner)
- Disable AER and DPC during suspend so that if they share an interrupt
with PME and errors occur during suspend, the AER or DPC interrupt
doesn't cause spurious wakeups (Kai-Heng Feng)
* pci/err:
PCI/DPC: Disable DPC service on suspend
PCI/AER: Disable AER service on suspend
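
The replaced-device detection noted above (comparing Vendor/Device/Subsystem/Class/Revision and, if implemented, the Device Serial Number) might look roughly like the following. This is a userspace sketch with hypothetical struct and function names, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Identity fields cached before sleep and re-read after resume. */
struct toy_pci_id {
	uint16_t vendor, device;
	uint16_t subsys_vendor, subsys_device;
	uint32_t class_rev;	/* class code and revision */
	bool has_dsn;		/* Device Serial Number implemented? */
	uint64_t dsn;
};

/* True if the device found after resume is provably a different one. */
bool toy_dev_replaced(const struct toy_pci_id *before,
		      const struct toy_pci_id *after)
{
	assert(before && after);

	if (before->vendor != after->vendor ||
	    before->device != after->device ||
	    before->subsys_vendor != after->subsys_vendor ||
	    before->subsys_device != after->subsys_device ||
	    before->class_rev != after->class_rev)
		return true;

	if (before->has_dsn != after->has_dsn)
		return true;

	/* Without a serial number, identical models cannot be told apart. */
	return before->has_dsn && before->dsn != after->dsn;
}
```

The last line shows why the check is not fool-proof: two units of the same model without a serial number compare as equal.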
- Move the PRESERVE_BOOT_CONFIG ACPI _DSM evaluation from drivers/acpi to
drivers/pci so we can unify with similar DT functionality (Vidya Sagar)
- Add of_pci_preserve_config() to check for a DT "linux,pci-probe-only"
property on a per-host bridge basis in addition to a global basis (Vidya
Sagar)
- Unify ACPI PRESERVE_BOOT_CONFIG _DSM and DT "linux,pci-probe-only" in a
generic pci_preserve_config() path (Vidya Sagar)
* pci/enumeration:
PCI: Use preserve_config in place of pci_flags
PCI: Unify ACPI and DT 'preserve config' support
PCI: of: Add of_pci_preserve_config() for per-host bridge support
PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
- If there's a device below a bridge, prevent a use-after-free by holding a
reference to the device while waiting for the secondary bus to be ready
in case the device is concurrently removed, e.g., by DPC (Lukas Wunner)
* pci/dpc:
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
- Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions to simplify
devres iomap table (Philipp Stanner)
- Reimplement devres that take a bit mask of BARs in a way that can be used
to map partial BARs as well as entire BARs (Philipp Stanner)
- Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in
favor of pcim_* request plus pcim_* mapping (Philipp Stanner)
- Add pcim_request_region(), a managed interface to request a single BAR
(Philipp Stanner)
- Use the existing pci_is_enabled() interface to replace the struct
devres.enabled bit (Philipp Stanner)
- Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner)
- Reimplement pcim_set_mwi() so it uses its own devres cleanup callback
instead of a special-purpose bit in struct pci_devres (Philipp Stanner)
- Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which
is managed if pcim_enable_device() has been called but unmanaged
otherwise (Philipp Stanner)
- Remove pcim_release(), which is no longer needed after previous cleanups
of pcim_set_mwi() and pci_intx() (Philipp Stanner)
- Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp
Stanner)
- Fix vboxvideo leak by using the new pcim_iomap_range() instead of the
unmanaged pci_iomap_range() (Philipp Stanner)
* pci/devres:
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
- Add ACS quirk for Broadcom BCM5760X NIC, which doesn't allow peer-to-peer
transactions between functions, but doesn't advertise ACS support (Ajit
Khaparde)
- Add "pci=config_acs=" kernel command-line parameter to relax default ACS
settings to enable peer-to-peer configurations. Requires expert
knowledge of topology and ACS operation (Vidya Sagar)
* pci/acs:
PCI: Extend ACS configurability
PCI: Add ACS quirk for Broadcom BCM5760X NIC
Huacai Chen [Wed, 12 Jun 2024 06:53:15 +0000 (14:53 +0800)]
PCI: loongson: Enable MSI in LS7A Root Complex
The LS7A chipset can be used as part of a PCIe Root Complex with
Loongson-3C6000 and similar CPUs. In this case, DEV_LS7A_PCIE_PORT5 has a
PCI_CLASS_BRIDGE_HOST class code, and it is a Type 0 Function whose config
space provides access to Root Complex registers.
The DEV_LS7A_PCIE_PORT5 has an MSI Capability, and its MSI Enable bit must
be set before other devices below the Root Complex can use MSI. This is
not the standard PCI behavior of MSI Enable, so the normal PCI MSI code
does not set it.
Set the DEV_LS7A_PCIE_PORT5 MSI Enable bit via a quirk so other devices
below the Root Complex can use MSI.
Vidya Sagar [Tue, 25 Jun 2024 15:31:50 +0000 (21:01 +0530)]
PCI: Extend ACS configurability
PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.
There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to set up the right
level of isolation to achieve the desired P2P configuration. The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.
ACS needs to be set up early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices. The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.
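
As a rough illustration of per-bit control, the sketch below parses a flag string in which each character is '1' (force on), '0' (force off), or 'x' (leave unchanged), with the rightmost character as bit 0, and applies it to a control value. That encoding is an assumption based on the kernel-parameters documentation, not something stated in this log, and the function names are hypothetical. It is a userspace model, not the kernel's parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse a string such as "10x" into a mask of bits to change and the
 * values to give them. Rightmost character = bit 0 (assumed encoding).
 */
bool acs_parse_flags(const char *s, uint16_t *mask, uint16_t *value)
{
	uint16_t m = 0, v = 0;
	int n = 0;

	assert(s && mask && value);

	for (; *s; s++) {
		if (++n > 16)
			return false;	/* more characters than bits */
		m <<= 1;
		v <<= 1;
		switch (*s) {
		case '1':
			m |= 1;
			v |= 1;
			break;
		case '0':
			m |= 1;
			break;
		case 'x':
			break;		/* leave this bit unchanged */
		default:
			return false;	/* unknown character */
		}
	}
	if (n == 0)
		return false;

	*mask = m;
	*value = v;
	return true;
}

/* Apply the parsed request to a current ACS control value. */
uint16_t acs_apply(uint16_t ctrl, uint16_t mask, uint16_t value)
{
	return (uint16_t)((ctrl & ~mask) | value);
}
```

In this model "10x" forces bit 2 on, forces bit 1 off, and leaves bit 0 at whatever firmware or power-up set it to.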
Where pci_reset_bus() users are triggering unlocked secondary bus resets.
Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
pci_bus_lock() before issuing the reset which locks everything *but* the
bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
pci_dev_lock() for @bus->self to pci_bus_lock().
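
A toy model of the resulting lock coverage, assuming simplified structures and a boolean in place of the real device lock (all names here are hypothetical, and the real pci_bus_lock() is not reproduced):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bus;

struct toy_dev {
	bool locked;
	struct toy_bus *subordinate;	/* non-NULL if this device is a bridge */
	struct toy_dev *next;		/* next device on the same bus */
};

struct toy_bus {
	struct toy_dev *self;		/* bridge leading to this bus, or NULL */
	struct toy_dev *devices;	/* devices on this bus */
};

/* Lock the bridge itself, then everything below it. */
void toy_bus_lock(struct toy_bus *bus)
{
	struct toy_dev *dev;

	assert(bus);

	if (bus->self)
		bus->self->locked = true;

	for (dev = bus->devices; dev; dev = dev->next) {
		if (dev->subordinate)
			/* Recursion locks 'dev' as the child bus's ->self. */
			toy_bus_lock(dev->subordinate);
		else
			dev->locked = true;
	}
}
```

Bridges below the bus are locked through the recursion rather than directly, so each device is locked exactly once.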
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Philipp Stanner [Thu, 13 Jun 2024 11:50:26 +0000 (13:50 +0200)]
drm/vboxvideo: fix mapping leaks
When the PCI devres API was introduced to this driver, it was wrongly
assumed that initializing the device with pcim_enable_device() instead of
pci_enable_device() will make all PCI functions managed.
This is wrong and was caused by the quite confusing PCI devres API in which
some, but not all, functions become managed that way.
The function pci_iomap_range() is never managed.
Replace pci_iomap_range() with the managed function pcim_iomap_range().
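
The difference between a managed and an unmanaged mapping can be sketched in userspace with a toy devres-style cleanup list. All names below are hypothetical stand-ins; this models the idea only and does not use the kernel's devres API:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_release_fn)(void *data);

struct toy_device {
	struct {
		toy_release_fn fn;
		void *data;
	} actions[TOY_MAX_ACTIONS];
	size_t nr_actions;
};

/* Register a cleanup that runs automatically when the device detaches. */
int toy_devm_add_action(struct toy_device *dev, toy_release_fn fn, void *data)
{
	assert(dev && fn);

	if (dev->nr_actions == TOY_MAX_ACTIONS)
		return -1;
	dev->actions[dev->nr_actions].fn = fn;
	dev->actions[dev->nr_actions].data = data;
	dev->nr_actions++;
	return 0;
}

/* On detach, run the registered cleanups in reverse order. */
void toy_device_detach(struct toy_device *dev)
{
	while (dev->nr_actions) {
		dev->nr_actions--;
		dev->actions[dev->nr_actions].fn(dev->actions[dev->nr_actions].data);
	}
}

struct toy_mapping {
	int mapped;
};

void toy_unmap(void *data)
{
	((struct toy_mapping *)data)->mapped = 0;
}

/* Unmanaged: the caller must unmap explicitly, or the mapping leaks. */
void toy_iomap(struct toy_mapping *m)
{
	m->mapped = 1;
}

/* Managed: the unmap is tied to the device's lifetime. */
int toy_managed_iomap(struct toy_device *dev, struct toy_mapping *m)
{
	m->mapped = 1;
	if (toy_devm_add_action(dev, toy_unmap, m)) {
		m->mapped = 0;
		return -1;
	}
	return 0;
}
```

After toy_device_detach(), a mapping created with toy_managed_iomap() is released, while one created with toy_iomap() is still mapped. That is the kind of leak this commit fixes.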
Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
Link: https://lore.kernel.org/r/20240613115032.29098-14-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Philipp Stanner [Thu, 13 Jun 2024 11:50:25 +0000 (13:50 +0200)]
PCI: Add managed pcim_iomap_range()
The only managed mapping function currently is pcim_iomap() which doesn't
allow for mapping an area starting at a certain offset, which many drivers
want.
Philipp Stanner [Thu, 13 Jun 2024 11:50:24 +0000 (13:50 +0200)]
PCI: Remove legacy pcim_release()
Thanks to preceding cleanup steps, pcim_release() is no longer needed and
can be replaced by pcim_disable_device(), which is the exact counterpart
to pcim_enable_device().
This permits removing further parts of the old PCI devres implementation.
Replace pcim_release() with pcim_disable_device(). Remove the now unused
function get_pci_dr(). Remove the struct pci_devres from pci.h.
Philipp Stanner [Thu, 13 Jun 2024 11:50:23 +0000 (13:50 +0200)]
PCI: Add managed pcim_intx()
pci_intx() is a "hybrid" function, i.e., it is managed if
pcim_enable_device() has been called, but unmanaged otherwise.
Add pcim_intx(), which is always managed, and implement pci_intx() using
it.
Remove the now-unused struct pci_devres.orig_intx and .restore_intx and
find_pci_dr().
Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[kwilczynski: squashed in
https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com
to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>:
https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com
https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Jiwei Sun [Wed, 5 Jun 2024 12:48:44 +0000 (20:48 +0800)]
PCI: vmd: Create domain symlink before pci_bus_add_devices()
The vmd driver creates a "domain" symlink in sysfs for each VMD bridge.
Previously this symlink was created after pci_bus_add_devices() added
devices below the VMD bridge and emitted udev events to announce them to
userspace.
This led to a race between userspace consumers of the udev events and the
kernel creation of the symlink. One such consumer is mdadm, which
assembles block devices into a RAID array, and for devices below a VMD
bridge, mdadm depends on the "domain" symlink.
If mdadm loses the race, it may be unable to assemble a RAID array, which
may cause a boot failure or other issues, with complaints like this:
(udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
(udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
(udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
(udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
(udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.
This symptom prevents the OS from booting successfully.
After an NVMe disk is probed/added by the nvme driver, udevd invokes mdadm
to detect whether there is an mdraid associated with this NVMe disk, and
mdadm determines whether an NVMe device is connected to a particular VMD
domain by checking the "domain" symlink. For example:
Thread A                        Thread B              Thread mdadm
vmd_enable_domain
  pci_bus_add_devices
    __driver_probe_device
     ...
     work_on_cpu
       schedule_work_on
       : wakeup Thread B
                                nvme_probe
                                : wakeup scan_work
                                  to scan nvme disk
                                  and add nvme disk
                                  then wakeup udevd
                                                      : udevd executes
                                                        mdadm command
       flush_work                                     main
       : wait for nvme_probe done                      ...
    __driver_probe_device                             find_driver_devices
    : probe next nvme device                          : 1) Detect domain symlink
    ...                                                 2) Find domain symlink
    ...                                                    from vmd sysfs
    ...                                                 3) Domain symlink not
    ...                                                    created yet; failed
  sysfs_create_link
  : create domain symlink
Create the VMD "domain" symlink before invoking pci_bus_add_devices() to
avoid this race.
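
The ordering problem can be reduced to a small deterministic model in which the uevent consumer runs as soon as devices are added (the worst case for the old ordering). The names are hypothetical and nothing here touches sysfs or udev:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vmd {
	bool domain_link;	/* stands in for the sysfs "domain" symlink */
	bool consumer_ok;	/* did the uevent consumer find the link? */
};

/* Stands in for userspace (e.g. mdadm) reacting to a device uevent. */
void toy_uevent_consumer(struct toy_vmd *vmd)
{
	vmd->consumer_ok = vmd->domain_link;
}

void toy_create_link(struct toy_vmd *vmd)
{
	vmd->domain_link = true;
}

/* Adding devices emits uevents; worst case, the consumer runs at once. */
void toy_add_devices(struct toy_vmd *vmd)
{
	assert(vmd);
	toy_uevent_consumer(vmd);
}

/* Old order: devices announced before the symlink exists. */
void toy_enable_old(struct toy_vmd *vmd)
{
	toy_add_devices(vmd);
	toy_create_link(vmd);
}

/* New order: symlink first, then announce devices. */
void toy_enable_new(struct toy_vmd *vmd)
{
	toy_create_link(vmd);
	toy_add_devices(vmd);
}
```

In the old order the consumer can look for the link before it exists. In the new order the link is always in place by the time any uevent is emitted.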
Suggested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/linux-pci/20240605124844.24293-1-sjiwei@163.com
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Philipp Stanner [Thu, 13 Jun 2024 11:50:22 +0000 (13:50 +0200)]
PCI: Give pcim_set_mwi() its own devres cleanup callback
Managing pci_set_mwi() with devres can easily be done with its own
callback, without the necessity to store any state about it in a
device-related struct.
Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a
separate devres cleanup callback.
Philipp Stanner [Thu, 13 Jun 2024 11:50:21 +0000 (13:50 +0200)]
PCI: Move struct pci_devres.pinned bit to struct pci_dev
The bit describing whether the PCI device is currently pinned is stored
in struct pci_devres. To clean up and simplify the PCI devres API, it's
better if this information is stored in struct pci_dev.
This will later permit simplifying pcim_enable_device().
Move the 'pinned' boolean bit to struct pci_dev.
Restructure bits in struct pci_dev so the pm / pme fields are next to
each other.
Philipp Stanner [Thu, 13 Jun 2024 11:50:20 +0000 (13:50 +0200)]
PCI: Remove struct pci_devres.enabled status bit
The struct pci_devres has a separate boolean to track whether a device is
enabled. That, however, can easily be tracked in an agnostic manner through
the function pci_is_enabled().
Using it allows for simplifying the PCI devres implementation.
Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.
are "hybrid" functions that are managed if pcim_enable_device() has been
called, but unmanaged otherwise.
This is confusing and has already caused a bug (in 8558de401b5f
("drm/vboxvideo: use managed pci functions")) because users believe all PCI
functions, such as pci_iomap_range(), can become managed that way, which is
not the case.
Add comments to the relevant functions' docstrings that warn users about
this behavior.
Deprecate pcim_iomap_table(). It returns a pointer to a table of
ioremapped BARs, or NULL if it fails. This makes uses like this:
addr = pcim_iomap_table(pdev)[0];
problematic because it causes a NULL pointer dereference on failure.
Callers should use pcim_iomap() instead.
Deprecate pcim_iomap_regions_request_all() because it is built on
__pci_request_region() and is managed if pcim_enable_device() has been
called, but unmanaged otherwise, which is prone to errors.
Callers should either use pcim_iomap_regions() to request and map BARs, or
use pcim_request_region() followed by pcim_iomap().
Link: https://lore.kernel.org/r/20240613115032.29098-5-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log, sphinx markup]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Philipp Stanner [Thu, 13 Jun 2024 11:50:15 +0000 (13:50 +0200)]
PCI: Add devres helpers for iomap table
The pcim_iomap_devres.table managed by pcim_iomap_table() has its
entries set and unset at several places throughout devres.c using manual
iterations, which are effectively duplicated code.
Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions and use them where
possible.
Philipp Stanner [Thu, 13 Jun 2024 11:50:14 +0000 (13:50 +0200)]
PCI: Add and use devres helper for bit masks
The current devres implementation uses manual shift operations to check
whether a bit in a mask is set. The code can be made more readable by
writing a small helper function for that.
Implement mask_contains_bar() and use it where applicable.
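
A helper of this kind is likely a one-liner. The sketch below assumes an int mask with one bit per BAR and a BAR index in the range 0..5; the exact kernel signature is not shown in this log:

```c
#include <assert.h>
#include <stdbool.h>

/* True if bit 'bar' is set in 'mask' (assumed: one bit per BAR, 0..5). */
bool mask_contains_bar(int mask, int bar)
{
	assert(bar >= 0 && bar < 6);

	return (mask & (1 << bar)) != 0;
}
```

A caller can then write `if (mask_contains_bar(mask, bar))` instead of repeating the shift at every site.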
Niklas Cassel [Sat, 22 Jun 2024 13:20:25 +0000 (15:20 +0200)]
PCI: dw-rockchip: Use pci_epc_init_notify() directly
A previous commit ("PCI: dwc: ep: Remove dw_pcie_ep_init_notify() wrapper")
removed the dw_pcie_ep_init_notify() wrapper and changed the DWC glue
drivers to instead use pci_epc_init_notify() directly.
The endpoint support for the dw-rockchip had not been merged at that point
in time, so that commit did not update dw-rockchip.
Do the same change for dw-rockchip, so that the driver will not try
to use a function that has now been removed.
Niklas Cassel [Fri, 7 Jun 2024 11:14:29 +0000 (13:14 +0200)]
PCI: dw-rockchip: Refactor the driver to prepare for EP mode
Refactor the driver to prepare for EP mode.
Add of-match data to the existing compatible, and explicitly define it as
DW_PCIE_RC_TYPE. This way, we will be able to add EP mode in a follow-up
commit in a much less intrusive way, which makes the follow-up commit much
easier to review.
The actual toggling of PERST# is correct, and we cannot change it anyway,
since that would break device tree compatibility.
However, this driver does request the GPIO to be initialized as
GPIOD_OUT_HIGH, which does cause a silly sequence where PERST# gets
toggled back and forth for no good reason.
Fix this by requesting the GPIO to be initialized as GPIOD_OUT_LOW (which
for this driver means PERST# asserted).
This will avoid an unnecessary signal change where PERST# gets deasserted
(by devm_gpiod_get_optional()) and then gets asserted (by
rockchip_pcie_start_link()) just a few instructions later.
Before patch, debug prints on EP side, when booting RC:
[ 845.606810] pci: PERST# asserted by host!
[ 852.483985] pci: PERST# de-asserted by host!
[ 852.503041] pci: PERST# asserted by host!
[ 852.610318] pci: PERST# de-asserted by host!
After patch, debug prints on EP side, when booting RC:
[ 125.107921] pci: PERST# asserted by host!
[ 132.111429] pci: PERST# de-asserted by host!
This extra, very short, PERST# assertion + deassertion has been reported to
cause issues with certain WLAN controllers, e.g. RTL8822CE.
PCI: rockchip: Use GPIOD_OUT_LOW flag while requesting ep_gpio
Rockchip platforms use 'GPIO_ACTIVE_HIGH' flag in the devicetree definition
for ep_gpio. This means, whatever the logical value set by the driver for
the ep_gpio, physical line will output the same logic level.
For instance,
gpiod_set_value_cansleep(rockchip->ep_gpio, 0); --> Level low
gpiod_set_value_cansleep(rockchip->ep_gpio, 1); --> Level high
But while requesting the ep_gpio, GPIOD_OUT_HIGH flag is currently used.
Now, this also causes the physical line to output 'high' creating trouble
for endpoint devices during host reboot.
When host reboot happens, the ep_gpio will initially output 'low' due to
the GPIO getting reset to its POR value. Then during host controller probe,
it will output 'high' due to GPIOD_OUT_HIGH flag. Then during
rockchip_pcie_host_init_port(), it will first output 'low' and then 'high'
indicating the completion of controller initialization.
On the endpoint side, each output 'low' of ep_gpio is accounted for PERST#
assert and 'high' for PERST# deassert. With the above mentioned flow during
host reboot, endpoint will witness below state changes for PERST#:
(1) PERST# assert - GPIO POR state
(2) PERST# deassert - GPIOD_OUT_HIGH while requesting GPIO
(3) PERST# assert - rockchip_pcie_host_init_port()
(4) PERST# deassert - rockchip_pcie_host_init_port()
Now the time interval between (2) and (3) is very short as both happen
during the driver probe(), and this results in a race in the endpoint:
before it has finished handling the PERST# deassertion in (2), the
endpoint gets another PERST# assert in (3).
A proper way to fix this issue is to change the GPIOD_OUT_HIGH flag in (2)
to GPIOD_OUT_LOW. Because the usual convention is to request the GPIO with
a state corresponding to its 'initial/default' value and let the driver
change the state of the GPIO when required.
As per that, the ep_gpio should be requested with GPIOD_OUT_LOW as it
corresponds to the POR value of '0' (PERST# assert in the endpoint). Then
the driver can change the state of the ep_gpio later in
rockchip_pcie_host_init_port() as per the initialization sequence.
This fixes the firmware crash issue in Qcom based modems connected to
Rockpro64 based board.
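The four-step sequence above can be modelled in a few lines of user-space C.
This is only a sketch: the enum, the helper names and the idea of counting
deassert-to-assert edges are illustrative assumptions, not code from the
driver. It shows that requesting the line high produces one extra PERST#
assertion during reboot, while requesting it low produces none.

```c
#include <stddef.h>

/* Active-high ep_gpio as described above: low = PERST# asserted. */
enum perst_level { PERST_ASSERTED = 0, PERST_DEASSERTED = 1 };

/* Count deasserted -> asserted edges seen by the endpoint. */
static int count_perst_assertions(const enum perst_level *seq, size_t n)
{
	int count = 0;

	for (size_t i = 1; i < n; i++)
		if (seq[i - 1] == PERST_DEASSERTED && seq[i] == PERST_ASSERTED)
			count++;
	return count;
}

/*
 * Hypothetical model of a host reboot. 'request_level' is the level the
 * line is driven to when the GPIO is requested (GPIOD_OUT_HIGH or
 * GPIOD_OUT_LOW in the text above).
 */
static int spurious_assertions_on_reboot(enum perst_level request_level)
{
	const enum perst_level seq[] = {
		PERST_ASSERTED,   /* (1) GPIO POR state */
		request_level,    /* (2) level set while requesting the GPIO */
		PERST_ASSERTED,   /* (3) rockchip_pcie_host_init_port() */
		PERST_DEASSERTED, /* (4) rockchip_pcie_host_init_port() */
	};

	return count_perst_assertions(seq, sizeof(seq) / sizeof(seq[0]));
}
```

With PERST_DEASSERTED as the request level the model reports one extra
assertion, which corresponds to the (2) to (3) race; with PERST_ASSERTED the
line simply stays low until step (4).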
Damien Le Moal [Sat, 13 Apr 2024 00:41:20 +0000 (09:41 +0900)]
PCI: rockchip-host: Wait 100ms after reset before starting configuration
PCIe r6.0, sec 6.6.1, states that the host should wait for at least 100
msec from the end of a conventional reset (PERST# is de-asserted) before
sending a configuration request to ensure that the device is able to
respond with a "Request Retry Status" completion.
Add the PCIE_T_RRS_READY_MS macro to define this wait time and modify
rockchip_pcie_host_init_port() to add this 100ms sleep after deasserting
PERST# using the ep_gpio GPIO.
Link: https://lore.kernel.org/linux-pci/20240413004120.1099089-3-dlemoal@kernel.org
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
PCIe CEM r5.1, sec 2.9.2, mandates that the PERST# signal must remain
asserted for at least 100 usec (Tperst-clk) after the PCIe reference clock
becomes stable (if a reference clock is supplied), and for at least 100
msec after the power is stable (Tpvperl, defined by the macro
PCIE_T_PVPERL_MS).
Modify rockchip_pcie_host_init_port() to satisfy these constraints by
adding a sleep period before deasserting PERST# using the ep_gpio GPIO.
Since Tperst-clk is the shorter wait time, add an msleep() call for the
longer PCIE_T_PVPERL_MS milliseconds to handle both timing requirements.
Link: https://lore.kernel.org/linux-pci/20240413004120.1099089-2-dlemoal@kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
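The two waits described in the rockchip-host commits above add up to a
simple timeline. The sketch below is a user-space model, not the driver
code: the struct and plan_reset() are hypothetical, and the macro values are
the 100 ms figures quoted in the commit logs.

```c
#define PCIE_T_PVPERL_MS    100 /* power stable -> PERST# deasserted */
#define PCIE_T_RRS_READY_MS 100 /* PERST# deasserted -> first config request */

/* Hypothetical timeline, in milliseconds from an arbitrary origin. */
struct reset_timeline {
	unsigned int perst_deassert_ms; /* earliest PERST# deassertion */
	unsigned int first_config_ms;   /* earliest configuration access */
};

/* Compute the earliest legal points given when power became stable. */
static struct reset_timeline plan_reset(unsigned int power_stable_ms)
{
	struct reset_timeline t;

	/* Covers Tpvperl and, since it is longer, Tperst-clk as well. */
	t.perst_deassert_ms = power_stable_ms + PCIE_T_PVPERL_MS;
	/* Give the device time to be able to answer with RRS. */
	t.first_config_ms = t.perst_deassert_ms + PCIE_T_RRS_READY_MS;
	return t;
}
```

In the driver each step is an msleep(); the model only makes the ordering
and the total of 200 ms from stable power to first config access explicit.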
Niklas Cassel [Tue, 28 May 2024 13:48:40 +0000 (15:48 +0200)]
PCI: dwc: ep: Enforce DWC specific 64-bit BAR limitation
From the DWC EP databook 5.96a, section "3.5.7.1.4 General Rules for BAR
Setup (Fixed Mask or Programmable Mask Schemes Only)":
"Any pair (for example BARs 0 and 1) can be configured as one 64-bit BAR,
two 32-bit BARs, or one 32-bit BAR."
"BAR pairs cannot overlap to form a 64-bit BAR. For example, you cannot
combine BARs 1 and 2 to form a 64-bit BAR."
While this limitation does exist in some other PCI endpoint controllers,
e.g. cdns_pcie_ep_set_bar(), the limitation does not appear to be defined
in the PCIe specification itself, thus add an explicit check for this in
dw_pcie_ep_set_bar() (rather than pci_epc_set_bar()).
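The databook rule quoted above amounts to "a 64-bit BAR may only start at an
even BAR index". A minimal sketch of such a check follows; the function name
and NUM_BARS are assumptions made for illustration, and the real check in
dw_pcie_ep_set_bar() may be written differently.

```c
#include <stdbool.h>

#define NUM_BARS 6 /* BAR0..BAR5 of a type 0 function */

/*
 * A 64-bit BAR occupies BAR n and BAR n+1. Under the DWC rule the pair
 * must be (0,1), (2,3) or (4,5), so the starting index must be even.
 */
static bool bar_config_allowed(unsigned int bar, bool is_64bit)
{
	if (bar >= NUM_BARS)
		return false;
	if (is_64bit && (bar & 1))
		return false; /* e.g. BARs 1 and 2 cannot be combined */
	return true;
}
```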
Manivannan Sadhasivam [Thu, 6 Jun 2024 07:26:38 +0000 (12:56 +0530)]
PCI: layerscape-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event
Now that dw_pcie_ep_linkdown() is available, use it. This also handles the
reinitialization of DWC non-sticky registers in addition to sending the
notification to EPF drivers.
Closes: https://lore.kernel.org/linux-pci/20240528195539.GA458945@bhelgaas
Link: https://lore.kernel.org/linux-pci/20240606-pci-deinit-v1-5-4395534520dc@linaro.org
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
PCI: qcom-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event
Now that the generic dw_pcie_ep_linkdown() API is available, use it. This
also handles the reinitialization of DWC non-sticky registers in addition
to sending the notification to EPF drivers.
PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down event
Per PCIe r6.0, sec 5.2, a Link Down event can happen under any of the
following circumstances:
1. Fundamental/Hot reset
2. Link disable transmission by upstream component
3. Moving from L2/L3 to L0
In those cases, Link Down causes some non-sticky DWC registers to lose the
state (like REBAR, etc.), so drivers need to reinitialize them to function
properly once the link comes back again.
This is not a problem for drivers supporting PERST# IRQ, since they can
reinitialize the registers in the PERST# IRQ callback. But for the drivers
not supporting PERST#, there is no way they can reinitialize the registers
other than relying on Link Down IRQ received when the link goes down. So
add a DWC generic API dw_pcie_ep_linkdown() that reinitializes the
non-sticky registers and also notifies the EPF drivers about link going
down.
This API can also be used by the drivers supporting PERST# to handle the
scenario (2) mentioned above.
NOTE: For the sake of code organization, move the dw_pcie_ep_linkup()
definition just above dw_pcie_ep_linkdown().
Frank Li [Thu, 18 Apr 2024 16:04:28 +0000 (12:04 -0400)]
PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend
Instead of relying on the vendor specific implementations to send the
PME_Turn_Off message, introduce a generic way of sending the message using
the MSG TLP.
This is achieved by reserving a region for MSG TLP of size
'pci->region_align', at the end of the first IORESOURCE_MEM window of the
host bridge. And then sending the PME_Turn_Off message during system
suspend with the help of iATU.
The reason for reserving the MSG TLP region at the end of the
IORESOURCE_MEM is to avoid generating holes in between, because when the
region is allocated using allocate_resource(), memory will be allocated
from the start of the window. Later, if memory gets allocated for an
endpoint of size bigger than 'region_align', there will be a hole between
MSG TLP region and endpoint memory.
This generic implementation is optional for the glue drivers and can be
overridden by a custom 'pme_turn_off' callback.
Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-5-a54265c39742@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
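The placement of the MSG TLP region at the end of the window is plain
address arithmetic. The sketch below assumes an inclusive [start, end]
window and uses hypothetical names (struct mem_window,
reserve_msg_region()); it is not the DWC host code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, like a struct resource. */
struct mem_window {
	uint64_t start;
	uint64_t end;
};

/*
 * Carve 'region_align' bytes off the END of the window so that later
 * allocations, which start from the beginning of the window, do not
 * leave a hole next to the MSG TLP region.
 */
static bool reserve_msg_region(const struct mem_window *win,
			       uint64_t region_align, struct mem_window *msg)
{
	uint64_t size = win->end - win->start + 1;

	if (region_align == 0 || size < region_align)
		return false; /* window too small, nothing reserved */

	msg->end = win->end;
	msg->start = win->end - region_align + 1;
	return true;
}
```

For a window of 0x40000000..0x4fffffff and a 64KB region_align, the reserved
region is 0x4fff0000..0x4fffffff and everything below it stays contiguous.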
Add "code" and "routing" into struct dw_pcie_ob_atu_cfg for triggering
INTx IRQs by iATU in the PCIe endpoint mode in the near future.
PCIE_ATU_INHIBIT_PAYLOAD is set to issue TLP type of Msg instead of
MsgD. This implementation supports the data-less messages only for now.
Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-3-a54265c39742@nxp.com
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
PCI: dwc: Consolidate args of dw_pcie_prog_outbound_atu() into a structure
This is preparation for adding the Msg-type outbound iATU mapping. That
update will require two more arguments to __dw_pcie_prog_outbound_atu(),
which would make the already complicated function prototype even harder
to comprehend, since it would then accept _eight_ arguments.
To prevent that and keep the code more-or-less readable, move all the
outbound iATU-related arguments to a new config structure: struct
dw_pcie_ob_atu_cfg, and pass a pointer to dw_pcie_prog_outbound_atu(). The
structure should be locally defined and populated with the outbound iATU
settings implied by the caller context.
As a result of this change, there is no longer any need for two distinct
methods for the Host and Endpoint outbound iATU setups, since the code
can directly call dw_pcie_prog_outbound_atu() with the config structure
populated. So drop dw_pcie_prog_ep_outbound_atu().
[kwilczynski: commit log]
Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-2-a54265c39742@nxp.com
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
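The pattern used here, replacing a long argument list with a locally
populated config structure, can be sketched as follows. The struct below is
a hypothetical stand-in: its name, its fields and the "limit" calculation
are assumptions for illustration and do not reproduce struct
dw_pcie_ob_atu_cfg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical config structure; all field names are assumptions. */
struct ob_atu_cfg_sketch {
	int index;         /* which outbound window to program */
	int type;          /* TLP type for the window */
	uint64_t cpu_addr; /* CPU-side base address */
	uint64_t pci_addr; /* PCI-side target address */
	uint64_t size;     /* window size in bytes */
};

/*
 * One pointer instead of many scalars: adding a field later does not
 * change the prototype or any caller that does not care about it.
 */
static int prog_outbound_atu_sketch(const struct ob_atu_cfg_sketch *cfg,
				    uint64_t *limit)
{
	if (cfg == NULL || cfg->size == 0)
		return -1;

	*limit = cfg->cpu_addr + cfg->size - 1; /* last address covered */
	return 0;
}
```

Callers fill in only the fields they need with designated initializers, and
fields that are not mentioned are zero-initialized.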
Frank Li [Fri, 12 Apr 2024 16:08:41 +0000 (12:08 -0400)]
PCI: dwc: Fix index 0 incorrectly being interpreted as a free ATU slot
When PERST# assert and deassert happens on the PERST# supported platforms,
both iATU0 and iATU6 will map inbound window to BAR0. DMA will access the
area that was previously allocated (iATU0) for BAR0, instead of the new
area (iATU6) for BAR0.
Right now, this isn't an issue because both iATU0 and iATU6 should
translate inbound accesses to BAR0 to the same allocated memory area.
However, having two separate inbound mappings for the same BAR is a
disaster waiting to happen.
The mappings between PCI BAR and iATU inbound window are maintained in the
dw_pcie_ep::bar_to_atu[] array. While allocating a new inbound iATU map for
a BAR, dw_pcie_ep_inbound_atu() API checks for the availability of the
existing mapping in the array and if it is not found (i.e., value in the
array indexed by the BAR is found to be 0), it allocates a new map value
using find_first_zero_bit().
The issue is the existing logic failed to consider the fact that the map
value '0' is a valid value for BAR0, so find_first_zero_bit() will return
'0' as the map value for BAR0 (note that it returns the first zero bit
position).
Due to this, when PERST# assert + deassert happens on the PERST# supported
platforms, the inbound window allocation restarts from BAR0 and the
existing logic to find the BAR mapping will return '6' for BAR0 instead of
'0' due to the fact that it considers '0' as an invalid map value.
Fix this issue by always incrementing the map value before assigning to
bar_to_atu[] array and then decrementing it while fetching. This will make
sure that the map value '0' always represents the invalid mapping.
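The "store index + 1, treat 0 as unmapped" scheme described above can be
sketched in user space like this. The struct, the window count and the
helper names are assumptions; only the bar_to_atu[] idea comes from the
commit log.

```c
#include <stdint.h>

#define NUM_BARS       6
#define NUM_IB_WINDOWS 8 /* assumed number of inbound iATU windows */

struct ep_state {
	uint8_t bar_to_atu[NUM_BARS]; /* 0 = no mapping, else iATU index + 1 */
	uint32_t ib_window_map;       /* bit n set = inbound window n in use */
};

/* Return the first clear bit in 'map', or -1 if all windows are taken. */
static int first_free_window(uint32_t map)
{
	for (int i = 0; i < NUM_IB_WINDOWS; i++)
		if (!(map & (1u << i)))
			return i;
	return -1;
}

/* Return the iATU index used for 'bar', allocating one only if needed. */
static int map_bar(struct ep_state *ep, unsigned int bar)
{
	int idx;

	if (bar >= NUM_BARS)
		return -1;

	if (ep->bar_to_atu[bar])              /* existing mapping found */
		return ep->bar_to_atu[bar] - 1;   /* decrement while fetching */

	idx = first_free_window(ep->ib_window_map);
	if (idx < 0)
		return -1;

	ep->ib_window_map |= 1u << idx;
	ep->bar_to_atu[bar] = (uint8_t)(idx + 1); /* increment before storing */
	return idx;
}
```

After mapping BAR0..BAR5 once, a second map_bar(&ep, 0) returns 0 again.
Without the +1 encoding, the stored 0 for BAR0 would look like "no mapping"
and a new window (index 6) would be allocated, as described above.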
Alexander Stein [Thu, 18 Apr 2024 08:49:24 +0000 (10:49 +0200)]
Documentation: PCI: pci-endpoint: Fix EPF ops list
With commit 5779dd0a7dbd7 ("PCI: endpoint: Use notification chain mechanism
to notify EPC events to EPF") the linkup callback has been removed and
replaced by EPC event notifications.
With commit 256ae475201b1 ("PCI: endpoint: Add pci_epf_ops to expose
function-specific attrs") a new (optional) add_cfs callback was added.
Update documentation accordingly.
These two functions are defined in the pci_endpoint_test.c file, but not
called elsewhere, so delete these unused functions.
This fixes the following warning:
drivers/misc/pci_endpoint_test.c:144:19: warning: unused function 'pci_endpoint_test_bar_readl'.
drivers/misc/pci_endpoint_test.c:150:20: warning: unused function 'pci_endpoint_test_bar_writel'.
Yoshihiro Shimoda [Tue, 11 Jun 2024 12:50:57 +0000 (21:50 +0900)]
misc: pci_endpoint_test: Document policy about adding pci_device_id
Add a comment suggesting that if the endpoint controller Vendor and Device
ID are programmable, an existing entry might be usable for testing without
having to add an entry to pci_endpoint_test_tbl[].
Link: https://lore.kernel.org/linux-pci/20240611125057.1232873-6-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
dma_set_mask_and_coherent() should never fail when the mask is >= 32bit,
unless the architecture has no DMA support. So no need to check for the
error and also no need to set dma_set_mask_and_coherent(32) as a fallback.
Even if dma_set_mask_and_coherent(48) fails due to the lack of DMA support
(theoretically), then dma_set_mask_and_coherent(32) will also fail for the
same reason. So the fallback doesn't make sense.
Simplify the code by setting the streaming and coherent DMA mask to 48
bits.
Link: https://lore.kernel.org/linux-pci/20240502195903.3191049-1-Frank.Li@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
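For reference, the 48-bit and 32-bit masks discussed in the DMA mask commit
above can be written out with a user-space copy of the kernel's
DMA_BIT_MASK() macro. The dma_addr_fits() helper is a hypothetical addition
for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel macro: the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can 'addr' be expressed under 'mask'? */
static bool dma_addr_fits(uint64_t mask, uint64_t addr)
{
	return (addr & ~mask) == 0;
}
```

Every address that fits the 32-bit mask also fits the 48-bit mask, so the
only way the 48-bit request could fail is the absence of DMA support, in
which case the 32-bit request would fail too.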
Niklas Cassel [Fri, 7 Jun 2024 11:14:31 +0000 (13:14 +0200)]
misc: pci_endpoint_test: Add support for Rockchip rk3588
Rockchip rk3588 requires 64KB alignment for BARs.
While there is an existing device_id:vendor_id in the driver with 64KB
alignment, that device_id:vendor_id is am654, which uses BAR2 instead of
BAR0 as the test_reg_bar, and also has special is_am654_pci_dev() checks
in the driver to disallow BAR0. In order to allow testing all BARs, add a
new rk3588 entry in the driver.
We intentionally do not add the vendor id to pci_ids.h, since the policy
for that file is that the vendor id has to be used by multiple drivers.
Hopefully, this new entry will be short-lived, as there is a series on the
mailing list which intends to move the address alignment restrictions from
this driver to the endpoint side.
Add a new entry for rk3588 in order to allow us to test all BARs.
Javier Carrasco [Sun, 9 Jun 2024 10:56:14 +0000 (12:56 +0200)]
PCI: kirin: Convert kirin_pcie_parse_port() to scoped iterator
Convert loops in kirin_pcie_parse_port() to use the _scoped() version of
for_each_available_child_of_node() so the refcounts of children are
implicitly decremented when the loop is exited.
No functional change intended here, but it will make future error exits
from these loops easier.
Link: https://lore.kernel.org/linux-pci/20240609-pcie-kirin-memleak-v1-1-62b45b879576@gmail.com
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: move to GPIO series to avoid bisection hole, commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
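The idea behind the _scoped() iterators mentioned in the kirin commit above
can be shown with the GCC/Clang cleanup attribute. This is a user-space
sketch with a made-up refcounted list; struct node, the helpers and the
for_each_node_scoped() macro are all hypothetical and are not the OF API.

```c
#include <stddef.h>

struct node {
	int value;
	int refcount;
	struct node *next;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Cleanup handlers receive a pointer to the variable going out of scope. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

/* Take a reference on the next node, then drop the current one. */
static struct node *node_next(struct node *cur)
{
	struct node *next = node_get(cur->next);

	node_put(cur);
	return next;
}

/* The iterator variable drops its reference on ANY exit from the loop. */
#define for_each_node_scoped(head, it)                                    \
	for (struct node *it __attribute__((cleanup(node_put_cleanup))) = \
		     node_get(head);                                      \
	     it != NULL;                                                  \
	     it = node_next(it))

static int list_contains(struct node *head, int wanted)
{
	for_each_node_scoped(head, it) {
		if (it->value == wanted)
			return 1; /* no explicit node_put() needed here */
	}
	return 0;
}
```

The early return in list_contains() leaves no reference behind, which is
the property that makes future error exits from such loops easier.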
QCOM Resource Power Manager-hardened (RPMh) is a hardware block which
maintains hardware state of a regulator by performing max aggregation of
the requests made by all of the clients.
The PCIe controller can operate at different RPMh performance states of
the power domain depending on the speed of the link. The required
performance state varies from target to target: some controllers support
GEN3 at the NOM (Nominal) voltage corner, while others support GEN3 at
low SVS (static voltage scaling).
The SoC can be more power efficient if we scale the performance state
based on the aggregate PCIe link bandwidth.
Add Operating Performance Points (OPP) support to vote for RPMh state based
on the aggregate link bandwidth.
OPP can also handle ICC bandwidth voting, so move ICC bandwidth voting to
the OPP framework if OPP entries are present.
Since ICC voting then becomes part of OPP, don't initialize ICC if OPP is
supported.
Before the PCIe link is initialized, vote for the highest OPP in the OPP
table, so that the maximum voltage corner is requested and the link can
come up at the maximum supported speed.
Link: https://lore.kernel.org/linux-pci/20240619-opp_support-v15-4-aa769a2173a3@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: wrap comments to fit in 80 columns]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
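"Aggregate link bandwidth" in the OPP commit above is the per-lane data rate
multiplied by the link width. The sketch below derives the per-lane rate
from the line encoding and then picks a performance level from a table. The
table contents, the struct and pick_opp_level() are invented for
illustration; they do not reflect the qcom driver or any real OPP table.

```c
#include <stddef.h>

/* Usable per-lane bandwidth in Mb/s after line-encoding overhead. */
static unsigned int lane_mbps(unsigned int gen)
{
	switch (gen) {
	case 1: return 2500 * 8 / 10;     /* 2.5 GT/s, 8b/10b */
	case 2: return 5000 * 8 / 10;     /* 5 GT/s, 8b/10b */
	case 3: return 8000 * 128 / 130;  /* 8 GT/s, 128b/130b */
	case 4: return 16000 * 128 / 130; /* 16 GT/s, 128b/130b */
	default: return 0;
	}
}

static unsigned int link_mbps(unsigned int gen, unsigned int width)
{
	return lane_mbps(gen) * width;
}

/* Hypothetical table entry: highest bandwidth a level can serve. */
struct opp_sketch {
	unsigned int max_mbps;
	int level;
};

/* Pick the lowest level that still covers the requested bandwidth. */
static int pick_opp_level(const struct opp_sketch *table, size_t n,
			  unsigned int mbps)
{
	for (size_t i = 0; i < n; i++)
		if (mbps <= table[i].max_mbps)
			return table[i].level;
	return -1;
}
```

A Gen3 x4 link needs far more bandwidth than a Gen2 x1 link, so with a
sorted table it lands on a higher level; that is the scaling the commit
describes.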
PCI: qcom: Add ICC bandwidth vote for CPU to PCIe path
To access the host controller registers and the endpoint BAR/config
space, the CPU-PCIe ICC (interconnect) path should be voted for;
otherwise, the access may lead to a NoC (Network on Chip) timeout.
This currently works only because other drivers vote for this path.
Since there is less traffic on this path than on the PCIe-to-memory path,
always add a minimum vote of 1 KBps, which is sufficient to keep the path
active and is recommended by the HW team.
During S2RAM (Suspend-to-RAM), the DBI access can happen very late (while
disabling the boot CPU). So do not disable the CPU-PCIe interconnect path
during S2RAM as that may lead to NoC error.
PCI: qcom-ep: Disable resources unconditionally during PERST# assert
All EP specific resources are enabled during PERST# deassert. As a counter
operation, all resources should be disabled during PERST# assert. There is
no point in skipping that if the link was not enabled.
Skipping it would also result in the resources being enabled twice if
PERST# gets deasserted again. So remove the check from
qcom_pcie_perst_assert() and disable all the resources unconditionally.
Mrinmay Sarkar [Mon, 11 Mar 2024 14:11:36 +0000 (19:41 +0530)]
PCI: qcom-ep: Override NO_SNOOP attribute for SA8775P EP
Due to some hardware changes, SA8775P sets the NO_SNOOP attribute in its
TLPs for all the PCIe controllers. When the NO_SNOOP attribute is set, the
requester indicates that no cache coherency issues exist for the
addressed memory on the host, i.e., the memory is not cached. But in
reality, the requester cannot assume this unless it has complete
control/visibility over the addressed memory on the host.
In the worst case, if the memory is cached on the host, this may lead to
memory corruption issues. It should be noted that the caching of memory
on the host is not solely dependent on the NO_SNOOP attribute in the TLP.
So to avoid the corruption, this patch overrides the NO_SNOOP attribute
by setting the PCIE_PARF_NO_SNOOP_OVERIDE register. This patch is not
needed for other upstream supported platforms since they do not set
NO_SNOOP attribute by default.
Mrinmay Sarkar [Mon, 11 Mar 2024 14:11:35 +0000 (19:41 +0530)]
PCI: qcom: Override NO_SNOOP attribute for SA8775P RC
Due to some hardware changes, SA8775P sets the NO_SNOOP attribute in its
TLPs for all the PCIe controllers. When the NO_SNOOP attribute is set, the
requester indicates that no cache coherency issues exist for the
addressed memory on the endpoint, i.e., the memory is not cached. But in
reality, the requester cannot assume this unless it has complete
control/visibility over the addressed memory on the endpoint.
In the worst case, if the memory is cached on the endpoint, this may lead
to memory corruption issues. It should be noted that the caching of memory
on the endpoint is not solely dependent on the NO_SNOOP attribute in the
TLP.
So to avoid the corruption, this patch overrides the NO_SNOOP attribute
by setting the PCIE_PARF_NO_SNOOP_OVERIDE register. This patch is not
needed for other upstream supported platforms since they do not set
NO_SNOOP attribute by default.
SA8775P has IP version 1.34.0, so introduce a new cfg (cfg_1_34_0) for
this platform. Add an override_no_snoop flag to struct qcom_pcie_cfg, set
it to true in cfg_1_34_0, and enable cache snooping if this flag is true.
Link: https://lore.kernel.org/linux-pci/1710166298-27144-2-git-send-email-quic_msarkar@quicinc.com
Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: wrap comments to fit in 80 columns]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
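The per-SoC flag approach in the NO_SNOOP commits above (a boolean in the
cfg structure that gates a register write) can be sketched as below. The
struct name, the bit positions and the helper are assumptions; only the
override_no_snoop flag name comes from the commit log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions inside the override register. */
#define WR_NO_SNOOP_OVERRIDE_EN (1u << 1)
#define RD_NO_SNOOP_OVERRIDE_EN (1u << 3)

/* Stand-in for the per-IP-version configuration. */
struct pcie_cfg_sketch {
	bool override_no_snoop;
};

/* Compute the value to write back, leaving unrelated bits untouched. */
static uint32_t no_snoop_override_value(const struct pcie_cfg_sketch *cfg,
					uint32_t reg)
{
	if (cfg->override_no_snoop)
		reg |= WR_NO_SNOOP_OVERRIDE_EN | RD_NO_SNOOP_OVERRIDE_EN;
	return reg;
}
```

Platforms whose cfg leaves the flag false get their register value back
unchanged, matching the statement that other supported platforms do not need
the override.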
Add a way for firmware to tell the OS that ATS is supported by the PCI
root complex. An endpoint with ATS enabled may send Translation Requests
and Translated Memory Requests, which look just like Normal Memory
Requests with a non-zero AT field. So a root controller that ignores the
AT field may simply forward the request to the IOMMU as a Normal Memory
Request, which could end badly. In any case, the endpoint will be
unusable.
The ats-supported property allows the OS to only enable ATS in endpoints
if the root controller can handle ATS requests. Only add the property to
pcie-host-ecam-generic for the moment. For non-generic root controllers,
availability of ATS can be inferred from the compatible string.
Link: https://lore.kernel.org/linux-pci/20240607105415.2501934-3-jean-philippe@linaro.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
MediaTek MT7621 PCIe sub-system supports a single Root Complex (RC)
with 3 Root Ports. Add PCIe host topology ASCII graph to the binding
for completeness.
Suggested-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/linux-pci/20240522044321.3205160-1-sergio.paracuellos@gmail.com
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
PCIe needs to choose the appropriate performance state of RPMh power
domain based on the PCIe gen speed.
Adding the Operating Performance Points table makes it possible to adjust
the power domain performance state and ICC peak bandwidth depending on
the PCIe data rate and link width.
Link: https://lore.kernel.org/linux-pci/20240619-opp_support-v15-2-aa769a2173a3@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Thippeswamy Havalige [Mon, 24 Jun 2024 11:10:22 +0000 (16:40 +0530)]
dt-bindings: PCI: xilinx-cpm: Fix overlapping of bridge register and 32-bit BAR addresses
The current configuration had non-prefetchable memory overlapping with
bridge registers by 64KB from base address.
This patch fixes the 'ranges' property in the device tree by adjusting
the non-prefetchable memory addresses beyond the 64KB mark to prevent
conflicts.
Dan Carpenter [Mon, 10 Jun 2024 09:33:49 +0000 (12:33 +0300)]
PCI: endpoint: Fix error handling in epf_ntb_epc_cleanup()
There are two issues related to epf_ntb_epc_cleanup():
1) It should call epf_ntb_config_sspad_bar_clear()
2) The epf_ntb_bind() function should call epf_ntb_epc_cleanup()
to cleanup.
I also changed the ordering a bit. Unwinding should be done in the
reverse order of allocation.
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Link: https://lore.kernel.org/linux-pci/aaffbe8d-7094-4083-8146-185f4a84e8a1@moroto.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
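The unwinding rule mentioned in the epf_ntb_epc_cleanup() commit above is
the usual goto-based error handling pattern. The sketch below uses a
hypothetical three-step setup that records what it does in a trace string;
none of the names correspond to functions in pci-epf-vntb.c.

```c
#include <string.h>

/* Records setup steps (upper case) and undo steps (lower case). */
struct trace {
	char buf[16];
	size_t len;
};

static void trace_init(struct trace *t)
{
	memset(t, 0, sizeof(*t));
}

static void note(struct trace *t, char c)
{
	if (t->len + 1 < sizeof(t->buf))
		t->buf[t->len++] = c;
}

/*
 * Hypothetical bind(): steps A, B, C in order. 'fail_at' selects which
 * step fails (1..3), or 0 for success. Undo labels are in mirror order.
 */
static int bind_sketch(struct trace *t, int fail_at)
{
	if (fail_at == 1)
		return -1;    /* nothing to undo yet */
	note(t, 'A');

	if (fail_at == 2)
		goto undo_a;
	note(t, 'B');

	if (fail_at == 3)
		goto undo_b;
	note(t, 'C');
	return 0;

undo_b:
	note(t, 'b');
undo_a:
	note(t, 'a');
	return -1;
}
```

A failure in step C produces the trace "ABba": B is undone before A, the
mirror image of the order in which they were set up.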
Dan Carpenter [Mon, 10 Jun 2024 09:33:39 +0000 (12:33 +0300)]
PCI: endpoint: Clean up error handling in vpci_scan_bus()
Smatch complains about inconsistent NULL checking in vpci_scan_bus():
drivers/pci/endpoint/functions/pci-epf-vntb.c:1024 vpci_scan_bus() error: we previously assumed 'vpci_bus' could be null (see line 1021)
Instead of printing an error message and then crashing we should return
an error code and clean up.
Also the NULL check is reversed so it prints an error for success
instead of failure.
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Link: https://lore.kernel.org/linux-pci/68e0f6a4-fd57-45d0-945b-0876f2c8cb86@moroto.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Greg Kroah-Hartman [Mon, 10 Jun 2024 08:20:12 +0000 (10:20 +0200)]
PCI: endpoint: Make pci_epc_class struct constant
Now that the driver core allows for struct class to be in read-only
memory, we should make all 'class' structures declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at runtime.
Manivannan Sadhasivam [Thu, 6 Jun 2024 07:26:35 +0000 (12:56 +0530)]
PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers
Like the 'epc_init' event, which is used to signal EPC initialization to
the EPF drivers, introduce an 'epc_deinit' event that is used to signal
EPC deinitialization.
The EPC deinitialization applies only when any sort of fundamental reset
is supported by the endpoint controller as per the PCIe spec.
Reference: PCIe r6.0, sec 4.2.5.9.1 and 6.6.1.
Currently, some EPC drivers like pcie-qcom-ep and pcie-tegra194 support
PERST# as the fundamental reset. So the 'deinit' event will be notified to
the EPF drivers when PERST# assert happens in the above mentioned EPC
drivers.
On receiving the event through the epc_deinit() callback, the EPF drivers
should reset the EPF state machine and also clean up any configuration
that was affected by the fundamental reset, such as BAR, DMA, etc.
This change also warrants skipping the cleanups in unbind() if already done
in epc_deinit().
Link: https://lore.kernel.org/r/20240606-pci-deinit-v1-2-4395534520dc@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Lukas Wunner [Tue, 18 Jun 2024 10:54:55 +0000 (12:54 +0200)]
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Yoshihiro Shimoda [Tue, 11 Jun 2024 12:50:55 +0000 (21:50 +0900)]
PCI: rcar-gen4: Add .ltssm_control() for other SoC support
The sequence for controlling the LTSSM state machine is going to change
for SoCs like r8a779f0. Move the LTSSM code to a new callback
ltssm_control() and populate it for each SoC.
This also warrants the addition of new compatibles for r8a779g0 and
r8a779h0. But since they are already part of the DT binding, it won't
make any difference.
Yoshihiro Shimoda [Tue, 11 Jun 2024 12:50:54 +0000 (21:50 +0900)]
PCI: rcar-gen4: Add struct rcar_gen4_pcie_drvdata
In order to support future SoCs such as r8a779g0 (R-Car V4H) and
r8a779h0 (R-Car V4M) that require different initialization settings,
introduce SoC specific driver data with the initial member being the
device mode.
Kishon Vijay Abraham I [Fri, 28 Jun 2024 11:45:29 +0000 (13:45 +0200)]
PCI: keystone: Add workaround for Errata #i2037 (AM65x SR 1.0)
Errata #i2037 in AM65x/DRA80xM Processors Silicon Revision 1.0
(SPRZ452D_July 2018_Revised December 2019 [1]) mentions that when an
inbound PCIe TLP spans more than two internal AXI 128-byte bursts,
the bus may corrupt the packet payload and the corrupt data may
cause associated applications or the processor to hang.
The workaround for Errata #i2037 is to limit the maximum read
request size and maximum payload size to 128 bytes. Add workaround
for Errata #i2037 here.
The errata and workaround are applicable only to AM65x SR 1.0; later
versions of the silicon will have this fixed.
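In register terms, limiting both sizes to 128 bytes means writing the
encoding 0 into the Max_Payload_Size and Max_Read_Request_Size fields of the
PCIe Device Control register, since each field encodes 128 << value bytes.
The sketch below models only that encoding; the helper names are
hypothetical and this is not the keystone quirk itself.

```c
#include <stdint.h>

/* Field masks of the PCIe Device Control register. */
#define DEVCTL_PAYLOAD_MASK 0x00e0 /* Max_Payload_Size, bits 7:5 */
#define DEVCTL_READRQ_MASK  0x7000 /* Max_Read_Request_Size, bits 14:12 */

/* Decode the current maximum payload size in bytes. */
static unsigned int mps_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_PAYLOAD_MASK) >> 5);
}

/* Decode the current maximum read request size in bytes. */
static unsigned int mrrs_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_READRQ_MASK) >> 12);
}

/* Clear both fields: encoding 0 means 128 bytes. Other bits are kept. */
static uint16_t limit_to_128_bytes(uint16_t devctl)
{
	return devctl & (uint16_t)~(DEVCTL_PAYLOAD_MASK | DEVCTL_READRQ_MASK);
}
```

For example, a Device Control value of 0x502f decodes to a 256-byte payload
and a 4096-byte read request; after limit_to_128_bytes() both decode to 128
bytes and the low control bits (0x000f) are preserved.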