Bjorn Helgaas [Fri, 3 Oct 2025 17:13:19 +0000 (12:13 -0500)]
Merge branch 'pci/controller/mediatek-gen3'
- Add optional sys clock ready time setting to avoid sys_clk_rdy signal
glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)
- Add DT binding and driver support for MT6991 and MT8196 (AngeloGioacchino
Del Regno)
* pci/controller/mediatek-gen3:
PCI: mediatek-gen3: Add support for MediaTek MT8196 SoC
dt-bindings: PCI: mediatek-gen3: Add support for MT6991/MT8196
PCI: mediatek-gen3: Implement sys clock ready time setting
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:16 +0000 (12:13 -0500)]
Merge branch 'pci/controller/dwc-edma'
- Verify that if DT specifies a single IRQ for all eDMA channels, it is
named 'dma' (Niklas Cassel)
- Remove qcom edma.nr_irqs initialization, which is redundant since
dw_pcie_edma_irq_verify() initializes it based on whether the DT contains
'dma' (single IRQ) or 'dmaX' (multiple IRQs) (Niklas Cassel)
* pci/controller/dwc-edma:
PCI: qcom-ep: Remove redundant edma.nr_irqs initialization
PCI: dwc: Verify the single eDMA IRQ in dw_pcie_edma_irq_verify()
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:15 +0000 (12:13 -0500)]
Merge branch 'pci/controller/amd-mdb'
- Update DT binding example to separate PERST# to a Root Port stanza to
make multiple Root Ports possible in the future (Sai Krishna Musham)
- Add driver support for Root Port PERST# (Sai Krishna Musham)
* pci/controller/amd-mdb:
PCI: amd-mdb: Add support for PCIe RP PERST# signal handling
dt-bindings: PCI: amd-mdb: Add example usage of reset-gpios for PCIe RP PERST#
- Document sysfs interface for BAR assignment of vNTB endpoint functions
(Jerome Brunet)
- Drop superfluous pci_epc_features initialization for unsupported
features; we only have to mention features that *are* supported (Niklas
Cassel)
- Skip IRQ tests if the IRQ is out of range (Christian Bruel)
- Fix pci-epf-test for controllers with fixed-size BARs smaller than
requested by the test (Marek Vasut)
- Restore inbound translation when disabling doorbell so the doorbell test
case can be run more than once (Niklas Cassel)
- Check for NULL before releasing DMA channels to avoid a NULL pointer
dereference (Shin'ichiro Kawasaki)
- Convert tegra194 interrupt number to MSI vector to fix endpoint Kselftest
MSI_TEST test case (Niklas Cassel)
- Set tegra_pcie_epc_features.msi_capable so the pci_endpoint_test can use
the optimal IRQ type (Niklas Cassel)
- Reset tegra194 BARs when running in endpoint mode so the BAR tests don't
overwrite the ATU settings in BAR4 (Niklas Cassel)
- Handle errors in tegra194 BPMP transactions so we don't mistakenly skip
future PERST# assertion (Vidya Sagar)
* pci/endpoint:
PCI: tegra194: Handle errors in BPMP response
PCI: tegra194: Reset BARs when running in PCIe endpoint mode
PCI: tegra194: Set pci_epc_features::msi_capable to true
PCI: tegra194: Fix broken tegra_pcie_ep_raise_msi_irq()
PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release
PCI: endpoint: pci-epf-test: Fix doorbell test support
PCI: endpoint: pci-epf-test: Limit PCIe BAR size for fixed BARs
selftests: pci_endpoint: Skip IRQ test if IRQ is out of range.
misc: pci_endpoint_test: Cleanup extra 0 initialization
misc: pci_endpoint_test: Skip IRQ tests if irq is out of range
PCI: endpoint: Drop superfluous pci_epc_features initialization
Documentation: PCI: endpoint: Document BAR assignment
misc: pci_endpoint_test: Fix array underflow in pci_endpoint_test_ioctl()
PCI: endpoint: pci-ep-msi: Fix NULL vs IS_ERR() check in pci_epf_write_msi_msg()
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:14 +0000 (12:13 -0500)]
Merge branch 'pci/dt-binding'
- Correct indentation in qcom,pcie-sa8255p.yaml and
amd,versal2-mdb-host.yaml so they indent with four spaces consistently
(Krzysztof Kozlowski)
- Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
Chundru)
- Add Peripheral Virtualization Unit (PVU), which restricts DMA from PCIe
devices to specific regions of host memory, to the ti,am65 binding (Jan
Kiszka)
- Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
Glymur, which is compatible with X1E80100 but doesn't have the
cnoc_sf_axi clock (Qiang Yu)
* pci/dt-binding:
dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
dt-bindings: PCI: ti,am65: Extend for use with PVU
dt-bindings: PCI: qcom,pcie-sm8550: Add SM8750 compatible
dt-bindings: PCI: Correct example indentation
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:14 +0000 (12:13 -0500)]
Merge branch 'pci/capability-search'
- Simplify __pci_find_next_cap_ttl() by replacing magic numbers with
#defines, extracting fields with FIELD_GET(), etc (Hans Zhang)
- Convert __pci_find_next_cap_ttl() to a PCI_FIND_NEXT_CAP() macro that
takes a config space accessor function so we can also use it in cases
where the usual config accessors aren't available (Hans Zhang)
- Similarly convert pci_find_next_ext_capability() to a
PCI_FIND_NEXT_EXT_CAP() macro (Hans Zhang)
- Implement dwc, dwc endpoint, and cadence capability search interfaces on
top of PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP(), replacing the
previous duplicated code (Hans Zhang)
- Search for capabilities in the cadence core instead of hard-coding their
offsets, which are subject to change (Hans Zhang)
* pci/capability-search:
PCI: cadence: Use cdns_pcie_find_*capability() to avoid hardcoding offsets
PCI: cadence: Implement capability search using PCI core APIs
PCI: dwc: ep: Implement capability search using PCI core APIs
PCI: dwc: Implement capability search using PCI core APIs
PCI: Refactor extended capability search into PCI_FIND_NEXT_EXT_CAP()
PCI: Refactor capability search into PCI_FIND_NEXT_CAP()
PCI: Clean up __pci_find_next_cap_ttl() readability
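The idea of a capability search that takes a config space accessor can be sketched in user space. This is a simplified model under stated assumptions, not the kernel's PCI_FIND_NEXT_CAP() macro: `cfg_read8_fn`, `find_next_cap()`, and `fake_read8()` are hypothetical names, and a 256-byte array stands in for config space.

```c
#include <stdint.h>

#define CAP_LIST_ID    0     /* capability ID byte within a capability */
#define CAP_LIST_NEXT  1     /* next-capability pointer byte */
#define CAP_MIN_POS    0x40  /* capabilities live above the standard header */
#define CAP_TTL        48    /* bound the walk so a looping list terminates */

/* Hypothetical accessor type: any function that can read one config byte. */
typedef uint8_t (*cfg_read8_fn)(void *priv, unsigned int offset);

/*
 * Walk the capability list. ptr_offset is the config offset that holds the
 * pointer to the first capability to examine. Returns the position of the
 * matching capability, or 0 if it is not found.
 */
static uint8_t find_next_cap(cfg_read8_fn read8, void *priv,
                             uint8_t ptr_offset, uint8_t cap_id)
{
    uint8_t pos = read8(priv, ptr_offset);
    int ttl = CAP_TTL;

    while (ttl-- > 0) {
        uint8_t id;

        if (pos < CAP_MIN_POS)
            break;
        pos &= 0xfc;                      /* pointers are dword aligned */
        id = read8(priv, pos + CAP_LIST_ID);
        if (id == 0xff)                   /* device not responding */
            break;
        if (id == cap_id)
            return pos;
        pos = read8(priv, pos + CAP_LIST_NEXT);
    }
    return 0;
}

/* Accessor for the model: config space is just a byte array. */
static uint8_t fake_read8(void *priv, unsigned int offset)
{
    const uint8_t *cfg = priv;

    return cfg[offset & 0xff];
}
```

Because the walk only depends on the accessor, the same loop can serve a controller driver that reads its own registers before the usual config accessors are available.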
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:12 +0000 (12:13 -0500)]
Merge branch 'pci/virtualization'
- Add rescan/remove locking when enabling/disabling SR-IOV, which solves
list corruption on s390, where disabling SR-IOV also generates hotplug
events (Niklas Schnelle)
- Add lockdep assertion in pci_stop_and_remove_bus_device() to catch
device removal without appropriate locking (Niklas Schnelle)
* pci/virtualization:
PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:12 +0000 (12:13 -0500)]
Merge branch 'pci/resource'
- Ensure relaxed tail alignment does not increase min_align when computing
bridge window size, to fix a regression (Ilpo Järvinen)
- Fix bridge window size computation to fix a regression for devices with
undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)
- Fix error handling during resource resize to fix a regression in amdgpu
(Ilpo Järvinen)
- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)
- Remove several sparc pcibios_enable_device() implementations that don't
do anything beyond what pci_enable_resources() does (Ilpo Järvinen)
- Remove mips pcibios_enable_resources() and use pci_enable_resources()
instead (Ilpo Järvinen)
- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)
- Claim bridge windows before setting them up (Ilpo Järvinen)
- Disable non-claimed bridge windows so the kernel's view matches the
hardware configuration (Ilpo Järvinen)
- Use pci_release_resource() instead of release_resource() to reduce code
duplication and increase consistency (Ilpo Järvinen)
- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)
- Preserve bridge window resource type flags when assignment fails because
we may need it later (Ilpo Järvinen)
- Add bridge window selection functions to make the selection consistent
across the several places that do this (Ilpo Järvinen)
- Warn if bridge window cannot be released when resizing BAR (Ilpo
Järvinen)
- Set up bridge resources before enumerating children so we can check
whether child resources are inside bridge windows (Ilpo Järvinen)
* pci/resource:
PCI: Set up bridge resources earlier
PCI: Don't print stale information about resource
PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
PCI: Pass bridge window to pci_bus_release_bridge_resources()
PCI: Add pci_setup_one_bridge_window()
PCI: Refactor remove_dev_resources() to use pbus_select_window()
PCI: Refactor distributing available memory to use loops
PCI: Use pbus_select_window_for_type() during mem window sizing
PCI: Use pbus_select_window() in space available checker
PCI: Rename resource variable from r to res
PCI: Use pbus_select_window_for_type() during IO window sizing
PCI: Use pbus_select_window() during BAR resize
PCI: Warn if bridge window cannot be released when resizing BAR
PCI: Fix finding bridge window in pci_reassign_bridge_resources()
PCI: Add bridge window selection functions
PCI: Add defines for bridge window indexing
PCI: Preserve bridge window resource type flags
PCI: Enable bridge even if bridge window fails to assign
PCI: Use pci_release_resource() instead of release_resource()
PCI: Disable non-claimed bridge window
PCI: Always claim bridge window before its setup
PCI: Refactor find_bus_resource_of_type() logic checks
PCI: Move find_bus_resource_of_type() earlier
MIPS: PCI: Use pci_enable_resources()
sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
PCI: Fix failure detection during resource resize
PCI: Fix pdev_resources_assignable() disparity
PCI: Ensure relaxed tail alignment does not increase min_align
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:11 +0000 (12:13 -0500)]
Merge branch 'pci/pwrctrl'
- Fix a double cleanup of regulators if devm_add_action_or_reset() fails
(Geert Uytterhoeven)
* pci/pwrctrl:
PCI/pwrctrl: Fix device leak at device stop
PCI/pwrctrl: Fix device and OF node leak at bus scan
PCI/pwrctrl: Fix device leak at registration
PCI/pwrctrl: Fix double cleanup on devm_add_action_or_reset() failure
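The double cleanup mentioned in the summary above follows from devm_add_action_or_reset() running the action itself when registration fails. A minimal user-space model, assuming hypothetical names (`add_action_or_reset()`, `struct supplies`, `probe_model()`) rather than the pwrctrl driver's code:

```c
#include <errno.h>

typedef void (*action_fn)(void *data);

/*
 * Model of the "or_reset" contract: if the action cannot be registered,
 * it is invoked immediately and an error is returned.
 */
static int add_action_or_reset(int registration_ok, action_fn action,
                               void *data)
{
    if (!registration_ok) {
        action(data);
        return -ENOMEM;
    }
    return 0;
}

struct supplies {
    int enabled;
    int disable_calls;   /* counts cleanups so a double cleanup is visible */
};

static void supplies_disable(void *data)
{
    struct supplies *s = data;

    s->enabled = 0;
    s->disable_calls++;
}

static int probe_model(struct supplies *s, int registration_ok)
{
    int ret;

    s->enabled = 1;
    ret = add_action_or_reset(registration_ok, supplies_disable, s);
    if (ret)
        return ret;   /* no explicit supplies_disable() here: already done */
    return 0;
}
```

Calling supplies_disable() again in the error path of probe_model() would be the double cleanup.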
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:11 +0000 (12:13 -0500)]
Merge branch 'pci/pm'
- If a device has already been disconnected, e.g., by a hotplug removal,
don't bother trying to resume it to D0 when detaching the driver (Mario
Limonciello)
- Ensure devices are powered up before config reads for 'max_link_width',
'current_link_speed', 'current_link_width', 'secondary_bus_number', and
'subordinate_bus_number' sysfs files (Brian Norris)
* pci/pm:
PCI/sysfs: Ensure devices are powered for config reads
PCI/PM: Skip resuming to D0 if device is disconnected
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:10 +0000 (12:13 -0500)]
Merge branch 'pci/of'
- Leave parent unit address 0 in 'interrupt-map' so we can build this
property even when interrupt controllers lack 'reg' properties (Lorenzo
Pieralisi)
* pci/of:
PCI: of: Update parent unit address generation in of_pci_prop_intr_map()
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:08 +0000 (12:13 -0500)]
Merge branch 'pci/enumeration'
- Use PCI_HEADER_TYPE_* defines, not hard-coded values (Ilpo Järvinen)
- Clean up early_dump_pci_device() to avoid hard-coded values (Ilpo
Järvinen)
- Clean up pci_scan_child_bus_extend() loop to avoid hard-coded values
(Ilpo Järvinen)
- Add a Xeon 6 quirk to disable Extended Tags and limit Max Read Request
Size to 128B to avoid a performance issue (Ilpo Järvinen)
* pci/enumeration:
PCI: Add Extended Tag + MRRS quirk for Xeon 6
PCI: Clean up pci_scan_child_bus_extend() loop
PCI: Clean up early_dump_pci_device()
PCI: Use header type defines in pci_setup_device()
Bjorn Helgaas [Fri, 3 Oct 2025 17:13:07 +0000 (12:13 -0500)]
Merge branch 'pci/aer'
- Allow drivers to request a Bus Reset on Non-Fatal Errors (Lukas Wunner)
- Send uevents for subordinate devices (not the bridge) on failure to
recover from errors on the subordinate devices (Lukas Wunner)
- Notify drivers by calling their err_handler.error_detected() callback on
failure to recover (Lukas Wunner)
- Update device error_state earlier after reset to align AER and EEH error
recovery (Lukas Wunner)
- Remove obsolete comments about .link_reset(), which was removed long ago
(Lukas Wunner)
- Emit a uevent for the beginning of error recovery if driver requests a
reset (Niklas Schnelle)
- Emit error recover uevents on s390 as is done by EEH and AER (Niklas
Schnelle)
- Include error_detected() result in AER uevent to align with corresponding
uevents from EEH and s390 (Niklas Schnelle)
- Decode new errors added in PCIe r6.0 (Lukas Wunner)
- Print TLP Log for errors introduced since PCIe spec r1.1 (Lukas Wunner)
- Check for allocation failure in pci_aer_init() (Vernon Yang)
- Update error recovery documentation to match the current code and use
consistent nomenclature (Lukas Wunner)
- Avoid NULL pointer dereference in aer_ratelimit() when GHES error info
points to a device with no AER Capability (Breno Leitao)
* pci/aer:
PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
Documentation: PCI: Fix typos
Documentation: PCI: Tidy error recovery doc's PCIe nomenclature
Documentation: PCI: Amend error recovery doc with DPC/AER specifics
Documentation: PCI: Sync error recovery doc with code
Documentation: PCI: Sync AER doc with code
PCI/AER: Fix NULL pointer access by aer_info
PCI/AER: Print TLP Log for errors introduced since PCIe r1.1
PCI/AER: Support errors introduced by PCIe r6.0
powerpc/eeh: Use result of error_detected() in uevent
s390/pci: Use pci_uevent_ers() in PCI recovery
PCI/AER: Fix missing uevent on recovery when a reset is requested
PCI/ERR: Remove remnants of .link_reset() callback
PCI/ERR: Update device error_state already after reset
PCI/ERR: Notify drivers on failure to recover
PCI/ERR: Fix uevent on failure to recover
PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
When platform firmware supplies error information to the OS, e.g., via the
ACPI APEI GHES mechanism, it may identify an error source device that
doesn't advertise an AER Capability and therefore dev->aer_info, which
contains AER stats and ratelimiting data, is NULL.
pci_dev_aer_stats_incr() already checks dev->aer_info for NULL, but
aer_ratelimit() did not, leading to NULL pointer dereferences like this one
from the URL below:
PCI: j721e: Fix incorrect error message in probe()
The probe() function prints "pm_runtime_get_sync failed" when
j721e_pcie_ctrl_init() returns an error. This is misleading since
the failure is not from pm_runtime, but from the controller init
routine. Update the error message to correctly reflect the source.
PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
Commit under Fixes introduced the IRQ handler for "ks-pcie-error-irq".
The interrupt is acquired using "request_irq()" but is never freed if
the driver exits due to an error. Although the section in the driver that
invokes "request_irq()" has moved around over time, the issue hasn't been
addressed until now.
Fix this by using "devm_request_irq()" which automatically frees the
interrupt if the driver exits.
dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
On the Qualcomm Glymur platform, the fifth PCIe host is compatible with
the DWC controller present on the X1E80100 platform, but does not have
cnoc_sf_axi clock. Hence, set minItems of clocks and clock-names to six.
Niklas Schnelle [Tue, 26 Aug 2025 08:52:09 +0000 (10:52 +0200)]
PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
Removing a PCI device requires holding pci_rescan_remove_lock. Prompted by
this being missed in sriov_disable() and going unnoticed since its
inception, add a lockdep assert so this doesn't get missed again in the
future.
Niklas Schnelle [Tue, 26 Aug 2025 08:52:08 +0000 (10:52 +0200)]
PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.
Since commit 9d16947b7583 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d150fc
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.
On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:
This is because, in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. These are the
reverse of the hotplug events generated by sriov_enable() and handled via
pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.
Other races may also be possible, of course, though given that this lack of
locking persisted so long, observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.
Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.
Fixes: 18f9e9d150fc ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
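The locking rule from the two commits above (removal must happen under the rescan-remove lock, and the removal path asserts it) can be modeled in user space. This is only a sketch: `struct tracked_lock`, `stop_and_remove_device()`, and `disable_vfs()` are hypothetical stand-ins, and a plain assert() plays the role that a lockdep assertion plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* A lock that only records whether it is held; enough for the assertion. */
struct tracked_lock {
    bool held;
};

static void tracked_lock_acquire(struct tracked_lock *lock)
{
    assert(!lock->held);
    lock->held = true;
}

static void tracked_lock_release(struct tracked_lock *lock)
{
    assert(lock->held);
    lock->held = false;
}

static struct tracked_lock rescan_remove_lock;

struct device_model {
    bool present;
};

/* Removal asserts the lock, so a caller that forgot it fails loudly. */
static void stop_and_remove_device(struct device_model *dev)
{
    assert(rescan_remove_lock.held);
    dev->present = false;
}

/* Model of the fixed VF teardown: the whole loop runs under the lock. */
static void disable_vfs(struct device_model *vfs, int count)
{
    tracked_lock_acquire(&rescan_remove_lock);
    for (int i = 0; i < count; i++)
        stop_and_remove_device(&vfs[i]);
    tracked_lock_release(&rescan_remove_lock);
}
```

Dropping the acquire/release pair from disable_vfs() makes the assertion in stop_and_remove_device() fire, which is the kind of early detection the lockdep assertion is meant to provide.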
Ilpo Järvinen [Wed, 24 Sep 2025 13:42:27 +0000 (16:42 +0300)]
PCI: Set up bridge resources earlier
Bridge windows are read twice from PCI Config Space, the first time from
pci_read_bridge_windows(), which does not set up the device's resources.
This causes problems down the road as child resources of the bridge cannot
check whether they reside within the bridge window or not.
Set up the bridge windows already in pci_read_bridge_windows().
Ilpo Järvinen [Wed, 24 Sep 2025 13:56:41 +0000 (16:56 +0300)]
PCI: Don't print stale information about resource
pbus_size_mem() logs the bridge window resource using pci_info() before the
start and end fields of the resource have been updated which then prints
stale information.
Set resource addresses earlier to make understanding logs easier.
Regrettably, this results in setting the addresses multiple times but that
seems unavoidable.
The return value from tegra_bpmp_transfer() indicates the success or
failure of the IPC transaction with BPMP. If the transaction succeeded, we
also need to check the actual command's result code.
If we don't have error handling for tegra_bpmp_transfer(), we will set the
pcie->ep_state to EP_STATE_ENABLED even when the tegra_bpmp_transfer()
command fails. Thus, the pcie->ep_state will get out of sync with reality,
and any further PERST# assert + deassert will be a no-op and will not
trigger the hardware initialization sequence.
This is because pex_ep_event_pex_rst_deassert() checks the current
pcie->ep_state, and does nothing if the current state is already
EP_STATE_ENABLED.
Thus, it is important to have error handling for tegra_bpmp_transfer(),
such that the pcie->ep_state can not get out of sync with reality, so that
we will try to initialize the hardware not only during the first PERST#
assert + deassert, but also during any succeeding PERST# assert + deassert.
One example where this fix is needed is when using a rock5b as host.
During the initial PERST# assert + deassert (triggered by the bootloader on
the rock5b) pex_ep_event_pex_rst_deassert() will get called, but for some
unknown reason, the tegra_bpmp_transfer() call to initialize the PHY fails.
Once Linux has been loaded on the rock5b, the PCIe driver will once again
assert + deassert PERST#. However, without tegra_bpmp_transfer() error
handling, this second PERST# assert + deassert will not trigger the
hardware initialization sequence.
With tegra_bpmp_transfer() error handling, the second PERST# assert +
deassert will once again trigger the hardware to be initialized and this
time the tegra_bpmp_transfer() succeeds.
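The two-level check described above (transport result first, then the command's own result code) and its effect on the endpoint state can be sketched in user space. All names here (`struct ipc_reply`, `checked_transfer()`, `handle_perst_deassert()`, and the fake transfer functions) are hypothetical; this is not the tegra194 driver code.

```c
#include <errno.h>

enum ep_state { EP_STATE_DISABLED, EP_STATE_ENABLED };

struct ipc_reply {
    int ret;   /* result code reported by the remote firmware */
};

typedef int (*ipc_transfer_fn)(struct ipc_reply *reply);

/* Fake transports for the model. */
static int transfer_ok(struct ipc_reply *reply)
{
    reply->ret = 0;
    return 0;
}

static int transfer_ipc_fail(struct ipc_reply *reply)
{
    (void)reply;
    return -ETIMEDOUT;   /* the transaction itself failed */
}

static int transfer_cmd_fail(struct ipc_reply *reply)
{
    reply->ret = -1;     /* transaction succeeded, command did not */
    return 0;
}

static int checked_transfer(ipc_transfer_fn transfer)
{
    struct ipc_reply reply = { 0 };
    int err = transfer(&reply);

    if (err)
        return err;
    if (reply.ret)
        return -EINVAL;
    return 0;
}

static int handle_perst_deassert(enum ep_state *state,
                                 ipc_transfer_fn transfer)
{
    int err;

    if (*state == EP_STATE_ENABLED)
        return 0;        /* already initialized: nothing to do */

    err = checked_transfer(transfer);
    if (err)
        return err;      /* state stays DISABLED, so a later deassert retries */

    *state = EP_STATE_ENABLED;
    return 0;
}
```

With the error propagated, a failed first attempt leaves the state at EP_STATE_DISABLED, so the second PERST# deassert runs the initialization again instead of being a no-op.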
PCI: tegra194: Reset BARs when running in PCIe endpoint mode
Tegra already defines all BARs except BAR0 as BAR_RESERVED. This is
sufficient for pci-epf-test to not allocate backing memory and to not call
set_bar() for those BARs. However, marking a BAR as BAR_RESERVED does not
mean that the BAR gets disabled.
The host side driver, pci_endpoint_test, simply does an ioremap for all
enabled BARs and will run tests against all enabled BARs, so it will run
tests against the BARs marked as BAR_RESERVED.
After running the BAR tests (which will write to all enabled BARs), the
inbound address translation is broken. This is because the tegra controller
exposes the ATU Port Logic Structure in BAR4, so when BAR4 is written, the
inbound address translation settings get overwritten.
To avoid this, implement the dw_pcie_ep_ops .init() callback and start off
by disabling all BARs (pci-epf-test will later enable/configure BARs that
are not defined as BAR_RESERVED).
This matches the behavior of other PCIe endpoint drivers: dra7xx, imx6,
layerscape-ep, artpec6, dw-rockchip, qcom-ep, rcar-gen4, and uniphier-ep.
With this, the PCI endpoint kselftest test case CONSECUTIVE_BAR_TEST (which
was specifically made to detect address translation issues) passes.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250922140822.519796-7-cassel@kernel.org
Brian Norris [Wed, 24 Sep 2025 16:57:11 +0000 (09:57 -0700)]
PCI/sysfs: Ensure devices are powered for config reads
The "max_link_width", "current_link_speed", "current_link_width",
"secondary_bus_number", and "subordinate_bus_number" sysfs files all access
config registers, but they don't check the runtime PM state. If the device
is in D3cold or a parent bridge is suspended, we may see -EINVAL, bogus
values, or worse, depending on implementation details.
Wrap these accesses in pci_config_pm_runtime_{get,put}() like most of the
rest of the similar sysfs attributes.
Notably, "max_link_speed" does not access config registers; it returns a
cached value since d2bd39c0456b ("PCI: Store all PCIe Supported Link
Speeds").
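The get/read/put pattern described above can be modeled in user space. This is a sketch with hypothetical names (`struct dev_model`, `pm_runtime_get_model()`, `current_link_width_show_model()`), not the sysfs code itself; the only hardware fact it relies on is that the negotiated link width occupies bits 9:4 of the PCIe Link Status register.

```c
#include <errno.h>
#include <stdint.h>

struct dev_model {
    int usage_count;        /* runtime PM usage counter */
    int powered;            /* 1 while config space is accessible */
    uint16_t link_status;   /* stand-in for the Link Status register */
};

static void pm_runtime_get_model(struct dev_model *dev)
{
    if (dev->usage_count++ == 0)
        dev->powered = 1;   /* first user resumes the device */
}

static void pm_runtime_put_model(struct dev_model *dev)
{
    if (--dev->usage_count == 0)
        dev->powered = 0;   /* last user lets it suspend again */
}

/* Config reads fail while the device is not powered. */
static int read_link_status(const struct dev_model *dev, uint16_t *val)
{
    if (!dev->powered)
        return -EINVAL;
    *val = dev->link_status;
    return 0;
}

/* The attribute wraps the config read in a get/put pair. */
static int current_link_width_show_model(struct dev_model *dev,
                                         unsigned int *width)
{
    uint16_t status;
    int err;

    pm_runtime_get_model(dev);
    err = read_link_status(dev, &status);
    pm_runtime_put_model(dev);
    if (err)
        return err;

    *width = (status >> 4) & 0x3f;   /* negotiated link width, bits 9:4 */
    return 0;
}
```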
pci_epc_raise_irq() supplies an MSI or MSI-X interrupt number in the range
1-N, as per the pci_epc_raise_irq() kdoc, where N is 32 for MSI.
But tegra_pcie_ep_raise_msi_irq() incorrectly uses the interrupt number as
the MSI vector. This causes the wrong MSI vector to be triggered, leading to
the failure of the PCI endpoint Kselftest MSI_TEST test case.
To fix this issue, convert the interrupt number to MSI vector.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250922140822.519796-6-cassel@kernel.org
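The conversion from a 1-based interrupt number to a 0-based MSI vector can be sketched as follows. The helper name `msi_irq_to_vector_mask()` and the assumption that the hardware takes a one-hot bitmask of the vector are illustrative only, not taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

#define MSI_MAX_VECTORS 32

/*
 * Convert an interrupt number in the range 1..32 into a one-hot mask of
 * the corresponding 0-based MSI vector.
 */
static int msi_irq_to_vector_mask(unsigned int irq, uint32_t *mask)
{
    if (irq < 1 || irq > MSI_MAX_VECTORS)
        return -EINVAL;

    *mask = UINT32_C(1) << (irq - 1);   /* interrupt 1 is vector 0 */
    return 0;
}
```

Using the interrupt number directly as the vector would select bit 1 for interrupt 1 and would overflow a 32-bit mask for interrupt 32, which is the kind of off-by-one the commit describes.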
Since the PCI subsystem has started enabling all ASPM states for all
devicetree based platforms, the ASPM enablement code from this driver can
now be dropped.
PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms
So far, the PCI subsystem has honored the ASPM and Clock PM states set by
the BIOS (through LNKCTL) during device initialization, if it relies on the
default state selected using:
* Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
* cmdline: "pcie_aspm=off", or
* FADT: ACPI_FADT_NO_ASPM
This was done conservatively to avoid issues with the buggy devices that
advertise ASPM capabilities, but behave erratically if the ASPM states are
enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
ASPM states that were known to work for the devices.
But this turned out to be a problem for devicetree platforms, especially
the ARM based devicetree platforms powering Embedded and *some* Compute
devices as they tend to run without any standard BIOS. So the ASPM states
on these platforms were left disabled during boot and the PCI subsystem
never bothered to enable them, unless the user forcibly enabled the ASPM
states through Kconfig, the command line, or sysfs, or the device drivers
themselves enabled the ASPM states through pci_enable_link_state() APIs.
This caused runtime power issues on those platforms. So a couple of
approaches were tried to mitigate this BIOS dependency without user
intervention by enabling the ASPM states in the PCI controller drivers
after device enumeration, and overriding the ASPM/Clock PM states
by the PCI controller drivers through an API before enumeration.
But it has been concluded that none of these mitigations should really be
required and the PCI subsystem should enable the ASPM states advertised by
the devices without relying on BIOS or the PCI controller drivers. If any
devices are found to misbehave after enabling the ASPM states that they
advertise, then those devices should be quirked to disable the problematic
ASPM/Clock PM states.
In an effort to do so, start by overriding the ASPM and Clock PM states set
by the BIOS for devicetree platforms first. Separate helper functions are
introduced to override the BIOS set states by enabling all of them if
of_have_populated_dt() returns true. To aid debugging, print the overridden
ASPM and Clock PM states as well.
In the future, these helpers could be extended to allow other platforms,
such as VMD or newer ACPI systems with a cutoff year, to follow the same path.
Mario Limonciello [Tue, 9 Sep 2025 03:19:15 +0000 (22:19 -0500)]
PCI/PM: Skip resuming to D0 if device is disconnected
When a device is surprise-removed (e.g., due to a dock unplug), the PCI
core unconfigures all downstream devices and sets their error state to
pci_channel_io_perm_failure. This marks them as disconnected via
pci_dev_is_disconnected().
During device removal, the runtime PM framework may attempt to resume the
device to D0 via pm_runtime_get_sync(), which calls into pci_power_up().
Since the device is already disconnected, this resume attempt is
unnecessary and results in predictable errors like this, typically when
undocking from a TBT3 or USB4 dock with PCIe tunneling:
pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
Avoid powering up disconnected devices by checking their status early in
pci_power_up() and returning -EIO.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[bhelgaas: add typical message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250909031916.4143121-1-superm1@kernel.org
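The early return described above can be modeled in user space. The types and names (`struct dev_model`, `dev_is_disconnected()`, `power_up_model()`) are hypothetical; only the behavior (return -EIO before touching a disconnected device) comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

enum channel_state {
    CHANNEL_IO_NORMAL,
    CHANNEL_IO_FROZEN,
    CHANNEL_IO_PERM_FAILURE,   /* device is gone for good */
};

struct dev_model {
    enum channel_state error_state;
    int power_state;           /* 0 models D0, 3 models D3cold */
    int config_accesses;       /* counts attempted config accesses */
};

static bool dev_is_disconnected(const struct dev_model *dev)
{
    return dev->error_state == CHANNEL_IO_PERM_FAILURE;
}

static int power_up_model(struct dev_model *dev)
{
    /* Bail out before any config access: it could only fail. */
    if (dev_is_disconnected(dev))
        return -EIO;

    dev->config_accesses++;
    dev->power_state = 0;
    return 0;
}
```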
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:13 +0000 (16:11 +0300)]
PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
Recursing into pci_bus_release_bridge_resources() should not alter rel_type
because it makes no sense to change the release type within the recursion
call chain. A literal "whole_subtree" is passed into the recursion instead
of "rel_type" parameter which is misleading as the release type should
remain the same throughout the entire operation.
This is not a correctness issue because of the preceding if () that only
allows the recursion to happen if rel_type is "whole_subtree". Still,
replace the non-intuitive parameter with direct passing of "rel_type".
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:12 +0000 (16:11 +0300)]
PCI: Pass bridge window to pci_bus_release_bridge_resources()
pci_bus_release_bridge_resources() takes type, which is converted into a
bridge window resource in pci_bridge_release_resources().
Find out the correct bridge window for the resource whose assignment failed.
Pass that bridge window to pci_bus_release_bridge_resources() instead of
passing the type. When recursing to subordinate, check which bridge windows
have to be released and recurse for each.
For now, use pbus_select_window_for_type() instead of pbus_select_window()
because non-bridge window resources still have their flags reset which
destroys the type information from the struct resource. The struct
pci_dev_resource holds a copy of the flags which are used instead.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:11 +0000 (16:11 +0300)]
PCI: Add pci_setup_one_bridge_window()
pci_bridge_release_resources() contains a resource type hack to work
around the unsuitable __pci_setup_bridge() interface. Extract the
switch statement that picks the correct bridge window setup function
from pci_claim_bridge_resource() into pci_setup_one_bridge_window() and
use it also in pci_bridge_release_resources().
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:10 +0000 (16:11 +0300)]
PCI: Refactor remove_dev_resources() to use pbus_select_window()
Convert remove_dev_resources() to use pbus_select_window(). Because
'available' does not hold the real resources, the index has to be adjusted:
only the bridge resource counterparts are present in the 'available' array.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:09 +0000 (16:11 +0300)]
PCI: Refactor distributing available memory to use loops
pci_bus_distribute_available_resources() and
pci_bridge_distribute_available_resources() retain bridge window resources
and related data needed for distributing the available window in
independent variables for io, memory, and prefetchable memory windows. The
code is essentially the same for all of them and therefore repeated three
times with different variable names.
Refactor pci_bus_distribute_available_resources() to take an array. This
is complicated slightly by the function taking advantage of passing the
struct as value, which cannot be done for arrays in C. Therefore, copy the
data into a local array in the stack in the first loop.
Variable names are (hopefully) improved slightly as well.
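The point about arrays not being passable by value in C can be illustrated with a small sketch. `struct window`, `WINDOW_TYPES`, and `distribute_available()` are hypothetical; the function only shows the "copy into a local array in the first loop" idea, not the kernel's distribution logic.

```c
#define WINDOW_TYPES 3   /* io, memory, prefetchable memory */

struct window {
    unsigned long start;
    unsigned long end;   /* exclusive, to keep the arithmetic simple */
};

/*
 * An array parameter decays to a pointer, so the callee would modify the
 * caller's data. Copy the entries into a local array first, then work on
 * the copy. Returns the total space left after reserving 'reserve' bytes
 * from each window.
 */
static unsigned long distribute_available(const struct window *available_in,
                                          unsigned long reserve)
{
    struct window available[WINDOW_TYPES];
    unsigned long total = 0;

    /* First loop: take a private copy (struct assignment copies by value). */
    for (int i = 0; i < WINDOW_TYPES; i++)
        available[i] = available_in[i];

    /* Second loop: the same code handles all three window types. */
    for (int i = 0; i < WINDOW_TYPES; i++) {
        unsigned long size = available[i].end - available[i].start;
        unsigned long take = reserve < size ? reserve : size;

        available[i].start += take;
        total += available[i].end - available[i].start;
    }
    return total;
}
```

The caller's array is left untouched, which is what passing a struct by value gave the original code for free.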
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:08 +0000 (16:11 +0300)]
PCI: Use pbus_select_window_for_type() during mem window sizing
__pci_bus_size_bridges() goes to great lengths to help pbus_size_mem()
decide which resource types to put into a particular bridge window, which
requires passing up to three resource types into pbus_size_mem().
Instead of having complex logic in __pci_bus_size_bridges() and a
non-straightforward interface between those functions, use
pbus_select_window_for_type() and pbus_select_window() to find the correct
bridge window and compare if the resources belong to that window.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:07 +0000 (16:11 +0300)]
PCI: Use pbus_select_window() in space available checker
pbus_upstream_space_available() figures out the upstream bridge window
resources on its own. Migrate it to use pbus_select_window().
Note: pbus_select_window() -> pbus_select_window_for_type() calls
find_bus_resource_of_type() for the root bus, which does not do a parent check
similar to what pbus_upstream_space_available() did earlier, but the
difference does not matter because pbus_upstream_space_available() itself
stops when it encounters the root bus.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:06 +0000 (16:11 +0300)]
PCI: Rename resource variable from r to res
Resource is going to be passed in as an argument after an upcoming change.
Rename the struct resource variable from "r" to "res" to avoid using a
one-letter variable name in a function argument.
This rename is made separately to reduce churn in the upcoming change.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:04 +0000 (16:11 +0300)]
PCI: Use pbus_select_window() during BAR resize
Prior to a BAR resize, __resource_resize_store() loops through the normal
resources of the PCI device and releases those that match the flags of
the BAR to be resized. This is necessary to allow resizing also the
upstream bridge window as only childless bridge windows can be resized.
While the flags check (mostly) works (if corner cases are ignored), the
more straightforward way is to check if the resources share the bridge
window. Change __resource_resize_store() to do the check using
pbus_select_window().
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:03 +0000 (16:11 +0300)]
PCI: Warn if bridge window cannot be released when resizing BAR
BAR resizing calls pci_reassign_bridge_resources(), which attempts to
release any upstream bridge windows to allow them to accommodate the new BAR
size. The release can only be performed if there are no other child
resources for the bridge window. Previously the code continued silently
when other child resources were detected.
Add pci_warn() to inform the user that a bridge window could not be released
because of child resources. As a small bridge window is often the reason
why BAR resize fails, this warning will help pinpoint the cause.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:02 +0000 (16:11 +0300)]
PCI: Fix finding bridge window in pci_reassign_bridge_resources()
pci_reassign_bridge_resources() walks upwards in the PCI bus hierarchy,
locates the relevant bridge window on each level using flags check, and
attempts to release the bridge window. The flags-based check is fragile due
to various fallbacks in the bridge window selection logic. As such, the
algorithm might not locate the correct bridge window.
Refactor pci_reassign_bridge_resources() to determine the correct bridge
window using pbus_select_window(), which contains logic to handle all
fallback cases correctly. Change function prefix to pbus as it now inputs
struct bus and resource for which to locate the bridge window.
The main purpose is to make bridge window selection logic consistent across
the entire PCI core (one step at a time). While this technically also fixes
the commit 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing
BARs") making the bridge window walk algorithm more robust, the normal
setup having a 64-bit resizable BAR underneath bridge(s) with 64-bit
prefetchable windows does not need to use any fallbacks. As such, the
practical impact is low (requiring BAR resize use case and a non-typical
bridge device).
The way to detect if unrelated resource failed again is left to use the
type based approximation which should not behave worse than before.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:01 +0000 (16:11 +0300)]
PCI: Add bridge window selection functions
Various places in the PCI core code independently decide into which bridge
window a child resource should be placed. It is hard to see whether these
decisions always end up in agreement, especially in the corner cases, and
in some places it requires complex logic to pass multiple resource types
and/or bridge windows around.
Add pbus_select_window() and pbus_select_window_for_type() for cases where
the former cannot be used so that eventually the same helper can be used to
select the bridge window everywhere. Using the same function ensures the
selected bridge window always remains the same and can be easily
recalculated in situ, which allows simplifying the interfaces between
internal functions in upcoming changes.
Ilpo Järvinen [Fri, 29 Aug 2025 13:11:00 +0000 (16:11 +0300)]
PCI: Add defines for bridge window indexing
include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines,
however, they're based on the resource array indexing in the pci_dev
struct. The struct pci_bus also has pointers to those same resources but
they start from zeroth index.
Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal
indexing.
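A minimal sketch of the two index spaces described above. All DEMO_* names and the numeric layout are assumptions made for illustration; the real definitions live in include/linux/pci.h and may differ (for example, additional resources may sit between the ROM and the bridge windows).

```c
#include <assert.h>

/* Illustrative device-side layout: six BARs, the ROM, then bridge windows. */
enum demo_dev_resource {
	DEMO_ROM_RESOURCE = 6,
	DEMO_BRIDGE_RESOURCES,		/* first bridge window slot */
	DEMO_BRIDGE_IO_WINDOW = DEMO_BRIDGE_RESOURCES,
	DEMO_BRIDGE_MEM_WINDOW,
	DEMO_BRIDGE_PREF_MEM_WINDOW,
};

/* Bus-side indices start from zero, like the new PCI_BUS_BRIDGE_* idea. */
enum demo_bus_resource {
	DEMO_BUS_BRIDGE_IO_WINDOW,
	DEMO_BUS_BRIDGE_MEM_WINDOW,
	DEMO_BUS_BRIDGE_PREF_MEM_WINDOW,
};

/* Translate a device-side bridge window index to the bus-side index. */
static int demo_dev_to_bus_index(int dev_idx)
{
	assert(dev_idx >= DEMO_BRIDGE_IO_WINDOW &&
	       dev_idx <= DEMO_BRIDGE_PREF_MEM_WINDOW);
	return dev_idx - DEMO_BRIDGE_RESOURCES;
}
```

Named constants on the bus side avoid literal 0, 1 and 2 at call sites, which is the point of the change.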
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:59 +0000 (16:10 +0300)]
PCI: Preserve bridge window resource type flags
When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.
Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.
Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resources to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.
When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.
When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.
Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.
This change fixes an issue which highlights the importance of keeping the
resource type flags intact:
At the end of __assign_resources_sorted(), reset_resource() is called,
previously clearing the flags. Later, pci_prepare_next_assign_round()
attempted to release bridge resources using
pci_bus_release_bridge_resources() that calls into
pci_bridge_release_resources() that assumes type flags are still present.
As type flags were cleared, IORESOURCE_MEM_64 was not set, leading to
resources under an incorrect bridge window being released (idx = 1
instead of idx = 2). While the assignments performed later covered this
problem so that the wrongly released resources got assigned in the end,
it was still causing extra release+assign pairs.
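The index mix-up described above can be reproduced in a small userspace sketch. The DEMO_* bit values and all function names are hypothetical and the selection logic is deliberately simplified; it is not the code from pci_bridge_release_resources(). It contrasts wiping the flags with keeping the type bits and adding a status bit.

```c
#include <assert.h>

/* Demo flag bits; the values are illustrative, not the kernel's. */
#define DEMO_IO		0x0001u
#define DEMO_MEM	0x0002u
#define DEMO_PREFETCH	0x0004u
#define DEMO_MEM_64	0x0008u
#define DEMO_UNSET	0x0200u

/*
 * Simplified window selection: 0 = io, 1 = mem, 2 = prefetchable mem.
 * The prefetchable window is chosen only if its own flags still carry
 * the 64-bit type bit.
 */
static int demo_release_idx(unsigned int type, unsigned int pref_win_flags)
{
	if (type & DEMO_IO)
		return 0;
	if (!(type & DEMO_PREFETCH))
		return 1;
	if ((type & DEMO_MEM_64) && (pref_win_flags & DEMO_MEM_64))
		return 2;
	return 1;
}

/* Old behaviour: a reset wipes every flag, including the type. */
static unsigned int demo_reset_old(unsigned int flags)
{
	(void)flags;
	return 0;
}

/* New behaviour: keep the type bits and only mark the status. */
static unsigned int demo_mark_unset(unsigned int flags)
{
	return flags | DEMO_UNSET;
}
```

With the flags wiped, a 64-bit prefetchable resource is matched against window 1; with the type bits preserved, it is matched against window 2.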
There are other reasons why the resource flags should be retained in
upcoming changes too.
Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.
Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:58 +0000 (16:10 +0300)]
PCI: Enable bridge even if bridge window fails to assign
A normal PCI bridge has multiple bridge windows and not all of them are
always required by devices underneath the bridge. If a Root Port or bridge
does not have a device underneath, no bridge windows get assigned. Yet,
pci_enable_resources() is set to fail indiscriminately on any resource
assignment failure if the resource is not known to be optional.
In practice, the code in pci_enable_resources() is currently largely
dormant. The kernel sets resource flags to zero for any unused bridge
window and resets flags to zero in case of a resource assignment failure,
which short-circuits pci_enable_resources() because of this check:
    if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
            continue;
However, an upcoming change to resource flags will alter how bridge window
resource flags behave, activating these long dormant checks in
pci_enable_resources().
While complex logic could be built to selectively enable a bridge only
under some conditions, a few versions of such logic were tried during
development of this change and none of them worked satisfactorily. Thus, I
just gave up and decided to enable any bridge regardless of the bridge
windows as there seems to be no clear benefit from not enabling it, but a
major downside as pcieport will not be probed for the bridge if it's not
enabled.
Therefore, change pci_enable_resources() to not check if bridge window
resources remain unassigned. Resource assignment failures are pretty noisy
already so there is no need to log that for bridge windows in
pci_enable_resources().
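The effect of the quoted check and of the new behavior can be sketched in userspace as follows. struct demo_res, the DEMO_* bits and demo_enable_resources() are hypothetical; the real function operates on struct pci_dev and has more checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IO		0x1u
#define DEMO_MEM	0x2u
#define DEMO_UNSET	0x4u

/* Hypothetical, reduced view of one device resource. */
struct demo_res {
	unsigned int flags;
	bool is_bridge_window;
};

/* Returns 0 on success, -1 if a required resource is unassigned. */
static int demo_enable_resources(const struct demo_res *res, size_t n,
				 bool skip_bridge_windows)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Same shape as the quoted check: untyped entries are skipped. */
		if (!(res[i].flags & (DEMO_IO | DEMO_MEM)))
			continue;
		/* Behavior described above: never fail because of a bridge window. */
		if (skip_bridge_windows && res[i].is_bridge_window)
			continue;
		if (res[i].flags & DEMO_UNSET)
			return -1;
	}
	return 0;
}
```

Once bridge windows keep their type flags, the first check no longer hides them, so without the second check an unassigned window would make the enable fail.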
Ignoring bridge window failures hopefully prevents an obvious source of
regressions when the upcoming change that no longer clears resource flags
for bridge windows is enacted. I've hit this problem even during my own
testing on multiple occasions so I expect it to be a quite common problem.
This can always be revisited later if somebody thinks the enable check for
bridges is not strict enough, but expect a mind-boggling number of
regressions from such a change.
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:57 +0000 (16:10 +0300)]
PCI: Use pci_release_resource() instead of release_resource()
A few places in setup-bus.c call release_resource() directly and end up
duplicating functionality from pci_release_resource() such as parent check,
logging, and clearing the resource. Worse yet, the way the resource is
cleared is inconsistent between different sites.
Convert release_resource() calls into pci_release_resource() to remove code
duplication. This will also make the resource start, end, and flags
behavior consistent, i.e., start address is cleared, and only
IORESOURCE_UNSET is asserted for the resource.
While at it, eliminate the unnecessary initialization of idx variable in
pci_bridge_release_resources().
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:56 +0000 (16:10 +0300)]
PCI: Disable non-claimed bridge window
If clipping or claiming the bridge window fails, the bridge window is left
in a state that does not match the kernel's view on what the bridge window
is.
Disable the bridge window by writing the magic disable value into the Base
and Limit Registers if clipping or claiming failed. To detect if claiming
the resource was successful, add res->parent checks into the bridge setup
functions.
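As a rough illustration of why a base/limit pair can act as a disable value, the sketch below decodes the 16-bit Memory Base and Memory Limit registers of a Type 1 header and treats a window whose base lies above its limit as disabled. The helper names are hypothetical, and the specific pair used in the usage note (base 0xfff0, limit 0x0000) is an assumption for illustration, not a value quoted from this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 15:4 of the Memory Base register hold address bits 31:20. */
static uint32_t demo_mem_base(uint16_t base_reg)
{
	return ((uint32_t)(base_reg & 0xfff0)) << 16;
}

/* The limit is inclusive, so the low 20 address bits are all ones. */
static uint32_t demo_mem_limit(uint16_t limit_reg)
{
	return (((uint32_t)(limit_reg & 0xfff0)) << 16) | 0xfffff;
}

/* A window forwards nothing when its base is above its limit. */
static bool demo_mem_window_enabled(uint16_t base_reg, uint16_t limit_reg)
{
	return demo_mem_base(base_reg) <= demo_mem_limit(limit_reg);
}
```

For example, demo_mem_window_enabled(0xfff0, 0x0000) is false, while demo_mem_window_enabled(0x4000, 0x4010) describes the enabled window 0x40000000-0x401fffff.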
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:55 +0000 (16:10 +0300)]
PCI: Always claim bridge window before its setup
When the claim of a resource fails for the full range in
pci_claim_bridge_resource(), clipping the resource to a smaller size is
attempted. If clipping is successful, the new bridge window is programmed
and only as the last step the code attempts to claim the resource again.
The order of the last two steps is slightly illogical and inconsistent with
the assignment call chains.
If claiming the bridge window after clipping fails, the bridge window that
was set up is left in place.
Rework the logic such that the bridge window is claimed before calling the
relevant bridge setup function. This makes the behavior consistent with
resource fitting call chains that always assign the bridge window before
programming it.
If claiming the bridge window fails, the clipped bridge window is no longer
set up but pci_claim_bridge_resource() returns without writing the bridge
window at all.
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:52 +0000 (16:10 +0300)]
MIPS: PCI: Use pci_enable_resources()
pci-legacy.c under MIPS has a copy of pci_enable_resources() named as
pcibios_enable_resources(). Having its own copy of the same functionality could
lead to inconsistencies in behavior, especially now as
pci_enable_resources() and the bridge window resource flags behavior are
going to be altered by upcoming changes.
The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().
Call pci_enable_resources() from MIPS's pcibios_enable_device() and remove
pcibios_enable_resources().
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:51 +0000 (16:10 +0300)]
sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
Under arch/sparc/ there are multiple copies of pcibios_enable_device() but
none of those seem to do anything extra beyond what pci_enable_resources()
is supposed to do. These functions could lead to inconsistencies in
behavior, especially now as pci_enable_resources() and the bridge window
resource flags behavior are going to be altered by upcoming changes.
Remove all pcibios_enable_device() from arch/sparc/ so that PCI core can
simply call into pci_enable_resources() instead using its __weak version
of pcibios_enable_device().
Ilpo Järvinen [Fri, 29 Aug 2025 13:10:50 +0000 (16:10 +0300)]
m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
m68k has a resource enable (check) loop in its pcibios_enable_device()
which for some reason differs from pci_enable_resources(). This could lead
to inconsistencies in behavior, especially now as pci_enable_resources()
and the bridge window resource flags behavior are going to be altered by
upcoming changes.
The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().
The entire pcibios_enable_device() looks suspiciously like a copy-paste from
some other arch, as also indicated by the preceding comment. However, it also
enables PCI_COMMAND_IO | PCI_COMMAND_MEMORY always for bridges. It is not
clear why that is being done as the commit e93a6bbeb5a5 ("m68k: common PCI
support definitions and code") introducing this code states "Nothing
specific to any PCI implementation in any m68k class CPU hardware yet".
Replace the resource enable loop with a call to pci_enable_resources() and
adjust the Command Register afterwards. It is unclear whether that
adjustment is necessary, so keep it for now.
Ilpo Järvinen [Mon, 30 Jun 2025 14:26:41 +0000 (17:26 +0300)]
PCI: Fix failure detection during resource resize
Since 96336ec70264 ("PCI: Perform reset_resource() and build fail list in
sync") the failed list is always built and returned to let the caller
decide what to do with the failures. The caller may want to retry resource
fitting and assignment and before that can happen, the resources should be
restored to their original state (a reset effectively clears the struct
resource), which requires returning them to the failed list so the original
state remains stored in the associated struct pci_dev_resource.
Resource resizing is different from the ordinary resource fitting and
assignment in that it only considers part of the resources. This means
failures for other resource types are not relevant at all and should be
ignored. As resize doesn't unassign such unrelated resources, those
resources ending up in the failed list implies assignment of that
resource must have failed before resize too. The check in
pci_reassign_bridge_resources() to decide if the whole assignment is
successful, however, is based on list emptiness which will cause false
negatives when the failed list has resources with an unrelated type.
If the failed list is not empty, call pci_required_resource_failed() and
extend it to be able to filter on specific resource types too (if
provided).
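The filtering idea can be sketched as below. This is a hypothetical userspace model: struct demo_failed, the DEMO_* bits and the exact matching rule are assumptions, and the real pci_required_resource_failed() works on a list of struct pci_dev_resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IO		0x1u
#define DEMO_MEM	0x2u
#define DEMO_PREFETCH	0x4u
#define DEMO_TYPE_MASK	(DEMO_IO | DEMO_MEM | DEMO_PREFETCH)

/* Hypothetical entry of the failed list. */
struct demo_failed {
	unsigned int flags;
	bool optional;
};

/*
 * Returns true if a required resource failed. A non-zero type_filter
 * limits the check to resources of exactly that type, so failures of
 * unrelated types do not count against a resize.
 */
static bool demo_required_resource_failed(const struct demo_failed *list,
					  size_t n, unsigned int type_filter)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].optional)
			continue;
		if (type_filter &&
		    (list[i].flags & DEMO_TYPE_MASK) != type_filter)
			continue;
		return true;
	}
	return false;
}
```

A plain emptiness test corresponds to calling this with type_filter set to 0, which is what produced the false negatives described above.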
Calling pci_required_resource_failed() at this point is slightly
problematic because the resource itself is reset when the failed list
is constructed in __assign_resources_sorted(). As a result,
pci_resource_is_optional() does not have access to the original
resource flags. This could be worked around by restoring and
re-resetting the resource around the call to pci_resource_is_optional(),
however, it shouldn't cause issue as resource resizing is meant for
64-bit prefetchable resources according to Christian König (see the
Link which unfortunately doesn't point directly to Christian's reply
because lore didn't store that email at all).
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-4-ilpo.jarvinen@linux.intel.com
Ilpo Järvinen [Mon, 30 Jun 2025 14:26:40 +0000 (17:26 +0300)]
PCI: Fix pdev_resources_assignable() disparity
pdev_sort_resources() uses pdev_resources_assignable() helper to decide if
device's resources cannot be assigned, so it ignores class 0
(PCI_CLASS_NOT_DEFINED) devices. pbus_size_mem(), on the other hand, does
not do the same check. This could lead to a situation where a resource
ends up on realloc_head list but is not on the head list, which in turn
prevents emptying the resource from the realloc_head list in
__assign_resources_sorted().
A non-empty realloc_head is unacceptable because it triggers an internal
sanity check as shown in this log with a device that has class 0
(PCI_CLASS_NOT_DEFINED):
pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint
pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned
------------[ cut here ]------------
kernel BUG at drivers/pci/setup-bus.c:2532!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
Call trace:
pci_assign_unassigned_bus_resources+0x110/0x114 (P)
pci_rescan_bus+0x28/0x48
Use pdev_resources_assignable() also within pbus_size_mem() to skip
processing of non-assignable resources, which removes the disparity
between the resources that pdev_sort_resources() and pbus_size_mem() consider.
As non-assignable resources are no longer processed, they are not added to
the realloc_head list, thus the sanity check no longer triggers.
This disparity problem is very old but only now became apparent after
2499f5348431 ("PCI: Rework optional resource handling") that made the ROM
resources optional when calculating bridge window sizes which required
adding the resource to the realloc_head list. Previously, bridge windows
were just sized larger than necessary.
Ilpo Järvinen [Mon, 30 Jun 2025 14:26:39 +0000 (17:26 +0300)]
PCI: Ensure relaxed tail alignment does not increase min_align
When using relaxed tail alignment for the bridge window, pbus_size_mem()
also tries to minimize min_align, which can under certain scenarios end up
increasing min_align from that found by calculate_mem_align().
Ensure min_align is not increased by the relaxed tail alignment.
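The invariant can be stated as a one-line clamp. The sketch below uses a hypothetical helper name and is not the code from pbus_size_mem(); it only shows that the relaxed value is adopted when it is smaller and ignored otherwise.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the alignment to use: the relaxed tail alignment may lower
 * min_align but must never raise it above the value found earlier.
 */
static uint64_t demo_pick_min_align(uint64_t min_align, uint64_t relaxed_align)
{
	return relaxed_align < min_align ? relaxed_align : min_align;
}
```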
Eventually, it would be better to add calculate_relaxed_head_align()
similar to calculate_mem_align() which finds out what alignment can be used
for the head without introducing any gaps into the bridge window to give
flexibility on head address too. But that looks relatively complex so it
requires much more testing than fixing the immediate problem causing a
regression.
Fixes: 67f9085596ee ("PCI: Allow relaxed bridge window tail sizing for optional resources")
Reported-by: Rio Liu <rio@r26.me>
Closes: https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Rio Liu <rio@r26.me>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com
Commit 11502feab423 ("Documentation: PCI: Tidy AER documentation")
replaced the terms "PCI-E", "PCI-Express" and "PCI Express" with "PCIe"
in the AER documentation.
Do the same in the documentation on PCI error recovery. While at it,
add a missing period and a missing blank.
Documentation: PCI: Amend error recovery doc with DPC/AER specifics
Amend the documentation on PCI error recovery with specifics about
Downstream Port Containment and Advanced Error Reporting:
* Explain that with DPC, devices are inaccessible upon an error (similar
to EEH on powerpc) and do not become accessible until the link is
re-enabled.
* Explain that with AER, although devices may already be accessible in the
->error_detected() callback, accesses should be deferred to the
->mmio_enabled() callback for compatibility with EEH on powerpc and with
s390.
Documentation: PCI: Sync error recovery doc with code
Amend the documentation on PCI error recovery to fix minor inaccuracies
vis-à-vis the actual code:
* The documentation claims that a missing ->resume() or ->mmio_enabled()
callback always leads to recovery through reset. But none of the
implementations do this (pcie_do_recovery(), eeh_handle_normal_event(),
zpci_event_do_error_state_clear()).
Drop the claim to align the documentation with the code.
* The documentation does not list PCI_ERS_RESULT_RECOVERED as a valid
return value from ->error_detected(). But none of the implementations
forbid this and some drivers are returning it, e.g.:
drivers/bus/mhi/host/pci_generic.c
drivers/infiniband/hw/hfi1/pcie.c
Further down in the documentation it is implied that the return value is
in fact allowed:
"The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks."
The "3 previous callbacks" being ->error_detected(), ->mmio_enabled()
and ->slot_reset().
Add it to the valid return values for consistency.
The PCIe Advanced Error Reporting driver has evolved over the years but
its documentation hasn't. Catch up with past code changes:
* The documentation claims that Correctable Errors are logged with
KERN_INFO severity, but the code uses KERN_WARN.
It had used KERN_WARN from the beginning with commit 6c2b374d7485
("PCI-Express AER implemetation: AER core and aerdriver"). In 2013,
commit 2cced2d95961 ("aerdrv: Cleanup log output for AER") switched to
KERN_ERR, until 2020 when it was reverted back to KERN_WARN by commit
e83e2ca3c395 ("PCI/AER: Log correctable errors as warning, not error").
* An example log message in the documentation uses the term "Uncorrected",
but the code uses "Uncorrectable" since commit 02a06f5f1a6a ("PCI/AER:
Use 'Correctable' and 'Uncorrectable' spec terms for errors").
* The example contains the Requester ID "id=0500", which is omitted since
commit 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID").
* The example contains the error name "Unsupported Request", which is
instead reported as "UnsupReq" since commit bd237801fef2 ("PCI/AER:
Adopt lspci names for AER error decoding").
* The example doesn't prepend "0x" to hex values from the TLP Header Log,
as introduced by commit f68ea779d98a ("PCI: Add pcie_print_tlp_log() to
print TLP Header and Prefix Log").
* The documentation refers to a reset_link callback which was removed by
commit b6cf1a42f916 ("PCI/ERR: Remove service dependency in
pcie_do_recovery()").
* Commit 579086225502 ("PCI/ERR: Recover from RCiEP AER errors") added
support to recover Root Complex Integrated Endpoints by applying a
Function Level Reset, alternatively to the Secondary Bus Reset which is
applied otherwise.
* On non-fatal errors, a reset was previously never performed. But the
AER driver has just been amended to allow drivers to opt in to a reset.
* The documentation claims that a warning message is logged if a driver
lacks pci_error_handlers. But the message has been informational
(logged with KERN_INFO severity) since its introduction with commit
01daacfb9035 ("PCI/AER: Log which device prevents error recovery").
The documentation claims that the message is only logged for fatal
errors, which is incorrect. Moreover it refers to "section 3", even
though the documentation no longer contains section numbers since commit
4e37f055a92e ("Documentation: PCI: convert pcieaer-howto.txt to reST").
Section 3 is titled "Developer Guide". That's the same section where
the reference is located, so it is self-referential and can be dropped.
PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release
The fields dma_chan_tx and dma_chan_rx of the struct pci_epf_test can be
NULL even after EPF initialization. Then it is prudent to check that
they have non-NULL values before releasing the channels. Add the checks
in pci_epf_test_clean_dma_chan().
Without the checks, NULL pointer dereferences happen and they can lead
to a kernel panic in some cases:
dw_pcie_edma_irq_verify() already parses device tree for either "dma" (if
there is a single IRQ for all DMA channels) or "dmaX" (if there is one IRQ
per DMA channel), and initializes dma.nr_irqs accordingly.
Additionally, the probing of the eDMA driver will fail if neither "dma"
nor "dmaX" is defined in the device tree.
Therefore there is no need for a glue driver to specify edma.nr_irqs, so
remove the redundant edma.nr_irqs initialization.
PCI: dwc: Verify the single eDMA IRQ in dw_pcie_edma_irq_verify()
dw_pcie_edma_irq_verify() is supposed to verify the eDMA IRQs in devicetree
by fetching them using either 'dma' or 'dmaX' IRQ names. The former is used
when the platform uses a single IRQ for all eDMA channels and the latter is
used when the platform uses a separate IRQ per channel. But currently,
dw_pcie_edma_irq_verify() bails out early if edma::nr_irqs is 1, i.e., when
a single IRQ is used. This gives an impression that the driver could work
with any single IRQ in devicetree, not necessarily with name 'dma'.
But dw_pcie_edma_irq_vector(), which actually requests the IRQ, does
require the single IRQ to be named as 'dma'. So this creates inconsistency
between dw_pcie_edma_irq_verify() and dw_pcie_edma_irq_vector().
Thus, to fix this inconsistency, make sure dw_pcie_edma_irq_verify() also
verifies the single IRQ name by removing the bail out code.
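The naming rule can be modeled in userspace as below. demo_edma_irq_verify() is a hypothetical stand-in that receives the IRQ names as a plain string array; the real dw_pcie_edma_irq_verify() looks the names up through the platform device, and its exact channel handling is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the number of IRQs to use: 1 if a shared IRQ named "dma"
 * exists, n_channels if every channel has its own "dmaX" entry, or
 * -1 if neither naming scheme is satisfied.
 */
static int demo_edma_irq_verify(const char *const names[], int n_names,
				int n_channels)
{
	char want[16];
	int i, ch;

	/* A single shared IRQ must be named exactly "dma". */
	for (i = 0; i < n_names; i++)
		if (!strcmp(names[i], "dma"))
			return 1;

	/* Otherwise every channel needs its own "dmaX" entry. */
	for (ch = 0; ch < n_channels; ch++) {
		snprintf(want, sizeof(want), "dma%d", ch);
		for (i = 0; i < n_names; i++)
			if (!strcmp(names[i], want))
				break;
		if (i == n_names)
			return -1;
	}
	return n_channels;
}
```

The point of the fix is the first loop: a single IRQ is accepted only if it is actually named "dma", rather than being waved through unchecked.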
PCI: endpoint: pci-epf-test: Fix doorbell test support
The doorbell feature temporarily overrides the inbound translation to point
to the address stored in epf_test->db_bar.phys_addr, i.e., it calls
set_bar() twice without ever calling clear_bar(), as calling clear_bar()
would clear the BAR's PCI address assigned by the host.
Thus, when disabling the doorbell, restore the inbound translation to point
to the memory allocated for the BAR.
Without this, running the PCI endpoint kselftest doorbell test case more
than once would fail.
Fixes: eff0c286aa91 ("PCI: endpoint: pci-epf-test: Add doorbell test support")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250908161942.534799-2-cassel@kernel.org
Lorenzo Pieralisi [Mon, 18 Aug 2025 09:35:04 +0000 (11:35 +0200)]
PCI: of: Update parent unit address generation in of_pci_prop_intr_map()
Some interrupt controllers require an #address-cells property in their
bindings without requiring a "reg" property to be present.
The current logic used to craft an interrupt-map property in
of_pci_prop_intr_map() is based on reading the #address-cells
property in the interrupt-parent and, if != 0, read the interrupt
parent "reg" property to determine the parent unit address to be
used to create the parent unit interrupt specifier.
First of all, it is not correct to read the "reg" property of
the interrupt-parent with an #address-cells value taken from the
interrupt-parent node, because the #address-cells value defines the
number of address cells required by child nodes.
More importantly, for all modern interrupt controllers, the parent
unit address is irrelevant in hardware in relation to the
device <-> interrupt-controller connection and the kernel actually
ignores the parent unit address value when hierarchically parsing
the interrupt-map property (i.e., of_irq_parse_raw()).
For the reasons above, remove the code parsing the interrupt parent "reg"
property in of_pci_prop_intr_map() -- it is not needed and prevents
interrupt-map property generation on systems with an interrupt-controller
that has no "reg" property in its interrupt-controller node -- and leave
the parent unit address always initialized to 0 since it is simply ignored
by the kernel.
Marek Vasut [Fri, 5 Sep 2025 18:42:10 +0000 (20:42 +0200)]
PCI: endpoint: pci-epf-test: Limit PCIe BAR size for fixed BARs
Currently, the test allocates BAR sizes according to the fixed table
bar_size. This does not work with controllers that have fixed-size BARs
smaller than the requested BAR size. One such controller is the Renesas
R-Car V4H PCIe controller, which has BAR4 size limited to 256 bytes, much
less than the 131072 bytes currently requested for one of the BARs by this
test. A lot of in-tree controller drivers have fixed-size BARs, and they
work perfectly fine, but only because their fixed size is larger than the
size requested by pci-epf-test.c.
Adjust the test such that in case a fixed size BAR is detected, the fixed
BAR size is used, as that is the only possible option.
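A minimal sketch of the size selection, assuming a hypothetical helper in which a fixed size of 0 means the BAR is not fixed-size; the real test reads this information from the controller's feature description.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Choose the size to allocate for a BAR: a fixed-size BAR can only be
 * allocated with its fixed size, otherwise the requested size is used.
 */
static size_t demo_bar_size(size_t requested, size_t fixed_size)
{
	return fixed_size ? fixed_size : requested;
}
```

With the numbers from the description, demo_bar_size(131072, 256) yields 256 for a BAR fixed at 256 bytes.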
This helps with test failures reported as follows:
pci_epf_test pci_epf_test.0: requested BAR size is larger than fixed size
pci_epf_test pci_epf_test.0: Failed to allocate space for BAR4
PCI: j721e: Fix programming sequence of "strap" settings
The Cadence PCIe Controller integrated in the TI K3 SoCs supports both
Root-Complex and Endpoint modes of operation. The Glue Layer allows
"strapping" the Mode of operation of the Controller, the Link Speed
and the Link Width. This is enabled by programming the "PCIEn_CTRL"
register (n corresponds to the PCIe instance) within the CTRL_MMR
memory-mapped register space. The "reset-values" of the registers are
also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately
after being powered on, if the Glue Layer configuration is not done while
the PCIe Controller is off, it will result in the PCIe Controller latching
onto the wrong "reset-values". In practice, this will show up as a wrong
representation of the PCIe Controller's capability structures in the PCIe
Configuration Space. Some such capabilities which are supported by the PCIe
Controller in the Root-Complex mode but are incorrectly latched onto as
being unsupported are:
- Link Bandwidth Notification
- Alternate Routing ID (ARI) Forwarding Support
- Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap"
settings and powering it on after that. The runtime PM APIs namely
pm_runtime_put_sync() and pm_runtime_get_sync() will decrement and
increment the usage counter respectively, causing GENPD to power off and
power on the PCIe Controller.
Xichao Zhao [Wed, 20 Aug 2025 08:52:00 +0000 (16:52 +0800)]
PCI: plda: Remove dev_err_probe() when the errno is -ENOMEM
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
Richard Zhu [Wed, 20 Aug 2025 02:23:28 +0000 (10:23 +0800)]
PCI: imx6: Enable the Vaux supply if available
When the 3.3Vaux supply is present, fetch it at probe time and keep it
enabled for the entire PCIe controller lifecycle so that the link can enter
L2 state and the devices can signal wakeup using either Beacon or WAKE#
mechanisms.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: reworded the subject, description and error message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250820022328.2143374-1-hongxing.zhu@nxp.com
PCI/AER: Print TLP Log for errors introduced since PCIe r1.1
When reporting an error, the AER driver prints the TLP Header / Prefix Log
only for errors enumerated in the AER_LOG_TLP_MASKS macro.
The macro was never amended since its introduction in 2006 with commit 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver").
At the time, PCIe r1.1 was the latest spec revision.
Amend the macro with errors defined since then to avoid omitting the TLP
Header / Prefix Log for newer errors.
The order of the errors in AER_LOG_TLP_MASKS follows PCIe r1.1 sec 6.2.7
rather than 7.10.2, because only the former documents for which errors a
TLP Header / Prefix is logged. Retain this order. The section number is
still 6.2.7 in today's PCIe r7.0.
For Completion Timeouts, the TLP Header / Prefix is only logged if the
Completion Timeout Prefix / Header Log Capable bit is set in the AER
Capabilities and Control register. Introduce a tlp_header_logged() helper
to check whether the TLP Header / Prefix Log is populated and use it in
the two places which currently match against AER_LOG_TLP_MASKS directly.
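A minimal user-space sketch of the kind of check such a helper could perform
is shown here. It is not the kernel implementation: the SKETCH_* names and
sketch_tlp_header_logged() are hypothetical, the mask lists only one
always-logged error for brevity, and the bit values are illustrative (the
authoritative definitions are in <linux/pci_regs.h>).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, assumed for this sketch only. */
#define SKETCH_ERR_UNC_POISON_TLP	0x00001000u	/* Poisoned TLP */
#define SKETCH_ERR_UNC_COMP_TIME	0x00004000u	/* Completion Timeout */
#define SKETCH_ERR_CAP_COMP_TIME_LOG	0x00001000u	/* Log Capable bit */

/* Errors for which a header is always logged (shortened for the sketch). */
#define SKETCH_LOG_TLP_MASKS		SKETCH_ERR_UNC_POISON_TLP

static bool sketch_tlp_header_logged(uint32_t status, uint32_t capctl)
{
	/* Errors whose TLP Header / Prefix is logged unconditionally. */
	if (status & SKETCH_LOG_TLP_MASKS)
		return true;

	/* Completion Timeout: logged only if the device is log capable. */
	if ((status & SKETCH_ERR_UNC_COMP_TIME) &&
	    (capctl & SKETCH_ERR_CAP_COMP_TIME_LOG))
		return true;

	return false;
}
```

Keeping the capability check in one helper means both call sites get the
same answer for Completion Timeouts instead of each testing the mask on its
own.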
For Uncorrectable Internal Errors, logging of the TLP Header / Prefix is
optional per PCIe r7.0 sec 6.2.7. If needed, drivers could indicate
through a flag whether devices are capable and tlp_header_logged() could
then check that flag.
pciutils introduced macros for newer errors with commit 144b0911cc0b
("ls-ecaps: extend decode support for more fields for AER CE and UE
status"):
https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/?id=144b0911cc0b
Unfortunately some of those macros are overly long:
PCI_ERR_UNC_POISONED_TLP_EGRESS
PCI_ERR_UNC_DMWR_REQ_EGRESS_BLOCKED
PCI_ERR_UNC_IDE_CHECK
PCI_ERR_UNC_MISR_IDE_TLP
PCI_ERR_UNC_PCRC_CHECK
PCI_ERR_UNC_TLP_XLAT_EGRESS_BLOCKED
This seems unsuitable for <linux/pci_regs.h>, so shorten to:
PCI_ERR_UNC_POISON_BLK
PCI_ERR_UNC_DMWR_BLK
PCI_ERR_UNC_IDE_CHECK
PCI_ERR_UNC_MISR_IDE
PCI_ERR_UNC_PCRC_CHECK
PCI_ERR_UNC_XLAT_BLK
Note that some of the existing macros in <linux/pci_regs.h> do not exactly
match pciutils (e.g. PCI_ERR_UNC_SDES versus PCI_ERR_UNC_SURPDN), so it
does not seem mandatory for them to be identical.
Commit a2790bf81f0f ("PCI: j721e: Add support to build as a loadable
module") added support to build the driver as a loadable module. However,
it did not add MODULE_DEVICE_TABLE() which is required for autoloading the
driver based on device table when it is built as a loadable module.
Fix it by adding MODULE_DEVICE_TABLE().
Fixes: a2790bf81f0f ("PCI: j721e: Add support to build as a loadable module")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
[mani: reworded description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250901120359.3410774-1-s-vadapalli@ti.com
Jan Kiszka [Mon, 28 Jul 2025 02:36:56 +0000 (10:36 +0800)]
dt-bindings: PCI: ti,am65: Extend for use with PVU
The Peripheral Virtualization Unit (PVU) on the AM65 SoC is capable of
restricting DMA from PCIe devices to specific regions of host memory.
Add the optional property "memory-regions" to point to such regions of
memory when PVU is used.
Since the PVU deals with system physical addresses, utilizing the PVU
with PCIe devices also requires setting up the VMAP registers to map the
Requester ID of the PCIe device to the CBA Virtual ID, which in turn is
mapped to the system physical address. Hence, describe the VMAP
registers which are optional unless the PVU shall be used for PCIe.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Li Hua Qian <huaqian.li@siemens.com>
[mani: Expanded PVU in description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250728023701.116963-3-huaqian.li@siemens.com
Johan Hovold [Mon, 21 Jul 2025 15:36:08 +0000 (17:36 +0200)]
PCI/pwrctrl: Fix device and OF node leak at bus scan
Make sure to drop the references to the pwrctrl OF node and device taken by
of_pci_find_child_device() and of_find_device_by_node() respectively when
scanning the bus.
Fixes: 957f40d039a9 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # v6.15
Link: https://patch.msgid.link/20250721153609.8611-3-johan+linaro@kernel.org
The PCIe controller present in the SM8750 SoC is backwards compatible with
the controller present in the SM8550 SoC. Hence, add the SM8750 compatible
with SM8550 as the fallback.
Qianfeng Rong [Tue, 19 Aug 2025 13:12:33 +0000 (21:12 +0800)]
PCI: keystone: Use kcalloc() instead of kzalloc()
Replace calls of devm_kzalloc() with devm_kcalloc() in ks_pcie_probe().
As noted in the kernel documentation [1], open-coded multiplication in
allocator arguments is discouraged because it can lead to integer
overflow.
devm_kcalloc() has built-in overflow protection, so computing the
allocation size through it is safer than an explicit multiplication.
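The difference can be sketched in user space. The helpers below are
hypothetical and are not the kernel allocators; they assume a GCC or Clang
toolchain for __builtin_mul_overflow() and only mimic the idea of checking
the count * size product before allocating.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Open-coded multiplication: the product silently wraps on overflow. */
static size_t unchecked_bytes(size_t n, size_t size)
{
	return n * size;
}

/*
 * Checked variant in the spirit of kcalloc(): refuse the allocation when
 * n * size does not fit in size_t, otherwise return zeroed memory.
 */
static void *checked_calloc(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;

	return calloc(1, bytes);
}
```

With the open-coded form, an oversized element count can wrap to a small
byte count and yield a buffer that is too short; the checked form fails the
allocation instead.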
Christian Bruel [Mon, 4 Aug 2025 17:09:14 +0000 (19:09 +0200)]
misc: pci_endpoint_test: Skip IRQ tests if irq is out of range
The pci_endpoint_test tests the 32-bit MSI range. However, the device might
not have all vectors configured, for example if msi_interrupts is 8 in the
EP function space or if the MSI Multiple Message Capable value is
configured as 4 (maximum 16 vectors).
In this case, do not attempt to run the test to avoid timeouts and directly
return the error value.
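A minimal sketch of such an early range check, written as a user-space
model: sketch_run_irq_test(), its parameters and the choice of -ERANGE are
assumptions made for illustration and do not reproduce the driver's code.

```c
#include <errno.h>

/*
 * Hypothetical model: irq_num is the 1-based vector requested by the test,
 * nvecs is the number of vectors the device actually has configured.
 */
static int sketch_run_irq_test(int irq_num, int nvecs)
{
	/* Out of range: return an error at once instead of waiting. */
	if (irq_num < 1 || irq_num > nvecs)
		return -ERANGE;

	/* A real test would trigger the interrupt and wait for it here. */
	return 0;
}
```

Returning early lets the caller tell "vector not configured" apart from a
genuine failure without waiting for an interrupt that can never arrive.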