Bjorn Helgaas [Thu, 27 Mar 2025 18:14:49 +0000 (13:14 -0500)]
Merge branch 'pci/controller/dwc'
- Move struct dwc_pcie_vsec_id to include/linux/pcie-dwc.h, where it can be
shared by debugfs, perf, sysfs, etc (Manivannan Sadhasivam)
- Add dw_pcie_find_vsec_capability() to locate Vendor Specific Extended
Capabilities (Shradha Todi)
- Add debugfs-based Silicon Debug, Error Injection, Statistical Counter
support for DWC (Shradha Todi)
- Add debugfs property to expose LTSSM status of DWC PCIe link (Hans Zhang)
- Add Rockchip Vendor ID and Vendor Specific ID of RAS DES Capability so
the DWC debugfs features work for Rockchip as well (Niklas Cassel)
* pci/controller/dwc:
PCI: dw-rockchip: Hide broken ATS capability for RK3588 running in EP mode
PCI: dwc: ep: Add dw_pcie_ep_hide_ext_capability()
PCI: dwc: ep: Return -ENOMEM for allocation failures
PCI: dwc: Add Rockchip to the RAS DES allowed vendor list
PCI: Add Rockchip Vendor ID
PCI: dwc: Add debugfs property to provide LTSSM status of the PCIe link
PCI: dwc: Add debugfs based Statistical Counter support for DWC
PCI: dwc: Add debugfs based Error Injection support for DWC
PCI: dwc: Add debugfs based Silicon Debug support for DWC
PCI: dwc: Add helper to find the Vendor Specific Extended Capability (VSEC)
perf/dwc_pcie: Move common DWC struct definitions to 'pcie-dwc.h'
- Add brcmstb softdep on irq_bcm2712_mip MIP MSI-X interrupt controller
driver to ensure that it is loaded first (Stanimir Varbanov)
- Add struct brcm_pcie pointer to pcie_cfg_data so we can reference the
pcie_cfg_data directly instead of copying it to brcm_pcie (Stanimir
Varbanov)
- Expand inbound window map to 64GB so it can accommodate BCM2712 (Stanimir
Varbanov)
- Add BCM2712 support and DT updates (Stanimir Varbanov)
- Apply link speed restriction before bringing link up, not after (Jim
Quinlan)
- Update Max Link Speed in Link Capabilities via the internal writable
register, not the read-only config register (Jim Quinlan)
- Handle regulator_bulk_get() error to avoid panic when we call
regulator_bulk_free() later (Jim Quinlan)
- Disable regulators only when removing the bus immediately below a Root
Port because we don't support regulators deeper in the hierarchy (Jim
Quinlan)
- Consistently use config access index/data register offsets from the
SoC-specific pcie_offsets[] table (Jim Quinlan)
- Update MDIO register fields that reduced CMD from 12 bits to 1 and
widened PORT from 4 bits to 5 and split it into two parts (Jim Quinlan)
- Make const read-only arrays static (Colin Ian King)
* pci/controller/brcmstb:
PCI: brcmstb: Make const read-only arrays static
PCI: brcmstb: Make irq_domain_set_info() parameter cast explicit
PCI: brcmstb: Make two changes in MDIO register fields
PCI: brcmstb: Use same constant table for config space access
PCI: brcmstb: Fix potential premature regulator disabling
PCI: brcmstb: Fix error path after a call to regulator_bulk_get()
PCI: brcmstb: Do not assume that register field starts at LSB
PCI: brcmstb: Use internal register to change link capability
PCI: brcmstb: Set generation limit before PCIe link up
PCI: brcmstb: Add BCM2712 support
PCI: brcmstb: Expand inbound window size up to 64GB
PCI: brcmstb: Reuse pcie_cfg_data structure
PCI: brcmstb: Add a softdep to MIP MSI-X driver
irqchip: Add Broadcom BCM2712 MSI-X interrupt controller
dt-bindings: PCI: brcmstb: Update bindings for PCIe on BCM2712
dt-bindings: interrupt-controller: Add BCM2712 MSI-X bindings
PCI: brcmstb: Fix missing of_node_put() in brcm_pcie_probe()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:47 +0000 (13:14 -0500)]
Merge branch 'pci/scoped-cleanup'
- Use for_each_available_child_of_node_scoped() to simplify apple, kirin,
mediatek, mt7621, tegra drivers (Zhang Zekun)
* pci/scoped-cleanup:
PCI: tegra: Use helper function for_each_child_of_node_scoped()
PCI: apple: Use helper function for_each_child_of_node_scoped()
PCI: mt7621: Use helper function for_each_available_child_of_node_scoped()
PCI: mediatek: Use helper function for_each_available_child_of_node_scoped()
PCI: kirin: Tidy up _probe() related function with dev_err_probe()
PCI: kirin: Use helper function for_each_available_child_of_node_scoped()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:46 +0000 (13:14 -0500)]
Merge branch 'pci/endpoint'
- Convert PCI device data so pci-epf-test works correctly on big-endian
endpoint systems (Niklas Cassel)
- Add BAR_RESIZABLE type to endpoint framework (Niklas Cassel)
- Add pci_epc_bar_size_to_rebar_cap() to convert a size to the Resizable
BAR Capability so endpoint drivers can configure what the Capability
register advertises (Niklas Cassel)
- Add DWC core support for EPF drivers to set BAR_RESIZABLE type and size
via dw_pcie_ep_set_bar() (Niklas Cassel)
- Describe TI AM65x (keystone) BARs 2 and 5 as Resizable, not Fixed
(Niklas Cassel)
- Reduce TI AM65x (keystone) BAR alignment requirement from 1MB to 64KB
(Niklas Cassel)
- Describe Rockchip rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
Cassel)
- Drop unused devm_pci_epc_destroy() (Zijun Hu)
- Fix pci-epf-test double free that causes an oops if the host reboots and
PERST# deassertion restarts endpoint BAR allocation (Christian Bruel)
- Drop dw_pcie_ep_find_ext_capability() and use
dw_pcie_find_ext_capability() instead (Niklas Cassel)
* pci/endpoint:
PCI: dwc: ep: Remove superfluous function dw_pcie_ep_find_ext_capability()
PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
PCI: endpoint: Remove unused devm_pci_epc_destroy()
PCI: dw-rockchip: Describe Resizable BARs as Resizable BARs
PCI: keystone: Specify correct alignment requirement
PCI: keystone: Describe Resizable BARs as Resizable BARs
PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARs
PCI: dwc: ep: Move dw_pcie_ep_find_ext_capability()
PCI: endpoint: Add pci_epc_bar_size_to_rebar_cap()
PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARs
PCI: endpoint: pci-epf-test: Handle endianness properly
- Drop deprecated layerscape 'num-ib-windows' and 'num-ob-windows' from
example (Krzysztof Kozlowski)
- Drop unnecessary layerscape 'status' from example (Krzysztof Kozlowski)
- Add common pci-ep-bus.yaml schema for exporting several peripherals of a
single PCI function via devicetree (Andrea della Porta)
* pci/dt-bindings:
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop unnecessary status from example
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop deprecated windows
dt-bindings: PCI: fsl,imx6q-pcie: Add optional DMA interrupt
dt-bindings: PCI: Convert fsl,mpc83xx-pcie to YAML
dt-bindings: PCI: qcom: Document the IPQ5332 PCIe controller
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:45 +0000 (13:14 -0500)]
Merge branch 'pci/devtree-create'
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they
haven't been set already (Herve Codina)
- Allow of_pci_set_address() to set the DT address property for root bus
nodes, where there is no PCI bridge to supply the PCI bus/device/function
part of the property (Herve Codina)
- Create DT nodes for PCI host bridges to enable loading device tree
overlays to create platform devices for PCI devices that have several
features that require multiple drivers (Herve Codina)
* pci/devtree-create:
PCI: of: Create device tree PCI host bridge node
PCI: of_property: Constify parameter in of_pci_get_addr_flags()
PCI: of_property: Add support for NULL pdev in of_pci_set_address()
PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
driver core: Introduce device_{add,remove}_of_node()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:45 +0000 (13:14 -0500)]
Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
Järvinen)
- Fix typo that repeatedly distributed resources to a bridge instead of
iterating over subordinate bridges, which resulted in too little space to
assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
to avoid failures when removing and re-adding devices (Ilpo Järvinen)
- Fix a double counting error for I/O resources, as we previously did for
memory resources (Ilpo Järvinen)
- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)
- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)
- Add pci_resource_num() to look up the BAR number from the resource
pointer (Ilpo Järvinen)
- Add restore_dev_resource() to simplify code that resources saved device
resources (Ilpo Järvinen)
- Allow drivers to enable devices even if we haven't assigned optional IOV
resources to them (Ilpo Järvinen)
- Improve debug output during resource reallocation (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
if we can't allocate them (Ilpo Järvinen)
- Move declarations of pci_rescan_bus_bridge_resize(),
pci_reassign_bridge_resources(), and CardBus-related sizes from
include/linux/pci.h to drivers/pci/pci.h since they're not used outside
the PCI core (Ilpo Järvinen)
- Make pci_setup_bridge() static (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
cases, which broke vfio-pci lazy mapping on first access (Niklas
Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices (Niklas
Schnelle)
* pci/resource:
s390/pci: Support mmap() of PCI resources except for ISM devices
s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
s390/pci: Fix s390_mmio_read/write syscall page fault handling
PCI: Fix NULL dereference in SR-IOV VF creation error path
PCI: Move cardbus IO size declarations into pci/pci.h
PCI: Make pci_setup_bridge() static
PCI: Move resource reassignment func declarations into pci/pci.h
PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
PCI: Fix BAR resizing when VF BARs are assigned
PCI: Do not claim to release resource falsely
PCI: Increase Resizable BAR support from 512 GB to 128 TB
PCI: Rework optional resource handling
PCI: Perform reset_resource() and build fail list in sync
PCI: Use res->parent to check if resource is assigned
PCI: Add debug print when releasing resources before retry
PCI: Indicate optional resource assignment failures
PCI: Always have realloc_head in __assign_resources_sorted()
PCI: Extend enable to check for any optional resource
PCI: Add restore_dev_resource()
PCI: Remove incorrect comment from pci_reassign_resource()
PCI: Consolidate assignment loop next round preparation
PCI: Rename retval to ret
PCI: Use while loop and break instead of gotos
PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
PCI: Converge return paths in __assign_resources_sorted()
PCI: Add dev & res local variables to resource assignment funcs
PCI: Add pci_resource_num() helper
PCI: Check resource_size() separately
PCI: Add pci_resource_is_iov() to identify IOV resources
PCI: Use resource_set_{range,size}() helpers
PCI: Use SZ_* instead of literals in setup-bus.c
PCI: Fix old_size lower bound in calculate_iosize() too
PCI: Allow relaxed bridge window tail sizing for optional resources
PCI: Simplify size1 assignment logic
PCI: Use min_align, not unrelated add_align, for size0
PCI: Remove add_align overwrite unrelated to size0
PCI: Use downstream bridges for distributing resources
PCI: Cleanup dev->resource + resno to use pci_resource_n()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:44 +0000 (13:14 -0500)]
Merge branch 'pci/pwrctrl'
- Create pwrctrl devices in pci_scan_device() to make it more symmetric
with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges
possible (Manivannan Sadhasivam)
- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can
still access devices after pci_stop_dev() (Manivannan Sadhasivam)
- If there's a pwrctrl device for a PCI device, skip scanning it because
the pwrctrl core will rescan the bus after the device is powered on
(Manivannan Sadhasivam)
- Add a pwrctrl driver for PCI slots based on voltage regulators described
via devicetree (Manivannan Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Add pwrctrl driver for PCI slots
dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:44 +0000 (13:14 -0500)]
Merge branch 'pci/hotplug'
- Drop shpchp module init/exit logging (Ilpo Järvinen)
- Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(),
info(), warn() wrappers (Ilpo Järvinen)
- Drop 'shpchp_debug' module parameter in favor of standard dynamic
debugging (Ilpo Järvinen)
- Drop unused .get_power(), .set_power() function pointers (Guilherme
Giacomo Simoes)
- Drop superfluous pci_hotplug_slot_list (Lukas Wunner)
- Drop superfluous try_module_get() calls (Lukas Wunner)
- Drop superfluous NULL pointer checks (Lukas Wunner)
- Pass struct hotplug_slot pointers directly to avoid backpointer
dereferencing in has_*_file() (Lukas Wunner)
- Inline pci_hp_{create,remove}_module_link() to reduce exported symbols
(Lukas Wunner)
- Disable hotplug interrupts in portdrv only when pciehp is not enabled to
prevent issuing two hotplug commands too close together (Feng Tang)
- Skip pciehp 'device replaced' check if the device has been removed to
address a common deadlock when resuming after a device was removed during
system sleep (Lukas Wunner)
- Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo
Järvinen)
* pci/hotplug:
PCI: pciehp: Don't enable HPIE when resuming in poll mode
PCI: pciehp: Avoid unnecessary device replacement check
PCI/portdrv: Only disable pciehp interrupts early when needed
PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
PCI: hotplug: Drop superfluous try_module_get() calls
PCI: hotplug: Drop superfluous pci_hotplug_slot_list
PCI: cpcihp: Remove unused .get_power() and .set_power()
PCI: shpchp: Remove 'shpchp_debug' module parameter
PCI: shpchp: Remove unused logging wrappers
PCI: shpchp: Change dbg() -> ctrl_dbg()
PCI: shpchp: Remove logging from module init/exit functions
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:43 +0000 (13:14 -0500)]
Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
Francis)
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
* pci/doe:
PCI/DOE: Allow enabling DOE without CXL
PCI/DOE: Expose DOE features via sysfs
PCI/DOE: Rename Discovery Response Data Object Contents to type
PCI/DOE: Rename DOE protocol to feature
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:42 +0000 (13:14 -0500)]
Merge branch 'pci/bwctrl'
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
set_pcie_cooling_state.sh test case (Yi Lai)
- Fix the pcie_bwctrl_select_speed() return value in cases where a
non-compliant device doesn't advertise valid supported speeds (Ilpo
Järvinen)
- Avoid a NULL pointer dereference when we run out of bus numbers to assign
for a bridge secondary bus (Lukas Wunner)
* pci/bwctrl:
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
Niklas Cassel [Mon, 10 Mar 2025 11:10:24 +0000 (12:10 +0100)]
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
For PCITEST_MSI we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSI, since we want to test if MSI works.
For PCITEST_MSIX we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSIX, since we want to test if MSI works.
For PCITEST_LEGACY_IRQ we really want to set PCITEST_SET_IRQTYPE
explicitly to PCITEST_IRQ_TYPE_INTX, since we want to test if INTx
works.
However, for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, we really don't
care which IRQ type that is used, we just want to use a IRQ type that is
supported by the EPC.
The old behavior was to always use MSI for PCITEST_WRITE, PCITEST_READ,
PCITEST_COPY, was to always set IRQ type to MSI before doing the actual
test, however, there are EPC drivers that do not support MSI.
Add a new PCITEST_IRQ_TYPE_AUTO, that will use the CAPS register to see
which IRQ types the endpoint supports, and use one of the supported IRQ
types.
For backwards compatibility, if the endpoint does not expose any supported
IRQ type in the CAPS register, simply fallback to using MSI, as it was
unconditionally done before.
Neither RK3568 or RK3588 supports INTx interrupts.
Since epc_features is zero initialized, this is strictly not needed.
However, setting intx_capable explicitly to false makes it more clear
that neither RK3568 or RK3588 supports INTx interrupts.
Andrea della Porta [Wed, 19 Mar 2025 21:52:24 +0000 (22:52 +0100)]
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
Common YAML schema for devices that exports internal peripherals through
PCI BARs. The BARs are exposed as simple-buses through which the
peripherals can be accessed.
This is not intended to be used as a standalone binding, but should be
included by device specific bindings.
Lukas Wunner [Sat, 22 Mar 2025 18:52:08 +0000 (19:52 +0100)]
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
When BIOS neglects to assign bus numbers to PCI bridges, the kernel
attempts to correct that during PCI device enumeration. If it runs out
of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
the bridge's pci_dev remains NULL.
The PCIe bandwidth controller erroneously does not check for a NULL
subordinate pointer and dereferences it on probe.
Bandwidth control of unusable devices below the bridge is of questionable
utility, so simply error out instead. This mirrors what PCIe hotplug does
since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp
probe").
The PCI core emits a message with KERN_INFO severity if it has run out of
bus numbers. PCIe hotplug emits an additional message with KERN_ERR
severity to inform the user that hotplug functionality is disabled at the
bridge. A similar message for bandwidth control does not seem merited,
given that its only purpose so far is to expose an up-to-date link speed
in sysfs and throttle the link speed on certain laptops with limited
Thermal Design Power. So error out silently.
User-visible messages:
pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[...]
pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
[...]
pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
[...]
BUG: kernel NULL pointer dereference
RIP: pcie_update_link_speed
pcie_bwnotif_enable
pcie_bwnotif_probe
pcie_port_probe_service
really_probe
Colin Ian King [Mon, 17 Mar 2025 14:34:56 +0000 (14:34 +0000)]
PCI: brcmstb: Make const read-only arrays static
Don't populate the const read-only arrays "data" and "regs" on the
stack at run time, instead make them static.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
[kwilczynski: commit log, wrap overly long line to 80 columns] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250317143456.477901-1-colin.i.king@gmail.com
Thippeswamy Havalige [Fri, 28 Feb 2025 09:33:51 +0000 (15:03 +0530)]
PCI: amd-mdb: Add AMD MDB Root Port driver
Add support for AMD MDB (Multimedia DMA Bridge) IP core as Root Port.
The Versal2 devices include MDB Module. The integrated block for MDB
along with the integrated bridge can function as PCIe Root Port
controller at Gen5 32-GT/s operation per lane.
Bridge supports error and INTx interrupts and are handled using platform
specific interrupt line in Versal2.
Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250228093351.923615-4-thippeswamy.havalige@amd.com
[bhelgaas: only present on ARM64-based SoCs; squash Kconfig dependency on
ARM64 from Geert Uytterhoeven <geert+renesas@glider.be>:
https://lore.kernel.org/r/eaef1dea7edcf146aa377d5e5c5c85a76ff56bae.1742306383.git.geert+renesas@glider.be] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log, code comments and error messages clean-up,
drop redundant "depends on PCI" from Kconfig, expose the error code
as part of error messages where appropriatie, change "depends on"
expression to match existing style from other drivers] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Alistair Francis [Thu, 6 Mar 2025 07:52:10 +0000 (17:52 +1000)]
PCI/DOE: Expose DOE features via sysfs
PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is
supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec
6.30.1.1. DOE allows a requester to obtain information about the other DOE
features supported by the device.
The kernel already queries the DOE features supported and caches the
values. Expose the values in sysfs to allow user space to determine which
DOE features are supported by the PCIe device.
By exposing the information to userspace, tools like lspci can relay the
information to users. By listing all of the supported features we can allow
userspace to parse the list, which might include vendor specific features
as well as yet to be supported features.
As the DOE Discovery feature must always be supported we treat it as a
special named attribute case. This allows the usual PCI attribute_group
handling to correctly create the doe_features directory when registering
pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group()
will seg fault).
After this patch is supported you can see something like this when
attaching a DOE device:
$ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
0001:01 0001:02 doe_discovery
Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make
DEVICE_ATTR_RO(doe_discovery) static] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Niklas Schnelle [Wed, 26 Feb 2025 12:07:47 +0000 (13:07 +0100)]
s390/pci: Support mmap() of PCI resources except for ISM devices
So far s390 does not allow mmap() of PCI resources to user-space via the
usual mechanisms, though it does use it for RDMA. For the PCI sysfs
resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor
ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on
disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars
for all devices.
This is partly because access to mapped PCI resources from user-space
requires special PCI load/store memory-I/O (MIO) instructions, or the
special MMIO syscalls when these are not available. Still, such access is
possible and useful not just for RDMA, in fact not being able to mmap() PCI
resources has previously caused extra work when testing devices.
One thing that doesn't work with PCI resources mapped to user-space though
is the s390 specific virtual ISM device. Not only because the BAR size of
256 TiB prevents mapping the whole BAR but also because access requires use
of the legacy PCI instructions which are not accessible to user-space on
systems with the newer MIO PCI instructions.
Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping
its resources while making this functionality available for all other PCI
devices. To this end introduce a minimal implementation of PCI_QUIRKS and
use that to set pdev->non_mappable_bars for ISM devices only. Then also set
ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic
implementation of pci_mmap_resource_range() enabling only the newer sysfs
mmap() interface. This follows the recommendation in
Documentation/PCI/sysfs-pci.rst.
Niklas Schnelle [Wed, 26 Feb 2025 12:07:46 +0000 (13:07 +0100)]
s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
The ability to map PCI resources to user-space is controlled by global
defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and
controls mapping of PCI resources using vfio-pci with a fallback option via
the pread()/pwrite() interface.
For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a
generic implementation for mapping PCI resources plus the newer sysfs
interface. Then there is HAVE_PCI_MMAP which can be used with custom
definitions of pci_mmap_resource_range() and the historical /proc/bus/pci
interface. Both mechanisms are all or nothing.
For s390 mapping PCI resources is possible and useful for testing and
certain applications such as QEMU's vfio-pci based user-space NVMe driver.
For certain devices, however access to PCI resources via mappings to
user-space is not possible and these must be excluded from the general PCI
resource mapping mechanisms.
Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can
not be accessed via mappings to user-space. In the future this enables
per-device restrictions of PCI resource mapping.
For now, set this flag for all PCI devices on s390 in line with the
existing, general disable of PCI resource mapping. As s390 is the only user
of the VFI_PCI_MMAP Kconfig options this can already be replaced with a
check of this new flag. Also add similar checks in the other code protected
by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for
enabling these for supported devices.
The s390 MMIO syscalls when using the classic PCI instructions do not
cause a page fault when follow_pfnmap_start() fails due to the page not
being present. Besides being a general deficiency this breaks vfio-pci's
mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on
first access. Fix this by following a failed follow_pfnmap_start() with
fixup_user_page() and retrying the follow_pfnmap_start(). Also fix
a VM_READ vs VM_WRITE mixup in the read syscall.
Shay Drory [Mon, 10 Mar 2025 08:45:24 +0000 (10:45 +0200)]
PCI: Fix NULL dereference in SR-IOV VF creation error path
Clean up when virtfn setup fails to prevent NULL pointer dereference
during device removal. The kernel oops below occurred due to incorrect
error handling flow when pci_setup_device() fails.
Add pci_iov_scan_device(), which handles virtfn allocation and setup and
cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need
to call pci_stop_and_remove_bus_device(). This prevents accessing
partially initialized virtfn devices during removal.
Ilpo Järvinen [Fri, 21 Mar 2025 16:31:03 +0000 (18:31 +0200)]
PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return
it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply
assign the fallback speed (2.5GT/s) into supported_speeds variable to share
the normal return path that calls pcie_supported_speeds2target_speed() to
calculate __fls().
This code path is not very likely to execute because
pcie_get_supported_speeds() should provide valid ->supported_speeds but a
spec violating device could fail to synthesize any speed in
pcie_get_supported_speeds(). It could also happen in case the
supported_speeds intersection is empty (also a violation of the current
PCIe specs).
Ilpo Järvinen [Fri, 21 Mar 2025 16:21:14 +0000 (18:21 +0200)]
PCI: pciehp: Don't enable HPIE when resuming in poll mode
PCIe hotplug can operate in poll mode without interrupt handlers using a
polling kthread only. eb34da60edee ("PCI: pciehp: Disable hotplug
interrupt during suspend") failed to consider that and enables HPIE
(Hot-Plug Interrupt Enable) unconditionally when resuming the Port.
Only set HPIE if non-poll mode is in use. This makes
pcie_enable_interrupt() match how pcie_enable_notification() already
handles HPIE.
Ilpo Järvinen [Tue, 11 Mar 2025 17:47:01 +0000 (19:47 +0200)]
PCI: Move cardbus IO size declarations into pci/pci.h
For some reason, cardbus related io/mem size declarations are in
linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h.
Move all them into one place in pci/pci.h.
Ilpo Järvinen [Tue, 11 Mar 2025 17:46:59 +0000 (19:46 +0200)]
PCI: Move resource reassignment func declarations into pci/pci.h
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used
outside of the PCI subsystem. They seem to be naturally static functions
but since resource fitting/assignment is split between setup-bus.c and
setup-res.c, they fall into different sides of the divide and need to be
declared.
Move the declarations of pci_reassign_bridge_resources() and
pci_reassign_resource() into pci/pci.h to keep them internal to PCI
subsystem.
Ilpo Järvinen [Tue, 11 Mar 2025 17:46:58 +0000 (19:46 +0200)]
PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem.
The comment also falsely advertises it to be for hotplug drivers, yet the
only caller is from sysfs store function. Move the function declaration
into pci/pci.h.
Ilpo Järvinen [Thu, 20 Mar 2025 14:28:37 +0000 (16:28 +0200)]
PCI: Fix BAR resizing when VF BARs are assigned
__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -NOENT which makes
pci_resize_resource() to detect an error and abort the resize.
Change the release loop to cover all resources up to VF BARs which allows
the resize operation to release the bridge windows and attempt to assigned
them again with the different size.
If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.
Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs") Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Manivannan Sadhasivam [Thu, 20 Mar 2025 18:06:04 +0000 (11:06 -0700)]
PCI: Allow PCI bridges to go to D3Hot on all non-x86
Currently, pci_bridge_d3_possible() encodes a variety of decision factors
when deciding whether a given bridge can be put into D3. A particular one
of note is for "recent enough PCIe ports." Per Rafael [0]:
"There were hardware issues related to PM on x86 platforms predating
the introduction of Connected Standby in Windows. For instance,
programming a port into D3hot by writing to its PMCSR might cause the
PCIe link behind it to go down and the only way to revive it was to
power cycle the Root Complex. And similar."
Thus, this function contains a DMI-based check for post-2015 BIOS.
The above factors (Windows, x86) don't really apply to non-x86 systems, and
also, many such systems don't have BIOS or DMI. However, we'd like to be
able to suspend bridges on non-x86 systems too.
Restrict the "recent enough" check to x86. If we find further
incompatibilities, it probably makes sense to expand on the deny-list
approach (i.e., bridge_d3_blacklist or similar).
Richard Zhu [Wed, 26 Feb 2025 02:42:56 +0000 (10:42 +0800)]
PCI: imx6: Identify controller via 'linux,pci-domain', not address
Instead of testing the controller register address to distinguish
controller 1 from controller 0 on i.MX8MQ platforms, use the PCI domain
number, which comes from the devicetree 'linux,pci-domain' property.
All relevant devicetrees should already supply 'linux,pci-domain', which
was added by c0b70f05c87f ("arm64: dts: imx8mq: use_dt_domains for pci
node").
Instead of being set directly in imx_pcie_probe(), pci->dbi_base will be
set by the DWC core in dw_pcie_get_resources().
No functional changes intended.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250226024256.1678103-3-hongxing.zhu@nxp.com
Usually, to handle these issues, we add a quirk for the PCI vendor and
device ID in drivers/pci/quirks.c with quirk_no_ats(). That is because
we cannot usually modify the capabilities on the EP side. In this case,
we can modify the capabilities on the EP side.
Thus, hide the broken ATS capability on RK3588 when running in EP mode.
That way, we don't need any quirk on the host side, and we see no errors
on the host side, and we can run pci_endpoint_test successfully, with
the IOMMU enabled on the host side.
Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
[kwilczynski: commit log, tidy up code comments and error message] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250310094826.842681-6-cassel@kernel.org
Niklas Cassel [Mon, 10 Mar 2025 11:10:18 +0000 (12:10 +0100)]
PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
These IRQ_TYPE_* defines are used by both drivers/misc/pci_endpoint_test.c
and tools/testing/selftests/pci_endpoint/pci_endpoint_test.c.
Considering that both the misc driver and the selftest already includes
the pcitest.h UAPI header, it makes sense for these IRQ_TYPE_* defines
to be located in the pcitest.h UAPI header.
Kunihiko Hayashi [Tue, 25 Feb 2025 11:02:52 +0000 (20:02 +0900)]
misc: pci_endpoint_test: Do not use managed IRQ functions
The pci_endpoint_test_request_irq() and pci_endpoint_test_release_irq()
are called repeatedly by the users through pci_endpoint_test_set_irq().
So using the managed version of IRQ functions within these functions
has no effect.
Kunihiko Hayashi [Tue, 25 Feb 2025 11:02:51 +0000 (20:02 +0900)]
misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'
The global variable "irq_type" preserves the current value of
ioctl(GET_IRQTYPE).
However, all tests that use interrupts first call ioctl(SET_IRQTYPE)
to set "test->irq_type", then write the value of test->irq_type into
the register pointed by test_reg_bar, and request the interrupt to the
endpoint. The endpoint function driver, pci-epf-test, refers to the
register, and determine which type of interrupt to raise.
The global variable "irq_type" is never used in the actual test,
so remove the variable and replace it with "test->irq_type".
Also, for the same reason, the variable "no_msi" can be removed.
Initially, "test->irq_type" has IRQ_TYPE_UNDEFINED, and the
ioctl(GET_IRQTYPE) before calling ioctl(SET_IRQTYPE) will return
an error.
Kunihiko Hayashi [Tue, 25 Feb 2025 11:02:50 +0000 (20:02 +0900)]
misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type
There are two variables that indicate the interrupt type to be used
in the next test execution, "irq_type" as global and "test->irq_type".
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in this function isn't reflected in the global "irq_type",
so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type is displayed in old version of "pcitest"
as follows:
- Result of running "pcitest -i 0"
SET IRQ TYPE TO LEGACY: OKAY
- Result of running "pcitest -I"
GET IRQ TYPE: MSI
Whereas running the new version of "pcitest" in kselftest results in an
error as follows:
# RUN pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1)
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type
Fix this issue by propagating the current type to the global "irq_type".
Philipp Stanner [Wed, 12 Mar 2025 08:06:35 +0000 (09:06 +0100)]
PCI: Check BAR index for validity
Many functions in PCI use accessor macros such as pci_resource_len(),
which take a BAR index. That index, however, is never checked for
validity, potentially resulting in undefined behavior by overflowing the
array pci_dev.resource in the macro pci_resource_n().
Since many users of those macros directly assign the accessed value to
an unsigned integer, the macros cannot be changed easily anymore to
return -EINVAL for invalid indexes. Consequently, the problem has to be
mitigated in higher layers.
Add pci_bar_index_valid(). Use it where appropriate.
Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/ Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com> Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: correct if-statement condition the pci_bar_index_is_valid()
helper function uses, tidy up code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Hot-removal of nested PCI hotplug ports suffers from a long-standing race
condition which can lead to a deadlock: A parent hotplug port acquires
pci_lock_rescan_remove(), then waits for pciehp to unbind from a child
hotplug port. Meanwhile that child hotplug port tries to acquire
pci_lock_rescan_remove() as well in order to remove its own children.
The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.
Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:
A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.
Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during
system sleep") provokes more frequent occurrence of the deadlock when
removing more than one Thunderbolt device during system sleep. The commit
sought to detect device replacement, but also triggered on device removal.
Differentiating reliably between replacement and removal is impossible
because pci_get_dsn() returns 0 both if the device was removed, as well as
if it was replaced with one lacking a Device Serial Number.
Avoid the more frequent occurrence of the deadlock by checking whether the
hotplug port itself was hot-removed. If so, there's no sense in checking
whether its child device was replaced.
This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy: A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.
Philipp Stanner [Wed, 12 Mar 2025 08:06:34 +0000 (09:06 +0100)]
PCI: Fix wrong length of devres array
The array for the iomapping cookie addresses has a length of
PCI_STD_NUM_BARS. This constant, however, only describes standard BARs;
while PCI can allow for additional, special BARs.
The total number of PCI resources is described by constant
PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars().
Thus, the devres array has so far been too small.
Change the length of the devres array to PCI_NUM_RESOURCES.
Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure") Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org # v6.11+
Ma Ke [Sun, 2 Feb 2025 06:23:57 +0000 (14:23 +0800)]
PCI: Fix reference leak in pci_alloc_child_bus()
If device_register(&child->dev) fails, call put_device() to explicitly
release child->dev, per the comment at device_register().
Found by code review.
Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org
Bjorn Helgaas [Sat, 15 Feb 2025 00:03:01 +0000 (18:03 -0600)]
PCI: Cache offset of Resizable BAR capability
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(),
pci_rebar_set_size(), etc) as well as pci_restore_state() searched config
space for a Resizable BAR capability. Most devices don't have such a
capability, so this is wasted effort, especially for pci_restore_state().
Search for a Resizable BAR capability once at enumeration-time and cache
the offset so we don't have to search every time we need it. No functional
change intended.
Bjorn Helgaas [Mon, 3 Mar 2025 21:02:17 +0000 (15:02 -0600)]
PCI: Enable Configuration RRS SV early
Following a reset, a Function may respond to Config Requests with Request
Retry Status (RRS) Completion Status to indicate that it is temporarily
unable to process the Request, but will be able to process the Request in
the future (PCIe r6.0, sec 2.3.1).
If the Configuration RRS Software Visibility feature is enabled and a Root
Complex receives RRS for a config read of the Vendor ID, the Root Complex
completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG,
0x0001 (sec 2.3.2).
The Config RRS SV feature applies only to Root Ports and is not directly
related to pci_scan_bridge_extend(). Move the RRS SV enable to
set_pcie_port_type() where we handle other PCIe-specific configuration.
Krzysztof Kozlowski [Fri, 7 Mar 2025 08:13:26 +0000 (09:13 +0100)]
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop deprecated windows
The example DTS uses 'num-ib-windows' and 'num-ob-windows' properties
but these are not defined in the binding. Binding also does not
reference snps,dw-pcie-common.yaml, probably because it is quite
different even though the device is based on Synopsys controller.
The properties are actually deprecated, so simply drop them from the
example.
Christian Bruel [Fri, 24 Jan 2025 12:30:43 +0000 (13:30 +0100)]
PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
Fix a kernel oops found while testing the stm32_pcie Endpoint driver
with handling of PERST# deassertion:
During EP initialization, pci_epf_test_alloc_space() allocates all BARs,
which are further freed if epc_set_bar() fails (for instance, due to no
free inbound window).
However, when pci_epc_set_bar() fails, the error path:
pci_epc_set_bar() ->
pci_epf_free_space()
does not clear the previous assignment to epf_test->reg[bar].
Then, if the host reboots, the PERST# deassertion restarts the BAR
allocation sequence with the same allocation failure (no free inbound
window), creating a double free situation since epf_test->reg[bar] was
deallocated and is still non-NULL.
Thus, make sure that pci_epf_alloc_space() and pci_epf_free_space()
invocations are symmetric, and as such, set epf_test->reg[bar] to NULL
when memory is freed.
The static function devm_pci_epc_match() is only invoked within the
devm_pci_epc_destroy(). However, since it was initially introduced,
this new API has had no callers.
Thus, remove both the unused API and the static function.
Niklas Cassel [Fri, 31 Jan 2025 18:29:56 +0000 (19:29 +0100)]
PCI: dw-rockchip: Describe Resizable BARs as Resizable BARs
Looking at section "11.4.4.29 USP_PCIE_RESBAR Registers Summary" in the
Technical Reference Manual (TRM) for RK3588, we can see that none of the
BARs are Fixed BARs, but actually Resizable BARs.
I couldn't find any reference in the TRM for RK3568, but looking at the
downstream PCIe endpoint driver, both RK3568 and RK3588 are treated as
the same, so the BARs on RK3568 must also be Resizable BARs.
Now when we actually have support for Resizable BARs, let's configure
these BARs as such.
The support for a specific iATU alignment was added in
commit 2a9a801620ef ("PCI: endpoint: Add support to specify alignment for
buffers allocated to BARs").
This commit specifically mentions both that the alignment by each DWC
based EP driver should match CX_ATU_MIN_REGION_SIZE, and that AM65x
specifically has a 64 KB alignment.
This also matches the CX_ATU_MIN_REGION_SIZE value specified in the
section "12.2.2.4.7 PCIe Subsystem Address Translation" of the Technical
Reference Manual (TRM) for AM65x:
https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf
This higher value, 1 MB, was obviously an ugly hack used to be able to
handle Resizable BARs which have a minimum size of 1 MB.
Now when we actually have support for Resizable BARs, let's configure the
iATU alignment requirement to the actual requirement.
(BARs described as Resizable will still get aligned to 1 MB.)
Cc: stable+noautosel@kernel.org # Depends on PCI endpoint Resizable BARs series Fixes: 23284ad677a9 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms") Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250131182949.465530-15-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Each of these sizing schemes have different instructions for how to
initialize the BAR.
The DWC driver currently does not support resizable BARs.
Instead, in order to somewhat support resizable BARs, the DWC EP driver
currently has an ugly hack that force sets a resizable BAR to 1 MB, if
such a BAR is detected.
Additionally, this hack only works if the DWC glue driver also has lied
in their EPC features, and claimed that the resizable BAR is a 1 MB fixed
size BAR.
This is unintuitive (as you somehow need to know that you need to lie in
your EPC features), but other than that it is overly restrictive, since a
resizable BAR is capable of supporting sizes different than 1 MB.
Add proper support for resizable BARs in the DWC EP driver.
Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.
struct pci_epf_bar only has a single size struct member.
This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.
With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.
This means that an EPC driver does not need to lie in EPC features, and an
EPF driver will be able to set an arbitrary size (not be forced to a 1 MB
size), just like BAR_PROGRAMMABLE.
Niklas Cassel [Fri, 31 Jan 2025 18:29:50 +0000 (19:29 +0100)]
PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARs
A resizable BAR is different from a normal BAR in a few ways:
- The minimum size of a resizable BAR is 1 MB.
- Each BAR that is resizable has a Capability and Control register in
the Resizable BAR Capability structure.
These registers contain the supported sizes and the currently selected
size of a resizable BAR.
The supported sizes is a bitmap of the supported sizes. The selected size
is a single value that is equal to one of the supported sizes.
A resizable BAR thus has to be configured differently than a
BAR_PROGRAMMABLE BAR, which usually sets the BAR size/mask in a vendor
specific way.
The PCI endpoint framework currently does not support resizable BARs.
Add a BAR type BAR_RESIZABLE, so that an EPC driver can support resizable
BARs properly.
Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.
struct pci_epf_bar only has a single size struct member.
This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.
With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.
The struct pci_epf_test_reg is the actual data in pci-epf-test's test_reg
BAR (usually BAR0), which the host uses to send commands (etc.), and which
pci-epf-test uses to send back status codes.
pci-epf-test currently reads and writes this data without any endianness
conversion functions, which means that pci-epf-test is completely broken
on big-endian endpoint systems.
PCI devices are inherently little-endian, and the data stored in the PCI
BARs should be in little-endian.
Use endianness conversion functions when reading and writing data to
struct pci_epf_test_reg so that pci-epf-test will behave correctly on
big-endian endpoint systems.
Fixes: 349e7a85b25f ("PCI: endpoint: functions: Add an EP function to test PCI") Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250127161242.104651-2-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Kunihiko Hayashi [Tue, 25 Feb 2025 11:02:48 +0000 (20:02 +0900)]
misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error
After devm_request_irq() fails with error in pci_endpoint_test_request_irq(),
the pci_endpoint_test_free_irq_vectors() is called assuming that all IRQs
have been released.
However, some requested IRQs remain unreleased, so there are still
/proc/irq/* entries remaining, and this results in WARN() with the
following message:
remove_proc_entry: removing non-empty directory 'irq/30', leaking at least 'pci-endpoint-test.0'
WARNING: CPU: 0 PID: 202 at fs/proc/generic.c:719 remove_proc_entry +0x190/0x19c
To solve this issue, set the number of remaining IRQs to test->num_irqs,
and release IRQs in advance by calling pci_endpoint_test_release_irq().
Niklas Cassel [Fri, 24 Jan 2025 09:33:01 +0000 (10:33 +0100)]
misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX
Running 'pcitest -b 0' fails with "TEST FAILED" when the BAR0 size
is e.g. 8 GB.
The return value of the pci_resource_len() macro can be larger than that
of a signed integer type. Thus, when using 'pcitest' with an 8 GB BAR,
the bar_size of the integer type will overflow.
Change bar_size from integer to resource_size_t to prevent integer
overflow for large BAR sizes with 32-bit compilers.
In order to handle 64-bit resource_type_t on 32-bit platforms, we would
have needed to use a function like div_u64() or similar. Instead, change
the code to use addition instead of division. This avoids the need for
div_u64() or similar, while also simplifying the code.
Fixes: cda370ec6d1f ("misc: pci_endpoint_test: Avoid using hard-coded BAR sizes") Co-developed-by: Hans Zhang <18255117159@163.com> Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250124093300.3629624-2-cassel@kernel.org
[mani: added fixes tag] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Niklas Cassel [Thu, 23 Jan 2025 12:01:48 +0000 (13:01 +0100)]
misc: pci_endpoint_test: Give disabled BARs a distinct error code
The current code returns -ENOMEM if test->bar[barno] is NULL.
There can be two reasons why test->bar[barno] is NULL:
1) The pci_ioremap_bar() call in pci_endpoint_test_probe() failed.
2) The BAR was skipped, because it is disabled by the endpoint.
Many PCI endpoint controller drivers will disable all BARs in their
init function. A disabled BAR will have a size of 0.
A PCI endpoint function driver will be able to enable any BAR that
is not marked as BAR_RESERVED (which means that the BAR should not
be touched by the EPF driver).
Thus, perform check if the size is 0, before checking if
test->bar[barno] is NULL, such that we can return different errors.
This will allow the selftests to return SKIP instead of FAIL for
disabled BARs.
Commit f26d37ee9bda ("misc: pci_endpoint_test: Fix IOCTL return value")
changed the return value of pci_endpoint_test_bars_read_bar() from false
to -EINVAL on error, however, it failed to update the error handling.
Niklas Cassel [Thu, 23 Jan 2025 12:01:49 +0000 (13:01 +0100)]
selftests: pci_endpoint: Skip disabled BARs
Currently BARs that have been disabled by the endpoint controller driver
will result in a test FAIL.
Returning FAIL for a BAR that is disabled seems overly pessimistic.
There are EPC that disables one or more BARs intentionally.
One reason for this is that there are certain EPCs that are hardwired to
expose internal PCIe controller registers over a certain BAR, so the EPC
driver disables such a BAR, such that the host will not overwrite random
registers during testing.
Such a BAR will be disabled by the EPC driver's init function, and the
BAR will be marked as BAR_RESERVED, such that it will be unavailable to
endpoint function drivers.
Let's return FAIL only for BARs that are actually enabled and failed the
test, and let's return skip for BARs that are not even enabled.
Ilpo Järvinen [Fri, 7 Mar 2025 14:09:22 +0000 (16:09 +0200)]
PCI: Do not claim to release resource falsely
pci_release_resource() will print "... releasing" regardless of the
resource being assigned or not. Move the print after the res->parent check
to avoid claiming the kernel would be releasing an unassigned resource.
Likely, none of the current callers pass a resource that is unassigned so
this change is mostly to correct the non-sensical order than to remove
errorneous printouts.
Zhiyuan Dai [Fri, 7 Mar 2025 05:35:29 +0000 (13:35 +0800)]
PCI: Increase Resizable BAR support from 512 GB to 128 TB
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register. Larger sizes can be
advertised via the Capability register, but that requires an API change.
Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.
Alistair Francis [Thu, 6 Mar 2025 07:52:09 +0000 (17:52 +1000)]
PCI/DOE: Rename Discovery Response Data Object Contents to type
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and
"Next Index" as the fields in the DOE Discovery Response Data Object. The
DOE driver currently uses both the terms 'type' and 'prot' for the second
element.
Rename all uses of the DOE Discovery Response Data Object to use 'type' as
the second element of the object header, instead of type/prot as it
currently is.
J. Neuschäfer [Thu, 20 Feb 2025 12:29:58 +0000 (13:29 +0100)]
dt-bindings: PCI: Convert fsl,mpc83xx-pcie to YAML
Formalise the binding for the PCI controllers in the Freescale MPC8xxx
chip family. Information about PCI-X-specific properties was taken from
fsl,pci.txt. The examples were taken from mpc8315erdb.dts and
xpedite5200_xmon.dts.
D M, Sharath Kumar [Fri, 21 Feb 2025 17:04:52 +0000 (11:04 -0600)]
PCI: altera: Add Agilex support
Add PCIe Root Port controller support for the Agilex family of chips.
The Agilex PCIe Hard IP has three variants that are mostly software
compatible, except for a couple register offsets. The P-Tile variant
supports Gen3/Gen4 1x16. The F-Tile variant supports Gen3/Gen4 4x4,
4x8, and 4x16. The R-Tile variant improves on the F-Tile variant by
adding Gen5 support.
To simplify the implementation of pci_ops read/write functions,
ep_{read/write}_cfg() callbacks were added to struct altera_pci_ops
to easily distinguish between hardware variants.
Signed-off-by: D M, Sharath Kumar <sharath.kumar.d.m@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250221170452.875419-3-matthew.gerlach@linux.intel.com
[kwilczynski: tidy code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Zhang Zekun [Sat, 31 Aug 2024 04:04:13 +0000 (12:04 +0800)]
PCI: tegra: Use helper function for_each_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.
Zhang Zekun [Sat, 31 Aug 2024 04:04:12 +0000 (12:04 +0800)]
PCI: apple: Use helper function for_each_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.
Zhang Zekun [Sat, 31 Aug 2024 04:04:11 +0000 (12:04 +0800)]
PCI: mt7621: Use helper function for_each_available_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.