Bjorn Helgaas [Sat, 28 Oct 2023 18:31:02 +0000 (13:31 -0500)]
Merge branch 'pci/controller/xilinx-ecam'
- Drop xilinx-nwl updates of bridge bus number fields, since PCI core
already does that (Thippeswamy Havalige)
- Update xilinx-nwl driver and ECAM size in devicetree example to allow up
to 256 buses (Thippeswamy Havalige)
* pci/controller/xilinx-ecam:
PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses
PCI: xilinx-nwl: Rename the NWL_ECAM_VALUE_DEFAULT macro
dt-bindings: PCI: xilinx-nwl: Modify ECAM size in the DT example
PCI: xilinx-nwl: Remove redundant code that sets Type 1 header fields
Bjorn Helgaas [Sat, 28 Oct 2023 18:31:02 +0000 (13:31 -0500)]
Merge branch 'pci/controller/speed'
- Use PCIE_SPEED2MBS_ENC() macro in qcom host and endpoint to encode link
speed instead of hard-coding the link speed in MBps (Manivannan
Sadhasivam)
- Use Mbps_to_icc() (not MBps_to_icc()) in tegra194 instead of explicitly
doing the bytes-to-bits conversion (Manivannan Sadhasivam)
* pci/controller/speed:
PCI: tegra194: Use Mbps_to_icc() macro for setting icc speed
PCI: qcom-ep: Use PCIE_SPEED2MBS_ENC() macro for encoding link speed
PCI: qcom: Use PCIE_SPEED2MBS_ENC() macro for encoding link speed
Bjorn Helgaas [Sat, 28 Oct 2023 18:31:01 +0000 (13:31 -0500)]
Merge branch 'pci/controller/rcar'
- Add generic T_PVPERL macro for the required interval between power being
stable and PERST# being inactive (Yoshihiro Shimoda)
- Factor out dw_pcie_link_set_max_link_width() (Yoshihiro Shimoda)
- Update PCI_EXP_LNKCAP_MLW so Link Capabilities shows the correct max link
width (Yoshihiro Shimoda)
- Drop tegra194 PCI_EXP_LNKCAP_MLW setting since dw_pcie_setup() already
does it (Yoshihiro Shimoda)
- Add dwc support for different dbi and dbi2 register offsets, to be used
for R-Car Gen4 controllers (Yoshihiro Shimoda)
- Add EDMA_UNROLL capability flag for R-Car Gen4 controllers that don't
correctly advertise unrolled mapping via their eDMA CTRL register
(Yoshihiro Shimoda)
- Export dw_pcie_ep_exit() for use by the modular R-Car Gen4 driver
(Yoshihiro Shimoda)
- Add .pre_init() and .deinit() hooks for use by R-Car Gen4 controllers
(Yoshihiro Shimoda)
- Increase snps,dw-pcie DT reg and reg-names maxItems for R-Car Gen4
controllers (Yoshihiro Shimoda)
- Add rcar-gen4-pci host and endpoint DT bindings and drivers (Yoshihiro
Shimoda)
- Add Renesas R8A779F0 Device ID to pci_endpoint_test to allow testing on
R-Car S4-8 (Yoshihiro Shimoda)
* pci/controller/rcar:
misc: pci_endpoint_test: Add Device ID for R-Car S4-8 PCIe controller
MAINTAINERS: Update PCI DRIVER FOR RENESAS R-CAR for R-Car Gen4
PCI: rcar-gen4: Add endpoint mode support
PCI: rcar-gen4: Add R-Car Gen4 PCIe controller support for host mode
dt-bindings: PCI: renesas: Add R-Car Gen4 PCIe Endpoint
dt-bindings: PCI: renesas: Add R-Car Gen4 PCIe Host
dt-bindings: PCI: dwc: Update maxItems of reg and reg-names
PCI: dwc: endpoint: Introduce .pre_init() and .deinit()
PCI: dwc: Expose dw_pcie_write_dbi2() to module
PCI: dwc: Expose dw_pcie_ep_exit() to module
PCI: dwc: Add EDMA_UNROLL capability flag
PCI: dwc: endpoint: Add multiple PFs support for dbi2
PCI: tegra194: Drop PCI_EXP_LNKSTA_NLW setting
PCI: dwc: Add missing PCI_EXP_LNKCAP_MLW handling
PCI: dwc: Add dw_pcie_link_set_max_link_width()
PCI: Add T_PVPERL macro
Bjorn Helgaas [Sat, 28 Oct 2023 18:31:00 +0000 (13:31 -0500)]
Merge branch 'pci/vga'
- Add pci_is_vga() helper, which checks for both PCI_CLASS_DISPLAY_VGA and
PCI_CLASS_NOT_DEFINED_VGA (which catches ancient devices built before
Class Codes were defined) (Sui Jingfeng)
- Use the new pci_is_vga() to identify devices for the VGA arbiter, the
sysfs "boot_vga" attribute, and the virtio and qxl drivers (SUi Jingfeng)
* pci/vga:
drm/qxl: Use pci_is_vga() to identify VGA devices
drm/virtio: Use pci_is_vga() to identify VGA devices
PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()
PCI/VGA: Select VGA devices earlier
PCI/VGA: Use pci_is_vga() to identify VGA devices
PCI: Add pci_is_vga() helper
Bjorn Helgaas [Sat, 28 Oct 2023 18:30:59 +0000 (13:30 -0500)]
Merge branch 'pci/p2pdma'
- Move struct dev_pagemap (a flexible structure) to end of struct
pci_p2pdma_pagemap to avoid overwriting things after dev_pagemap
(Gustavo A. R. Silva)
Bjorn Helgaas [Sat, 28 Oct 2023 18:30:57 +0000 (13:30 -0500)]
Merge branch 'pci/aspm'
* pci/aspm:
PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common()
Revert "PCI/ASPM: Disable only ASPM_STATE_L1 when driver, disables L1"
PCI/ASPM: Convert printk() to pr_*() and add include
PCI/ASPM: Remove unnecessary includes
PCI/ASPM: Use FIELD_MAX() instead of literals
PCI/ASPM: Use time constants
PCI/ASPM: Return U32_MAX instead of bit magic construct
PCI/ASPM: Use FIELD_GET/PREP() to access PCIe capability fields
PCI: Add PCI_L1SS_CTL2 fields
Thippeswamy Havalige [Mon, 16 Oct 2023 05:11:02 +0000 (10:41 +0530)]
PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses
The PCIe Root Port controller expects ECAM size to be set through software.
As such, update the value of the NWL_ECAM_VALUE_DEFAULT macro to 16 to
allow the controller to address the 256 MB ECAM region and, as such,
enable support for detecting up to 256 buses.
D Scott Phillips [Sat, 30 Sep 2023 00:20:36 +0000 (17:20 -0700)]
PCI: hotplug: Add Ampere Altra Attention Indicator extension driver
On Ampere Altra, PCIe hotplug is handled through ACPI. A side interface is
also present to request system firmware control of the hotplug Attention
Indicators. Add an ACPI PCI Hotplug companion driver to support Attention
Indicator control.
D Scott Phillips [Sat, 30 Sep 2023 00:20:35 +0000 (17:20 -0700)]
PCI: acpiphp: Allow built-in drivers for Attention Indicators
Since the introduction of the attention callback in acpiphp, a non-zero
struct module *owner has been required in acpiphp_register_attention(). The
intent seemed to be that the core code could hold a refcount on the module
while invoking a callback.
This check accidentally precludes the possibility of attention callbacks to
built-in drivers.
Remove the check on `struct module *owner` in acpiphp_register_attention()
so attention callbacks can also be registered from built-in drivers.
Heiner Kallweit [Wed, 11 Oct 2023 07:46:45 +0000 (09:46 +0200)]
PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common()
aspm_attr_store_common(), which handles sysfs control of ASPM, has the same
problem as fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver
disables L1"): disabling L1 adds only ASPM_L1 (but not any of the L1.x
substates) to the "aspm_disable" mask.
Enabling one substate, e.g., L1.1, via sysfs removes ASPM_L1 from the
disable mask. Since disabling L1 via sysfs doesn't add any of the
substates to the disable mask, enabling L1.1 actually enables *all* the
substates.
In this scenario:
- Write 0 to "l1_aspm" to disable L1
- Write 1 to "l1_1_aspm" to enable L1.1
the intention is to disable L1 and all L1.x substates, then enable just
L1.1, but in fact, *all* L1.x substates are enabled.
Fix this by explicitly disabling all the L1.x substates when disabling L1.
After fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver
disables L1"), disabling L1 via pci_disable_link_state(PCIE_LINK_STATE_L1),
then enabling one substate, e.g., L1.1, via sysfs actually enables *all*
the substates.
For example, r8169 disables L1 because of hardware issues on a number of
systems, which implicitly disables the L1.1 and L1.2 substates.
On some systems, L1 and L1.1 work fine, but L1.2 causes missed rx packets.
Enabling L1.1 via the sysfs "aspm_l1_1" attribute unexpectedly enables L1.2
as well as L1.1.
After fb097dcd5a28, pci_disable_link_state(PCIE_LINK_STATE_L1) adds only
ASPM_L1 (but not any of the L1.x substates) to the "aspm_disable" mask:
Enabling an L1.x substate removes the substate and L1 from the
"aspm_disable" mask. After fb097dcd5a28, the substates were not added to
the mask when disabling L1, so enabling one substate implicitly enables all
of them.
Revert fb097dcd5a28 so enabling one substate doesn't enable the others.
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:28 +0000 (17:56 +0900)]
PCI: rcar-gen4: Add R-Car Gen4 PCIe controller support for host mode
Add R-Car Gen4 PCIe controller support for host mode.
This controller is based on Synopsys DesignWare PCIe. However, this
particular controller has a number of vendor-specific registers, and as
such, requires initialization code like mode setting and retraining and
so on.
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:24 +0000 (17:56 +0900)]
PCI: dwc: endpoint: Introduce .pre_init() and .deinit()
Renesas R-Car Gen4 PCIe controllers require vendor-specific
initialization before .init().
To use dw->dbi and dw->num-lanes in the initialization code,
introduce .pre_init() into struct dw_pcie_ep_ops. While at it,
also introduce .deinit() to disable the controller by using
vendor-specific de-initialization.
Note that the ep_init in the struct dw_pcie_ep_ops should be
renamed to init later.
Since no PCIe controller drivers call this, this change is not required
for now. But, Renesas R-Car Gen4 PCIe controller driver will call this
and if the controller driver is built as a kernel module, the following
build error happens:
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:23 +0000 (17:56 +0900)]
PCI: dwc: Expose dw_pcie_ep_exit() to module
Since no PCIe controller drivers call this, this change is not required
for now. But, Renesas R-Car Gen4 PCIe controller driver will call this
and if the controller driver is built as a kernel module, the following
build error happens:
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:21 +0000 (17:56 +0900)]
PCI: dwc: endpoint: Add multiple PFs support for dbi2
The commit 24ede430fa49 ("PCI: designware-ep: Add multiple PFs support
for DWC") added .func_conf_select() to get the configuration space of
different PFs and assumed that the offsets between dbi and dbi2 would
be the same.
However, Renesas R-Car Gen4 PCIe controllers have different offsets of
function 1: dbi (+0x1000) and dbi2 (+0x800). To get the offset for dbi2,
add .get_dbi2_offset() and dw_pcie_ep_get_dbi2_offset().
Note:
- .func_conf_select() should be renamed later.
- dw_pcie_ep_get_dbi2_offset() will call .func_conf_select()
if .get_dbi2_offset() doesn't exist for backward compatibility.
- dw_pcie_writeX_{dbi/dbi2} APIs accepted the func_no argument,
so that these offset calculations are contained in the API
definitions itself as it should.
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:19 +0000 (17:56 +0900)]
PCI: dwc: Add missing PCI_EXP_LNKCAP_MLW handling
Update dw_pcie_link_set_max_link_width() to set PCI_EXP_LNKCAP_MLW.
In accordance with the DW PCIe RC/EP HW manuals [1,2,3,...] aside with
the PORT_LINK_CTRL_OFF.LINK_CAPABLE and GEN2_CTRL_OFF.NUM_OF_LANES[8:0]
field there is another one which needs to be updated.
It's LINK_CAPABILITIES_REG.PCIE_CAP_MAX_LINK_WIDTH. If it isn't done at
the very least the maximum link-width capability CSR won't expose the
actual maximum capability.
[1] DesignWare Cores PCI Express Controller Databook - DWC PCIe Root Port,
Version 4.60a, March 2015, p.1032
[2] DesignWare Cores PCI Express Controller Databook - DWC PCIe Root Port,
Version 4.70a, March 2016, p.1065
[3] DesignWare Cores PCI Express Controller Databook - DWC PCIe Root Port,
Version 4.90a, March 2016, p.1057
...
[X] DesignWare Cores PCI Express Controller Databook - DWC PCIe Endpoint,
Version 5.40a, March 2019, p.1396
[X+1] DesignWare Cores PCI Express Controller Databook - DWC PCIe Root Port,
Version 5.40a, March 2019, p.1266
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:18 +0000 (17:56 +0900)]
PCI: dwc: Add dw_pcie_link_set_max_link_width()
This is a preparation before adding the Max-Link-width capability
setup which would in its turn complete the max-link-width setup
procedure defined by Synopsys in the HW-manual.
Seeing there is a max-link-speed setup method defined in the DW PCIe
core driver it would be good to have a similar function for the link
width setup.
That's why we need to define a dedicated function first from already
implemented but incomplete link-width setting up code.
Yoshihiro Shimoda [Wed, 18 Oct 2023 08:56:17 +0000 (17:56 +0900)]
PCI: Add T_PVPERL macro
According to the PCIe CEM r5.0, sec 2.9.2, Power stable to PERST#
inactive interval is 100 ms as minimum. Add a macro so that the PCIe
controller drivers can make use of it.
PCI: Disable ATS for specific Intel IPU E2000 devices
Due to a hardware issue in A and B steppings of Intel IPU E2000, it expects
wrong endianness in ATS invalidation message body. This problem can lead to
outdated translations being returned as valid and finally cause system
instability.
To prevent such issues, add quirk_intel_e2000_no_ats() to disable ATS for
vulnerable IPU E2000 devices.
Link: https://lore.kernel.org/r/20230908143606.685930-3-bartosz.pawlowski@intel.com Signed-off-by: Bartosz Pawlowski <bartosz.pawlowski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
PCI: hv: Annotate struct hv_dr_state with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct hv_dr_state.
Manivannan Sadhasivam [Tue, 10 Oct 2023 15:59:14 +0000 (21:29 +0530)]
PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops
ASPM is supported by Qcom host controllers/bridges on most of the recent
platforms and so the devices tested so far. But for enabling ASPM by
default (without using Kconfig, kernel command-line or sysfs), BIOS has
to enable ASPM on both host bridge and downstream devices during boot.
Unfortunately, none of the BIOS available on Qcom platforms enables
ASPM. Due to this, the platforms making use of Qcom SoCs draw high power
during runtime.
To fix this power draw issue, users have to enable ASPM using Kconfig,
kernel command-line, sysfs or the BIOS has to start enabling ASPM.
The latter may happen in the future, but that won't address the issue on
current platforms. Also, asking users to enable a feature to get the power
management right would provide an unpleasant out-of-the-box experience.
So the apt solution is to enable ASPM in the controller driver itself. And
this is being accomplished by calling pci_enable_link_state() in the newly
introduced host_post_init() callback for all the devices connected to the
bus. This function enables all supported link low power states for both
host bridge and the downstream devices.
Due to limited testing, ASPM is only enabled for platforms making use of
ops_1_9_0 callbacks.
Manivannan Sadhasivam [Tue, 10 Oct 2023 15:59:13 +0000 (21:29 +0530)]
PCI: dwc: Add host_post_init() callback
This callback can be used by the platform drivers to do configuration
once all the devices are scanned. Like changing LNKCTL of all downstream
devices to enable ASPM etc...
Manivannan Sadhasivam [Wed, 4 Oct 2023 16:44:30 +0000 (22:14 +0530)]
PCI: tegra194: Use Mbps_to_icc() macro for setting icc speed
PCIe speed returned by the PCIE_SPEED2MBS_ENC() macro is in Mbps. So
instead of converting it to MBps explicitly and using the MBps_to_icc()
macro, let's use the Mbps_to_icc() macro to pass the value directly.
Manivannan Sadhasivam [Wed, 4 Oct 2023 16:44:29 +0000 (22:14 +0530)]
PCI: qcom-ep: Use PCIE_SPEED2MBS_ENC() macro for encoding link speed
Instead of hardcoding the link speed in MBps, use existing
PCIE_SPEED2MBS_ENC() macro that does the encoding of the link speed for
us. Also, let's Wrap it with QCOM_PCIE_LINK_SPEED_TO_BW() macro to do
the conversion to ICC speed.
This eliminates the need for a switch case in qcom_pcie_icc_update() and
also works for future Gen speeds without any code modifications.
Manivannan Sadhasivam [Wed, 4 Oct 2023 16:44:28 +0000 (22:14 +0530)]
PCI: qcom: Use PCIE_SPEED2MBS_ENC() macro for encoding link speed
Instead of hardcoding the link speed in MBps, use existing
PCIE_SPEED2MBS_ENC() macro that does the encoding of the link speed for
us. Also, let's Wrap it with QCOM_PCIE_LINK_SPEED_TO_BW() macro to do
the conversion to ICC speed.
This eliminates the need for a switch case in qcom_pcie_icc_update() and
also works for future Gen speeds without any code modifications.
Uwe Kleine-König [Sun, 1 Oct 2023 17:02:54 +0000 (19:02 +0200)]
PCI: keystone: Don't discard .probe() callback
The __init annotation makes the ks_pcie_probe() function disappear after
booting completes. However a device can also be bound later. In that case,
we try to call ks_pcie_probe(), but the backing memory is likely already
overwritten.
The right thing to do is do always have the probe callback available. Note
that the (wrong) __refdata annotation prevented this issue to be noticed by
modpost.
Uwe Kleine-König [Sun, 1 Oct 2023 17:02:53 +0000 (19:02 +0200)]
PCI: keystone: Don't discard .remove() callback
With CONFIG_PCIE_KEYSTONE=y and ks_pcie_remove() marked with __exit, the
function is discarded from the driver. In this case a bound device can
still get unbound, e.g via sysfs. Then no cleanup code is run resulting in
resource leaks or worse.
The right thing to do is do always have the remove callback available.
Note that this driver cannot be compiled as a module, so ks_pcie_remove()
was always discarded before this change and modpost couldn't warn about
this issue. Furthermore the __ref annotation also prevents a warning.
Uwe Kleine-König [Sun, 1 Oct 2023 17:02:52 +0000 (19:02 +0200)]
PCI: kirin: Don't discard .remove() callback
With CONFIG_PCIE_KIRIN=y and kirin_pcie_remove() marked with __exit, the
function is discarded from the driver. In this case a bound device can
still get unbound, e.g via sysfs. Then no cleanup code is run resulting in
resource leaks or worse.
The right thing to do is do always have the remove callback available.
This fixes the following warning by modpost:
Uwe Kleine-König [Sun, 1 Oct 2023 17:02:51 +0000 (19:02 +0200)]
PCI: exynos: Don't discard .remove() callback
With CONFIG_PCI_EXYNOS=y and exynos_pcie_remove() marked with __exit, the
function is discarded from the driver. In this case a bound device can
still get unbound, e.g via sysfs. Then no cleanup code is run resulting in
resource leaks or worse.
The right thing to do is do always have the remove callback available.
This fixes the following warning by modpost:
Sui Jingfeng [Wed, 30 Aug 2023 11:15:32 +0000 (19:15 +0800)]
drm/qxl: Use pci_is_vga() to identify VGA devices
Use pci_is_vga() to identify VGA devices instead of a private is_vga()
function.
This means qxl will use the VGA arbiter for old PCI_CLASS_NOT_DEFINED_VGA
(0x0001) devices as well as the PCI_CLASS_DISPLAY_VGA (0x0300) devices it
recognized previously.
This probably doesn't make a difference because qxl_pci_driver doesn't
claim PCI_CLASS_NOT_DEFINED_VGA devices by default, so it's mainly a code
simplification.
Link: https://lore.kernel.org/r/20230830111532.444535-6-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch>
Sui Jingfeng [Wed, 30 Aug 2023 11:15:31 +0000 (19:15 +0800)]
drm/virtio: Use pci_is_vga() to identify VGA devices
Use pci_is_vga() to identify VGA devices instead of open-coding the class
test.
This means virtio_gpu_pci_quirk() will apply to old
PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices as well as the
PCI_CLASS_DISPLAY_VGA (0x0300) devices it did previously.
Sui Jingfeng [Wed, 30 Aug 2023 11:15:30 +0000 (19:15 +0800)]
PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()
Enable the 'boot_vga' sysfs attribute via pci_is_vga().
This exposes 'boot_vga' for old PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices
as well as for the PCI_CLASS_DISPLAY_VGA (0x0300) devices where it was
previously exposed.
Sui Jingfeng [Fri, 6 Oct 2023 21:48:38 +0000 (16:48 -0500)]
PCI/VGA: Select VGA devices earlier
Select VGA devices in vga_arb_device_init() and pci_notify() instead of in
vga_arbiter_add_pci_device().
This is a trivial optimization for adding devices. It's a bigger
optimization for the removal case because pci_notify() won't call
vga_arbiter_del_pci_device() for non-VGA devices, so it won't have to
search the vga_list for them.
https://lore.kernel.org/r/20230830111532.444535-3-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[bhelgaas: commit log, split from functional change] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Sui Jingfeng [Wed, 30 Aug 2023 11:15:29 +0000 (19:15 +0800)]
PCI/VGA: Use pci_is_vga() to identify VGA devices
Use pci_is_vga() to identify VGA devices, so the arbiter will handle old
PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices as well as the
PCI_CLASS_DISPLAY_VGA (0x0300) devices it previously handled.
Link: https://lore.kernel.org/r/20230830111532.444535-3-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[bhelgaas: commit log, split functional change from optimization] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
Sui Jingfeng [Wed, 30 Aug 2023 11:15:28 +0000 (19:15 +0800)]
PCI: Add pci_is_vga() helper
The PCI Code and ID Assignment spec, r1.15, secs 1.4 and 1.1, define
VGA Base Class and Sub-Classes:
03 00 PCI_CLASS_DISPLAY_VGA VGA-compatible or 8514-compatible
00 01 PCI_CLASS_NOT_DEFINED_VGA VGA-compatible (before Class Code)
Add a pci_is_vga() helper to return true if a device is in either category.
These VGA devices use the hardwired legacy VGA resources
([mem 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df] and aliases), so
they require special handling if more than one is present in the system.
Mario Limonciello [Wed, 4 Oct 2023 14:49:59 +0000 (09:49 -0500)]
x86/PCI: Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4
Iain reports that USB devices can't be used to wake a Lenovo Z13 from
suspend. This occurs because on some AMD platforms, even though the Root
Ports advertise PME_Support for D3hot and D3cold, wakeup events from
devices on a USB4 controller don't result in wakeup interrupts from the
Root Port when amd-pmc has put the platform in a hardware sleep state.
If amd-pmc will be involved in the suspend, remove D3hot and D3cold from
the PME_Support mask of Root Ports above USB4 controllers so we avoid those
states if we need wakeups.
Restore D3 support at resume so that it can be used by runtime suspend.
This affects both AMD Rembrandt and Phoenix SoCs.
"pm_suspend_target_state == PM_SUSPEND_ON" means we're doing runtime
suspend, and amd-pmc will not be involved. In that case PMEs work as
advertised in D3hot/D3cold, so we don't need to do anything.
Note that amd-pmc is technically optional, and there's no need for this
quirk if it's not present, but we assume it's always present because power
consumption is so high without it.
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend") Link: https://lore.kernel.org/r/20231004144959.158840-1-mario.limonciello@amd.com Reported-by: Iain Lane <iain@orangesquash.org.uk> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[bhelgaas: commit log, move to arch/x86/pci/fixup.c, add #includes] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
Li Chen [Wed, 5 Jul 2023 10:15:51 +0000 (18:15 +0800)]
PCI: cadence: Drop unused member from struct cdns_plat_pcie
The struct cdns_plat_pcie contains a member called is_rc that is not
being used beyond being assigned a value within the cdns_plat_pcie_probe()
function, which is then not used for anything.
Thus, drop is_rc from the struct cdns_plat_pcie, especially since there
already is an is_rc member within the struct cdns_plat_pcie_of_data that
is actively used to convey information about the PCIe controller mode.
[kwilczynski: commit log] Signed-off-by: Li Chen <lchen@ambarella.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Gustavo A. R. Silva [Mon, 2 Oct 2023 19:04:15 +0000 (21:04 +0200)]
PCI/P2PDMA: Fix undefined behavior bug in struct pci_p2pdma_pagemap
Struct dev_pagemap is a flexible structure, which means that it contains a
flexible-array member. If dev_pagemap.nr_range > 1, the memory following
the dev_pagemap could be overwritten.
This is currently not an issue because pci_p2pdma_pagemap is not exposed
outside p2pdma.c, and p2pdma.c only sets dev_pagemap.nr_range to 1.
To prevent problems if p2pdma.c ever uses nr_range > 1, move the flexible
struct dev_pagemap to the end of struct pci_p2pdma_pagemap.
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.
Link: https://lore.kernel.org/r/ZRsUL/hATNruwtla@work Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Ilpo Järvinen [Tue, 3 Oct 2023 12:52:58 +0000 (15:52 +0300)]
PCI: vmd: Correct PCI Header Type Register's multi-function check
vmd_domain_reset() attempts to find whether the device may contain multiple
functions by checking 0x80 (Multi-Function Device), however, the hdr_type
variable has already been masked with PCI_HEADER_TYPE_MASK so the check can
never true.
To fix the issue, don't mask the read with PCI_HEADER_TYPE_MASK.
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration") Link: https://lore.kernel.org/r/20231003125300.5541-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
PCI/sysfs: Protect driver's D3cold preference from user space
struct pci_dev contains two flags which govern whether the device may
suspend to D3cold:
* no_d3cold provides an opt-out for drivers (e.g. if a device is known
to not wake from D3cold)
* d3cold_allowed provides an opt-out for user space (default is true,
user space may set to false)
Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend"),
the user space setting overwrites the driver setting. Essentially user
space is trusted to know better than the driver whether D3cold is
working.
That feels unsafe and wrong. Assume that the change was introduced
inadvertently and do not overwrite no_d3cold when d3cold_allowed is
modified. Instead, consider d3cold_allowed in addition to no_d3cold
when choosing a suspend state for the device.
That way, user space may opt out of D3cold if the driver hasn't, but it
may no longer force an opt in if the driver has opted out.
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend") Link: https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org # v4.8+
Sui Jingfeng [Fri, 25 Aug 2023 06:27:12 +0000 (14:27 +0800)]
drm/nouveau: Use pci_get_base_class() to reduce duplicated code
Use pci_get_base_class() to reduce duplicated code. No functional change
intended.
Link: https://lore.kernel.org/r/20230825062714.6325-4-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch>
Sui Jingfeng [Fri, 25 Aug 2023 06:27:10 +0000 (14:27 +0800)]
PCI: Add pci_get_base_class() helper
There is no function to get all PCI devices in a system by matching
against the base class code only, ignoring the sub-class code and
the programming interface. Add pci_get_base_class() to suit the
need.
For example, if a driver wants to process all PCI display devices in
a system, it can do so like this:
PCI: Lengthen reset delay for VideoPropulsion Torrent QN16e card
Commit ac91e6980563 ("PCI: Unify delay handling for reset and resume")
shortened an unconditional 1 sec delay after a Secondary Bus Reset to 100
msec for PCIe (per PCIe r6.1 sec 6.6.1). The 1 sec delay is only required
for Conventional PCI.
But it turns out that there are PCIe devices which require a longer delay
than prescribed before first config space access after reset recovery or
resume from D3cold:
Chad reports that a "VideoPropulsion Torrent QN16e" MPEG QAM Modulator
"raises a PCI system error (PERR), as reported by the IPMI event log, and
the hardware itself would suffer a catastrophic event, cycling the server"
unless the longer delay is observed.
The card is specified to conform to PCIe r1.0 and indeed only supports Gen1
speed (2.5 GT/s) according to lspci. PCIe r1.0 sec 7.6 prescribes the same
100 msec delay as PCIe r6.1 sec 6.6.1:
To allow components to perform internal initialization, system software
must wait for at least 100 ms from the end of a reset (cold/warm/hot)
before it is permitted to issue Configuration Requests
The behavior of the Torrent QN16e card thus appears to be a quirk. Treat
it as such and lengthen the reset delay for this specific device.
Fixes: ac91e6980563 ("PCI: Unify delay handling for reset and resume") Link: https://lore.kernel.org/r/47727e792c7f0282dc144e3ec8ce8eb6e713394e.1695304512.git.lukas@wunner.de Reported-by: Chad Schroeder <CSchroeder@sonifi.com> Closes: https://lore.kernel.org/linux-pci/DM6PR16MB2844903E34CAB910082DF019B1FAA@DM6PR16MB2844.namprd16.prod.outlook.com/ Tested-by: Chad Schroeder <CSchroeder@sonifi.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v5.4+
[kwilczynski: use correct tags, commit log] Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-pci/20230627113808.269716-1-korantwork@gmail.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Xinghui Li <korantli@tencent.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm ci scripts from Dave Airlie:
"This is a bunch of ci integration for the freedesktop gitlab instance
where we currently do upstream userspace testing on diverse sets of
GPU hardware. From my perspective I think it's an experiment worth
going with and seeing how the benefits/noise playout keeping these
files useful.
Ideally I'd like to get this so we can do pre-merge testing on PRs
eventually.
Below is some info from danvet on why we've ended up making the
decision and how we can roll it back if we decide it was a bad plan.
Why in upstream?
- like documentation, testcases, tools CI integration is one of these
things where you can waste endless amounts of time if you
accidentally have a version that doesn't match your source code
- but also like the above, there's a balance, this is the initial cut
of what we think makes sense to keep in sync vs out-of-tree,
probably needs adjustment
- gitlab supports out-of-repo gitlab integration and that's what's
been used for the kernel in drm, but it results in per-driver
fragmentation and lots of duplicated effort. the simple act of
smashing an arbitrary winner into a topic branch already started
surfacing patches on dri-devel and sparking good cross driver team
discussions
Why gitlab?
- it's not any more shit than any of the other CI
- drm userspace uses it extensively for everything in userspace, we
have a lot of people and experience with this, including
integration of hw testing labs
- media userspace like gstreamer is also on gitlab.fd.o, and there's
discussion to extend this to the media subsystem in some fashion
Can this be shared?
- there's definitely a pile of code that could move to scripts/ if
other subsystem adopt ci integration in upstream kernel git. other
bits are more drm/gpu specific like the igt-gpu-tests/tools
integration
- docker images can be run locally or in other CI runners
Will we regret this?
- it's all in one directory, intentionally, for easy deletion
- probably 1-2 years in upstream to see whether this is worth it or a
Big Mistake. that's roughly what it took to _really_ roll out solid
CI in the bigger userspace projects we have on gitlab.fd.o like
mesa3d"
* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
drm: ci: docs: fix build warning - add missing escape
drm: Add initial ci/ subdirectory
Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees and
branches to the MAINTAINERS file. That is where development now
takes place and myself and Namhyung Kim have write access, more
people to come as we emulate other maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that
global variables can be resolved and used in tools that do data
profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c
file was passed as an event: "perf trace -e hello.c" to then get
compiled and loaded.
The only known usage for that, that shipped with the kernel as an
example for such events, augmented the raw_syscalls tracepoints and
was converted to a libbpf skeleton, reusing all the user space
components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space
type beautifiers changed, now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall
types, as discussed with Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
path/filenames/strings, some of the networking data structures,
perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
seconds:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
for licensing reasons, and we missed a build test on
tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when
building with BUILD_NONDISTRO=1 because a needed initialization
routine was being "error checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it
fails.
We better back propagate the error, but at least 'perf annotate' on
samples collected for a BPF program is back working when perf is
built with BUILD_NONDISTRO=1.
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf
report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was
preventing navigation of lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file
collected on one architecture to have registers sampled correctly
displayed when analysis tools such as 'perf report' and 'perf
script' are used on a different architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with a version of perf and
then read also via pipe mode with a different version of perf,
where the event attr record may have changed, use the record size
field to properly support this version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the
error message state that instead of stating that some minimal
kernel version is needed to have that feature. This seems just a
tool limitation, the kernel probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the
result of a thread__find_symbol_fb() is not matched with an
addr_location__exit() to drop the reference counts of the resolved
components (machine, thread, map, symbol, etc). Add a dlfilter test
to make sure that doesn't regresses.
- Lots of fixes for the 'perf test' written in shell script related
to problems found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when
perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
counters.
- Add perf record sample filtering test, things like the following
example, that gets implemented as a BPF filter attached to the
event:
- Improve the way the task_analyzer test checks if libtraceevent is
linked, using 'perf version --build-options' instead of the more
expensinve 'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well
via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree,
same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler
format so that one can use the visualizer at
https://profiler.firefox.com/. Done by Anup Sharma as part of this
year's Google Summer of Code.
One can generate the output and upload it to the web interface but
Anup also automated everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes
with/without BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and
add this extra 'perf test' check that exemplifies its purpose:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE
to error routines, so that the output can show were the parsing
error was found.
- Add 'perf test' entries to check the parsing of events
improvements.
- Fix various leak for things detected by -fsanitize=address, mostly
things that would be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in
'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the
caller fails to do all it needs.
- Adjust various compiler options to not consider errors some
warnings when building with broken headers found in things like
python, flex, bison, as we otherwise build with -Werror. Some for
gcc, some for clang, some for some specific version of those, some
for some specific version of flex or bison, or some specific
combination of these components, bah.
- Allow customization of clang options for BPF target, this helps
building on gentoo where there are other oddities where BPF targets
gets passed some compiler options intended for the native build, so
building with WERROR=0 helps while these oddities are fixed.
- Dont pass ERR_PTR() values to perf_session__delete() in 'perf top'
and 'perf lock', fixing some segfaults when handling some odd
failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs
(tools/perf/Documentation)
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry
repo.
- Update scale units and descriptions of common topdown metrics on
aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to
1.15, Icelake+ metric constraints.
- Update files for the power10 platform"
* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
perf parse-events: Fix driver config term
perf parse-events: Fixes relating to no_value terms
perf parse-events: Fix propagation of term's no_value when cloning
perf parse-events: Name the two term enums
perf list: Don't print Unit for "default_core"
perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
perf metric: Add #num_cpus_online literal
perf pmu: Remove str from perf_pmu_alias
perf parse-events: Make common term list to strbuf helper
perf parse-events: Minor help message improvements
perf pmu: Avoid uninitialized use of alias->str
perf jevents: Use "default_core" for events with no Unit
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
perf test shell stat_bpf_counters: Fix test on Intel
perf test shell record_bpf_filter: Skip 6.2 kernel
libperf: Get rid of attr.id field
perf tools: Convert to perf_record_header_attr_id()
libperf: Add perf_record_header_attr_id()
perf tools: Handle old data in PERF_RECORD_ATTR
...
Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
David Howells [Fri, 8 Sep 2023 16:03:22 +0000 (17:03 +0100)]
iov_iter: Kunit tests for page extraction
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that can't be extracted.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Sep 2023 16:03:21 +0000 (17:03 +0100)]
iov_iter: Kunit tests for copying to/from an iterator
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that does nothing.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>