www.infradead.org Git - users/hch/misc.git/log

Merge branch 'pci/resource'

- Ensure relaxed tail alignment does not increase min_align when computing
  bridge window size, to fix a regression (Ilpo Järvinen)

- Fix bridge window size computation to fix a regression for devices with
  undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)

- Fix error handling during resource resize to fix a regression in amdgpu
  (Ilpo Järvinen)

- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)

- Remove several sparc pcibios_enable_device() implementations that don't
  do anything beyond what pci_enable_resources() does (Ilpo Järvinen)

- Remove mips pcibios_enable_resources() and use pci_enable_resources()
  instead (Ilpo Järvinen)

- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)

- Claim bridge windows before setting them up (Ilpo Järvinen)

- Disable non-claimed bridge windows so the kernel's view matches the
  hardware configuration (Ilpo Järvinen)

- Use pci_release_resource() instead of release_resource() to reduce code
  duplication and increase consistency (Ilpo Järvinen)

- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)

- Preserve bridge window resource type flags when assignment fails because
  we may need it later (Ilpo Järvinen)

- Add bridge window selection functions to make the selection consistent
  across the several places that do this (Ilpo Järvinen)

- Warn if bridge window cannot be released when resizing BAR (Ilpo
  Järvinen)

- Set up bridge resources before enumerating children so we can check
  whether child resources are inside bridge windows (Ilpo Järvinen)

* pci/resource:
  PCI: Set up bridge resources earlier
  PCI: Don't print stale information about resource
  PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
  PCI: Pass bridge window to pci_bus_release_bridge_resources()
  PCI: Add pci_setup_one_bridge_window()
  PCI: Refactor remove_dev_resources() to use pbus_select_window()
  PCI: Refactor distributing available memory to use loops
  PCI: Use pbus_select_window_for_type() during mem window sizing
  PCI: Use pbus_select_window() in space available checker
  PCI: Rename resource variable from r to res
  PCI: Use pbus_select_window_for_type() during IO window sizing
  PCI: Use pbus_select_window() during BAR resize
  PCI: Warn if bridge window cannot be released when resizing BAR
  PCI: Fix finding bridge window in pci_reassign_bridge_resources()
  PCI: Add bridge window selection functions
  PCI: Add defines for bridge window indexing
  PCI: Preserve bridge window resource type flags
  PCI: Enable bridge even if bridge window fails to assign
  PCI: Use pci_release_resource() instead of release_resource()
  PCI: Disable non-claimed bridge window
  PCI: Always claim bridge window before its setup
  PCI: Refactor find_bus_resource_of_type() logic checks
  PCI: Move find_bus_resource_of_type() earlier
  MIPS: PCI: Use pci_enable_resources()
  sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
  m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
  PCI: Fix failure detection during resource resize
  PCI: Fix pdev_resources_assignable() disparity
  PCI: Ensure relaxed tail alignment does not increase min_align

Merge branch 'pci/pwrctrl'

- Fix a double cleanup of regulators if devm_add_action_or_reset() fails
  (Geert Uytterhoeven)

* pci/pwrctrl:
  PCI/pwrctrl: Fix device leak at device stop
  PCI/pwrctrl: Fix device and OF node leak at bus scan
  PCI/pwrctrl: Fix device leak at registration
  PCI/pwrctrl: Fix double cleanup on devm_add_action_or_reset() failure

Merge branch 'pci/pm'

- If a device has already been disconnected, e.g., by a hotplug removal,
  don't bother trying to resume it to D0 when detaching the driver (Mario
  Limonciello)

- Ensure devices are powered up before config reads for 'max_link_width',
  'current_link_speed', 'current_link_width', 'secondary_bus_number', and
  'subordinate_bus_number' sysfs files (Brian Norris)

* pci/pm:
  PCI/sysfs: Ensure devices are powered for config reads
  PCI/PM: Skip resuming to D0 if device is disconnected

Merge branch 'pci/p2pdma'

- Free struct p2p_pgmap, not a member within it, in the
  pci_p2pdma_add_resource() error path (Sungho Kim)

- Make pci_has_p2pmem() static (Leon Romanovsky)

* pci/p2pdma:
  PCI/P2PDMA: Reduce scope of pci_has_p2pmem()
  PCI/P2PDMA: Fix incorrect pointer usage in devm_kfree() call

Merge branch 'pci/of'

- Leave parent unit address 0 in 'interrupt-map' so we can build this
  property even when interrupt controllers lack 'reg' properties (Lorenzo
  Pieralisi)

* pci/of:
  PCI: of: Update parent unit address generation in of_pci_prop_intr_map()

Merge branch 'pci/msi'

- Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
Vives)

* pci/msi:
PCI: Disable MSI on RDC PCI to PCIe bridges

Merge branch 'pci/hotplug'

- Clean up whitespace in messages (Colin Ian King)

* pci/hotplug:
PCI: hotplug: Clean up spaces in messages

Merge branch 'pci/enumeration'

- Use PCI_HEADER_TYPE_* defines, not hard-coded values (Ilpo Järvinen)

- Clean up early_dump_pci_device() to avoid hard-coded values (Ilpo
  Järvinen)

- Clean up pci_scan_child_bus_extend() loop to avoid hard-coded values
  (Ilpo Järvinen)

- Add a Xeon 6 quirk to disable Extended Tags and limit Max Read Request
  Size to 128B to avoid a performance issue (Ilpo Järvinen)

* pci/enumeration:
  PCI: Add Extended Tag + MRRS quirk for Xeon 6
  PCI: Clean up pci_scan_child_bus_extend() loop
  PCI: Clean up early_dump_pci_device()
  PCI: Use header type defines in pci_setup_device()

Merge branch 'pci/aspm'

- Enable all ClockPM and ASPM states for devicetree platforms, since
  there's typically no firmware that enables ASPM (Manivannan Sadhasivam)

- Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

* pci/aspm:
  PCI: qcom: Remove custom ASPM enablement code
  PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms

Merge branch 'pci/aer'

- Allow drivers to request a Bus Reset on Non-Fatal Errors (Lukas Wunner)

- Send uevents for subordinate devices (not the bridge) on failure to
  recover from errors on the subordinate devices (Lukas Wunner)

- Notify drivers by calling their err_handler.error_detected() callback on
  failure to recover (Lukas Wunner)

- Update device error_state earlier after reset to align AER and EEH error
  recovery (Lukas Wunner)

- Remove obsolete comments about .link_reset(), which was removed long ago
  (Lukas Wunner)

- Emit a uevent for the beginning of error recovery if driver requests a
  reset (Niklas Schnelle)

- Emit error recover uevents on s390 as is done by EEH and AER (Niklas
  Schnelle)

- Include error_detected() result in AER uevent to align with corresponding
  uevents from EEH and s390 (Niklas Schnelle)

- Decode new errors added in PCIe r6.0 (Lukas Wunner)

- Print TLP Log for errors introduced since PCIe spec r1.1 (Lukas Wunner)

- Check for allocation failure in pci_aer_init() (Vernon Yang)

- Update error recovery documentation to match the current code and use
  consistent nomenclature (Lukas Wunner)

- Avoid NULL pointer dereference in aer_ratelimit() when GHES error info
  points to a device with no AER Capability (Breno Leitao)

* pci/aer:
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  Documentation: PCI: Fix typos
  Documentation: PCI: Tidy error recovery doc's PCIe nomenclature
  Documentation: PCI: Amend error recovery doc with DPC/AER specifics
  Documentation: PCI: Sync error recovery doc with code
  Documentation: PCI: Sync AER doc with code
  PCI/AER: Fix NULL pointer access by aer_info
  PCI/AER: Print TLP Log for errors introduced since PCIe r1.1
  PCI/AER: Support errors introduced by PCIe r6.0
  powerpc/eeh: Use result of error_detected() in uevent
  s390/pci: Use pci_uevent_ers() in PCI recovery
  PCI/AER: Fix missing uevent on recovery when a reset is requested
  PCI/ERR: Remove remnants of .link_reset() callback
  PCI/ERR: Update device error_state already after reset
  PCI/ERR: Notify drivers on failure to recover
  PCI/ERR: Fix uevent on failure to recover
  PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors

PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()

When platform firmware supplies error information to the OS, e.g., via the
ACPI APEI GHES mechanism, it may identify an error source device that
doesn't advertise an AER Capability and therefore dev->aer_info, which
contains AER stats and ratelimiting data, is NULL.

pci_dev_aer_stats_incr() already checks dev->aer_info for NULL, but
aer_ratelimit() did not, leading to NULL pointer dereferences like this one
from the URL below:

  {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
  {1}[Hardware Error]: event severity: corrected
  {1}[Hardware Error]:   device_id: 0000:00:00.0
  {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2020
  {1}[Hardware Error]:   aer_cor_status: 0x00001000, aer_cor_mask: 0x00002000
  BUG: kernel NULL pointer dereference, address: 0000000000000264
  RIP: 0010:___ratelimit+0xc/0x1b0
  pci_print_aer+0x141/0x360
  aer_recover_work_func+0xb5/0x130

[8086:2020] is an Intel "Sky Lake-E DMI3 Registers" device that claims to
be a Root Port but does not advertise an AER Capability.

Add a NULL check in aer_ratelimit() to avoid the NULL pointer dereference.
Note that this also prevents ratelimiting these events from GHES.

Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
Link: https://lore.kernel.org/r/buduna6darbvwfg3aogl5kimyxkggu3n4romnmq6sozut6axeu@clnx7sfsy457/
Signed-off-by: Breno Leitao <leitao@debian.org>
[bhelgaas: add crash details to commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250929-aer_crash_2-v1-1-68ec4f81c356@debian.org

PCI: Set up bridge resources earlier

Bridge windows are read twice from PCI Config Space, the first time from
pci_read_bridge_windows(), which does not set up the device's resources.
This causes problems down the road as child resources of the bridge cannot
check whether they reside within the bridge window or not.

Set up the bridge windows already in pci_read_bridge_windows().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250924134228.1663-2-ilpo.jarvinen@linux.intel.com

PCI: Don't print stale information about resource

pbus_size_mem() logs the bridge window resource using pci_info() before the
start and end fields of the resource have been updated which then prints
stale information.

Set resource addresses earlier to make understanding logs easier.
Regrettably, this results in setting the addresses multiple times but that
seems unavoidable.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250924135641.3399-1-ilpo.jarvinen@linux.intel.com

PCI/sysfs: Ensure devices are powered for config reads

The "max_link_width", "current_link_speed", "current_link_width",
"secondary_bus_number", and "subordinate_bus_number" sysfs files all access
config registers, but they don't check the runtime PM state. If the device
is in D3cold or a parent bridge is suspended, we may see -EINVAL, bogus
values, or worse, depending on implementation details.

Wrap these access in pci_config_pm_runtime_{get,put}() like most of the
rest of the similar sysfs attributes.

Notably, "max_link_speed" does not access config registers; it returns a
cached value since d2bd39c0456b ("PCI: Store all PCIe Supported Link
Speeds").

Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250924095711.v2.1.Ibb5b6ca1e2c059e04ec53140cd98a44f2684c668@changeid

PCI: qcom: Remove custom ASPM enablement code

Since the PCI subsystem has started enabling all ASPM states for all
devicetree based platforms, the ASPM enablement code from this driver can
now be dropped.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250922-pci-dt-aspm-v2-2-2a65cf84e326@oss.qualcomm.com

PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms

So far, the PCI subsystem has honored the ASPM and Clock PM states set by
the BIOS (through LNKCTL) during device initialization, if it relies on the
default state selected using:

* Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
* cmdline: "pcie_aspm=off", or
* FADT: ACPI_FADT_NO_ASPM

This was done conservatively to avoid issues with the buggy devices that
advertise ASPM capabilities, but behave erratically if the ASPM states are
enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
ASPM states that were known to work for the devices.

But this turned out to be a problem for devicetree platforms, especially
the ARM based devicetree platforms powering Embedded and *some* Compute
devices as they tend to run without any standard BIOS. So the ASPM states
on these platforms were left disabled during boot and the PCI subsystem
never bothered to enable them, unless the user has forcefully enabled the
ASPM states through Kconfig, cmdline, and sysfs or the device drivers
themselves, enabling the ASPM states through pci_enable_link_state() APIs.

This caused runtime power issues on those platforms. So a couple of
approaches were tried to mitigate this BIOS dependency without user
intervention by enabling the ASPM states in the PCI controller drivers
after device enumeration, and overriding the ASPM/Clock PM states
by the PCI controller drivers through an API before enumeration.

But it has been concluded that none of these mitigations should really be
required and the PCI subsystem should enable the ASPM states advertised by
the devices without relying on BIOS or the PCI controller drivers. If any
device is found to be misbehaving after enabling ASPM states that they
advertised, then those devices should be quirked to disable the problematic
ASPM/Clock PM states.

In an effort to do so, start by overriding the ASPM and Clock PM states set
by the BIOS for devicetree platforms first. Separate helper functions are
introduced to override the BIOS set states by enabling all of them if
of_have_populated_dt() returns true. To aid debugging, print the overridden
ASPM and Clock PM states as well.

In the future, these helpers could be extended to allow other platforms
like VMD, newer ACPI systems with a cutoff year etc... to follow the path.

Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: tweak comments and dmesg logs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250922-pci-dt-aspm-v2-1-2a65cf84e326@oss.qualcomm.com

PCI/PM: Skip resuming to D0 if device is disconnected

When a device is surprise-removed (e.g., due to a dock unplug), the PCI
core unconfigures all downstream devices and sets their error state to
pci_channel_io_perm_failure. This marks them as disconnected via
pci_dev_is_disconnected().

During device removal, the runtime PM framework may attempt to resume the
device to D0 via pm_runtime_get_sync(), which calls into pci_power_up().
Since the device is already disconnected, this resume attempt is
unnecessary and results in a predictable errors like this, typically when
undocking from a TBT3 or USB4 dock with PCIe tunneling:

pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

Avoid powering up disconnected devices by checking their status early in
pci_power_up() and returning -EIO.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[bhelgaas: add typical message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250909031916.4143121-1-superm1@kernel.org

Documentation: PCI: Fix typos

The PCIe spec uses "Requester ID", not "requestor ID". Follow the spec to
avoid confusion.

Signed-off-by: Emilio Perez <emiliopeju@gmail.com>
[bhelgaas: capitalize as a hint that the spec defines this]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250818023121.33427-1-emiliopeju@gmail.com

PCI: Alter misleading recursion to pci_bus_release_bridge_resources()

Recursing into pci_bus_release_bridge_resources() should not alter rel_type
because it makes no sense to change the release type within the recursion
call chain. A literal "whole_subtree" is passed into the recursion instead
of "rel_type" parameter which is misleading as the release type should
remain the same throughout the entire operation.

This is not a correctness issue because of the preceding if () that only
allows the recursion to happen if rel_type is "whole_subtree". Still,
replace the non-intuitive parameter with direct passing of "rel_type".

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-25-ilpo.jarvinen@linux.intel.com

PCI: Pass bridge window to pci_bus_release_bridge_resources()

pci_bus_release_bridge_resources() takes type, which is converted into a
bridge window resource in pci_bridge_release_resources().

Find out the correct bridge window for resource whose assignment failed.
Pass that bridge window to pci_bus_release_bridge_resources() instead of
passing the type. When recursing to subordinate, check which bridge windows
have to be released and recurse for each.

For now, use pbus_select_window_for_type() instead of pbus_select_window()
because non-bridge window resources still have their flags reset which
destroys the type information from the struct resource. The struct
pci_dev_resource holds a copy of the flags which are used instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-24-ilpo.jarvinen@linux.intel.com

PCI: Add pci_setup_one_bridge_window()

pci_bridge_release_resources() contains a resource type hack to work
around the unsuitable __pci_setup_bridge() interface. Extract the
switch statement that picks the correct bridge window setup function
from pci_claim_bridge_resource() into pci_setup_one_bridge_window() and
use it also in pci_bridge_release_resources().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-23-ilpo.jarvinen@linux.intel.com

PCI: Refactor remove_dev_resources() to use pbus_select_window()

Convert remove_dev_resources() to use pbus_select_window(). As 'available'
is not the real resources, the index has to be adjusted as only bridge
resource counterparts are present in the 'available' array.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-22-ilpo.jarvinen@linux.intel.com

PCI: Refactor distributing available memory to use loops

pci_bus_distribute_available_resources() and
pci_bridge_distribute_available_resources() retain bridge window resources
and related data needed for distributing the available window in
independent variables for io, memory, and prefetchable memory windows. The
code is essentially the same for all of them and therefore repeated three
times with different variable names.

Refactor pci_bus_distribute_available_resources() to take an array. This
is complicated slightly by the function taking advantage of passing the
struct as value, which cannot be done for arrays in C. Therefore, copy the
data into a local array in the stack in the first loop.

Variable names are (hopefully) improved slightly as well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-21-ilpo.jarvinen@linux.intel.com

PCI: Use pbus_select_window_for_type() during mem window sizing

__pci_bus_size_bridges() goes to great lengths of helping pbus_size_mem()
in which types it should put into a particular bridge window, requiring
passing up to three resource type into pbus_size_mem().

Instead of having complex logic in __pci_bus_size_bridges() and a
non-straightforward interface between those functions, use
pbus_select_window_for_type() and pbus_select_window() to find the correct
bridge window and compare if the resources belong to that window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-20-ilpo.jarvinen@linux.intel.com

PCI: Use pbus_select_window() in space available checker

pbus_upstream_space_available() figures out the upstream bridge window
resources on its own. Migrate it to use pbus_select_window().

Note: pbus_select_window() -> pbus_select_window_for_type() calls
find_bus_resource_of_type() for root bus, which does not do parent check
similar to what pbus_upstream_space_available() did earlier, but the
difference does not matter because pbus_upstream_space_available() itself
stops when it encounters the root bus.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-19-ilpo.jarvinen@linux.intel.com

PCI: Rename resource variable from r to res

Resource is going to be passed in as argument aften an upcoming change.
Rename the struct resource variable from "r" to "res" to avoid using one
letter variable name in a function argument.

This rename is made separately to reduce churn in the upcoming change.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-18-ilpo.jarvinen@linux.intel.com

PCI: Use pbus_select_window_for_type() during IO window sizing

Convert pbus_size_io() to use pbus_select_window_for_type().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-17-ilpo.jarvinen@linux.intel.com

PCI: Use pbus_select_window() during BAR resize

Prior to a BAR resize, __resource_resize_store() loops through the normal
resources of the PCI device and releases those that match to the flags of
the BAR to be resized. This is necessary to allow resizing also the
upstream bridge window as only childless bridge windows can be resized.

While the flags check (mostly) works (if corner cases are ignored), the
more straightforward way is to check if the resources share the bridge
window. Change __resource_resize_store() to do the check using
pbus_select_window().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-16-ilpo.jarvinen@linux.intel.com

PCI: Warn if bridge window cannot be released when resizing BAR

BAR resizing calls to pci_reassign_bridge_resources(), which attempts to
release any upstream bridge window to allow them to accommodate the new BAR
size. The release can only be performed if there are no other child
resources for the bridge window. Previously the code continued silently
when other child resources were detected.

Add pci_warn() to inform user that a bridge window could not be released
because of child resources. As a small bridge window is often the reason
why BAR resize fails, this warning will help to pinpoint to the cause.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-15-ilpo.jarvinen@linux.intel.com

PCI: Fix finding bridge window in pci_reassign_bridge_resources()

pci_reassign_bridge_resources() walks upwards in the PCI bus hierarchy,
locates the relevant bridge window on each level using flags check, and
attempts to release the bridge window. The flags-based check is fragile due
to various fallbacks in the bridge window selection logic. As such, the
algorithm might not locate the correct bridge window.

Refactor pci_reassign_bridge_resources() to determine the correct bridge
window using pbus_select_window(), which contains logic to handle all
fallback cases correctly. Change function prefix to pbus as it now inputs
struct bus and resource for which to locate the bridge window.

The main purpose is to make bridge window selection logic consistent across
the entire PCI core (one step at a time). While this technically also fixes
the commit 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing
BARs") making the bridge window walk algorithm more robust, the normal
setup having a 64-bit resizable BAR underneath bridge(s) with 64-bit
prefetchable windows does not need to use any fallbacks. As such, the
practical impact is low (requiring BAR resize use case and a non-typical
bridge device).

The way to detect if unrelated resource failed again is left to use the
type based approximation which should not behave worse than before.

Fixes: 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing BARs")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-14-ilpo.jarvinen@linux.intel.com

PCI: Add bridge window selection functions

Various places in the PCI core code independently decide into which bridge
window a child resource should be placed. It is hard to see whether these
decisions always end up in agreement, especially in the corner cases, and
in some places it requires complex logic to pass multiple resource types
and/or bridge windows around.

Add pbus_select_window() and pbus_select_window_for_type() for cases where
the former cannot be used so that eventually the same helper can be used to
select the bridge window everywhere. Using the same function ensures the
selected bridge window remains always the same and it can be easily
recalculated in-situ allowing simplifying the interfaces between internal
functions in upcoming changes.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-13-ilpo.jarvinen@linux.intel.com

PCI: Add defines for bridge window indexing

include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines,
however, they're based on the resource array indexing in the pci_dev
struct. The struct pci_bus also has pointers to those same resources but
they start from zeroth index.

Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal
indexing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-12-ilpo.jarvinen@linux.intel.com

PCI: Preserve bridge window resource type flags

When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.

Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.

Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resource to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.

When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.

When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.

Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.

This change fixes an issue which highlights the importance of keeping the
resource type flags intact:

  At the end of __assign_resources_sorted(), reset_resource() is called,
  previously clearing the flags. Later, pci_prepare_next_assign_round()
  attempted to release bridge resources using
  pci_bus_release_bridge_resources() that calls into
  pci_bridge_release_resources() that assumes type flags are still present.
  As type flags were cleared, IORESOURCE_MEM_64 was not set leading to
  resources under an incorrect bridge window to be released (idx = 1
  instead of idx = 2). While the assignments performed later covered this
  problem so that the wrongly released resources got assigned in the end,
  it was still causing extra release+assign pairs.

There are other reasons why the resource flags should be retained in
upcoming changes too.

Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.

Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.

Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com

PCI: Enable bridge even if bridge window fails to assign

A normal PCI bridge has multiple bridge windows and not all of them are
always required by devices underneath the bridge. If a Root Port or bridge
does not have a device underneath, no bridge windows get assigned. Yet,
pci_enable_resources() is set to fail indiscriminantly on any resource
assignment failure if the resource is not known to be optional.

In practice, the code in pci_enable_resources() is currently largely
dormant. The kernel sets resource flags to zero for any unused bridge
window and resets flags to zero in case of an resource assignment failure,
which short-circuits pci_enable_resources() because of this check:

if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
continue;

However, an upcoming change to resource flags will alter how bridge window
resource flags behave activating these long dormants checks in
pci_enable_resources().

While complex logic could be built to selectively enable a bridge only
under some conditions, a few versions of such logic were tried during
development of this change and none of them worked satisfactorily. Thus, I
just gave up and decided to enable any bridge regardless of the bridge
windows as there seems to be no clear benefit from not enabling it, but a
major downside as pcieport will not be probed for the bridge if it's not
enabled.

Therefore, change pci_enable_resources() to not check if bridge window
resources remain unassigned. Resource assignment failures are pretty noisy
already so there is no need to log that for bridge windows in
pci_enable_resources().

Ignoring bridge window failures hopefully prevents an obvious source of
regressions when the upcoming change that no longer clears resource flags
for bridge windows is enacted. I've hit this problem even during my own
testing on multiple occasions so I expect it to be a quite common problem.

This can always be revisited later if somebody thinks the enable check for
bridges is not strict enough, but expect a mind-boggling number of
regressions from such a change.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-10-ilpo.jarvinen@linux.intel.com

PCI: Use pci_release_resource() instead of release_resource()

A few places in setup-bus.c call release_resource() directly and end up
duplicating functionality from pci_release_resource() such as parent check,
logging, and clearing the resource. Worse yet, the way the resource is
cleared is inconsistent between different sites.

Convert release_resource() calls into pci_release_resource() to remove code
duplication. This will also make the resource start, end, and flags
behavior consistent, i.e., start address is cleared, and only
IORESOURCE_UNSET is asserted for the resource.

While at it, eliminate the unnecessary initialization of idx variable in
pci_bridge_release_resources().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-9-ilpo.jarvinen@linux.intel.com

PCI: Disable non-claimed bridge window

If clipping or claiming the bridge window fails, the bridge window is left
in a state that does not match the kernel's view on what the bridge window
is.

Disable the bridge window by writing the magic disable value into the Base
and Limit Registers if clipping or claiming failed. To detect if claiming
the resource was successful, add res->parent checks into the bridge setup
functions.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-8-ilpo.jarvinen@linux.intel.com

PCI: Always claim bridge window before its setup

When the claim of a resource fails for the full range in
pci_claim_bridge_resource(), clipping the resource to a smaller size is
attempted. If clipping is successful, the new bridge window is programmed
and only as the last step the code attempts to claim the resource again.
The order of the last two steps is slightly illogical and inconsistent with
the assignment call chains.

If claiming the bridge window after clipping fails, the bridge window that
was set up is left in place.

Rework the logic such that the bridge window is claimed before calling the
relevant bridge setup function. This make the behavior consistent with
resource fitting call chains that always assign the bridge window before
programming it.

If claiming the bridge window fails, the clipped bridge window is no longer
set up but pci_claim_bridge_resource() returns without writing the bridge
window at all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-7-ilpo.jarvinen@linux.intel.com

PCI: Refactor find_bus_resource_of_type() logic checks

Reorder the logic checks in find_bus_resource_of_type() to simplify them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-6-ilpo.jarvinen@linux.intel.com

PCI: Move find_bus_resource_of_type() earlier

Move find_bus_resource_of_type() earlier in setup-bus.c to be able to call
it in upcoming changes.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-5-ilpo.jarvinen@linux.intel.com

MIPS: PCI: Use pci_enable_resources()

pci-legacy.c under MIPS has a copy of pci_enable_resources() named as
pcibios_enable_resources(). Having own copy of same functionality could
lead to inconsistencies in behavior, especially now as
pci_enable_resources() and the bridge window resource flags behavior are
going to be altered by upcoming changes.

The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().

Call pci_enable_resources() from MIPS's pcibios_enable_device() and remove
pcibios_enable_resources().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://patch.msgid.link/20250829131113.36754-4-ilpo.jarvinen@linux.intel.com

sparc/PCI: Remove pcibios_enable_device() as they do nothing extra

Under arch/sparc/ there are multiple copies of pcibios_enable_device() but
none of those seem to do anything extra beyond what pci_enable_resources()
is supposed to do. These functions could lead to inconsistencies in
behavior, especially now as pci_enable_resources() and the bridge window
resource flags behavior are going to be altered by upcoming changes.

Remove all pcibios_enable_device() from arch/sparc/ so that PCI core can
simply call into pci_enable_resources() instead using its __weak version
of pcibios_enable_device().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-3-ilpo.jarvinen@linux.intel.com

m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()

m68k has a resource enable (check) loop in its pcibios_enable_device()
which for some reason differs from pci_enable_resources(). This could lead
to inconsistencies in behavior, especially now as pci_enable_resources()
and the bridge window resource flags behavior are going to be altered by
upcoming changes.

The check for !r->start && r->end is already covered by the more generic
checks done in pci_enable_resources().

The entire pcibios_enable_device() suspiciously looks copy-paste from some
other arch as also indicated by the preceding comment. However, it also
enables PCI_COMMAND_IO | PCI_COMMAND_MEMORY always for bridges. It is not
clear why that is being done as the commit e93a6bbeb5a5 ("m68k: common PCI
support definitions and code") introducing this code states "Nothing
specific to any PCI implementation in any m68k class CPU hardware yet".

Replace the resource enable loop with a call to pci_enable_resources() and
adjust the Command Register afterwards as it's unclear if that is necessary
or not so keep it for now.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-2-ilpo.jarvinen@linux.intel.com

PCI: Fix failure detection during resource resize

Since 96336ec70264 ("PCI: Perform reset_resource() and build fail list in
sync") the failed list is always built and returned to let the caller
decide what to do with the failures. The caller may want to retry resource
fitting and assignment and before that can happen, the resources should be
restored to their original state (a reset effectively clears the struct
resource), which requires returning them to the failed list so the original
state remains stored in the associated struct pci_dev_resource.

Resource resizing is different from the ordinary resource fitting and
assignment in that it only considers part of the resources. This means
failures for other resource types are not relevant at all and should be
ignored. As resize doesn't unassign such unrelated resources, those
resources ending up in the failed list implies assignment of that
resource must have failed before resize too. The check in
pci_reassign_bridge_resources() to decide if the whole assignment is
successful, however, is based on list emptiness which will cause false
negatives when the failed list has resources with an unrelated type.

If the failed list is not empty, call pci_required_resource_failed() and
extend it to be able to filter on specific resource types too (if
provided).

Calling pci_required_resource_failed() at this point is slightly
problematic because the resource itself is reset when the failed list
is constructed in __assign_resources_sorted(). As a result,
pci_resource_is_optional() does not have access to the original
resource flags. This could be worked around by restoring and
re-resetting the resource around the call to pci_resource_is_optional(),
however, it shouldn't cause issue as resource resizing is meant for
64-bit prefetchable resources according to Christian König (see the
Link which unfortunately doesn't point directly to Christian's reply
because lore didn't store that email at all).

Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-4-ilpo.jarvinen@linux.intel.com

PCI: Fix pdev_resources_assignable() disparity

pdev_sort_resources() uses pdev_resources_assignable() helper to decide if
device's resources cannot be assigned, so it ignores class 0
(PCI_CLASS_NOT_DEFINED) devices. pbus_size_mem(), on the other hand, does
not do the same check. This could lead into a situation where a resource
ends up on realloc_head list but is not on the head list, which in turn
prevents emptying the resource from the realloc_head list in
__assign_resources_sorted().

A non-empty realloc_head is unacceptable because it triggers an internal
sanity check as shown in this log with a device that has class 0
(PCI_CLASS_NOT_DEFINED):

  pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint
  pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
  pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
  pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
  pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned
  ------------[ cut here ]------------
  kernel BUG at drivers/pci/setup-bus.c:2532!
  Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
  ...
  Call trace:
   pci_assign_unassigned_bus_resources+0x110/0x114 (P)
   pci_rescan_bus+0x28/0x48

Use pdev_resources_assignable() also within pbus_size_mem() to skip
processing of non-assignable resources which removes the disparity in
between what resources pdev_sort_resources() and pbus_size_mem() consider.
As non-assignable resources are no longer processed, they are not added to
the realloc_head list, thus the sanity check no longer triggers.

This disparity problem is very old but only now became apparent after
2499f5348431 ("PCI: Rework optional resource handling") that made the ROM
resources optional when calculating bridge window sizes which required
adding the resource to the realloc_head list.  Previously, bridge windows
were just sized larger than necessary.

Fixes: 2499f5348431 ("PCI: Rework optional resource handling")
Reported-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Closes: https://lore.kernel.org/all/5f103643-5e1c-43c6-b8fe-9617d3b5447c@linaro.org/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-3-ilpo.jarvinen@linux.intel.com

PCI: Ensure relaxed tail alignment does not increase min_align

When using relaxed tail alignment for the bridge window, pbus_size_mem()
also tries to minimize min_align, which can under certain scenarios end up
increasing min_align from that found by calculate_mem_align().

Ensure min_align is not increased by the relaxed tail alignment.

Eventually, it would be better to add calculate_relaxed_head_align()
similar to calculate_mem_align() which finds out what alignment can be used
for the head without introducing any gaps into the bridge window to give
flexibility on head address too. But that looks relatively complex so it
requires much more testing than fixing the immediate problem causing a
regression.

Fixes: 67f9085596ee ("PCI: Allow relaxed bridge window tail sizing for optional resources")
Reported-by: Rio Liu <rio@r26.me>
Closes: https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Rio Liu <rio@r26.me>
Cc: stable@vger.kernel.org # v6.15+
Link: https://patch.msgid.link/20250822123359.16305-2-ilpo.jarvinen@linux.intel.com

Documentation: PCI: Tidy error recovery doc's PCIe nomenclature

Commit 11502feab423 ("Documentation: PCI: Tidy AER documentation")
replaced the terms "PCI-E", "PCI-Express" and "PCI Express" with "PCIe"
in the AER documentation.

Do the same in the documentation on PCI error recovery. While at it,
add a missing period and a missing blank.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/db56b7ef12043f709a04ce67c1d1e102ab5f4e19.1757942121.git.lukas@wunner.de

Documentation: PCI: Amend error recovery doc with DPC/AER specifics

Amend the documentation on PCI error recovery with specifics about
Downstream Port Containment and Advanced Error Reporting:

* Explain that with DPC, devices are inaccessible upon an error (similar
  to EEH on powerpc) and do not become accessible until the link is
  re-enabled.

* Explain that with AER, although devices may already be accessible in the
  ->error_detected() callback, accesses should be deferred to the
  ->mmio_enabled() callback for compatibility with EEH on powerpc and with
  s390.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/61d8eeadb20ee71c3a852f44c863bfe0209c454d.1757942121.git.lukas@wunner.de

Documentation: PCI: Sync error recovery doc with code

Amend the documentation on PCI error recovery to fix minor inaccuracies
vis-à-vis the actual code:

* The documentation claims that a missing ->resume() or ->mmio_enabled()
  callback always leads to recovery through reset.  But none of the
  implementations do this (pcie_do_recovery(), eeh_handle_normal_event(),
  zpci_event_do_error_state_clear()).

  Drop the claim to align the documentation with the code.

* The documentation does not list PCI_ERS_RESULT_RECOVERED as a valid
  return value from ->error_detected().  But none of the implementations
  forbid this and some drivers are returning it, e.g.:
  drivers/bus/mhi/host/pci_generic.c
  drivers/infiniband/hw/hfi1/pcie.c

  Further down in the documentation it is implied that the return value is
  in fact allowed:
  "The platform will call the resume() callback on all affected device
  drivers if all drivers on the segment have returned
  PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks."

  The "3 previous callbacks" being ->error_detected(), ->mmio_enabled()
  and ->slot_reset().

  Add it to the valid return values for consistency.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/ed3c3385499775fcc25f1ee66f395e212919f94a.1757942121.git.lukas@wunner.de

Documentation: PCI: Sync AER doc with code

The PCIe Advanced Error Reporting driver has evolved over the years but
its documentation hasn't.  Catch up with past code changes:

* The documentation claims that Correctable Errors are logged with
  KERN_INFO severity, but the code uses KERN_WARN.

  It had used KERN_WARN from the beginning with commit 6c2b374d7485
  ("PCI-Express AER implemetation: AER core and aerdriver").  In 2013,
  commit 2cced2d95961 ("aerdrv: Cleanup log output for AER") switched to
  KERN_ERR, until 2020 when it was reverted back to KERN_WARN by commit
  e83e2ca3c395 ("PCI/AER: Log correctable errors as warning, not error").

* An example log message in the documentation uses the term "Uncorrected",
  but the code uses "Uncorrectable" since commit 02a06f5f1a6a ("PCI/AER:
  Use 'Correctable' and 'Uncorrectable' spec terms for errors").

* The example contains the Requester ID "id=0500", which is omitted since
  commit 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID").

* The example contains the error name "Unsupported Request", which is
  instead reported as "UnsupReq" since commit bd237801fef2 ("PCI/AER:
  Adopt lspci names for AER error decoding").

* The example doesn't prepend "0x" to hex values from the TLP Header Log,
  as introduced by commit f68ea779d98a ("PCI: Add pcie_print_tlp_log() to
  print TLP Header and Prefix Log").

* The documentation refers to a reset_link callback which was removed by
  commit b6cf1a42f916 ("PCI/ERR: Remove service dependency in
  pcie_do_recovery()").

* Commit 579086225502 ("PCI/ERR: Recover from RCiEP AER errors") added
  support to recover Root Complex Integrated Endpoints by applying a
  Function Level Reset, alternatively to the Secondary Bus Reset which is
  applied otherwise.

* On non-fatal errors, a reset was previously never performed.  But the
  AER driver has just been amended to allow drivers to opt in to a reset.

* The documentation claims that a warning message is logged if a driver
  lacks pci_error_handlers.  But the message has been informational
  (logged with KERN_INFO severity) since its introduction with commit
  01daacfb9035 ("PCI/AER: Log which device prevents error recovery").

  The documentation claims that the message is only logged for fatal
  errors, which is incorrect.  Moreover it refers to "section 3", even
  though the documentation no longer contains section numbers since commit
  4e37f055a92e ("Documentation: PCI: convert pcieaer-howto.txt to reST").
  Section 3 is titled "Developer Guide".  That's the same section where
  the reference is located, so it is self-referential and can be dropped.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/7501bfc5b9920193a25998a3cbcf72c47674ec63.1757942121.git.lukas@wunner.de

PCI: of: Update parent unit address generation in of_pci_prop_intr_map()

Some interrupt controllers require an #address-cells property in their
bindings without requiring a "reg" property to be present.

The current logic used to craft an interrupt-map property in
of_pci_prop_intr_map() is based on reading the #address-cells
property in the interrupt-parent and, if != 0, read the interrupt
parent "reg" property to determine the parent unit address to be
used to create the parent unit interrupt specifier.

First of all, it is not correct to read the "reg" property of
the interrupt-parent with an #address-cells value taken from the
interrupt-parent node, because the #address-cells value define the
number of address cells required by child nodes.

More importantly, for all modern interrupt controllers, the parent
unit address is irrelevant in hardware in relation to the
device <-> interrupt-controller connection and the kernel actually
ignores the parent unit address value when hierarchically parsing
the interrupt-map property (i.e., of_irq_parse_raw()).

For the reasons above, remove the code parsing the interrupt parent "reg"
property in of_pci_prop_intr_map() -- it is not needed and prevents
interrupt-map property generation on systems with an interrupt-controller
that has no "reg" property in its interrupt-controller node -- and leave
the parent unit address always initialized to 0 since it is simply ignored
by the kernel.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/lkml/aJms+YT8TnpzpCY8@lpieralisi/
Link: https://patch.msgid.link/20250818093504.80651-1-lpieralisi@kernel.org

PCI/AER: Fix NULL pointer access by aer_info

The kzalloc(GFP_KERNEL) may return NULL, so all accesses to aer_info->xxx
will result in kernel panic. Fix it.

Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250904182527.67371-1-vernon2gm@gmail.com

PCI/AER: Print TLP Log for errors introduced since PCIe r1.1

When reporting an error, the AER driver prints the TLP Header / Prefix Log
only for errors enumerated in the AER_LOG_TLP_MASKS macro.

The macro was never amended since its introduction in 2006 with commit
6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver").
At the time, PCIe r1.1 was the latest spec revision.

Amend the macro with errors defined since then to avoid omitting the TLP
Header / Prefix Log for newer errors.

The order of the errors in AER_LOG_TLP_MASKS follows PCIe r1.1 sec 6.2.7
rather than 7.10.2, because only the former documents for which errors a
TLP Header / Prefix is logged.  Retain this order.  The section number is
still 6.2.7 in today's PCIe r7.0.

For Completion Timeouts, the TLP Header / Prefix is only logged if the
Completion Timeout Prefix / Header Log Capable bit is set in the AER
Capabilities and Control register.  Introduce a tlp_header_logged() helper
to check whether the TLP Header / Prefix Log is populated and use it in
the two places which currently match against AER_LOG_TLP_MASKS directly.

For Uncorrectable Internal Errors, logging of the TLP Header / Prefix is
optional per PCIe r7.0 sec 6.2.7.  If needed, drivers could indicate
through a flag whether devices are capable and tlp_header_logged() could
then check that flag.

pcitools introduced macros for newer errors with commit 144b0911cc0b
("ls-ecaps: extend decode support for more fields for AER CE and UE
status"):
  https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/?id=144b0911cc0b

Unfortunately some of those macros are overly long:
  PCI_ERR_UNC_POISONED_TLP_EGRESS
  PCI_ERR_UNC_DMWR_REQ_EGRESS_BLOCKED
  PCI_ERR_UNC_IDE_CHECK
  PCI_ERR_UNC_MISR_IDE_TLP
  PCI_ERR_UNC_PCRC_CHECK
  PCI_ERR_UNC_TLP_XLAT_EGRESS_BLOCKED

This seems unsuitable for <linux/pci_regs.h>, so shorten to:
  PCI_ERR_UNC_POISON_BLK
  PCI_ERR_UNC_DMWR_BLK
  PCI_ERR_UNC_IDE_CHECK
  PCI_ERR_UNC_MISR_IDE
  PCI_ERR_UNC_PCRC_CHECK
  PCI_ERR_UNC_XLAT_BLK

Note that some of the existing macros in <linux/pci_regs.h> do not match
exactly with pcitools (e.g. PCI_ERR_UNC_SDES versus PCI_ERR_UNC_SURPDN),
so it does not seem mandatory for them to be identical.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/5f707caf1260bd8f15012bb032f7da9a9b898aba.1756712066.git.lukas@wunner.de

PCI/P2PDMA: Reduce scope of pci_has_p2pmem()

pci_has_p2pmem() is not used outside of p2pdma.c, and there is no need to
export it for use by modules.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patch.msgid.link/d40f3f1decf54c9236bc38b48a6aae612a5c182f.1756900291.git.leon@kernel.org

PCI/AER: Support errors introduced by PCIe r6.0

PCIe r6.0 defined five additional errors in the Uncorrectable Error
Status, Mask and Severity Registers (PCIe r7.0 sec 7.8.4.2ff).

lspci has been supporting them since commit 144b0911cc0b ("ls-ecaps:
extend decode support for more fields for AER CE and UE status"):

https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/?id=144b0911cc0b

Amend the AER driver to recognize them as well, instead of logging them as
"Unknown Error Bit".

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/21f1875b18d4078c99353378f37dcd6b994f6d4e.1756301211.git.lukas@wunner.de

PCI/pwrctrl: Fix device leak at device stop

Make sure to drop the reference to the pwrctrl device taken by
of_find_device_by_node() when stopping a PCI device.

Fixes: 681725afb6b9 ("PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # v6.13
Link: https://patch.msgid.link/20250721153609.8611-4-johan+linaro@kernel.org

PCI/pwrctrl: Fix device and OF node leak at bus scan

Make sure to drop the references to the pwrctrl OF node and device taken by
of_pci_find_child_device() and of_find_device_by_node() respectively when
scanning the bus.

Fixes: 957f40d039a9 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # v6.15
Link: https://patch.msgid.link/20250721153609.8611-3-johan+linaro@kernel.org

PCI/pwrctrl: Fix device leak at registration

Make sure to drop the reference to the pwrctrl device taken by
of_find_device_by_node() when registering a PCI device.

Fixes: b458ff7e8176 ("PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # v6.13
Link: https://patch.msgid.link/20250721153609.8611-2-johan+linaro@kernel.org

PCI/P2PDMA: Fix incorrect pointer usage in devm_kfree() call

The error handling path in pci_p2pdma_add_resource() contains a bug in its
`pgmap_free` label.

Memory is allocated for the `p2p_pgmap` struct, and the pointer is stored
in `p2p_pgmap`. However, the error path calls devm_kfree() with `pgmap`,
which is a pointer to a member field within the `p2p_pgmap` struct, not the
base pointer of the allocation.

Correct the bug by passing the correct base pointer, `p2p_pgmap`, to
devm_kfree().

Signed-off-by: Sungho Kim <sungho.kim@furiosa.ai>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patch.msgid.link/20250820105714.2939896-1-sungho.kim@furiosa.ai

powerpc/eeh: Use result of error_detected() in uevent

Ever since uevent support was added for AER and EEH with commit
856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume"), it
reported PCI_ERS_RESULT_NONE as uevent when recovery begins.

Commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery status for
udev") subsequently amended AER to report the actual return value of
error_detected().

Make the same change to EEH to align it with AER and s390.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/linux-pci/aIp6LiKJor9KLVpv@wunner.de/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Link: https://patch.msgid.link/20250807-add_err_uevents-v5-3-adf85b0620b0@linux.ibm.com

s390/pci: Use pci_uevent_ers() in PCI recovery

Issue uevents on s390 during PCI recovery using pci_uevent_ers() as done by
EEH and AER PCIe recovery routines.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20250807-add_err_uevents-v5-2-adf85b0620b0@linux.ibm.com

PCI/AER: Fix missing uevent on recovery when a reset is requested

Since commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery
status for udev") AER uses the result of error_detected() as parameter
to pci_uevent_ers(). As pci_uevent_ers() however does not handle
PCI_ERS_RESULT_NEED_RESET this results in a missing uevent for the
beginning of recovery if drivers request a reset. Fix this by treating
PCI_ERS_RESULT_NEED_RESET as beginning recovery.

Fixes: 7b42d97e99d3 ("PCI/ERR: Always report current recovery status for udev")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250807-add_err_uevents-v5-1-adf85b0620b0@linux.ibm.com

PCI/pwrctrl: Fix double cleanup on devm_add_action_or_reset() failure

When devm_add_action_or_reset() fails, it calls the passed cleanup
function. Hence the caller must not repeat that cleanup.

Replace the "goto err_regulator_free" by the actual freeing, as there
will never be a need again for a second user of this label.

Fixes: 75996c92f4de309f ("PCI/pwrctrl: Add pwrctrl driver for PCI slots")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Vasut <marek.vasut+renesas@mailbox.org> # V4H Sparrow Hawk
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/7b1386e6162e70e6d631c87f6323d2ab971bc1c5.1755100324.git.geert+renesas@glider.be

PCI/ERR: Remove remnants of .link_reset() callback

Back in 2017, commit 2fd260f03b6a ("PCI/AER: Remove unused .link_reset()
callback") removed .link_reset() from struct pci_error_handlers, but left
a few code comments behind which still mention it.  Remove them.

The code comments in the SolarFlare Ethernet drivers point out that no
.mmio_enabled() callback is needed because the driver's .error_detected()
callback always returns PCI_ERS_RESULT_NEED_RESET, which causes
pcie_do_recovery() to skip .mmio_enabled().  That's not quite correct
because efx_io_error_detected() does return PCI_ERS_RESULT_RECOVERED under
certain conditions and then .mmio_enabled() would indeed be called if it
were implemented.  Remove this misleading portion of the code comment as
well.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/1d72a891a7f57115e78a73046e776f7e0c8cd68f.1755008151.git.lukas@wunner.de

PCI/ERR: Update device error_state already after reset

After a Fatal Error has been reported by a device and has been recovered
through a Secondary Bus Reset, AER updates the device's error_state to
pci_channel_io_normal before invoking its driver's ->resume() callback.

By contrast, EEH updates the error_state earlier, namely after resetting
the device and before invoking its driver's ->slot_reset() callback.
Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
pci_channel_io_normal in eeh_report_reset()") explains in great detail
that the earlier invocation is necessitated by various drivers checking
accessibility of the device with pci_channel_offline() and avoiding
accesses if it returns true.  It returns true for any other error_state
than pci_channel_io_normal.

The device should be accessible already after reset, hence the reasoning
is that it's safe to update the error_state immediately afterwards.

This deviation between AER and EEH seems problematic because drivers
behave differently depending on which error recovery mechanism the
platform uses.  Three drivers have gone so far as to update the
error_state themselves, presumably to work around AER's behavior.

For consistency, amend AER to update the error_state at the same recovery
steps as EEH.  Drop the now unnecessary workaround from the three drivers.

Keep updating the error_state before ->resume() in case ->error_detected()
or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
->slot_reset() to be skipped.  There are drivers doing this even for Fatal
Errors, e.g. mhi_pci_error_detected().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/4517af6359ffb9d66152b827a5d2833459144e3f.1755008151.git.lukas@wunner.de

PCI/ERR: Notify drivers on failure to recover

According to Documentation/PCI/pci-error-recovery.rst, the following shall
occur on failure to recover from a PCIe Uncorrectable Error:

  STEP 6: Permanent Failure
  -------------------------
  A "permanent failure" has occurred, and the platform cannot recover
  the device.  The platform will call error_detected() with a
  pci_channel_state_t value of pci_channel_io_perm_failure.

  The device driver should, at this point, assume the worst. It should
  cancel all pending I/O, refuse all new I/O, returning -EIO to
  higher layers. The device driver should then clean up all of its
  memory and remove itself from kernel operations, much as it would
  during system shutdown.

Sathya notes that AER does not call error_detected() on failure and thus
deviates from the document (as well as EEH, for which the document was
originally added).

Most drivers do nothing on permanent failure, but the SCSI drivers and a
number of Ethernet drivers do take advantage of the notification to flush
queues and give up resources.

Amend AER to notify such drivers and align with the documentation and EEH.

Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/ec212d4d4f5c65d29349df33acdc9768ff8279d1.1755008151.git.lukas@wunner.de

PCI/ERR: Fix uevent on failure to recover

Upon failure to recover from a PCIe error through AER, DPC or EDR, a
uevent is sent to inform user space about disconnection of the bridge
whose subordinate devices failed to recover.

However the bridge itself is not disconnected. Instead, a uevent should
be sent for each of the subordinate devices.

Only if the "bridge" happens to be a Root Complex Event Collector or
Integrated Endpoint does it make sense to send a uevent for it (because
there are no subordinate devices).

Right now if there is a mix of subordinate devices with and without
pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
afterwards. Fix it.

Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.16+
Link: https://patch.msgid.link/68fc527a380821b5d861dd554d2ce42cb739591c.1755008151.git.lukas@wunner.de

PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors

When Advanced Error Reporting was introduced in September 2006 by commit
6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
sought to adhere to the recovery flow and callbacks specified in
Documentation/PCI/pci-error-recovery.rst.

That document had been added in January 2006, when Enhanced Error Handling
(EEH) was introduced for PowerPC with commit 065c6359071c ("[PATCH] PCI
Error Recovery: documentation").

However the AER driver deviates from the document in that it never
performs a Secondary Bus Reset on Non-Fatal Errors, but always on Fatal
Errors.  By contrast, EEH allows drivers to opt in or out of a Bus Reset
regardless of error severity, by returning PCI_ERS_RESULT_NEED_RESET or
PCI_ERS_RESULT_CAN_RECOVER from their ->error_detected() callback.  If all
drivers agree that they can recover without a Bus Reset, EEH skips it.
Should one of them request a Bus Reset, it overrides all other drivers.

This inconsistency between EEH and AER seems problematic because drivers
need to be aware of and cope with it.

The file Documentation/PCI/pcieaer-howto.rst hints at a rationale for
always performing a Bus Reset on Fatal Errors:  "Fatal errors [...] cause
the link to be unreliable.  [...] This [reset_link] callback is used to
reset the PCIe physical link when a fatal error happens.  If an error
message indicates a fatal error, [...] performing link reset at upstream
is necessary."

There's no such rationale provided for never performing a Bus Reset on
Non-Fatal Errors.

The "xe" driver has a need to attempt a reset of local units on graphics
cards upon a Non-Fatal Error.  If that is insufficient for recovery, the
driver wants to opt in to a Bus Reset.

Accommodate such use cases and align AER more closely with EEH by
performing a Bus Reset in pcie_do_recovery() if drivers request it and the
faulting device's channel_state is pci_channel_io_normal.  The AER driver
sets this channel_state for Non-Fatal Errors.  For Fatal Errors, it uses
pci_channel_io_frozen.

This limits the deviation from Documentation/PCI/pci-error-recovery.rst
and EEH to the unconditional Bus Reset on Fatal Errors.

pcie_do_recovery() is also invoked by the Downstream Port Containment and
Error Disconnect Recover drivers.  They both set the channel_state to
pci_channel_io_frozen, hence pcie_do_recovery() continues to always invoke
the ->reset_subordinates() callback in their case.  That is necessary
because the callback brings the link back up at the containing Downstream
Port.

There are two behavioral changes resulting from this commit:

First, if channel_state is pci_channel_io_normal and one of the affected
drivers returns PCI_ERS_RESULT_NEED_RESET from its ->error_detected()
callback, a Bus Reset will now be performed.  There are drivers doing this
and although it would be possible to avoid a behavioral change by letting
them return PCI_ERS_RESULT_CAN_RECOVER instead, the impression I got from
examination of all drivers is that they actually expect or want a Bus
Reset (cxl_error_detected() is a case in point).  In any case, if they can
cope with a Bus Reset on Fatal Errors, they shouldn't have issues with a
Bus Reset on Non-Fatal Errors.

Second, if channel_state is pci_channel_io_frozen and all affected drivers
return PCI_ERS_RESULT_CAN_RECOVER from ->error_detected(), their
->mmio_enabled() callback is now invoked prior to performing a Bus Reset,
instead of afterwards.  This actually makes sense:  For example,
drivers/scsi/sym53c8xx_2/sym_glue.c dumps debug registers in its
->mmio_enabled() callback.  Doing so after reset right now captures the
post-reset state instead of the faulting state, which is useless.

There is only one other driver which implements ->mmio_enabled() and
returns PCI_ERS_RESULT_CAN_RECOVER from ->error_detected() for
channel_state pci_channel_io_frozen, drivers/scsi/ipr.c (IBM Power RAID).
It appears to only be used on EEH platforms.  So the second behavioral
change is limited to these two drivers.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/28fd805043bb57af390168d05abb30898cf4fc58.1755008151.git.lukas@wunner.de

PCI: Disable MSI on RDC PCI to PCIe bridges

RDC PCI to PCIe bridges, present on Vortex86DX3 and Vortex86EX2 SoCs, do
not support MSIs. If enabled, interrupts generated by PCIe devices never
reach the processor.

I have contacted the manufacturer (DM&P) and they confirmed that PCI MSIs
need to be disabled for them.

Signed-off-by: Marcos Del Sol Vives <marcos@orca.pet>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250705233209.721507-1-marcos@orca.pet

PCI: hotplug: Clean up spaces in messages

Remove extraneous space before newline in messages.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
[bhelgaas: squash into one patch, commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250731100017.2165781-1-colin.i.king@gmail.com
Link: https://patch.msgid.link/20250731101036.2167812-1-colin.i.king@gmail.com

PCI: Add Extended Tag + MRRS quirk for Xeon 6

When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
Tag Requester (note: there is currently no 10-Bit Tag support in the
kernel). While those can be configured to the recommended values by FW,
kernel may decide to overwrite the initial values.

Add a quirk that disallows enabling Extended Tags and setting MRRS
larger than 128B for devices under Xeon 6 Root Ports if the Root Port
is bifurcated to x2. Use the host bridge's enable_device hook to
overwrite MRRS if it's set to >128B for the device to be enabled.

The earlier attempts to implement this quirk polluted PCI core code with
the checks necessary to support this quirk. Using the enable_device hook
keeps the quirk well-contained, away from the PCI core code.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: 2x -> x2, rename quirk]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250610114802.7460-1-ilpo.jarvinen@linux.intel.com

PCI: Clean up pci_scan_child_bus_extend() loop

pci_scan_child_bus_extend() open-codes device number iteration in the for
loop. Convert to use PCI_DEVFN() and add PCI_MAX_NR_DEVS (there seems to be
no pre-existing define for this purpose).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-3-ilpo.jarvinen@linux.intel.com

PCI: Clean up early_dump_pci_device()

Convert 256 to PCI_CFG_SPACE_SIZE and 4 to sizeof(u32) and avoid i / 4
construct by changing the iteration.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-2-ilpo.jarvinen@linux.intel.com

PCI: Use header type defines in pci_setup_device()

Replace literals with PCI_HEADER_TYPE_* defines in pci_setup_device().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-1-ilpo.jarvinen@linux.intel.com

Linux 6.17-rc1

Merge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
"tools/power turbostat: version 2025.09.09

   - Probe and display L3 Cache topology

   - Add ability to average an added counter (useful for pre-integrated
     "counters", such as Watts)

   - Break the limit of 64 built-in counters

   - Assorted bug fixes and minor feature tweaks"

* tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.09.09
  tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
  tools/power turbostat: standardize PER_THREAD_PARAMS
  tools/power turbostat: Fix DMR support
  tools/power turbostat: add format "average" for external attributes
  tools/power turbostat: delete GET_PKG()
  tools/power turbostat: probe and display L3 cache topology
  tools/power turbostat: Support more than 64 built-in-counters
  tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
  tools/power turbostat: Fix bogus SysWatt for forked program
  tools/power turbostat: Handle cap_get_proc() ENOSYS
  tools/power turbostat: Fix build with musl
  tools/power turbostat: verify arguments to params --show and --hide
  tools/power turbostat: regression fix: --show C1E%

Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fixes from Borislav Petkov:

- Remove an obsolete comment and fix spelling

* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Remove obsolete comment from takedown_cpu()
smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment

Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

- Fix a wrong ioremap size in mvebu-gicp

- Remove yet another compile-test case for a driver which needs an
   additional dependency

- Fix a lock inversion scenario in the IRQ unit test suite

- Remove an impossible flag situation in gic-v5

- Do not iounmap resources in gic-v5 which are managed by devm

- Make sure stale, left-over interrupts in mvebu-gicp are cleared on
   driver init

- Fix a reference counting mishap in msi-lib

- Fix a dereference-before-null-ptr-check case in the riscv-imsic
   irqchip driver

* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mvebu-gicp: Use resource_size() for ioremap()
  irqchip: Build IMX_MU_MSI only on ARM
  genirq/test: Resolve irq lock inversion warnings
  irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
  irqchip/gic-v5: iwb: Fix iounmap probe failure path
  irqchip/mvebu-gicp: Clear pending interrupts on init
  irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
  irqchip/riscv-imsic: Don't dereference before NULL pointer check

Merge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Fix an interrupt vector setup race which leads to a non-functioning
   device

- Add new Intel CPU models *and* a family: 0x12. Finally. Yippie! :-)

* tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Plug vector setup race
  x86/cpu: Add new Intel CPU model numbers for Wildcatlake and Novalake

Merge tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

- Prevent a futex hash leak due to different mm lifetimes

* tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Move futex cleanup to __mmdrop()

tools/power turbostat: version 2025.09.09

Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Handle non-root legacy-uncore sysfs permissions

/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.

Non-root turbostat users see complaints in this scenario.

Fail probe of the interface if we can't read current_freq_khz.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: standardize PER_THREAD_PARAMS

use a macro for PER_THREAD_PARAMS to make adding one later more clear.

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Fix DMR support

Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from TPMI based Intel Speed Select Technology feature.

Remove the access of these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.

Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: add format "average" for external attributes

External atributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data

Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: delete GET_PKG()

pkg_base[pkg_id] is a simple array of structure pointers,
let the compiler treat it that way.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: probe and display L3 cache topology

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Support more than 64 built-in-counters

We have out-grown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the the CPU_SET(3) macros to break this barrier.

Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns

Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.

Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY fix from Greg KH:
"Here is a single revert of one of the previous patches that went in
  the last tty/serial merge that is breaking userspace on some platforms
  (specifically powerpc, probably a few others.)

  It accidentially changed the ioctl values of some tty ioctls, which
  breaks xorg.

  The revert has been in linux-next all this week with no reported
  issues"

* tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "tty: vt: use _IO() to define ioctl numbers"

Merge tag 'efi-next-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

- Expose the OVMF firmware debug log via sysfs

- Lower the default log level for the EFI stub to avoid corrupting any
   splash screens with unimportant diagnostic output

* tag 'efi-next-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: add API doc entry for ovmf_debug_log
  efistub: Lower default log level
  efi: add ovmf debug log driver

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix memory leak of bpf_scc_info objects (Eduard Zingerman)

- Fix a regression in the 'perf' tool caused by moving UID filtering to
   BPF (Ilya Leoshkevich)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  perf bpf-filter: Enable events manually
  libbpf: Add the ability to suppress perf event enablement
  bpf: Fix memory leak of bpf_scc_info objects

Merge tag 'block-6.17-20250808' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

- MD pull request via Yu:
      - mddev null-ptr-dereference fix, by Erkun
      - md-cluster fail to remove the faulty disk regression fix, by
        Heming
      - minor cleanup, by Li Nan and Jinchao
      - mdadm lifetime regression fix reported by syzkaller, by Yu Kuai

- MD pull request via Christoph
      - add support for getting the FDP featuee in fabrics passthru path
        (Nitesh Shetty)
      - add capability to connect to an administrative controller
        (Kamaljit Singh)
      - fix a leak on sgl setup error (Keith Busch)
      - initialize discovery subsys after debugfs is initialized
        (Mohamed Khalfella)
      - fix various comment typos (Bjorn Helgaas)
      - remove unneeded semicolons (Jiapeng Chong)

- nvmet debugfs ordering issue fix

- Fix UAF in the tag_set in zloop

- Ensure sbitmap shallow depth covers entire set

- Reduce lock roundtrips in io context lookup

- Move scheduler tags alloc/free out of elevator and freeze lock, to
   fix some lockdep found issues

- Improve robustness of queue limits checking

- Fix a regression with IO priorities, if no io context exists

* tag 'block-6.17-20250808' of git://git.kernel.dk/linux: (26 commits)
  lib/sbitmap: make sbitmap_get_shallow() internal
  lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  nvmet: exit debugfs after discovery subsystem exits
  block, bfq: Reorder struct bfq_iocq_bfqq_data
  md: make rdev_addable usable for rcu mode
  md/raid1: remove struct pool_info and related code
  md/raid1: change r1conf->r1bio_pool to a pointer type
  block: ensure discard_granularity is zero when discard is not supported
  zloop: fix KASAN use-after-free of tag set
  block: Fix default IO priority if there is no IO context
  nvme: fix various comment typos
  nvme-auth: remove unneeded semicolon
  nvme-pci: fix leak on sgl setup error
  nvmet: initialize discovery subsys after debugfs is initialized
  nvme: add capability to connect to an administrative controller
  nvmet: add support for FDP in fabrics passthru path
  md: rename recovery_cp to resync_offset
  md/md-cluster: handle REMOVE message earlier
  md: fix create on open mddev lifetime regression
  block: fix potential deadlock while running nr_hw_queue update
  ...

Merge tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Allow vectorized payloads for send/send-zc - like sendmsg, but
   without the hassle of a msghdr.

- Fix for an integer wrap that should go to stable, spotted by syzbot.
   Nothing alarming here, as you need to be root to hit this.
   Nevertheless, it should get fixed.

   FWIW, kudos to the syzbot crew for having much nicer reproducers now,
   and with nicely annotated source code as well. This is particularly
   useful as syzbot uses the raw interface rather than liburing,
   historically it's been difficult to turn a syzbot reproducer into a
   meaningful test case. With the recent changes, not true anymore!

* tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux:
  io_uring/memmap: cast nr_pages to size_t before shifting
  io_uring/net: Allow to do vectorized send

Merge tag 'spi-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's one fix here for an issue with the CS42L43 where we were
  allocating a single property for client devices as just that property
  rather than a terminated array of properties like we are supposed to.

  We also have an update to the MAINTAINERS file for some Renesas
  devices"

* tag 'spi-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cs42l43: Property entry should be a null-terminated array
  MAINTAINERS: Add entries for the RZ/V2H(P) RSPI

Merge tag 'regulator-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"This fixes an issue with the newly added code for handling large
  voltage changes on regulators which require that individual voltage
  changes cover a limited range, the check for convergence was broken"

* tag 'regulator-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: correct convergence check in regulator_set_voltage()

Merge tag 'regmap-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"These patches fix a lockdep issue Russell King reported with nested
  regmap-irqs (unusual since regmap is generally for devices on slow
  buses so devices don't get nested), plus add a missing mutex free
  which I noticed while implementing a fix for that issue"

* tag 'regmap-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Avoid lockdep warnings with nested regmap-irq chips
  regmap: irq: Free the regmap-irq mutex

Merge tag 'pci-v6.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

- Fix vmd MSI interrupt domain restructure that caused crash early in
boot (Nam Cao)

* tag 'pci-v6.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: vmd: Fix wrong kfree() in vmd_msi_free()

Merge tag 'mailbox-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox

Pull mailbox updates from Jassi Brar:

- aspeed: add driver and bindings for ast2700

- broadcom: add driver and bindings for bcm74110

- mediatek: fix RPM api usage

- qcom: use dev_fwnode

- pcc: support shared buffer

- misc dt-bindings cleanup

* tag 'mailbox-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
  mailbox/pcc: support mailbox management of the shared buffer
  mailbox: bcm74110: Fix spelling mistake
  mailbox: bcm74110: remove unneeded semicolon
  mailbox: aspeed: add mailbox driver for AST27XX series SoC
  dt-bindings: mailbox: Add ASPEED AST2700 series SoC
  dt-bindings: mailbox: Drop consumers example DTS
  dt-bindings: mailbox: nvidia,tegra186-hsp: Use generic node name
  dt-bindings: mailbox: Correct example indentation
  dt-bindings: mailbox: ti,secure-proxy: Add missing reg maxItems
  dt-bindings: mailbox: amlogic,meson-gxbb-mhu: Add missing interrupts maxItems
  dt-bindings: mailbox: qcom-ipcc: document the Milos Inter-Processor Communication Controller
  mailbox: Add support for bcm74110
  dt-bindings: mailbox: Add support for bcm74110
  mailbox: Use dev_fwnode()
  mailbox: mtk-cmdq: Switch to pm_runtime_put_autosuspend()

Merge tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
"As discussed: there's a small commit that removes the legacy GPIO line
  value setter callbacks as they're no longer used and a big, treewide
  commit that renames the new ones to the old names across all GPIO
  drivers at once.

  While at it: there are also two fixes that I picked up over the course
  of the merge window:

   - remove unused, legacy GPIO line value setters from struct gpio_chip

   - rename the new set callbacks back to the original names treewide

   - fix interrupt handling in gpio-mlxbf2

   - revert a buggy immutable irqchip conversion"

* tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  treewide: rename GPIO set callbacks back to their original names
  gpio: remove legacy GPIO line value setter callbacks
  gpio: mlxbf2: use platform_get_irq_optional()
  Revert "gpio: pxa: Make irq_chip immutable"

Merge tag 'sound-fix-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:

- Support for ASoC AMD ACP 7.2 with new IDs

- ASoC Intel AVS and SOF fixes

- Yet more kconfig adjustments for HD-audio codecs

- TAS2781 codec fixes

- Fixes for longstanding (rather minor) bugs in Intel LPE audio and
   USB-audio drivers

* tag 'sound-fix-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/cirrus: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/hdmi: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/realtek: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/ca0132: Fix missing error handling in ca0132_alt_select_out()
  ASoC: SOF: Intel: hda-sdw-bpt: fix SND_SOF_SOF_HDA_SDW_BPT dependencies
  ALSA: hda/tas2781: Support L"SmartAmpCalibrationData" to save calibrated data
  ALSA: intel_hdmi: Fix off-by-one error in __hdmi_lpe_audio_probe()
  ALSA: hda/realtek: add LG gram 16Z90R-A to alc269 fixup table
  ALSA: usb-audio: Don't use printk_ratelimit for debug prints
  ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCC SKU
  ASoC: tas2781: Fix the wrong step for TLV on tas2781
  ASoC: amd: acp: Add SoundWire SOF machine driver support for acp7.2 platform
  ASoC: amd: acp: Add SoundWire legacy machine driver support for acp7.2 platform
  ASoC: amd: ps: Add SoundWire pci and dma driver support for acp7.2 platform
  ASoC: SOF: amd: Add sof audio support for acp7.2 platform
  ASoC: Intel: avs: Fix uninitialized pointer error in probe()
  ASoC: wm8962: Clear master mode when enter runtime suspend
  ASoC: SOF: amd: acp-loader: Use GFP_KERNEL for DMA allocations in resume context