From 77052ab24590cb72598e31de4a7c29f99d51d201 Mon Sep 17 00:00:00 2001 From: Riana Tauro Date: Mon, 7 Apr 2025 10:44:12 +0530 Subject: [PATCH] drm/xe: Add documentation for survivability mode Add survivability mode document to pcode document as it is enabled when pcode detects a failure. v2: fix kernel-doc (Lucas) Signed-off-by: Riana Tauro Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20250407051414.1651616-3-riana.tauro@intel.com Signed-off-by: Lucas De Marchi --- Documentation/gpu/xe/xe_pcode.rst | 7 +++++ drivers/gpu/drm/xe/xe_survivability_mode.c | 34 +++++++++++++++------- 2 files changed, 30 insertions(+), 11 deletions(-) diff --git a/Documentation/gpu/xe/xe_pcode.rst b/Documentation/gpu/xe/xe_pcode.rst index d2e22cc45061f..5937ef3599b0e 100644 --- a/Documentation/gpu/xe/xe_pcode.rst +++ b/Documentation/gpu/xe/xe_pcode.rst @@ -12,3 +12,10 @@ Internal API .. kernel-doc:: drivers/gpu/drm/xe/xe_pcode.c :internal: + +================== +Boot Survivability +================== + +.. kernel-doc:: drivers/gpu/drm/xe/xe_survivability_mode.c + :doc: Xe Boot Survivability diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c index cb813b337fd33..399c06890b0b8 100644 --- a/drivers/gpu/drm/xe/xe_survivability_mode.c +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c @@ -28,20 +28,32 @@ * This is implemented by loading the driver with bare minimum (no drm card) to allow the firmware * to be flashed through mei and collect telemetry. The driver's probe flow is modified * such that it enters survivability mode when pcode initialization is incomplete and boot status - * denotes a failure. The driver then populates the survivability_mode PCI sysfs indicating - * survivability mode and provides additional information required for debug + * denotes a failure. * - * KMD exposes below admin-only readable sysfs in survivability mode + * Survivability mode can also be entered manually using the survivability mode attribute available + * through configfs which is beneficial in several usecases. It can be used to address scenarios + * where pcode does not detect failure or for validation purposes. It can also be used in + * In-Field-Repair (IFR) to repair a single card without impacting the other cards in a node. * - * device/survivability_mode: The presence of this file indicates that the card is in survivability - * mode. Also, provides additional information on why the driver entered - * survivability mode. + * Use below command enable survivability mode manually:: * - * Capability Information - Provides boot status - * Postcode Information - Provides information about the failure - * Overflow Information - Provides history of previous failures - * Auxiliary Information - Certain failures may have information in - * addition to postcode information + * # echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode + * + * Refer :ref:`xe_configfs` for more details on how to use configfs + * + * Survivability mode is indicated by the below admin-only readable sysfs which provides additional + * debug information:: + * + * /sys/bus/pci/devices//surivability_mode + * + * Capability Information: + * Provides boot status + * Postcode Information: + * Provides information about the failure + * Overflow Information + * Provides history of previous failures + * Auxiliary Information + * Certain failures may have information in addition to postcode information */ static u32 aux_history_offset(u32 reg_value) -- 2.50.1