drm/xe/gsc: Wedge the device if the GSCCS reset fails
author Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Wed, 28 Aug 2024 22:14:57 +0000 (15:14 -0700)
committer Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Thu, 29 Aug 2024 21:18:52 +0000 (14:18 -0700)
Due to the special handling of the GSCCS in HW, we can't escalate to GT
reset when we receive the reset failure interrupt; the specs indicate
that we should trigger an FLR instead, but we do not have support for
that at the moment, so the HW will stay permanently in a broken state.
We should therefore mark the device as wedged, the same as if the GT
reset had failed.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240828221457.2752868-1-daniele.ceraolospurio@intel.com
drivers/gpu/drm/xe/xe_gsc.c

index 648786afffe05a120cd707818cc8a258b1d21252..6fbea70d3d36d778907098584cefbfe7d96827b3 100644
@@ -335,9 +335,11 @@ static int gsc_er_complete(struct xe_gt *gt)
        if (er_status == GSCI_TIMER_STATUS_TIMER_EXPIRED) {
                /*
                 * XXX: we should trigger an FLR here, but we don't have support
-                * for that yet.
+                * for that yet. Since we can't recover from the error, we
+                * declare the device as wedged.
                 */
                xe_gt_err(gt, "GSC ER timed out!\n");
+               xe_device_declare_wedged(gt_to_xe(gt));
                return -EIO;
        }
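
The control flow this patch introduces can be sketched outside the kernel as follows. This is only an illustrative model: every `mock_*` name is a hypothetical stand-in and not part of the xe driver, and the `-ECANCELED` returned by `mock_submit()` is an assumption chosen for the example. It shows an unrecoverable GSC ER timeout marking the device wedged, so that later work is refused rather than sent to broken hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the real xe types. */
struct mock_device {
	bool wedged;	/* once set, the device is treated as unusable */
};

struct mock_gt {
	struct mock_device *xe;
};

enum mock_er_status {
	MOCK_ER_OK,
	MOCK_ER_TIMER_EXPIRED,
};

/* Models xe_device_declare_wedged(): latch the "broken for good" state. */
static void mock_declare_wedged(struct mock_device *xe)
{
	xe->wedged = true;
}

/* Models the patched branch of gsc_er_complete(). */
static int mock_gsc_er_complete(struct mock_gt *gt, enum mock_er_status st)
{
	if (st == MOCK_ER_TIMER_EXPIRED) {
		/* No FLR support, so the error cannot be recovered from. */
		fprintf(stderr, "GSC ER timed out!\n");
		mock_declare_wedged(gt->xe);
		return -EIO;
	}
	return 0;
}

/* Assumed consumer of the flag: refuse new work on a wedged device. */
static int mock_submit(struct mock_device *xe)
{
	return xe->wedged ? -ECANCELED : 0;
}

/* Small walk-through of the before/after behaviour. */
void mock_demo(void)
{
	struct mock_device xe = { .wedged = false };
	struct mock_gt gt = { .xe = &xe };

	assert(mock_submit(&xe) == 0);
	assert(mock_gsc_er_complete(&gt, MOCK_ER_TIMER_EXPIRED) == -EIO);
	assert(mock_submit(&xe) == -ECANCELED);
}
```

In this model the wedged flag is never cleared, mirroring the commit message's point that the hardware stays permanently broken until an FLR can be triggered.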