www.infradead.org Git - users/jedix/linux-maple.git/commit
drm/xe: Don't short circuit TDR on jobs not started
author Matthew Brost <matthew.brost@intel.com>
Fri, 25 Oct 2024 21:43:29 +0000 (14:43 -0700)
committer Lucas De Marchi <lucas.demarchi@intel.com>
Thu, 31 Oct 2024 05:14:06 +0000 (22:14 -0700)
commit 35d25a4a0012e690ef0cc4c5440231176db595cc
tree de6f27d71ca40e9754cafa1165dc56759a2d15e7
parent 5a710196883e0ac019ac6df2a6d79c16ad3c32fa

Short circuiting TDR on jobs not started is an optimization which is not
required. On LNL we are facing an issue where jobs do not get scheduled
by the GuC if it misses a GGTT page update. When this occurs, let the TDR
fire, toggle the scheduling, which may get the job unstuck, and print a
warning message. If the TDR fires twice on a job that hasn't started,
time out the job.
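
The behavior described above can be sketched as a small standalone C
model. This is only an illustration of the decision logic, not the code
in xe_guc_submit.c: struct sketch_job, enum sketch_tdr_action and
sketch_tdr_fire() are hypothetical names, and the real driver's
scheduling toggle and warning are reduced here to a return value and an
fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a submitted job; not the driver's real struct. */
struct sketch_job {
	bool started;           /* has the hardware begun executing the job? */
	unsigned int tdr_fires; /* TDR firings seen while the job was not started */
};

/* Hypothetical outcomes of one TDR firing. */
enum sketch_tdr_action {
	SKETCH_TDR_CHECK_TIMEOUT, /* job started: run the normal timeout check */
	SKETCH_TDR_RETRY,         /* first firing, not started: toggle scheduling, re-arm */
	SKETCH_TDR_TIMEOUT_JOB,   /* second firing, still not started: time out the job */
};

/* Decide what one TDR firing does for the given job (assumed model). */
static enum sketch_tdr_action sketch_tdr_fire(struct sketch_job *job)
{
	assert(job != NULL);

	/* A job that has started is handled by the usual timeout check. */
	if (job->started)
		return SKETCH_TDR_CHECK_TIMEOUT;

	/* The TDR is no longer short circuited for a job that never started. */
	if (++job->tdr_fires >= 2)
		return SKETCH_TDR_TIMEOUT_JOB;

	/* First firing: warn, and let the caller toggle scheduling to try to unstick it. */
	fprintf(stderr, "warning: TDR fired on a job that has not started\n");
	return SKETCH_TDR_RETRY;
}
```

In this model, calling sketch_tdr_fire() twice on a job whose started
flag stays false yields a retry followed by a timeout, while a started
job always falls through to the normal timeout check.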

v2:
 - Add warning message (Paulo)
 - Add fixes tag (Paulo)
 - Time out job which hasn't started after TDR firing twice
v3:
 - Include local change
v4:
 - Short circuit check_timeout on job not started
 - Use warn level rather than notice (Paulo)

Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Cc: stable@vger.kernel.org
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
drivers/gpu/drm/xe/xe_guc_submit.c