www.infradead.org Git - users/dwmw2/linux.git/commit
drm/xe: Don't short circuit TDR on jobs not started
author Matthew Brost <matthew.brost@intel.com>
Fri, 25 Oct 2024 21:43:29 +0000 (14:43 -0700)
committer Lucas De Marchi <lucas.demarchi@intel.com>
Thu, 31 Oct 2024 14:03:14 +0000 (07:03 -0700)
commit fe05cee4d9533892210e1ee90147175d87e7c053
tree 3d67cb5d5b38f4371d22be45ded4a3d96e1b01f3
parent 993ca0eccec65a2cacc3cefb15d35ffadc6f00fb

Short circuiting TDR on jobs not started is an optimization which is not
required. On LNL we are facing an issue where jobs do not get scheduled
by the GuC if it misses a GGTT page update. When this occurs let the TDR
fire, toggle the scheduling which may get the job unstuck, and print a
warning message. If the TDR fires twice on a job that hasn't started,
time out the job.
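
The behaviour described above can be sketched roughly as follows. This is a
simplified user-space illustration, not the code in xe_guc_submit.c: the
struct, the enum, and the function name are all hypothetical, and counting
TDR firings with a plain counter is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the real xe driver types. */
enum sketch_tdr_action {
	SKETCH_TDR_REARM,	/* give the job another timeout period */
	SKETCH_TDR_TIMEOUT,	/* declare the job timed out */
};

struct sketch_job {
	bool started;		/* has the hardware begun executing the job? */
	bool ran_too_long;	/* only meaningful once the job has started */
	unsigned int tdr_fires;	/* how many times the TDR has fired for it */
};

/*
 * Called each time the TDR fires for a job. A job that never started is
 * no longer skipped: the first firing warns and re-arms (this is where
 * scheduling would be toggled to try to unstick the job), and the second
 * firing times the job out.
 */
static enum sketch_tdr_action sketch_tdr_fire(struct sketch_job *job)
{
	job->tdr_fires++;

	if (!job->started) {
		if (job->tdr_fires >= 2)
			return SKETCH_TDR_TIMEOUT;

		fprintf(stderr, "warning: job not started, TDR fired %u time(s)\n",
			job->tdr_fires);
		return SKETCH_TDR_REARM;
	}

	/* Started jobs are judged on how long they have actually run. */
	return job->ran_too_long ? SKETCH_TDR_TIMEOUT : SKETCH_TDR_REARM;
}
```

In this sketch a job that never starts gets exactly one extra timeout period
before it is timed out, while a started job is still judged only on its run
time.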

v2:
 - Add warning message (Paulo)
 - Add fixes tag (Paulo)
 - Time out a job which hasn't started after the TDR fires twice
v3:
 - Include local change
v4:
 - Short circuit check_timeout on job not started
 - Use warn level rather than notice (Paulo)

Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Cc: stable@vger.kernel.org
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 35d25a4a0012e690ef0cc4c5440231176db595cc)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
drivers/gpu/drm/xe/xe_guc_submit.c