accel/ivpu: Implement heartbeat-based TDR mechanism
author: Karol Wachowski <karol.wachowski@intel.com>
        Wed, 16 Apr 2025 10:25:55 +0000 (12:25 +0200)
committer: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
        Fri, 25 Apr 2025 07:49:11 +0000 (09:49 +0200)
commit: 0e7db503c5355ff9a7471d0a450bde8c069ef803
tree: 779b001642a4e3613fd25bdc66a02ef755046976
parent: 3a2b7389feea9a7afd18d58cda59b7a989445f38

Introduce a heartbeat-based Timeout Detection and Recovery (TDR) mechanism.
The enhancement aims to improve the reliability of device hang detection by
monitoring heartbeat updates.

Each progressing inference updates a heartbeat counter, allowing the driver to
monitor its progress. When the heartbeat indicates progress, the timeout is
rescheduled instead of triggering recovery, up to a maximum of 30 reschedules.
This raises the maximum running time of a single inference to about 60
seconds, which corresponds to a TDR timeout period of roughly 2 seconds.

The heartbeat mechanism provides a more robust way to detect device hangs
and can reduce false-positive recoveries of long-running inferences that are
still making progress.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250416102555.384526-1-maciej.falkowski@linux.intel.com
drivers/accel/ivpu/ivpu_drv.c
drivers/accel/ivpu/ivpu_drv.h
drivers/accel/ivpu/ivpu_fw.h
drivers/accel/ivpu/ivpu_pm.c