From: Philipp Stanner Date: Wed, 13 Aug 2025 08:56:55 +0000 (+0200) Subject: drm/sched: Document race condition in drm_sched_fini() X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=f4c75f975cf50fa2e1fd96c5aafe5aa62e55fbe4;p=users%2Fhch%2Fmisc.git drm/sched: Document race condition in drm_sched_fini() In drm_sched_fini() all entities are marked as stopped - without taking the appropriate lock, because that would deadlock. That means that drm_sched_fini() and drm_sched_entity_push_job() can race against each other. This should most likely be fixed by establishing the rule that all entities associated with a scheduler must be torn down first. Then, however, the locking should be removed from drm_sched_fini() alltogether with an appropriate comment. Reported-by: James Flowers Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2373@fastmail.com/ Reviewed-by: Christian König Signed-off-by: Philipp Stanner Link: https://lore.kernel.org/r/20250813085654.102504-2-phasta@kernel.org --- diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 5a550fd76bf0..46119aacb809 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1424,6 +1424,22 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) * Prevents reinsertion and marks job_queue as idle, * it will be removed from the rq in drm_sched_entity_fini() * eventually + * + * FIXME: + * This lacks the proper spin_lock(&s_entity->lock) and + * is, therefore, a race condition. Most notably, it + * can race with drm_sched_entity_push_job(). The lock + * cannot be taken here, however, because this would + * lead to lock inversion -> deadlock. + * + * The best solution probably is to enforce the life + * time rule of all entities having to be torn down + * before their scheduler. Then, however, locking could + * be dropped alltogether from this function. + * + * For now, this remains a potential race in all + * drivers that keep entities alive for longer than + * the scheduler. */ s_entity->stopped = true; spin_unlock(&rq->lock);