drm/amd/amdkfd: Evict all queues even if HWS queue removal fails
author Yifan Zha <Yifan.Zha@amd.com>
Wed, 5 Mar 2025 05:14:55 +0000 (13:14 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Wed, 12 Mar 2025 18:59:21 +0000 (14:59 -0400)
[Why]
If a reset is detected while KFD needs to evict working queues, removing a queue through HWS fails.
The remaining queues are then not evicted and stay in the active state.

After the reset is done, KFD uses HWS to terminate the remaining active queues, but HWS has been reset.
So queue removal fails again.

[How]
Keep removing all queues even if HWS returns a failure.
This does not affect cpsch, as it checks reset_domain->sem.

v2: If removing any queue fails, queue eviction returns an error.
v3: Declare err inside the if-block.
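The [How] approach can be sketched as a small, self-contained C model. This is a hypothetical illustration, not the amdkfd code: `struct fake_queue`, `fake_remove_queue()` and `evict_all_queues()` are made-up stand-ins for `struct queue`, `remove_queue_mes()` and `evict_process_queues_cpsch()`, and `-EIO` is an assumed error code used only to simulate the scheduler rejecting a removal. It shows the pattern the patch adopts: every active queue is deactivated and removal is attempted, and a failure is logged and stored in the return value instead of jumping out of the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a KFD queue (not the real struct queue). */
struct fake_queue {
	int queue_id;
	bool active;
	bool hws_fails; /* simulate HWS rejecting the removal, e.g. during reset */
};

/* Hypothetical stand-in for remove_queue_mes(): 0 on success, -EIO on failure. */
static int fake_remove_queue(struct fake_queue *q)
{
	return q->hws_fails ? -EIO : 0;
}

/*
 * Evict every active queue. A failed removal is logged and remembered in
 * retval, but the loop keeps going so no queue is left in the active state.
 */
static int evict_all_queues(struct fake_queue *queues, int n, int *active_count)
{
	int retval = 0;

	assert(active_count != NULL);
	for (int i = 0; i < n; i++) {
		struct fake_queue *q = &queues[i];

		if (!q->active)
			continue;
		q->active = false;
		(*active_count)--;

		int err = fake_remove_queue(q);
		if (err) {
			fprintf(stderr, "Failed to evict queue %d\n", q->queue_id);
			retval = err; /* report the failure, but do not stop */
		}
	}
	return retval;
}
```

Because `retval` is overwritten on each failure, the caller sees the error from the last queue that failed, which mirrors the `retval = err;` line in the hunk below; queues after the failing one are still evicted.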

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 42c854b8fb0cce512534aa2b7141948e80c6ebb0)
Cc: stable@vger.kernel.org
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index d4593374e7a1e448afab21fa22d154df942435fb..34c2c42c0f95c6cb47eda8d9a625b8a8ff7e36db 100644
@@ -1230,11 +1230,13 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
                decrement_queue_count(dqm, qpd, q);
 
                if (dqm->dev->kfd->shared_resources.enable_mes) {
-                       retval = remove_queue_mes(dqm, q, qpd);
-                       if (retval) {
+                       int err;
+
+                       err = remove_queue_mes(dqm, q, qpd);
+                       if (err) {
                                dev_err(dev, "Failed to evict queue %d\n",
                                        q->properties.queue_id);
-                               goto out;
+                               retval = err;
                        }
                }
        }