From 9eab24532691a36ce795f192ea2fb3193bc5c4b3 Mon Sep 17 00:00:00 2001 From: Srinivasan Shanmugam Date: Wed, 26 Mar 2025 12:53:01 +0530 Subject: [PATCH 01/16] drm/amdgpu/gfx10: Add Cleaner Shader Support for GFX10.3.x GPUs MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Enable the cleaner shader for other GFX10.3.x series of GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.3.x GPUs, previously available for GFX10.3.0. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Mario Sopena-Novales Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Acked-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 30 ++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index a63ce747863f..d1c04de0f633 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4810,7 +4810,9 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) } break; case IP_VERSION(10, 3, 0): + case IP_VERSION(10, 3, 1): case IP_VERSION(10, 3, 2): + case IP_VERSION(10, 3, 3): case IP_VERSION(10, 3, 4): case IP_VERSION(10, 3, 5): adev->gfx.cleaner_shader_ptr = gfx_10_3_0_cleaner_shader_hex; @@ -4826,6 +4828,34 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) } } break; + case IP_VERSION(10, 3, 6): + adev->gfx.cleaner_shader_ptr = gfx_10_3_0_cleaner_shader_hex; + adev->gfx.cleaner_shader_size = sizeof(gfx_10_3_0_cleaner_shader_hex); + if (adev->gfx.me_fw_version >= 14 && + adev->gfx.pfp_fw_version >= 17 && + adev->gfx.mec_fw_version >= 24) { + adev->gfx.enable_cleaner_shader = true; + r = amdgpu_gfx_cleaner_shader_sw_init(adev, adev->gfx.cleaner_shader_size); + if (r) { + adev->gfx.enable_cleaner_shader = false; + dev_err(adev->dev, "Failed to initialize cleaner shader\n"); + } + } + break; + case IP_VERSION(10, 3, 7): + adev->gfx.cleaner_shader_ptr = gfx_10_3_0_cleaner_shader_hex; + adev->gfx.cleaner_shader_size = sizeof(gfx_10_3_0_cleaner_shader_hex); + if (adev->gfx.me_fw_version >= 4 && + adev->gfx.pfp_fw_version >= 9 && + adev->gfx.mec_fw_version >= 12) { + adev->gfx.enable_cleaner_shader = true; + r = amdgpu_gfx_cleaner_shader_sw_init(adev, adev->gfx.cleaner_shader_size); + if (r) { + adev->gfx.enable_cleaner_shader = false; + dev_err(adev->dev, "Failed to initialize cleaner shader\n"); + } + } + break; default: adev->gfx.enable_cleaner_shader = false; break; -- 2.51.0 From 0d47bb77b505e539a99f4f032c9c5a58f1d21e18 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 20 Mar 2025 13:34:33 -0400 Subject: [PATCH 02/16] drm/amdgpu/gfx: make amdgpu_gfx_me_queue_to_bit() static It's not used outside of amdgpu_gfx.c. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 72af5e5a894a..04982b7f33a8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -74,8 +74,8 @@ bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, adev->gfx.mec_bitmap[xcc_id].queue_bitmap); } -int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, - int me, int pipe, int queue) +static int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, + int me, int pipe, int queue) { int bit = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 87e862188766..5d70b3ae3fe1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -550,8 +550,6 @@ bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev, struct amdgpu_ring *ring); bool amdgpu_gfx_is_high_priority_graphics_queue(struct amdgpu_device *adev, struct amdgpu_ring *ring); -int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, int me, - int pipe, int queue); bool amdgpu_gfx_is_me_queue_enabled(struct amdgpu_device *adev, int me, int pipe, int queue); void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable); -- 2.51.0 From 8f970c46b56235a3965e0965fa3fd60e7567fecc Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 20 Mar 2025 14:10:17 -0400 Subject: [PATCH 03/16] drm/amdgpu/gfx: decouple the number of kgqs from the hw The driver currently sets up one kgq per pipe. As such adev->gfx.me.num_queue_per_pipe is hardcoded to 1 everywhere. This is fine for kernel queues, but when we enable user queues we need to know that actual number of queues per pipe. Decouple the kgq setup from the actual hardware count. For dev core dumps and user queues, we want to know the actual number of queues per pipe. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 +++++++------ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 3 ++- 4 files changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 04982b7f33a8..f64675b2ab75 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -77,11 +77,12 @@ bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, static int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, int me, int pipe, int queue) { + int num_queue_per_pipe = 1; /* we only enable 1 KGQ per pipe */ int bit = 0; bit += me * adev->gfx.me.num_pipe_per_me - * adev->gfx.me.num_queue_per_pipe; - bit += pipe * adev->gfx.me.num_queue_per_pipe; + * num_queue_per_pipe; + bit += pipe * num_queue_per_pipe; bit += queue; return bit; @@ -238,8 +239,8 @@ void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device *adev) { int i, queue, pipe; bool multipipe_policy = amdgpu_gfx_is_graphics_multipipe_capable(adev); - int max_queues_per_me = adev->gfx.me.num_pipe_per_me * - adev->gfx.me.num_queue_per_pipe; + int num_queue_per_pipe = 1; /* we only enable 1 KGQ per pipe */ + int max_queues_per_me = adev->gfx.me.num_pipe_per_me * num_queue_per_pipe; if (multipipe_policy) { /* policy: amdgpu owns the first queue per pipe at this stage @@ -247,9 +248,9 @@ void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device *adev) for (i = 0; i < max_queues_per_me; i++) { pipe = i % adev->gfx.me.num_pipe_per_me; queue = (i / adev->gfx.me.num_pipe_per_me) % - adev->gfx.me.num_queue_per_pipe; + num_queue_per_pipe; - set_bit(pipe * adev->gfx.me.num_queue_per_pipe + queue, + set_bit(pipe * num_queue_per_pipe + queue, adev->gfx.me.queue_bitmap); } } else { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index d1c04de0f633..cba9b973f0a7 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4752,6 +4752,7 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) int i, j, k, r, ring_id = 0; int xcc_id = 0; struct amdgpu_device *adev = ip_block->adev; + int num_queue_per_pipe = 1; /* we only enable 1 KGQ per pipe */ INIT_DELAYED_WORK(&adev->gfx.idle_work, amdgpu_gfx_profile_idle_work_handler); @@ -4916,7 +4917,7 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) /* set up the gfx ring */ for (i = 0; i < adev->gfx.me.num_me; i++) { - for (j = 0; j < adev->gfx.me.num_queue_per_pipe; j++) { + for (j = 0; j < num_queue_per_pipe; j++) { for (k = 0; k < adev->gfx.me.num_pipe_per_me; k++) { if (!amdgpu_gfx_is_me_queue_enabled(adev, i, k, j)) continue; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index d57db42f9536..0c9b28a46050 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1571,6 +1571,7 @@ static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block) int i, j, k, r, ring_id = 0; int xcc_id = 0; struct amdgpu_device *adev = ip_block->adev; + int num_queue_per_pipe = 1; /* we only enable 1 KGQ per pipe */ INIT_DELAYED_WORK(&adev->gfx.idle_work, amdgpu_gfx_profile_idle_work_handler); @@ -1703,7 +1704,7 @@ static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block) /* set up the gfx ring */ for (i = 0; i < adev->gfx.me.num_me; i++) { - for (j = 0; j < adev->gfx.me.num_queue_per_pipe; j++) { + for (j = 0; j < num_queue_per_pipe; j++) { for (k = 0; k < adev->gfx.me.num_pipe_per_me; k++) { if (!amdgpu_gfx_is_me_queue_enabled(adev, i, k, j)) continue; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index e7b58e470292..7b20ab48f762 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c @@ -1346,6 +1346,7 @@ static int gfx_v12_0_sw_init(struct amdgpu_ip_block *ip_block) unsigned num_compute_rings; int xcc_id = 0; struct amdgpu_device *adev = ip_block->adev; + int num_queue_per_pipe = 1; /* we only enable 1 KGQ per pipe */ INIT_DELAYED_WORK(&adev->gfx.idle_work, amdgpu_gfx_profile_idle_work_handler); @@ -1435,7 +1436,7 @@ static int gfx_v12_0_sw_init(struct amdgpu_ip_block *ip_block) /* set up the gfx ring */ for (i = 0; i < adev->gfx.me.num_me; i++) { - for (j = 0; j < adev->gfx.me.num_queue_per_pipe; j++) { + for (j = 0; j < num_queue_per_pipe; j++) { for (k = 0; k < adev->gfx.me.num_pipe_per_me; k++) { if (!amdgpu_gfx_is_me_queue_enabled(adev, i, k, j)) continue; -- 2.51.0 From 9cffd67e803a081ce548488161c69f2cddb97fe7 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 20 Mar 2025 14:24:47 -0400 Subject: [PATCH 04/16] drm/amdgpu/gfx: assign the actual me0 queues per pipe Set the actual number of queues per pipe for ME0 (gfx). This way we will dump all of the queues properly in dev core dumps. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index cba9b973f0a7..d9c2dab17f6b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4764,7 +4764,7 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) case IP_VERSION(10, 1, 4): adev->gfx.me.num_me = 1; adev->gfx.me.num_pipe_per_me = 1; - adev->gfx.me.num_queue_per_pipe = 1; + adev->gfx.me.num_queue_per_pipe = 8; adev->gfx.mec.num_mec = 2; adev->gfx.mec.num_pipe_per_mec = 4; adev->gfx.mec.num_queue_per_pipe = 8; @@ -4779,7 +4779,7 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block) case IP_VERSION(10, 3, 7): adev->gfx.me.num_me = 1; adev->gfx.me.num_pipe_per_me = 2; - adev->gfx.me.num_queue_per_pipe = 1; + adev->gfx.me.num_queue_per_pipe = 2; adev->gfx.mec.num_mec = 2; adev->gfx.mec.num_pipe_per_mec = 4; adev->gfx.mec.num_queue_per_pipe = 4; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 0c9b28a46050..c78458ecb88b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1581,7 +1581,7 @@ static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block) case IP_VERSION(11, 0, 3): adev->gfx.me.num_me = 1; adev->gfx.me.num_pipe_per_me = 1; - adev->gfx.me.num_queue_per_pipe = 1; + adev->gfx.me.num_queue_per_pipe = 2; adev->gfx.mec.num_mec = 1; adev->gfx.mec.num_pipe_per_mec = 4; adev->gfx.mec.num_queue_per_pipe = 4; @@ -1594,7 +1594,7 @@ static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block) case IP_VERSION(11, 5, 3): adev->gfx.me.num_me = 1; adev->gfx.me.num_pipe_per_me = 1; - adev->gfx.me.num_queue_per_pipe = 1; + adev->gfx.me.num_queue_per_pipe = 2; adev->gfx.mec.num_mec = 1; adev->gfx.mec.num_pipe_per_mec = 4; adev->gfx.mec.num_queue_per_pipe = 4; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index 7b20ab48f762..ddac26b012bc 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c @@ -1355,7 +1355,7 @@ static int gfx_v12_0_sw_init(struct amdgpu_ip_block *ip_block) case IP_VERSION(12, 0, 1): adev->gfx.me.num_me = 1; adev->gfx.me.num_pipe_per_me = 1; - adev->gfx.me.num_queue_per_pipe = 1; + adev->gfx.me.num_queue_per_pipe = 8; adev->gfx.mec.num_mec = 1; adev->gfx.mec.num_pipe_per_mec = 2; adev->gfx.mec.num_queue_per_pipe = 4; -- 2.51.0 From 8307ebc15c1ea98a8a0b7837af1faa6c01514577 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:56:02 -0400 Subject: [PATCH 05/16] drm/amdgpu/gfx6: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c index 9c6453b458b0..d86620f38855 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c @@ -2881,8 +2881,6 @@ static void gfx_v6_0_get_csb_buffer(struct amdgpu_device *adev, buffer[count++] = cpu_to_le32(ext->reg_index - 0xa000); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From be7652c23d833d1ab2c67b16e173b1a4e69d1ae6 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:57:19 -0400 Subject: [PATCH 06/16] drm/amdgpu/gfx7: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c index 6eb48ebd3f4e..0fdc4362bbc0 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c @@ -3909,8 +3909,6 @@ static void gfx_v7_0_get_csb_buffer(struct amdgpu_device *adev, buffer[count++] = cpu_to_le32(ext->reg_index - PACKET3_SET_CONTEXT_REG_START); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From c8b8d7a4f1c5cdfbd61d75302fb3e3cdefb1a7ab Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:57:34 -0400 Subject: [PATCH 07/16] drm/amdgpu/gfx8: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index bfedd487efc5..fc73be4ab068 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -1248,8 +1248,6 @@ static void gfx_v8_0_get_csb_buffer(struct amdgpu_device *adev, PACKET3_SET_CONTEXT_REG_START); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From a4a4c0ae6742ec7d6bf1548d2c6828de440814a0 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:57:49 -0400 Subject: [PATCH 08/16] drm/amdgpu/gfx9: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index d7db4cb907ae..d725e2e230a3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -1649,8 +1649,6 @@ static void gfx_v9_0_get_csb_buffer(struct amdgpu_device *adev, PACKET3_SET_CONTEXT_REG_START); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From 683308af030cd9b8d3f1de5cbc1ee51788878feb Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:58:03 -0400 Subject: [PATCH 09/16] drm/amdgpu/gfx10: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index d9c2dab17f6b..792c31411ae3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4322,8 +4322,6 @@ static void gfx_v10_0_get_csb_buffer(struct amdgpu_device *adev, PACKET3_SET_CONTEXT_REG_START); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From a9a8bccaa3ba64d509cf7df387cf0b5e1cd06499 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 19 Mar 2025 11:58:19 -0400 Subject: [PATCH 10/16] drm/amdgpu/gfx11: fix CSIB handling We shouldn't return after the last section. We need to update the rest of the CSIB. Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index c78458ecb88b..cedfcd118430 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -859,8 +859,6 @@ static void gfx_v11_0_get_csb_buffer(struct amdgpu_device *adev, PACKET3_SET_CONTEXT_REG_START); for (i = 0; i < ext->reg_count; i++) buffer[count++] = cpu_to_le32(ext->extent[i]); - } else { - return; } } } -- 2.51.0 From 8f1366fcb84677293d070e31843a87c41ab8f93f Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:42 -0600 Subject: [PATCH 11/16] Documentation/gpu: Add new acronyms MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit This commit introduces some new acronyms extracted from the source code and found on some web pages around the internet (most of them came from ArchLinux, Gentoo, and Wikipedia links). Reviewed-by: Christian König Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/amdgpu-glossary.rst | 36 ++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst index 1e9283e076ba..080c3f2250d1 100644 --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst @@ -12,15 +12,27 @@ we have a dedicated glossary for Display Core at The number of CUs that are active on the system. The number of active CUs may be less than SE * SH * CU depending on the board configuration. + BACO + Bus Alive, Chip Off + + BOCO + Bus Off, Chip Off + CE Constant Engine + CIK + Sea Islands + CP Command Processor CPLIB Content Protection Library + CS + Command Submission + CU Compute Unit @@ -33,6 +45,9 @@ we have a dedicated glossary for Display Core at EOP End Of Pipe/Pipeline + FLR + Function Level Reset + GART Graphics Address Remapping Table. This is the name we use for the GPUVM page table used by the GPU kernel driver. It remaps system resources @@ -80,6 +95,9 @@ we have a dedicated glossary for Display Core at KCQ Kernel Compute Queue + KFD + Kernel Fusion Driver + KGQ Kernel Graphics Queue @@ -89,6 +107,9 @@ we have a dedicated glossary for Display Core at MC Memory Controller + MCBP + Mid Command Buffer Preemption + ME MicroEngine (Graphics) @@ -125,9 +146,15 @@ we have a dedicated glossary for Display Core at SE Shader Engine + SGPR + Scalar General-Purpose Registers + SH SHader array + SI + Southern Islands + SMU/SMC System Management Unit / System Management Controller @@ -146,6 +173,9 @@ we have a dedicated glossary for Display Core at TA Trusted Application + TC + Texture Cache + TOC Table of Contents @@ -158,5 +188,11 @@ we have a dedicated glossary for Display Core at VCN Video Codec Next + VGPR + Vector General-Purpose Registers + + VMID + Virtual Memory ID + VPE Video Processing Engine -- 2.51.0 From 5acd17d6d14ec1112a6427e7c02569cba6c46f2d Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:43 -0600 Subject: [PATCH 12/16] Documentation/gpu: Change index order to show driver core first Since driver-core has an overview of the AMD GPU hardware structure, it makes more sense to keep it first. This commit move driver-core up in the index list. Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst index 4c75567854cb..fd47324437a0 100644 --- a/Documentation/gpu/amdgpu/index.rst +++ b/Documentation/gpu/amdgpu/index.rst @@ -7,8 +7,8 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. .. toctree:: - module-parameters driver-core + module-parameters display/index flashing xgmi -- 2.51.0 From c6a1c23d1041f7a5f8f259face14212c6ce4f1a8 Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:44 -0600 Subject: [PATCH 13/16] Documentation/gpu: Create a documentation entry just for hardware info The APU and dGPU tables are hidden in the driver misc info, which makes it hard to find specific hardware info when users need it. This commit creates a single page for this information and adds it to the top of the amdgpu list to improve searchability. Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- .../gpu/amdgpu/amd-hardware-list-info.rst | 23 +++++++++++++++++++ Documentation/gpu/amdgpu/driver-misc.rst | 17 -------------- Documentation/gpu/amdgpu/index.rst | 1 + 3 files changed, 24 insertions(+), 17 deletions(-) create mode 100644 Documentation/gpu/amdgpu/amd-hardware-list-info.rst diff --git a/Documentation/gpu/amdgpu/amd-hardware-list-info.rst b/Documentation/gpu/amdgpu/amd-hardware-list-info.rst new file mode 100644 index 000000000000..1786544fe7c1 --- /dev/null +++ b/Documentation/gpu/amdgpu/amd-hardware-list-info.rst @@ -0,0 +1,23 @@ +================================================= + AMD Hardware Components Information per Product +================================================= + +On this page, you can find the AMD product name and which component version is +part of it. + +Accelerated Processing Units (APU) Info +--------------------------------------- + +.. csv-table:: + :header-rows: 1 + :widths: 3, 2, 2, 1, 1, 1, 1 + :file: ./apu-asic-info-table.csv + +Discrete GPU Info +----------------- + +.. csv-table:: + :header-rows: 1 + :widths: 3, 2, 2, 1, 1, 1 + :file: ./dgpu-asic-info-table.csv + diff --git a/Documentation/gpu/amdgpu/driver-misc.rst b/Documentation/gpu/amdgpu/driver-misc.rst index e40e15f89fd3..25b0c857816e 100644 --- a/Documentation/gpu/amdgpu/driver-misc.rst +++ b/Documentation/gpu/amdgpu/driver-misc.rst @@ -50,23 +50,6 @@ board_info .. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c :doc: board_info -Accelerated Processing Units (APU) Info ---------------------------------------- - -.. csv-table:: - :header-rows: 1 - :widths: 3, 2, 2, 1, 1, 1, 1 - :file: ./apu-asic-info-table.csv - -Discrete GPU Info ------------------ - -.. csv-table:: - :header-rows: 1 - :widths: 3, 2, 2, 1, 1, 1 - :file: ./dgpu-asic-info-table.csv - - GPU Memory Usage Information ============================ diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst index fd47324437a0..1fdc3ccb61c4 100644 --- a/Documentation/gpu/amdgpu/index.rst +++ b/Documentation/gpu/amdgpu/index.rst @@ -8,6 +8,7 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. .. toctree:: driver-core + amd-hardware-list-info module-parameters display/index flashing -- 2.51.0 From 4ede6d20047ae13855a49969b48c438d1a2ac054 Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:45 -0600 Subject: [PATCH 14/16] Documentation/gpu: Add explanation about AMD Pipes and Queues MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Pipes and Queues are two common vocabulary that pervades discussions around amdgpu core features. The definition and explanation of those components are spread around multiple places in the code, mailing list, and Gitlab, which sometimes leads to the wrong interpretation of these concepts. This commit attempts to centralize the definition and explanation of Pipe and Queue from amdgpu perspective in a kernel doc. Most of the information in this doc was derived from: - https://lore.kernel.org/amd-gfx/CADnq5_Pcz2x4aJzKbVrN3jsZhD6sTydtDw=6PaN4O3m4t+Grtg@mail.gmail.com/T/#m9a670b55ab20e0f7c46c80f802a0a4be255a719d - https://gitlab.freedesktop.org/mesa/mesa/-/issues/11759 Reviewed-by: Christian König Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/driver-core.rst | 50 + .../gpu/amdgpu/pipe_and_queue_abstraction.svg | 1279 +++++++++++++++++ 2 files changed, 1329 insertions(+) create mode 100644 Documentation/gpu/amdgpu/pipe_and_queue_abstraction.svg diff --git a/Documentation/gpu/amdgpu/driver-core.rst b/Documentation/gpu/amdgpu/driver-core.rst index 32723a925377..d8500c9c961e 100644 --- a/Documentation/gpu/amdgpu/driver-core.rst +++ b/Documentation/gpu/amdgpu/driver-core.rst @@ -98,6 +98,56 @@ RLC (RunList Controller) The name is a vestige of old hardware where it was originally added and doesn't really have much relation to what the engine does now. + +GFX, Compute, and SDMA Overall Behavior +======================================= + +.. note:: For simplicity, whenever the term block is used in this section, it + means GFX, Compute, and SDMA. + +GFX, Compute and SDMA share a similar form of operation that can be abstracted +to facilitate understanding of the behavior of these blocks. See the figure +below illustrating the common components of these blocks: + +.. kernel-figure:: pipe_and_queue_abstraction.svg + +In the central part of this figure, you can see two hardware elements, one called +**Pipes** and another called **Queues**; it is important to highlight that Queues +must be associated with a Pipe and vice-versa. Every specific hardware IP may have +a different number of Pipes and, in turn, a different number of Queues; for +example, GFX 11 has two Pipes and two Queues per Pipe for the GFX front end. + +Pipe is the hardware that processes the instructions available in the Queues; +in other words, it is a thread executing the operations inserted in the Queue. +One crucial characteristic of Pipes is that they can only execute one Queue at +a time; no matter if the hardware has multiple Queues in the Pipe, it only runs +one Queue per Pipe. + +Pipes have the mechanics of swapping between queues at the hardware level. +Nonetheless, they only make use of Queues that are considered mapped. Pipes can +switch between queues based on any of the following inputs: + +1. Command Stream; +2. Packet by Packet; +3. Other hardware requests the change (e.g., MES). + +Queues within Pipes are defined by the Hardware Queue Descriptors (HQD). +Associated with the HQD concept, we have the Memory Queue Descriptor (MQD), +which is responsible for storing information about the state of each of the +available Queues in the memory. The state of a Queue contains information such +as the GPU virtual address of the queue itself, save areas, doorbell, etc. The +MQD also stores the HQD registers, which are vital for activating or +deactivating a given Queue. The scheduling firmware (e.g., MES) is responsible +for loading HQDs from MQDs and vice versa. + +The Queue-switching process can also happen with the firmware requesting the +preemption or unmapping of a Queue. The firmware waits for the HQD_ACTIVE bit +to change to low before saving the state into the MQD. To make a different +Queue become active, the firmware copies the MQD state into the HQD registers +and loads any additional state. Finally, it sets the HQD_ACTIVE bit to high to +indicate that the queue is active. The Pipe will then execute work from active +Queues. + Driver Structure ================ diff --git a/Documentation/gpu/amdgpu/pipe_and_queue_abstraction.svg b/Documentation/gpu/amdgpu/pipe_and_queue_abstraction.svg new file mode 100644 index 000000000000..0df3c6b3000b --- /dev/null +++ b/Documentation/gpu/amdgpu/pipe_and_queue_abstraction.svg @@ -0,0 +1,1279 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + . . . + + + + + + + + + + + + . . . + + . . . + Pipe[0] + MQD + Queue[0] + Queue[n] + ... + + + + + + + + + + + + + + + + + . . . + + + + + + + + + + + + . . . + + . . . + Pipe[1] + Queue[0] + Queue[n] + ... + + + + + + + + + + + + + + . . . + + + + + + + + + + + + . . . + + . . . + Pipe[n] + Queue[0] + Queue[n] + ... + + ... + + Hardware Block + EXECUTION + + Memory + + + + + + + + e.g.,:queue[0] + + + + + e.g.,:queue[4] + + + + + e.g.,:queue[n] + + HQD + + + + + HQD + + + + + HQD + + + + + HQD + + + + + HQD + + + + + HQD + + + + Registers + + MQD + MQD + MQD + MQD + MQD + ... + + + HQD RegistersQueue Address in the GPUDoorbell... + + SWITCH QUEUE:WAIT FOR HQD_ACTIVE = 0SAVE QUEUE STATE TO THE MQDCOPY NEW MQD STATESET HQD_ACTIVE = 1 + + Firmware + + + + -- 2.51.0 From e7aaa5fbf4fc1aa9c348075aab9bf054727f025f Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:46 -0600 Subject: [PATCH 15/16] Documentation/gpu: Create a GC entry in the amdgpu documentation MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit GC is a large block that plays a vital role for amdgpu; for this reason, this commit creates one specific page for GC and adds extra information about the CP component. Acked-by: Christian König Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/driver-core.rst | 30 ++------------- Documentation/gpu/amdgpu/gc/index.rst | 47 ++++++++++++++++++++++++ Documentation/gpu/amdgpu/index.rst | 1 + 3 files changed, 52 insertions(+), 26 deletions(-) create mode 100644 Documentation/gpu/amdgpu/gc/index.rst diff --git a/Documentation/gpu/amdgpu/driver-core.rst b/Documentation/gpu/amdgpu/driver-core.rst index d8500c9c961e..2c7f7299d7ad 100644 --- a/Documentation/gpu/amdgpu/driver-core.rst +++ b/Documentation/gpu/amdgpu/driver-core.rst @@ -67,38 +67,16 @@ GC (Graphics and Compute) This is the graphics and compute engine, i.e., the block that encompasses the 3D pipeline and and shader blocks. This is by far the largest block on the GPU. The 3D pipeline has tons of sub-blocks. In - addition to that, it also contains the CP microcontrollers (ME, PFP, - CE, MEC) and the RLC microcontroller. It's exposed to userspace for - user mode drivers (OpenGL, Vulkan, OpenCL, etc.) + addition to that, it also contains the CP microcontrollers (ME, PFP, CE, + MEC) and the RLC microcontroller. It's exposed to userspace for user mode + drivers (OpenGL, Vulkan, OpenCL, etc.). More details in :ref:`Graphics (GFX) + and Compute `. VCN (Video Core Next) This is the multi-media engine. It handles video and image encode and decode. It's exposed to userspace for user mode drivers (VA-API, OpenMAX, etc.) -Graphics and Compute Microcontrollers -------------------------------------- - -CP (Command Processor) - The name for the hardware block that encompasses the front end of the - GFX/Compute pipeline. Consists mainly of a bunch of microcontrollers - (PFP, ME, CE, MEC). The firmware that runs on these microcontrollers - provides the driver interface to interact with the GFX/Compute engine. - - MEC (MicroEngine Compute) - This is the microcontroller that controls the compute queues on the - GFX/compute engine. - - MES (MicroEngine Scheduler) - This is a new engine for managing queues. This is currently unused. - -RLC (RunList Controller) - This is another microcontroller in the GFX/Compute engine. It handles - power management related functionality within the GFX/Compute engine. - The name is a vestige of old hardware where it was originally added - and doesn't really have much relation to what the engine does now. - - GFX, Compute, and SDMA Overall Behavior ======================================= diff --git a/Documentation/gpu/amdgpu/gc/index.rst b/Documentation/gpu/amdgpu/gc/index.rst new file mode 100644 index 000000000000..65ea93a3fbbe --- /dev/null +++ b/Documentation/gpu/amdgpu/gc/index.rst @@ -0,0 +1,47 @@ +.. _amdgpu-gc: + +======================================== + drm/amdgpu - Graphics and Compute (GC) +======================================== + +The relationship between the CPU and GPU can be described as the +producer-consumer problem, where the CPU fills out a buffer with operations +(producer) to be executed by the GPU (consumer). The requested operations in +the buffer are called Command Packets, which can be summarized as a compressed +way of transmitting command information to the graphics controller. + +The component that acts as the front end between the CPU and the GPU is called +the Command Processor (CP). This component is responsible for providing greater +flexibility to the GC since CP makes it possible to program various aspects of +the GPU pipeline. CP also coordinates the communication between the CPU and GPU +via a mechanism named **Ring Buffers**, where the CPU appends information to +the buffer while the GPU removes operations. It is relevant to highlight that a +CPU can add a pointer to the Ring Buffer that points to another region of +memory outside the Ring Buffer, and CP can handle it; this mechanism is called +**Indirect Buffer (IB)**. CP receives and parses the Command Streams (CS), and +writes the operations to the correct hardware blocks. + +Graphics (GFX) and Compute Microcontrollers +------------------------------------------- + +GC is a large block, and as a result, it has multiple firmware associated with +it. Some of them are: + +CP (Command Processor) + The name for the hardware block that encompasses the front end of the + GFX/Compute pipeline. Consists mainly of a bunch of microcontrollers + (PFP, ME, CE, MEC). The firmware that runs on these microcontrollers + provides the driver interface to interact with the GFX/Compute engine. + + MEC (MicroEngine Compute) + This is the microcontroller that controls the compute queues on the + GFX/compute engine. + + MES (MicroEngine Scheduler) + This is the engine for managing queues. + +RLC (RunList Controller) + This is another microcontroller in the GFX/Compute engine. It handles + power management related functionality within the GFX/Compute engine. + The name is a vestige of old hardware where it was originally added + and doesn't really have much relation to what the engine does now. diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst index 1fdc3ccb61c4..bb2894b5edaf 100644 --- a/Documentation/gpu/amdgpu/index.rst +++ b/Documentation/gpu/amdgpu/index.rst @@ -10,6 +10,7 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. driver-core amd-hardware-list-info module-parameters + gc/index display/index flashing xgmi -- 2.51.0 From 74f0ff369f8a4f44082a71049a15d7da2e73ff7c Mon Sep 17 00:00:00 2001 From: Rodrigo Siqueira Date: Tue, 25 Mar 2025 11:18:47 -0600 Subject: [PATCH 16/16] Documentation/gpu: Add an intro about MES MES is an important firmware that lacks some essential documentation. This commit introduces an overview of it and how it works. Reviewed-by: Bagas Sanjaya Signed-off-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/driver-core.rst | 2 ++ Documentation/gpu/amdgpu/gc/index.rst | 7 ++++- Documentation/gpu/amdgpu/gc/mes.rst | 38 ++++++++++++++++++++++++ 3 files changed, 46 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/amdgpu/gc/mes.rst diff --git a/Documentation/gpu/amdgpu/driver-core.rst b/Documentation/gpu/amdgpu/driver-core.rst index 2c7f7299d7ad..7e3f5d1e9aaf 100644 --- a/Documentation/gpu/amdgpu/driver-core.rst +++ b/Documentation/gpu/amdgpu/driver-core.rst @@ -77,6 +77,8 @@ VCN (Video Core Next) decode. It's exposed to userspace for user mode drivers (VA-API, OpenMAX, etc.) +.. _pipes-and-queues-description: + GFX, Compute, and SDMA Overall Behavior ======================================= diff --git a/Documentation/gpu/amdgpu/gc/index.rst b/Documentation/gpu/amdgpu/gc/index.rst index 65ea93a3fbbe..ff6e9ef5cbee 100644 --- a/Documentation/gpu/amdgpu/gc/index.rst +++ b/Documentation/gpu/amdgpu/gc/index.rst @@ -38,10 +38,15 @@ CP (Command Processor) GFX/compute engine. MES (MicroEngine Scheduler) - This is the engine for managing queues. + This is the engine for managing queues. For more details check + :ref:`MicroEngine Scheduler (MES) `. RLC (RunList Controller) This is another microcontroller in the GFX/Compute engine. It handles power management related functionality within the GFX/Compute engine. The name is a vestige of old hardware where it was originally added and doesn't really have much relation to what the engine does now. + +.. toctree:: + + mes.rst diff --git a/Documentation/gpu/amdgpu/gc/mes.rst b/Documentation/gpu/amdgpu/gc/mes.rst new file mode 100644 index 000000000000..b99eb211b179 --- /dev/null +++ b/Documentation/gpu/amdgpu/gc/mes.rst @@ -0,0 +1,38 @@ +.. _amdgpu-mes: + +============================= + MicroEngine Scheduler (MES) +============================= + +.. note:: + Queue and ring buffer are used as a synonymous. + +.. note:: + This section assumes that you are familiar with the concept of Pipes, Queues, and GC. + If not, check :ref:`GFX, Compute, and SDMA Overall Behavior` + and :ref:`drm/amdgpu - Graphics and Compute (GC) `. + +Every GFX has a pipe component with one or more hardware queues. Pipes can +switch between queues depending on certain conditions, and one of the +components that can request a queue switch to a pipe is the MicroEngine +Scheduler (MES). Whenever the driver is initialized, it creates one MQD per +hardware queue, and then the MQDs are handed to the MES firmware for mapping +to: + +1. Kernel Queues (legacy): This queue is statically mapped to HQDs and never + preempted. Even though this is a legacy feature, it is the current default, and + most existing hardware supports it. When an application submits work to the + kernel driver, it submits all of the application command buffers to the kernel + queues. The CS IOCTL takes the command buffer from the applications and + schedules them on the kernel queue. + +2. User Queues: These queues are dynamically mapped to the HQDs. Regarding the + utilization of User Queues, the userspace application will create its user + queues and submit work directly to its user queues with no need to IOCTL for + each submission and no need to share a single kernel queue. + +In terms of User Queues, MES can dynamically map them to the HQD. If there are +more MQDs than HQDs, the MES firmware will preempt other user queues to make +sure each queues get a time slice; in other words, MES is a microcontroller +that handles the mapping and unmapping of MQDs into HQDs, as well as the +priorities and oversubscription of MQDs. -- 2.51.0