This initial version supports the NPU as shipped in the RK3588 SoC and
described in the first part of its TRM, in Chapter 36.
This NPU contains 3 independent cores that the driver can submit jobs
to.
This commit adds just hardware initialization and power management.
v2:
- Split cores and IOMMUs as independent devices (Sebastian Reichel)
- Add some documentation (Jeffrey Hugo)
- Be more explicit in the Kconfig documentation (Jeffrey Hugo)
- Remove resets, as these haven't been found useful so far (Zenghui Yu)
- Repack structs (Jeffrey Hugo)
- Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
- Use devm_drm_dev_alloc (Jeffrey Hugo)
- Use probe log helper (Jeffrey Hugo)
- Introduce UABI header in a later patch (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Move registers header to its own commit (Thomas Zimmermann)
- Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- PM improvements (Nicolas Frattaroli)
v4:
- Use bulk clk API (Krzysztof Kozlowski)
v6:
- Remove mention to NVDLA, as the hardware is only incidentally related
(Kever Yang)
- Use calloc instead of GFP_ZERO (Jeff Hugo)
- Explicitly include linux/container_of.h (Jeff Hugo)
- pclk and npu clocks are now needed by all cores (Rob Herring)
v7:
- Assign its own IOMMU domain to each client, for isolation (Daniel
Stone and Robin Murphy)
v8:
- Kconfig: fix depends to be more explicit about Rockchip, and remove
superfluous selects (Robin Murphy)
- Use reset lines to reset the cores (Robin Murphy)
- Reference count the module
- Set dma_set_max_seg_size
- Correctly acquire a reference to the IOMMU (Robin Murphy)
- Remove notion of top core (Robin Murphy)
Reviewed-by: Robert Foss <rfoss@kernel.org> Tested-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250721-6-10-rocket-v9-2-77ebd484941e@tomeuvizoso.net
A XML file was generated with the data from the TRM, and then this
header was generated from it.
The canonical location for the XML file is the Mesa3D repository.
v3:
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
v8:
- Remove full MIT license blob, to match other files with the same
licensing arrangement in the kernel
Reviewed-by: Robert Foss <rfoss@kernel.org> Tested-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250721-6-10-rocket-v9-1-77ebd484941e@tomeuvizoso.net
We choose to save the info on panthor_group_create rather than on
panthor_open because, when the two differ, we are more interested in the
task that created the group.
Langyan Ye [Wed, 23 Jul 2025 07:25:13 +0000 (15:25 +0800)]
drm/panel-edp: Add 50ms disable delay for four panels
Add 50ms disable delay for NV116WHM-N49, NV122WUM-N41, and MNC207QS1-1
to satisfy T9+T10 timing. Add 50ms disable delay for MNE007JA1-2
as well, since MNE007JA1-2 copies the timing of MNC207QS1-1.
Specifically, it should be noted that the MNE007JA1-2 panel was added
by someone who did not have the panel documentation, so they simply
copied the timing from the MNC207QS1-1 panel. Adding an extra 50 ms
of delay should be safe.
Fixes: 0547692ac146 ("drm/panel-edp: Add several generic edp panels") Fixes: 50625eab3972 ("drm/edp-panel: Add panel used by T14s Gen6 Snapdragon") Signed-off-by: Langyan Ye <yelangyan@huaqin.corp-partner.google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250723072513.2880369-1-yelangyan@huaqin.corp-partner.google.com
drm/display: bridge-connector: correct CEC bridge pointers in drm_bridge_connector_init
The bridge used in drm_bridge_connector_init() for CEC init does not
correctly point to the required HDMI CEC bridge, which can lead to
errors during CEC initialization.
drm/bridge: select_bus_fmt_recursive(): put the bridge obtained by drm_bridge_get_prev_bridge()
The bridge returned by drm_bridge_get_prev_bridge() is refcounted. Put it
when done.
select_bus_fmt_recursive() has several return points, and ensuring
drm_bridge_put() is always called in the right place would be error-prone
(especially with future changes to the select_bus_fmt_recursive() code) and
make code uglier. Instead use a scope-based free, which is future-proof and
a lot cleaner.
drm/bridge: get the bridge returned by drm_bridge_get_prev_bridge()
drm_bridge_get_prev_bridge() returns a bridge pointer that the
caller could hold for a long time. Increment the refcount of the returned
bridge and document it must be put by the caller.
accel/amdxdna: Support user space allocated buffer
Enhance DRM_IOCTL_AMDXDNA_CREATE_BO to accept user space allocated
buffer pointer. The buffer pages will be pinned in memory. Unless
the CAP_IPC_LOCK is enabled for the application process, the total
pinned memory can not beyond rlimit_memlock.
drm/bridge: get the bridge returned by drm_bridge_chain_get_first_bridge()
drm_bridge_chain_get_first_bridge() returns a bridge pointer that the
caller could hold for a long time. Increment the refcount of the returned
bridge and document it must be put by the caller.
drm/bridge: add a cleanup action for scope-based drm_bridge_put() invocation
Many functions get a drm_bridge pointer, only use it in the function body
(or a smaller scope such as a loop body), and don't store it. In these
cases they always need to drm_bridge_put() it before returning (or exiting
the scope).
Some of those functions have complex code paths with multiple return points
or loop break/continue. This makes adding drm_bridge_put() in the right
places tricky, ugly and error prone in case of future code changes.
Others use the bridge pointer in the return statement and would need to
split the return line to fit the drm_bridge_put, which is a bit annoying:
To make it easier for all of them to put the bridge reference correctly
without complicating code, define a scope-based cleanup action to be used
with __free().
Beata Michalska [Thu, 26 Jun 2025 16:23:13 +0000 (18:23 +0200)]
rust: drm: Drop the use of Opaque for ioctl arguments
With the Opaque<T>, the expectations are that Rust should not
make any assumptions on the layout or invariants of the wrapped
C types. That runs rather counter to ioctl arguments, which must
adhere to certain data-layout constraints. By using Opaque<T>,
ioctl handlers are forced to use unsafe code where none is actually
needed. This adds needless complexity and maintenance overhead,
brining no safety benefits.
Drop the use of Opaque for ioctl arguments as that is not the best
fit here.
Jann Horn [Wed, 13 Nov 2024 21:03:39 +0000 (22:03 +0100)]
drm/panthor: Fix memory leak in panthor_ioctl_group_create()
When bailing out due to group_priority_permit() failure, the queue_args
need to be freed. Fix it by rearranging the function to use the
goto-on-error pattern, such that the success case flows straight without
indentation while error cases jump forward to cleanup.
Cc: stable@vger.kernel.org Fixes: 5f7762042f8a ("drm/panthor: Restrict high priorities on group_create") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241113-panthor-fix-gcq-bailout-v1-1-654307254d68@google.com
drm/sched: Avoid double re-lock on the job free path
Currently the job free work item will lock sched->job_list_lock first time
to see if there are any jobs, free a single job, and then lock again to
decide whether to re-queue itself if there are more finished jobs.
Since drm_sched_get_finished_job() already looks at the second job in the
queue we can simply add the signaled check and have it return the presence
of more jobs to be freed to the caller. That way the work item does not
have to lock the list again and repeat the signaled check.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Maíra Canal <mcanal@igalia.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250716085117.56864-1-tvrtko.ursulin@igalia.com
David Francis [Thu, 17 Jul 2025 14:35:56 +0000 (10:35 -0400)]
drm: Move drm_gem ioctl kerneldoc to uapi file
The drm_gem ioctls were documented in internal file drm_gem.c
instead of uapi header drm.h. Move them there and change to
appropriate kerneldoc formatting.
David Francis [Thu, 17 Jul 2025 14:35:55 +0000 (10:35 -0400)]
drm: Add DRM prime interface to reassign GEM handle
CRIU restore of drm buffer objects requires the ability to create
or import a buffer object with a specific gem handle.
Add new drm ioctl DRM_IOCTL_GEM_CHANGE_HANDLE, which takes
the gem handle of an object and moves that object to a
specified new gem handle.
This ioctl needs to call drm_prime_remove_buf_handle,
but that function acquires the prime lock, which the ioctl
needs to hold for other purposes.
Make drm_prime_remove_buf_handle not acquire the prime lock,
and change its other caller to reflect this.
The rest of the kernel patches required to enable CRIU can be
found at
https://lore.kernel.org/dri-devel/20250617194536.538681-1-David.Francis@amd.com/
v2 - Move documentation to UAPI headers
v3 - Always return 0 on success
Signed-off-by: David Francis <David.Francis@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250717143556.857893-2-David.Francis@amd.com
Simon Ser [Thu, 1 May 2025 11:29:51 +0000 (11:29 +0000)]
drm: document DRM_MODE_PAGE_FLIP_EVENT interactions with atomic
It's not obvious off-hand which CRTCs will get a page-flip event
when using this flag in an atomic commit, because it's all
implicitly implied based on the contents of the atomic commit.
Document requirements for using this flag and how to request an
event for a CRTC.
Note, because prepare_signaling() runs right after
drm_atomic_set_property() calls, page-flip events are not delivered
for CRTCs pulled in later by DRM core (e.g. on modeset by
drm_atomic_helper_check_modeset()) or the driver (e.g. other CRTCs
sharing a DP-MST connector).
v2: fix cut off sentence in commit message (Pekka)
Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Simona Vetter <simona@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Pekka Paalanen <pekka.paalanen@collabora.com> Cc: David Turner <david.turner@raspberrypi.com> Cc: Daniel Stone <daniel@fooishbar.org> Link: https://lore.kernel.org/r/20250501112945.6448-1-contact@emersion.fr
drm/v3d: Add parameter to retrieve the number of GPU resets per-fd
The GL extension KHR_robustness uses the number of global and per-context
GPU resets to learn about graphics resets that affect a GL context. This
commit introduces a new V3D parameter to retrieve the number of GPU resets
triggered by jobs submitted through a file descriptor.
To retrieve this information, user-space must use DRM_V3D_PARAM_CONTEXT_RESET_COUNTER.
drm/v3d: Add parameter to retrieve the global number of GPU resets
The GL extension KHR_robustness uses the number of global and per-context
GPU resets to learn about graphics resets that affect a GL context. This
commit introduces a new V3D parameter to retrieve the global number of
GPU resets that have happened since the driver was probed.
To retrieve this information, user-space must use DRM_V3D_PARAM_GLOBAL_RESET_COUNTER.
drm/sched: Fix a race in DRM_GPU_SCHED_STAT_NO_HANG test
The "skip reset" test waits for the timeout handler to run for the
duration of 2 * MOCK_TIMEOUT, and because the mock scheduler opted to
remove the "skip reset" flag once it fires, this gives opportunity for the
timeout handler to run twice. Second time the job will be removed from the
mock scheduler job list and the drm_mock_sched_advance() call in the test
will fail.
Fix it by making the "don't reset" flag persist for the lifetime of the
job and add a new flag to verify that the code path had executed as
expected.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 1472e7549f84 ("drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG") Cc: Maíra Canal <mcanal@igalia.com> Cc: Philipp Stanner <phasta@kernel.org> Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250716084817.56797-1-tvrtko.ursulin@igalia.com
Ville Syrjälä [Tue, 1 Jul 2025 09:07:08 +0000 (12:07 +0300)]
drm: Allow the caller to pass in the format info to drm_helper_mode_fill_fb_struct()
Soon all drivers should have the format info already available in the
places where they call drm_helper_mode_fill_fb_struct(). Allow it to
be passed along into drm_helper_mode_fill_fb_struct() instead of doing
yet another redundant lookup.
Start by always passing in NULL and still doing the extra lookup.
The actual changes to avoid the lookup will follow.
Ville Syrjälä [Tue, 1 Jul 2025 09:07:06 +0000 (12:07 +0300)]
drm: Look up the format info earlier
Look up the format info already in drm_internal_framebuffer_create()
so that we can later pass it along to .fb_create(). Currently various
drivers are doing additional lookups in their .fb_create()
implementations, and these lookups are rather expensive now (given
how many different pixel formats we have).
Ville Syrjälä [Tue, 1 Jul 2025 09:07:05 +0000 (12:07 +0300)]
drm: Pass pixel_format+modifier directly to drm_get_format_info()
Decouple drm_get_format_info() from struct drm_mode_fb_cmd2 and just
pass the pixel format+modifier combo in by hand.
We may want to use drm_get_format_info() outside of the normal
addfb paths where we won't have a struct drm_mode_fb_cmd2, and
creating a temporary one just for this seems silly.
Ville Syrjälä [Tue, 1 Jul 2025 09:07:04 +0000 (12:07 +0300)]
drm: Pass pixel_format+modifier to .get_format_info()
Decouple .get_format_info() from struct drm_mode_fb_cmd2 and just
pass the pixel format+modifier combo in by hand.
We may want to use .get_format_info() outside of the normal
addfb paths where we won't have a struct drm_mode_fb_cmd2, and
creating a temporary one just for this seems silly.
v2: Fix intel_fb_get_format_info() docs (Laurent)
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <siqueira@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-2-ville.syrjala@linux.intel.com
Maxime Ripard [Wed, 25 Jun 2025 15:14:37 +0000 (17:14 +0200)]
drm/tests: edid: Add edid-decode --check output
Some of our EDIDs are (rightfully) invalid, but most of them should be
valid.
Let's add the edid-decode --check of these EDIDs when they were
generated, so we know what to expect going forward, and a comment to
explicitly mention when we expect them to be broken.
drm/panel/boe-himax8279d: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/boe-tv101wum-nl6: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/himax-hx83102: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/ilitek-ili9882t: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lpm102a188a: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/jdi-lt070me05000: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/khadas-ts050: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/kd097d04: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lg-sw43408: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/novatek-nt36672a: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/osd101t2587-53ts: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/vvx10f034n00: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/raspberrypi: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lq101r1sx01: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/sitronix/st7571-i2c: Add support for the ST7567 Controller
The Sitronix ST7567 is a monochrome Dot Matrix LCD Controller that has SPI,
I2C and parallel interfaces. The st7571-i2c driver only has support for I2C
so displays using other transport interfaces are currently not supported.
The DRM_FORMAT_R1 pixel format and data commands are the same than what
is used by the ST7571 controller, so only is needed a different callback
that implements the expected initialization sequence for the ST7567 chip.
Christian König [Thu, 10 Jul 2025 14:25:21 +0000 (16:25 +0200)]
drm/ttm: remove ttm_bo_validate_swapout test
The test is quite fragile since it tries to allocate halve available system
memory + 1 page.
If the system has either not enough memory to make the allocation work
with other things running in parallel or to much memory so the allocation
fails as to large/invalid the test will fail.
Completely remove the test. We already validate swapout on the device
level and that test seems to be stable.
The Samsung ATNA30DW01 panel is a 13" AMOLED eDP panel. It is similar to
the ATNA33XC20 except that it is smaller and has a higher resolution.
Tested-by: Jérôme de Bretagne <jerome.debretagne@gmail.com> Signed-off-by: Dale Whinham <daleyo@gmail.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250714173554.14223-3-daleyo@gmail.com
drm/panfrost: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Panfrost can skip the reset if TDR has fired before the free-job worker.
Currently, since Panfrost doesn't take any action on these scenarios, the
job is being leaked, considering that `free_job()` won't be called.
To avoid such leaks, inform the scheduler that the job did not actually
timeout and no reset was performed through the new status code
DRM_GPU_SCHED_STAT_NO_HANG.
drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Xe can skip the reset if TDR has fired before the free job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
scheduler's internals, inform the scheduler that the job did not actually
timeout and no reset was performed through the new status code
DRM_GPU_SCHED_STAT_NO_HANG.
Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.
drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Etnaviv can skip a hardware reset in two situations:
1. TDR has fired before the free-job worker and the timeout is spurious.
2. The GPU is still making progress on the front-end and we can give
the job a chance to complete.
Instead of manipulating scheduler's internals, inform the scheduler that
the job did not actually timeout and no reset was performed through
the new status code DRM_GPU_SCHED_STAT_NO_HANG.
drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
When a CL/CSD job times out, we check if the GPU has made any progress
since the last timeout. If so, instead of resetting the hardware, we skip
the reset and allow the timer to be rearmed. This gives long-running jobs
a chance to complete.
Instead of manipulating scheduler's internals, inform the scheduler that
the job did not actually timeout and no reset was performed through
the new status code DRM_GPU_SCHED_STAT_NO_HANG.
drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
Add a test to submit a single job against a scheduler with the timeout
configured and verify that if the job is still running, the timeout
handler will skip the reset and allow the job to complete.
As more KUnit tests are introduced to evaluate the basic capabilities of
the `timedout_job()` hook, the test suite will continue to increase in
duration. To reduce the overall running time of the test suite, decrease
the scheduler's timeout for the timeout tests.
drm/sched: Allow drivers to skip the reset and keep on running
When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job just took unusually long (longer than the timeout) but is
still running, and there is, thus, no reason to reset the hardware. This
can occur in two scenarios:
1. The job is taking longer than the timeout, but the driver determined
through a GPU-specific mechanism that the hardware is still making
progress. Hence, the driver would like the scheduler to skip the
timeout and treat the job as still pending from then onward. This
happens in v3d, Etnaviv, and Xe.
2. Timeout has fired before the free-job worker. Consequently, the
scheduler calls `sched->ops->timedout_job()` for a job that isn't
timed out.
These two scenarios are problematic because the job was removed from the
`sched->pending_list` before calling `sched->ops->timedout_job()`, which
means that when the job finishes, it won't be freed by the scheduler
though `sched->ops->free_job()` - leading to a memory leak.
To solve these problems, create a new `drm_gpu_sched_stat`, called
DRM_GPU_SCHED_STAT_NO_HANG, which allows a driver to skip the reset. The
new status will indicate that the job must be reinserted into
`sched->pending_list`, and the hardware / driver will still complete that
job.
drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.
However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".
Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.
dt-bindings: display: rockchip,dw-mipi-dsi: Drop address/size cells
The "rockchip,dw-mipi-dsi" binding has allOf "snps,dw-mipi-dsi.yaml"
which has allOf "dsi-controller.yaml", which already has #address-cells
and #size-cells defined as '1' and '0' respectively.
So drop this re-definition.
Val Packett [Sun, 6 Jul 2025 20:50:27 +0000 (17:50 -0300)]
drm/panel-edp: Add BOE NE14QDM panel for Dell Latitude 7455
Cannot confirm which variant exactly it is, as the EDID alphanumeric data
contains '0RGNR' <0x80> 'NE14QDM' and ends there; but it's 60 Hz and with
touch.
I do not have access to datasheets for these panels, so the timing is
a guess that was tested to work fine on this laptop.
Commit ec62d37d2c0d("drm/panthor: Fix the fast-reset logic") did away
with the only reference to panthor_vm_flush_all(), so let's get rid
of the orphaned definition.
Andy Yan [Thu, 3 Jul 2025 12:49:53 +0000 (20:49 +0800)]
drm/bridge: Pass down connector to drm bridge detect hook
In some application scenarios, we hope to get the corresponding
connector when the bridge's detect hook is invoked.
In most cases, we can get the connector by drm_atomic_get_connector_for_encoder
if the encoder attached to the bridge is enabled, however there will
still be some scenarios where the detect hook of the bridge is called
but the corresponding encoder has not been enabled yet. For instance,
this occurs when the device is hot plug in for the first time.
Since the call to bridge's detect is initiated by the connector, passing
down the corresponding connector directly will make things simpler.
Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250703125027.311109-3-andyshrk@163.com
[DB: added the chunk to the cdn-dp driver] Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Andy Yan [Thu, 3 Jul 2025 12:49:52 +0000 (20:49 +0800)]
drm/bridge: Make dp/hdmi_audio_* callback keep the same paramter order with get_modes
Make the dp/hdmi_audio_* callback maintain the same parameter order as
get_modes and edid_read: first the bridge, then the connector.
Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250703125027.311109-2-andyshrk@163.com
[DB: added the chunk to the cdn-dp driver] Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Mario Limonciello [Sat, 12 Jul 2025 23:37:12 +0000 (18:37 -0500)]
PM: hibernate: Add stub for pm_hibernate_is_recovering()
Randy reports that amdgpu fails to compile with the following error:
ERROR: modpost: "pm_hibernate_is_recovering" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
This happens because pm_hibernate_is_recovering() is only compiled when
CONFIG_PM_SLEEP is set. Add a stub for it so that drivers don't need
to depend upon CONFIG_PM.
Cc: Samuel Zhang <guoqing.zhang@amd.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/dri-devel/CAJZ5v0h1CX+aTu7dFy6vB-9LM6t5J4rt7Su3qVnq1xx-BFAm=Q@mail.gmail.com/T/#m2b9fe212b35fde11d58fcbc4e0727bc02ebba7b0 Fixes: c2aaddbd2dede ("PM: hibernate: add new api pm_hibernate_is_recovering()") Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250712233715.821424-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Alice Ryhl [Fri, 11 Jul 2025 08:04:38 +0000 (08:04 +0000)]
drm: rust: rename as_ref() to from_raw() for drm constructors
The prefix as_* should not be used for a constructor. Constructors
usually use the prefix from_* instead.
Some prior art in the stdlib: Box::from_raw, CString::from_raw,
Rc::from_raw, Arc::from_raw, Waker::from_raw, File::from_raw_fd.
There is also prior art in the kernel crate: cpufreq::Policy::from_raw,
fs::File::from_raw_file, Kuid::from_raw, ARef::from_raw,
SeqFile::from_raw, VmaNew::from_raw, Io::from_raw.
André Almeida [Fri, 4 Jul 2025 03:06:29 +0000 (00:06 -0300)]
drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset
When a ring reset happens, amdgpu calls drm_dev_wedged_event() using
struct amdgpu_task_info *ti as one of the arguments. After using *ti, a
call to amdgpu_vm_put_task_info(ti) is required to correctly track its
lifetime.
However, it's called from a place that the ring reset path never reaches
due to a goto after drm_dev_wedged_event() is called. Move
amdgpu_vm_put_task_info() bellow the exit label to make sure that it's
called regardless of the code path.
amdgpu_vm_put_task_info() can only accept a valid address or NULL as
argument, so initialise *ti to make sure we can call this function if
*ti isn't used.
Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Reported-by: Dave Airlie <airlied@gmail.com> Closes: https://lore.kernel.org/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yYxyYQ2Q@mail.gmail.com/ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704030629.1064397-1-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>
include/drm/drm_device.h:40: warning: Function parameter or struct member 'pid' not described in 'drm_wedge_task_info'
include/drm/drm_device.h:40: warning: Function parameter or struct member 'comm' not described in 'drm_wedge_task_info'
Fixes: 183bccafa176 ("drm: Create a task info option for wedge events") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20250618151307.4a1a5e17@canb.auug.org.au/ Reviewed-by: Raag Jadav <raag.jadav@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250704190724.1159416-2-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>
Samuel Zhang [Thu, 10 Jul 2025 06:23:13 +0000 (14:23 +0800)]
drm/amdgpu: do not resume device in thaw for normal hibernation
For normal hibernation, GPU do not need to be resumed in thaw since it is
not involved in writing the hibernation image. Skip resume in this case
can reduce the hibernation time.
On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this can save 50 minutes.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Samuel Zhang [Thu, 10 Jul 2025 06:23:12 +0000 (14:23 +0800)]
PM: hibernate: add new api pm_hibernate_is_recovering()
dev_pm_ops.thaw() is called in following cases:
* normal case: after hibernation image has been created.
* error case 1: creation of a hibernation image has failed.
* error case 2: restoration from a hibernation image has failed.
For normal case, it is called mainly for resume storage devices for
saving the hibernation image. Other devices that are not involved
in the image saving do not need to resume the device. But since there's
no api to know which case thaw() is called, device drivers can't
conditionally resume device in thaw().
The new pm_hibernate_is_recovering() is such a api to query if thaw() is
called in normal case.
Samuel Zhang [Thu, 10 Jul 2025 06:23:11 +0000 (14:23 +0800)]
PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()
When hibernate with data center dGPUs, huge number of VRAM data will be
moved to shmem during dev_pm_ops.prepare(). These shmem pages take a lot
of system memory so that there's no enough free memory for creating the
hibernation image. This will cause hibernation fail and abort.
After dev_pm_ops.prepare(), call shrink_all_memory() to force move shmem
pages to swap disk and reclaim the pages, so that there's enough system
memory for hibernation image and less pages needed to copy to the image.
This patch can only flush and free about half shmem pages. It will be
better to flush and free more pages, even all of shmem pages, so that
there're less pages to be copied to the hibernation image and the overall
hibernation time can be reduced.
Samuel Zhang [Thu, 10 Jul 2025 06:23:10 +0000 (14:23 +0800)]
drm/amdgpu: move GTT to shmem after eviction for hibernation
When hibernate with data center dGPUs, huge number of VRAM BOs evicted
to GTT and takes too much system memory. This will cause hibernation
fail due to insufficient memory for creating the hibernation image.
Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel
hibernation code to make room for hibernation image.
Samuel Zhang [Thu, 10 Jul 2025 06:23:09 +0000 (14:23 +0800)]
drm/ttm: add new api ttm_device_prepare_hibernation()
This new api is used for hibernation to move GTT BOs to shmem after
VRAM eviction. shmem will be flushed to swap disk later to reduce
the system memory usage for hibernation.