We found a regression in v5.10 on real-time server, using the
rt-kernel and the mgag200 driver. It's some really specialized
workload, with <10us latency expectation on isolated core.
After the v5.10, the real time tasks missed their <10us latency
when something prints on the screen (fbcon or printk)
The regression has been bisected to 2 commits:
commit
0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
commit
4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
The first one changed the system memory framebuffer from Write-Combine
to the default caching.
Before the second commit, the mgag200 driver used to unmap the
framebuffer after each frame, which implicitly does a cache flush.
Both regressions are fixed by this commit, which restore WC mapping
for the framebuffer in system memory, and add a cache flush.
This is only needed on x86_64, for low-latency workload,
so the new kconfig DRM_MGAG200_IOBURST_WORKAROUND depends on
PREEMPT_RT and X86.
For more context, the whole thread can be found here [1]
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/dri-devel/20231019135655.313759-1-jfalempe@redhat.com/
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208095125.377908-1-jfalempe@redhat.com
MGA G200 desktop chips and the server variants. It requires 0.3.0
of the modesetting userspace driver, and a version of mga driver
that will fail on KMS enabled devices.
+
+config DRM_MGAG200_IOBURST_WORKAROUND
+ bool "Disable buffer caching"
+ depends on DRM_MGAG200 && PREEMPT_RT && X86
+ help
+ Enable a workaround to avoid I/O bursts within the mgag200 driver at
+ the expense of overall display performance.
+ It restores the <v5.10 behavior, by mapping the framebuffer in system
+ RAM as Write-Combining, and flushing the cache after each write.
+ This is only useful on x86_64 if you want to run processes with
+ deterministic latency.
+ If unsure, say N.
return offset - 65536;
}
+#if defined(CONFIG_DRM_MGAG200_IOBURST_WORKAROUND)
+static struct drm_gem_object *mgag200_create_object(struct drm_device *dev, size_t size)
+{
+ struct drm_gem_shmem_object *shmem;
+
+ shmem = kzalloc(sizeof(*shmem), GFP_KERNEL);
+ if (!shmem)
+ return NULL;
+
+ shmem->map_wc = true;
+ return &shmem->base;
+}
+#endif
+
/*
* DRM driver
*/
.major = DRIVER_MAJOR,
.minor = DRIVER_MINOR,
.patchlevel = DRIVER_PATCHLEVEL,
+#if defined(CONFIG_DRM_MGAG200_IOBURST_WORKAROUND)
+ .gem_create_object = mgag200_create_object,
+#endif
DRM_GEM_SHMEM_DRIVER_OPS,
};
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
+#include <drm/drm_cache.h>
#include <drm/drm_damage_helper.h>
#include <drm/drm_edid.h>
#include <drm/drm_format_helper.h>
iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, clip));
drm_fb_memcpy(&dst, fb->pitches, vmap, fb, clip);
+
+ /* Flushing the cache greatly improves latency on x86_64 */
+#if defined(CONFIG_DRM_MGAG200_IOBURST_WORKAROUND)
+ if (!vmap->is_iomem)
+ drm_clflush_virt_range(vmap->vaddr + clip->y1 * fb->pitches[0],
+ drm_rect_height(clip) * fb->pitches[0]);
+#endif
}
/*