Dave Airlie [Fri, 4 Nov 2022 07:20:12 +0000 (17:20 +1000)]
Merge tag 'drm-intel-gt-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:
- Fix for #7306: [Arc A380] white flickering when using arc as a
secondary gpu (Matt A)
- Add Wa_18017747507 for DG2 (Wayne)
- Avoid spurious WARN on DG1 due to incorrect cache_dirty flag
(Niranjana, Matt A)
- Corrections to CS timestamp support for Gen5 and earlier (Ville)
- Fix a build error used with clang compiler on hwmon (GG)
- Improvements to LMEM handling with RPM (Anshuman, Matt A)
- Cleanups in dmabuf code (Mike)
Dave Airlie [Fri, 4 Nov 2022 02:32:11 +0000 (12:32 +1000)]
Merge tag 'drm-misc-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.2:
UAPI Changes:
Cross-subsystem Changes:
- dma-buf: locking improvements
- firmware: New API in the RaspberryPi firmware driver used by vc4
Core Changes:
- client: Null pointer dereference fix in drm_client_buffer_delete()
- mm/buddy: Add back random seed log
- ttm: Convert ttm_resource to use size_t for its size, fix for an
undefined behaviour
Driver Changes:
- bridge:
- adv7511: use dev_err_probe
- it6505: Fix return value check of pm_runtime_get_sync
- panel:
- sitronix: Fixes and clean-ups
- lcdif: Increase DMA burst size
- rockchip: runtime_pm improvements
- vc4: Fix for a regression preventing the use of 4k @ 60Hz, and
further HDMI rate constraints check.
- vmwgfx: Cursor improvements
Gwan-gyeong Mun [Sat, 29 Oct 2022 04:42:30 +0000 (07:42 +0300)]
drm/i915/hwmon: Fix a build error used with clang compiler
Use REG_FIELD_PREP() and a constant value for hwm_field_scale_and_write()
If the first argument of FIELD_PREP() is neither a compile-time constant
nor of type unsigned long long, the following check in the
__BF_FIELD_CHECK() macro, used internally by FIELD_PREP(), always evaluates to false:
BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \
__bf_cast_unsigned(_reg, ~0ull), \
_pfx "type of reg too small for mask"); \
Clang then fails the build because of one of its compilation
options: [-Werror,-Wtautological-constant-out-of-range-compare]
Reported build error while using clang compiler:
drivers/gpu/drm/i915/i915_hwmon.c:115:16: error: result of comparison of
constant 18446744073709551615 with expression of type 'typeof (_Generic((field_msk),
char: (unsigned char)0, unsigned char: (unsigned char)0, signed char: (unsigned char)0,
unsigned short: (unsigned short)0, short: (unsigned short)0, unsigned int:
(unsigned int)0, int: (unsigned int)0, unsigned long: (unsigned long)0, long:
(unsigned long)0, unsigned long long: (unsigned long long)0, long long:
(unsigned long long)0, default: (field_msk)))' (aka 'unsigned int') is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
bits_to_set = FIELD_PREP(field_msk, nval);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:114:3: note: expanded from macro 'FIELD_PREP'
__BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:71:53: note: expanded from macro '__BF_FIELD_CHECK'
BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
./include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG'
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
./include/linux/compiler_types.h:357:22: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/compiler_types.h:345:23: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/compiler_types.h:337:9: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
v2: Use REG_FIELD_PREP() macro instead of FIELD_PREP() (Jani)
Fixes: 99f55efb7911 ("drm/i915/hwmon: Power PL1 limit and TDP setting")
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
[Joonas: Wrapped commit message error line length to be more reasonable]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221029044230.32128-1-gwan-gyeong.mun@intel.com
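What the field-preparation step computes can be sketched in plain C. This is an illustration only: field_prep_u32() is a hypothetical name, it relies on the GCC/clang builtin __builtin_ctz(), and it leaves out the compile-time checks that the kernel macros perform.

```c
#include <stdint.h>

/*
 * Illustrative stand-in for the kernel's field-preparation macros, not the
 * real FIELD_PREP()/REG_FIELD_PREP(). Shifts val up to the lowest set bit
 * of mask and discards anything that does not fit. mask must be non-zero.
 */
static inline uint32_t field_prep_u32(uint32_t mask, uint32_t val)
{
	/* Position of the lowest set bit in the mask (GCC/clang builtin). */
	unsigned int shift = (unsigned int)__builtin_ctz(mask);

	return (val << shift) & mask;
}
```

Because the mask is an ordinary 32-bit runtime value here, there is no comparison against a 64-bit constant for the compiler to flag.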
Dmitry Osipenko [Sun, 30 Oct 2022 15:44:12 +0000 (18:44 +0300)]
drm/client: Prevent NULL dereference in drm_client_buffer_delete()
The drm_gem_vunmap() will crash with a NULL dereference if the passed
object pointer is NULL. It wasn't a problem before we added the locking
support to drm_gem_vunmap function because the mapping argument was always
NULL together with the object. Make drm_client_buffer_delete() check
whether the GEM object is NULL before trying to unmap it; the object is
NULL when framebuffer creation fails.
Dmitry Osipenko [Sun, 30 Oct 2022 15:44:11 +0000 (18:44 +0300)]
dma-buf: Make locking consistent in dma_buf_detach()
The dma_buf_detach() locks attach->dmabuf->resv and then unlocks
dmabuf->resv, which could be two different locks from a static
code checker perspective. In particular this triggers Smatch to
report the "double unlock" error. Make the locking pointers consistent.
Ville Syrjälä [Mon, 31 Oct 2022 13:57:02 +0000 (15:57 +0200)]
drm/i915/selftests: Test RING_TIMESTAMP on gen4/5
Now that we actually know the cs timestamp frequency on gen4/5
let's run the corresponding test.
On g4x/ilk we must read the udw of the 64bit timestamp
register. Details in {g4x,gen5}_read_clock_frequency().
The one extra caveat is that on i965 (or at least CL, don't
recall if I ever tested on BW) we must read the register
twice to get an up to date value. For some unknown reason
the first read tends to return a stale value.
Ville Syrjälä [Mon, 31 Oct 2022 13:56:58 +0000 (15:56 +0200)]
drm/i915: Fix cs timestamp frequency for ctg/elk/ilk
On ilk the UDW of TIMESTAMP increments every 1000 ns,
LDW is mbz. In order to represent that we'd need 52 bits,
but we only have 32 bits. Even worse most things want to
only deal with 32 bits of timestamp. So let's just set
up the timestamp frequency as if we only had the UDW.
On ctg/elk 63:20 of TIMESTAMP increments every 1/4 ns, 19:0
are mbz. To make life simpler let's ignore the LDW and set up
timestamp frequency based on the UDW only (increments every
1024 ns).
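The arithmetic behind both cases can be sketched as follows. The helper names are hypothetical and picoseconds are used only to keep the 1/4 ns tick an integer; the inputs (bit 20 ticking every 1/4 ns on ctg/elk, bit 32 ticking every 1000 ns on ilk) come from the description above.

```c
#include <stdint.h>

/*
 * Period, in picoseconds, of one increment of the upper dword (bits 63:32)
 * of a 64-bit counter whose bit `lsb` increments every `lsb_period_ps`.
 * Hypothetical helper for illustration; lsb must be in the range 0..32.
 */
static uint64_t udw_period_ps(unsigned int lsb, uint64_t lsb_period_ps)
{
	return lsb_period_ps << (32 - lsb);
}

/* Frequency in Hz for a given period in picoseconds (truncating division). */
static uint64_t freq_hz_from_period_ps(uint64_t period_ps)
{
	return UINT64_C(1000000000000) / period_ps;
}
```

For ctg/elk, udw_period_ps(20, 250) gives 1024000 ps, i.e. the 1024 ns quoted above; for ilk the UDW period is simply the 1000 ns tick.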
Marek Vasut [Fri, 14 Oct 2022 23:10:42 +0000 (01:10 +0200)]
drm/panel/panel-sitronix-st7701: Clean up CMDnBKx selection
There are two command register files, CMD1 and CMD2, where only CMD2
contains additional register sub-files BK0..3. Pull the register file
selection call into a separate function instead of duplicating it all over
the driver. The CMD2BK2 file is undocumented in the datasheet, and is used
for BIST. No functional change.
The RTNI field is multiplied by 16 and incremented by 512 before being
used as the minimum number of pixel clocks per horizontal line, hence
it is necessary to subtract 512 from htotal and then divide
the result by 16 before writing the value into the RTNI field. Fix the
calculation.
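The relationship described above, sketched in C. The function names are hypothetical and the sketch assumes htotal is at least 512; it does not model any clamping or field-width limits the real driver may apply.

```c
/* How the hardware is described as interpreting the RTNI field. */
static unsigned int rtni_to_min_clocks(unsigned int rtni)
{
	return rtni * 16 + 512;
}

/* The inverse: the value to write into RTNI for a given htotal. */
static unsigned int htotal_to_rtni(unsigned int htotal)
{
	/* Assumes htotal >= 512, otherwise the subtraction would wrap. */
	return (htotal - 512) / 16;
}
```

For example, an htotal of 1024 yields an RTNI value of 32, and 32 * 16 + 512 gives back 1024.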
Marco Felsch [Tue, 1 Nov 2022 16:46:15 +0000 (17:46 +0100)]
drm: lcdif: change burst size to 256B
If an AXI bus master with a higher priority performs a lot of memory
accesses, FIFO underruns can be observed. Increase the burst size to 256B to
avoid such underruns and to improve the memory access efficiency.
Dave Airlie [Tue, 1 Nov 2022 07:48:12 +0000 (17:48 +1000)]
Merge tag 'drm-intel-next-2022-10-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Hotplug code clean-up and organization (Jani, Gustavo)
- More VBT specific code clean-up, doc, organization,
and improvements (Ville)
- More MTL enabling work (Matt, RK, Anusha, Jose)
- FBC related clean-ups and improvements (Ville)
- Removing unused sw_fence_await_reservation (Niranjana)
- Big chunk of display house clean-up (Ville)
- Many Watermark fixes and clean-ups (Ville)
- Fix device info for devices without display (Jani)
- Fix TC port PLLs after readout (Ville)
- DPLL ID clean-ups (Ville)
- Prep work for finishing (de)gamma readout (Ville)
- PSR fixes and improvements (Jouni, Jose)
- Reject excessive dotclocks early (Ville)
- DRRS related improvements (Ville)
- Simplify uncore register updates (Andrzej)
- Fix simulated GPU reset wrt. encoder HW readout (Imre)
- Add an ADL-P workaround (Jose)
- Fix clear mask in GEN7_MISCCPCTL update (Andrzej)
- Temporarily disable runtime_pm for discrete (Anshuman)
- Improve fbdev debugs (Nirmoy)
- Fix DP FRL link training status (Ankit)
- Other small display fixes (Ankit, Suraj)
- Allow panel fixed modes to have differing sync
polarities (Ville)
- Clean up crtc state flag checks (Ville)
- Fix race conditions during DKL PHY accesses (Imre)
- Prep-work for cdclock squash and crawl modes (Anusha)
- ELD precompute and readout (Ville)
Zack Rusin [Wed, 26 Oct 2022 03:19:36 +0000 (23:19 -0400)]
drm/vmwgfx: Cleanup the cursor snooping code
Cursor snooping depended on implicit size and format which made debugging
quite difficult. Make the code easier to follow by making everything
explicit and, instead of using magic numbers, predefining all the
parameters the code depends on.
Also fix incorrectly computed pitches for non-aligned cursor snoops.
This fix has no practical effect because non-aligned cursor snoops
are not used by the X11 driver and Wayland cursors go through
mob cursors instead of surface DMAs.
Zack Rusin [Wed, 26 Oct 2022 03:19:35 +0000 (23:19 -0400)]
drm/vmwgfx: Validate the box size for the snooped cursor
Invalid userspace dma surface copies could potentially overflow
the memcpy from the surface to the snooped image leading to crashes.
To fix it the dimensions of the copybox have to be validated
against the expected size of the snooped cursor.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 2ac863719e51 ("vmwgfx: Snoop DMA transfers with non-covering sizes")
Cc: <stable@vger.kernel.org> # v3.2+
Reviewed-by: Michael Banack <banackm@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026031936.1004280-1-zack@kde.org
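A minimal sketch of this kind of copy-box validation, not the driver's actual code. The 64x64 image size, the struct layout and the function name are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed snoop image dimensions, for illustration only. */
#define SNOOP_WIDTH  64u
#define SNOOP_HEIGHT 64u

struct copy_box {
	uint32_t x, y;   /* destination offset inside the image */
	uint32_t w, h;   /* extent of the copy */
};

/* Returns true only if the whole box lies inside the snoop image. */
static bool copy_box_fits(const struct copy_box *box)
{
	if (box->w == 0 || box->h == 0)
		return false;
	if (box->x >= SNOOP_WIDTH || box->y >= SNOOP_HEIGHT)
		return false;
	/* Written as subtractions so the checks cannot overflow. */
	return box->w <= SNOOP_WIDTH - box->x &&
	       box->h <= SNOOP_HEIGHT - box->y;
}
```

Comparing against the remaining space rather than computing x + w avoids wrap-around when userspace passes very large values.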
Matthew Auld [Fri, 28 Oct 2022 15:50:27 +0000 (16:50 +0100)]
drm/i915/selftests: exercise GPU access from the importer
Using PAGE_SIZE here potentially hides issues so bump that to something
larger. This should also make it possible for iommu to coalesce entries
for us. With that in place verify we can write from the GPU using the
importer's sg_table, followed by checking that our writes match when read
from the CPU side.
v2: Switch over to igt_gpu_fill_dw(), which looks to be more widely
supported than the migrate stuff (at least OOTB).
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-2-matthew.auld@intel.com
Matthew Auld [Fri, 28 Oct 2022 15:50:26 +0000 (16:50 +0100)]
drm/i915/dmabuf: fix sg_table handling in map_dma_buf
We need to iterate over the original entries here for the sg_table,
pulling out the struct page for each one, to be remapped. However
currently this incorrectly iterates over the final dma mapped entries,
which is likely just one gigantic sg entry if the iommu is enabled,
leading to us only mapping the first struct page (and any physically
contiguous pages following it), even if there is potentially lots more
data to follow.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Fixes: 1286ff739773 ("i915: add dmabuf/prime buffer sharing support.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: <stable@vger.kernel.org> # v3.5+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-1-matthew.auld@intel.com
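The difference between the two iteration bounds can be illustrated with a toy model. The struct below is a simplified stand-in, not the kernel's struct sg_table, and both function names are hypothetical; it only shows why bounding the walk by the DMA-mapped entry count loses data once the iommu coalesces entries.

```c
#include <stddef.h>

/* Simplified stand-in for a scatter-gather table; not the kernel's struct. */
struct mock_sg_table {
	size_t orig_nents;        /* original CPU-side entries */
	size_t nents;             /* DMA-mapped entries, possibly coalesced */
	size_t orig_pages[8];     /* pages covered by each original entry */
};

/* Count pages by walking the original entries, as the fix describes. */
static size_t count_pages_orig(const struct mock_sg_table *t)
{
	size_t i, total = 0;

	for (i = 0; i < t->orig_nents; i++)
		total += t->orig_pages[i];
	return total;
}

/* The buggy pattern: bound the same walk by the DMA-mapped entry count. */
static size_t count_pages_mapped(const struct mock_sg_table *t)
{
	size_t i, total = 0;

	for (i = 0; i < t->nents; i++)
		total += t->orig_pages[i];
	return total;
}
```

With four single-page entries coalesced into one DMA entry, the first walk sees all four pages while the second stops after the first.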
Anshuman Gupta [Thu, 27 Oct 2022 09:22:42 +0000 (14:52 +0530)]
drm/i915/dgfx: Grab wakeref at i915_ttm_unmap_virtual
We already grab the rpm wakeref in the object destruction path,
but it is also required to grab the wakeref when the object moves.
When i915_gem_object_release_mmap_offset() gets called by
i915_ttm_move_notify(), it will release the mmap offset without
grabbing the wakeref. We want to avoid that, so grab the
wakeref in i915_ttm_unmap_virtual() accordingly.
While at it, also change lmem_userfault_lock from a mutex to a
spinlock, as spinlocks are widely used for protecting lists.
Also change if (obj->userfault_count) to
GEM_BUG_ON(!obj->userfault_count).
v2:
- Removed lmem_userfault_{list,lock} from intel_gt. [Matt Auld]
Anshuman Gupta [Thu, 27 Oct 2022 09:22:41 +0000 (14:52 +0530)]
drm/i915: Encapsulate lmem rpm stuff in intel_runtime_pm
Runtime pm is not really per GT, therefore it makes sense to
move lmem_userfault_list, lmem_userfault_lock and
userfault_wakeref from intel_gt to the intel_runtime_pm structure,
which is embedded in i915.
drm/rockchip: lvds: fix PM usage counter unbalance in poweron
pm_runtime_get_sync() increments the PM usage counter even when it fails.
Forgetting the matching put operation results in a reference leak here.
Fix it by replacing the call with the newer pm_runtime_resume_and_get()
to keep the usage counter balanced.
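The difference in counter handling can be modelled with a toy usage counter. This is not the kernel implementation; the mock_* names and the struct are hypothetical, and only the counting behaviour described above is reproduced.

```c
/* Toy model of a runtime-PM usage counter; not the kernel implementation. */
struct mock_dev {
	int usage_count;
	int resume_error;   /* 0 on success, negative errno on failure */
};

/* Models pm_runtime_get_sync(): the count is raised even when resume fails. */
static int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Models pm_runtime_resume_and_get(): the count is dropped again on failure. */
static int mock_resume_and_get(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	return 0;
}
```

A caller that bails out on error without a put leaks one reference with the first helper and none with the second.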
drm/rockchip: vop2: Register Esmart0-win0 as primary plane
Esmart0-win0 could serve as primary plane, so mark it as such. On
RK3568 this window will never be used as primary plane, because the
three windows at the beginning of the rk3568_vop_win_data[] array
will be used. On RK3566 however, two of the windows at the beginning
of the rk3568_vop_win_data[] array cannot be used due to hardware
limitations, so without this patch we end up with CRTCs without primary
planes when multiple VPs are active.
Johan Jonker [Wed, 19 Oct 2022 21:35:03 +0000 (23:35 +0200)]
drm: rockchip: remove rockchip_drm_framebuffer_init() function
The function rockchip_drm_framebuffer_init() was in use
in the rockchip_drm_fbdev.c file, but that is now replaced
by a generic fbdev setup. Reduce the image size by
removing the rockchip_drm_framebuffer_init() and sub function
rockchip_fb_alloc() and cleanup the rockchip_drm_fb.h header file.
Ahmad Fatoum [Wed, 26 Oct 2022 12:52:46 +0000 (14:52 +0200)]
drm: bridge: adv7511: use dev_err_probe in probe function
adv7511 probe may need to be attempted multiple times before no
-EPROBE_DEFER is returned. Currently, every such probe results in
an error message:
[ 4.534229] adv7511 1-003d: failed to find dsi host
[ 4.580288] adv7511 1-003d: failed to find dsi host
This is misleading, as there is no error and probe deferral is normal
behavior. Fix this by using dev_err_probe that will suppress
-EPROBE_DEFER errors. While at it, we touch all dev_err in the probe
path. This makes the code more concise and includes the error code
everywhere to aid the user in debugging.
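The behaviour relied on here can be sketched with a small mock. This is a toy model, not the kernel's dev_err_probe(): mock_err_probe() and the counter are hypothetical, and only the "return the error, stay quiet on deferral" part is modelled.

```c
#include <stdio.h>

#define MOCK_EPROBE_DEFER 517   /* value of EPROBE_DEFER in the Linux kernel */

static int mock_errors_logged;

/*
 * Toy model of the behaviour described above: return the error unchanged,
 * but only print an error line when it is not a probe deferral.
 */
static int mock_err_probe(const char *dev, int err, const char *msg)
{
	if (err != -MOCK_EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s\n", dev, err, msg);
		mock_errors_logged++;
	}
	return err;
}
```

Because the helper returns the error it was given, a probe path can log and return in a single statement.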
Ville Syrjälä [Wed, 26 Oct 2022 10:11:33 +0000 (13:11 +0300)]
drm/i915/sdvo: Reduce copy-pasta in output setup
Avoid having to call the output init function for each
output type separately. We can just call the right one
based on the "class" of the output.
Technically we could just walk the bits of the bitmask
but that could change the order in which we initialize
the outputs. To avoid any behavioural changes keep to
the same explicit probe order as before.
Ville Syrjälä [Wed, 26 Oct 2022 10:11:32 +0000 (13:11 +0300)]
drm/i915/sdvo: Get rid of the output type<->device index stuff
Get rid of this silly output type<->device index back and
forth and just pass the output type directly to the corresponding
output init function. This was already being done for TV outputs
anyway.
Ville Syrjälä [Wed, 26 Oct 2022 10:11:31 +0000 (13:11 +0300)]
drm/i915/sdvo: Don't add DDC modes for LVDS
Stop enumerating the DDC modes for SDVO LVDS outputs (outside
the initial fixed mode setup). intel_panel_mode_valid() will
just reject most of them anyway, and any left over are entirely
pointless as they'll match the fixed mode hdisp+vdisp+vrefresh
so no user visible effect from using them instead of the fixed
mode.
Ville Syrjälä [Wed, 26 Oct 2022 10:11:30 +0000 (13:11 +0300)]
drm/i915/sdvo: Simplify output setup debugs
Get rid of this funny byte based dumping of invalid output
flags and just dump it as a single hex number. Also do that
early since all the rest is going to get skipped anyway if
the thing is zero.
Ville Syrjälä [Wed, 26 Oct 2022 10:11:28 +0000 (13:11 +0300)]
drm/i915/sdvo: Setup DDC fully before output init
Call intel_sdvo_select_ddc_bus() before initializing any
of the outputs. And before that is functional (assuming no VBT)
we have to set up the controlled_outputs thing. Otherwise DDC
won't be functional during the output init but LVDS really
needs it for the fixed mode setup.
Note that the whole multi output support still looks very
bogus, and more work will be needed to make it correct.
But for now this should at least fix the LVDS EDID fixed mode
setup.
Ville Syrjälä [Wed, 26 Oct 2022 10:11:27 +0000 (13:11 +0300)]
drm/i915/sdvo: Filter out invalid outputs more sensibly
We try to filter out the corresponding xxx1 output
if the xxx0 output is not present. But the way that is
being done is pretty awkward. Make it less so.
Maxime Ripard [Thu, 27 Oct 2022 12:52:44 +0000 (14:52 +0200)]
drm/vc4: hdmi: Fix hdmi_enable_4kp60 detection
In order to support higher HDMI frequencies, users have to set the
hdmi_enable_4kp60 parameter in their config.txt file.
We were detecting this so far by calling clk_round_rate() on the core
clock with the frequency we're supposed to run at when one of those
modes is enabled. Whether or not the parameter was enabled could then be
inferred by the returned rate since the maximum clock rate reported by
the firmware was one of the side effects of setting that parameter.
However, the recent clock rework we did changed what clk_round_rate()
was returning to always return the minimum allowed, and thus this test
wasn't reliable anymore.
Let's use the new clk_get_max_rate() function to reliably determine the
maximum rate allowed on that clock and fix the 4k@60Hz output.
Maxime Ripard [Thu, 27 Oct 2022 12:52:43 +0000 (14:52 +0200)]
firmware: raspberrypi: Provide a helper to query a clock max rate
The firmware allows querying the operating range of a
given clock. We'll need this for some drivers (KMS, in particular) to
infer the state of some configuration options, so let's create a
function to do so.
A significant number of RaspberryPi drivers using the firmware don't
have a phandle to it, so end up scanning the device tree to find a node
with the firmware compatible.
That code is duplicated everywhere, so let's introduce a helper instead.
drm/i915/perf: Save/restore EU flex counters across reset
If a drm client is killed, then hw contexts used by the client are reset
immediately. This reset clears the EU flex counter configuration. If an
OA use case is running in parallel, it would start seeing zeroed eu
counter values following the reset even if the drm client is restarted.
Save/restore the EU flex counter config so that the EU counters can be
monitored continuously across resets.
v2:
- Save/restore eu flex config only for gen12, as for pre-gen12, these
are saved and restored in the context image.
OA reports in the OA buffer contain an OA timestamp field that helps
user calculate delta between 2 OA reports. The calculation relies on the
CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.
In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error in calculating delta between 2 OA reports since the OA timestamp
is not shifted in the same manner as CS timestamp. Also the periodicity
of the reports is different from what the user configured because of
mismatch in the CS and OA frequencies.
The issue also affects MI_REPORT_PERF_COUNT command.
To resolve this, return actual OA timestamp frequency to the user in
i915_getparam_ioctl, so that user can calculate the right OA exponent as
well as interpret the reports correctly.
drm/i915/perf: Store a pointer to oa_format in oa_buffer
DG2 introduces OA reports with 64 bit report header fields. Perf OA
would need more information about the OA format in order to process such
reports. Store all OA format info in oa_buffer instead of just the size
and format-id.
drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.
drm/i915/perf: Determine gen12 oa ctx offset at runtime
Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.
v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does not read past context image size
v3: (Ashutosh)
- Drop unnecessary use of double underscores
- fix find_reg_in_lri
- Return error if oa context offset is U32_MAX
- Error out if oa_ctx_ctrl_offset does not find offset
v4: (Ashutosh)
- Warn on odd MI LRI_LEN
- Remove unnecessary check for valid_oactxctrl_offset
- Drop valid_oactxctrl_offset macro
Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.
v2:
- Update commit title (Ashutosh)
- Coding style fixes (Lionel)
- 64 bit OA formats need UMD changes in GPUvis, drop for now and send in a
separate series with UMD changes
v3:
- Update commit message to drop 64 bit related description
drm/i915/perf: Fix OA filtering logic for GuC mode
With GuC mode of submission, GuC is in control of defining the context
id field that is part of the OA reports. To filter reports, UMD and KMD
must know what sw context id was chosen by GuC. There is no interface
between KMD and GuC to determine this, so read the upper-dword of
EXECLIST_STATUS to filter/squash OA reports for the specific context.
Ville Syrjälä [Wed, 26 Oct 2022 17:01:50 +0000 (20:01 +0300)]
drm/i915/sdvo: Extract intel_sdvo_has_audio()
Pull the SDVO audio state computation into a helper.
This is almost identical to intel_hdmi_has_audio(),
except the sink capabilities are stored under intel_sdvo
rather than intel_hdmi. Might be nice to get rid of
this duplication eventually...
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-16-ville.syrjala@linux.intel.com
Ville Syrjälä [Wed, 26 Oct 2022 17:01:49 +0000 (20:01 +0300)]
drm/i915/audio: Do the vblank waits
The spec tells us to do a bunch of vblank waits in the audio
enable/disable sequences. Make it so.
The FIXMEs are nonsense since we do the audio disable very
early and enable very late, so vblank interrupts are in fact
enabled when we do this.
TODO not sure we actually want these since we don't even rely
on the hw ELD buffer, and these might be there just to give
the audio side a bit of time to respond to the unsol events.
OTOH they might be really needed for some other reason.
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-15-ville.syrjala@linux.intel.com
Ville Syrjälä [Wed, 26 Oct 2022 17:01:48 +0000 (20:01 +0300)]
drm/i915/audio: Split "ELD valid" vs. audio PD on hsw+
On the older platforms the audio presence detect bit is in
the port register, so it gets written outside audio codec hooks
and is thus separate from the ELD valid toggling. Split the
operations into two steps on hsw+ to be more consistent with
both the other platforms and the spec. Also according to the
spec we might need some vblank waits between the two which
definitely needs them done separately.
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-14-ville.syrjala@linux.intel.com
Ville Syrjälä [Wed, 26 Oct 2022 17:01:47 +0000 (20:01 +0300)]
drm/i915/audio: Use intel_de_rmw() for most audio registers
The audio code does a lot of RMW accesses. Utilize
intel_de_rmw() to make that a bit less tedious.
There are still some hand rolled RMW left, but those have
a lot of code in between the read and write to calculate
the new value, so would need some refactoring first.
v2: Add parens around the ?: to satisfy the robot
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-13-ville.syrjala@linux.intel.com
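The read-modify-write pattern that such a helper wraps can be sketched on a plain variable standing in for a register. mock_rmw() is a hypothetical name, and returning the old value is a choice made for this sketch rather than a statement about intel_de_rmw().

```c
#include <stdint.h>

/*
 * Mock read-modify-write on a plain variable standing in for a register.
 * Clears the bits in `clear`, sets the bits in `set`, returns the old value.
 */
static uint32_t mock_rmw(uint32_t *reg, uint32_t clear, uint32_t set)
{
	uint32_t old = *reg;

	*reg = (old & ~clear) | set;
	return old;
}
```

Call sites that only flip a few bits collapse to one line; those that compute the new value from the old one over many lines still need the explicit read and write.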
Ville Syrjälä [Wed, 26 Oct 2022 17:01:45 +0000 (20:01 +0300)]
drm/i915/audio: Make sure we write the whole ELD buffer
Currently we only write as many dwords into the hardware
ELD buffers as drm_eld_size() tells us. That could mean the
remainder of the hardware buffer is left with whatever
stale garbage it had before, which doesn't seem entirely
great. Let's zero out the remainder of the buffer in case
the provided ELD doesn't fill it fully.
We can also sanity check our idea of the hardware ELD buffer's
size by making sure the address wrapped back to zero once
we wrote the entire buffer.
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-11-ville.syrjala@linux.intel.com
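The zero-fill idea can be sketched against an in-memory buffer. This is an illustration under assumptions: write_eld() is a hypothetical name, and the real hardware buffer is written dword by dword through a data register rather than as an array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy eld_dwords dwords of ELD data into a hardware-sized buffer and zero
 * whatever is left over. Returns the number of dwords written, which equals
 * hw_dwords. Mock buffer; real hardware is written through registers.
 */
static size_t write_eld(uint32_t *hw, size_t hw_dwords,
			const uint32_t *eld, size_t eld_dwords)
{
	size_t i, n = eld_dwords < hw_dwords ? eld_dwords : hw_dwords;

	for (i = 0; i < n; i++)
		hw[i] = eld[i];
	for (; i < hw_dwords; i++)
		hw[i] = 0;	/* no stale data survives past the ELD */
	return i;
}
```

Writing exactly hw_dwords entries every time is also what makes the address wrap-around check described above possible.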
Ville Syrjälä [Wed, 26 Oct 2022 17:01:44 +0000 (20:01 +0300)]
drm/i915/audio: Read ELD buffer size from hardware
We currently read the ELD buffer size from hardware on g4x,
but on ilk+ we just hardcode it to 84 bytes. Let's unify
this and just do the hardware readout on all platforms,
in case the size changes in the future or something.
TODO: should perhaps do the readout during driver init and
stash the results somewhere so that we could check that the
connector's ELD actually fits and not even try to enable audio
in that case...
v2: Document the size is in dwords (Jani)
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-10-ville.syrjala@linux.intel.com
Ville Syrjälä [Wed, 26 Oct 2022 17:01:42 +0000 (20:01 +0300)]
drm/i915/audio: Protect singleton register with a lock
On the "ilk" platforms AUD_CNTL_ST2 is a singleton. Protect
it with the audio mutex in case we ever want to do parallel
RMW access to it.
Currently that should not happen since we only do audio
enable/disable from full modesets, and those are fully
serialized. But we probably want to think about toggling
audio on/off from fastsets too.
The hsw codepaths already have the same locking.
g4x should not need it since it can only do audio to a
single port at a time, which means it's actually broken
in more ways than this atm.
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-8-ville.syrjala@linux.intel.com
Ville Syrjälä [Wed, 26 Oct 2022 17:01:39 +0000 (20:01 +0300)]
drm/i915/audio: Extract struct ilk_audio_regs
The "ilk" audio codec codepaths have some duplicated code
to figure out the correct registers to use on each platform.
Extract that into a single place.
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026170150.2654-5-ville.syrjala@linux.intel.com
Nathan Chancellor [Tue, 25 Oct 2022 19:50:15 +0000 (21:50 +0200)]
drm/i915: Fix CFI violations in gt_sysfs
When booting with CONFIG_CFI_CLANG, there are numerous violations when
accessing the files under
/sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0:
$ cd /sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0
With kCFI, indirect calls are validated against their expected type
versus actual type and failures occur when the two types do not match.
The ultimate issue is that these sysfs functions are expecting to be
called via dev_attr_show() but they may also be called via
kobj_attr_show(), as certain files are created under two different
kobjects that have two different sysfs_ops in intel_gt_sysfs_register(),
hence the warnings above. When accessing the gt_ files under
/sys/devices/pci0000:00/0000:00:02.0/drm/card0, which are using the same
sysfs functions, there are no violations, meaning the functions are
being called with the proper type.
To make everything work properly, adjust certain functions to match the
type of the ->show() and ->store() members in 'struct kobj_attribute'.
Add a macro to generate functions that can be called via both
dev_attr_{show,store}() or kobj_attr_{show,store}() so that they can be
called through both kobject locations without violating kCFI and adjust
the attribute groups to account for this.
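One way to read the macro approach is sketched below with mock types. Everything here is hypothetical (the structs, typedefs and DEFINE_SHOW_PAIR are not the i915 definitions); the point is only that each indirect call goes through a function whose prototype matches the pointer type, while the shared logic lives in a helper that is called directly.

```c
#include <stdio.h>

struct mock_device { int id; };
struct mock_kobject { int id; };

/* Two distinct callback types, mirroring the two sysfs_ops paths. */
typedef int (*dev_show_fn)(struct mock_device *dev, char *buf);
typedef int (*kobj_show_fn)(struct mock_kobject *kobj, char *buf);

/* Shared implementation, never called through a function pointer. */
static int gt_value_show_common(int id, char *buf)
{
	return sprintf(buf, "%d\n", id * 10);
}

/* Generate one wrapper per callback type around the common helper. */
#define DEFINE_SHOW_PAIR(name)                                          \
static int name##_dev_show(struct mock_device *dev, char *buf)         \
{                                                                       \
	return name##_show_common(dev->id, buf);                        \
}                                                                       \
static int name##_kobj_show(struct mock_kobject *kobj, char *buf)      \
{                                                                       \
	return name##_show_common(kobj->id, buf);                       \
}

DEFINE_SHOW_PAIR(gt_value)
```

Each attribute group then references the wrapper that matches its own sysfs_ops, so a type-checked indirect call never sees a mismatched prototype.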
Pin-yen Lin [Thu, 27 Oct 2022 03:21:49 +0000 (11:21 +0800)]
drm/bridge: it6505: Fix return value check for pm_runtime_get_sync
`pm_runtime_get_sync` may return 1 on success. Fix the `if` statement
here to make the code less confusing, even though additional calls to
`it6505_poweron` don't break anything when it's already powered.
This was reported by Dan Carpenter <dan.carpenter@oracle.com> in
https://lore.kernel.org/all/Y1fMCs6VnxbDcB41@kili/
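As a hedged illustration of the return-value convention (a userspace stand-in, not the it6505 code): pm_runtime_get_sync() returns a negative errno on failure, 0 on success, and 1 when the device was already active, so only negative values should take the error path. fake_pm_runtime_get_sync() and resume_failed() are hypothetical names.

```c
#include <stdbool.h>

/*
 * Stand-in for pm_runtime_get_sync(): negative on error, 0 on success,
 * 1 when the device was already active.
 */
static int fake_pm_runtime_get_sync(bool already_active, bool fail)
{
    if (fail)
        return -5;              /* illustrative errno value (-EIO) */
    return already_active ? 1 : 0;
}

/* Only a negative return is a failure; a plain `if (ret)` also trips on 1. */
static bool resume_failed(int ret)
{
    return ret < 0;
}
```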
Fixes: 10517777d302 ("drm/bridge: it6505: Adapt runtime power management framework") Signed-off-by: Pin-yen Lin <treapking@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20221027032149.2739912-1-treapking@chromium.org
Karolina Drobnik [Tue, 25 Oct 2022 09:19:03 +0000 (11:19 +0200)]
i915/i915_gem_context: Remove debug message in i915_gem_context_create_ioctl
We know that as long as GEM context create ioctl succeeds, a context was
created. There is no need to write about it, especially when such a message
heavily pollutes dmesg and makes debugging actual errors harder.
Since commit baa89ba3f1fe ("drm/i915/gem: initial conversion to new
logging macros using coccinelle"), the logging for creating a new user
context was moved under the driver debug output (for lack of a means for
per-user logs, and a lack of user-focused drm.debug parameter). This
only reveals how obnoxious it is to have that spam as part of the driver
debug logs, so remove it. [ from Chris Wilson ]
Somalapuram Amaranath [Thu, 27 Oct 2022 09:12:37 +0000 (14:42 +0530)]
drm/ttm: rework on ttm_resource to use size_t type
Change ttm_resource structure from num_pages to size_t size in bytes.
v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages
v1 -> v2: change bo->resource->size to bo->base.size at some places
v1 -> v2: remove the local variable
v1 -> v2: cleanup cmp_size_smaller_first()
v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved
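A small sketch of what the byte-based size means for callers that still need a page count, assuming 4 KiB pages. PFN_UP() here mirrors the usual kernel round-up-to-pages macro; struct resource_sketch and resource_num_pages() are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                           /* assumed 4 KiB pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)
/* Round a byte count up to a whole number of pages. */
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical stand-in: the resource now carries a size in bytes. */
struct resource_sketch {
    size_t size;
};

/* Callers that need pages derive them from the byte size. */
static size_t resource_num_pages(const struct resource_sketch *res)
{
    return PFN_UP(res->size);
}
```

Omitting the round-up in a page-based path would undercount any size that is not page aligned, which is the kind of case the v3 note about a missing PFN_UP refers to.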
Robert Beckett [Thu, 20 Oct 2022 11:03:08 +0000 (13:03 +0200)]
drm/i915: stop abusing swiotlb_max_segment
swiotlb_max_segment used to return either the maximum size that swiotlb
could bounce or, for Xen PV, PAGE_SIZE even if swiotlb could bounce buffer
larger mappings. This made i915 on Xen PV work: i915 bypasses the
coherency aspect of the DMA API and can't cope with bounce buffering,
and returning PAGE_SIZE avoided bounce buffering for the Xen/PV case.
So instead of adding this hack back, check for Xen/PV directly in i915
for the Xen case and otherwise use the proper DMA API helper to query
the maximum mapping size.
Replace swiotlb_max_segment() calls with dma_max_mapping_size().
In i915_gem_object_get_pages_internal() no longer consider max_segment
only if CONFIG_SWIOTLB is enabled. There can be other (iommu related)
causes of specific max segment sizes.
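The selection logic described above can be sketched in userspace as follows. This is an illustration under assumptions, not the driver code: the dma_max_mapping argument stands in for the value dma_max_mapping_size() would return, xen_pv stands in for the Xen PV check, and sg_segment_size() and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)     /* assumed 4 KiB pages */

/*
 * Pick the maximum scatterlist segment size: cap the DMA API's answer at
 * UINT_MAX, force a single page on Xen PV to avoid bounce buffering, and
 * round the result down to a page multiple.
 */
static unsigned int sg_segment_size(size_t dma_max_mapping, bool xen_pv)
{
    size_t max = dma_max_mapping < UINT_MAX ? dma_max_mapping : UINT_MAX;

    if (xen_pv)
        max = SKETCH_PAGE_SIZE;

    return (unsigned int)(max - (max % SKETCH_PAGE_SIZE));
}
```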
Fixes: a2daa27c0c61 ("swiotlb: simplify swiotlb_max_segment") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: added the Xen hack, rewrote the changelog] Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@lst.de
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Yang A Shi <yang.a.shi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221024101946.28974-1-matthew.auld@intel.com
With the introduction of the delayed disable-sched behavior,
we use the GuC's xarray of valid guc-id's as a way to
identify if new requests had been added to a context
when the said context is being checked for closure.
Additionally that prior change also closes the race for when
a new incoming request fails to cancel the pending
delayed disable-sched worker.
With these two complementary checks, we see no more
use for intel_context:guc_state:number_committed_requests.
Matthew Brost [Thu, 6 Oct 2022 22:51:20 +0000 (15:51 -0700)]
drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disabling
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.
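A minimal sketch of the decision described above, assuming the threshold is kept as an absolute count of guc_ids. The function names and the way the 3/4 default is computed are illustrative assumptions, not the driver's implementation.

```c
#include <stdbool.h>

/* Default threshold: 3/4 of the available guc_ids (illustrative helper). */
static unsigned int default_guc_ids_threshold(unsigned int total_guc_ids)
{
    return total_guc_ids / 4 * 3;
}

/*
 * Delay the schedule-disable only if the context is still open and fewer
 * guc_ids than the threshold are in use; otherwise disable immediately.
 */
static bool should_delay_sched_disable(bool context_closed,
                                       unsigned int guc_ids_in_use,
                                       unsigned int threshold)
{
    return !context_closed && guc_ids_in_use < threshold;
}
```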
Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.
Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headroom to meet the min fps and max latencies
of incoming container submissions).
Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.
Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.
Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
dropped to ~480 every 10 seconds, or ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
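The figures above can be cross-checked with a few lines of integer arithmetic. The helper names are hypothetical; the inputs are the rounded numbers quoted in the text.

```c
/* Total events over a run, given a per-10-second average. */
static unsigned long events_over(unsigned long per_10s, unsigned long minutes)
{
    return per_10s * (minutes * 60 / 10);
}

/* Reduction from `before` to `after`, in parts per thousand. */
static unsigned long reduction_permille(unsigned long before,
                                        unsigned long after)
{
    return (before - after) * 1000 / before;
}
```

With 69000 events per 10 seconds over 25 minutes this gives 10,350,000 (the "~10 million"), 480 per 10 seconds over 10 minutes gives 28,800 (the "~28K"), and the reduction is 993 per thousand, in line with the quoted ~99%.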
Design awareness: Selftest impact.
As a temporary workaround (WA), disable this feature for the selftests.
Selftests are very timing sensitive and any change in timing can cause
failure. A follow-up patch will fix up the selftests to understand this delay.
Design awareness: Race between guc_request_alloc and guc_context_close.
If a context close is issued while there is a request submission in
flight and a delayed schedule disable is pending, guc_context_close
and guc_request_alloc will race to cancel the delayed disable.
To close the race, make sure that guc_request_alloc waits for
guc_context_close to finish running before checking any state.
Design awareness: GT Reset event.
If a GT reset is triggered, add a preparation step that flushes the
pending delayed schedule-disable task from every context that has one,
and move those contexts directly into the closed state after cancelling
the worker. This is okay because the existing flow flushes all
yet-to-arrive G2Hs, dropping them anyway.
Alan Previn [Wed, 26 Oct 2022 06:05:06 +0000 (23:05 -0700)]
drm/i915/guc: Fix GuC error capture sizing estimation and reporting
During GuC error capture initialization, we estimate the size we
need for the error-capture-region of the shared GuC-log-buffer.
This calculation was incorrect, so fix it. With the fixed calculation
we can reduce the allocation of error-capture region from 4MB to 1MB
(see note2 below for reasoning). Additionally, switch from drm_notice to
drm_debug for the 3X spare size check since that would be impossible to
hit without redesigning gpu_coredump framework to hold multiple captures.
NOTE1: Even for 1x the min size estimation case, actually running out
of space is a corner case because it can only occur if all engine
instances get reset all at once and i915 isn't able to extract the capture
data fast enough within the G2H handler worker.
NOTE2: With the corrected calculation, a DG2 part required ~77K and a PVC
required ~115K (1X min-est-size that is calculated as one-shot all-engine-
reset scenario).
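A short sketch of the two size checks discussed above, using the 1MB region and the 3X spare factor from the text. The macro and function names are hypothetical, and the KiB inputs in the usage note are the approximate figures from NOTE2.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAPTURE_REGION_SIZE ((size_t)1 << 20)   /* 1MB error-capture region */
#define CAPTURE_SPARE_FACTOR 3                  /* the "3X spare size" check */

/* True if a single all-engine-reset capture (1X min estimate) fits. */
static bool capture_fits_min(size_t min_est_bytes)
{
    return min_est_bytes <= CAPTURE_REGION_SIZE;
}

/* True if three times the min estimate still fits in the region. */
static bool capture_fits_spare(size_t min_est_bytes)
{
    return min_est_bytes * CAPTURE_SPARE_FACTOR <= CAPTURE_REGION_SIZE;
}
```

For the ~77K (DG2) and ~115K (PVC) estimates, both checks pass with room to spare, which is consistent with shrinking the region from 4MB to 1MB.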
Fixes: d7c15d76a554 ("drm/i915/guc: Check sizing of guc_capture output") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221026060506.1007830-2-alan.previn.teres.alexis@intel.com
Vinay Belgaumkar [Mon, 24 Oct 2022 22:54:53 +0000 (15:54 -0700)]
drm/i915/slpc: Use platform limits for min/max frequency
GuC will set the min/max frequencies to the theoretical max on
ATS-M. This will break kernel ABI, so limit min/max frequency
to RP0 (platform max) instead.
Also modify the SLPC selftest to update the min frequency
when we have a server part so that we can iterate between
platform min and max.
v2: Check softlimits instead of platform limits (Riana)
v3: More review comments (Ashutosh)
v4: No need to use saved_min_freq and other comments (Ashutosh)
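A minimal sketch of the clamping idea, assuming frequencies are tracked in MHz. struct freq_limits and clamp_to_rp0() are hypothetical names, and the numbers in the usage note are made up for illustration rather than real platform values.

```c
/* Hypothetical pair of min/max frequency limits, in MHz. */
struct freq_limits {
    unsigned int min_mhz;
    unsigned int max_mhz;
};

/* Cap both limits at RP0 so neither exceeds the platform maximum. */
static struct freq_limits clamp_to_rp0(struct freq_limits guc,
                                       unsigned int rp0_mhz)
{
    struct freq_limits out = guc;

    if (out.min_mhz > rp0_mhz)
        out.min_mhz = rp0_mhz;
    if (out.max_mhz > rp0_mhz)
        out.max_mhz = rp0_mhz;
    return out;
}
```

For example, limits of 9999/9999 with an RP0 of 2000 become 2000/2000, while limits already at or below RP0 pass through unchanged.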
The driver had a discrepancy in how cdclk squash and crawl support
were checked. Like crawl, add squash as a 1-bit feature flag
to the display section of DG2.
Vinay Belgaumkar [Mon, 24 Oct 2022 17:11:08 +0000 (10:11 -0700)]
drm/i915/slpc: Optimize waitboost for SLPC
Waitboost (when SLPC is enabled) results in an H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request a boost if the min softlimit is equal to or
greater than the boost frequency.
v2: Add the tracing back, and check requested freq
in the worker thread (Tvrtko)
v3: Check requested freq in dec_waiters as well
v4: Only check min_softlimit against boost_freq. Limit this
optimization for server parts for now.
v5: min_softlimit can be greater than boost (Ashutosh)
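The check described above reduces to a single comparison. The sketch below assumes both values are in MHz; waitboost_needed() is a hypothetical helper name, not the driver's function.

```c
#include <stdbool.h>

/*
 * A boost request (and its H2G message) is only worth sending when the
 * boost frequency is above the current min softlimit. The v5 note points
 * out that min_softlimit can exceed the boost frequency, hence `<` and
 * not `!=`.
 */
static bool waitboost_needed(unsigned int min_softlimit_mhz,
                             unsigned int boost_freq_mhz)
{
    return min_softlimit_mhz < boost_freq_mhz;
}
```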
Not all Dekel PHY registers have a lane instance, so having to specify
this when using them is awkward. It makes more sense to define each PHY
register with its full internal PHY offset, where bits 15:12 are the lane
for lane-instanced PHY registers and just a register bank index for other
PHY registers. This way lane-instanced registers can be referred to with
the (tc_port, lane) parameters, while other registers just with a tc_port
parameter.
An additional benefit of this change is to prevent passing a Dekel
register to a generic MMIO access function or vice versa.
v2:
- Fix parameter reuse in the DKL_REG_MMIO definition.
v3:
- Rebase on latest patchset version.
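A hedged sketch of the encoding described above: bits 15:12 of the internal PHY offset carry the lane (or register bank index), and a distinct wrapper type keeps Dekel registers from being handed to code that expects a plain MMIO offset. All names here are hypothetical and the offsets in the usage note are illustrative, not real Dekel register addresses.

```c
#include <stdint.h>

#define DKL_SKETCH_BANK_SHIFT 12
#define DKL_SKETCH_BANK_MASK  (0xFu << DKL_SKETCH_BANK_SHIFT)

/* Distinct type: not interchangeable with a plain MMIO offset. */
struct dkl_reg_sketch {
    uint32_t phy_offset;
};

/* Build a lane-instanced register: place the lane into bits 15:12. */
static struct dkl_reg_sketch dkl_lane_reg(uint32_t base_offset,
                                          unsigned int lane)
{
    struct dkl_reg_sketch reg = {
        (base_offset & ~DKL_SKETCH_BANK_MASK) |
        (((uint32_t)lane << DKL_SKETCH_BANK_SHIFT) & DKL_SKETCH_BANK_MASK)
    };
    return reg;
}

/* Extract the lane / bank index from bits 15:12. */
static unsigned int dkl_reg_bank(struct dkl_reg_sketch reg)
{
    return (reg.phy_offset & DKL_SKETCH_BANK_MASK) >> DKL_SKETCH_BANK_SHIFT;
}
```

For instance, a base offset of 0x0A0 on lane 1 yields a PHY offset of 0x10A0, and a non-lane register defined at 0x2200 reports bank index 2.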