Dave Airlie [Mon, 30 Jan 2023 03:35:56 +0000 (13:35 +1000)]
Merge tag 'drm-intel-next-2023-01-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 feature pull #2 v6.3:
Features and functionality:
- Enable HF-EEODB by switching HDMI, DP and LVDS to use struct drm_edid (Jani)
- Start using unversioned DMC firmware paths for new platforms (Gustavo)
Refactoring and cleanups:
- ELD refactor: Stop using hardware buffer, precompute ELD, and wire up ELD in
the state checker (Ville)
- Use generics for debugfs device parameters (Jani)
- DSB refactoring and fixes (Ville)
- Header refactoring, add new intel_display_limits.h (Jani)
- Split out GMCH code to a new file (Jani)
- Split out vblank code to a new file (Jani)
- i915_drv.h and struct drm_i915_private cleanups (Jani)
- Simplify FBC and DRRS debug attributes (Deepak R Varma)
- Remove some single-use macros (Rodrigo)
Fixes:
- Fix scaler limits for display versions 12 and 13 (Luca)
- Fix plane source size check for zero height (Drew Davenport)
- Implement PSR2 selective fetch workaround (Jouni)
- Expand a PSR workaound to more platforms and pipes (Jouni)
- Expand an HDMI infoframe workaround to all MTL steppings (Jouni)
- Enable PIPEDMC whenever its corresponding pipe is enabled (Imre)
Dave Airlie [Mon, 30 Jan 2023 02:43:10 +0000 (12:43 +1000)]
Merge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.3:
- Moved the driver to the accel subsystem. Currently only the files were
moved (including the uapi file which was also renamed). This doesn't
include registering to the accel subsystem. This will probably be only
in the next kernel version.
- In case of decoder error (axi error) in Gaudi2, we can now find the exact
IP that initiated the erroneous transaction and print the details for
better debug.
- Add more trace events. We now can trace mmio transactions and communication
with the preboot firmware.
- Add to Gaudi2 support for abrupt reset that is done by the firmware. This
was support so far only for Gaudi1.
- Add uAPI to flush memory transactions (to the device memory). This is
needed by the communications library in case of doing p2p with a host NIC
which access our HBM directly through the PCI BAR.
- Add uAPI to pass-through a request from user-space to firmware and get the
result back to user-space. This will allow the driver code to avoid the
need to add new packet (in the communication channel with the firmware) for
every new request type.
- Remove the option to export dma-buf by memory allocation handle in our uAPI.
This was planned for Gaudi2 but was never used. Instead, we will do export
by memory address (same as Gaudi1). In addition, we added the option to
specify an offset to the address. This is needed in Gaudi2 because there
the user allocates the entire HBM in one allocation, but would like to
export only small part of it.
- Multiple bug fixes, refactors and small optimizations.
Dave Airlie [Fri, 27 Jan 2023 02:15:13 +0000 (12:15 +1000)]
amdgpu: fix build on non-DCN platforms.
This fixes the build here locally on my 32-bit arm build.
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f439a959dcfb6b39d6fd4b85ca1110a1d1de1587) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Paul Cercueil [Tue, 29 Nov 2022 19:19:36 +0000 (19:19 +0000)]
drm/tegra: Remove #ifdef guards for PM related functions
Use the RUNTIME_PM_OPS() and pm_ptr() macros to handle the
.runtime_suspend/.runtime_resume callbacks.
These macros allow the suspend and resume functions to be automatically
dropped by the compiler when CONFIG_PM is disabled, without having
to use #ifdef guards.
This has the advantage of always compiling these functions in,
independently of any Kconfig option. Thanks to that, bugs and other
regressions are subsequently easier to catch.
Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thierry Reding <treding@nvidia.com>
Mikko Perttunen [Thu, 19 Jan 2023 13:09:21 +0000 (15:09 +0200)]
gpu: host1x: External timeout/cancellation for fences
Currently all fences have a 30 second timeout to ensure they are
cleaned up if the fence never completes otherwise. However, this
one size fits all solution doesn't actually fit in every case,
such as syncpoint waiting where we want to be able to have timeouts
longer than 30 seconds. As such, we want to be able to give control
over fence cancellation to the caller (and maybe eventually get rid
of the internal timeout altogether).
Here we add this cancellation mechanism by essentially adding a
function for entering the timeout path by function call, and changing
the syncpoint wait function to use it.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
Mikko Perttunen [Thu, 19 Jan 2023 13:09:20 +0000 (15:09 +0200)]
gpu: host1x: Rewrite syncpoint interrupt handling
Move from the old, complex intr handling code to a new implementation
based on dma_fences. While there is a fair bit of churn to get there,
the new implementation is much simpler and likely faster as well due
to allowing signaling directly from interrupt context.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
Mikko Perttunen [Thu, 19 Jan 2023 13:09:19 +0000 (15:09 +0200)]
gpu: host1x: Implement job tracking using DMA fences
In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The main two things about this are
making cdma_update schedule the work since fence completion can
now be called from interrupt context, and some complication in
ensuring the callback is not running when we free the fence.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
Mikko Perttunen [Thu, 19 Jan 2023 13:09:18 +0000 (15:09 +0200)]
gpu: host1x: Implement syncpoint wait using DMA fences
In anticipation of removal of the intr API, move host1x_syncpt_wait
to use DMA fences instead. As of this patch, this means that waits
have a 30 second maximum timeout because of the implicit timeout
we have with fences, but that will be lifted in a follow-up patch.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
Jani Nikula [Wed, 25 Jan 2023 11:10:52 +0000 (13:10 +0200)]
drm/i915/panel: move panel fixed EDID to struct intel_panel
It's a bit confusing to have two cached EDIDs in struct intel_connector
with slightly different purposes. Make the distinction a bit clearer by
moving the EDID cached for eDP and LVDS panels at connector init time to
struct intel_panel, and name it fixed_edid. That's what it is, a fixed
EDID for the panels.
Jani Nikula [Wed, 25 Jan 2023 11:10:49 +0000 (13:10 +0200)]
drm/i915/edid: convert DP, HDMI and LVDS to drm_edid
Convert all the connectors that use cached connector edid and
detect_edid to drm_edid.
Since drm_get_edid() calls drm_connector_update_edid_property() while
drm_edid_read*() do not, we need to call drm_edid_connector_update()
separately, in part due to the EDID caching behaviour in HDMI and
DP. Especially DP depends on the details parsed from EDID. (The big
behavioural change conflating EDID reading with parsing and property
update was done in commit 5186421cbfe2 ("drm: Introduce epoch counter to
drm_connector"))
v6: Rebase on drm_edid_connector_add_modes()
v5: Fix potential uninitialized var use (kernel test robot <lkp@intel.com>)
v4: Call drm_edid_connector_update() after reading HDMI/DP EDID
v3: Don't leak vga switcheroo EDID in LVDS init (Ville)
Jeffrey Hugo [Thu, 19 Jan 2023 16:26:08 +0000 (09:26 -0700)]
docs: accel: Fix debugfs path
The device specific directory in debugfs does not have "accel". For
example, the documentation says device 0 should have a debugfs entry as
/sys/kernel/debug/accel/accel0/ but in reality the entry is
/sys/kernel/debug/accel/0/
Fix the documentation to match the implementation.
Fixes: 8c5577a5ccc6 ("doc: add documentation for accel subsystem") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Koby Elbaz [Sun, 15 Jan 2023 10:38:53 +0000 (12:38 +0200)]
habanalabs/gaudi2: find decode error root cause
When a decode error happens, we often don't know the exact root
cause (the erroneous address that was accessed) and the exact engine
that created the erroneous transaction.
To find out, we need to go over all the relevant register blocks
in the ASIC. Once we find the relevant engine, we print its details
and the offending address.
This helps tremendously when debugging an error that was created
by running a user workload.
Tomer Tayar [Tue, 17 Jan 2023 17:45:24 +0000 (19:45 +0200)]
habanalabs: clear in_compute_reset when escalating to hard reset
If resetting device upon release while the release watchdog work is
scheduled, the compute reset is replaced with hard reset.
In this case, need to clear the in_compute_reset indication in the
device reset information structure.
Moti Haimovski [Tue, 3 Jan 2023 08:28:24 +0000 (10:28 +0200)]
habanalabs: enhance info printed on FW load errors
This commit enhances the following error messages to also provide the
type of error occurred, this in order to ease debugging of errors
detected during firmware-load.
Completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
is not accurate. Hence, we will take the timestamp in the interrupt
handler itself while propagating it to the release function.
Dani Liberman [Mon, 16 Jan 2023 10:00:05 +0000 (12:00 +0200)]
habanalabs/gaudi2: fix emda range registers razwi handling
Handling edma razwi is different than all other engines since edma
uses sft routers. For hbw transactions sft router contain separate
interface for each edma and for lbw there is common interface for
both edma engines of the same dcore.
To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since edma qman doesn't
reports axi error response.
Koby Elbaz [Wed, 11 Jan 2023 13:43:21 +0000 (15:43 +0200)]
habanalabs: block soft-reset on an unusable device
A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset),
and reset upon device release.
A hard-reset is the only way that an unusable device can change its
status. All other reset procedures can't put the device in a reset
procedure, which might ultimately cause the device to change its
status, unintentionally, to become operational again.
Such a scenario has recently occurred, when a user requested
a hard-reset while another heavy user workload was ongoing (reset
request is queued).
Since the workload couldn't finish within reset's timeout limits, the
reset has failed and set a device status malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, hence followed by an additional hard-reset that changed the
ASICs status back to be operational.
Dani Liberman [Tue, 10 Jan 2023 13:48:36 +0000 (15:48 +0200)]
habanalabs/gaudi2: print page fault axi transaction id
AXI transaction id holds information about the initiator which caused
the page fault. In the future it will be translated automatically by
driver to an initiator name.
Jeffrey Hugo [Tue, 17 Jan 2023 17:45:58 +0000 (10:45 -0700)]
accel: Add .mmap to DRM_ACCEL_FOPS
In reviewing the ivpu driver, DEFINE_DRM_ACCEL_FOPS could have been used
if DRM_ACCEL_FOPS defined .mmap to be drm_gem_mmap. Lets add that since
accel drivers are a variant of drm drivers, modern drm drivers are
expected to use GEM, and mmap() is a common operation that is expected
to be heavily used in accel drivers thus the common accel driver should
be able to just use DEFINE_DRM_ACCEL_FOPS() for convenience.
Jeffrey Hugo [Tue, 17 Jan 2023 18:04:30 +0000 (11:04 -0700)]
MAINTAINERS/ACCEL: Add include/drm/drm_accel.h to the accel entry
get_maintainer.pl does not suggest Oded Gabbay, the DRM COMPUTE
ACCELERATORS DRIVERS AND FRAMEWORK maintainer for changes that touch
the Accel Subsystem header - drm_accel.h. This is because that file is
missing from the Accel Subsystem entry. Fix this.
Dani Liberman [Thu, 5 Jan 2023 15:12:28 +0000 (17:12 +0200)]
habanalabs/gaudi2: read mmio razwi information
In gaudi2 there night be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same router for high b/w and low b/w.
Fixed it by reading the information also from low b/w routers.
farah kassabri [Tue, 10 Jan 2023 10:29:55 +0000 (12:29 +0200)]
habanalabs: fix bug in timestamps registration code
Protect re-using the same timestamp buffer record before actually
adding it to the to interrupt wait list.
Mark ts buff offset as in use in the spinlock protection area of the
interrupt wait list to avoid getting in the re-use section in
ts_buff_get_kernel_ts_record before adding the node to the list.
this scenario might happen when multiple threads are racing on
same offset and one thread could set data in the ts buff in
ts_buff_get_kernel_ts_record then the other thread takes over
and get to ts_buff_get_kernel_ts_record and we will try
to re-use the same ts buff offset then we will try to
delete a non existing node from the list.
Gustavo A. R. Silva [Tue, 10 Jan 2023 01:39:47 +0000 (19:39 -0600)]
habanalabs: Replace zero-length arrays with flexible-array members
Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
arrays in a couple of structures with flex-array members.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].
Dani Liberman [Tue, 3 Jan 2023 22:05:03 +0000 (00:05 +0200)]
habanalabs/gaudi2: remove use of razwi info received from f/w
Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible to check if razwi happened and to
collect razwi data.
Ohad Sharabi [Wed, 30 Nov 2022 12:02:00 +0000 (14:02 +0200)]
habanalabs: define events to trace PCI LBW access
There are cases where it may be useful to dump the whole LBW configs.
Yet, doing so while spamming the kernel log will probably shade other
important messages since the LBW access is done in sheer volume.
To answer this we add trace events for those too.
Koby Elbaz [Fri, 23 Dec 2022 13:02:05 +0000 (15:02 +0200)]
habanalabs: protect access to dynamic mem 'user_mappings'
When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from
a dynamically allocated memory - 'user_mappings'.
Since freeing/allocating it happens in runtime (upon a page fault),
it not unlikely to access it even before being initially allocated
(i.e., accessing a NULL pointer).
The solution is to simply mark the spot when the err info has been
collected, and that way to know whether err info (either page fault
or RAZWI) is available to be read.
Tom Rix [Sat, 7 Jan 2023 18:48:27 +0000 (13:48 -0500)]
habanalabs: remove redundant memset
From reviewing the code, the line
memset(kdata, 0, usize);
is not needed because kdata is either zeroed by
kdata = kzalloc(asize, GFP_KERNEL);
when allocated at runtime or by
char stack_kdata[128] = {0};
at compile time.
Ofir Bitton [Sun, 25 Dec 2022 14:27:24 +0000 (16:27 +0200)]
habanalabs/gaudi: allow device acquire while in debug mode
During device acquire, the driver is using a QMAN for clearing some
registers. In order to avoid internal races, the driver verifies
the device is idle before submitting the register clear job.
This check introduces an issue, as debug mode will cause the device
to be non-idle which will lead to device acquire failure.
In order to overcome this issue we can entirely remove the idle
check as the driver is using the QMAN only when there is no active
context.
Oded Gabbay [Thu, 22 Dec 2022 10:28:54 +0000 (12:28 +0200)]
habanalabs: move some prints to debug level
When entering an IOCTL, the driver prints a message in case device is
not operational. This message should be printed in debug level as
it can spam the kernel log and it is not an error.
Ohad Sharabi [Sun, 18 Dec 2022 07:42:34 +0000 (09:42 +0200)]
habanalabs: add uapi to flush inbound HBM transactions
When doing p2p with a NIC device, the NIC needs to make sure all the
writes to the HBM (through the PCI bar of the Gaudi device) were
flushed.
It can be done by either the NIC or the host reading through the PCI
bar.
To support the host side, we supply a simple uapi to perform this flush
through the driver, because the user can't create such a transaction
by itself (the PCI bar isn't exposed to normal users).
Oded Gabbay [Mon, 26 Dec 2022 21:05:00 +0000 (23:05 +0200)]
habanalabs: move driver to accel subsystem
Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.
This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.
Update the MAINTAINERS file to reflect this change.
Tomer Tayar [Thu, 15 Dec 2022 14:36:53 +0000 (16:36 +0200)]
habanalabs: fix dma-buf release handling if dma_buf_fd() fails
The dma-buf private object is freed if a call to dma_buf_fd() fails,
and because a file was already associated with the dma-buf in
dma_buf_export(), the release op will be called and will use this
object.
Mark the 'priv' field as NULL in this case, and avoid accessing it from
the release op.
farah kassabri [Wed, 16 Nov 2022 13:40:30 +0000 (15:40 +0200)]
habanalabs: pass-through request from user to f/w
Add a uAPI, as part of the INFO IOCTL, to allow users to send
requests directly to f/w, according to a pre-defined set of opcodes
that the f/w exposes.
The f/w will put the result in a kernel-allocated buffer, which the
driver will then copy to the user-supplied buffer.
This will allow f/w tools to communicate directly with the f/w
without the need to add a new uAPI to the driver for each new type
of request.
Tal Cohen [Thu, 1 Dec 2022 14:37:30 +0000 (16:37 +0200)]
habanalabs: support receiving ascii message from preboot f/w
An Ascii message that is sent from preboot towards the driver
will indicate the specific error that occurred on the f/w.
This commit supports that message and parse the ascii string
in order to print it into the kernel log
The commit also changes the way the descriptor struct is declared.
While its size increased (it now above 1024 bytes), it will be
allocated by using kmalloc instead of stack declaration.
Ohad Sharabi [Wed, 30 Nov 2022 12:26:10 +0000 (14:26 +0200)]
habanalabs/gaudi2: wait for preboot ready if HW state is dirty
Instead of waiting for BTM indication we should wait for preboot ready.
Consider the below scenario:
1. FW update is being triggered
- setting the dirty bit
2. hard reset will be triggered due to the dirty bit
3. FW initiates the reset:
- dirty bit cleared
- BTM indication cleared
- preboot ready indication cleared
4. during hard reset:
- BTM indication will be set
- BIST test performed and another reset triggered
5. only after this reset the preboot will set the preboot ready
When polling on BTM indication alone we can lose sync with FW while
trying to communicate with FW that is during reset.
To overcome this we will always wait to preboot ready indication.
Tomer Tayar [Sun, 4 Dec 2022 20:09:08 +0000 (22:09 +0200)]
habanalabs: fix handling of wait CS for interrupting signals
The -ERESTARTSYS return value is not handled correctly when a signal is
received while waiting for CS completion.
This can lead to bad output values to user when waiting for a single CS
completion, and more severe, it can cause a non-stopping loop when
waiting to multi-CS completion and until a CS timeout.
Fix the handling and exit the waiting if this return value is received.
Ohad Sharabi [Thu, 10 Nov 2022 11:43:02 +0000 (13:43 +0200)]
habanalabs: fix dmabuf to export only required size
This patch fixes a bug that was found in the dmabuf flow.
Bug description as found on Gaudi2 device:
1. User allocates 4MB of device memory
- Note that although the allocation size was 4MB the HMMU allocated
a full page of 768MB to back the request.
- The user gets a memory handle that points to a single page (768MB)
- Mapping the handle, the user gets virtual address to the start of
the page.
2. User exports the buffer
3. User registers the exported buffer in the importer. This flow has
a callback to the exporter which in turn converts the phys_page_pack
to an SG list for the importer. This SG list is of single entry of
size 768MB. However, the size that was passed to the importer was
only 4MB.
The solution for this is to make sure the importer gets exposure only
to the exported size.
This will be done by fixing the SG created by the exporter to be of
the total size of the actual exported memory requested by the user.
Ohad Sharabi [Mon, 14 Nov 2022 10:16:37 +0000 (12:16 +0200)]
habanalabs: modify export dmabuf API
A previous commit deprecated the option to export from handle, leaving
the code with no support for devices with virtual memory.
This commit modifies the export API in a way that unifies the uAPI to
user address for both cases (i.e. with and without MMU support) and add
the actual support for devices with virtual memory.
Ohad Sharabi [Tue, 29 Nov 2022 07:13:34 +0000 (09:13 +0200)]
habanalabs: helper function to validate export params
Validate export parameters in a dedicated function instead of in the
main export flow.
This will be useful later when support to export dmabuf for devices
with virtual memory will be added.
Ohad Sharabi [Tue, 29 Nov 2022 12:02:07 +0000 (14:02 +0200)]
habanalabs: remove support to export dmabuf from handle
The API to the user which allows exporting DMA buffer from handle is
deprecated here. It was never used as it is relevant only for Gaudi2,
and the user stack has yet to add support for dmabuf in Gaudi2.
Looking forward, a modified API to export DMA buffer for ASICs that
supports virtual memory will be added.
Until the new API will be ready- exporting DMA buffer will not be
supported for ASICs with virtual memory.
Ofir Bitton [Wed, 30 Nov 2022 12:35:32 +0000 (14:35 +0200)]
habanalabs/gaudi2: support abrupt device reset event
In certain scenarios, firmware might encounter a fatal event for
which a device reset is required. Hence, a proper notification
is needed for driver to be aware and initiate a reset sequence.
In secured environments the reset will be performed by firmware
without an explicit request from the driver.
Tomer Tayar [Wed, 30 Nov 2022 10:07:06 +0000 (12:07 +0200)]
habanalabs: skip device idle check in hpriv_release if in reset
When user context is released and hpriv_release() is called, there is a
device idle status check, to understand if user has left the device not
idle and then a reset is required.
However, if the user process is killed because of device hard reset,
the device at this point would always be not idle, because the device
engines were already forcefully halted.
Modify hpriv_release() to skip the idle check if reset is in progress.
Ofir Bitton [Wed, 16 Nov 2022 15:27:26 +0000 (17:27 +0200)]
habanalabs/gaudi2: remove duplicated event prints
In order to reduce error log, we try to minimize the dumped rows
while keeping all relevant error info. In addition we completely
remove clock throttling debug logs.
Ohad Sharabi [Sun, 27 Nov 2022 10:38:49 +0000 (12:38 +0200)]
habanalabs: make set_dram_properties an ASIC function
As ASICs are evolving, we will need to update the DRAM properties at
various points because we may get different information from the f/w
at different points of the initialization.
This ASIC function is a foundation for this capability.
Tomer Tayar [Thu, 24 Nov 2022 09:12:38 +0000 (11:12 +0200)]
habanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails
As hl_mmap_mem_buf_get() is called also from IOCTLs which can have a
bad handle from user, modify the print for "no match to handle" to use
dev_dbg().
Calls to this function which are not dependent on user, already have an
error print when the function fails.
Tomer Tayar [Wed, 23 Nov 2022 13:09:43 +0000 (15:09 +0200)]
habanalabs: don't allow user to destroy CB handle more than once
The refcount of a CB buffer is initialized when user allocates a CB,
and is decreased when he destroys the CB handle.
If this refcount is increased also from kernel and user sends more than
one destroy requests for the handle, the buffer will be released/freed
and later be accessed when the refcount is put from kernel side.
To avoid it, prevent user from destroying the handle more than once.
Ofir Bitton [Thu, 24 Nov 2022 09:01:44 +0000 (11:01 +0200)]
habanalabs: don't notify user about clk throttling due to power
As clock throttling due to high power consumption can happen very
frequently and there is no real reason to notify the user about it,
we skip this notification in all asics.
Tomer Tayar [Tue, 8 Nov 2022 12:34:43 +0000 (14:34 +0200)]
habanalabs: abort waiting user threads upon error
User should close the FD when being notified about an error, after
which a device reset takes place.
However, if the user has pending threads that wait for completions,
the device release won't be called and eventually the watchdog timeout
will expire, leading to hard reset and killing the user process.
To avoid it, abort such waiting threads right after the error
notification, and block following waiting operations.
Tomer Tayar [Sun, 6 Nov 2022 18:29:18 +0000 (20:29 +0200)]
habanalabs: remove releasing of user threads from device release
The device file is not in use when hl_device_release() is called,
and there aren't any user threads that use IOCTLs to wait for
interrupts. Therefore there is no need to release them at this point.
farah kassabri [Sun, 13 Nov 2022 15:44:17 +0000 (17:44 +0200)]
habanalabs: read binning info from preboot
Sometimes we need the binning info at a very early state of the
driver initialization. Therefore, support was added in preboot to
provide the binning info as part of the f/w descriptor and the driver
can now use that.