Use the correct old/new topology and payload states in
intel_mst_disable_dp(). So far drm_atomic_get_mst_topology_state() it
used returned either the old state, in case the state was added already
earlier during the atomic check phase or otherwise the new state (but
the latter could fail, which can't be handled in the enable/disable
hooks). After the first patch in the patchset, the state should always
get added already during the check phase, so here we can get the
old/new states without a failure.
drm_dp_remove_payload() should use time_slots from the old payload state
and vc_start_slot in the new one. It should update the new payload
states to reflect the sink's current payload table after the payload is
removed. Pass the new topology state and the old and new payload states
accordingly.
This also fixes a problem where the payload allocations for multiple MST
streams on the same link got inconsistent after a few commits, as
during payload removal the old instead of the new payload state got
updated, so the subsequent enabling sequence and commits used a stale
payload state.
v2: Constify the old payload state pointer. (Ville)
Cc: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org # 6.1 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Lyude Paul <lyude@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Wayne Lin <wayne.lin@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230206114856.2665066-4-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Atm, drm_dp_remove_payload() uses the same payload state to both get the
vc_start_slot required for the payload removal DPCD message and to
deduct time_slots from vc_start_slot of all payloads after the one being
removed.
The above isn't always correct, as vc_start_slot must be the up-to-date
version contained in the new payload state, but time_slots must be the
one used when the payload was previously added, contained in the old
payload state. The new payload's time_slots can change vs. the old one
if the current atomic commit changes the corresponding mode.
This patch let's drivers pass the old and new payload states to
drm_dp_remove_payload(), but keeps these the same for now in all drivers
not to change the behavior. A follow-up i915 patch will pass in that
driver the correct old and new states to the function.
Cc: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Wayne Lin <Wayne.Lin@amd.com> Cc: stable@vger.kernel.org # 6.1 Cc: dri-devel@lists.freedesktop.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Lyude Paul <lyude@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Wayne Lin <wayne.lin@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230206114856.2665066-2-imre.deak@intel.com
Hand modified for missing 8c7d980da9ba3eb67a1b40fd4b33bcf49397084b Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The psp suspend & resume should be skipped to avoid destroy
the TMR and reload FWs again for IMU enabled APU ASICs.
Signed-off-by: Tim Huang <tim.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
In case of failure to resume MST topology after suspend, an emtpty
mst tree prevents further mst hub detection on the same connector.
That causes the issue with MST hub hotplug after it's been unplug in
suspend.
[How]
Stop topology manager on the connector after detecting DM_MST failure.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Consider situation as following (on the default hierarchy):
HDD
|
root (bps limit: 4k)
|
child (bps limit :8k)
|
fio bs=8k
Rate of fio is supposed to be 4k, but result is 8k. Reason is as
following:
Size of single IO from fio is larger than bytes allowed in one
throtl_slice in child, so IOs are always queued in child group first.
When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED
is set and these IOs will not be limited by tg_within_bps_limit anymore.
Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire
tree.
There patch has no influence on situation which is not on the default
hierarchy as each group is a single root group without parent.
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected. The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g. when a present pte was
wr-protected) will resolve the uffd-wp bit already. Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
[peterx@redhat.com: v2] Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3] Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node") Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com> Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.
Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
tail_page == commit_page == reader_page == {
.write = 0x100d20,
.read = 0x8f9f4805, // Far greater than 0xd20, obviously abnormal!!!
.entries = 0x10004c,
.real_end = 0x0,
.page = {
.time_stamp = 0x857257416af0,
.commit = 0xd20, // This page hasn't been full filled.
// .data[0...0xd20] seems normal.
}
}
The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.
To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c46 ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.
Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com Cc: stable@vger.kernel.org Fixes: 77ae365eca89 ("ring-buffer: make lockless") Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
When considering whether to mark one context as stopped and another as
started we need to look at whether the previous and new _contexts_ are
different and not just requests. Otherwise the software tracked context
start time was incorrectly updated to the most recent lite-restore time-
stamp, which was in some cases resulting in active time going backward,
until the context switch (typically the heartbeat pulse) would synchronise
with the hardware tracked context runtime. Easiest use case to observe
this behaviour was with a full screen clients with close to 100% engine
load.
Since SQE memory is shared with userspace, we should only be reading it
once. We cannot read it multiple times, particularly when it's read once
for validation and then read again for the actual use.
ublk_ch_uring_cmd() is safe when called as a retry operation, as the
memory backing is stable at that point. But for normal issue, we want
to ensure that we only read ublksrv_io_cmd once. Wrap the function in
a helper that reads the value into an on-stack copy of the struct.
Cc: stable@vger.kernel.org # 6.0+ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This failure really confuse me as there're still lots of available pages.
Finally I figured out it was caused by a fatal signal. When a process is
allocating memory via vm_area_alloc_pages(), it will break directly even
if it hasn't allocated the requested pages when it receives a fatal
signal. In that case, we shouldn't show this warn_alloc, as it is
useless. We only need to show this warning when there're really no enough
pages.
Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a tracing instance is removed, the error messages that hold errors
that occurred in the instance needs to be freed. The following reports a
memory leak:
osnoise/timerlat tracers are reporting new max latency on instances
where the tracing is off, creating inconsistencies between the max
reported values in the trace and in the tracing_max_latency. Thus
only report new tracing_max_latency on active tracing instances.
timerlat is not reporting a new tracing_max_latency for the thread
latency. The reason is that it is not calling notify_new_max_latency()
function after the new thread latency is sampled.
Call notify_new_max_latency() after computing the thread latency.
Currently, the "last_cmd" variable can be accessed by multiple processes
asynchronously when multiple users manipulate synthetic_events node
at the same time, it could lead to use-after-free or double-free.
This patch add "lastcmd_mutex" to prevent "last_cmd" from being accessed
asynchronously.
Queue reset was moved out from __init_dma_rx_desc_rings() and
__init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit
and receive packet after XDP prog setup.
This commit adds the missing queue reset into stmmac_xdp_open() function.
Fixes: f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions") Cc: <stable@vger.kernel.org> # 6.0+ Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.
This is caused by commit 3dbc80a3e4c5 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").
After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.
I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.
I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().
(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)
Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")
Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.
This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.
This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].
Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).
[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using select()/poll()/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, a false EPOLLOUT event
is sometimes returned. This can happen at least after sending a
message which must be split to multiple CAN frames.
The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.
This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.
As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
function in isotp.c might get into a race condition when restoring the
former tx.state from the old_state.
Remove the old_state concept and implement proper locking for the
ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
simplification idea from Hillf Danton.
Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
mechanism from isotp_release() which resolves a potential race between
isotp_sendsmg() and isotp_release().
v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_release()
v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in the wait_tx_done case
v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in ALL cases
Cc: Dae R. Jeong <threeearcat@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Fixes: 4f027cba8216 ("can: isotp: split tx timer into transmission and timeout") Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org
[mkl: rephrase commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
When cleaning up peer group ids in the failure path we need to make sure
to hold on to the namespace lock. Otherwise another thread might just
turn the mount from a shared into a non-shared mount concurrently.
Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().
Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.
To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.
If the compiler decides not to inline this function then preemption
tracing will always show an IP inside the preemption disabling path and
never the function actually calling preempt_{enable,disable}.
Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@metanate.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The same task check in perf_event_set_output has some potential issues
for some usages.
For the current perf code, there is a problem if using of
perf_event_open() to have multiple samples getting into the same mmap’d
memory when they are both attached to the same process.
https://lore.kernel.org/all/92645262-D319-4068-9C44-2409EF44888E@gmail.com/
Because the event->ctx is not ready when the perf_event_set_output() is
invoked in the perf_event_open().
Besides the above issue, before the commit bd2756811766 ("perf: Rewrite
core context handling"), perf record can errors out when sampling with
a hardware event and a software event as below.
$ perf record -e cycles,dummy --per-thread ls
failed to mmap with 22 (Invalid argument)
That's because that prior to the commit a hardware event and a software
event are from different task context.
The problem should be a long time issue since commit c3f00c70276d
("perk: Separate find_get_context() from event initialization").
The task struct is stored in the event->hw.target for each per-thread
event. It is a more reliable way to determine whether two events are
attached to the same task.
The event->hw.target was also introduced several years ago by the
commit 50f16a8bf9d7 ("perf: Remove type specific target pointers"). It
can not only be used to fix the issue with the current code, but also
back port to fix the issues with an older kernel.
Note: The event->hw.target was introduced later than commit c3f00c70276d. The patch may cannot be applied between the commit c3f00c70276d and commit 50f16a8bf9d7. Anybody that wants to back-port
this at that period may have to find other solutions.
Fixes: c3f00c70276d ("perf: Separate find_get_context() from event initialization") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lkml.kernel.org/r/20230322202449.512091-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still
set, and partition scan will be proceed again when blkdev_get_by_dev()
is called. However, this will cause a problem that re-assemble partitioned
raid device will creat partition for underlying disk.
After a server reboot, clients are failing to move files with ENOENT.
This is caused by DFS referrals containing multiple separators, which
the server move call doesn't recognize.
v1: Initial patch.
v2: Move prototype to header.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182472 Fixes: a31080899d5f ("cifs: sanitize multiple delimiters in prepath")
Actually-Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Thiago Rafael Becker <tbecker@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.
Move the max_discard_size calculation above the check for a '0' discard
size.
The validity of sock should be checked before assignment to avoid incorrect
values. Commit 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref
while calling getpeername()") introduced this change which may lead to
inconsistent values of tcp_sw_conn->sendpage and conn->datadgst_en.
Fix the issue by moving the position of the assignment.
Fixes: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Link: https://lore.kernel.org/r/20230329071739.2175268-1-zhongjinghua@huaweicloud.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The root cause is traced to an error-handling path in qla2x00_probe_one()
when the adapter "base_vha" initialize failed. The fab_scan_rp "scan.l" is
used to record the port information and it is allocated in
qla2x00_create_host(). However, it is not released in the error handling
path "probe_failed".
Fix this by freeing the memory of "scan.l" when an error occurs in the
adapter initialization process.
Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Signed-off-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/20230325110004.363898-1-lizetao1@huawei.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When removing provided buffers, io_buffer structs are not being disposed
of, leading to a memory leak. They can't be freed individually, because
they are allocated in page-sized groups. They need to be added to some
free list instead, such as io_buffers_cache. All callers already hold
the lock protecting it, apart from when destroying buffers, so had to
extend the lock there.
When a request to remove buffers is submitted, and the given number to be
removed is larger than available in the specified buffer group, the
resulting CQE result will be the number of removed buffers + 1, which is
1 more than it should be.
Previously, the head was part of the list and it got removed after the
loop, so the increment was needed. Now, the head is not an element of
the list, so the increment shouldn't be there anymore.
Polling needs a bio with a valid bi_bdev, but neither of those are
guaranteed for polled driver requests. Make request based polling
directly use blk-mq's polling function instead.
When executing a request from a polled hctx, we know the request's
cookie, and that it's from a live blk-mq queue that supports polling, so
we can safely skip everything that bio_poll provides.
Cc: stable@kernel.org Reported-by: Martin Belanger <Martin.Belanger@dell.com> Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: Daniel Wagner <dwagner@suse.de> Revieded-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20230331180056.1155862-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signal 16 and higher represent the device's Index lines. The
priv->preset_enable array holds the device configuration for these Index
lines. The preset_enable configuration is active low on the device, so
invert the conditional check in quad8_action_read() to properly handle
the logical state of preset_enable.
The Counter (CNTR) register is 24 bits wide, but we can have an
effective 25-bit count value by setting bit 24 to the XOR of the Borrow
flag and Carry flag. The flags can be read from the FLAG register, but a
race condition exists: the Borrow flag and Carry flag are instantaneous
and could change by the time the count value is read from the CNTR
register.
Since the race condition could result in an incorrect 25-bit count
value, remove support for 25-bit count values from this driver;
hard-coded maximum count values are replaced by a LS7267_CNTR_MAX define
for consistency and clarity.
Fixes: 28e5d3bb0325 ("iio: 104-quad-8: Add IIO support for the ACCES 104-QUAD-8") Cc: <stable@vger.kernel.org> # 6.1.x Cc: <stable@vger.kernel.org> # 6.2.x Link: https://lore.kernel.org/r/20230312231554.134858-1-william.gray@linaro.org/ Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CoreSight ETM4x architecture clearly provides ways to identify a device
via registers in the "Management" class, TRCDEVARCH and TRCDEVTYPE. These
registers can be accessed without the Trace domain being powered on.
We additionally added TRCIDR1 as fallback in order to cover for any
ETMs that may not have implemented TRCDEVARCH. So far, nobody has
reported hitting a WARNING we placed to catch such systems.
Also, more importantly it is problematic to access TRCIDR1, which is a
"Trace" register via MMIO access, without clearing the OSLK. But we cannot
mess with the OSLK until we know for sure that this is an ETMv4 device.
Thus, this kind of creates a chicken and egg problem unnecessarily for
systems "which are compliant" to the ETMv4 architecture.
Let us remove the TRCIDR1 fall back check and rely only on TRCDEVARCH.
The struct pages could be discontiguous when the kfence pool is allocated
via alloc_contig_pages() with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.
This may result in setting PG_slab and memcg_data to a arbitrary
address (may be not used as a struct page), which in the worst case
might corrupt the kernel.
So the iteration should use nth_page().
Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It does not reset PG_slab and memcg_data when KFENCE fails to initialize
kfence pool at runtime. It is reporting a "Bad page state" message when
kfence pool is freed to buddy. The checking of whether it is a compound
head page seems unnecessary since we already guarantee this when
allocating kfence pool. Remove the check to simplify the code.
Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: Marco Elver <elver@google.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sjpark@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230322143300.2209476-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230322143300.2209476-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACPI 6.3 introduced the online capable bit, and also introduced MADT
version 5.
Latter was used to distinguish whether the offset storing online capable
could be used. However ACPI 6.2b has MADT version "45" which is for
an errata version of the ACPI 6.2 spec. This means that the Linux code
for detecting availability of MADT will mistakenly flag ACPI 6.2b as
supporting online capable which is inaccurate as it's an ACPI 6.3 feature.
Instead use the FADT major and minor revision fields to distinguish this.
[ bp: Massage. ]
Fixes: aa06e20f1be6 ("x86/ACPI: Don't add CPUs that are not online capable") Reported-by: Eric DeVolder <eric.devolder@oracle.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/943d2445-84df-d939-f578-5d8240d342cc@unsolicited.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The logic in acpi_is_processor_usable() requires the online capable
bit be set for hotpluggable CPUs. The online capable bit has been
introduced in ACPI 6.3.
However, for ACPI revisions < 6.3 which do not support that bit, CPUs
should be reported as usable, not the other way around.
Reverse the check.
[ bp: Rewrite commit message. ]
Fixes: e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC") Suggested-by: Miguel Luis <miguel.luis@oracle.com> Suggested-by: Boris Ostrovsky <boris.ovstrosky@oracle.com> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: David R <david@unsolicited.net> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230327191026.3454-2-eric.devolder@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When smb1 mount fails, KASAN detect slab-out-of-bounds in
init_smb2_rsp_hdr like the following one.
For smb1 negotiate(56bytes) , init_smb2_rsp_hdr() for smb2 is called.
The issue occurs while handling smb1 negotiate as smb2 server operations.
Add smb server operations for smb1 (get_cmd_val, init_rsp_hdr,
allocate_rsp_buf, check_user_session) to handle smb1 negotiate so that
smb2 server operation does not handle it.
[ 411.400423] CIFS: VFS: Use of the less secure dialect vers=1.0 is
not recommended unless required for access to very old servers
[ 411.400452] CIFS: Attempting to mount \\192.168.45.139\homes
[ 411.479312] ksmbd: init_smb2_rsp_hdr : 492
[ 411.479323] ==================================================================
[ 411.479327] BUG: KASAN: slab-out-of-bounds in
init_smb2_rsp_hdr+0x1e2/0x1f4 [ksmbd]
[ 411.479369] Read of size 16 at addr ffff888488ed0734 by task kworker/14:1/199
Commit 83dcedd5540d ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()"),
changes GFP modifiers passed to kvmalloc(). This cause xfstests generic/551
test to fail. We limit pdu length size according to connection status and
maximum number of connections. In the rest, memory allocation of request
is limited by credit management. so these flags are no longer needed.
Fixes: 83dcedd5540d ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()") Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
connection failed due corrupted Rx payload. The problem was narrowed
down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs
despite LSR having DR bit set, which is precondition for attempting to
start DMA Rx in the first place.
In the debug snippet, received field indicates 1 byte was transferred
over DMA and 12 bytes after that with the non-DMA Rx. The sole byte DMA
handled was corrupted (gets zeroed) which leads to the HCI failure.
This problem became apparent after commit e8ffbb71f783 ("serial: 8250:
use THRE & __stop_tx also with DMA") changed Tx stop behavior. Tx stop
is now triggered from a THRI interrupt.
Despite that this problem looks like a HW bug, this fix is not adding
UART_BUG_xx flag to the driver beucase it seems useful in general to
avoid starting DMA when there are only a few bytes to transfer.
Skipping DMA for small transfers avoids the extra overhead DMA incurs.
Thus, don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
interrupt which has Rx a related IIR value.
By returning false from handle_rx_dma(), the DMA vs non-DMA decision is
postponed until either UART_IIR_RDI (FIFO threshold worth of bytes
awaiting) or UART_IIR_TIMEOUT (inter-character timeout) triggers at a
later time which allows better to discern whether the number of bytes
warrants starting DMA or not.
Reported-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Fixes: e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230317103034.12881-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fourth interrupt on SCIF variants with four interrupts (RZ/A1) is
the Break interrupt, not the Transmit End interrupt (like on SCI(g)).
Update the description and interrupt name to fix this.
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing
a break character by writing 1 to CTRL[SBK], so here need to avoid
checking for transmission complete when UARTCTRL_SBK is asserted,
otherwise the lpuart32_tx_empty may never get TIOCSER_TEMT.
Commit 2411fd94ceaa("tty: serial: fsl_lpuart: skip waiting for
transmission complete when UARTCTRL_SBK is asserted") only fix it in
lpuart32_set_termios(), here also fix it in lpuart32_tx_empty().
SCI IP on RZ/G2L alike SoCs do not need regshift compared to other SCI
IPs on the SH platform. Currently, it does regshift and configuring Rx
wrongly. Drop adding regshift for RZ/G2L alike SoCs.
Fixes: dfc80387aefb ("serial: sh-sci: Compute the regshift value for SCI ports") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://lore.kernel.org/r/20230321114753.75038-3-biju.das.jz@bp.renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fourth interrupt on SCI port is transmit end interrupt compared to
the break interrupt on other port types. So, shuffle the interrupts to fix
the transmit end interrupt handler.
If a second dummy client that talks to the actual I2C address was
created in probe(), there should be a proper cleanup on driver and
device removal to avoid leakage.
So unregister the dummy client via another callback.
Reviewed-by: Hans de Goede <hdegoede@redhat.com> Suggested-by: Hans de Goede <hdegoede@redhat.com> Fixes: c1e62062ff54 ("iio: light: cm32181: Handle CM3218 ACPI devices with 2 I2C resources")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2152281 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Link: https://lore.kernel.org/r/20230223020059.2013993-1-kai.heng.feng@canonical.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For output buffers, there's no guarantee that the buffer won't be full
in the first iteration of the loop in which case we would block
independently of userspace passing O_NONBLOCK or not. Fix it by always
checking the flag before going to sleep.
While at it (and as it's a bit related), refactored the loop so that the
stop condition is 'written != n', i.e, run the loop until all data has
been copied into the IIO buffers. This makes the code a bit simpler.
If for some reason 'rb->access->write()' does not write the full
requested data and the O_NONBLOCK is set, we would return 'n' to
userspace which is not really truth. Hence, let's return the number of
bytes we effectively wrote.
The CIO-DAC series of devices only supports DAC values up to 12-bit
rather than 16-bit. Trying to write a 16-bit value results in only the
lower 12 bits affecting the DAC output which is not what the user
expects. Instead, adjust the DAC write value check to reject values
larger than 12-bit so that they fail explicitly as invalid for the user.
Fixes: 3b8df5fd526e ("iio: Add IIO support for the Measurement Computing CIO-DAC family") Cc: stable@vger.kernel.org Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/r/20230311002248.8548-1-william.gray@linaro.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The node name can contain an address part which is unused
by the driver. Moreover, this string is propagated into
the userspace label, sysfs filenames *and breaking ABI*.
Cut the address part out before assigning the channel name.
Correct the "sub_lsb" shift for the ltc2497 and drop the sub_lsb element
which is now constant.
An earlier version of the code shifted by 14 but this was a consequence
of reading three bytes into a __be32 buffer and using be32_to_cpu(), so
eight extra bits needed to be skipped. Now we use get_unaligned_be24()
and thus the additional skip is wrong.
Fixes: 2187cfeb3626 ("drivers: iio: adc: ltc2497: LTC2499 support") Signed-off-by: Ian Ray <ian.ray@ge.com> Link: https://lore.kernel.org/r/20230127125714.44608-1-ian.ray@ge.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
The Silicon Labs IFS-USB-DATACABLE is used in conjunction with for example
the Quint UPSes. It is used to enable Modbus communication with the UPS to
query configuration, power and battery status.
Signed-off-by: Kees Jan Koster <kjkoster@kjkoster.org> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The command allocated to set exit latency LPM values need to be freed in
case the command is never queued. This would be the case if there is no
change in exit latency values, or device is missing.
When we set the dual-role port to Host mode, we observed the following
splat:
[ 167.057718] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:229
[ 167.057872] Workqueue: events tegra_xusb_usb_phy_work
[ 167.057954] Call trace:
[ 167.057962] dump_backtrace+0x0/0x210
[ 167.057996] show_stack+0x30/0x50
[ 167.058020] dump_stack_lvl+0x64/0x84
[ 167.058065] dump_stack+0x14/0x34
[ 167.058100] __might_resched+0x144/0x180
[ 167.058140] __might_sleep+0x64/0xd0
[ 167.058171] slab_pre_alloc_hook.constprop.0+0xa8/0x110
[ 167.058202] __kmalloc_track_caller+0x74/0x2b0
[ 167.058233] kvasprintf+0xa4/0x190
[ 167.058261] kasprintf+0x58/0x90
[ 167.058285] tegra_xusb_find_port_node.isra.0+0x58/0xd0
[ 167.058334] tegra_xusb_find_port+0x38/0xa0
[ 167.058380] tegra_xusb_padctl_get_usb3_companion+0x38/0xd0
[ 167.058430] tegra_xhci_id_notify+0x8c/0x1e0
[ 167.058473] notifier_call_chain+0x88/0x100
[ 167.058506] atomic_notifier_call_chain+0x44/0x70
[ 167.058537] tegra_xusb_usb_phy_work+0x60/0xd0
[ 167.058581] process_one_work+0x1dc/0x4c0
[ 167.058618] worker_thread+0x54/0x410
[ 167.058650] kthread+0x188/0x1b0
[ 167.058672] ret_from_fork+0x10/0x20
The function tegra_xusb_padctl_get_usb3_companion eventually calls
tegra_xusb_find_port and this in turn calls kasprintf which might sleep
and so cannot be called from an atomic context.
Fix this by moving the call to tegra_xusb_padctl_get_usb3_companion to
the tegra_xhci_id_work function where it is really needed.
Gregory Price reports a WARN splat with CONFIG_DEBUG_OBJECTS=y upon CXL
probing because pci_doe_submit_task() invokes INIT_WORK() instead of
INIT_WORK_ONSTACK() for a work_struct that was allocated on the stack.
All callers of pci_doe_submit_task() allocate the work_struct on the
stack, so replace INIT_WORK() with INIT_WORK_ONSTACK() as a backportable
short-term fix.
The long-term fix implemented by a subsequent commit is to move to a
synchronous API which allocates the work_struct internally in the DOE
library.
If the length in the CDAT header is larger than the concatenation of the
header and all table entries, then the CDAT exposed to user space
contains trailing null bytes.
Not every consumer may be able to handle that. Per Postel's robustness
principle, "be liberal in what you accept" and silently reduce the
cached length to avoid exposing those null bytes.
Fixes: c97006046c79 ("cxl/port: Read CDAT table") Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: stable@vger.kernel.org # v6.0+ Link: https://lore.kernel.org/r/6d98b3c7da5343172bd3ccabfabbc1f31c079d74.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If truncated CDAT entries are received from a device, the concatenation
of those entries constitutes a corrupt CDAT, yet is happily exposed to
user space.
Avoid by verifying response lengths and erroring out if truncation is
detected.
The last CDAT entry may still be truncated despite the checks introduced
herein if the length in the CDAT header is too small. However, that is
easily detectable by user space because it reaches EOF prematurely.
A subsequent commit which rightsizes the CDAT response allocation closes
that remaining loophole.
The two lines introduced here which exceed 80 chars are shortened to
less than 80 chars by a subsequent commit which migrates to a
synchronous DOE API and replaces "t.task.rv" by "rc".
The existing acpi_cdat_header and acpi_table_cdat struct definitions
provided by ACPICA cannot be used because they do not employ __le16 or
__le32 types. I believe that cannot be changed because those types are
Linux-specific and ACPI is specified for little endian platforms only,
hence doesn't care about endianness. So duplicate the structs.
Fixes: c97006046c79 ("cxl/port: Read CDAT table") Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: stable@vger.kernel.org # v6.0+ Link: https://lore.kernel.org/r/bce3aebc0e8e18a1173425a7a865b232c3912963.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cxl_cdat_get_length() only checks whether the DOE response size is
sufficient for the Table Access response header (1 dword), but not the
succeeding CDAT header (1 dword length plus other fields).
It thus returns whatever uninitialized memory happens to be on the stack
if a truncated DOE response with only 1 dword was received. Fix it.
Fixes: c97006046c79 ("cxl/port: Read CDAT table") Reported-by: Ming Li <ming4.li@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Ming Li <ming4.li@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: stable@vger.kernel.org # v6.0+ Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/000e69cd163461c8b1bc2cf4155b6e25402c29c7.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The CDAT exposed in sysfs differs between little endian and big endian
arches: On big endian, every 4 bytes are byte-swapped.
PCI Configuration Space is little endian (PCI r3.0 sec 6.1). Accessors
such as pci_read_config_dword() implicitly swap bytes on big endian.
That way, the macros in include/uapi/linux/pci_regs.h work regardless of
the arch's endianness. For an example of implicit byte-swapping, see
ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
(Load Word Byte-Reverse Indexed).
DOE Read/Write Data Mailbox Registers are unlike other registers in
Configuration Space in that they contain or receive a 4 byte portion of
an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
They need to be copied to or from the request/response buffer verbatim.
So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
byte-swapping.
The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
implicit byte-swapping. Byte-swap requests after constructing them with
those macros and byte-swap responses before parsing them.
Change the request and response type to __le32 to avoid sparse warnings.
Per a request from Jonathan, replace sizeof(u32) with sizeof(__le32) for
consistency.
Fixes: c97006046c79 ("cxl/port: Read CDAT table") Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org # v6.0+ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/3051114102f41d19df3debbee123129118fc5e6d.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some DT devices already have phy device configured in the DT/ACPI.
Current implementation scans for a phy unconditionally even though
there is a phy listed in the DT/ACPI and already attached.
We should check the fwnode if there is any phy device listed in
fwnode and decide whether to scan for a phy to attach to.
Fixes: fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY") Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/ Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shahab Vahedi <shahab@synopsys.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Dan reports that smatch complains about a potential uninitialized
variable being used in the compat alignment fixup code.
The logic is not wrong per se, but we do end up using an uninitialized
variable if reading the instruction that triggered the alignment fault
from user space faults, even if the fault ensures that the uninitialized
value doesn't propagate any further.
Given that we just give up and return 1 if any fault occurs when reading
the instruction, let's get rid of the 'success handling' pattern that
captures the fault in a variable and aborts later, and instead, just
return 1 immediately if any of the get_user() calls result in an
exception.
Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.
syzbot reported a data-race in data-race in netlink_recvmsg() [1]
Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.
[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0x0000000000001000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the number of lanes was forced and then subsequently the user
omits this parameter, the ksettings->lanes is reset. The driver
should then reset the number of lanes to the device's default
for the specified speed.
However, although the ksettings->lanes is set to 0, the mod variable
is not set to true to indicate the driver and userspace should be
notified of the changes.
The consequence is that the same ethtool operation will produce
different results based on the initial state.
If the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: on
then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
While if the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
This patch fixes this behavior. Omitting lanes will always results in
the driver choosing the default lane width for the chosen speed. In this
scenario, regardless of the initial state, the end state will be, e.g.,
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
Fixes: 012ce4dd3102 ("ethtool: Extend link modes settings uAPI with lanes") Signed-off-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit dbca1596bbb0 ("ping: convert to RCU lookups, get rid
of rwlock"), we use RCU for ping sockets, but we should use spinlock
for /proc/net/icmp to avoid a potential NULL deref mentioned in
the previous patch.
Let's go back to using spinlock there.
Note we can convert ping sockets to use hlist instead of hlist_nulls
because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we
use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
sockets. However, we should use spinlock for slow paths to avoid
the NULL deref.
Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
is not reused during iteration in the grace period. In fact, the
lockless readers do not check the nulls marker with get_nulls_value().
So, SOCK_RAW should use hlist instead of hlist_nulls.
Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
fast paths and sk_for_each() and spinlock for /proc/net/raw.