The extended KCK key length check wrongly using the KEK key attribute
for validation. Due to this GTK rekey offload is failing when the KCK
key length is 24 bytes even though the driver advertising
WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK flag. Use correct attribute to fix the
same.
crypto_alloc_shash() allocates resources, which should be released by
crypto_free_shash(). When ath11k_peer_find() fails, there has memory
leak. Add missing crypto_free_shash() to fix this.
Fixes: 243874c64c81 ("ath11k: handle RX fragments") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230102081142.3937570-1-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a stack-out-of-bounds write that occurs in a WMI response callback
function that is called after a timeout occurs in ath9k_wmi_cmd().
The callback writes to wmi->cmd_rsp_buf, a stack-allocated buffer that
could no longer be valid when a timeout occurs. Set wmi->last_seq_id to
0 when a timeout occurred.
Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230104124130.10996-1-linuxlovemin@yonsei.ac.kr Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzkaller detected a memory leak of skbs in ath9k_hif_usb_rx_stream().
While processing skbs in ath9k_hif_usb_rx_stream(), the already allocated
skbs in skb_pool are not freed if ath9k_hif_usb_rx_stream() fails. If we
have an incorrect pkt_len or pkt_tag, the input skb is considered invalid
and dropped. All the associated packets already in skb_pool should be
dropped and freed. Added a comment describing this issue.
The patch also makes remain_skb NULL after being processed so that it
cannot be referenced after potential free. The initialization of hif_dev
fields which are associated with remain_skb (rx_remain_len,
rx_transfer_len and rx_pad_len) is moved after a new remain_skb is
allocated.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
It is stated that ath9k_htc_rx_msg() either frees the provided skb or
passes its management to another callback function. However, the skb is
not freed in case there is no another callback function, and Syzkaller was
able to cause a memory leak. Also minor comment fix.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Reported-by: syzbot+e008dccab31bd3647609@syzkaller.appspotmail.com Reported-by: syzbot+6692c72009680f7c4eb2@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230104123546.51427-1-pchelkin@ispras.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
The minimal resource ID is 0: IMX_SC_R_AP_0=0, so fix
the loop condition. Aside of this - constify the array.
Fixes: 31fd4b9db13b ("thermal/drivers/imx_sc: Rely on the platform data to get the resource id") Signed-off-by: Viorel Suman <viorel.suman@nxp.com> Reviewed-by: Dong Aisheng <Aisheng.dong@nxp.com> Link: https://lore.kernel.org/r/20230117091956.61729-1-viorel.suman@oss.nxp.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is currently no return check for writing an authentication
type (HERMES_AUTH_SHARED_KEY or HERMES_AUTH_OPEN). It looks like
it was accidentally skipped.
This patch adds a return check similar to the other checks in
__orinoco_hw_setup_enc() for hermes_write_wordrec().
The wifi + bluetooth combo chip RTL8723BU can leak memory (especially?)
when it's connected to a bluetooth audio device. The busy bluetooth
traffic generates lots of C2H (card to host) messages, which are not
freed correctly.
To fix this, move the dev_kfree_skb() call in rtl8xxxu_c2hcmd_callback()
inside the loop where skb_dequeue() is called.
The RTL8192EU leaks memory because the C2H messages are added to the
queue and left there forever. (This was fine in the past because it
probably wasn't sending any C2H messages until commit e542e66b7c2e
("wifi: rtl8xxxu: gen2: Turn on the rate control"). Since that commit
it sends a C2H message when the TX rate changes.)
To fix this, delete the check for rf_paths > 1 and the goto. Let the
function process the C2H messages from RTL8192EU like the ones from
the other chips.
Theoretically the RTL8188FU could also leak like RTL8723BU, but it
most likely doesn't send C2H messages frequently enough.
This change was tested with RTL8723BU by Erhard F. I tested it with
RTL8188FU and RTL8192EU.
Reported-by: Erhard F. <erhard_f@mailbox.org> Tested-by: Erhard F. <erhard_f@mailbox.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215197 Fixes: e542e66b7c2e ("rtl8xxxu: add bluetooth co-existence support for single antenna") Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/03b099c1-c671-d252-36f4-57b70d721f9d@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
On msm8939 last (hwid=10) sensor was added in the hw revision 3.0.
Calibration data for it was placed outside of the main calibration data
blob, so it is not accessible by the current blob-parsing code.
Moreover data for the sensor's p2 is not contiguous in the fuses. This
makes it hard to use nvmem_cell API to parse calibration data in a
generic way.
Since the sensor doesn't seem to be actually used by the existing
hardware, disable the sensor for now.
Tsens driver mentions that msm8976 data should be used for both msm8976
and msm8956 SoCs. This is not quite correct, as according to the
vendor kernels, msm8976 should use standard slope values (3200), while
msm8956 really uses the slope values found in the driver.
Add separate compatibility string for msm8956, move slope value
overrides to the corresponding init function and use the standard
compute_intercept_slope() function for both platforms.
Fixes: 0e580290170d ("thermal: qcom: tsens-v1: Add support for MSM8956 and MSM8976") Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230101194034.831222-7-dmitry.baryshkov@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Drop msm8976-specific defines, which duplicate generic ones.
Fixes: 0e580290170d ("thermal: qcom: tsens-v1: Add support for MSM8956 and MSM8976") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230101194034.831222-6-dmitry.baryshkov@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Functions used with __setup() return 1 when the argument has been
successfully parsed.
Reverse the returned value so that 1 is returned when kstrtobool() is
successful (i.e. returns 0).
My understanding of these __setup() functions is that returning 1 or 0
does not change much anyway - so this is more of a cleanup than a
functional fix.
I spot it and found it spurious while looking at something else.
Even if the output is not perfect, you'll get the idea with:
Commit ada1da31ce34 ("s390/sclp: sort out physical vs
virtual pointers usage") fixed the notion of virtual
address for sclp_early_sccb pointer. However, it did
not take into account that kasan_early_init() can also
output messages and sclp_early_sccb should be adjusted
by the time kasan_early_init() is called.
Currently it is not a problem, since virtual and physical
addresses on s390 are the same. Nevertheless, should they
ever differ, this would cause an invalid pointer access.
Fixes: ada1da31ce34 ("s390/sclp: sort out physical vs virtual pointers usage") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
Make wq_unbound_cpumask protected with wq_pool_attach_mutex and also
remove the need of temporary saved_cpumask.
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs") Reported-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
SME does not mandate any specific VL so we may not have 128 bit SME but
the algorithm used for enumerating VLs assumes that we will. Add the
required check to ensure that the algorithm terminates.
Since it was added our hwcap for DIT has specified that DIT is a signed
field but this appears to be incorrect, the two values for the enumeration
are:
0b0000 NI
0b0001 IMP
which look like a normal unsigned enumeration and the in-kernel DIT usage
added by 01ab991fc0ee ("arm64: Enable data independent timing (DIT) in the
kernel") detects the feature with an unsigned enum. Fix the hwcap to specify
the field as unsigned.
Fixes: 7206dc93a58f ("arm64: Expose Arm v8.4 features") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20221207-arm64-sysreg-helpers-v3-1-0d71a7b174a8@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Print the correct error codes when exiting the test suite due to some
terminal error. Some of these had a switched sign and some of them
printed zero instead of errno.
Print the correct payload when the packet dump option is selected. The
network to host conversion was forgotten and the payload was
erronously declared to be an int instead of an unsigned int.
Previously acpi_ns_simple_repair() would crash if expected_btypes
contained any combination of ACPI_RTYPE_NONE with a different type,
e.g | ACPI_RTYPE_INTEGER because of slightly incorrect logic in the
!return_object branch, which wouldn't return AE_AML_NO_RETURN_VALUE
for such cases.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Link: https://github.com/acpica/acpica/pull/811 Fixes: 61db45ca2163 ("ACPICA: Restore code that repairs NULL package elements in return values.") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For SEV_GET_ID2, the user provided length does not have a specified
limitation because the length of the ID may change in the future. The
kernel memory allocation, however, is implicitly limited to 4MB on x86 by
the page allocator, otherwise the kzalloc() will fail.
When this happens, it is best not to spam the kernel log with the warning.
Simply fail the allocation and return ENOMEM to the user.
Fixes: d6112ea0cb34 ("crypto: ccp - introduce SEV_GET_ID2 command") Reported-by: Andy Nguyen <theflow@google.com> Reported-by: Peter Gonda <pgonda@google.com> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
The helper mpi_read_raw_from_sgl sets the number of entries in
the SG list according to nbytes. However, if the last entry
in the SG list contains more data than nbytes, then it may overrun
the buffer because it only allocates enough memory for nbytes.
Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Reported-by: Roberto Sassu <roberto.sassu@huaweicloud.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
RCU Tasks and PID-namespace unshare can interact in do_exit() in a
complicated circular dependency:
1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
that every subsequent child of TASK A will belong to. But TASK A
doesn't itself belong to that new PID namespace.
2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
namespace (let's say PID_NS1) and TASK B is the first task belonging
to the new PID namespace created by unshare() (let's call it PID_NS2).
3) Since TASK B is the first task attached to PID_NS2, it becomes the
PID_NS2 child reaper.
4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.
5) TASK B exits and since it is the child reaper for PID_NS2, it has to
kill all other tasks attached to PID_NS2, and wait for all of them to
die before getting reaped itself (zap_pid_ns_process()).
6) TASK A calls synchronize_rcu_tasks() which leads to
synchronize_srcu(&tasks_rcu_exit_srcu).
7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.
8) TASK C exits and since TASK A is its parent, it waits for it to reap
TASK C, but it can't because TASK A waits for TASK B that waits for
TASK C.
Pid_namespace semantics can hardly be changed at this point. But the
coverage of tasks_rcu_exit_srcu can be reduced instead.
The current task is assumed not to be concurrently reapable at this
stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
temporarily relaxed without breaking its constraints, providing a way
out of the deadlock scenario.
[ paulmck: Fix build failure by adding additional declaration. ]
Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Eric W . Biederman <ebiederm@xmission.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()")
SRCU doesn't rely anymore on preemption to be disabled in order to
modify the per-CPU counter. And even then it used to be done from the API
itself.
Therefore and after checking further, it appears to be safe to remove
the preemption disablement around __srcu_read_[un]lock() in
exit_tasks_rcu_start() and exit_tasks_rcu_finish()
Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Stable-dep-of: 28319d6dc5e2 ("rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Make sure we don't need to look again into the depths of git blame in
order not to miss a subtle part about how rcu-tasks is dealing with
exiting tasks.
Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Stable-dep-of: 28319d6dc5e2 ("rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()") Signed-off-by: Sasha Levin <sashal@kernel.org>
The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum()
returns int, which can result in truncation. Therefore, change the
kstat_cpu_irqs_sum() function's return value to unsigned long to avoid
truncation.
Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu") Reported-by: Elliott, Robert (Servers) <elliott@hpe.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Josh Don <joshdon@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Microsoft introduced support in Windows XP for blocking port I/O
to various regions. For Windows compatibility ACPICA has adopted
the same protections and will disallow writes to those
(presumably) the same regions.
On some systems the AML included with the firmware will issue 4 byte
long writes to 0x80. These writes aren't making it over because of this
blockage. The first 4 byte write attempt is rejected, and then
subsequently 1 byte at a time each offset is tried. The first at 0x80
works, but then the next 3 bytes are rejected.
This manifests in bizarre failures for devices that expected the AML to
write all 4 bytes. Trying the same AML on Windows 10 or 11 doesn't hit
this failure and all 4 bytes are written.
Either some of these regions were wrong or some point after Windows XP
some of these regions blocks have been lifted.
In the last 15 years there doesn't seem to be any reports popping up of
this error in the Windows event viewer anymore. There is no documentation
at Microsoft's developer site indicating that Windows ACPI interpreter
blocks these regions. Between the lack of documentation and the fact that
the writes actually do work in Windows 10 and 11, it's quite likely
Windows doesn't actually enforce this anymore.
So to help the issue, only enforce Windows XP specific entries if the
latest _OSI supported is Windows XP. Continue to enforce the
ALWAYS_ILLEGAL entries.
Link: https://github.com/acpica/acpica/pull/817 Fixes: 7f0719039085 ("ACPICA: New: I/O port protection") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The key can be unaligned, so use the unaligned memory access helpers.
Fixes: 8ceee72808d1 ("crypto: ghash-clmulni-intel - use C implementation for setkey()") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So replace kfree_skb()
with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile
tested only.
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So replace kfree_skb()
with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile
tested only.
Fixes: f52b041aed77 ("libertas: Add spinlock to avoid race condition") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207150008.111743-5-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So replace kfree_skb()
with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile
tested only.
Fixes: d2e7b3425c47 ("libertas: disable functionality when interface is down") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207150008.111743-4-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So replace kfree_skb()
with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile
tested only.
Fixes: a3128feef6d5 ("libertas: use irqsave() in USB's complete callback") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207150008.111743-3-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So replace kfree_skb()
with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile
tested only.
Fixes: fc75122fabb5 ("libertas_tf: use irqsave() in USB's complete callback") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207150008.111743-2-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
After the DMA buffer is mapped to a physical address, address is stored
in pktids in brcmf_msgbuf_alloc_pktid(). Then, pktids is parsed in
brcmf_msgbuf_get_pktid()/brcmf_msgbuf_release_array() to obtain physaddr
and later unmap the DMA buffer. But when count is always equal to
pktids->array_size, physaddr isn't stored in pktids and the DMA buffer
will not be unmapped anyway.
Fixes: 9a1bb60250d2 ("brcmfmac: Adding msgbuf protocol.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207013114.1748936-1-shaozhengchao@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The brcmf_netdev_start_xmit() returns NETDEV_TX_OK without freeing skb
in case of pskb_expand_head() fails, add dev_kfree_skb() to fix it.
Compile tested only.
Fixes: 270a6c1f65fe ("brcmfmac: rework headroom check in .start_xmit()") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/1668684782-47422-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The root case here is alloc_ordered_workqueue() fails, but
cfg80211_unregister_netdevice() or unregister_netdev() not be called in
error handling path. To fix add unregister_netdev goto lable to add the
unregister operation in error handling path.
In the error path of ipw_wdev_init(), exception value is returned, and
the memory applied for in the function is not released. Also the memory
is not released in ipw_pci_probe(). As a result, memory leakage occurs.
So memory release needs to be added to the error path of ipw_wdev_init().
Fixes: a3caa99e6c68 ("libipw: initiate cfg80211 API conversion (v2)") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221209012422.182669-1-shaozhengchao@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.
It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead.
The difference between them is free reason, dev_kfree_skb_irq() means
the SKB is dropped in error and dev_consume_skb_irq() means the SKB
is consumed in normal.
In this case, dev_kfree_skb() is called to free and drop the SKB when
it's reset, so replace it with dev_kfree_skb_irq(). Compile tested
only.
btf__align_of() is supposed to be return alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return correct
value, because it calculates alignment only based on field types. But
for packed structs this is not enough, we need to also check field
offsets and struct size. If field offset isn't aligned according to
field type's natural alignment, then struct must be packed. Similarly,
if struct size is not a multiple of struct's natural alignment, then
struct must be packed as well.
This patch fixes this issue precisely by additionally checking these
conditions.
There is a global-out-of-bounds reported by KASAN:
BUG: KASAN: global-out-of-bounds in
_rtl8812ae_eq_n_byte.part.0+0x3d/0x84 [rtl8821ae]
Read of size 1 at addr ffffffffa0773c43 by task NetworkManager/411
CPU: 6 PID: 411 Comm: NetworkManager Tainted: G D
6.1.0-rc8+ #144 e15588508517267d37
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
Call Trace:
<TASK>
...
kasan_report+0xbb/0x1c0
_rtl8812ae_eq_n_byte.part.0+0x3d/0x84 [rtl8821ae]
rtl8821ae_phy_bb_config.cold+0x346/0x641 [rtl8821ae]
rtl8821ae_hw_init+0x1f5e/0x79b0 [rtl8821ae]
...
</TASK>
The root cause of the problem is that the comparison order of
"prate_section" in _rtl8812ae_phy_set_txpower_limit() is wrong. The
_rtl8812ae_eq_n_byte() is used to compare the first n bytes of the two
strings from tail to head, which causes the problem. In the
_rtl8812ae_phy_set_txpower_limit(), it was originally intended to meet
this requirement by carefully designing the comparison order.
For example, "pregulation" and "pbandwidth" are compared in order of
length from small to large, first is 3 and last is 4. However, the
comparison order of "prate_section" dose not obey such order requirement,
therefore when "prate_section" is "HT", when comparing from tail to head,
it will lead to access out of bounds in _rtl8812ae_eq_n_byte(). As
mentioned above, the _rtl8812ae_eq_n_byte() has the same function as
strcmp(), so just strcmp() is enough.
Fix it by removing _rtl8812ae_eq_n_byte() and use strcmp() barely.
Although it can be fixed by adjusting the comparison order of
"prate_section", this may cause the value of "rate_section" to not be
from 0 to 5. In addition, commit "21e4b0726dc6" not only moved driver
from staging to regular tree, but also added setting txpower limit
function during the driver config phase, so the problem was introduced
by this commit.
Fixes: 21e4b0726dc6 ("rtlwifi: rtl8821ae: Move driver from staging to regular tree") Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221212025812.1541311-1-lizetao1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.
It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead.
The difference between them is free reason, dev_kfree_skb_irq() means
the SKB is dropped in error and dev_consume_skb_irq() means the SKB
is consumed in normal.
In this case, dev_kfree_skb() is called to free and drop the SKB when
it's shutdown, so replace it with dev_kfree_skb_irq(). Compile tested
only.
Fixes: 26f1fad29ad9 ("New driver: rtl8xxxu (mac80211)") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221208143517.2383424-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call consume_skb() from hardware interrupt context
or with interrupts being disabled. So replace dev_kfree_skb() with
dev_consume_skb_irq() under spin_lock_irqsave(). Compile tested only.
Fixes: 4bc85c1324aa ("Revert "iwlwifi: split the drivers for agn and legacy devices 3945/4965"") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207144013.70210-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. All the SKBs have
been dequeued from the old queue, so it's safe to enqueue these
SKBs to a free queue, then free them after spin_unlock_irqrestore()
at once. Compile tested only.
Fixes: 5c99f04fec93 ("rtlwifi: rtl8723be: Update driver to match Realtek release of 06/28/14") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207141411.46098-4-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. All the SKBs have
been dequeued from the old queue, so it's safe to enqueue these
SKBs to a free queue, then free them after spin_unlock_irqrestore()
at once. Compile tested only.
Fixes: 7fe3b3abb5da ("rtlwifi: rtl8188ee: rtl8821ae: Fix a queue locking problem") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207141411.46098-3-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. All the SKBs have
been dequeued from the old queue, so it's safe to enqueue these
SKBs to a free queue, then free them after spin_unlock_irqrestore()
at once. Compile tested only.
Fixes: 5c99f04fec93 ("rtlwifi: rtl8723be: Update driver to match Realtek release of 06/28/14") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221207141411.46098-2-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In the expression "map[i].qid << 24" starts as u8, but is promoted to
"signed int", then sign-extended to type "unsigned long", which is not
intended. Cast to u32 to avoid the sign extension.
Fixes: 776ec4e77aa6 ("mt76: mt7915: rework debugfs queue info") Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
SDIO may need addtional 511 bytes to align bus operation. If the tailroom
of this skb is not big enough, we would access invalid memory region.
For low level operation, increase skb size to keep valid memory access in
SDIO host.
[69.952] Allocated by task 854:
[69.952] kasan_save_stack+0x26/0x50
[69.952] kasan_set_track+0x25/0x30
[69.952] kasan_save_alloc_info+0x1b/0x30
[69.952] __kasan_kmalloc+0x87/0xa0
[69.952] __kmalloc_node_track_caller+0x63/0x150
[69.952] kmalloc_reserve+0x31/0xd0
[69.952] __alloc_skb+0xfc/0x2b0
[69.952] __mt76_mcu_msg_alloc+0xbf/0x230 [mt76]
[69.952] mt76_mcu_send_and_get_msg+0xab/0x110 [mt76]
[69.952] __mt76_mcu_send_firmware.cold+0x94/0x15d [mt76]
[69.952] mt76_connac_mcu_send_ram_firmware+0x415/0x54d [mt76_connac_lib]
[69.952] mt76_connac2_load_ram.cold+0x118/0x4bc [mt76_connac_lib]
[69.952] mt7921_run_firmware.cold+0x2e9/0x405 [mt7921_common]
[69.952] mt7921s_mcu_init+0x45/0x80 [mt7921s]
[69.953] mt7921_init_work+0xe1/0x2a0 [mt7921_common]
[69.953] process_one_work+0x7ee/0x1320
[69.953] worker_thread+0x53c/0x1240
[69.953] kthread+0x2b8/0x370
[69.953] ret_from_fork+0x1f/0x30
[69.953] The buggy address belongs to the object at ffff88811c9ce800
which belongs to the cache kmalloc-2k of size 2048
[69.953] The buggy address is located 0 bytes to the right of
2048-byte region [ffff88811c9ce800, ffff88811c9cf000)
[69.953] Memory state around the buggy address:
[69.953] ffff88811c9cef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[69.953] ffff88811c9cef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[69.953] >ffff88811c9cf000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69.953] ^
[69.953] ffff88811c9cf080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69.953] ffff88811c9cf100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 764dee47e2c1 ("mt76: sdio: move common code in mt76_sdio module") Suggested-by: Lorenzo Bianconi <lorenzo@kernel.org> Tested-by: YN Chen <YN.Chen@mediatek.com> Signed-off-by: Deren Wu <deren.wu@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add missing of_node_put() after of_reserved_mem_lookup()
Fixes: 99ad32a4ca3a ("mt76: mt7915: add support for MT7986") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.
In the current code, io statistics are missing for cgroup when bio
was throttled by blk-throttle. Fix it by moving the unreaching code
to submit_bio_noacct_nocheck.
Fixes: 3f98c753717c ("block: don't check bio in blk_throtl_dispatch_work_fn") Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230216032250.74230-1-hanjinke.666@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
We support mixed merge for requests/bios with different fastfail
settings. When request fails, each time we only handle the portion
with same failfast setting, then bios with failfast can be failed
immediately, and bios without failfast can be retried.
The idea is pretty good, but the current implementation has several
defects:
1) initially RA bio doesn't set failfast, however bio merge code
doesn't consider this point, and just check its failfast setting for
deciding if mixed merge is required. Fix this issue by adding helper
of bio_failfast().
2) when merging bio to request front, if this request is mixed
merged, we have to sync request's faifast setting with 1st bio's
failfast. Fix it by calling blk_update_mixed_merge().
3) when merging bio to request back, if this request is mixed
merged, we have to mark the bio as failfast, because blk_update_request
simply updates request failfast with 1st bio's failfast. Fix
it by calling blk_update_mixed_merge().
Fixes one normal EXT4 READ IO failure issue, because it is observed
that the normal READ IO is merged with RA IO, and the mixed merged
request has different failfast setting with 1st bio's, so finally
the normal READ IO doesn't get retried.
Cc: Tejun Heo <tj@kernel.org> Fixes: 80a761fd33cf ("block: implement mixed merge of different failfast requests") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Relinquish fscache volume with mutex held. Otherwise if a new domain is
registered when the old domain with the same name gets removed from the
list but not relinquished yet, fscache may complain the collision.
PMK8350 is the first PMIC to require both HLOS and PBS registers for
PON to function properly (at least in theory, sm8350 sees no change).
The support for it on the driver side has been added long ago,
but it has never been wired up. Do so.
Currently, uring_cmd with UBLK_IO_FETCH_REQ or
UBLK_IO_COMMIT_AND_FETCH_REQ is always checked whether
userspace server has provided IO buffer even flag
UBLK_F_NEED_GET_DATA is configured.
This is a excessive check. If UBLK_F_NEED_GET_DATA is
configured, FETCH_RQ doesn't need to provide IO buffer;
COMMIT_AND_FETCH_REQ also doesn't need to do that if
the IO type is not READ.
Check ub_cmd->addr together with ublk_need_get_data()
and IO type in ublk_ch_uring_cmd().
With this fix, userspace server doesn't need to preserve
buffers for every ublk_io when flag UBLK_F_NEED_GET_DATA
is configured, in order to save memory.
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com> Fixes: c86019ff75c1 ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230210141356.112321-1-xiaodong.liu@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When support for ECDSA keys was added, constraints for data & signature
sizes were never updated. This makes it impossible to use such keys via
keyctl API from userspace.
Update constraint on max_data_size to 64 bytes in order to support
SHA512-based signatures. Also update the signature length constraints
per ECDSA signature encoding described in RFC 5480.
Fixes: 299f561a6693 ("x509: Add support for parsing x509 certs with ECDSA keys") Signed-off-by: Denis Kenzior <denkenz@gmail.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some of Nano series processors will lead GP when accessing
PMC fixed counter. Meanwhile, their hardware support for PMC
has not announced externally. So exclude Nano CPUs from ZXC
by checking stepping information. This is an unambiguous way
to differentiate between ZXC and Nano CPUs.
Following are Nano and ZXC FMS information:
Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D]
ZXC FMS: Family=6, Model=F, Stepping=E-F OR
Family=6, Model=0x19, Stepping=0-3
Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com> Reported-by: Kevin Brace <kevinbrace@gmx.com> Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389 Signed-off-by: Sasha Levin <sashal@kernel.org>
The perf time is from sched_clock_cpu(). The current PEBS code
unconditionally convert the TSC to native_sched_clock(). There is a
shift between the two clocks. If the TSC is stable, the shift is
consistent, __sched_clock_offset. If the TSC is unstable, the shift has
to be calculated at runtime.
This patch doesn't support the conversion when the TSC is unstable. The
TSC unstable case is a corner case and very unlikely to happen. If it
happens, the TSC in a PEBS record will be dropped and fall back to
perf_event_clock().
Fixes: 47a3aeb39e8d ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()")
removed any path which could make pick_next_rt_entity() return NULL.
However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of
pick_next_rt_entity()) still checks the error condition, which can
never happen, since list_entry() never returns NULL.
Remove the BUG_ON check, and instead emit a warning in the only
possible error condition here: the queue being empty which should
never happen.
Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it Signed-off-by: Sasha Levin <sashal@kernel.org>
`dasd_reserve_req` is allocated before `dasd_vol_info_req`, and it
also needs to be freed before the error returns, just like the other
cases in this function.
The memory region reserved by a previous commit (see fixes tag below)
overlaps with the SMEM and MPSS memory regions, causing error messages in
dmesg:
OF: reserved mem: OVERLAP DETECTED!
reserved@5000000 (0x0000000005000000--0x0000000007200000)
overlaps with smem_region@6a00000
(0x0000000006a00000--0x0000000006c00000)
This patch resolves both of these by splitting the previously reserved
memory region into two sections either side of the SMEM region and by
cutting off the second memory region to 0x7000000.
The vendor kernel uses RPM_SMD_XO_CLK_SRC clock as an CXO clock rather
than using the RPM_SMD_BB_CLK1 directly. Follow this example and switch
msm8996.dtsi to use RPM_SMD_XO_CLK_SRC clock instead of RPM_SMB_BB_CLK1.
Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget")
remove BLK_STS_RESOURCE return value and we only check if we can get
the budget from .get_budget() now.
Correct stale comment that ".get_budget() returns BLK_STS_NO_RESOURCE"
to ".get_budget() fails to get the budget".
Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared
tags") mark restart for unshared tags for improvement. At that time,
tags is only shared betweens queues and we can check if tags is shared
by test BLK_MQ_F_TAG_SHARED.
Afterwards, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per
tagset") enabled tags share betweens hctxs inside a queue. We only
mark restart for shared hctxs inside a queue and may cause io hung if
there is no tag currently allocated by hctxs going to be marked restart.
Wait on sbitmap_queue instead of mark restart for shared hctxs case to
fix this.
Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
For shared queues case, we will only wait on bitmap_tags if we fail to get
driver tag. However, rq could be from breserved_tags, then two problems
will occur:
1. io hung if no tag is currently allocated from bitmap_tags.
2. unnecessary wakeup when tag is freed to bitmap_tags while no tag is
freed to breserved_tags.
Wait on the bitmap which rq from to fix this.
Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") remove handle of TAG_SHARED in restart,
then shared_hctx_restart counted for how many hardware queues are marked
for restart is removed too.
Remove the stale comment that we still count hardware queues need restart.
Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") add
blk_mq_alloc_request_hctx to send commands to a specific queue. If
BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may change to different
hctx after sleep and get tag from unexpected hctx. So BLK_MQ_REQ_NOWAIT
must be set in flags for blk_mq_alloc_request_hctx.
After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx return -EINVAL
if both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are not set instead of
if BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT is not set and
BLK_MQ_REQ_RESERVED is set, blk_mq_alloc_request_hctx could alloc tag
from unexpected hctx. I guess what we need here is that return -EINVAL
if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not set.
Currently both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED will be set if
specific hctx is needed in nvme_auth_submit, nvmf_connect_io_queue
and nvmf_connect_admin_queue. Fix the potential BLK_MQ_REQ_NOWAIT missed
case in future.
Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The STM32MP13x Device Part Number (also named RPN in reference manual)
only uses the first 12 bits in OTP4, all the other bit are reserved and
they can be different of zero; they must be masked in NVMEM result, so
the number of bits must be defined in the nvmem cell description.
MT7986's watchdog embeds a reset controller and needs only the
mediatek,mt7986-wdt compatible string as the MT6589 one is there
for watchdogs that don't have any reset controller capability.
Fixes: 50137c150f5f ("arm64: dts: mediatek: add basic mt7986 support") Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20221108033209.22751-4-allen-kh.cheng@mediatek.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MT8195's watchdog embeds a reset controller and needs only the
mediatek,mt8195-wdt compatible string as the MT6589 one is there
for watchdogs that don't have any reset controller capability.
Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Co-developed-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20221108033209.22751-3-allen-kh.cheng@mediatek.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MT8186's watchdog embeds a reset controller and needs only the
mediatek,mt8186-wdt compatible string as the MT6589 one is there
for watchdogs that don't have any reset controller capability.
Fixes: 2e78620b1350 ("arm64: dts: Add MediaTek MT8186 dts and evaluation board and Makefile") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Co-developed-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20221108033209.22751-2-allen-kh.cheng@mediatek.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MT8186 features the ARM DynamIQ technology and combines both two
Cortex-A76 (big) and six Cortex-A55 (LITTLE) CPUs in one cluster:
fix the CPU map to reflect that.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Fixes: 2e78620b1350 ("arm64: dts: Add MediaTek MT8186 dts and evaluation board and Makefile") Link: https://lore.kernel.org/r/20230126103526.417039-4-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MT8192 features the ARM DynamIQ technology and combines both four
Cortex-A76 (big) and four Cortex-A55 (LITTLE) CPUs in one cluster:
fix the CPU map to reflect that.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Fixes: 48489980e27e ("arm64: dts: Add Mediatek SoC MT8192 and evaluation board dts and Makefile") Link: https://lore.kernel.org/r/20230126103526.417039-3-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MT8195 features the ARM DynamIQ technology and combines both four
Cortex-A78 (big) and four Cortex-A55 (LITTLE) CPUs in one cluster:
fix the CPU map to reflect that.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board") Link: https://lore.kernel.org/r/20230126103526.417039-2-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
mentioned that in case of shared tags, there could be just one real
active hctx(queue) because of lazy detection of tag idle. Then driver tag
allocation may wait forever on this real active hctx(queue) if wake_batch
is > hctx_max_depth where hctx_max_depth is available tags depth for the
actve hctx(queue). However, the condition wake_batch > hctx_max_depth is
not strong enough to avoid IO hung as the sbitmap_queue_wake_up will only
wake up one wait queue for each wake_batch even though there is only one
waiter in the woken wait queue. After this, there is only one tag to free
and wake_batch may not be reached anymore. Commit 180dccb0dba4f ("blk-mq:
fix tag_get wait task can't be awakened") methioned that driver tag
allocation may wait forever. Actually, the inactive hctx(queue) will be
truely idle after at most 30 seconds and will call blk_mq_tag_wakeup_all
to wake one waiter per wait queue to break the hung. But IO hung for 30
seconds is also not acceptable. Set batch size to small enough that depth
of the shared hctx(queue) is enough to wake up all of the queues like
sbq_calc_wake_batch do to fix this potential IO hung.
Although hctx_max_depth will be clamped to at least 4 while wake_batch
recalculation does not do the clamp, the wake_batch will be always
recalculated to 1 when hctx_max_depth <= 4.
Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
sbitmap suffers from code complexity, as demonstrated by recent fixes,
and eventual lost wake ups on nested I/O completion. The later happens,
from what I understand, due to the non-atomic nature of the updates to
wait_cnt, which needs to be subtracted and eventually reset when equal
to zero. This two step process can eventually miss an update when a
nested completion happens to interrupt the CPU in between the wait_cnt
updates. This is very hard to fix, as shown by the recent changes to
this code.
The code complexity arises mostly from the corner cases to avoid missed
wakes in this scenario. In addition, the handling of wake_batch
recalculation plus the synchronization with sbq_queue_wake_up is
non-trivial.
This patchset implements the idea originally proposed by Jan [1], which
removes the need for the two-step updates of wait_cnt. This is done by
tracking the number of completions and wakeups in always increasing,
per-bitmap counters. Instead of having to reset the wait_cnt when it
reaches zero, we simply keep counting, and attempt to wake up N threads
in a single wait queue whenever there is enough space for a batch.
Waking up less than batch_wake shouldn't be a problem, because we
haven't changed the conditions for wake up, and the existing batch
calculation guarantees at least enough remaining completions to wake up
a batch for each queue at any time.
Performance-wise, one should expect very similar performance to the
original algorithm for the case where there is no queueing. In both the
old algorithm and this implementation, the first thing is to check
ws_active, which bails out if there is no queueing to be managed. In the
new code, we took care to avoid accounting completions and wakeups when
there is no queueing, to not pay the cost of atomic operations
unnecessarily, since it doesn't skew the numbers.
For more interesting cases, where there is queueing, we need to take
into account the cross-communication of the atomic operations. I've
been benchmarking by running parallel fio jobs against a single hctx
nullb in different hardware queue depth scenarios, and verifying both
IOPS and queueing.
Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
varying only the hardware queue length per test.
The following is a similar experiment, ran against a nullb with a single
bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
parallel fio jobs operating on the same device
It has also survived blktests and a 12h-stress run against nullb. I also
ran the code against nvme and a scsi SSD, and I didn't observe
performance regression in those. If there are other tests you think I
should run, please let me know and I will follow up with results.
Commit fbb564a557809 ("lib/sbitmap: Fix invalid loop in
__sbitmap_queue_get_batch()") mentioned that "Checking free bits when
setting the target bits. Otherwise, it may reuse the busying bits."
This commit add check to make sure all masked bits in word before
cmpxchg is zero. Then the existing check after cmpxchg to check any
zero bit is existing in masked bits in word is redundant.
Actually, old value of word before cmpxchg is stored in val and we
will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg.
So we will not reuse busy bits methioned in commit fbb564a557809
("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert
new-added check to remove redundant check.
Fixes: fbb564a55780 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>