Add 'volatile' to iounmap()'s argument to prevent build warnings.
This make it the same as other major architectures.
Placates these warnings: (12 such warnings)
../drivers/video/fbdev/riva/fbdev.c: In function 'rivafb_probe':
../drivers/video/fbdev/riva/fbdev.c:2067:42: error: passing argument 1 of 'iounmap' discards 'volatile' qualifier from pointer target type [-Werror=discarded-qualifiers]
2067 | iounmap(default_par->riva.PRAMIN);
Update HDMI volatile registers list as DMA, Channel Selection registers,
vbit control registers are being reflected by hardware DP port
disconnection.
This update is required to fix no display and no sound issue observed
after reconnecting TAMA/SANWA DP cables.
Once DP cable is unplugged, DMA control registers are being reset by
hardware, however at second plugin, new dma control values does not
updated to the dma hardware registers since new register value and
cached values at the time of first plugin are same.
Fixes: 7cb37b7bd0d3 ("ASoC: qcom: Add support for lpass hdmi driver") Signed-off-by: Srinivasa Rao Mandadapu <quic_srivasam@quicinc.com> Reported-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Link: https://lore.kernel.org/r/1665637711-13300-1-git-send-email-quic_srivasam@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mode_valid field in drm_connector_helper_funcs is expected to be of
type:
enum drm_mode_status (* mode_valid) (struct drm_connector *connector,
struct drm_display_mode *mode);
The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.
The return type of mdp4_lvds_connector_mode_valid should be changed from
int to enum drm_mode_status.
The doc says the I²C device's name is used if devname is NULL, but
actually the I²C device driver's name is used.
Fixes: 0658293012af ("media: v4l: subdev: Add a function to set an I²C sub-device's name") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The h->*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.
This could lead to a corrupted value of h->resv_huge_pages, which we have
observed on our systems.
Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
potential race.
Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Glen McCready <gkmccready@meta.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yuanzheng Song [Fri, 28 Oct 2022 03:07:05 +0000 (03:07 +0000)]
mm/memory: add non-anonymous page check in the copy_present_page()
The vma->anon_vma of the child process may be NULL because
the entire vma does not contain anonymous pages. In this
case, a BUG will occur when the copy_present_page() passes
a copy of a non-anonymous page of that vma to the
page_add_new_anon_rmap() to set up new anonymous rmap.
This problem has been fixed by the commit <fb3d824d1a46>
("mm/rmap: split page_dup_rmap() into page_dup_file_rmap()
and page_try_dup_anon_rmap()"), but still exists in the
linux-5.10.y branch.
This patch is not applicable to this version because
of the large version differences. Therefore, fix it by
adding non-anonymous page check in the copy_present_page().
Cc: stable@vger.kernel.org Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prior to this commit, if a grant mapping operation failed partially,
some of the entries in the map_ops array would be invalid, whereas all
of the entries in the kmap_ops array would be valid. This in turn would
cause the following logic in gntdev_map_grant_pages to become invalid:
for (i = 0; i < map->count; i++) {
if (map->map_ops[i].status == GNTST_okay) {
map->unmap_ops[i].handle = map->map_ops[i].handle;
if (!use_ptemod)
alloced++;
}
if (use_ptemod) {
if (map->kmap_ops[i].status == GNTST_okay) {
if (map->map_ops[i].status == GNTST_okay)
alloced++;
map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
}
}
}
...
atomic_add(alloced, &map->live_grants);
Assume that use_ptemod is true (i.e., the domain mapping the granted
pages is a paravirtualized domain). In the code excerpt above, note that
the "alloced" variable is only incremented when both kmap_ops[i].status
and map_ops[i].status are set to GNTST_okay (i.e., both mapping
operations are successful). However, as also noted above, there are
cases where a grant mapping operation fails partially, breaking the
assumption of the code excerpt above.
The aforementioned causes map->live_grants to be incorrectly set. In
some cases, all of the map_ops mappings fail, but all of the kmap_ops
mappings succeed, meaning that live_grants may remain zero. This in turn
makes it impossible to unmap the successfully grant-mapped pages pointed
to by kmap_ops, because unmap_grant_pages has the following snippet of
code at its beginning:
if (atomic_read(&map->live_grants) == 0)
return; /* Nothing to do */
In other cases where only some of the map_ops mappings fail but all
kmap_ops mappings succeed, live_grants is made positive, but when the
user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
will then make map->live_grants negative, because the latter function
does not check if all of the pages that were requested to be unmapped
were actually unmapped, and the same function unconditionally subtracts
"data->count" (i.e., a value that can be greater than map->live_grants)
from map->live_grants. The side effects of a negative live_grants value
have not been studied.
The net effect of all of this is that grant references are leaked in one
of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
mechanism extensively for X11 GUI isolation), this issue manifests
itself with warning messages like the following to be printed out by the
Linux kernel in the VM that had granted pages (that contain X11 GUI
window data) to dom0: "g.e. 0x1234 still pending", especially after the
user rapidly resizes GUI VM windows (causing some grant-mapping
operations to partially or completely fail, due to the fact that the VM
unshares some of the pages as part of the window resizing, making the
pages impossible to grant-map from dom0).
The fix for this issue involves counting all successful map_ops and
kmap_ops mappings separately, and then adding the sum to live_grants.
During unmapping, only the number of successfully unmapped grants is
subtracted from live_grants. The code is also modified to check for
negative live_grants values after the subtraction and warn the user.
Link: https://github.com/QubesOS/qubes-issues/issues/7631 Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()") Cc: stable@vger.kernel.org Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221002222006.2077-2-m.v.b@runbox.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While working on XSA-361 and its follow-ups, I failed to spot another
place where the kernel mapping part of an operation was not treated the
same as the user space part. Detect and propagate errors and add a 2nd
pr_debug().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/c2513395-74dc-aea3-9192-fd265aa44e35@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Co-authored-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For some exception types the instruction address points behind the
instruction that caused the exception. Take that into account and add
the missing exception table entry.
For some exception types the instruction address points behind the
instruction that caused the exception. Take that into account and add
the missing exception table entry.
For modules, names from kallsyms__parse() contain the module name which
meant that module symbols did not match exactly by name.
Fix by matching the name string up to the separating tab character.
Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221026072736.2982-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzkaller managed to trigger concurrent calls to
kernfs_remove_by_name_ns() for the same file resulting in
a KASAN detected use-after-free. The race occurs when the root
node is freed during kernfs_drain().
To prevent this acquire an additional reference for the root
of the tree that is removed before calling __kernfs_remove().
Found by syzkaller with the following reproducer (slab_nomerge is
required):
The buggy address belongs to the object at ffff888008880780
which belongs to the cache kernfs_node_cache of size 128
The buggy address is located 112 bytes inside of
128-byte region [ffff888008880780, ffff888008880800)
Memory state around the buggy address: ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
The signal_read(), action_read(), and action_write() callbacks have been
assuming Signal0 is requested without checking. This results in requests
for Signal1 returning data for Signal0. This patch fixes these
oversights by properly checking for the Signal's id in the respective
callbacks and handling accordingly based on the particular Signal
requested. The trig_inverted member of the mchp_tc_data is removed as
superfluous.
SDIO tuple is only allocated for standard SDIO card, especially it causes
memory corruption issues when the non-standard SDIO card has removed, which
is because the card device's reference counter does not increase for it at
sdio_init_func(), but all SDIO card device reference counter gets decreased
at sdio_release_func().
Fixes: 6f51be3d37df ("sdio: allow non-standard SDIO cards") Signed-off-by: Matthew Ma <mahongwei@zeku.com> Reviewed-by: Weizhao Ouyang <ouyangweizhao@zeku.com> Reviewed-by: John Wang <wangdayu@zeku.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221014034951.2300386-1-ouyangweizhao@zeku.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
REGMAP_MMIO is not user-configurable, so we can only satisfy this
dependency by enabling some other Kconfig symbol that properly 'select's
it. Use select like everybody else.
Noticed when trying to enable this driver for compile testing.
Device-managed resources allocated post component bind must be tied to
the lifetime of the aggregate DRM device or they will not necessarily be
released when binding of the aggregate device is deferred.
This is specifically true for the DP IRQ, which will otherwise remain
requested so that the next bind attempt fails when requesting the IRQ a
second time.
Since commit c3bf8e21b38a ("drm/msm/dp: Add eDP support via aux_bus")
this can happen when the aux-bus panel driver has not yet been loaded so
that probe is deferred.
Fix this by tying the device-managed lifetime of the DP IRQ to the DRM
device so that it is released when bind fails.
Add the missing sanity check on the bridge counter to avoid corrupting
data beyond the fixed-sized bridge array in case there are ever more
than eight bridges.
Add the missing sanity check on the bridge counter to avoid corrupting
data beyond the fixed-sized bridge array in case there are ever more
than eight bridges.
One of the sysfs values reported for supported_speeds was not valid (20Gb/s
reported instead of 64Gb/s). Instead of driver internal speed mask
definition, use speed mask defined in transport_fc for reporting
host->supported_speeds.
Back in 2014, the LQI was saved in the skb control buffer (skb->cb, or
mac_cb(skb)) without any actual reset of this area prior to its use.
As part of a useful rework of the use of this region, 32edc40ae65c
("ieee802154: change _cb handling slightly") introduced mac_cb_init() to
basically memset the cb field to 0. In particular, this new function got
called at the beginning of mac802154_parse_frame_start(), right before
the location where the buffer got actually filled.
What went through unnoticed however, is the fact that the very first
helper called by device drivers in the receive path already used this
area to save the LQI value for later extraction. Resetting the cb field
"so late" led to systematically zeroing the LQI.
If we consider the reset of the cb field needed, we can make it as soon
as we get an skb from a device driver, right before storing the LQI,
as is the very first time we need to write something there.
unshare_sighand should only access oldsighand->action
while holding oldsighand->siglock, to make sure that
newsighand->action is in a consistent state.
If "interp_elf_ex" fails to allocate memory in load_elf_binary(),
the program will take the "out_free_ph" error handing path,
resulting in "interpreter" file resource is not released.
Fix it by adding an error handing path "out_free_file", which will
release the file resource when "interp_elf_ex" failed to allocate
memory.
Fixes: 0693ffebcfe5 ("fs/binfmt_elf.c: allocate less for static executable") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221024154421.982230-1-lizetao1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, every time the device wakes up from sleep, the
iio_chan array is reallocated, leaking the previous one
until the device is removed (basically never).
Move the allocation to the probe function to avoid this.
Signed-off-by: Cosmin Tanislav <cosmin.tanislav@analog.com> Fixes: f110f3188e563 ("iio: temperature: Add support for LTC2983") Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20221014123724.1401011-2-demonsingur@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tsl2583 probe() uses devm_iio_device_register() and calling
iio_device_unregister() causes the unregister to occur twice. s
Switch to iio_device_register() instead of devm_iio_device_register()
in probe to avoid the device managed cleanup.
The iio_utils uses a digit calculation in order to know length of the
file name containing a buffer number. The digit calculation does not
work for number 0.
This leads to allocation of one character too small buffer for the
file-name when file name contains value '0'. (Eg. buffer0).
Fix digit calculation by returning one digit to be present for number
'0'.
Endpoints are normally deleted from the bandwidth list when they are
dropped, before the virt device is freed.
If xHC host is dying or being removed then the endpoints aren't dropped
cleanly due to functions returning early to avoid interacting with a
non-accessible host controller.
So check and delete endpoints that are still on the bandwidth list when
freeing the virt device.
Solves a list_del corruption kernel crash when unbinding xhci-pci,
caused by xhci_mem_cleanup() when it later tried to delete already freed
endpoints from the bandwidth list.
This only affects hosts that use software bandwidth checking, which
currenty is only the xHC in intel Panther Point PCH (Ivy Bridge)
Cc: stable@vger.kernel.org Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20221024142720.4122053-5-mathias.nyman@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Systems based on Alder Lake P see significant boot time delay if
boot firmware tries to control usb ports in unexpected link states.
This is seen with self-powered usb devices that survive in U3 link
suspended state over S5.
A more generic solution to power off ports at shutdown was attempted in
commit 83810f84ecf1 ("xhci: turn off port power in shutdown")
but it caused regression.
Add host specific XHCI_RESET_TO_DEFAULT quirk which will reset host and
ports back to default state in shutdown.
Originally the absence of the marvell,nand-keep-config property caused
the setup_data_interface function to be provided. However when
setup_data_interface was moved into nand_controller_ops the logic was
unintentionally inverted. Update the logic so that only if the
marvell,nand-keep-config property is present the bootloader NAND config
kept.
Cc: stable@vger.kernel.org Fixes: 7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to nand_controller_ops") Signed-off-by: Tony O'Brien <tony.obrien@alliedtelesis.co.nz> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220927024728.28447-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This appears to fix the error:
"xhci_hcd <address>; ERROR Transfer event TRB DMA ptr not part of
current TD ep_index 2 comp_code 13" that appear spuriously (or pretty
often) when using a r8152 USB3 ethernet adapter with integrated hub.
ASM1042 reports as a 0.96 controller, but appears to behave more like 1.0
Inspired by this email thread: https://markmail.org/thread/7vzqbe7t6du6qsw3
When port is connected and then disconnected, the state stays as
configured. Which is incorrect as the port is no longer configured,
but in a not attached state.
The gadget driver may have a certain expectation of how the request
completion flow should be from to its configuration. Make sure the
controller driver respect that. That is, don't set IMI (Interrupt on
Missed Isoc) when usb_request->no_interrupt is set. Also, the driver
should only set IMI to the last TRB of a chain.
When servicing a transfer completion event, the dwc3 driver will reclaim
TRBs of started requests up to the request associated with the interrupt
event. Currently we don't check for interrupt due to missed isoc, and
the driver may attempt to reclaim TRBs beyond the associated event. This
causes invalid memory access when the hardware still owns the TRB. If
there's a missed isoc TRB with IMI (interrupt on missed isoc), make sure
to stop servicing further.
Note that only the last TRB of chained TRBs has its status updated with
missed isoc.
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver") Cc: stable@vger.kernel.org Reported-by: Jeff Vanhoof <jdv1029@gmail.com> Reported-by: Dan Vacura <w36195@motorola.com> Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Reviewed-by: Jeff Vanhoof <jdv1029@gmail.com> Tested-by: Jeff Vanhoof <jdv1029@gmail.com> Link: https://lore.kernel.org/r/b29acbeab531b666095dfdafd8cb5c7654fbb3e1.1666735451.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NVIDIA Jetson devices in Force Recovery mode (RCM) do not support
suspending, ie. flashing fails if the device has been suspended. The
devices are still visible in lsusb and seem to work otherwise, making
the issue hard to debug. This has been discovered in various forum
posts, eg. [1].
The patch has been tested on NVIDIA Jetson AGX Xavier, but I'm adding
all the Jetson models listed in [2] on the assumption that they all
behave similarly.
With char becoming unsigned by default, and with `char` alone being
ambiguous and based on architecture, signed chars need to be marked
explicitly as such. This fixes warnings like:
With char becoming unsigned by default, and with `char` alone being
ambiguous and based on architecture, signed chars need to be marked
explicitly as such. This fixes warnings like:
Instead just use del_timer_sync() which will wait for the timer to finish
before continuing. No need to check if the timer is active or not when
doing so.
This doesn't fix the race of a possible re-arming of the timer, but at
least it won't use the data that has just been freed.
kvaser_usb uses completions to signal when a response event is received
for outgoing commands.
However, it uses init_completion() to reinitialize the start_comp and
stop_comp completions before sending the start/stop commands.
In case the device sends the corresponding response just before the
actual command is sent, complete() may be called concurrently with
init_completion() which is not safe.
This might be triggerable even with a properly functioning device by
stopping the interface (CMD_STOP_CHIP) just after it goes bus-off (which
also causes the driver to send CMD_STOP_CHIP when restart-ms is off),
but that was not tested.
Fix the issue by using reinit_completion() instead.
Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010185237.319219-2-extja@kvaser.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to call kfree_skb() from hardware interrupt context
or with interrupts being disabled. The skb is unlinked from the queue,
so it can be freed after spin_unlock_irqrestore().
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20221027091237.2290111-1-yangyingliang@huawei.com Cc: stable@vger.kernel.org
[mkl: adjust subject] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could select a unconnected socket wrongly for packets sent to the
connected socket.
However, the current way to set has_conns is illegal and possible to
trigger that problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater. Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.
For this reason, there is a race below where we fail to set has_conns
resulting in the wrong socket selection. To avoid the race, let's split
the reader and updater with proper locking.
Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]
For the record, usually, sk_reuseport_cb is changed under lock_sock().
The only exception is reuseport_grow() & TCP reqsk migration case.
1) shutdown() TCP listener, which is moved into the latter part of
reuse->socks[] to migrate reqsk.
2) New listen() overflows reuse->socks[] and call reuseport_grow().
3) reuse->max_socks overflows u16 with the new listener.
4) reuseport_grow() pops the old shutdown()ed listener from the array
and update its sk->sk_reuseport_cb as NULL without lock_sock().
shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
so likely(reuse) never be false in reuseport_has_conns_set().
Seth Jenkins [Thu, 27 Oct 2022 15:36:52 +0000 (11:36 -0400)]
mm: /proc/pid/smaps_rollup: fix no vma's null-deref
Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value
seq_file") introduced a null-deref if there are no vma's in the task in
show_smaps_rollup.
Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") Signed-off-by: Seth Jenkins <sethjenkins@google.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is
initialized") moves wbt_set_write_cache() before rq_qos_add(), which
is wrong because wbt_rq_qos() is still NULL.
Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc'
directly. Noted that this patch also remove the redundant setting of
'rab->wc'.
Some SD-cards from Sandisk that are SDA-6.0 compliant reports they supports
discard, while they actually don't. This might cause mk2fs to fail while
trying to format the card and revert it to a read-only mode.
To fix this problem, let's add a card quirk (MMC_QUIRK_BROKEN_SD_DISCARD)
to indicate that we shall fall-back to use the legacy erase command
instead.
Alexey reported that the fraction of unknown filename instances in
kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
to assembler defined symbols, which regressed as a result of:
commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
In that commit, I allude to restoring debug info for assembler defined
symbols in a follow up patch, but it seems I forgot to do so in
commit a66049e2cf0e ("Kbuild: make DWARF version a choice")
Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support -mabi=ms, breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.
This check was added by
8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
not actually used in any cflags.
According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well.
Clang supports __attribute__((ms_abi)) for all versions that are
supported for building the kernel so no additional check is needed.
Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
selected when CONFIG_EFI is enabled, regardless of compiler.
Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c91d82c444d
[nathan: Fix conflict due to lack of c6dbd3e5e69c in older trees] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit is very different from the upstream commit! It fixes the same
issue by adding more quirks, rather then the general fix from the 6.1
kernel, because the general fix from the 6.1 kernel is part of a larger
refactoring of the backlight code which is not suitable for the stable
series.
As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U??
acpi_backlight=native quirks" (10212754a0d2) the upstream commit "ACPI:
video: Make backlight class device registration a separate step (v2)"
(3dbc80a3e4c5) makes these quirks unnecessary. However as mentioned in this
bugtracker ticket https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17
the upstream fix is part of a larger patchset that is overall too complex
for stable.
The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO
Stellaris/Polaris Gen 1-4, have the same problem as the Clevo NL5xRU and
NL5xNU / TUXEDO Aura 15 Gen1 and Gen2:
They have a working native and video interface for screen backlight.
However the default detection mechanism first registers the video interface
before unregistering it again and switching to the native interface during
boot. This results in a dangling SBIOS request for backlight change for
some reason, causing the backlight to switch to ~2% once per boot on the
first power cord connect or disconnect event. Setting the native interface
explicitly circumvents this buggy behaviour by avoiding the unregistering
process.
Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During vm boot, there might be possibility that vf registration
call comes before the vf association from host to vm.
And this might break netvsc vf path, To prevent the same block
vf registration until vf bind message comes from host.
Cc: stable@vger.kernel.org Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The requirement for 64-bit address filters is that they are canonical
addresses. In other respects any address range is allowed which would
include user space addresses.
That can be useful for tracing virtual machine guests because address
filtering can be used to advantage in place of current privilege level
(CPL) filtering.
RISC-V has no sane defaults to fall back on where there is no cpu-map
in the devicetree.
Without sane defaults, the package, core and thread IDs are all set to
-1. This causes user-visible inaccuracies for tools like hwloc/lstopo
which rely on the sysfs cpu topology files to detect a system's
topology.
On a PolarFire SoC, which should have 4 harts with a thread each,
lstopo currently reports:
arm64's method of defining a default cpu topology requires only minimal
changes to apply to RISC-V also. The current arm64 implementation exits
early in a uniprocessor configuration by reading MPIDR & claiming that
uniprocessor can rely on the default values.
This is appears to be a hangover from prior to '3102bc0e6ac7 ("arm64:
topology: Stop using MPIDR for topology information")', because the
current code just assigns default values for multiprocessor systems.
With the MPIDR references removed, store_cpu_topolgy() can be moved to
the common arch_topology code.
The modem firmware memory requirements vary between 32M/140M on
no-lte/lte skus respectively, so fixup the modem memory region
to reflect the requirements.
Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Link: https://lore.kernel.org/r/1602786476-27833-1-git-send-email-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Alex Elder <elder@linaro.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
========================================================
WARNING: possible irq lock inversion dependency detected
5.12.0-syzkaller #0 Not tainted
--------------------------------------------------------
syz-executor132/8391 just changed the state of lock: ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline] ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&dev->event_lock){-...}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
hierarchy.
Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
with read_lock_irq/read_unlock_irq.
Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently there is no way to differentiate the file with alive owner
from the file with dead owner but pid of the owner reused. That's why
CRIU can't actually know if it needs to restore file owner or not,
because if it restores owner but actual owner was dead, this can
introduce unexpected signals to the "false"-owner (which reused the
pid).
Let's change the api, so that F_GETOWN(EX) returns 0 in case actual
owner is dead already. This comports with the POSIX spec, which
states that a PID of 0 indicates that no signal will be sent.
Cc: Jeff Layton <jlayton@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: f671a691e299 ("fcntl: fix potential deadlocks for &fown_struct.lock") Signed-off-by: Sasha Levin <sashal@kernel.org>
If the kernel exposes a new perf_event_attr field in a format attr, perf
will return an error stating the specified PMU can't be found. For
example, a format attr with 'config3:0-63' causes an error as config3 is
unknown to perf. This causes a compatibility issue between a newer
kernel with older perf tool.
Before this change with a kernel adding 'config3' I get:
$ perf record -e arm_spe// -- true
event syntax error: 'arm_spe//'
\___ Cannot find PMU `arm_spe'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list
available events
After this change, I get:
$ perf record -e arm_spe// -- true
WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.091 MB perf.data ]
To support unknown configN formats, rework the YACC implementation to
pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
warning.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The kernel supported bits have been exported through
/sys/devices/<pmu>/format/. Perf collects the information to
'struct perf_pmu_format' and links it to 'pmu->format' list.
The 'struct perf_pmu_format' has a bitmap which records the
valid bits for this format. For example,
# ./perf stat -e cpu/rf01234,name=aaa/,cpu/r031234,name=bbb/ -a -- sleep 1
WARNING: event 'aaa' not valid (bits 20,22 of config 'f01234' not supported by kernel)!
WARNING: event 'bbb' not valid (bits 16-17 of config '31234' not supported by kernel)!
Co-developed-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210310051138.12154-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: e552b7be12ed ("perf: Skip and warn on unknown format 'configN' attrs") Signed-off-by: Sasha Levin <sashal@kernel.org>
When CONFIG_CMDLINE_FORCE is enabled, cmdline provided by
CONFIG_CMDLINE are always used. This allows CONFIG_CMDLINE to be
used regardless of the result of device tree scanning.
This especially fixes the case where a device tree without the
chosen node is supplied to the kernel. In such cases,
early_init_dt_scan would return true. But inside
early_init_dt_scan_chosen, the cmdline won't be updated as there
is no chosen node in the device tree. As a result, CONFIG_CMDLINE
is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE
is enabled. This commit allows properly update boot_command_line
in this situation.
Fixes: 8fd6e05c7463 ("arch: riscv: support kernel command line forcing when no DTB passed") Signed-off-by: Wenting Zhang <zephray@outlook.com> Reviewed-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/PSBPR04MB399135DFC54928AB958D0638B1829@PSBPR04MB3991.apcprd04.prod.outlook.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ensure tegra_host member "curr_clk_rate" holds the actual clock rate
instead of requested clock rate for proper use during tuning correction
algorithm. Actual clk rate may not be the same as the requested clk
frequency depending on the parent clock source set. Tuning correction
algorithm depends on certain parameters which are sensitive to current
clk rate. If the host clk is selected instead of the actual clock rate,
tuning correction algorithm may end up applying invalid correction,
which could result in errors
Prior to this commit, the gntdev driver code did not handle the
following scenario correctly with paravirtualized (PV) Xen domains:
* User process sets up a gntdev mapping composed of two grant mappings
(i.e., two pages shared by another Xen domain).
* User process munmap()s one of the pages.
* User process munmap()s the remaining page.
* User process exits.
In the scenario above, the user process would cause the kernel to log
the following messages in dmesg for the first munmap(), and the second
munmap() call would result in similar log messages:
For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG)
would print out the following and trigger a general protection fault in
the affected Xen PV domain:
(XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
(XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
As of this writing, gntdev_grant_map structure's vma field (referred to
as map->vma below) is mainly used for checking the start and end
addresses of mappings. However, with split VMAs, these may change, and
there could be more than one VMA associated with a gntdev mapping.
Hence, remove the use of map->vma and rely on map->pages_vm_start for
the original start address and on (map->count << PAGE_SHIFT) for the
original mapping size. Let the invalidate() and find_special_page()
hooks use these.
Also, given that there can be multiple VMAs associated with a gntdev
mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to
the end of gntdev_put_map, so that the MMU notifier is only removed
after the closing of the last remaining VMA.
Finally, use an atomic to prevent inadvertent gntdev mapping re-use,
instead of using the map->live_grants atomic counter and/or the map->vma
pointer (the latter of which is now removed). This prevents the
userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the
same address range as a previously set up gntdev mapping. This scenario
can be summarized with the following call-trace, which was valid prior
to this commit:
mmap
gntdev_mmap
mmap (repeat mmap with MAP_FIXED over the same address range)
gntdev_invalidate
unmap_grant_pages (sets 'being_removed' entries to true)
gnttab_unmap_refs_async
unmap_single_vma
gntdev_mmap (maps the shared pages again)
munmap
gntdev_invalidate
unmap_grant_pages
(no-op because 'being_removed' entries are true)
unmap_single_vma (For PV domains, Xen reports that a granted page
is being unmapped and triggers a general protection fault in the
affected domain, if Xen was built with CONFIG_DEBUG)
The fix for this last scenario could be worth its own commit, but we
opted for a single commit, because removing the gntdev_grant_map
structure's vma field requires guarding the entry to gntdev_mmap(), and
the live_grants atomic counter is not sufficient on its own to prevent
the mmap() over a pre-existing mapping.
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated") Reported-by: Ross Zwisler <zwisler@kernel.org> Tested-by: Ross Zwisler <zwisler@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Two conditional compilation directives "#ifdef CONFIG_TRACER_MAX_TRACE"
are used consecutively, and no other code in between. Simplify conditional
the compilation code and only use one "#ifdef CONFIG_TRACER_MAX_TRACE".
Driver registration fails on SOC imx8mn as its supplier, the clock
control module, is probed later than subsys initcall level. This driver
uses platform_driver_probe which is not compatible with deferred probing
and won't be probed again later if probe function fails due to clock not
being available at that time.
This patch replaces the use of platform_driver_probe with
platform_driver_register which will allow probing the driver later again
when the clock control module will be available.
The __init annotation has been dropped because it is not compatible with
deferred probing. The code is not executed once and its memory cannot be
freed.
Fixes: a580b8c5429a ("dmaengine: mxs-dma: add dma support for i.MX23/28") Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220921170556.1055962-1-dario.binacchi@amarulasolutions.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Make virtio_gpu_plane_cleanup_fb() to clean the state which DRM core
wants to clean up and not the current plane's state. Normally the older
atomic state is cleaned up, but the newer state could also be cleaned up
in case of aborted commits.
A splat from kmem_cache_destroy() was seen with a kernel prior to
commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool")
when there was a failure in init_dmars(), because the iommu_domain
cache still had objects. While the mempool code is now gone, there
still is a leak of the si_domain memory if init_dmars() fails. So
clean up si_domain in the init_dmars() error path.
Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
Reserved region, but it seems like it should accept an NVS region as
well. The ACPI spec
https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
uses similar wording for "Reserved" and "NVS" region types; for NVS
regions it says "This range of addresses is in use or reserved by the
system and must not be used by the operating system."
There is an old comment on this mailing list that also suggests NVS
regions should pass the arch_rmrr_sanity_check() test:
The warnings come from arch_rmrr_sanity_check() since it checks whether
the region is E820_TYPE_RESERVED. However, if the purpose of the check
is to detect RMRR has regions that may be used by OS as free memory,
isn't E820_TYPE_NVS safe, too?
This patch overlaps with another proposed patch that would add the region
type to the log since sometimes the bug reporter sees this log on the
console but doesn't know to include the kernel log:
Here's an example of the "Firmware Bug" apparent false positive (wrapped
for line length):
DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
[0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
fixes
DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
[0x000000006f760000-0x000000006f762fff]
If the cable is disconnected the PHY seems to toggle between MDI and
MDI-X modes. With the MDI crossover status interrupt active this causes
roughly 10 interrupts per second.
As the crossover status isn't checked by the driver, the interrupt can
be disabled to reduce the interrupt load.
Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission") Signed-off-by: Felix Riemann <felix.riemann@sma.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We had one syzbot report [1] in syzbot queue for a while.
I was waiting for more occurrences and/or a repro but
Dmitry Vyukov spotted the issue right away.
<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>
Indeed, RCU rules are clear when replacing a data structure.
The visible pointer (dev->qdisc in this case) must be updated
to the new object _before_ RCU grace period is started
(qdisc_put(old) in this case).
[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000, ffff88802065e400)
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().
Filters on different vports are qualified by different implicit MACs and/or
VLANs, so shouldn't be considered equal even if their other match fields
are identical.
Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10") Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the default qdisc is sfb, if the qdisc of dev_queue fails to be
inited during mqprio_init(), sfb_reset() is invoked to clear resources.
In this case, the q->qdisc is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
sfb_init()
tcf_block_get() --->failed, q->qdisc is NULL
...
qdisc_put()
...
sfb_reset()
qdisc_reset(q->qdisc) --->q->qdisc is NULL
ops = qdisc->ops
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
RIP: 0010:qdisc_reset+0x2b/0x6f0
Call Trace:
<TASK>
sfb_reset+0x37/0xd0
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2164122d04
</TASK>
Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog
_after_ calling qdisc->ops->reset. There is no need to clear them
again in the specific reset function.
When the default qdisc is cake, if the qdisc of dev_queue fails to be
inited during mqprio_init(), cake_reset() is invoked to clear
resources. In this case, the tins is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
cake_init()
q->tins = kvcalloc(...) --->failed, q->tins is NULL
...
qdisc_put()
...
cake_reset()
...
cake_dequeue_one()
b = &q->tins[...] --->q->tins is NULL
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:cake_dequeue_one+0xc9/0x3c0
Call Trace:
<TASK>
cake_reset+0xb1/0x140
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f89e5122d04
</TASK>
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has
caused a regression on our platform.
It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption. In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage. After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In case of the NVME SMART log getting the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping. Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbour data to be discarded. The
neighbouring data turned to be the data surrounding the buffer in the
framework of the nvme_hwmon_data structure.
In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.
Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.
An NVMe controller works perfectly fine even when the hwmon
initialization fails. Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.
The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().
Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: c94b7f9bab22 ("nvme-hwmon: kmalloc the NVME SMART log buffer") Signed-off-by: Sasha Levin <sashal@kernel.org>
The original design to use device-managed resource allocation
doesn't really work as the NVMe controller has a vastly different
lifetime than the hwmon sysfs attributes, causing warning about
duplicate sysfs entries upon reconnection.
This patch reworks the hwmon allocation to avoid device-managed
resource allocation, and uses the NVMe controller as parent for
the sysfs attributes.
Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: c94b7f9bab22 ("nvme-hwmon: kmalloc the NVME SMART log buffer") Signed-off-by: Sasha Levin <sashal@kernel.org>
It's possible that the driver will dereference a qcq that doesn't exist
when calling ionic_reconfigure_queues(), which causes a page fault BUG.
If a reduction in the number of queues is followed by a different
reconfig such as changing the ring size, the driver can hit a NULL
pointer when trying to clean up non-existent queues.
Fix this by checking to make sure both the qcqs array and qcq entry
exists bofore trying to use and free the entry.
Fixes: 101b40a0171f ("ionic: change queue count with no reset") Signed-off-by: Brett Creeley <brett@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.io Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If not flock, before return -ENOLCK, should free the xid,
otherwise, the xid will be leaked.
Fixes: d0677992d2af ("cifs: add support for flock") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When RX strap in HW is not set to MODE 3 or 4, bit 7 and 8 in CF4
register should be set. The former is already handled in
dp83867_config_init; add the latter in SGMII specific initialization.
Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Harini Katakam <harini.katakam@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Then the input contains '\0' or '\n', proc_mpc_write has read them,
so the return value needs +1.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Changing a VF's mac address through the VF (rather than via the PF)
fails with EPERM because the latter part of efx_ef10_set_mac_address
attempts to change the vport mac address list as the VF.
Even with this fixed it still fails with EBUSY because the vadaptor
is still assigned on the VF - the vadaptor reassignment must be within
a section where the VF has torn down its state.
A major reason this has broken is because we have two functions that
ostensibly do the same thing - have a PF and VF cooperate to change a
VF mac address. Rather than do this, if we are changing the mac of a VF
that has a link to the PF in the same VM then simply call
sriov_set_vf_mac instead, which is a proven working function that does
that.
If there is no PF available, or that fails non-fatally, then attempt to
change the VF's mac address as we would a PF, without updating the PF's
data.
Test case:
Create a VF:
echo 1 > /sys/class/net/<if>/device/sriov_numvfs
Set the mac address of the VF directly:
ip link set <vf> addr 00:11:22:33:44:55
Set the MAC address of the VF via the PF:
ip link set <pf> vf 0 mac 00:11:22:33:44:66
Without this patch the last command will fail with ENOENT.
Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com> Reported-by: Íñigo Huguet <ihuguet@redhat.com> Fixes: 910c8789a777 ("set the MAC address using MC_CMD_VADAPTOR_SET_MAC") Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Under certain conditions the Magic Trackpad can group 2 reports in a
single packet. The packet is split and the raw event function is
invoked recursively for each part.
However, after processing each part, the BTN_MOUSE status is updated,
sending multiple click events. [1]
Return after processing double reports to avoid this issue.
This was caused because of new buffers with different RX ring count should
substitute older ones, but those buffers were freed in
i40e_configure_rx_ring and reallocated again with i40e_alloc_rx_bi,
thus kfree on rx_bi caused leak of already mapped DMA.
Fix this by reallocating ZC with rx_bi_zc struct when BPF program loads. Additionally
reallocate back to rx_bi when BPF program unloads.
If BPF program is loaded/unloaded and XSK pools are created, reallocate
RX queues accordingly in XSP_SETUP_XSK_POOL handler.
Fixes: be1222b585fd ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings") Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use a 8-byte write to initialize sub.usr_handle in
tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized
when issuing setsockopt(..., SOL_TIPC, ...).
This resulted in an infoleak reported by KMSAN when the packet was
received:
Local variable sub created at:
tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562
tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190
Bytes 84-87 of 88 are uninitialized
Memory access of size 88 starts at ffff88801ed57cd0
Data copied to user address 0000000020000400
...
=====================================================
Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The trial period exists until jiffies is after addr_trial_end. But as
jiffies will eventually overflow, just using time_after will eventually
give incorrect results. As the node address is set once the trial period
ends, this can be used to know that we are not in the trial period.
Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
During backref walking, when processing a delayed reference with a type of
BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there:
1) We are accessing the delayed references extent_op, and its key, without
the protection of the delayed ref head's lock;
2) If there's no extent op for the delayed ref head, we end up with an
uninitialized key in the stack, variable 'tmp_op_key', and then pass
it to add_indirect_ref(), which adds the reference to the indirect
refs rb tree.
This is wrong, because indirect references should have a NULL key
when we don't have access to the key, and in that case they should be
added to the indirect_missing_keys rb tree and not to the indirect rb
tree.
This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting
from freeing an extent buffer, therefore with a count of -1, it will
not cancel out the corresponding reference we have in the extent tree
(with a count of 1), since both references end up in different rb
trees.
When using fiemap, where we often need to check if extents are shared
through shared subtrees resulting from snapshots, it means we can
incorrectly report an extent as shared when it's no longer shared.
However this is temporary because after the transaction is committed
the extent is no longer reported as shared, as running the delayed
reference results in deleting the tree block reference from the extent
tree.
Outside the fiemap context, the result is unpredictable, as the key was
not initialized but it's used when navigating the rb trees to insert
and search for references (prelim_ref_compare()), and we expect all
references in the indirect rb tree to have valid keys.
The following reproducer triggers the second bug:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV
mount -o compress $DEV $MNT
# With a compressed 128M file we get a tree height of 2 (level 1 root).
xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo
btrfs subvolume snapshot $MNT $MNT/snap
# Fiemap should output 0x2008 in the flags column.
# 0x2000 means shared extent
# 0x8 means encoded extent (because it's compressed)
echo
echo "fiemap after snapshot, range [120M, 120M + 128K):"
xfs_io -c "fiemap -v 120M 128K" $MNT/foo
echo
# Overwrite one extent and fsync to flush delalloc and COW a new path
# in the snapshot's tree.
#
# After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type
# BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent
# buffer in the path.
#
# In the extent tree we have inline references of type
# BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent
# buffers, so they should cancel each other, and the extent buffers in
# the fs tree should no longer be considered as shared.
#
echo "Overwriting file range [120M, 120M + 128K)..."
xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo
xfs_io -c "fsync" $MNT/snap/foo
# Fiemap should output 0x8 in the flags column. The extent in the range
# [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs
# tree.
echo
echo "fiemap after overwrite range [120M, 120M + 128K):"
xfs_io -c "fiemap -v 120M 128K" $MNT/foo
echo
umount $MNT
Running it before this patch:
$ ./test.sh
(...)
wrote 134217728/134217728 bytes at offset 0
128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec)
Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'
fiemap after snapshot, range [120M, 120M + 128K):
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [245760..246015]: 34304..34559 256 0x2008
Overwriting file range [120M, 120M + 128K)...
wrote 131072/131072 bytes at offset 125829120
128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec)
fiemap after overwrite range [120M, 120M + 128K):
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [245760..246015]: 34304..34559 256 0x2008
The extent in the range [120M, 120M + 128K) is still reported as shared
(0x2000 bit set) after overwriting that range and flushing delalloc, which
is not correct - an entire path was COWed in the snapshot's tree and the
extent is now only referenced by the original fs tree.
Running it after this patch:
$ ./test.sh
(...)
wrote 134217728/134217728 bytes at offset 0
128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec)
Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'
fiemap after snapshot, range [120M, 120M + 128K):
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [245760..246015]: 34304..34559 256 0x2008
Overwriting file range [120M, 120M + 128K)...
wrote 131072/131072 bytes at offset 125829120
128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec)
fiemap after overwrite range [120M, 120M + 128K):
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [245760..246015]: 34304..34559 256 0x8
Now the extent is not reported as shared anymore.
So fix this by passing a NULL key pointer to add_indirect_ref() when
processing a delayed reference for a tree block if there's no extent op
for our delayed ref head with a defined key. Also access the extent op
only after locking the delayed ref head's lock.
The reproducer will be converted later to a test case for fstests.
Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When processing delayed data references during backref walking and we are
using a share context (we are being called through fiemap), whenever we
find a delayed data reference for an inode different from the one we are
interested in, then we immediately exit and consider the data extent as
shared. This is wrong, because:
1) This might be a DROP reference that will cancel out a reference in the
extent tree;
2) Even if it's an ADD reference, it may be followed by a DROP reference
that cancels it out.
In either case we should not exit immediately.
Fix this by never exiting when we find a delayed data reference for
another inode - instead add the reference and if it does not cancel out
other delayed reference, we will exit early when we call
extent_is_shared() after processing all delayed references. If we find
a drop reference, then signal the code that processes references from
the extent tree (add_inline_refs() and add_keyed_refs()) to not exit
immediately if it finds there a reference for another inode, since we
have delayed drop references that may cancel it out. In this later case
we exit once we don't have references in the rb trees that cancel out
each other and have two references for different inodes.
Running it before this patch, the extent is still listed as shared, it has
the flag 0x2000 (FIEMAP_EXTENT_SHARED) set:
$ ./test-2.sh
fiemap after cloning:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x2001
fiemap after removing file bar:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x2001
After this patch, after deleting bar in both tests, the extent is not
reported with the 0x2000 flag anymore, it gets only the flag 0x1
(which is FIEMAP_EXTENT_LAST):
$ ./test-1.sh
fiemap after cloning:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x2001
fiemap after removing file bar:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x1
$ ./test-2.sh
fiemap after cloning:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x2001
fiemap after removing file bar:
/mnt/sdj/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 26624..26751 128 0x1
These tests will later be converted to a test case for fstests.
Fixes: dc046b10c8b7d4 ("Btrfs: make fiemap not blow when you have lots of snapshots") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as
a broken CDC device by default. Add the custom Lenovo PID to the r8152
driver to support it properly.
Also, systems compatible with this dock provide a BIOS option to enable
MAC address passthrough (as per Lenovo document "ThinkPad Docking
Solutions 2017"). Add the custom PID to the MAC passthrough list too.
Tested on a ThinkPad 13 1st gen with the expected results:
passthrough disabled: Invalid header when reading pass-thru MAC addr
passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX
Signed-off-by: Jean-Francois Le Fillatre <jflf_kernel@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cortex-A57 and Cortex-A72 have an erratum where an interrupt that
occurs between a pair of AES instructions in aarch32 mode may corrupt
the ELR. The task will subsequently produce the wrong AES result.
The AES instructions are part of the cryptographic extensions, which are
optional. User-space software will detect the support for these
instructions from the hwcaps. If the platform doesn't support these
instructions a software implementation should be used.
Remove the hwcap bits on affected parts to indicate user-space should
not use the AES instructions.
Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon <will@kernel.org>
[florian: removed arch/arm64/tools/cpucaps and fixup cpufeature.c] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io
add_disk
blk_register_queue
wbt_enable_default
wbt_init
rq_qos_add
// wb_normal is still 0
/*
* in mainline, disk can't be opened before
* bdev_add(), however, in old kernels, disk
* can be opened before blk_register_queue().
*/
blkdev_issue_flush
// disk size is 0, however, it's not checked
submit_bio_wait
submit_bio
blk_mq_submit_bio
rq_qos_throttle
wbt_wait
bio_to_wbt_flags
rwb_enabled
// wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos);
wbt_update_limits
// wb_normal is initialized
rq_qos_track
wbt_track
rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
// wb_normal is not 0,wbt_flags will be set
t3: io completion
blk_mq_free_request
rq_qos_done
wbt_done
wbt_is_tracked
// return true
__wbt_done
wbt_rqw_done
atomic_dec_return(&rqw->inflight);
// inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.