ppp_cp_event is called directly or indirectly by ppp_rx with "ppp->lock"
held. It may call mod_timer to add a new timer. However, at the same time
ppp_timer may be already running and waiting for "ppp->lock". In this
case, there's no need for ppp_timer to continue running and it can just
exit.
If we let ppp_timer continue running, it may call add_timer. This causes
kernel panic because add_timer can't be called with a timer pending.
This patch fixes this problem.
Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.") Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.
For example, ECT(0) and ECT(1) packets could be treated differently.
$ ip netns add ns0
$ ip netns add ns1
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev lo up
$ ip -netns ns1 link set dev lo up
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10/24 dev veth01
$ ip -netns ns1 address add 192.0.2.11/24 dev veth10
$ ip -netns ns1 address add 192.0.2.21/32 dev lo
$ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
$ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0
With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):
$ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
[...]
64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms
But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:
$ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
[...]
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms
After this patch the ECN bits don't affect the result anymore:
$ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
[...]
64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms
Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The CPTS driver registers PTP PHC clock when first netif is going up and
unregister it when all netif are down. Now ethtool will show:
- PTP PHC clock index 0 after boot until first netif is up;
- the last assigned PTP PHC clock index even if PTP PHC clock is not
registered any more after all netifs are down.
This patch ensures that -1 is returned by ethtool when PTP PHC clock is not
registered any more.
Two race conditions can be triggered when storing xps cpus, resulting in
various oops and invalid memory accesses:
1. Calling netdev_set_num_tc while netif_set_xps_queue:
- netif_set_xps_queue uses dev->tc_num as one of the parameters to
compute the size of new_dev_maps when allocating it. dev->tc_num is
also used to access the map, and the compiler may generate code to
retrieve this field multiple times in the function.
- netdev_set_num_tc sets dev->tc_num.
If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
is set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps; triggering an oops.
2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
2.1. netdev_set_num_tc starts by resetting the xps queues,
dev->tc_num isn't updated yet.
2.2. netif_set_xps_queue is called, setting up the map with the
*old* dev->num_tc.
2.3. netdev_set_num_tc updates dev->tc_num.
2.4. Later accesses to the map lead to out of bound accesses and
oops.
A similar issue can be found with netdev_reset_tc.
One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_cpus in a concurrent thread. With the right timing an oops is
triggered.
Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_cpus_store.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When aggregating ncsi interfaces and dedicated interfaces to bond
interfaces, the ncsi response handler will use the wrong net device to
find ncsi_dev, so that the ncsi interface will not work properly.
Here, we use the original net device to fix it.
Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> Link: https://lore.kernel.org/r/20201223055523.2069-1-wangzhiqiang.bj@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
virtnet_set_channels can recursively call cpus_read_lock if CONFIG_XPS
and CONFIG_HOTPLUG are enabled.
The path is:
virtnet_set_channels - calls get_online_cpus(), which is a trivial
wrapper around cpus_read_lock()
netif_set_real_num_tx_queues
netif_reset_xps_queues_gt
netif_reset_xps_queues - calls cpus_read_lock()
This call chain and potential deadlock happens when the number of TX
queues is reduced.
This commit the removes netif_set_real_num_[tr]x_queues calls from
inside the get/put_online_cpus section, as they don't require that it
be held.
Fixes: 47be24796c13 ("virtio-net: fix the set affinity bug when CPU IDs are not consecutive") Signed-off-by: Jeff Dike <jdike@akamai.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20201223025421.671-1-jdike@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IPIP tunnels packets are unknown to device,
hence these packets are incorrectly parsed and
caused the packet corruption, so disable offlods
for such packets at run time.
All the buffers and registers are already set up appropriately for an
MTU slightly above 1500, so we just need to expose this to the
networking stack. AFAICT, there's no need to implement .ndo_change_mtu
when the receive buffers are always set up to support the max_mtu.
This fixes several warnings during boot on our mpc8309-board with an
embedded mv88e6250 switch:
mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU 1500 on port 0
...
mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU 1500 on port 4
ucc_geth e0102000.ethernet eth1: error -22 setting MTU to 1504 to include DSA overhead
The last line explains what the DSA stack tries to do: achieving an MTU
of 1500 on-the-wire requires that the master netdevice connected to
the CPU port supports an MTU of 1500+the tagging overhead.
Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ugeth is the netdiv_priv() part of the netdevice. Accessing the memory
pointed to by ugeth (such as done by ucc_geth_memclean() and the two
of_node_puts) after free_netdev() is thus use-after-free.
Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
RQF_PREEMPT is used for two different purposes in the legacy IDE code:
1. To mark power management requests.
2. To mark requests that should preempt another request. An (old)
explanation of that feature is as follows: "The IDE driver in the Linux
kernel normally uses a series of busywait delays during its
initialization. When the driver executes these busywaits, the kernel
does nothing for the duration of the wait. The time spent in these
waits could be used for other initialization activities, if they could
be run concurrently with these waits.
More specifically, busywait-style delays such as udelay() in module
init functions inhibit kernel preemption because the Big Kernel Lock is
held, while yielding APIs such as schedule_timeout() allow
preemption. This is true because the kernel handles the BKL specially
and releases and reacquires it across reschedules allowed by the
current thread.
This IDE-preempt specification requires that the driver eliminate these
busywaits and replace them with a mechanism that allows other work to
proceed while the IDE driver is initializing."
Since I haven't found an implementation of (2), do not set the PREEMPT flag
for sense requests. This patch causes sense requests to be postponed while
a drive is suspended instead of being submitted to ide_queue_rq().
If it would ever be necessary to restore the IDE PREEMPT functionality,
that can be done by introducing a new flag in struct ide_request.
Link: https://lore.kernel.org/r/20201209052951.16136-4-bvanassche@acm.org Cc: David S. Miller <davem@davemloft.net> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The expectation for suspend-to-disk is that devices will be powered-off, so
the UFS device should be put in PowerDown mode. If spm_lvl is not 5, then
that will not happen. Change the pm callbacks to force spm_lvl 5 for
suspend-to-disk poweroff.
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
mwifiex_cmd_802_11_ad_hoc_start() calls memcpy() without checking
the destination size may trigger a buffer overflower,
which a local user could use to cause denial of service
or the execution of arbitrary code.
Fix it by putting the length check before calling memcpy().
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp() assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv() data.
This data is allocated with kzalloc() so no data can leak apart from
previous readings.
The explicit alignment of ts is not necessary in this case but
does make the code slightly less fragile so I have included it.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable array in the iio_priv() data with alignment
explicitly requested. This data is allocated with kzalloc() so no
data can leak apart from previous readings.
In this driver, depending on which channels are enabled, the timestamp
can be in a number of locations. Hence we cannot use a structure
to specify the data layout without it being misleading.
Fixes: 77c4ad2d6a9b ("iio: imu: Add initial support for Bosch BMI160") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Cc: Daniel Baluta <daniel.baluta@gmail.com> Cc: Daniel Baluta <daniel.baluta@oss.nxp.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200920112742.170751-6-jic23@kernel.org
[sudip: adjust context and use bmi160_data in old location] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If emergency system shutdown is called, like by thermal shutdown,
a dm device could be alive when the block device couldn't process
I/O requests anymore. In this state, the handling of I/O errors
by new dm I/O requests or by those already in-flight can lead to
a verity corruption state, which is a misjudgment.
So, skip verity work in response to I/O error when system is shutting
down.
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The PCM hw_params core function tries to clear up the PCM buffer
before actually using for avoiding the information leak from the
previous usages or the usage before a new allocation. It performs the
memset() with runtime->dma_bytes, but this might still leave some
remaining bytes untouched; namely, the PCM buffer size is aligned in
page size for mmap, hence runtime->dma_bytes doesn't necessarily cover
all PCM buffer pages, and the remaining bytes are exposed via mmap.
This patch changes the memory clearance to cover the all buffer pages
if the stream is supposed to be mmap-ready (that guarantees that the
buffer size is aligned in page size).
Apparently there has been a longstanding race between udev/systemd and
the module loader. Currently, the module loader sends a uevent right
after sysfs initialization, but before the module calls its init
function. However, some udev rules expect that the module has
initialized already upon receiving the uevent.
This race has been triggered recently (see link in references) in some
systemd mount unit files. For instance, the configfs module creates the
/sys/kernel/config mount point in its init function, however the module
loader issues the uevent before this happens. sys-kernel-config.mount
expects to be able to mount /sys/kernel/config upon receipt of the
module loading uevent, but if the configfs module has not called its
init function yet, then this directory will not exist and the mount unit
fails. A similar situation exists for sys-fs-fuse-connections.mount, as
the fuse sysfs mount point is created during the fuse module's init
function. If udev is faster than module initialization then the mount
unit would fail in a similar fashion.
To fix this race, delay the module KOBJ_ADD uevent until after the
module has finished calling its init routine.
I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
The on-disk quota format supports quota files with upto 2^32 blocks. Be
careful when computing quota file offsets in the quota files from block
numbers as they can overflow 32-bit types. Since quota files larger than
4GB would require ~26 millions of quota users, this is mostly a
theoretical concern now but better be careful, fuzzers would find the
problem sooner or later anyway...
Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a module fails to load due to an error in prepare_coming_module(),
the following error handling in load_module() runs with
MODULE_STATE_COMING in module's state. Fix it by correctly setting
MODULE_STATE_GOING under "bug_cleanup" label.
When clk_hw_register_fixed_rate_with_accuracy() fails,
clk_data should be freed. It's the same for the subsequent
two error paths, but we should also unregister the already
registered clocks in them.
The snd_seq_queue struct contains various flags in the bit fields.
Those are categorized to two different use cases, both of which are
protected by different spinlocks. That implies that there are still
potential risks of the bad operations for bit fields by concurrent
accesses.
For addressing the problem, this patch rearranges those flags to be
a standard bool instead of a bit field.
As reported on:
https://lore.kernel.org/linux-media/20190627222020.45909-1-willemdebruijn.kernel@gmail.com/
if gp8psk_usb_in_op() returns an error, the status var is not
initialized. Yet, this var is used later on, in order to
identify:
- if the device was already started;
- if firmware has loaded;
- if the LNBf was powered on.
Using status = 0 seems to ensure that everything will be
properly powered up.
So, instead of the proposed solution, let's just set
status = 0.
Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A kernel-infoleak was reported by syzbot, which was caused because
dbells was left uninitialized.
Using kzalloc() instead of kmalloc() fixes this issue.
Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).
This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.
Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:
And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:
and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>.
The reason is to avoid indirect <linux/sysinfo.h> include when using
some network headers: <linux/netlink.h> or others -> <linux/kernel.h>
-> <linux/sysinfo.h>.
This indirect include causes on MUSL redefinition of struct sysinfo when
included both <sys/sysinfo.h> and some of UAPI headers:
In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5,
from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5,
from ../include/tst_netlink.h:14,
from tst_crypto.c:13:
x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo'
struct sysinfo {
^~~~~~~
In file included from ../include/tst_safe_macros.h:15,
from ../include/tst_test.h:93,
from tst_crypto.c:11:
x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here
fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.
Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.
The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:
When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.
For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:
The driver must not call tty_wakeup() while holding its private lock as
line disciplines are allowed to call back into write() from
write_wakeup(), leading to a deadlock.
Also remove the unneeded work struct that was used to defer wakeup in
order to work around a possible race in ancient times (see comment about
n_tty write_chan() in commit 14b54e39b412 ("USB: serial: remove
changelogs and old todo entries")).
For an LCU update a read unit address configuration IO is required.
This is started using sleep_on(), which has early exit paths in case the
device is not usable for IO. For example when it is in offline processing.
In those cases the LCU update should fail and not be retried.
Therefore lcu_update_work checks if EOPNOTSUPP is returned or not.
Commit 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration")
accidentally removed the EOPNOTSUPP return code from
read_unit_address_configuration(), which in turn might lead to an endless
loop of the LCU update in offline processing.
Fix by returning EOPNOTSUPP again if the device is not able to perform the
request.
Fixes: 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration") Cc: stable@vger.kernel.org #5.3 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case an error occurs in vfio_pci_enable() before the call to
vfio_pci_probe_mmaps(), vfio_pci_disable() will try to iterate
on an uninitialized list and cause a kernel panic.
Lets move to the initialization to vfio_pci_probe() to fix the
issue.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Fixes: 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") CC: Stable <stable@vger.kernel.org> # v4.7+ Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We've seen memory.stat reads in top-level cgroups take up to fourteen
seconds during a userspace bug that created tens of thousands of ghost
cgroups pinned by lingering page cache.
Even with a more reasonable number of cgroups, aggregating memory.stat
is unnecessarily heavy. The complexity is this:
nr_cgroups * nr_stat_items * nr_possible_cpus
where the stat items are ~70 at this point. With 128 cgroups and 128
CPUs - decent, not enormous setups - reading the top-level memory.stat
has to aggregate over a million per-cpu counters. This doesn't scale.
Instead of spreading the source of truth across all CPUs, use the
per-cpu counters merely to batch updates to shared atomic counters.
This is the same as the per-cpu stocks we use for charging memory to the
shared atomic page_counters, and also the way the global vmstat counters
are implemented.
Vmstat has elaborate spilling thresholds that depend on the number of
CPUs, amount of memory, and memory pressure - carefully balancing the
cost of counter updates with the amount of per-cpu error. That's
because the vmstat counters are system-wide, but also used for decisions
inside the kernel (e.g. NR_FREE_PAGES in the allocator). Neither is
true for the memory controller.
Use the same static batch size we already use for page_counter updates
during charging. The per-cpu error in the stats will be 128k, which is
an acceptable ratio of cores to memory accounting granularity.
[hannes@cmpxchg.org: fix warning in __this_cpu_xchg() calls] Link: http://lkml.kernel.org/r/20171201135750.GB8097@cmpxchg.org Link: http://lkml.kernel.org/r/20171103153336.24044-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org
[shaoyi@amazon.com: resolved the conflict brought by commit 17ffa29c355658c8e9b19f56cbf0388500ca7905 in mm/memcontrol.c by contextual fix] Signed-off-by: Shaoying Xu <shaoyi@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The implementation of the lruvec stat functions and their variants for
accounting through a page, or accounting from a preemptible context, are
mostly identical and needlessly repetitive.
Implement the lruvec_page functions by looking up the page's lruvec and
then using the lruvec function.
Implement the functions for preemptible contexts by disabling preemption
before calling the atomic context functions.
Link: http://lkml.kernel.org/r/20171103153336.24044-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Shaoying Xu <shaoyi@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The altsetting sanity check in set_sync_ep_implicit_fb_quirk() was
checking for there to be at least one altsetting but then went on to
access the second one, which may not exist.
This could lead to random slab data being used to initialise the sync
endpoint in snd_usb_add_endpoint().
Fixes: c75a8a7ae565 ("ALSA: snd-usb: add support for implicit feedback") Fixes: ca10a7ebdff1 ("ALSA: usb-audio: FT C400 sync playback EP to capture EP") Fixes: 5e35dc0338d8 ("ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204") Fixes: 17f08b0d9aaf ("ALSA: usb-audio: add implicit fb quirk for Axe-Fx II") Fixes: 103e9625647a ("ALSA: usb-audio: simplify set_sync_ep_implicit_fb_quirk") Cc: stable <stable@vger.kernel.org> # 3.5 Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20200114083953.1106-1-johan@kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CA0132 has the delayed HP jack detection code that is invoked from the
unsol handler, but it does a few weird things: it contains the cancel
of a work inside the work handler, and yet it misses the cancel-sync
call at (runtime-)suspend. This patch addresses those issues.
In __make_request() a new r10bio is allocated and passed to
raid10_read_request(). The read_slot member of the bio is not
initialized, and the raid10_read_request() uses it to index an
array. This leads to occasional panics.
Fix by initializing the field to invalid value and checking for
valid value in raid10_read_request().
Cc: stable@vger.kernel.org Signed-off-by: Kevin Vigor <kvigor@gmail.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Omitting suffixes from instructions in AT&T mode is bad practice when
operand size cannot be determined by the assembler from register
operands, and is likely going to be warned about by upstream gas in the
future (mine does already). Add the single missing suffix here.
If kobject_init_and_add() fails, pci_slot_release() is called to delete
slot->list from parent->slots. But slot->list hasn't been initialized
yet, so we dereference a NULL pointer:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197
task: ffffeb398a45ef10 task.stack: ffffeb398a470000
PC is at __list_del_entry_valid+0x5c/0xb0
LR is at pci_slot_release+0x84/0xe4
...
__list_del_entry_valid+0x5c/0xb0
pci_slot_release+0x84/0xe4
kobject_put+0x184/0x1c4
pci_create_slot+0x17c/0x1b4
__pci_hp_initialize+0x68/0xa4
pciehp_probe+0x1a4/0x2fc
pcie_port_probe_service+0x58/0x84
driver_probe_device+0x320/0x470
Initialize slot->list before calling kobject_init_and_add() to avoid this.
A recent change to ndctl to attempt to reconfigure namespaces in place
uncovered a label accounting problem in block-window-type namespaces.
The ndctl "create.sh" test is able to trigger this signature:
When allocated capacity for a namespace is renamed (new UUID) the labels
with the old UUID need to be deleted. The ndctl behavior to always
destroy namespaces on reconfiguration hid this problem.
The immediate impact of this bug is limited since block-window-type
namespaces only seem to exist in the specification and not in any
shipping products. However, the label handling code is being reused for
other technologies like CXL region labels, so there is a benefit to
making sure both vertical labels sets (block-window) and horizontal
label sets (pmem) have a functional reference implementation in
libnvdimm.
Fixes: c4703ce11c23 ("libnvdimm/namespace: Fix label tracking error") Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'xenbus_backend' watches 'state' of devices, which is writable by
guests. Hence, if guests intensively updates it, dom0 will have lots of
pending events that exhausting memory of dom0. In other words, guests
can trigger dom0 memory pressure. This is known as XSA-349. However,
the watch callback of it, 'frontend_changed()', reads only 'state', so
doesn't need to have the pending events.
To avoid the problem, this commit disallows pending watch messages for
'xenbus_backend' using the 'will_handle()' watch callback.
This commit adds a counter of pending messages for each watch in the
struct. It is used to skip unnecessary pending messages lookup in
'unregister_xenbus_watch()'. It could also be used in 'will_handle'
callback.
Some code does not directly make 'xenbus_watch' object and call
'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This
commit adds support of 'will_handle' callback in the
'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.
If handling logics of watch events are slower than the events enqueue
logic and the events can be created from the guests, the guests could
trigger memory pressure by intensively inducing the events, because it
will create a huge number of pending events that exhausting the memory.
Fortunately, some watch events could be ignored, depending on its
handler callback. For example, if the callback has interest in only one
single path, the watch wouldn't want multiple pending events. Or, some
watches could ignore events to same path.
To let such watches to volutarily help avoiding the memory pressure
situation, this commit introduces new watch callback, 'will_handle'. If
it is not NULL, it will be called for each new event just before
enqueuing it. Then, if the callback returns false, the event will be
discarded. No watch is using the callback for now, though.
When xen_blkif_disconnect() is called, the kernel thread behind the
block interface is stopped by calling kthread_stop(ring->xenblkd).
The ring->xenblkd thread pointer being non-NULL determines if the
thread has been already stopped.
Normally, the thread's function xen_blkif_schedule() sets the
ring->xenblkd to NULL, when the thread's main loop ends.
However, when the thread has not been started yet (i.e.
wake_up_process() has not been called on it), the xen_blkif_schedule()
function would not be called yet.
In such case the kthread_stop() call returns -EINTR and the
ring->xenblkd remains dangling.
When this happens, any consecutive call to xen_blkif_disconnect (for
example in frontend_changed() callback) leads to a kernel crash in
kthread_stop() (e.g. NULL pointer dereference in exit_creds()).
There is an error in the current code that the XTAL MODE
pin was set to NB MPP1_31 which should be NB MPP1_9.
The latch register of NB MPP1_9 has different offset of 0x8.
Signed-off-by: Terry Zhou <bjzhou@marvell.com>
[pali: Fix pin name in commit message] Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 7ea8250406a6 ("clk: mvebu: Add the xtal clock for Armada 3700 SoC") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201106100039.11385-1-pali@kernel.org Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
md-cluster uses MD_CLUSTER_SEND_LOCK to make node can exclusively send msg.
During sending msg, node can concurrently receive msg from another node.
When node does resync job, grab token_lockres:EX may trigger a deadlock:
```
nodeA nodeB
-------------------- --------------------
a.
send METADATA_UPDATED
held token_lockres:EX
b.
md_do_sync
resync_info_update
send RESYNCING
+ set MD_CLUSTER_SEND_LOCK
+ wait for holding token_lockres:EX
c.
mdadm /dev/md0 --remove /dev/sdg
+ held reconfig_mutex
+ send REMOVE
+ wait_event(MD_CLUSTER_SEND_LOCK)
d.
recv_daemon //METADATA_UPDATED from A
process_metadata_update
+ (mddev_trylock(mddev) ||
MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD)
//this time, both return false forever
```
Explaination:
a. A send METADATA_UPDATED
This will block another node to send msg
b. B does sync jobs, which will send RESYNCING at intervals.
This will be block for holding token_lockres:EX lock.
c. B do "mdadm --remove", which will send REMOVE.
This will be blocked by step <b>: MD_CLUSTER_SEND_LOCK is 1.
d. B recv METADATA_UPDATED msg, which send from A in step <a>.
This will be blocked by step <c>: holding mddev lock, it makes
wait_event can't hold mddev lock. (btw,
MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD keep ZERO in this scenario.)
There is a similar deadlock in commit 0ba959774e93
("md-cluster: use sync way to handle METADATA_UPDATED msg")
In that commit, step c is "update sb". This patch step c is
"mdadm --remove".
For fixing this issue, we can refer the solution of function:
metadata_update_start. Which does the same grab lock_token action.
lock_comm can use the same steps to avoid deadlock. By moving
MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD from lock_token to lock_comm.
It enlarge a little bit window of MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
but it is safe & can break deadlock.
Repro steps (I only triggered 3 times with hundreds tests):
two nodes share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB.
```
ssh root@node2 "mdadm -S --scan"
mdadm -S --scan
for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
count=20; done
The comment implies this device has 3 sensor types, but it only
has an accelerometer and a gyroscope (both 3D). As such the
buffer does not need to be as long as stated.
Note I've separated this from the following patch which fixes
the alignment for passing to iio_push_to_buffers_with_timestamp()
as they are different issues even if they affect the same line
of code.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Cc: Daniel Baluta <daniel.baluta@oss.nxp.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200920112742.170751-5-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Whilst this is another case of the issue Lars reported with
an array of elements of smaller than 8 bytes being passed
to iio_push_to_buffers_with_timestamp(), the solution here is
a bit different from the other cases and relies on __aligned
working on the stack (true since 4.6?)
This one is unusual. We have to do an explicit memset() each time
as we are reading 3 bytes into a potential 4 byte channel which
may sometimes be a 2 byte channel depending on what is enabled.
As such, moving the buffer to the heap in the iio_priv structure
doesn't save us much. We can't use a nice explicit structure
on the stack either as the data channels have different storage
sizes and are all separately controlled.
Fixes: cc26ad455f57 ("iio: Add Freescale MPL3115A2 pressure / temperature sensor driver") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Cc: Peter Meerwald <pmeerw@pmeerw.net> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200920112742.170751-7-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp() assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv().
This data is allocated with kzalloc() so no data can leak apart
from previous readings and in this case the status byte from the device.
The forced alignment of ts is not necessary in this case but it
potentially makes the code less fragile.
>From personal communications with Mikko:
We could probably split the reading of the int register, but it
would mean a significant performance cost of 20 i2c clock cycles.
Fixes: e12ffd241c00 ("iio: light: rpr0521 triggered buffer") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Cc: Mikko Koivunen <mikko.koivunen@fi.rohmeurope.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200920112742.170751-2-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the missing clk_disable_unprepare() of info->pclk
before return from rockchip_saradc_resume in the error
handling case when fails to prepare and enable info->clk.
Suggested-by: Robin Murphy <robin.murphy@arm.com> Fixes: 44d6f2ef94f9 ("iio: adc: add driver for Rockchip saradc") Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201103120743.110662-1-miaoqinglang@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When updating the buffer demux, we will skip a scan element from the
device in the case `in_ind != out_ind` and we enter the while loop.
in_ind should only be refreshed with `find_next_bit()` in the end of the
loop.
Note, to cause problems we need a situation where we are skippig over
an element (channel not enabled) that happens to not have the same size
as the next element. Whilst this is a possible situation we haven't
actually identified any cases in mainline where it happens as most drivers
have consistent channel storage sizes with the exception of the timestamp
which is the last element and hence never skipped over.
Fixes: 5ada4ea9be16 ("staging:iio: add demux optionally to path from device to buffer") Signed-off-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20201112144323.28887-1-nuno.sa@analog.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some devices (especially QCA ones) are already using hardcoded partition
names with colons in it. The OpenMesh A62 for example provides following
mtd relevant information via cmdline:
Such a partition list cannot be parsed and thus the device fails to boot.
Avoid this behavior by making sure that the start of the first part-name
("(") will also be the last byte the mtd-id split algorithm is using for
its colon search.
Fixes: eb13fa022741 ("mtd: parser: cmdline: Support MTD names containing one or more colons") Cc: stable@vger.kernel.org Cc: Ron Minnich <rminnich@google.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201124062506.185392-1-sven@narfation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
smp2p_update_bits() should disable interrupts when it acquires its
spinlock. This is important because without the _irqsave, a priority
inversion can occur.
This function is called both with interrupts enabled in
qcom_q6v5_request_stop(), and with interrupts disabled in
ipa_smp2p_panic_notifier(). IRQ handling of spinlocks should be
consistent to avoid the panic notifier deadlocking because it's
sitting on the thread that's already got the lock via _request_stop().
Found via lockdep.
Cc: stable@vger.kernel.org Fixes: 50e99641413e7 ("soc: qcom: smp2p: Qualcomm Shared Memory Point to Point") Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Evan Green <evgreen@chromium.org> Link: https://lore.kernel.org/r/20200929133040.RESEND.1.Ideabf6dcdfc577cf39ce3d95b0e4aa1ac8b38f0c@changeid Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the calls to devm_platform_ioremap_resource(), irq_of_parse_and_map()
or devm_request_irq() fail on probe of the ST SSC4 SPI driver, the
runtime PM disable depth is incremented even though it was not
decremented before. Fix it.
If the calls to devm_clk_get(), devm_spi_register_master() or
clk_prepare_enable() fail on probe of the Mikrotik RB4xx SPI driver,
the spi_master struct is erroneously not freed.
Fix by switching over to the new devm_spi_alloc_master() helper.
If the calls to devm_request_irq() or devm_spi_register_master() fail
on probe of the PIC32 SPI driver, the DMA channels requested by
pic32_spi_dma_prep() are erroneously not released. Plug the leak.
spi_sh_remove() accesses the driver's private data after calling
spi_unregister_master() even though that function releases the last
reference on the spi_master and thereby frees the private data.
Fix by switching over to the new devm_spi_alloc_master() helper which
keeps the private data accessible until the driver has unbound.
Fixes: 680c1305e259 ("spi/spi_sh: use spi_unregister_master instead of spi_master_put in remove path") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: <stable@vger.kernel.org> # v3.0+: 5e844cc37a5c: spi: Introduce device-managed SPI controller allocation Cc: <stable@vger.kernel.org> # v3.0+ Cc: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/6d97628b536baf01d5e3e39db61108f84d44c8b2.1607286887.git.lukas@wunner.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I observed this when unplugging a DP monitor whilst a computer is asleep
and then waking it up. This left DP chardev nodes still being present on
the filesystem and accessing these device nodes caused an oops because
drm_dp_aux_dev_get_by_minor() assumes a device exists if it is opened.
This can also be reproduced by creating a device node with mknod(1) and
issuing an open(2)
Bounds checking tools can flag a bug in dbAdjTree() for an array index
out of bounds in dmt_stree. Since dmt_stree can refer to the stree in
both structures dmaptree and dmapctl, use the larger array to eliminate
the false positive.
The log of this problem is:
jffs2: Error garbage collecting node at 0x***!
jffs2: No space for garbage collection. Aborting GC thread
This is because GC believe that it do nothing, so it abort.
After going over the image of jffs2, I find a scene that
can trigger this problem stably.
The scene is: there is a normal dirent node at summary-area,
but abnormal at corresponding not-summary-area with error
name_crc.
The reason that GC exit abnormally is because it find that
abnormal dirent node to GC, but when it goes to function
jffs2_add_fd_to_list, it cannot meet the condition listed
below:
if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name))
So no node is marked obsolete, statistical information of
erase_block do not change, which cause GC exit abnormally.
The root cause of this problem is: we do not check the
name_crc of the abnormal dirent node with summary is enabled.
Noticed that in function jffs2_scan_dirent_node, we use
function jffs2_scan_dirty_space to deal with the dirent
node with error name_crc. So this patch add a checking
code in function read_direntry to ensure the correctness
of dirent node. If checked failed, the dirent node will
be marked obsolete so GC will pass this node and this
problem will be fixed.
Cc: <stable@vger.kernel.org> Signed-off-by: Zhe Li <lizhe67@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A NULL pointer dereference may occur in __ceph_remove_cap with some of the
callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and
remove_session_caps_cb. Those callers hold the session->s_mutex, so they
are prevented from concurrent execution, but ceph_evict_inode does not.
Since the callers of this function hold the i_ceph_lock, the fix is simply
a matter of returning immediately if caps->ci is NULL.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/43272 Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit a408e4a86b36b ("ima: open a new file instance if no read
permissions") already introduced a second open to measure a file when the
original file descriptor does not allow it. However, it didn't remove the
existing method of changing the mode of the original file descriptor, which
is still necessary if the current process does not have enough privileges
to open a new one.
Changing the mode isn't really an option, as the filesystem might need to
do preliminary steps to make the read possible. Thus, this patch removes
the code and keeps the second open as the only option to measure a file
when it is unreadable with the original file descriptor.
We currently leak kernel memory to user space, because memory
offlining doesn't do any implicit clearing of memory and we are
missing explicit clearing of memory.
Let's keep it simple and clear pages before removing the linear
mapping.
Reproduced in QEMU/TCG with 10 GiB of main memory:
[root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
[... wait until "free -m" used counter no longer changes and cancel] 19665802+0 records in
1+0 records out 9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
[root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes 40000000
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
[ 402.980063][ T1086] flags: 0x7ffff000001000(reserved)
[ 402.980415][ T1086] raw: 007ffff000001000c00c000000924008c00c0000009240080000000000000000
[ 402.980627][ T1086] raw: 0000000000000000000000000000000000000001ffffffff0000000000000000
[ 402.980845][ T1086] page dumped because: unmovable page
[ 402.989608][ T1086] Offlined Pages 16384
[ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000
Before this patch:
[root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 |.%rQM&6.\.V.....| 00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 |..P...`.....K<9.|$ 00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 |NZL..&..7y.g$..W|$ 00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d |.>..o.j..R.n...]|$ 00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 |v.......d#.,....|$ 00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 |.....P'i.d.o..Ey|$ 00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 |jo.aq.....(&...U|$ 00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 |.?......{..27.$S|$ 00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 |.p0EV.....L.....|$ 00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c |...y.`.\yYM.6.J\|$
After this patch:
[root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
* 40000000
Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since some time now, printk() adds carriage return, leading to
unusable xmon output if there is no udbg backend available:
[ 54.288722] sysrq: Entering xmon
[ 54.292209] Vector: 0 at [cace3d2c]
[ 54.292274] pc:
[ 54.292331] c0023650
[ 54.292468] : xmon+0x28/0x58
[ 54.292519]
[ 54.292574] lr:
[ 54.292630] c0023724
[ 54.292749] : sysrq_handle_xmon+0xa4/0xfc
[ 54.292801]
[ 54.292867] sp: cace3de8
[ 54.292931] msr: 9032
[ 54.292999] current = 0xc28d0000
[ 54.293072] pid = 377, comm = sh
[ 54.293157] Linux version 5.10.0-rc6-s3k-dev-01364-gedf13f0ccd76-dirty (root@po17688vm.idsi0.si.c-s.fr) (powerpc64-linux-gcc (GCC) 10.1.0, GNU ld (GNU Binutils) 2.34) #4211 PREEMPT Fri Dec 4 09:32:11 UTC 2020
[ 54.293287] enter ? for help
[ 54.293470] [cace3de8]
[ 54.293532] c0023724
[ 54.293654] sysrq_handle_xmon+0xa4/0xfc
[ 54.293711] (unreliable)
...
[ 54.296002]
[ 54.296159] --- Exception: c01 (System Call) at
[ 54.296217] 0fd4e784
[ 54.296303]
[ 54.296375] SP (7fca6ff0) is in userspace
[ 54.296431] mon>
[ 54.296484] <no input ...>
Use pr_cont() instead.
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Mention that it only happens when udbg is not available] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c8a6ec704416ecd5ff2bd26213c9bc026bdd19de.1607077340.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
introduced the following error when invoking the errinjct userspace
tool:
[root@ltcalpine2-lp5 librtas]# errinjct open
[327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
[327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
errinjct: Could not open RTAS error injection facility
errinjct: librtas: open: Unexpected I/O error
The entry for ibm,open-errinjct in rtas_filter array has a typo where
the "j" is omitted in the rtas call name. After fixing this typo the
errinjct tool functions again as expected.
CAN0 and CAN1 instances share the same message ram configured
at 0x210000 on sama5d2 Linux systems.
According to current configuration of CAN0, we need 0x1c00 bytes
so that the CAN1 don't overlap its message ram:
64 x RX FIFO0 elements => 64 x 72 bytes
32 x TXE (TX Event FIFO) elements => 32 x 8 bytes
32 x TXB (TX Buffer) elements => 32 x 72 bytes
So a total of 7168 bytes (0x1C00).
Fix offset to match this needed size.
Make the CAN0 message ram ioremap match exactly this size so that is
easily understandable. Adapt CAN1 size accordingly.
Xattr code using inodes with large xattr data can end up dropping last
inode reference (and thus deleting the inode) from places like
ext4_xattr_set_entry(). That function is called with transaction started
and so ext4_evict_inode() can deadlock against fs freezing like:
CPU1 CPU2
removexattr() freeze_super()
vfs_removexattr()
ext4_xattr_set()
handle = ext4_journal_start()
...
ext4_xattr_set_entry()
iput(old_ea_inode)
ext4_evict_inode(old_ea_inode)
sb->s_writers.frozen = SB_FREEZE_FS;
sb_wait_write(sb, SB_FREEZE_FS);
ext4_freeze()
jbd2_journal_lock_updates()
-> blocks waiting for all
handles to stop
sb_start_intwrite()
-> blocks as sb is already in SB_FREEZE_FS state
Generally it is advisable to delete inodes from a separate transaction
as it can consume quite some credits however in this case it would be
quite clumsy and furthermore the credits for inode deletion are quite
limited and already accounted for. So just tweak ext4_evict_inode() to
avoid freeze protection if we have transaction already started and thus
it is not really needed anyway.
Cc: stable@vger.kernel.org Fixes: dec214d00e0d ("ext4: xattr inode deduplication") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127110649.24730-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When freeing metadata, we will create an ext4_free_data and
insert it into the pending free list. After the current
transaction is committed, the object will be freed.
ext4_mb_free_metadata() will check whether the area to be freed
overlaps with the pending free list. If true, return directly. At this
time, ext4_free_data is leaked. Fortunately, the probability of this
problem is small, since it only occurs if the file system is corrupted
such that a block is claimed by more one inode and those inodes are
deleted within a single jbd2 transaction.
Some of the self tests create a test inode, setup some extents and then do
calls to btrfs_get_extent() to test that the corresponding extent maps
exist and are correct. However btrfs_get_extent(), since the 5.2 merge
window, now errors out when it finds a regular or prealloc extent for an
inode that does not correspond to a regular file (its ->i_mode is not
S_IFREG). This causes the self tests to fail sometimes, specially when
KASAN, slub_debug and page poisoning are enabled:
$ modprobe btrfs
modprobe: ERROR: could not insert 'btrfs': Invalid argument
This happens because the test inodes are created without ever initializing
the i_mode field of the inode, and neither VFS's new_inode() nor the btrfs
callback btrfs_alloc_inode() initialize the i_mode. Initialization of the
i_mode is done through the various callbacks used by the VFS to create
new inodes (regular files, directories, symlinks, tmpfiles, etc), which
all call btrfs_new_inode() which in turn calls inode_init_owner(), which
sets the inode's i_mode. Since the tests only uses new_inode() to create
the test inodes, the i_mode was never initialized.
This always happens on a VM I used with kasan, slub_debug and many other
debug facilities enabled. It also happened to someone who reported this
on bugzilla (on a 5.3-rc).
Fix this by setting i_mode to S_IFREG at btrfs_new_test_inode().
The driver did not update its view of the available device buffer space
until write() was called in task context. This meant that write_room()
would return 0 even after the device had sent a write-unthrottle
notification, something which could lead to blocked writers not being
woken up (e.g. when using OPOST).
Note that we must also request an unthrottle notification is case a
write() request fills the device buffer exactly.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable <stable@vger.kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver's transmit-unthrottle work was never flushed on disconnect,
something which could lead to the driver port data being freed while the
unthrottle work is still scheduled.
Fix this by cancelling the unthrottle work when shutting down the port.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver's deferred write wakeup was never flushed on disconnect,
something which could lead to the driver port data being freed while the
wakeup work is still scheduled.
Fix this by using the usb-serial write wakeup which gets cancelled
properly on disconnect.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to clear the write-busy flag also in case no new data was
submitted due to lack of device buffer space so that writing is
resumed once space again becomes available.
Fixes: 507ca9bc0476 ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.") Cc: stable <stable@vger.kernel.org> # 2.6.13 Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The write() callback can be called in interrupt context (e.g. when used
as a console) so interrupts must be disabled while holding the port lock
to prevent a possible deadlock.
Fixes: e81ee637e4ae ("usb-serial: possible irq lock inversion (PPP vs. usb/serial)") Fixes: 507ca9bc0476 ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.") Cc: stable <stable@vger.kernel.org> # 2.6.19 Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c528fcb116e6 ("USB: serial: keyspan_pda: fix receive sanity
checks") broke write-unthrottle handling by dropping well-formed
unthrottle-interrupt packets which are precisely two bytes long. This
could lead to blocked writers not being woken up when buffer space again
becomes available.
Instead, stop unconditionally printing the third byte which is
(presumably) only valid on modem-line changes.
The parallel-port restore operations is called when a driver claims the
port and is supposed to restore the provided state (e.g. saved when
releasing the port).
Fixes: b69578df7e98 ("USB: usbserial: mos7720: add support for parallel port on moschip 7715") Cc: stable <stable@vger.kernel.org> # 2.6.35 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to setup its PCI component, the driver needs any node private
instance in order to get a reference to the PCI device and hand that
into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
memory controller descriptor under the assumption that if any, the 0th
will be always present.
However, this assumption goes wrong when the 0th node doesn't have
memory and the driver doesn't initialize an instance for it:
ecdh_set_secret() casts a void* pointer to a const u64* in order to
feed it into ecc_is_key_valid(). This is not generally permitted by
the C standard, and leads to actual misalignment faults on ARMv6
cores. In some cases, these are fixed up in software, but this still
leads to performance hits that are entirely avoidable.
So let's copy the key into the ctx buffer first, which we will do
anyway in the common case, and which guarantees correct alignment.
Perf event attritube supports exclude_kernel flag to avoid
sampling/profiling in supervisor state (kernel). Based on this event
attr flag, Monitor Mode Control Register bit is set to freeze on
supervisor state. But sometimes (due to hardware limitation), Sampled
Instruction Address Register (SIAR) locks on to kernel address even
when freeze on supervisor is set. Patch here adds a check to drop
those samples.
I have had reports from two different people that attempts to read the
analog input channels of the MF624 board fail with an `ETIMEDOUT` error.
After triggering the conversion, the code calls `comedi_timeout()` with
`mf6x4_ai_eoc()` as the callback function to check if the conversion is
complete. The callback returns 0 if complete or `-EBUSY` if not yet
complete. `comedi_timeout()` returns `-ETIMEDOUT` if it has not
completed within a timeout period which is propagated as an error to the
user application.
The existing code considers the conversion to be complete when the EOLC
bit is high. However, according to the user manuals for the MF624 and
MF634 boards, this test is incorrect because EOLC is an active low
signal that goes high when the conversion is triggered, and goes low
when the conversion is complete. Fix the problem by inverting the test
of the EOLC bit state.
In dasd_alias_disconnect_device_from_lcu the device is removed from any
list on the LCU. Afterwards the LCU is removed from the lcu list if it
does not contain devices any longer.
The lcu->lock protects the lcu from parallel updates. But to cancel all
workers and wait for completion the lcu->lock has to be unlocked.
If two devices are removed in parallel and both are removed from the LCU
the first device that takes the lcu->lock again will delete the LCU because
it is already empty but the second device also tries to free the LCU which
leads to a list corruption of the lcu list.
Fix by removing the device right before the lcu is checked without
unlocking the lcu->lock in between.
Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dasd_alias_add_device() moves devices to the active_devices list in case
of a scheduled LCU update regardless if they have previously been in a
pavgroup or not.
Example: device A and B are in the same pavgroup.
Device A has already been in a pavgroup and the private->pavgroup pointer
is set and points to a valid pavgroup. While going through dasd_add_device
it is moved from the pavgroup to the active_devices list.
In parallel device B might be removed from the same pavgroup in
remove_device_from_lcu() which in turn checks if the group is empty
and deletes it accordingly because device A has already been removed from
there.
When now device A enters remove_device_from_lcu() it is tried to remove it
from the pavgroup again because the pavgroup pointer is still set and again
the empty group will be cleaned up which leads to a list corruption.
Fix by setting private->pavgroup to NULL in dasd_add_device.
If the device has been the last device on the pavgroup an empty pavgroup
remains but this will be cleaned up by the scheduled lcu_update which
iterates over all existing pavgroups.
Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prevent _lcu_update from adding a device to a pavgroup if the LCU still
requires an update. The data is not reliable any longer and in parallel
devices might have been moved on the lists already.
This might lead to list corruptions or invalid PAV grouping.
Only add devices to a pavgroup if the LCU is up to date. Additional steps
are taken by the scheduled lcu update.
Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>