When calling set_mode from sysfs, the call flow takes the sysfs lock
first and then tries to take rtnl_lock (when calling ipoib_set_mode).
On the other hand, the rmmod call flow takes rtnl_lock first
(when calling unregister_netdev) and then tries to take the sysfs
lock: a classic a->b, b->a deadlock.
The problem starts when ipoib_set_mode releases its rtnl_lock and tries
to take it again after that.
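The two lock orders, side by side (a simplified sketch of the call flows
described above):

    path 1, sysfs write to .../mode:  sysfs lock (A)  ->  rtnl_lock (B)
    path 2, rmmod ib_ipoib:           rtnl_lock (B)   ->  sysfs lock (A)

Once both paths are in flight, each waits on the lock the other already
holds.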
Ever since mount propagation was introduced, in cases where a mount is
propagated to a parent mount and mountpoint pair that is already in use,
the code has placed the new mount behind the old mount in the mount hash
table.
This implementation detail is problematic as it allows creating
arbitrary length mount hash chains.
Furthermore it invalidates the constraint maintained elsewhere in the
mount code that a parent mount and mountpoint pair will have exactly
one mount upon them, making this special case hard to deal with and to
talk about in the mount code.
Modify mount propagation to notice when there is already a mount at
the parent mount and mountpoint where a new mount is propagating to
and place that preexisting mount on top of the new mount.
Modify unmount propagation to notice when a mount that is being
unmounted has another mount on top of it (and no other children), and
to replace the unmounted mount with the mount on top of it.
Move the MNT_UMOUNT test from __lookup_mnt_last into
__propagate_umount as that is the only call of __lookup_mnt_last where
MNT_UMOUNT may be set on any mount visible in the mount hash table.
These modifications allow:
- __lookup_mnt_last to be removed.
- attach_shadows to be renamed __attach_mnt and its shadow
handling to be removed.
- commit_tree to be simplified
- copy_tree to be simplified
The result is an easier to understand tree of mounts that does not
allow creation of arbitrary length hash chains in the mount hash table.
The result is also a very slight userspace visible difference in semantics.
The following two cases now behave identically, where before order
mattered:
case 1: (explicit user action)
B is a slave of A
mount something on A/a , it will propagate to B/a
and then mount something on B/a
case 2: (tucked mount)
B is a slave of A
mount something on B/a
and then mount something on A/a
Historically umount A/a would fail in case 1 and succeed in case 2.
Now umount A/a succeeds in both configurations.
This very small change in semantics appears if anything to be a bug
fix to me, and my survey of userspace leads me to believe that no programs
will notice or care about this subtle semantic change.
v2: Updated mnt_change_mountpoint to not call dput or mntput
and instead to decrement the counts directly. It is guaranteed
that there will be other references when mnt_change_mountpoint is
called so this is safe.
v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
As the locking in fs/namespace.c changed between v2 and v3.
v4: Reworked the logic in propagate_mount_busy and __propagate_umount
that detects when a mount completely covers another mount.
v5: Removed unnecessary tests whose result is always true in
find_topper and attach_recursive_mnt.
v6: Document the user space visible semantic difference.
Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind") Tested-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
brcmf_sdio_fromevntchan() was being called on the data frame
rather than the software header, causing some frames to be
mischaracterized as on the event channel rather than the data channel.
This fixes a major performance regression (due to dropped packets): the
sheer number of packets being incorrectly processed had cut the download
speed to 1 Mbit/s; with this patch it is back up to 40 Mbit/s.
Fixes: c56caa9db8ab ("brcmfmac: screening firmware event packet") Signed-off-by: Gavin Li <git@thegavinli.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
[kvalo@codeaurora.org: improve commit logs based on email discussion] Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU
not configured") introduced a rwsem to fix an invalid memory access that
occurred when someone attempts to access the config space of an AFU on a
vPHB whilst the AFU is deconfigured, such as during EEH recovery.
It turns out that it's possible to run into a nested locking issue when EEH
recovery fails and a full device hotplug is required.
cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
configured_rwsem. When EEH recovery fails, the EEH code calls
pci_hp_remove_devices() to remove the device, which in turn calls
cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
to grab the writer lock that's already held.
Standard rwsem semantics don't express what we really want to do here and
don't allow for nested locking. Fix this by replacing the rwsem with an
atomic_t which we can control more finely. Allow the AFU to be locked
multiple times so long as there are no readers.
Fixes: 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU not configured") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During EEH recovery, we deconfigure all AFUs whilst leaving the
corresponding vPHB and virtual PCI device in place.
If something attempts to interact with the AFU's PCI config space (e.g.
running lspci) after the AFU has been deconfigured and before it's
reconfigured, cxl_pcie_{read,write}_config() will read invalid values from
the deconfigured struct cxl_afu and proceed to Oops when they try to
dereference pointers that have been set to NULL during deconfiguration.
Add a rwsem to struct cxl_afu so we can prevent interaction with config
space while the AFU is deconfigured.
When TX descriptors are filled in, the buffer DMA address is split
between the tx_desc->buf_phys_addr field (high-order bits) and
tx_desc->packet_offset field (5 low-order bits).
However, when we re-calculate the DMA address from the TX descriptor in
mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into
account. This means that when the DMA address is not aligned on a
32-byte boundary, we end up calling dma_unmap_single() with a DMA address
that was not the one returned by dma_map_single().
This inconsistency is detected by the kernel when DMA_API_DEBUG is
enabled. We fix this problem by properly calculating the DMA address in
mvpp2_txq_inc_put().
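A hedged sketch of the corrected unmap (field and variable names follow
the description above and are illustrative, not the exact driver code):

    /* reconstruct the exact address that dma_map_single() returned:
     * high-order bits from buf_phys_addr plus the 5-bit packet_offset */
    dma_addr_t buf_dma = tx_desc->buf_phys_addr + tx_desc->packet_offset;

    dma_unmap_single(dev, buf_dma, tx_desc->data_size, DMA_TO_DEVICE);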
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current implementation of setup_randomness uses the stack address
and therefore the pointer to the SYSIB 3.2.2 block as input data
address. Furthermore the length of the input data is the number of
virtual-machine description blocks which is typically one.
This means that typically a single zero byte is fed to
add_device_randomness.
Fix both of these and use the address of the first virtual machine
description block as input data address and also use the correct
length.
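A hedged sketch of the intended call, assuming a SYSIB 3.2.2 structure
with a count field and an array of VM description blocks (field names
are illustrative):

    if (stsi(vmms, 3, 2, 2) == 0 && vmms->count)
        /* feed the actual description blocks, not the stack address */
        add_device_randomness(&vmms->vm,
                              sizeof(vmms->vm[0]) * vmms->count);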
Fixes: bcfcbb6bae64 ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bcfcbb6bae64 ("s390: add system information as device
randomness") intended to add some virtual machine specific information
to the randomness pool.
Unfortunately it uses the page allocator before it is ready for use. As a
result the page allocator always returns NULL and the setup_randomness
function never adds anything to the randomness pool.
To fix this use memblock_alloc and memblock_free instead.
Fixes: bcfcbb6bae64 ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Return a sensible value for TASK_SIZE when called from a kernel thread.
This gets us around an issue with copy_mount_options that does a magic
size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
thread and data pointing to kernel space.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF
notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified
as note name. Otherwise the notes are ignored.
For /proc/vmcore we currently use "CORE" for these notes.
Up to now this has not been a real problem because the dump analysis tool
"crash" does not check the note name. But it will break all programs that
use libbfd for processing ELF notes.
So fix this and use "LINUX" for all s390 specific notes to comply with
libbfd.
Reported-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit dd22f551 ("block: Change direct_access calling convention"),
the device size calculation in dcssblk_direct_access() is off-by-one.
This results in bdev_direct_access() always returning -ENXIO because the
returned value is not page aligned.
For devices with multiple input queues, tiqdio_call_inq_handlers()
iterates over all input queues and clears the device's DSCI
during each iteration. If the DSCI is re-armed during one
of the later iterations, we therefore do not scan the previous
queues again.
The re-arming also raises a new adapter interrupt. But its
handler does not trigger a rescan for the device, as the DSCI
has already been erroneously cleared.
This can result in queue stalls on devices with multiple
input queues.
Fix it by clearing the DSCI just once, prior to scanning the queues.
As the code is moved in front of the loop, we also need to access
the DSCI directly (ie irq->dsci) instead of going via each queue's
parent pointer to the same irq. This is not a functional change,
and a follow-up patch will clean up the other users.
In practice, this bug only affects CQ-enabled HiperSockets devices,
ie. devices with sysfs-attribute "hsuid" set. Setting a hsuid is
needed for AF_IUCV socket applications that use HiperSockets
communication.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta
Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports,
respectively. The first 4 ports are implemented by an OX16PCI954 chip,
and the second 4 ports are implemented by an OX16C954 chip on a local
bus, bridged by the second PCI function of the OX16PCI954. The ports
are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a
non-standard oscillator frequency of 20 MHz (base_baud = 1250000).
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently N_HDLC line discipline uses a self-made singly linked list for
data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
an error.
The commit be10eb7589337e5defbe214dae038a53dd21add8
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put
one data buffer to tx_free_buf_list twice. That causes double free in
n_hdlc_release().
Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
in case of tx error put current data buffer after the head of tx_buf_list.
Signed-off-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Given that we scan ~10k entries per run, it's clearly wrong to reduce the
interval based on a nonzero eviction count; it will only waste cpu cycles
since the vast majority of conntracks are not timed out.
Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.
Because the evictor is supposed to only kick in when the system turns idle
after a busy period, pick a high ratio -- this makes it 50%. We thus keep
the idea of increasing the scan rate when it's likely that the table
contains many expired entries.
In order to not let timed-out entries hang around for too long
(important when using event logging, in which case we want to timely
destroy events), we now scan the full table within at most
GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
timed-out entries sit in same slot.
I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
While flood is ongoing, interval now stays at its max rate
(GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
With feedback from Nicolas Dichtel.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
add_to_page_cache_lru() can fail, so the actual number of pages to read
can be smaller than the initial size of the osd request. We need to
update the osd request size in that case.
The driver was calculating the adapter command pagesize indicator from
the system pagesize. However, the buffers the driver allocates are only
one size (SLI4_PAGE_SIZE), so no calculation was necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC arch/mips/mm/sc-ip22.o
{standard input}: Assembler messages:
{standard input}:132: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:159: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:200: Error: number (0x9000000080000000) larger than 32 bits
scripts/Makefile.build:293: recipe for target 'arch/mips/mm/sc-ip22.o' failed
make[1]: *** [arch/mips/mm/sc-ip22.o] Error 1
MIPS has used .set mips3 to temporarily switch the assembler to 64 bit
mode in 64 bit kernels virtually forever. Binutils 2.25 broke this
behaviour partially by happily accepting 64 bit instructions in .set mips3
mode but puking on 64 bit constants when generating 32 bit ELF. Binutils
2.26 restored the old behaviour again.
Fix build with binutils 2.25 by open coding the offending
dli $1, 0x9000000080000000
as
li $1, 0x9000
dsll $1, $1, 48
which is ugly but the only thing that will build on all binutils vintages.
We will set LPCR with the correct value for radix during init. This makes
sure we start with a sanitized value of LPCR. In case of kexec, cpus can
have an LPCR value based on the previous translation mode we were running.
Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix") Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Whenever a watchpoint match occurs, hw_breakpoint_handler will be called
by do_break via the notifier chain mechanism. If the watchpoint was
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MAX_SEND_SGES check introduced in commit 655fec6987be
("xprtrdma: Use gathered Send for large inline messages") fails
for devices that have a small max_sge.
Instead of checking for a large fixed maximum number of SGEs,
check for a minimum small number. RPC-over-RDMA will switch to
using a Read chunk if an xdr_buf has more pages than can fit in
the device's max_sge limit. This is considerably better than
failing altogether to mount the server.
This fix supports devices that have as few as three send SGEs
available.
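A hedged sketch of the check (the constant and structure paths are
assumptions for illustration, not necessarily the code as merged):

    /* header, head iovec and tail iovec each need one send SGE */
    #define RPCRDMA_MIN_SEND_SGES 3

    if (ia->ri_device->attrs.max_sge < RPCRDMA_MIN_SEND_SGES) {
        pr_warn("rpcrdma: device provides too few send SGEs\n");
        return -ENOMEM;
    }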
Reported-by: Selvin Xavier <selvin.xavier@broadcom.com> Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Reported-by: Honggang Li <honli@redhat.com> Reported-by: Ram Amrani <Ram.Amrani@cavium.com> Fixes: 655fec6987be ("xprtrdma: Use gathered Send for large ...") Tested-by: Honggang Li <honli@redhat.com> Tested-by: Ram Amrani <Ram.Amrani@cavium.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d5440e27d3e5 ("xprtrdma: Enable pad optimization") made the
Linux client omit XDR round-up padding in normal Read and Write
chunks so that the client doesn't have to register and invalidate
3-byte memory regions that contain no real data.
Unfortunately, my cheery 2014 assessment that this optimization "is
supported now by both Linux and Solaris servers" was premature.
We've found bugs in Solaris in this area since commit d5440e27d3e5
("xprtrdma: Enable pad optimization") was merged (SYMLINK is the
main offender).
So for maximum interoperability, I'm disabling this optimization
again. If a CM private message is exchanged when connecting, the
client recognizes that the server is Linux, and enables the
optimization for that connection.
Until now the Solaris server bugs did not impact common operations,
and were thus largely benign. Soon, less capable devices on Linux
NFS/RDMA clients will make use of Read chunks more often, and these
Solaris bugs will prevent interoperation in more cases.
Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pad optimization is changed by echoing into
/proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
affecting all RPC-over-RDMA connections to all servers.
The marshaling code picks up that value and uses it for decisions
about how to construct each RPC-over-RDMA frame. Having it change
suddenly in mid-operation can result in unexpected failures. And
some servers a client mounts might need chunk round-up, while
others don't.
So instead, copy the pad_optimize setting into each connection's
rpcrdma_ia when the transport is created, and use the copy, which
can't change during the life of the connection, instead.
This also removes a hack: rpcrdma_convert_iovs was using
the remote-invalidation-expected flag to predict when it could leave
out Write chunk padding. This is because the Linux server handles
implicit XDR padding on Write chunks correctly, and only Linux
servers can set the connection's remote-invalidation-expected flag.
It's more sensible to use the pad optimization setting instead.
Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When pad optimization is disabled, rpcrdma_convert_iovs still
does not add explicit XDR round-up padding to a Read chunk.
Commit 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
incorrectly short-circuited the test for whether round-up padding
is needed that appears later in rpcrdma_convert_iovs.
However, if this is indeed a regular Read chunk (and not a
Position-Zero Read chunk), the tail iovec _always_ contains the
chunk's padding, and never anything else.
So, it's easy to just skip the tail when padding optimization is
enabled, and add the tail in a subsequent Read chunk segment, if
disabled.
Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3d8cc00073d6 ("dmaengine: ipu: Consolidate duplicated irq handlers")
consolidated the two interrupts routines into one, but the remaining
interrupt routine only checks the status of the error interrupts, not the
normal interrupts.
This patch fixes that problem (tested on i.MX31 PDK board).
The commit 7a654172161c ("mtd/ifc: Add support for IFC controller
version 2.0") added support for version 2.0 of the IFC controller.
The version 2.0 controller has the ECC status registers at a different
location to the previous versions.
Correct the fsl_ifc_nand structure so that the ECC status can be read
from the correct location for both version 1.0 and 2.0 of the controller.
Fixes: 7a654172161c ("mtd/ifc: Add support for IFC controller version 2.0") Signed-off-by: Mark Marshall <mark.marshall@omicronenergy.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently I received a bug report that on a Linux v3.0 based kernel, hot
adding a disk to an md linear device causes a kernel crash in
linear_congested(). From the crash image analysis, I found that in
linear_congested(), mddev->raid_disks contains the value N, but
conf->disks[] only has N-1 pointers available. Then a NULL pointer
dereference crashes the kernel.
There is a race between linear_add() and linear_congested(); the RCU
stuff used in these two functions cannot avoid the race. Since Linux v4.0
the RCU code was replaced by introducing mddev_suspend(). After checking
the upstream code, it seems linear_congested() is not called in the
generic_make_request() code path, so mddev_suspend() cannot prevent it
from being called. The possible race still exists.
Here I explain how the race still exists in the current code. On a machine
with many CPUs, on one CPU linear_add() is called to add a hard disk to an
md linear device; at the same time, on another CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request to it.
Now I use a possible code execution time sequence to demonstrate how the
race happens,
In linear_add(), mddev->raid_disks is increased in time seq 2, and on
another CPU, in linear_congested(), the for-loop iterates over
conf->disks[i] using the increased mddev->raid_disks in time seq 3,4. But
the conf with one more element (a pointer to struct dev_info) in
conf->disks[] is not updated yet; accessing its structure members in time
seq 4 will cause a NULL pointer dereference fault.
To fix this race, there are 2 parts of modification in the patch (see the
sketch after this list),
1) Add 'int raid_disks' to struct linear_conf, as a copy of
mddev->raid_disks. It is initialized in linear_conf(), always being
consistent with the number of pointers in 'struct dev_info disks[]'. When
iterating over conf->disks[] in linear_congested(), use conf->raid_disks
instead of mddev->raid_disks in the for-loop; then the NULL pointer
dereference will not happen again.
2) The RCU stuff is back again, and kfree_rcu() is used in linear_add() to
free the oldconf memory. Because oldconf may be referenced as
mddev->private in linear_congested(), kfree_rcu() makes sure that its
memory will not be released until no one uses it any more.
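A hedged sketch of items 1) and 2) above (helper names are illustrative):

    /* linear_congested(): bound the loop by the conf's own copy */
    rcu_read_lock();
    conf = rcu_dereference(mddev->private);
    for (i = 0; i < conf->raid_disks && !ret; i++)
        ret |= rdev_congested(conf->disks[i].rdev, bits); /* illustrative helper */
    rcu_read_unlock();

    /* linear_add(): defer freeing oldconf until all RCU readers are done */
    kfree_rcu(oldconf, rcu);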
Some code comments are also added in this patch, to make this modification
easier to understand.
This patch can be applied to kernels since v4.0, after commit
3be260cc18f8 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug was reported on a Linux v3.0 based kernel;
people who maintain kernels before Linux v4.0 will need to backport this
patch.
Changelog:
- V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
replace call_rcu() in linear_add().
- v2: add RCU stuffs by suggestion from Shaohua and Neil.
- v1: initial effort.
Signed-off-by: Coly Li <colyli@suse.de> Cc: Shaohua Li <shli@fb.com> Cc: Neil Brown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If segs_per_sec is over 1, as under SMR, f2fs previously issued discard
commands redundantly on the same section, since we didn't move the end
position for the previous discard command.
And, after issuing a discard for this section,
end start
| |
prefree_bitmap = [01111100111100]
Select this section again by searching from (end + 1),
start end
| |
prefree_bitmap = [01111100111100]
Fixes: 36abef4e796d38 ("f2fs: introduce mode=lfs mount option") Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For foreground gc, the greedy algorithm should be adopted, which makes
this formula work well:
(2 * (100 / config.overprovision + 1) + 6)
But currently, fg_gc gives priority to selecting bg_gc victim segments to
gc first; these victims are selected by the cost-benefit algorithm, so we
can't guarantee that such segments have few valid blocks, which may break
the f2fs rule and, in the worst case, consume all the free segments.
This patch fixes this by adding a filter in check_bg_victims: if a
segment's number of valid blocks is over the overprovision ratio, skip
such segments.
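An illustrative sketch of the filter, not the exact f2fs code (variable
names are placeholders):

    /* skip candidates whose share of valid blocks exceeds the
     * overprovision ratio; they free too little space for FG GC */
    if (valid_blocks * 100 > ovp_ratio * blocks_per_section)
        continue;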
It turns out a stackable filesystem like sdcardfs in AOSP can trigger
multiple vfs_create() calls to the lower filesystem. In that case, f2fs
will add multiple dentries having the same name, which breaks filesystem
consistency.
Until the upper layer fixes this, let's work around it in f2fs, which
actually shows not much performance regression.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're not taking into account the space needed for the (variable length)
attr bitmap, with the result that we'd sometimes get a spurious ERANGE
when the ACL data got close to the end of a page.
Just add in an extra page to make sure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bitmap and attrlen follow immediately after the op reply header. This
was an oversight from commit bf118a342f10 ("NFSv4: include bitmap in
nfsv4 get acl data").
The consequences are just a minor efficiency loss (extra calls to
xdr_shrink_bufhead).
Fixes: bf118a342f10 "NFSv4: include bitmap in nfsv4 get acl data" Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we see that our pNFS READ/WRITE/COMMIT operation failed, but we
also see that our layout segment is no longer valid, then we need to
get a new layout segment before retrying.
Fixes: 90816d1ddacf ("NFSv4.1/flexfiles: Don't mark the entire deviceid...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Copy offload code needs to be hooked into the code for handling
NFS4ERR_BAD_STATEID by ensuring that we set the "stateid" field
in struct nfs4_exception.
Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set various file attributes, including the file
size and the uid/gid.
The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates don't update random other attributes, and many other file
systems handle the case but do not update the other attributes in the
same transaction. NFSD on the other hand passes the attributes it gets
on the wire more or less directly through to the VFS, leading to updates
the file systems don't expect. XFS at least has an assert on the
allowed attributes, which caught an unusual NFS client setting the size
and group at the same time.
To handle this issue properly this splits the notify_change call in
nfsd_setattr into two separate ones.
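A hedged sketch of the split (error handling and the surrounding fh
locking are omitted; names are illustrative):

    if (iap->ia_valid & ATTR_SIZE) {
        struct iattr size_attr = {
            .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME,
            .ia_size  = iap->ia_size,
        };

        host_err = notify_change(dentry, &size_attr, NULL);
        if (host_err)
            goto out;
        iap->ia_valid &= ~ATTR_SIZE;    /* the rest goes separately */
    }
    if (iap->ia_valid)
        host_err = notify_change(dentry, iap, NULL);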
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 050c3d52cc7810d9d17b8cd231708609af6876ae ("vme: make core
vme support explicitly non-modular") dropped the remove function
because it appeared as if it was for removal of the bus, which is
not supported.
However, vme_bus_remove() is called when a VME device is removed
from the bus and not when the bus is removed; as it calls the VME
device driver's cleanup function. Without this function, the
remove() in the VME device driver is never called and VME device
drivers cannot be reloaded again.
Here we restore the remove function that was deleted in that
commit, and the reference to the function in the bus structure.
The problem is due to rtl8192ce and rtl8192cu sharing routines, and having
different layouts of struct rtl_pci_priv, which is used by rtl8192ce, and
struct rtl_usb_priv, which is used by rtl8192cu. The problem was resolved
by placing the struct bt_coexist_info at the head of each of those private
areas.
Reported-and-tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The addresses of WLAN NIC registers are naturally aligned, but some
drivers have bugs. These are evident on platforms that need natural
alignment to access registers. This change contains the following:
1. Function _rtl8821ae_dbi_read() is used to read one byte from DBI,
thus it should use rtl_read_byte().
2. Register 0x4C7 of 8192ee is a single byte.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "fw" firmware object is passed from the remoteproc core and should
not be overwritten, as that results in leaked buffers and a double free
of the last firmware object.
Fixes: 051fb70fd4ea ("remoteproc: qcom: Driver for the self-authenticating Hexagon v5") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The RDMA core uses ib_pack() to convert from unpacked CPU structs
to on-the-wire bitpacked structs.
This process requires that 1 bit fields are declared as u8 in the
unpacked struct, otherwise the packing process does not read the
value properly and the packed result is wired to 0. Several
places wrongly used int.
Crucially, this means the kernel has never set reversible
correctly in the path record request. It has always asked for
irreversible paths even if the ULP requests otherwise.
When the kernel is used with a SM that supports this feature, it
completely breaks communication management if reversible paths are
not properly requested.
The only reason this ever worked is because opensm ignores the
reversible bit.
VSS may use a char device to support the communication between
the user level daemon and the driver. When the VSS channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new VSS offer from the host. Implement this logic.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fcopy may use a char device to support the communication between
the user level daemon and the driver. When the Fcopy channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new Fcopy offer from the host. Implement this logic.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KVP may use a char device to support the communication between
the user level daemon and the driver. When the KVP channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new KVP offer from the host. Implement this logic.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The host can rescind a channel that has been offered to the
guest and once the channel is rescinded, the host does not
respond to any requests on that channel. Deal with the case where
the guest may be blocked waiting for a response from the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the channel is rescinded, the host does not read from the rescinded channel.
Fail writes to a channel that has already been rescinded. If we permit
writes on a rescinded channel, since the host will not respond we will
have situations where we will be unable to unload vmbus drivers that
cannot have any outstanding requests to the host at the point they are
unloaded.
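A minimal sketch of the guard at the top of the write path (placement is
illustrative):

    /* the host will never respond on a rescinded channel */
    if (channel->rescind)
        return -ENODEV;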
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It may happen that secondary CPUs are still alive and resetting
hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc()
as we don't check for it being not NULL there. It is safe as we're not
freeing this page anyways.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Initializing hv_context.percpu_list in hv_synic_alloc() helps to prevent a
crash in percpu_channel_enq() when not all CPUs were online during
initialization and it naturally belongs there.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It may happen that not all CPUs are online when we do hv_synic_alloc() and
in case more CPUs come online later we may try accessing these allocated
structures.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As the IN request has to be allocated in set_alt() and released in
disable(), we cannot use a mutex to protect it, as we cannot sleep
in those functions. Let's replace this mutex with a spinlock.
Tested-by: David Lechner <david@lechnology.com> Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we unlock our spinlock to copy data to user space, we may get
disabled by the USB host and free the whole list of completed out
requests, including the one from which we are copying the data to user
memory.
To prevent this, let's remove our working element from the list and
place it back only if there is something left when we finish with it.
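A hedged sketch of the pattern (struct and field names are illustrative
and the locking is simplified):

    spin_lock_irqsave(&hidg->read_spinlock, flags);
    entry = list_first_entry(&hidg->completed_out_req,
                             struct f_hidg_req_list, list);
    list_del(&entry->list);              /* detach before dropping the lock */
    spin_unlock_irqrestore(&hidg->read_spinlock, flags);

    /* copy to user space without the lock; disable() can no longer free it */
    n = copy_to_user(buffer, entry->req->buf + entry->pos, count);

    spin_lock_irqsave(&hidg->read_spinlock, flags);
    if (entry->pos + count < entry->req->actual)
        list_add(&entry->list, &hidg->completed_out_req);  /* put it back */
    /* else: requeue the usb_request and free the entry (not shown) */
    spin_unlock_irqrestore(&hidg->read_spinlock, flags);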
Fixes: 99c515005857 ("usb: gadget: hidg: register OUT INT endpoint for SET_REPORT") Tested-by: David Lechner <david@lechnology.com> Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Requests for out endpoint are allocated in bind() function
but never released.
This commit ensures that all pending requests are released
when we disable out endpoint.
Fixes: 99c515005857 ("usb: gadget: hidg: register OUT INT endpoint for SET_REPORT") Tested-by: David Lechner <david@lechnology.com> Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 304f7e5e1d08 ("usb: gadget: Refactor request completion")
removed the check whether req->req.complete is non-NULL, resulting in a
NULL pointer dereference and a kernel panic.
This patch adds an empty complete function instead of re-introducing
the req->req.complete check.
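A hedged sketch of the replacement (the callback name is illustrative):

    /* empty completion handler: the request no longer carries a NULL
     * ->complete pointer, so the core can call it unconditionally */
    static void nop_req_complete(struct usb_ep *ep, struct usb_request *req)
    {
    }

    req->req.complete = nop_req_complete;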
Fixes: 304f7e5e1d08 ("usb: gadget: Refactor request completion") Signed-off-by: Magnus Lilja <lilja.magnus@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 855ed04a3758 ("usb: gadget: udc-core: independent
registration of gadgets and gadget drivers"), if we load a gadget module
but there is no free UDC available, it will be stored on a list of
pending gadget drivers.
$ modprobe g_zero.ko
$ modprobe g_ether.ko
[] udc-core: couldn't find an available UDC - added [g_ether] to list
of pending drivers
We scan this list each time a new UDC appears in the system.
But we can also get a free UDC each time a gadget is unbound.
This commit adds scanning of that list directly after unbinding
a gadget from its UDC.
Thanks to this, when we unload the first gadget:
$ rmmod g_zero.ko
the pending gadget is automatically
attached to that UDC (if the name matches).
Fixes: 855ed04a3758 ("usb: gadget: udc-core: independent registration of gadgets and gadget drivers") Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4ac53087d6d4 ("usb: xhci: plat: Create both
HCDs before adding them") moved adding the HCDs to the end
of probe. This left hcc_params uninitialized, because the
xHCI driver sets hcc_params in xhci_gen_setup(), called from
usb_add_hcd().
This patch checks the Maximum Primary Stream Array Size
in the hcc_params register after adding the primary hcd.
Signed-off-by: William wu <william.wu@rock-chips.com> Acked-by: Roger Quadros <rogerq@ti.com> Fixes: 4ac53087d6d4 ("usb: xhci: plat: Create both HCDs before adding them") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At least macOS seems to be sending
ClearFeature(ENDPOINT_HALT) to endpoints which
aren't Halted. This makes DWC3's CLEARSTALL command
time out which causes several issues for the driver.
Instead, let's just return 0 and bail out early.
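A minimal sketch of the early return (the flag name is an assumption
based on the dwc3 endpoint flags):

    /* ClearFeature(ENDPOINT_HALT) on an endpoint that is not halted:
     * nothing to do, so skip the CLEARSTALL command entirely */
    if (!(dep->flags & DWC3_EP_STALL))
        return 0;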
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DA8xx driver is registering and using the CPPI 3.0 DMA controller but
actually, the DA8xx has a CPPI 4.1 DMA controller.
Remove the CPPI 3.0 quirk and methods.
Fixes: f8e9f34f80a2 ("usb: musb: Fix up DMA related macros") Fixes: 7f6283ed6fe8 ("usb: musb: Set up function pointers for DMA") Signed-off-by: Alexandre Bailon <abailon@baylibre.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ds2490 driver was doing USB transfers from / to buffers on the stack.
This is not permitted and made the driver non-working with vmapped stacks.
Since all these transfers are done under the same bus_mutex lock, we can
simply use shared buffers in a device private structure for the two most
common of them.
While we are at it, let's also fix a comparison between int and size_t in
ds9490r_search() which made the driver spin in this function if state
register get requests were failing.
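A hedged sketch of one such transfer with a shared, heap-backed buffer
(names are illustrative):

    /* dev->byte_buf lives in the kmalloc'd ds_device structure; every
     * caller already holds the same bus_mutex, so no extra locking */
    err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
                          req, USB_TYPE_VENDOR | USB_DIR_IN,
                          value, index,
                          &dev->byte_buf, sizeof(dev->byte_buf), 1000);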
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Near the beginning of w1_attach_slave_device() we increment a w1 master
reference count.
Later, when we are going to exit this function without actually attaching
a slave device (due to failure of __w1_attach_slave_device()) we need to
decrement this reference count back.
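A minimal sketch of the error path (assuming the master keeps its
reference count in an atomic_t refcnt, as elsewhere in w1):

    /* __w1_attach_slave_device() failed: undo the reference taken above */
    atomic_dec(&dev->refcnt);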
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Fixes: 9fcbbac5ded489 ("w1: process w1 netlink commands in w1_process thread") Cc: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 05ca5270005c ("can: gs_usb: add ethtool set_phys_id callback to locate physical device")
The gs_usb driver is performing USB transfers using buffers allocated on
the stack. This causes the driver to not function with vmapped stacks.
Instead, allocate memory for the transfer buffers.
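A hedged sketch of the pattern, with generic names standing in for the
driver's own request structures:

    void *buf = kmalloc(len, GFP_KERNEL);   /* heap, never the stack */
    if (!buf)
        return -ENOMEM;
    memcpy(buf, &req_data, len);

    rc = usb_control_msg(usbdev, usb_sndctrlpipe(usbdev, 0), request,
                         USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
                         value, index, buf, len, 1000);
    kfree(buf);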
Fixes a regression triggered by a change in the layout of
struct iio_chan_spec, but the real bug is in the driver which assumed
a specific structure layout in the first place. Hint: the two bits were
not OR:ed together as implied by the indentation prior to this patch,
there was a comma between them, which accidentally moved the ..._SCALE
bit to the next structure field. That field was .info_mask_shared_by_type
before the _available attributes was added by commit 51239600074b
("iio:core: add a callback to allow drivers to provide _available
attributes") and .info_mask_separate_available afterwards, and the
regression happened.
info_mask_shared_by_type is actually a better choice than the originally
intended info_mask_separate for the ..._SCALE bit since a constant is
returned from mpl3115_read_raw for the scale. Using
info_mask_shared_by_type also preserves the behavior from before the
regression and is therefore less likely to cause other interesting side
effects.
The above mentioned regression causes an unintended sysfs attribute to
show up that is not backed by code, in turn causing the following NULL
pointer dereference to happen on access.
Fixes: cc26ad455f57 ("iio: Add Freescale MPL3115A2 pressure / temperature sensor driver") Fixes: 51239600074b ("iio:core: add a callback to allow drivers to provide _available attributes") Reported-by: Ken Lin <ken.lin@advantech.com> Tested-by: Ken Lin <ken.lin@advantech.com> Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes a regression triggered by a change in the layout of
struct iio_chan_spec, but the real bug is in the driver which assumed
a specific structure layout in the first place. Hint: the three bits were
not OR:ed together as implied by the indentation prior to this patch,
there was a comma between the first two, which accidentally moved the
..._SCALE and ..._OFFSET bits to the next structure field. That field
was .info_mask_shared_by_type before the _available attributes was added
by commit 51239600074b ("iio:core: add a callback to allow drivers to
provide _available attributes") and .info_mask_separate_available
afterwards, and the regression happened.
info_mask_shared_by_type is actually a better choice than the originally
intended info_mask_separate for the ..._SCALE and ..._OFFSET bits since
a constant is returned from mpl115_read_raw for the scale/offset. Using
info_mask_shared_by_type also preserves the behavior from before the
regression and is therefore less likely to cause other interesting side
effects.
The above mentioned regression causes unintended sysfs attributes to
show up that are not backed by code, in turn causing a NULL pointer
dereference to happen on access.
Fixes: 3017d90e8931 ("iio: Add Freescale MPL115A2 pressure / temperature sensor driver") Fixes: 51239600074b ("iio:core: add a callback to allow drivers to provide _available attributes") Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IRQFD framework calls the architecture dependent function
twice if the corresponding GSI type is edge triggered. For ARM,
the function kvm_set_msi() is getting called twice whenever the
IRQFD receives the event signal. The rest of the code path is
trying to inject the MSI without any validation checks. There is no need
to call vgic_its_inject_msi() a second time; skipping it avoids
unnecessary overhead in the IRQ queue logic and also avoids the
possibility of the VM seeing the MSI twice.
Simple fix, return -1 if the argument 'level' value is zero.
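A minimal sketch of the fix as described:

    /* irqfd signals edge-triggered GSIs twice (level 1, then level 0);
     * only the first call should inject anything */
    if (!level)
        return -1;

    return vgic_its_inject_msi(kvm, &msi);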
Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since it was introduced in commit da8d02d19ffdd201 ("arm64/capabilities:
Make use of system wide safe value"), __raw_read_system_reg() has
erroneously mapped some sysreg IDs to other registers.
For the fields in ID_ISAR5_EL1, our local feature detection will be
erroneous. We may spuriously detect that a feature is uniformly
supported, or may fail to detect when it actually is, meaning some
compat hwcaps may be erroneous (or not enforced upon hotplug).
This patch corrects the erroneous entries.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: da8d02d19ffdd201 ("arm64/capabilities: Make use of system wide safe value") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When bypassing SWIOTLB on small-memory systems, we need to avoid calling
into swiotlb_dma_mapping_error() in exactly the same way as we avoid
swiotlb_dma_supported(), because the former also relies on SWIOTLB state
being initialised.
Under the assumptions for which we skip SWIOTLB, dma_map_{single,page}()
will only ever return the DMA-offset-adjusted physical address of the
page passed in, thus we can report success unconditionally.
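A hedged sketch, where `swiotlb` is the arm64-local flag set only when
the bounce buffers were actually initialised (as introduced by the commit
in the Fixes tag):

    static int __swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t addr)
    {
        if (swiotlb)
            return swiotlb_dma_mapping_error(hwdev, addr);
        /* SWIOTLB bypassed: dma_map_{single,page}() cannot fail */
        return 0;
    }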
Fixes: b67a8b29df7e ("arm64: mm: only initialize swiotlb when necessary") CC: Jisheng Zhang <jszhang@marvell.com> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we fault in a page, we flush it to the PoC (Point of Coherency)
if the faulting vcpu has its own caches off, so that it can observe
the page we just brought in.
But if the vcpu has its caches on, we skip that step. Bad things
happen when *another* vcpu tries to access that page with its own
caches disabled. At that point, there is no guarantee that the
data has made it to the PoC, and we access stale data.
The obvious fix is to always flush to PoC when a page is faulted
in, no matter what the state of the vcpu is.
Fixes: 2d58b733c876 ("arm64: KVM: force cache clean on page fault when caches are off") Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kirill reported a warning from UBSAN about undefined behavior when using
protection keys. He is running on hardware that actually has support for
it, which is not widely available.
The warning triggers because of very large shifts of integers when doing a
pkey_free() of a large, invalid value. This happens because we never check
that the pkey "fits" into the mm_pkey_allocation_map().
I do not believe there is any danger here of anything bad happening
other than some aliasing issues where somebody could do:
pkey_free(35);
and the kernel would effectively execute:
pkey_free(8);
While this might be confusing to an app that was doing something stupid, it
has to do something stupid and the effects are limited to the app shooting
itself in the foot.
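A hedged sketch of the bounds check (macro names are assumptions about
the x86 pkeys helpers):

    /* refuse pkeys that do not fit in the allocation bitmap at all */
    if (pkey < 0 || pkey >= arch_max_pkey())
        return false;
    return mm_pkey_allocation_map(mm) & (1U << pkey);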
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Cc: kirill.shutemov@linux.intel.com Link: http://lkml.kernel.org/r/20170223222603.A022ED65@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fuse_file_put() was missing the "force" flag for the RELEASE request when
sending synchronously (fuseblk).
If this flag is not set, then a sync request may be interrupted before it
is dequeued by the userspace filesystem. In this case the OPEN won't be
balanced with a RELEASE.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 5a18ec176c93 ("fuse: fix hang of single threaded fuseblk filesystem") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Running with KASAN and crypto tests currently gives
BUG: KASAN: global-out-of-bounds in __test_aead+0x9d9/0x2200 at addr ffffffff8212fca0
Read of size 16 by task cryptomgr_test/1107
Address belongs to variable 0xffffffff8212fca0
CPU: 0 PID: 1107 Comm: cryptomgr_test Not tainted 4.10.0+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Call Trace:
dump_stack+0x63/0x8a
kasan_report.part.1+0x4a7/0x4e0
? __test_aead+0x9d9/0x2200
? crypto_ccm_init_crypt+0x218/0x3c0 [ccm]
kasan_report+0x20/0x30
check_memory_region+0x13c/0x1a0
memcpy+0x23/0x50
__test_aead+0x9d9/0x2200
? kasan_unpoison_shadow+0x35/0x50
? alg_test_akcipher+0xf0/0xf0
? crypto_skcipher_init_tfm+0x2e3/0x310
? crypto_spawn_tfm2+0x37/0x60
? crypto_ccm_init_tfm+0xa9/0xd0 [ccm]
? crypto_aead_init_tfm+0x7b/0x90
? crypto_alloc_tfm+0xc4/0x190
test_aead+0x28/0xc0
alg_test_aead+0x54/0xd0
alg_test+0x1eb/0x3d0
? alg_find_test+0x90/0x90
? __sched_text_start+0x8/0x8
? __wake_up_common+0x70/0xb0
cryptomgr_test+0x4d/0x60
kthread+0x173/0x1c0
? crypto_acomp_scomp_free_ctx+0x60/0x60
? kthread_create_on_node+0xa0/0xa0
ret_from_fork+0x2c/0x40
Memory state around the buggy address:
 ffffffff8212fb80: 00 00 00 00 01 fa fa fa fa fa fa fa 00 00 00 00
 ffffffff8212fc00: 00 01 fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
>ffffffff8212fc80: fa fa fa fa 00 05 fa fa fa fa fa fa 00 00 00 00
                               ^
 ffffffff8212fd00: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
 ffffffff8212fd80: fa fa fa fa 00 00 00 00 00 05 fa fa fa fa fa fa
This always happens on the same IV which is less than 16 bytes.
Per Ard,
"CCM IVs are 16 bytes, but due to the way they are constructed
internally, the final couple of bytes of input IV are dont-cares.
Apparently, we do read all 16 bytes, which triggers the KASAN errors."
Fix this by padding the IV with null bytes to be at least 16 bytes.
Fixes: 0bc5a6c5c79a ("crypto: testmgr - Disable rfc4309 test and convert test vectors") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If dso__load_kcore frees all of the existing maps, but one has already
been attached to a callchain cursor node, then we can get a SIGSEGV in
any function that happens to try to use this invalid cursor. Use the
existing map refcount mechanism to forestall cleanup of a map until the
cursor iterates past the node.
DoS protection conditions were altered in WS2016 and now it's easy to get
-EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
netvsc device in a loop). All vmbus_post_msg() callers don't retry the
operation and we usually end up with a non-functional device or crash.
While the host's DoS protection conditions are unknown to me, my tests show that
it can take up to 10 seconds before the message is sent so doing udelay()
is not an option, we really need to sleep. Almost all vmbus_post_msg()
callers are ready to sleep but there is one special case:
vmbus_initiate_unload() which can be called from interrupt/NMI context and
we can't sleep there. I'm also not sure about the lonely
vmbus_send_tl_connect_request() which has no in-tree users but its external
users are most likely waiting for the host to reply so sleeping there is
also appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit eb5767122feb ("PCI: altera: Simplify TLP_CFG_DW0 usage") used
TLP_FMTTYPE_CFGRD* (instead of TLP_FMTTYPE_CFGWR*) for TLP writes, which
causes writing to configuration space to fail. Fix it by using correct
FMTTYPE for write operation.
pnv_php_disable_irq() can be called in two paths: the bail-out path in
pnv_php_enable_irq() or when releasing the slot. The MSI (or MSI-X)
interrupts are disabled unconditionally in pnv_php_disable_irq(). That's
wrong because they might have been enabled by drivers other than pnv-php.
This patch disables the MSI (or MSI-X) interrupts and the PCI device only
if they were enabled by pnv-php. In the error path of pnv_php_enable_irq(),
we rely on the newly added parameter @disable_device. In the path
of releasing slot, @pnv_php->irq is checked.
Fixes: 360aebd85a4c ("drivers/pci/hotplug: Support surprise hotplug in powernv driver") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The devfn of 00:02.0 is 0x10. devfn_to_wslot(0x10) == 0x2, and
wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code.
Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot()
returns NULL and pci_stop_and_remove_bus_device() is not called.
Later when the real device driver's .remove() is invoked by
hv_pci_remove() -> pci_stop_root_bus(), some warnings can be noticed
because the VM has lost the access to the underlying device at that
time.
Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> CC: K. Y. Srinivasan <kys@microsoft.com> CC: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes the OTP register definitions for the AR934x and AR9550
WMAC SoC.
Previously, the ath9k driver was unable to initialize the integrated
WMAC on an Aerohive AP121:
| ath: phy0: timeout (1000 us) on reg 0x30018: 0xbadc0ffe & 0x00000007 != 0x00000004
| ath: phy0: timeout (1000 us) on reg 0x30018: 0xbadc0ffe & 0x00000007 != 0x00000004
| ath: phy0: Unable to initialize hardware; initialization status: -5
| ath9k ar934x_wmac: failed to initialize device
| ath9k: probe of ar934x_wmac failed with error -5
It turns out that the AR9300_OTP_STATUS and AR9300_OTP_DATA
definitions contain a typo.
Cc: Gabor Juhos <juhosg@openwrt.org> Fixes: add295a4afbdf5852d0 "ath9k: use correct OTP register offsets for AR9550" Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Chris Blake <chrisrblake93@gmail.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code currently relies on refcounting to disable IRQs from within the
IRQ handler and re-enabling them again after the tasklet has run.
However, due to race conditions sometimes the IRQ handler might be
called twice, or the tasklet may not run at all (if interrupted in the
middle of a reset).
This can cause nasty imbalances in the irq-disable refcount which will
get the driver permanently stuck until the entire radio has been stopped
and started again (ath_reset will not recover from this).
Instead of using this fragile logic, change the code to ensure that
running the irq handler during tasklet processing is safe, and leave the
refcount untouched.
Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rx filter reset and the dynamic tx switch mode (EXT_RESOURCE_CFG)
configuration are causing the following errors when UTF firmware
is loaded to the target.
Parallel reads from multiple threads on a file descriptor
are not well defined and racy. It is safer to return to the original
behavior and simply fail the additional read.
The solution is to remove the request for the next read credit.
Fixes: ff1586a7ea57 ("mei: enqueue consecutive reads") Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the file system requires journal recovery, and the device is
read-only, return EROFS to the mount system call. This allows xfstests
generic/050 to pass.
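A hedged sketch of the check (placement and message text are
illustrative):

    /* journal replay needs write access to the device */
    if (ext4_has_feature_journal_needs_recovery(sb) &&
        bdev_read_only(sb->s_bdev)) {
        ext4_msg(sb, KERN_ERR,
                 "write access required to recover the journal");
        return -EROFS;
    }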
If the journal is aborted, the needs_recovery feature flag should not
be removed. Otherwise, the journal might not get replayed and
this could lead to more data getting lost.
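A minimal sketch of the intended behaviour (the jbd2 helper is the
standard abort check):

    /* only drop NEEDS_RECOVERY if the journal shut down cleanly */
    if (!is_journal_aborted(journal))
        ext4_clear_feature_journal_needs_recovery(sb);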