struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
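A minimal sketch of the fix implied above, using the libceph decode helpers
(the surrounding tree bucket decode code is assumed):

	/* before: reads a u32 and advances the buffer cursor by 4 bytes */
	ceph_decode_32_safe(p, end, b->num_nodes, bad);

	/* after: num_nodes is a u8, so read 1 byte and advance by 1 */
	ceph_decode_8_safe(p, end, b->num_nodes, bad);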
commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup
skip security check and lockdep conflict with XFS") caused a regression
for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings. However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks. On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check
and no execmem check. Since the aforementioned commit now marks the
shmem zero inode with the S_PRIVATE flag, the file checks are disabled
and we have no checking at all on mprotect PROT_EXEC. Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case. This makes the mmap and mprotect checking
consistent for shared anonymous mappings, as well as for /dev/zero and
ashmem.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At present we don't create efficient ebitmaps when importing NetLabel
category bitmaps. This can present a problem when comparing ebitmaps
since ebitmap_cmp() is very strict about these things and considers
these wasteful ebitmaps not equal when compared to their more
efficient counterparts, even if their values are the same. This isn't
likely to cause problems on 64-bit systems due to a bit of luck on
how NetLabel/CIPSO works and the default ebitmap size, but it can be
a problem on 32-bit systems.
This patch fixes this problem by being a bit more intelligent when
importing NetLabel category bitmaps by skipping over empty sections
which should result in a nice, efficient ebitmap.
Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the clone ioctl (or the extent_same ioctl, which calls the same extent
cloning function) we end up allowing the copy of an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equal to 0.
For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount
# Create our test files. File foo has the same 2K of data at offset 4K
# as file bar has at its offset 0.
$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
-c "pwrite -S 0xbb 4k 2K" \
-c "pwrite -S 0xcc 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# File bar consists of a single inline extent (2K size).
$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
$SCRATCH_MNT/bar | _filter_xfs_io
# Now call the clone ioctl to clone the extent of file bar into file
# foo at its offset 4K. This made file foo have an inline extent at
# offset 4K, something which the btrfs code can not deal with in future
# IO operations because all inline extents are supposed to start at an
# offset of 0, resulting in all sorts of chaos.
# So here we validate that clone ioctl returns an EOPNOTSUPP, which is
# what it returns for other cases dealing with inlined extents.
$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/foo
# Because of the inline extent at offset 4K, the following write made
# the kernel crash with a BUG_ON().
$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
status=0
exit
The stack trace of the BUG_ON() triggered by the last write is:
Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:
00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split") 3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
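A sketch of the kind of check the fix describes (variable and label names are
illustrative, not the actual patch):

	if (type == BTRFS_FILE_EXTENT_INLINE && destoff > 0) {
		/* inline extents are only valid at file offset 0 */
		ret = -EOPNOTSUPP;
		goto out;
	}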
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we call btrfs_commit_transaction(), we splice the list "ordered"
of our transaction handle into the transaction's "pending_ordered"
list, but we don't re-initialize the "ordered" list of our transaction
handle, this means it still points to the same elements it used to
before the splice. Then we check if the current transaction's state is
>= TRANS_STATE_COMMIT_START and if it is we end up calling
btrfs_end_transaction() which simply splices again the "ordered" list
of our handle into the transaction's "pending_ordered" list, leaving
multiple pointers to the same ordered extents which results in list
corruption when we are iterating, removing and freeing ordered extents
at btrfs_wait_pending_ordered(), resulting in access to dangling
pointers / use-after-free issues.
Similarly, btrfs_end_transaction() can end up in some cases calling
btrfs_commit_transaction(), and both did a list splice of the transaction
handle's "ordered" list into the transaction's "pending_ordered" without
re-initializing the handle's "ordered" list, resulting in exactly the
same problem.
This produces the following warning on a kernel with linked list
debugging enabled:
On a non-debug kernel this leads to invalid memory accesses, causing a
crash. Fix this by using list_splice_init() instead of list_splice() in
btrfs_commit_transaction() and btrfs_end_transaction().
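Roughly, the change is of this form (the transaction pointer name is
illustrative; the list names are those described above):

	/* list_splice() leaves trans->ordered still pointing at the moved
	 * entries; list_splice_init() also reinitializes the source list */
	list_splice_init(&trans->ordered, &cur_trans->pending_ordered);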
Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We were allocating memory with memdup_user() but we were never releasing
that memory. This affected pretty much every call to the ioctl, whether
it deduplicated extents or not.
This issue was reported on IRC by Julian Taylor and on the mailing list
by Marcel Ritter, credit goes to them for finding the issue.
Reported-by: Julian Taylor <jtaylor.debian@googlemail.com> Reported-by: Marcel Ritter <ritter.marcel@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we do an append write to a file (which increases its inode's i_size)
that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
and the previous transaction added a new hard link to the file, which sets
the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
the file, the inode's new i_size isn't logged. This has the consequence
that after the fsync log is replayed, the file size remains what it was
before the append write operation, which means users/applications will
not be able to read the data that was successfully fsync'ed before.
This happens because neither the inode item nor the delayed inode get
their i_size updated when the append write is made - doing so would
require starting a transaction in the buffered write path, something that
we do not do intentionally for performance reasons.
Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
set the inode is logged with its current i_size (log the in-memory inode
into the log tree).
This issue is not a recent regression and is easy to reproduce with the
following test case for fstests:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
here=`pwd`
tmp=/tmp/$$
status=1 # failure is the default!
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_supported_fs generic
_supported_os Linux
_need_to_be_root
_require_scratch
_require_dm_flakey
_require_metadata_journaling $SCRATCH_DEV
_crash_and_mount()
{
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again and mount. This makes the fs replay its fsync log.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
}
# Create the test file with some initial data and then fsync it.
# The fsync here is only needed to trigger the issue in btrfs, as it causes the
# flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
sync
# Add a hard link to our file.
# On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
# which is a necessary condition to trigger the issue.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar
# Sync the filesystem to force a commit of the current btrfs transaction, this
# is a necessary condition to trigger the bug on btrfs.
sync
# Now append more data to our file, increasing its size, and fsync the file.
# In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
# write path did not update the inode item in the btree nor the delayed inode
# item (in memory structure) in the current transaction (created by the fsync
# handler), the fsync did not record the inode's new i_size in the fsync
# log/journal. This made the data unavailable after the fsync log/journal is
# replayed.
$XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
echo "File content after fsync and before crash:"
od -t x1 $SCRATCH_MNT/foo
_crash_and_mount
echo "File content after crash and log replay:"
od -t x1 $SCRATCH_MNT/foo
status=0
exit
The expected file output, before and after the crash/power failure, shows
the appended data available:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0200000
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.
This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.
This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:
The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:
/*
* (...)
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
So better be safe and use kmem_cache_free().
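For illustration, the allocation/free pairing this implies (a sketch; the
cache pointer name follows the btrfs free space cache code):

	struct btrfs_free_space *info;

	info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS);
	...
	/* free back to the same cache rather than calling kfree() */
	kmem_cache_free(btrfs_free_space_cachep, info);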
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/md/md.c: In function "update_array_info":
drivers/md/md.c:6394:26: warning: logical not is only applied
to the left hand side of comparison [-Wlogical-not-parentheses]
!mddev->persistent != info->not_persistent||
Fix it as Neil Brown said:
mddev->persistent != !info->not_persistent ||
Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
mounted subvolumes can be deleted because d_invalidate() won't fail.
However, we run into problems when we attempt to delete the default
subvolume while it is mounted as the root filesystem:
# btrfs subvol list /
ID 257 gen 306 top level 5 path rootvol
ID 267 gen 334 top level 5 path snap1
# btrfs subvol get-default /
ID 267 gen 334 top level 5 path snap1
# btrfs inspect-internal rootid /
267
# mount -o subvol=/ /dev/vda1 /mnt
# btrfs subvol del /mnt/snap1
Delete subvolume (no-commit): '/mnt/snap1'
ERROR: cannot delete '/mnt/snap1' - Operation not permitted
# findmnt /
findmnt: can't read /proc/mounts: No such file or directory
# ls /proc
#
Markus reported that this same scenario simply led to a kernel oops.
This happens because in btrfs_ioctl_snap_destroy(), we call
d_invalidate() before we check may_destroy_subvol(), which means that we
detach the submounts and drop the dentry before erroring out. Instead,
we should only invalidate the dentry once the deletion has succeeded.
Additionally, the shrink_dcache_sb() isn't necessary; d_invalidate()
will prune the dcache for the deleted subvolume.
Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate") Reported-by: Markus Schauler <mschauler@gmail.com> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mcp3021 scaling code is dividing the VDD (full-scale) value in
millivolts by the A2D resolution to obtain the scaling factor. When VDD
is 3300mV (the standard value) and the resolution is 12-bit (4096
divisions), the result is a scale factor of 3300/4096, which is always
one. Effectively, the raw A2D reading is always being returned because
no scaling is applied.
This patch fixes the issue and simplifies the register-to-volts
calculation, removing the unneeded "output_scale" struct member.
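A sketch of the simplified register-to-volts calculation described (names are
illustrative; for 12-bit resolution the divisor is 1 << 12 = 4096):

	static inline u16 volts_from_reg(u16 vdd_mv, u16 val, u8 sar_shift)
	{
		/* scale the raw reading by VDD/resolution without first
		 * rounding the scale factor down to an integer */
		return DIV_ROUND_CLOSEST((u32)vdd_mv * val, 1 << sar_shift);
	}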
Signed-off-by: Nick Stevens <Nick.Stevens@digi.com>
[Guenter Roeck: Dropped unnecessary value check] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a bug whereby the bitmap superblock isn't initialised properly for
dm-raid, so new fields can contain garbage.
(dm-raid does the initialisation in the kernel - for md the superblock is
initialised by mdadm).
This means that for dm-raid we cannot currently trust the new ->nodes
field. So:
- use __GFP_ZERO to initialise the superblock properly for all new
arrays
- initialise all fields in bitmap_info in bitmap_new_disk_sb
- ignore ->nodes for dm arrays (yes, this is a hack)
This bug exposes dm-raid to a bug in the (still experimental) md-cluster
code, so it is suitable for -stable. It does cause crashes.
If ->private is set when ->run is called, it is assumed to be
a 'config' prepared as part of 'reshape'.
So it is important when we free that config, that we also clear ->private.
This is not often a problem as the mddev will normally be discarded
shortly after the config is freed.
However if an 'assemble' races with a final close, the assemble can use
the old mddev which has a stale ->private. This leads to any of
various sorts of crashes.
So clear ->private after calling ->free().
Reported-by: Nate Clark <nate@neworld.us> Fixes: afa0f557cb15 ("md: rename ->stop to ->free") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a bug in the XOR driver where the cleanup function can be
called and free descriptors that have never been processed by the engine
(which results in data errors).
The cleanup function will free descriptors based on the ownership bit in
the descriptors.
event-sample-980 [000] .... 43.649559: foo_bar: foo hello 21 0x15
BIT1|BIT3|0x10 {0x1,0x6f6f6e53,0xff007970,0xffffffff} Snoopy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The array length is not right, should be {0x1}.
(ffffffff,ffffffff)
event-sample-980 [000] .... 44.653827: foo_bar: foo hello 22 0x16
BIT2|BIT3|0x10
{0x1,0x2,0x646e6147,0x666c61,0xffffffff,0xffffffff,0x750aeffe,0x7}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The array length is not right, should be {0x1,0x2}.
Gandalf (ffffffff,ffffffff)
This was caused by an update to have __print_array()'s second parameter
be the count of items in the array and not the size of the array.
As there are already users of __print_array(), it cannot change. But
the sample code can and we can also improve on the documentation about
__print_array() and __get_dynamic_array_len().
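For example, the corrected sample usage looks roughly like this
(__get_dynamic_array_len() returns the length in bytes, so it must be divided
by the element size to get the count):

	__print_array(__get_dynamic_array(list),
		      __get_dynamic_array_len(list) / sizeof(int),
		      sizeof(int))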
Link: http://lkml.kernel.org/r/1436839171-31527-2-git-send-email-hekuang@huawei.com Fixes: ac01ce1410fc2 ("tracing: Make ftrace_print_array_seq compute buf_len") Reported-by: He Kuang <hekuang@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fengguang Wu's tests triggered a bug in the branch tracer's start up
test when CONFIG_DEBUG_PREEMPT is set. This was because that config
adds some debug logic in the per cpu field, which calls back into
the branch tracer.
The branch tracer has its own recursive checks, but uses a per cpu
variable to implement it. If retrieving the per cpu variable calls
back into the branch tracer, you can see how things will break.
Instead of using a per cpu variable, use the trace_recursion field
of the current task struct. Simply set a bit when entering the
branch tracing and clear it when leaving. If the bit is set on
entry, just don't do the tracing.
There's also the case with lockdep, as the local_irq_save() called
before the recursion can also trigger code that can call back into
the function. Changing that to a raw_local_irq_save() will protect
that as well.
This prevents the recursion and the inevitable crash that follows.
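Putting those pieces together, the entry/exit path looks roughly like this
(a sketch; the bit name stands in for whatever this change defines for the
branch tracer):

	unsigned long flags;

	if (unlikely(trace_recursion_test(TRACE_BRANCH_BIT)))
		return;

	trace_recursion_set(TRACE_BRANCH_BIT);
	raw_local_irq_save(flags);
	/* ... record the branch event ... */
	raw_local_irq_restore(flags);
	trace_recursion_clear(TRACE_BRANCH_BIT);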
The trace.h header when called without CONFIG_EVENT_TRACING enabled
(seldom done), will not compile because of a typo in the prototype
of trace_event_enum_update().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
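	/* roughly, from the trace filter parse state code this describes */
	ps->infix.cnt--;

	return ps->infix.string[ps->infix.tail++];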
The cnt will now be below zero, and the tail that is returned is
already past the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the character after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is a legitimate case
that it can happen (although the filter would be wrong).
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
the filter is reported as an error. So instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
This patch defines a builtin measurement policy "tcb", similar to the
existing "ima_tcb", but with additional rules to also measure files
based on the effective uid and to measure files opened with the "read"
mode bit set (e.g. read, read-write).
Changing the builtin "ima_tcb" policy could potentially break existing
users. Instead of defining a new separate boot command line option each
time the builtin measurement policy is modified, this patch defines a
single generic boot command line option "ima_policy=" to specify the
builtin policy and deprecates the use of the builtin ima_tcb policy.
[The "ima_policy=" boot command line option is based on Roberto Sassu's
"ima: added new policy type exec" patch.]
The current "mask" policy option matches files opened as MAY_READ,
MAY_WRITE, MAY_APPEND or MAY_EXEC. This patch extends the "mask"
option to match files opened containing one of these modes. For
example, "mask=^MAY_READ" would match files opened read-write.
The new "euid" policy condition measures files with the specified
effective uid (euid). In addition, for CAP_SETUID files it measures
files with the specified uid or suid.
Changelog:
- fixed checkpatch.pl warnings
- fixed avc denied {setuid} messages - based on Roberto's feedback
To prevent offline stripping of existing file xattrs and relabeling of
them at runtime, EVM allows only newly created files to be labeled. As
pseudo filesystems are not persistent, stripping of xattrs is not a
concern.
Some LSMs defer file labeling on pseudo filesystems. This patch
permits the labeling of existing files on pseudo filesystems.
This patch adds a rule in the default measurement policy to skip inodes
in the cgroupfs filesystem. Measurements for this filesystem can be
avoided, as all the digests collected have the same value of the digest of
an empty file.
Furthermore, this patch updates the documentation of IMA policies in
Documentation/ABI/testing/ima_policy to make it consistent with
the policies set in security/integrity/ima/ima_policy.c.
__key_link_end is not freeing the associated array edit structure
and this leads to a 512 byte memory leak each time an identical
existing key is added with add_key().
The reason the add_key() system call returns okay is that
key_create_or_update() calls __key_link_begin() before checking to see
whether it can update a key directly rather than adding/replacing - which
it turns out it can. Thus __key_link() is not called through
__key_instantiate_and_link() and __key_link_end() must cancel the edit.
CVE-2015-1333
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The call to asymmetric_key_hex_to_key_id() from ca_keys_setup()
silently fails with -ENOMEM. Instead of dynamically allocating
memory from a __setup function, this patch defines a variable
and calls __asymmetric_key_hex_to_key_id(), a new helper function,
directly.
This bug was introduced by 'commit 46963b774d44 ("KEYS: Overhaul
key identification when searching for asymmetric keys")'.
Changelog:
- for clarification, rename hexlen to asciihexlen in
asymmetric_key_hex_to_key_id()
- add size argument to __asymmetric_key_hex_to_key_id() - David Howells
- inline __asymmetric_key_hex_to_key_id() - David Howells
- remove duplicate strlen() calls
When a cdev is contained in a dynamic structure the cdev parent kobj
should be set to the kobj that controls the lifetime of the enclosing
structure. In TPM's case this is the embedded struct device.
Also, cdev_init 0's the whole structure, so all sets must be after,
not before. This fixes module ref counting and cdev.
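A sketch of the ordering this implies (field and fops names are illustrative):

	cdev_init(&chip->cdev, &tpm_fops);	/* zeroes the cdev first */
	chip->cdev.owner = THIS_MODULE;
	/* tie the cdev's lifetime to the embedded struct device */
	chip->cdev.kobj.parent = &chip->dev.kobj;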
Fixes: 313d21eeab92 ("tpm: device class for tpm") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tpm_ibmvtpm_probe() calls ibmvtpm_reset_crq(ibmvtpm) without having yet
set the virtual device in the ibmvtpm structure. So in ibmvtpm_reset_crq,
the phyp call contains empty unit addresses, ibmvtpm->vdev->unit_address.
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com> Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com> Reviewed-by: Ashley Lai <ashley@ahsleylai.com> Fixes: 132f76294744 ("drivers/char/tpm: Add new device driver to support IBM vTPM") Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
le64_to_cpu() was applied twice to the physical addresses read from the
control area. This hasn't shown any visible regressions because the CRB
driver has been tested only on the little endian platforms so far.
Reported-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A temperature conversion can take 750 ms and when possible the
w1_therm slave driver drops the bus_mutex to allow other bus
operations, but that includes operations such as a periodic slave
search, which can remove this slave when it is no longer detected.
If that happens the sl->family_data will be freed and set to NULL
causing w1_slave_show to crash when it wakes up.
The xfs_attr3_root_inactive() call from xfs_attr_inactive() assumes that
attribute blocks exist to invalidate. It is possible to have an
attribute fork without extents, however. Consider the case where the
attribute fork is created towards the beginning of xfs_attr_set() but
some part of the subsequent attribute set fails.
If an inode in such a state hits xfs_attr_inactive(), it eventually
calls xfs_dabuf_map() and possibly xfs_bmapi_read(). The former emits a
filesystem corruption warning, returns an error that bubbles back up to
xfs_attr_inactive(), and leads to destruction of the in-core attribute
fork without an on-disk reset. If the inode happens to make it back
through xfs_inactive() in this state (e.g., via a concurrent bulkstat
that cycles the inode from the reclaim state and releases it), i_afp
might not exist when xfs_bmapi_read() is called and causes a NULL
dereference panic.
A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB)
reproduces these problems. The behavior is a regression caused by:
6dfe5a0 xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
... which removed logic that avoided the attribute extent truncate when
no extents exist. Restore this logic to ensure the attribute fork is
destroyed and reset correctly if it exists without any allocated
extents.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we create a CRC filesystem, mount it, and create a symlink with
a path long enough that it can't live in the inode, we get a very
strange result upon remount:
# ls -l mnt
total 4
lrwxrwxrwx. 1 root root 929 Jun 15 16:58 link -> XSLM
XSLM is the V5 symlink block header magic (which happens to be
followed by a NUL, so the string looks terminated).
xfs_readlink_bmap() advanced cur_chunk by the size of the header
for CRC filesystems, but never actually used that pointer; it
kept reading from bp->b_addr, which is the start of the block,
rather than the start of the symlink data after the header.
Looks like this problem goes back to v3.10.
Fixing this gets us reading the proper link target, again.
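A sketch of the corrected copy in xfs_readlink_bmap() (reconstructed from the
description above; the surrounding local variables are assumed):

	cur_chunk = bp->b_addr;
	if (xfs_sb_version_hascrc(&mp->m_sb))
		cur_chunk += sizeof(struct xfs_dsymlink_hdr);

	/* copy from the advanced pointer, not from bp->b_addr */
	memcpy(link + offset, cur_chunk, byte_cnt);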
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5d3abf8ff67f ("libata: Fall back to unqueued READ LOG EXT if
the DMA variant fails") allowed us to fall back to the unqueued READ
LOG variant if the queued version failed. However, if the device did
not support the page at all we would end up looping due to a merge
snafu.
The original justification for this was that the hpd handlers could
use the unknown state as a hint to force a full detection. But current
i915 code isn't doing that any more, and no one else really uses reset
on resume. So instead just keep the old state around.
References: http://article.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/62584
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100641 Cc: Rui Matos <tiagomatos@gmail.com> Cc: Julien Wajsberg <felash@gmail.com> Cc: kuddel.mail@gmx.de Cc: Lennart Poettering <mzxreary@0pointer.de> Acked-by: Rob Clark <robdclark@gmail.com> Tested-by: Rui Tiago Cação Matos <tiagomatos@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit "drm: add support for tiled/compressed/etc modifier in addfb2"
missed the structure packing/alignment problem where 64-bit
members were added after the odd number of 32-bit ones. This
makes the compiler produce structures of different sizes under
32- and 64-bit x86 targets and makes the ioctl need explicit
compat handling.
v2: Removed the typedef. (Daniel Vetter)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: Rob Clark <robdclark@gmail.com> Cc: Daniel Stone <daniels@collabora.com> Cc: Daniel Vetter <daniel.vetter@intel.com>
[danvet: Squash in compile fix from Mika.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rather than (incompletely [0]) re-implementing drm_gem_mmap() and
drm_gem_mmap_obj() helpers, call them directly from the rockchip mmap
routines.
Once the core functions return successfully, the rockchip mmap routines
can still use dma_mmap_attrs() to simply mmap the entire buffer.
[0] Previously, we were performing the mmap() without first taking a
reference on the underlying gem buffer. This could leak ptes if the gem
object is destroyed while userspace is still holding the mapping.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rdev->gart.ptr can be NULL when the GPU is powered off, e.g. via vgaswitcheroo
or runpm. When the GPU is powered up again, radeon_gart_table_vram_pin
flushes the TLB after setting rdev->gart.ptr to non-NULL.
Fixes panic on powering off R7xx GPUs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61529 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Everything is evicted from VRAM before suspend, so we need to make
sure all BOs are unpinned and re-pinned after resume. Fixes broken
mouse cursor after resume introduced by commit b9729b17.
[Michel Dänzer: Add pinning BOs on resume]
v2:
[Alex Deucher: merge cursor unpin into fb unpin loop]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100541 Reviewed-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Grigori Goronzy <greg@chown.ath.cx> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Take a GEM reference for and pin the new cursor BO, unpin and drop the
GEM reference for the old cursor BO in radeon_crtc_cursor_set2, and use
radeon_crtc->cursor_addr in radeon_set_cursor.
This fixes radeon_cursor_reset accidentally incrementing the cursor BO
pin count, and cleans up the code a little.
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trying to resolve issues with missed vblanks and impossible
values inside delivered kms pageflip completion events showed
that radeon's irq handling sometimes doesn't handle valid irqs,
but silently skips them. This was observed for vblank interrupts.
Although those irqs have corresponding events queued in the gpu's
irq ring at time of interrupt, and therefore the corresponding
handling code gets triggered by these events, the handling code
sometimes silently skipped processing the irq. The reason for those
skips is that the handling code double-checks for each irq event if
the corresponding irq status bits in the irq status registers
are set. Sometimes those bits are not set at time of check
for valid irqs, maybe due to some hardware race on some setups?
The problem only seems to happen on some machine + card combos
sometimes, e.g., never happened during my testing of different PC
cards of the DCE-2/3/4 generation a year ago, but happens consistently
now on two different Apple Mac cards (RV730, DCE-3, Apple iMac and
Evergreen JUNIPER, DCE-4 in an Apple MacPro). It also doesn't happen
at each interrupt but only occasionally every couple of
hundred or thousand vblank interrupts.
This results in skipped frames and problems for applications that
use kms pageflip events or vblank events, e.g., users of DRI2 and
DRI3/Present, Wayland's Weston compositor, etc.
After some talking to Alex and Michel, we decided to fix this
by turning the double-check for asserted irq status bits into a
warning. Whenever an irq event is queued in the IH ring, always
execute the corresponding interrupt handler. Still check the irq
status bits, but only to log a DRM_DEBUG message on a mismatch.
This fixed the problems reliably on both previously failing
cards, RV-730 dual-head tested on both crtcs (pipes D1 and D2)
and a triple-output Juniper HD-5770 card tested on all three
available crtcs (D1/D2/D3). The r600 and evergreen irq handling
is therefore tested, but the cik and si handling is only compile
tested due to lack of hw.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> CC: Michel Dänzer <michel.daenzer@amd.com> CC: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was regressed by commit 39e7f6f8, although I don't know of any
actual issues caused by it.
The storage domain is read without TTM locking now, but the lock
never helped to prevent any races.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Grigori Goronzy <greg@chown.ath.cx> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order for hibernation to reliably work we need to properly turn
off the SDMA block; sadly, after numerous attempts I have not found a
proper sequence for a clean and full shutdown. So simply reset both
SDMA blocks; this makes hibernation work reliably on the Sea Islands GPU
family (CI).
Hibernation and suspend to ram were tested (several times) on :
Bonaire
Hawaii
Mullins
Kaveri
Kabini
Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order for hibernation to reliably work we need to clean up the
compute ring more thoroughly. Hibernation is different from suspend/
resume: when we resume from hibernation the hardware is first
fully initialized by the regular kernel, then the freeze callback happens
(which corresponds to a suspend inside the radeon kernel driver)
and turns off each of the blocks. It turns out we were not cleanly
shutting down the compute ring. This patch fixes that.
Hibernation and suspend to ram were tested (several times) on :
Bonaire
Hawaii
Mullins
Kaveri
Kabini
Changed since v1:
- Factor the ring stop logic into a function taking ring as arg.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the hardware sometimes mysteriously totally flummoxes the 64bit
read of a 64bit register when read using a single instruction, split the
read into two instructions. Since the read here is of automatically
incrementing timestamp counters, we also have to be very careful in
order to make sure that it does not increment between the two
instructions.
However, since userspace tried to work around this issue and so enshrined
this ABI for a broken hardware read and in the process neglected that
the read only fails in some environments, we have to introduce a new
uABI flag for userspace to request the 2x32 bit accurate read of the
timestamp.
v2: Fix alignment check and include details of the workaround for
userspace.
Reported-by: Karol Herbst <freedesktop@karolherbst.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91317
Testcase: igt/gem_reg_read Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Tested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It breaks existing old userspace which doesn't handle UNKNOWN
swizzling correctly. Yes, UNKNOWN was a thing back in 2009 and probably
still is on some other platforms, but it still pretty clearly broke
the tester's machine. If we want this we need to extend the ioctl with
new parameters that only new userspace looks at.
Cc: Harald Arnesen <harald@skogtun.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously only core DRM ioctls under the DRM_COMMAND_BASE were being
forwarded, but the drm.h header suggests (and reality confirms) ones
after (and including) DRM_COMMAND_END should be forwarded as well.
We need this to correctly forward the compat ioctl for the botched-up
addfb2.1 extension.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com>
[danvet: Explain why this is suddenly needed and add cc: stable.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hang checker needs to inspect whether or not the ring request list is empty
as well as if the given engine has reached or passed the most recently
submitted request. The problem with this is that the hang checker cannot grab
the struct_mutex, which is required in order to safely inspect requests since
requests might be deallocated during inspection. In the past we've had kernel
panics due to this very unsynchronized access in the hang checker.
One solution to this problem is to not inspect the requests directly since
we're only interested in the seqno of the most recently submitted request - not
the request itself. Instead the seqno of the most recently submitted request is
stored separately, which the hang checker then inspects, circumventing the
issue of synchronization from the hang checker entirely.
drm/i915: Convert 'ring_idle()' to use requests not seqnos
v2 (Chris Wilson):
- Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
than compute it over again.
- Remove extra whitespace.
Issue: VIZ-5998 Signed-off-by: Tomas Elf <tomas.elf@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add regressing commit citation provided by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The old style of memory interleaving swizzled up to the end of the
first even bank of memory, and then used the remainder as unswizzled on
the unpaired bank - i.e. swizzling is not constant for all memory. This
causes problems when we try to migrate memory and so the kernel prevents
migration at all when we detect L-shaped inconsistent swizzling.
However, this issue also extends to userspace that tries to manually detile
into memory: as the swizzling for an individual page is unknown (it
depends on its physical address, which is only known to the kernel),
userspace cannot correctly swizzle objects.
v2: Mark the global swizzling as unknown rather than adjust the value
reported to userspace.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91105 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We cannot leave IPS enabled with no plane on the pipe:
BSpec: "IPS cannot be enabled until after at least one plane has
been enabled for at least one vertical blank." and "IPS must be
disabled while there is still at least one plane enabled on the
same pipe as IPS." This restriction applies to HSW and BDW.
However, a shortcut path in the update primary plane function,
used to make the primary plane invisible by setting DSPCNTR to 0,
was letting IPS stay enabled while there was no
other plane enabled on the pipe, causing flickering that we
believed was caused by that other restriction where
IPS cannot be used when the pixel rate is greater than 95% of cdclk.
v2: Don't mess with Atomic path as pointed out by Ville.
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85583 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If for some reason [1], the page directory/table does not exist, clear_range
would end up in an infinite while loop.
Introduced by commit 06fda602dbca ("drm/i915: Create page table allocators").
[1] This is already being addressed in one of Mika's patches:
http://mid.gmane.org/1432314314-23530-17-git-send-email-mika.kuoppala@intel.com
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reported-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/gpu/drm/bridge/ptn3460.c: In function ‘ptn3460_pre_enable’:
drivers/gpu/drm/bridge/ptn3460.c:135: error: implicit declaration of function ‘gpiod_set_value’
drivers/gpu/drm/bridge/ptn3460.c: In function ‘ptn3460_probe’:
drivers/gpu/drm/bridge/ptn3460.c:333: error: implicit declaration of function ‘devm_gpiod_get’
drivers/gpu/drm/bridge/ptn3460.c:333: warning: assignment makes pointer from integer without a cast
drivers/gpu/drm/bridge/ptn3460.c:340: error: implicit declaration of function ‘gpiod_direction_output’
drivers/gpu/drm/bridge/ptn3460.c:346: warning: assignment makes pointer from integer without a cast
Add the missing #include <linux/gpio/consumer.h> to fix this.
If the function fails, the reference counter of the object is not decremented,
causing leaks.
This is hard to spot as it happens only in very low memory situations.
Signed-off-by: Frediano Ziglio <fziglio@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If objects are moved back from system memory to VRAM (and the spice id is
created again) the memory is already initialized, so we need to set a flag
to not clear the memory.
If we don't, after a while of desktop use many images turn
black or transparent.
Signed-off-by: Frediano Ziglio <fziglio@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DPAUX read/write FIFO registers aren't sequential in the register
space, causing transfers larger than 4 bytes to cause accesses to non-
existing FIFO registers.
This validates the mst_primary under the lock, and then calls
into the check and send function. This makes the locking rules
in the code a lot easier to understand.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we are doing an MST transaction and we've gotten HPD and we
lookup the device from the incoming msg, we should take the mgr
lock around it, so that mst_primary and mstb->ports are valid.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I've only seen this once, and I failed to capture the
lockdep backtrace, but I did some investigations.
If we are calling into the MST layer from EDID probing,
we have the mode_config mutex held, if during that EDID
probing, the MST hub goes away, then we can get a deadlock
where the connector destruction function in the driver
tries to retake the mode config mutex.
This offloads connector destruction to a workqueue,
and avoids the subsequent lock ordering issue.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since there's only one global instance ever we don't need to have
anything fancy. Stops a WARNING in the get_unique ioctl that the
unique name isn't set.
Reported-and-tested-by: Fabio Coatti <fabio.coatti@gmail.com> Cc: Fabio Coatti <fabio.coatti@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Events defined as watchpoints on nodes must have their config values
converted so that they apply to the respective node's XP. The
function setting new values was using the wrong mask for the "port" field,
resulting in a corrupted value. Fixed now.
at91sam9g45, at91sam9x5 and sama5 SoCs should not use
"atmel,at91sam9rl-udc" for their USB device compatible property since
this compatible is attached to a specific hardware bug fix.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Tested-by: Bo Shen <voice.shen@atmel.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In RS485 mode, we may want to set the delay_rts_after_send value to 0.
In the datasheet, the 0 value is said to "disable" the Transmitter Timeguard but
this is exactly the expected behavior if we want no delay...
Moreover, if the value was set to a non-zero value by device-tree or an earlier
ioctl command, it was impossible to change it back to zero.
Reported-by: Sami Pietikäinen <Sami.Pietikainen@wapice.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a data corruption bug when using discard on top of MD linear,
raid0 and raid10 personalities.
Commit 20d0189b1012 "block: Introduce new bio_split()" permits sharing
the bio_vec between the two resulting bios. That is fine for read/write
requests where the bio_vec is immutable. For discards, however, we need
to be able to attach a payload and update the bio_vec so the page can
get mapped to a scatterlist entry. Therefore the bio_vec can not be
shared when splitting discards and we must do a full clone.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Seunguk Shin <seunguk.shin@samsung.com> Tested-by: Seunguk Shin <seunguk.shin@samsung.com> Cc: Seunguk Shin <seunguk.shin@samsung.com> Cc: Jens Axboe <axboe@fb.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If there is too much pending per-work I/O, too many
high priority worker threads can be generated, so that
system performance can be affected.
This patch limits the max_active parameter of the workqueue to 16.
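Roughly, the change caps max_active in the alloc_workqueue() call (a sketch;
the workqueue variable name and the other flags follow the loop driver):

	wq = alloc_workqueue("kloopd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);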
This patch fixes Fedora 22 live booting performance
regression when it is booted from squashfs over dm
based on loop, and looks the following reasons are
related with the problem:
- unlike other filesystems (such as ext4), squashfs
is a bit special, and I observed that increasing the number of I/O jobs
accessing files in squashfs only improves I/O performance a
little, but it can make a big difference for ext4
- nested loop: both squashfs.img and ext3fs.img are mounted
as loop block, and ext3fs.img is inside the squashfs
- during booting, lots of tasks may run concurrently
Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778 Cc: Justin M. Forbes <jforbes@fedoraproject.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Documentation/workqueue.txt:
If there is dependency among multiple work items used
during memory reclaim, they should be queued to separate
wq each with WQ_MEM_RECLAIM.
Loop devices can be stacked, so we have to convert to per-device
workqueue. One example is Fedora live CD.
Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778 Cc: Justin M. Forbes <jforbes@fedoraproject.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Given the pool's cell_sort_array holds 8192 pointers it triggers an
order 5 allocation via kmalloc. This order 5 allocation is prone to
failure as system memory gets more fragmented over time.
Fix this by allocating the cell_sort_array using vmalloc.
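A minimal sketch of the change (the size constant is illustrative; the array
holds 8192 pointers as noted above):

	pool->cell_sort_array =
		vmalloc(sizeof(*pool->cell_sort_array) * CELL_SORT_ARRAY_SIZE);
	...
	/* and freed with vfree() instead of kfree() */
	vfree(pool->cell_sort_array);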
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
redistribute3() shares entries out across 3 nodes. Some entries were
being moved the wrong way, breaking the ordering. This manifested as a
BUG() in dm-btree-remove.c:shift() when entries were removed from the
btree.
For additional context see:
https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html
Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The metadata space map has a simplified 'bootstrap' mode that is
operational when extending the space maps. Whilst in this mode it's
possible for some refcount decrement operations to become queued (eg, as
a result of shadowing one of the bitmap indexes). These decrements were
not being applied when switching out of bootstrap mode.
The effect of this bug was the leaking of a 4k metadata block. This is
detected by the latest version of thin_check as a non fatal error.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between a policy deciding to replace a cache entry,
the core target writing back any dirty data from this block, and other
IO threads doing IO to the same block.
This sort of problem is avoided most of the time by the core target
grabbing a bio prison cell before making the request to the policy.
But for a demotion the core target doesn't know which block will be
demoted, so can't do this in advance.
Fix this demotion race by introducing a callback to the policy interface
that allows the policy to grab the cell on behalf of the core target.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
virt_dev->num_rings_cached counts the cached (freed) rings and is not updated
correctly. In the xhci_free_or_cache_endpoint_ring() function, the free ring
is added into the cache and then num_rings_cached is incremented as below:
virt_dev->ring_cache[rings_cached] =
virt_dev->eps[ep_index].ring;
virt_dev->num_rings_cached++;
Here, the free ring pointer is added at the current index and then the
index is incremented.
So the current index always points to an empty location in the ring cache.
To get an available free ring, the current index should be decremented
first and then the corresponding ring buffer value should be taken from the
ring cache.
But in the xhci_endpoint_init() function, the num_rings_cached index is
accessed before the decrement.
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
virt_dev->num_rings_cached--;
This is a bug in manipulating the index of the ring cache,
and it should be as below:
virt_dev->num_rings_cached--;
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
Signed-off-by: Aman Deep <aman.deep@samsung.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 25cd2882e2fc ("usb/xhci: Change how we indicate a host supports
Link PM.") removed the code to set lpm_capable for USB 3.0 super-speed
root hub. The intention of that change was to avoid touching usb core
internal field, a.k.a. lpm_capable, and let usb core to set it by
checking U1 and U2 exit latency values in the descriptor.
Usb core checks and sets lpm_capable in hub_port_init(). Unfortunately,
root hub is a special usb device as it has no parent. Hub_port_init()
will never be called for a root hub device. That means lpm_capable will
by no means be set for the root hub. As a result, LPM isn't functional
at all in the Linux kernel.
This patch adds the code to check and set lpm_capable when registering a
root hub device. It could be back-ported to kernels as old as v3.15,
that contains the Commit 25cd2882e2fc ("usb/xhci: Change how we indicate
a host supports Link PM.").
Reported-by: Kevin Strasser <kevin.strasser@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a bug introduced by commit 977dcfdc6031 ("USB: OHCI:
don't lose track of EDs when a controller dies"). The commit changed
ed_state from ED_UNLINK to ED_IDLE too early, before finish_urb() had
been called. The user-visible consequence is that the driver
occasionally crashes or locks up when an URB is submitted while
another URB for the same endpoint is being unlinked.
This patch moves the ED state change later, to the right place. The
drawback is that now we may unnecessarily execute some instructions
multiple times when a controller dies. Since controllers dying is an
exceptional occurrence, a little wasted time won't matter.