Filipe Manana [Tue, 25 May 2021 10:05:28 +0000 (11:05 +0100)]
btrfs: fix deadlock when cloning inline extents and low on available space
There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:
1) When starting writeback for an inode with a very small dirty range that
fits in an inline extent, we deadlock during the writeback when trying
to insert the inline extent, at cow_file_range_inline(), if the extent
is going to be located in the leaf for which we are already holding a
read lock;
2) After successfully starting writeback, for non-inline extent cases,
the async reclaim thread will hang waiting for an ordered extent to
complete if the ordered extent completion needs to modify the leaf
for which the clone task is holding a read lock (for adding or
replacing file extent items). So the cloning task will wait forever
on the async reclaim thread to make progress, which in turn is
waiting for the ordered extent completion which in turn is waiting
to acquire a write lock on the same leaf.
So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.
Fixes: 05a5a7621ce66c ("Btrfs: implement full reflink support for inline extents") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 24 May 2021 10:35:53 +0000 (11:35 +0100)]
btrfs: fix fsync failure and transaction abort after writes to prealloc extents
When doing a series of partial writes to different ranges of preallocated
extents with transaction commits and fsyncs in between, we can end up with
a checksum items in a log tree. This causes an fsync to fail with -EIO and
abort the transaction, turning the filesystem to RO mode, when syncing the
log.
For this to happen, we need to have a full fsync of a file following one
or more fast fsyncs.
The following example reproduces the problem and explains how it happens:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
# Create our test file with 2 preallocated extents. Leave a 1M hole
# between them to ensure that we get two file extent items that will
# never be merged into a single one. The extents are contiguous on disk,
# which will later result in the checksums for their data to be merged
# into a single checksum item in the csums btree.
#
$ xfs_io -f \
-c "falloc 0 1M" \
-c "falloc 3M 3M" \
/mnt/foobar
# Now write to the second extent and leave only 1M of it as unwritten,
# which corresponds to the file range [4M, 5M[.
#
# Then fsync the file to flush delalloc and to clear full sync flag from
# the inode, so that a future fsync will use the fast code path.
#
# After the writeback triggered by the fsync we have 3 file extent items
# that point to the second extent we previously allocated:
#
# 1) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
# file range [3M, 4M[
#
# 2) One file extent item of type BTRFS_FILE_EXTENT_PREALLOC that covers
# the file range [4M, 5M[
#
# 3) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
# file range [5M, 6M[
#
# All these file extent items have a generation of 6, which is the ID of
# the transaction where they were created. The split of the original file
# extent item is done at btrfs_mark_extent_written() when ordered extents
# complete for the file ranges [3M, 4M[ and [5M, 6M[.
#
$ xfs_io -c "pwrite -S 0xab 3M 1M" \
-c "pwrite -S 0xef 5M 1M" \
-c "fsync" \
/mnt/foobar
# Commit the current transaction. This wipes out the log tree created by
# the previous fsync.
sync
# Now write to the unwritten range of the second extent we allocated,
# corresponding to the file range [4M, 5M[, and fsync the file, which
# triggers the fast fsync code path.
#
# The fast fsync code path sees that there is a new extent map covering
# the file range [4M, 5M[ and therefore it will log a checksum item
# covering the range [1M, 2M[ of the second extent we allocated.
#
# Also, after the fsync finishes we no longer have the 3 file extent
# items that pointed to 3 sections of the second extent we allocated.
# Instead we end up with a single file extent item pointing to the whole
# extent, with a type of BTRFS_FILE_EXTENT_REG and a generation of 7 (the
# current transaction ID). This is due to the file extent item merging we
# do when completing ordered extents into ranges that point to unwritten
# (preallocated) extents. This merging is done at
# btrfs_mark_extent_written().
#
$ xfs_io -c "pwrite -S 0xcd 4M 1M" \
-c "fsync" \
/mnt/foobar
# Now do some write to our file outside the range of the second extent
# that we allocated with fallocate() and truncate the file size from 6M
# down to 5M.
#
# The truncate operation sets the full sync runtime flag on the inode,
# forcing the next fsync to use the slow code path. It also changes the
# length of the second file extent item so that it represents the file
# range [3M, 5M[ and not the range [3M, 6M[ anymore.
#
# Finally fsync the file. Since this is a fsync that triggers the slow
# code path, it will remove all items associated to the inode from the
# log tree and then it will scan for file extent items in the
# fs/subvolume tree that have a generation matching the current
# transaction ID, which is 7. This means it will log 2 file extent
# items:
#
# 1) One for the first extent we allocated, covering the file range
# [0, 1M[
#
# 2) Another for the first 2M of the second extent we allocated,
# covering the file range [3M, 5M[
#
# When logging the first file extent item we log a single checksum item
# that has all the checksums for the entire extent.
#
# When logging the second file extent item, we also lookup for the
# checksums that are associated with the range [0, 2M[ of the second
# extent we allocated (file range [3M, 5M[), and then we log them with
# btrfs_csum_file_blocks(). However that results in ending up with a log
# that has two checksum items with ranges that overlap:
#
# 1) One for the range [1M, 2M[ of the second extent we allocated,
# corresponding to the file range [4M, 5M[, which we logged in the
# previous fsync that used the fast code path;
#
# 2) One for the ranges [0, 1M[ and [0, 2M[ of the first and second
# extents, respectively, corresponding to the files ranges [0, 1M[
# and [3M, 5M[. This one was added during this last fsync that uses
# the slow code path and overlaps with the previous one logged by
# the previous fast fsync.
#
# This happens because when logging the checksums for the second
# extent, we notice they start at an offset that matches the end of the
# checksums item that we logged for the first extent, and because both
# extents are contiguous on disk, btrfs_csum_file_blocks() decides to
# extend that existing checksums item and append the checksums for the
# second extent to this item. The end result is we end up with two
# checksum items in the log tree that have overlapping ranges, as
# listed before, resulting in the fsync to fail with -EIO and aborting
# the transaction, turning the filesystem into RO mode.
#
$ xfs_io -c "pwrite -S 0xff 0 1M" \
-c "truncate 5M" \
-c "fsync" \
/mnt/foobar
fsync: Input/output error
After running the example, dmesg/syslog shows the tree checker complained
about the checksum items with overlapping ranges and we aborted the
transaction:
Having checksum items covering ranges that overlap is dangerous as in some
cases it can lead to having extent ranges for which we miss checksums
after log replay or getting the wrong checksum item. There were some fixes
in the past for bugs that resulted in this problem, and were explained and
fixed by the following commits:
27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and outdated checksums") b84b8390d6009c ("Btrfs: fix file read corruption after extent cloning and fsync") 40e046acbd2f36 ("Btrfs: fix missing data checksums after replaying a log tree") e289f03ea79bbc ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents")
Fix the issue by making btrfs_csum_file_blocks() taking into account the
start offset of the next checksum item when it decides to extend an
existing checksum item, so that it never extends the checksum to end at a
range that goes beyond the start range of the next checksum item.
When we can not access the next checksum item without releasing the path,
simply drop the optimization of extending the previous checksum item and
fallback to inserting a new checksum item - this happens rarely and the
optimization is not significant enough for a log tree in order to justify
the extra complexity, as it would only save a few bytes (the size of a
struct btrfs_item) of leaf space.
This behaviour is only needed when inserting into a log tree because
for the regular checksums tree we never have a case where we try to
insert a range of checksums that overlap with a range that was previously
inserted.
Josef Bacik [Wed, 19 May 2021 18:04:21 +0000 (14:04 -0400)]
btrfs: abort in rename_exchange if we fail to insert the second ref
Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange. This happens because
we insert the inode ref for one side of the rename, and then for the
other side. If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind. Fix this by
aborting if we did the insert for the first inode ref.
CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 19 May 2021 15:26:25 +0000 (11:26 -0400)]
btrfs: check error value from btrfs_update_inode in tree log
Error injection testing uncovered a case where we ended up with invalid
link counts on an inode. This happened because we failed to notice an
error when updating the inode while replaying the tree log, and
committed the transaction with an invalid file system.
Fix this by checking the return value of btrfs_update_inode. This
resolved the link count errors I was seeing, and we already properly
handle passing up the error values in these paths.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 19 May 2021 17:13:15 +0000 (13:13 -0400)]
btrfs: fixup error handling in fixup_inode_link_counts
This function has the following pattern
while (1) {
ret = whatever();
if (ret)
goto out;
}
ret = 0
out:
return ret;
However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.
Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the
ret = 0;
out:
bit so everybody can break if there is an error, which will allow for
proper error handling to occur.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 19 May 2021 13:38:27 +0000 (09:38 -0400)]
btrfs: mark ordered extent and inode with error if we fail to finish
While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.
It turns out the abort came from finish ordered io while trying to write
out the free space cache. It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.
Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.
With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.
CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 19 May 2021 14:52:46 +0000 (10:52 -0400)]
btrfs: return errors from btrfs_del_csums in cleanup_ref_head
We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail. We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.
Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 19 May 2021 14:52:45 +0000 (10:52 -0400)]
btrfs: fix error handling in btrfs_del_csums
Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent. This occurred because the pattern
in btrfs_del_csums was
while (1) {
ret = btrfs_search_slot();
if (ret < 0)
break;
}
ret = 0;
out:
btrfs_free_path(path);
return ret;
If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out. Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the
ret = 0;
out:
pattern and simply allow break to have the proper error reporting. With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.
Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 25 May 2021 05:52:43 +0000 (13:52 +0800)]
btrfs: fix compressed writes that cross stripe boundary
[BUG]
When running btrfs/027 with "-o compress" mount option, it always
crashes with the following call trace:
BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192
------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:6651!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G OE 5.13.0-rc2-custom+ #26
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs]
Call Trace:
btrfs_submit_compressed_write+0x2d7/0x470 [btrfs]
submit_compressed_extents+0x3b0/0x470 [btrfs]
? mark_held_locks+0x49/0x70
btrfs_work_helper+0x131/0x3e0 [btrfs]
process_one_work+0x28f/0x5d0
worker_thread+0x55/0x3c0
? process_one_work+0x5d0/0x5d0
kthread+0x141/0x160
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
---[ end trace 63113a3a91f34e68 ]---
[CAUSE]
The critical message before the crash means we have a bio at logical
bytenr 298901504 length 12288, but only 8192 bytes can fit into one
stripe, the remaining 4096 bytes go to another stripe.
In btrfs, all bios are properly split to avoid cross stripe boundary,
but commit 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes")
changed the behavior for compressed writes.
Previously if we find our new page can't be fitted into current stripe,
ie. "submit == 1" case, we submit current bio without adding current
page.
if (pg_index == 0 && use_append)
len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
else
len = bio_add_page(bio, page, PAGE_SIZE, 0);
page->mapping = NULL;
if (submit || len < PAGE_SIZE) {
[FIX]
It's no longer possible to revert to the original code style as we have
two different bio_add_*_page() calls now.
The new fix is to skip the bio_add_*_page() call if @submit is true.
Also to avoid @len to be uninitialized, always initialize it to zero.
If @submit is true, @len will not be checked.
If @submit is not true, @len will be the return value of
bio_add_*_page() call.
Either way, the behavior is still the same as the old code.
Reported-by: Josef Bacik <josef@toxicpanda.com> Fixes: 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 23 May 2021 16:32:40 +0000 (06:32 -1000)]
Merge tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two perf fixes:
- Do not check the LBR_TOS MSR when setting up unrelated LBR MSRs as
this can cause malfunction when TOS is not supported
- Allocate the LBR XSAVE buffers along with the DS buffers upfront
because allocating them when adding an event can deadlock"
* tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context
perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
Linus Torvalds [Sun, 23 May 2021 16:30:08 +0000 (06:30 -1000)]
Merge tag 'locking-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two locking fixes:
- Invoke the lockdep tracepoints in the correct place so the ordering
is correct again
- Don't leave the mutex WAITER bit stale when the last waiter is
dropping out early due to a signal as that forces all subsequent
lock operations needlessly into the slowpath until it's cleaned up
again"
* tag 'locking-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
locking/lockdep: Correct calling tracepoints
Linus Torvalds [Sun, 23 May 2021 16:28:20 +0000 (06:28 -1000)]
Merge tag 'irq-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A few fixes for irqchip drivers:
- Allocate interrupt descriptors correctly on Mainstone PXA when
SPARSE_IRQ is enabled; otherwise the interrupt association fails
- Make the APPLE AIC chip driver depend on APPLE
- Remove redundant error output on devm_ioremap_resource() failure"
* tag 'irq-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Remove redundant error printing
irqchip/apple-aic: APPLE_AIC should depend on ARCH_APPLE
ARM: PXA: Fix cplds irqdesc allocation when using legacy mode
Linus Torvalds [Sun, 23 May 2021 16:12:25 +0000 (06:12 -1000)]
Merge tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix how SEV handles MMIO accesses by forwarding potential page faults
instead of killing the machine and by using the accessors with the
exact functionality needed when accessing memory.
- Fix a confusion with Clang LTO compiler switches passed to the it
- Handle the case gracefully when VMGEXIT has been executed in
userspace
* tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/build: Fix location of '-plugin-opt=' flags
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
Linus Torvalds [Sun, 23 May 2021 16:07:33 +0000 (06:07 -1000)]
Merge tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix breakage of strace (and other ptracers etc.) when using the new
scv ABI (Power9 or later with glibc >= 2.33).
- Fix early_ioremap() on 64-bit, which broke booting on some machines.
Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and
Christophe Leroy.
* tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc: Fix early setup to make early_ioremap() work
Linus Torvalds [Sun, 23 May 2021 01:20:20 +0000 (15:20 -1000)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: mm (pagealloc, gup, kasan,
and userfaultfd), ipc, selftests, watchdog, bitmap, procfs, and lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
userfaultfd: hugetlbfs: fix new flag usage in error path
lib: kunit: suppress a compilation warning of frame size
proc: remove Alexey from MAINTAINERS
linux/bits.h: fix compilation error with GENMASK
watchdog: reliable handling of timestamps
kasan: slab: always reset the tag in get_freepointer_safe()
tools/testing/selftests/exec: fix link error
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
Revert "mm/gup: check page posion status for coredump."
mm/shuffle: fix section mismatch warning
Mike Kravetz [Sun, 23 May 2021 00:42:11 +0000 (17:42 -0700)]
userfaultfd: hugetlbfs: fix new flag usage in error path
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific
page flags") the use of PagePrivate to indicate a reservation count
should be restored at free time was changed to the hugetlb specific flag
HPageRestoreReserve. Changes to a userfaultfd error path as well as a
VM_BUG_ON() in remove_inode_hugepages() were overlooked.
Users could see incorrect hugetlb reserve counts if they experience an
error with a UFFDIO_COPY operation. Specifically, this would be the
result of an unlikely copy_huge_page_from_user error. There is not an
increased chance of hitting the VM_BUG_ON.
Link: https://lkml.kernel.org/r/20210521233952.236434-1-mike.kravetz@oracle.com Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasry.mina@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mina Almasry <almasrymina@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhen Lei [Sun, 23 May 2021 00:42:08 +0000 (17:42 -0700)]
lib: kunit: suppress a compilation warning of frame size
lib/bitfield_kunit.c: In function `test_bitfields_constants':
lib/bitfield_kunit.c:93:1: warning: the frame size of 7456 bytes is larger than 2048 bytes [-Wframe-larger-than=]
}
^
As the description of BITFIELD_KUNIT in lib/Kconfig.debug, it "Only useful
for kernel devs running the KUnit test harness, and not intended for
inclusion into a production build". Therefore, it is not worth modifying
variable 'test_bitfields_constants' to clear this warning. Just suppress
it.
Rikard Falkeborn [Sun, 23 May 2021 00:42:02 +0000 (17:42 -0700)]
linux/bits.h: fix compilation error with GENMASK
GENMASK() has an input check which uses __builtin_choose_expr() to
enable a compile time sanity check of its inputs if they are known at
compile time.
However, it turns out that __builtin_constant_p() does not always return
a compile time constant [0]. It was thought this problem was fixed with
gcc 4.9 [1], but apparently this is not the case [2].
Switch to use __is_constexpr() instead which always returns a compile time
constant, regardless of its inputs.
Petr Mladek [Sun, 23 May 2021 00:41:59 +0000 (17:41 -0700)]
watchdog: reliable handling of timestamps
Commit 9bf3bc949f8a ("watchdog: cleanup handling of false positives")
tried to handle a virtual host stopped by the host a more
straightforward and cleaner way.
But it introduced a risk of false softlockup reports. The virtual host
might be stopped at any time, for example between
kvm_check_and_clear_guest_paused() and is_softlockup(). As a result,
is_softlockup() might read the updated jiffies and detects a softlockup.
A solution might be to put back kvm_check_and_clear_guest_paused() after
is_softlockup() and detect it. But it would put back the cycle that
complicates the logic.
In fact, the handling of all the timestamps is not reliable. The code
does not guarantee when and how many times the timestamps are read. For
example, "period_ts" might be touched anytime also from NMI and re-read in
is_softlockup(). It works just by chance.
Fix all the problems by making the code even more explicit.
1. Make sure that "now" and "period_ts" timestamps are read only once.
They might be changed at anytime by NMI or when the virtual guest is
stopped by the host. Note that "now" timestamp does this implicitly
because "jiffies" is marked volatile.
2. "now" time must be read first. The state of "period_ts" will
decide whether it will be used or the period will get restarted.
3. kvm_check_and_clear_guest_paused() must be called before reading
"period_ts". It touches the variable when the guest was stopped.
As a result, "now" timestamp is used only when the watchdog was not
touched and the guest not stopped in the meantime. "period_ts" is
restarted in all other situations.
Link: https://lkml.kernel.org/r/YKT55gw+RZfyoFf7@alley Fixes: 9bf3bc949f8aeefeacea4b ("watchdog: cleanup handling of false positives") Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Yingliang [Sun, 23 May 2021 00:41:53 +0000 (17:41 -0700)]
tools/testing/selftests/exec: fix link error
Fix the link error by adding '-static':
gcc -Wall -Wl,-z,max-page-size=0x1000 -pie load_address.c -o /home/yang/linux/tools/testing/selftests/exec/load_address_4096
/usr/bin/ld: /tmp/ccopEGun.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /tmp/ccopEGun.o(.text+0x158): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17'
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make: *** [Makefile:25: tools/testing/selftests/exec/load_address_4096] Error 1
Link: https://lkml.kernel.org/r/20210514092422.2367367-1-yangyingliang@huawei.com Fixes: 206e22f01941 ("tools/testing/selftests: add self-test for verifying load alignment") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Cc: Chris Kennelly <ckennelly@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Varad Gautam [Sun, 23 May 2021 00:41:49 +0000 (17:41 -0700)]
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
do_mq_timedreceive calls wq_sleep with a stack local address. The
sender (do_mq_timedsend) uses this address to later call pipelined_send.
This leads to a very hard to trigger race where a do_mq_timedreceive
call might return and leave do_mq_timedsend to rely on an invalid
address, causing the following crash:
1. do_mq_timedreceive calls wq_sleep with the address of `struct
ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
holds a valid `struct ext_wait_queue *` as long as the stack has not
been overwritten.
2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
__pipelined_op.
3. Sender calls __pipelined_op::smp_store_release(&this->state,
STATE_READY). Here is where the race window begins. (`this` is
`ewq_addr`.)
4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
will see `state == STATE_READY` and break.
5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
stack. (Although the address may not get overwritten until another
function happens to touch it, which means it can persist around for an
indefinite time.)
6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
`struct ext_wait_queue *`, and uses it to find a task_struct to pass to
the wake_q_add_safe call. In the lucky case where nothing has
overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
bogus address as the receiver's task_struct causing the crash.
do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add_safe on the receiver's
task_struct returned by get_task_struct, instead of dereferencing `this`
which sits on the receiver's stack.
As Manfred pointed out, the race potentially also exists in
ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix
those in the same way.
Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers") Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers") Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers") Signed-off-by: Varad Gautam <varad.gautam@suse.com> Reported-by: Matthias von Faber <matthias.vonfaber@aox-tech.de> Acked-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sun, 23 May 2021 00:41:46 +0000 (17:41 -0700)]
Revert "mm/gup: check page posion status for coredump."
While reviewing [1] I came across commit d3378e86d182 ("mm/gup: check
page posion status for coredump.") and noticed that this patch is broken
in two ways. First it doesn't really prevent hwpoison pages from being
dumped because hwpoison pages can be marked asynchornously at any time
after the check. Secondly, and more importantly, the patch introduces a
ref count leak because get_dump_page takes a reference on the page which
is not released.
It also seems that the patch was merged incorrectly because there were
follow up changes not included as well as discussions on how to address
the underlying problem [2]
Arnd Bergmann [Sun, 23 May 2021 00:41:43 +0000 (17:41 -0700)]
mm/shuffle: fix section mismatch warning
clang sometimes decides not to inline shuffle_zone(), but it calls a
__meminit function. Without the extra __meminit annotation we get this
warning:
WARNING: modpost: vmlinux.o(.text+0x2a86d4): Section mismatch in reference from the function shuffle_zone() to the function .meminit.text:__shuffle_zone()
The function shuffle_zone() references
the function __meminit __shuffle_zone().
This is often because shuffle_zone lacks a __meminit
annotation or the annotation of __shuffle_zone is wrong.
shuffle_free_memory() did not show the same problem in my tests, but it
could happen in theory as well, so mark both as __meminit.
Link: https://lkml.kernel.org/r/20210514135952.2928094-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block:
block: fix a race between del_gendisk and BLKRRPART
block: prevent block device lookups at the beginning of del_gendisk
nvme-fc: clear q_live at beginning of association teardown
nvme-tcp: rerun io_work if req_list is not empty
nvme-tcp: fix possible use-after-completion
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvmet: fix memory leak in nvmet_alloc_ctrl()
Linus Torvalds [Sat, 22 May 2021 17:33:09 +0000 (07:33 -1000)]
Merge tag 'for-linus-5.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a fix for a boot regression when running as PV guest on hardware
without NX support
- a small series fixing a bug in the Xen pciback driver when
configuring a PCI card with multiple virtual functions
* tag 'for-linus-5.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: reconfigure also from backend watch handler
xen-pciback: redo VF placement in the virtual topology
x86/Xen: swap NX determination and GDT setup on BSP
Linus Torvalds [Fri, 21 May 2021 23:24:12 +0000 (13:24 -1000)]
Merge tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes:
- fix unaligned compressed writes in zoned mode
- fix false positive lockdep warning when cloning inline extent
- remove wrong BUG_ON in tree-log error handling"
* tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix parallel compressed writes
btrfs: zoned: pass start block to btrfs_use_zone_append
btrfs: do not BUG_ON in link_to_fixup_dir
btrfs: release path before starting transaction when cloning inline extent
Linus Torvalds [Fri, 21 May 2021 23:12:51 +0000 (13:12 -1000)]
Merge tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Seven smb3 fixes: one for stable, three others fix problems found in
testing handle leases, and a compounded request fix"
* tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
Fix KASAN identified use-after-free issue.
Defer close only when lease is enabled.
Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
cifs: Fix inconsistent indenting
cifs: fix memory leak in smb2_copychunk_range
SMB3: incorrect file id in requests compounded with open
cifs: remove deadstore in cifs_close_all_deferred_files()
Linus Torvalds [Fri, 21 May 2021 16:31:34 +0000 (06:31 -1000)]
Merge tag 'mmc-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- Fix SD-card detection on Intel NUC10i3FNK4 (GL9755)
- Replace WARN_ONCE with dev_warn_once for scatterlist offsets
- Extend check of scatterlist size alignment with SD_IO_RW_EXTENDED
* tag 'mmc-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci-gli: increase 1.8V regulator wait
mmc: meson-gx: also check SD_IO_RW_EXTENDED for scatterlist size alignment
mmc: meson-gx: make replace WARN_ONCE with dev_warn_once about scatterlist offset alignment
Linus Torvalds [Fri, 21 May 2021 16:24:45 +0000 (06:24 -1000)]
Merge tag 'devicetree-fixes-for-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Another batch of removing unneeded type references in schemas
- Fix some out of date filename references
- Convert renesas,drif schema to use DT graph schema
* tag 'devicetree-fixes-for-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: More removals of type references on common properties
dt-bindings: media: renesas,drif: Use graph schema
leds: Fix reference file name of documentation
dt-bindings: phy: cadence-torrent: update reference file of docs
Linus Torvalds [Fri, 21 May 2021 16:12:52 +0000 (06:12 -1000)]
Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo fix from Eric Biederman:
"During the merge window an issue with si_perf and the siginfo ABI came
up. The alpha and sparc siginfo structure layout had changed with the
addition of SIGTRAP TRAP_PERF and the new field si_perf.
The reason only alpha and sparc were affected is that they are the
only architectures that use si_trapno.
Looking deeper it was discovered that si_trapno is used for only a few
select signals on alpha and sparc, and that none of the other
_sigfault fields past si_addr are used at all. Which means technically
no regression on alpha and sparc.
While the alignment concerns might be dismissed the abuse of si_errno
by SIGTRAP TRAP_PERF does have the potential to cause regressions in
existing userspace.
While we still have time before userspace starts using and depending
on the new definition siginfo for SIGTRAP TRAP_PERF this set of
changes cleans up siginfo_t.
- The si_trapno field is demoted from magic alpha and sparc status
and made an ordinary union member of the _sigfault member of
siginfo_t. Without moving it of course.
- si_perf is replaced with si_perf_data and si_perf_type ending the
abuse of si_errno.
- Unnecessary additions to signalfd_siginfo are removed"
* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
signal: Deliver all of the siginfo perf data in _perf
signal: Factor force_sig_perf out of perf_sigtrap
signal: Implement SIL_FAULT_TRAPNO
siginfo: Move si_trapno inside the union inside _si_fault
Linus Torvalds [Fri, 21 May 2021 16:09:17 +0000 (06:09 -1000)]
Merge tag 'modules-for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module fix from Jessica Yu:
"When CONFIG_MODULE_UNLOAD=n, module exit sections get sorted into the
init region of the module in order to satisfy the requirements of
jump_labels and static_calls.
Previously, the exit section check was done in module_init_section(),
but the solution there is not completely arch-indepedent as ARM is a
special case and supplies its own module_init_section() function.
Instead of pushing this logic further to the arch-specific code,
switch to an arch-independent solution to check for module exit
sections in the core module loader code in layout_sections() instead"
* tag 'modules-for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: check for exit sections in layout_sections() instead of module_init_section()
Jan Beulich [Tue, 18 May 2021 16:14:07 +0000 (18:14 +0200)]
xen-pciback: reconfigure also from backend watch handler
When multiple PCI devices get assigned to a guest right at boot, libxl
incrementally populates the backend tree. The writes for the first of
the devices trigger the backend watch. In turn xen_pcibk_setup_backend()
will set the XenBus state to Initialised, at which point no further
reconfigures would happen unless a device got hotplugged. Arrange for
reconfigure to also get triggered from the backend watch handler.
Jan Beulich [Tue, 18 May 2021 16:13:42 +0000 (18:13 +0200)]
xen-pciback: redo VF placement in the virtual topology
The commit referenced below was incomplete: It merely affected what
would get written to the vdev-<N> xenstore node. The guest would still
find the function at the original function number as long as
__xen_pcibk_get_pci_dev() wouldn't be in sync. The same goes for AER wrt
__xen_pcibk_get_pcifront_dev().
Undo overriding the function to zero and instead make sure that VFs at
function zero remain alone in their slot. This has the added benefit of
improving overall capacity, considering that there's only a total of 32
slots available right now (PCI segment and bus can both only ever be
zero at present).
Fixes: 8a5248fe10b1 ("xen PV passthru: assign SR-IOV virtual functions to separate virtual slots") Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/8def783b-404c-3452-196d-3f3fd4d72c9e@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 20 May 2021 11:42:42 +0000 (13:42 +0200)]
x86/Xen: swap NX determination and GDT setup on BSP
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
For this to work when NX is not available, x86_configure_nx() needs to
be called first.
[jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen:
Delay get_cpu_cap until stack canary is established"), which is possible
now that we no longer support running as PV guest in 32-bit mode.
Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
Linus Torvalds [Fri, 21 May 2021 06:15:43 +0000 (20:15 -1000)]
Merge tag 'drm-fixes-2021-05-21-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Usual collection, mostly amdgpu and some i915 regression fixes. I
nearly managed to hose my build/sign machine this week, but I
recovered it just in time, and I even got clang12 built.
dma-buf:
- WARN fix
amdgpu:
- Fix downscaling ratio on DCN3.x
- Fix for non-4K pages
- PCO/RV compute hang fix
- Dongle fix
- Aldebaran codec query support
- Refcount leak fix
- Use after free fix
- Navi12 golden settings updates
- GPU reset fixes
radeon:
- Fix for imported BO handling
i915:
- Pin the L-shape quirked object as unshrinkable to fix crashes
- Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches,
gfx corruption
- GVT: Move mdev attribute groups into kvmgt module to fix kconfig
deps issue
exynos:
- Correct kerneldoc of fimd_shadow_protect_win function
- Drop redundant error messages"
* tag 'drm-fixes-2021-05-21-1' of git://anongit.freedesktop.org/drm/drm:
dma-buf: fix unintended pin/unpin warnings
drm/amdgpu: stop touching sched.ready in the backend
drm/amd/amdgpu: fix a potential deadlock in gpu reset
drm/amdgpu: update sdma golden setting for Navi12
drm/amdgpu: update gc golden setting for Navi12
drm/amdgpu: Fix a use-after-free
drm/amdgpu: add video_codecs query support for aldebaran
drm/amd/amdgpu: fix refcount leak
drm/amd/display: Disconnect non-DP with no EDID
drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
drm/radeon: use the dummy page for GART if needed
drm/amd/display: Use the correct max downscaling value for DCN3.x family
drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7
drm/i915/gem: Pin the L-shape quirked object as unshrinkable
drm/exynos/decon5433: Remove redundant error printing in exynos5433_decon_probe()
drm/exynos: Remove redundant error printing in exynos_dsi_probe()
drm/exynos: correct exynos_drm_fimd kerneldoc
drm/i915/gvt: Move mdev attribute groups into kvmgt module
Dave Airlie [Fri, 21 May 2021 03:41:41 +0000 (13:41 +1000)]
Merge tag 'drm-intel-fixes-2021-05-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.13-rc3:
- Pin the L-shape quirked object as unshrinkable to fix crashes
- Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches, gfx corruption
- GVT: Move mdev attribute groups into kvmgt module to fix kconfig deps issue
Linus Torvalds [Fri, 21 May 2021 00:46:26 +0000 (14:46 -1000)]
Merge tag 'arm-soc-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Only a small number of fixes so far, including some that I had applied
during the merge window, so this is based on the original merge of the
other branches.
- The largest change is a fix for a reference counting bug in the AMD
TEE driver.
- Neil Armstrong now co-maintains Amlogic SoC support
- Two build warning fixes for renesas device tree files
- A sign expansion bug for optee
- A DT binding fix for a mismerge"
* tag 'arm-soc-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: npcm: wpcm450: select interrupt controller driver
MAINTAINERS: ARM/Amlogic SoCs: add Neil as primary maintainer
tee: amdtee: unload TA only when its refcount becomes 0
dt-bindings: nvmem: mediatek: remove duplicate mt8192 line
firmware: arm_scmi: Remove duplicate declaration of struct scmi_protocol_handle
firmware: arm_scpi: Prevent the ternary sign expansion bug
arm64: dts: renesas: Add port@0 node for all CSI-2 nodes to dtsi
arm64: dts: renesas: aistarvision-mipi-adapter-2.1: Fix CSI40 ports
Linus Torvalds [Fri, 21 May 2021 00:43:33 +0000 (14:43 -1000)]
Merge branch 'urgent.2021.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull kcsan fix from Paul McKenney:
"Fix for a regression introduced in this merge window by commit e36299efe7d7 ("kcsan, debugfs: Move debugfs file creation out of early
init").
The regression is not easy to trigger, requiring a KCSAN build using
clang with CONFIG_LTO_CLANG=y. The fix is to simply make the
kcsan_debugfs_init() function's type initcall-compatible. This has
been posted to the relevant mailing lists:"
* 'urgent.2021.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: Fix debugfs initcall return type
Linus Torvalds [Fri, 21 May 2021 00:36:21 +0000 (14:36 -1000)]
Merge tag 'for-5.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix a couple DM snapshot target crashes exposed by user-error.
- Fix DM integrity target to not use discard optimization, introduced
during 5.13 merge, when recalulating.
- Fix some sparse warnings in DM integrity target.
* tag 'for-5.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix sparse warnings
dm integrity: revert to not using discard filler when recalulating
dm snapshot: fix crash with transient storage and zero chunk size
dm snapshot: fix a crash when an origin has no snapshots
When mod_delayed_work is called to modify the delay of pending work,
it might return false and queue a new work when pending work is
already scheduled or when try to grab pending work failed.
So, Increase the reference count when new work is scheduled to
avoid use-after-free.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 20 May 2021 16:44:04 +0000 (06:44 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A mixture of small bug fixes, most for longer standing problems:
- NULL pointer crash in siw
- Various error unwind bugs in siw, rxe, cm
- User triggerable errors in uverbs
- Minor bugs in mlx5 and rxe drivers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/uverbs: Fix a NULL vs IS_ERR() bug
RDMA/mlx5: Fix query DCT via DEVX
RDMA/core: Don't access cm_id after its destruction
RDMA/rxe: Return CQE error if invalid lkey was supplied
RDMA/mlx5: Recover from fatal event in dual port mode
RDMA/mlx5: Verify that DM operation is reasonable
RDMA/rxe: Clear all QP fields if creation failed
RDMA/core: Prevent divide-by-zero error triggered by the user
RDMA/siw: Release xarray entry
RDMA/siw: Properly check send and receive CQ pointers
Linus Torvalds [Thu, 20 May 2021 16:42:21 +0000 (06:42 -1000)]
Merge tag 'sound-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small device-specific fixes here: a series of FireWire audio
fixes, UAF and other fixes in USB-audio and co spotted by fuzzer,
and a few HD-audio quirks as usual"
* tag 'sound-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: line6: Fix racy initialization of LINE6 MIDI
ALSA: dice: fix stream format for TC Electronic Konnekt Live at high sampling transfer frequency
ALSA: dice: disable double_pcm_frames mode for M-Audio Profire 610, 2626 and Avid M-Box 3 Pro
ALSA: intel8x0: Don't update period unless prepared
ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293
ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field
ALSA: firewire-lib: fix calculation for size of IR context payload
ALSA: firewire-lib: fix check for the size of isochronous packet payload
ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro
ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26
ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx
ALSA: usb-audio: Fix potential out-of-bounce access in MIDI EP parser
ALSA: usb-audio: Validate MS endpoint descriptors
ALSA: hda: fixup headset for ASUS GU502 laptop
ALSA: hda/realtek: reset eapd coeff to default value for alc287
Linus Torvalds [Thu, 20 May 2021 16:40:20 +0000 (06:40 -1000)]
Merge tag 'platform-drivers-x86-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Assorted pdx86 bug-fixes and model-specific quirks for 5.13"
* tag 'platform-drivers-x86-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
platform/x86: hp-wireless: add AMD's hardware id to the supported list
platform/x86: intel_int0002_vgpio: Only call enable_irq_wake() when using s2idle
platform/x86: gigabyte-wmi: add support for B550 Aorus Elite
platform/x86: gigabyte-wmi: add support for X570 UD
platform/x86: gigabyte-wmi: streamline dmi matching
platform/mellanox: mlxbf-tmfifo: Fix a memory barrier issue
platform/surface: dtx: Fix poll function
platform/surface: aggregator: Add platform-drivers-x86 list to MAINTAINERS entry
platform/surface: aggregator: avoid clang -Wconstant-conversion warning
platform/surface: aggregator: Do not mark interrupt as shared
platform/x86: hp_accel: Avoid invoking _INI to speed up resume
platform/x86: ideapad-laptop: fix method name typo
platform/x86: ideapad-laptop: fix a NULL pointer dereference
Linus Torvalds [Thu, 20 May 2021 16:31:52 +0000 (06:31 -1000)]
Merge tag 'char-misc-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here is a big set of char/misc/other driver fixes for 5.13-rc3.
The majority here is the fallout of the umn.edu re-review of all prior
submissions. That resulted in a bunch of reverts along with the
"correct" changes made, such that there is no regression of any of the
potential fixes that were made by those individuals. I would like to
thank the over 80 different developers who helped with the review and
fixes for this mess.
Other than that, there's a few habanna driver fixes for reported
issues, and some dyndbg fixes for reported problems.
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (82 commits)
misc: eeprom: at24: check suspend status before disable regulator
uio_hv_generic: Fix another memory leak in error handling paths
uio_hv_generic: Fix a memory leak in error handling paths
uio/uio_pci_generic: fix return value changed in refactoring
Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
dyndbg: drop uninformative vpr_info
dyndbg: avoid calling dyndbg_emit_prefix when it has no work
binder: Return EFAULT if we fail BINDER_ENABLE_ONEWAY_SPAM_DETECTION
cdrom: gdrom: initialize global variable at init time
brcmfmac: properly check for bus register errors
Revert "brcmfmac: add a check for the status of usb_register"
video: imsttfb: check for ioremap() failures
Revert "video: imsttfb: fix potential NULL pointer dereferences"
net: liquidio: Add missing null pointer checks
Revert "net: liquidio: fix a NULL pointer dereference"
media: gspca: properly check for errors in po1030_probe()
Revert "media: gspca: Check the return value of write_bridge for timeout"
media: gspca: mt9m111: Check write_bridge for timeout
Revert "media: gspca: mt9m111: Check write_bridge for timeout"
media: dvb: Add check on sp8870_readreg return
...
Linus Torvalds [Thu, 20 May 2021 16:20:15 +0000 (06:20 -1000)]
Merge tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota fixes from Jan Kara:
"The most important part in the pull is disablement of the new syscall
quotactl_path() which was added in rc1.
The reason is some people at LWN discussion pointed out dirfd would be
useful for this path based syscall and Christian Brauner agreed.
Without dirfd it may be indeed problematic for containers. So let's
just disable the syscall for now when it doesn't have users yet so
that we have more time to mull over how to best specify the filesystem
we want to work on"
* tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Disable quotactl_path syscall
quota: Use 'hlist_for_each_entry' to simplify code
Darrick J. Wong [Wed, 12 May 2021 23:43:10 +0000 (16:43 -0700)]
xfs: restore old ioctl definitions
These ioctl definitions in xfs_fs.h are part of the userspace ABI and
were mistakenly removed during the 5.13 merge window.
Fixes: 9fefd5db08ce ("xfs: convert to fileattr") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 12 May 2021 23:41:13 +0000 (16:41 -0700)]
xfs: fix deadlock retry tracepoint arguments
sc->ip is the inode that's being scrubbed, which means that it's not set
for scrub types that don't involve inodes. If one of those scrubbers
(e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer.
Fix that by reporting either the file being examined or the file that
was used to call scrub.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Sun, 9 May 2021 23:22:55 +0000 (16:22 -0700)]
xfs: retry allocations when locality-based search fails
If a realtime allocation fails because we can't find a sufficiently
large free extent satisfying locality rules, relax the locality rules
and try again. This reduces the occurrence of short writes to realtime
files when the write size is large and the free space is fragmented.
This was originally discovered by running generic/186 with the realtime
reflink patchset and a 128k cow extent size hint, but the short write
symptoms can manifest with a 128k extent size hint and no reflink, so
apply the fix now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Nicholas Piggin [Thu, 20 May 2021 11:19:31 +0000 (21:19 +1000)]
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
The scv implementation missed updating syscall return value and error
value get/set functions to deal with the changed register ABI. This
broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing
and tracing functions.
Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when
scv is used.
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com
Nicholas Piggin [Thu, 20 May 2021 11:19:30 +0000 (21:19 +1000)]
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
The sc and scv 0 system calls have different ABI conventions, and
ptracers need to know which system call type is being used if they want
to look at the syscall registers.
Document that pt_regs.trap can be used for this, and fix one in-tree user
to work with scv 0 syscalls.
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Suggested-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-1-npiggin@gmail.com
Gulam Mohamed [Fri, 14 May 2021 13:18:42 +0000 (15:18 +0200)]
block: fix a race between del_gendisk and BLKRRPART
When BLKRRPART is called concurrently with del_gendisk, the partitions
rescan can create a stale partition that will never be be cleaned up.
Fix this by checking the the disk is up before rescanning partitions
while under bd_mutex.
Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
[hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514131842.1600568-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 14 May 2021 13:18:41 +0000 (15:18 +0200)]
block: prevent block device lookups at the beginning of del_gendisk
As an artifact of how gendisk lookup used to work in earlier kernels,
GENHD_FL_UP is only cleared very late in del_gendisk, and a global lock
is used to prevent opens from succeeding while del_gendisk is tearing
down the gendisk. Switch to clearing the flag early and under bd_mutex
so that callers can use bd_mutex to stabilize the flag, which removes
the need for the global mutex.
* tag 'nvme-5.13-2021-05-20' of git://git.infradead.org/nvme:
nvme-fc: clear q_live at beginning of association teardown
nvme-tcp: rerun io_work if req_list is not empty
nvme-tcp: fix possible use-after-completion
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvmet: fix memory leak in nvmet_alloc_ctrl()
Johannes Thumshirn [Tue, 18 May 2021 15:40:28 +0000 (00:40 +0900)]
btrfs: zoned: fix parallel compressed writes
When multiple processes write data to the same block group on a
compressed zoned filesystem, the underlying device could report I/O
errors and data corruption is possible.
This happens because on a zoned file system, compressed data writes
where sent to the device via a REQ_OP_WRITE instead of a
REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
submission it cannot be guaranteed that the data is always submitted
aligned to the underlying zone's write pointer.
The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
zoned filesystem is non intrusive on a regular file system or when
submitting to a conventional zone on a zoned filesystem, as it is
guarded by btrfs_use_zone_append.
Reported-by: David Sterba <dsterba@suse.com> Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag") CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append CC: stable@vger.kernel.org # 5.12.x Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Tue, 18 May 2021 15:40:27 +0000 (00:40 +0900)]
btrfs: zoned: pass start block to btrfs_use_zone_append
btrfs_use_zone_append only needs the passed in extent_map's block_start
member, so there's no need to pass in the full extent map.
This also enables the use of btrfs_use_zone_append in places where we only
have a start byte but no extent_map.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Pavel Begunkov [Thu, 20 May 2021 12:21:20 +0000 (13:21 +0100)]
io_uring: fortify tctx/io_wq cleanup
We don't want anyone poking into tctx->io_wq awhile it's being destroyed
by io_wq_put_and_exit(), and even though it shouldn't even happen, if
buggy would be preferable to get a NULL-deref instead of subtle delayed
failure or UAF.
Hans de Goede [Thu, 20 May 2021 09:32:28 +0000 (11:32 +0200)]
platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
Add touchscreen info for the Chuwi Hi10 Pro (CWI529) tablet. This includes
info for getting the firmware directly from the UEFI, so that the user does
not need to manually install the firmware in /lib/firmware/silead.
This change will make the touchscreen on these devices work OOTB,
without requiring any manual setup.
Christian König [Mon, 17 May 2021 11:20:17 +0000 (13:20 +0200)]
dma-buf: fix unintended pin/unpin warnings
DMA-buf internal users call the pin/unpin functions without having a
dynamic attachment. Avoid the warning and backtrace in the logs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Bugs: https://gitlab.freedesktop.org/drm/intel/-/issues/3481 Fixes: c545781e1c55 ("dma-buf: doc polish for pin/unpin") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> CC: stable@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210517115705.2141-1-christian.koenig@amd.com
Alexey Kardashevskiy [Thu, 20 May 2021 03:29:19 +0000 (13:29 +1000)]
powerpc: Fix early setup to make early_ioremap() work
The immediate problem is that after commit 0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()") the kernel
silently reboots on some systems.
The reason is that early_ioremap() returns broken addresses as it uses
slot_virt[] array which initialized with offsets from FIXADDR_TOP ==
IOREMAP_END+FIXADDR_SIZE == KERN_IO_END - FIXADDR_SIZ + FIXADDR_SIZE ==
__kernel_io_end which is 0 when early_ioremap_setup() is called.
__kernel_io_end is initialized little bit later in early_init_mmu().
This fixes the initialization by swapping early_ioremap_setup() and
early_init_mmu().
Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop unrelated cleanup & cleanup change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520032919.358935-1-aik@ozlabs.ru
Rohith Surabattula [Mon, 17 May 2021 11:28:34 +0000 (11:28 +0000)]
Defer close only when lease is enabled.
When smb2 lease parameter is disabled on server. Server grants
batch oplock instead of RHW lease by default on open, inode page cache
needs to be zapped immediatley upon close as cache is not valid.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Rohith Surabattula [Wed, 5 May 2021 10:56:47 +0000 (10:56 +0000)]
Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
Removed oplock_break_received flag which was added to achieve
synchronization between oplock handler and open handler by earlier commit.
It is not needed because there is an existing lock open_file_lock to achieve
the same. find_readable_file takes open_file_lock and then traverses the
openFileList. Similarly, cifs_oplock_break while closing the deferred
handle (i.e cifsFileInfo_put) takes open_file_lock and then sends close
to the server.
Added comments for better readability.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Tue, 18 May 2021 22:40:11 +0000 (08:40 +1000)]
cifs: fix memory leak in smb2_copychunk_range
When using smb2_copychunk_range() for large ranges we will
run through several iterations of a loop calling SMB2_ioctl()
but never actually free the returned buffer except for the final
iteration.
This leads to memory leaks everytime a large copychunk is requested.
Fixes: 9bf0c9cd4314 ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files") Cc: <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Christian König [Tue, 18 May 2021 15:48:02 +0000 (17:48 +0200)]
drm/amdgpu: stop touching sched.ready in the backend
This unfortunately comes up in regular intervals and breaks
GPU reset for the engine in question.
The sched.ready flag controls if an engine can't get working
during hw_init, but should never be set to false during hw_fini.
v2: squash in unused variable fix (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jingwen Chen [Mon, 17 May 2021 08:16:10 +0000 (16:16 +0800)]
drm/amd/amdgpu: fix refcount leak
[Why]
the gem object rfb->base.obj[0] is get according to num_planes
in amdgpufb_create, but is not put according to num_planes
[How]
put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chris Park [Tue, 4 May 2021 20:20:55 +0000 (16:20 -0400)]
drm/amd/display: Disconnect non-DP with no EDID
[Why]
Active DP dongles return no EDID when dongle
is connected, but VGA display is taken out.
Current driver behavior does not remove the
active display when this happens, and this is
a gap between dongle DTP and dongle behavior.
[How]
For active DP dongles and non-DP scenario,
disconnect sink on detection when no EDID
is read due to timeout.
Signed-off-by: Chris Park <Chris.Park@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Changfeng [Fri, 14 May 2021 07:28:25 +0000 (15:28 +0800)]
drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
There is problem with 3DCGCG firmware and it will cause compute test
hang on picasso/raven1. It needs to disable 3DCGCG in driver to avoid
compute hang.
Signed-off-by: Changfeng <Changfeng.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Yi Li [Fri, 14 May 2021 06:40:39 +0000 (14:40 +0800)]
drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
When PAGE_SIZE is larger than AMDGPU_PAGE_SIZE, the number of GPU TLB
entries which need to update in amdgpu_map_buffer() should be multiplied
by AMDGPU_GPU_PAGES_IN_CPU_PAGE (PAGE_SIZE / AMDGPU_PAGE_SIZE).
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yi Li <liyi@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Christian König [Wed, 12 May 2021 08:36:43 +0000 (10:36 +0200)]
drm/radeon: use the dummy page for GART if needed
Imported BOs don't have a pagelist any more.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: 0575ff3d33cd ("drm/radeon: stop using pages with drm_prime_sg_to_page_addr_arrays v2") CC: stable@vger.kernel.org # 5.12
Nikola Cornij [Fri, 7 May 2021 02:46:52 +0000 (22:46 -0400)]
drm/amd/display: Use the correct max downscaling value for DCN3.x family
[why]
As per spec, DCN3.x can do 6:1 downscaling and DCN2.x can do 4:1. The
max downscaling limit value for DCN2.x is 250, which means it's
calculated as 1000 / 4 = 250. For DCN3.x this then gives 1000 / 6 = 167.
[how]
Set maximum downscaling limit to 167 for DCN3.x
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Dan Carpenter [Fri, 14 May 2021 14:18:10 +0000 (17:18 +0300)]
RDMA/uverbs: Fix a NULL vs IS_ERR() bug
The uapi_get_object() function returns error pointers, it never returns
NULL.
Fixes: 149d3845f4a5 ("RDMA/uverbs: Add a method to introspect handles in a context") Link: https://lore.kernel.org/r/YJ6Got+U7lz+3n9a@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Joerg Roedel [Wed, 19 May 2021 13:52:46 +0000 (15:52 +0200)]
x86/sev-es: Use __put_user()/__get_user() for data accesses
The put_user() and get_user() functions do checks on the address which is
passed to them. They check whether the address is actually a user-space
address and whether its fine to access it. They also call might_fault()
to indicate that they could fault and possibly sleep.
All of these checks are neither wanted nor needed in the #VC exception
handler, which can be invoked from almost any context and also for MMIO
instructions from kernel space on kernel memory. All the #VC handler
wants to know is whether a fault happened when the access was tried.
This is provided by __put_user()/__get_user(), which just do the access
no matter what. Also add comments explaining why __get_user() and
__put_user() are the best choice here and why it is safe to use them
in this context. Also explain why copy_to/from_user can't be used.
In addition, also revert commit
7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly")
because using __get_user()/__put_user() fixes the same problem while
the above commit introduced several problems:
1) It uses access_ok() which is only allowed in task context.
2) It uses memcpy() which has no fault handling at all and is
thus unsafe to use here.
[ bp: Fix up commit ID of the reverted commit above. ]
Maor Gottlieb [Wed, 19 May 2021 08:41:32 +0000 (11:41 +0300)]
RDMA/mlx5: Fix query DCT via DEVX
When executing DEVX command to query QP object, we need to take the QP
type from the mlx5_ib_qp struct which hold the driver specific QP types as
well, such as DC.
Linus Torvalds [Wed, 19 May 2021 16:12:31 +0000 (06:12 -1000)]
Merge tag 'fs.idmapped.mount_setattr.v5.13-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr fix from Christian Brauner:
"This makes an underlying idmapping assumption more explicit.
We currently don't have any filesystems that support idmapped mounts
which are mountable inside a user namespace, i.e. where s_user_ns !=
init_user_ns. That was a deliberate decision for now as userns root
can just mount the filesystem themselves.
Express this restriction explicitly and enforce it until there's a
real use-case for this. This way we can notice it and will have a
chance to adapt and audit our translation helpers and fstests
appropriately if we need to support such filesystems"
* tag 'fs.idmapped.mount_setattr.v5.13-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
fs/mount_setattr: tighten permission checks
It turns out this is not ready for primetime yet. The intentions are
good, but using remap_pfn_range() requires that there is nothing already
mapped in the area, and the i915 code seems to very much intentionally
remap the same area multiple times.
That will then just trigger the
BUG_ON(!pte_none(*pte));
in mm/memory.c: remap_pte_range().
There are also reports of mapping type inconsistencies, resulting in
warnings and in screen corruption.
Link: https://lore.kernel.org/lkml/20210519024322.GA29704@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/YKUjvoaKKggAmpIR@sf/ Link: https://lore.kernel.org/lkml/b6b61cf0-5874-f4c0-1fcc-4b3848451c31@redhat.com/ Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Kalle Valo <kvalo@codeaurora.org> Reported-by: Hans de Goede <hdegoede@redhat.com> Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joerg Roedel [Wed, 19 May 2021 13:52:45 +0000 (15:52 +0200)]
x86/sev-es: Forward page-faults which happen during emulation
When emulating guest instructions for MMIO or IOIO accesses, the #VC
handler might get a page-fault and will not be able to complete. Forward
the page-fault in this case to the correct handler instead of killing
the machine.
Steve French [Sat, 15 May 2021 14:52:22 +0000 (09:52 -0500)]
SMB3: incorrect file id in requests compounded with open
See MS-SMB2 3.2.4.1.4, file ids in compounded requests should be set to
0xFFFFFFFFFFFFFFFF (we were treating it as u32 not u64 and setting
it incorrectly).
Signed-off-by: Steve French <stfrench@microsoft.com> Reported-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Joerg Roedel [Wed, 19 May 2021 13:52:44 +0000 (15:52 +0200)]
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
sev_es_get_ghcb() is called from several places but only one of them
checks the return value. The reaction to returning NULL is always the
same: calling panic() and kill the machine.
Instead of adding checks to all call sites, move the panic() into the
function itself so that it will no longer return NULL.
Takashi Iwai [Tue, 18 May 2021 08:39:39 +0000 (10:39 +0200)]
ALSA: line6: Fix racy initialization of LINE6 MIDI
The initialization of MIDI devices that are found on some LINE6
drivers are currently done in a racy way; namely, the MIDI buffer
instance is allocated and initialized in each private_init callback
while the communication with the interface is already started via
line6_init_cap_control() call before that point. This may lead to
Oops in line6_data_received() when a spurious event is received, as
reported by syzkaller.
This patch moves the MIDI initialization to line6_init_cap_control()
as well instead of the too-lately-called private_init for avoiding the
race. Also this reduces slightly more lines, so it's a win-win
change.
Hans de Goede [Tue, 18 May 2021 12:50:27 +0000 (14:50 +0200)]
platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
init_dell_smbios_wmi() only registers the dell_smbios_wmi_driver on systems
where the Dell WMI interface is supported. While exit_dell_smbios_wmi()
unregisters it unconditionally, this leads to the following oops:
Make the unregister happen on the same condition the register happens
to fix this.
Cc: Mario Limonciello <mario.limonciello@outlook.com> Fixes: 1a258e670434 ("platform/x86: dell-smbios-wmi: Add new WMI dispatcher driver") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@outlook.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Link: https://lore.kernel.org/r/20210518125027.21824-1-hdegoede@redhat.com