]> www.infradead.org Git - users/dwmw2/linux.git/log
users/dwmw2/linux.git
4 years agobtrfs: skip devices without magic signature when mounting
Anand Jain [Wed, 30 Sep 2020 13:09:52 +0000 (21:09 +0800)]
btrfs: skip devices without magic signature when mounting

commit 96c2e067ed3e3e004580a643c76f58729206b829 upstream.

Many things can happen after the device is scanned and before the device
is mounted.  One such thing is losing the BTRFS_MAGIC on the device.
If it happens we still won't free that device from the memory and cause
the userland confusion.

For example: As the BTRFS_IOC_DEV_INFO still carries the device path
which does not have the BTRFS_MAGIC, 'btrfs fi show' still lists
device which does not belong to the filesystem anymore:

  $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
  $ wipefs -a /dev/sdb
  # /dev/sdb does not contain magic signature
  $ mount -o degraded /dev/sda /btrfs
  $ btrfs fi show -m
  Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
  Total devices 2 FS bytes used 128.00KiB
  devid    1 size 3.00GiB used 571.19MiB path /dev/sda
  devid    2 size 3.00GiB used 571.19MiB path /dev/sdb

We need to distinguish the missing signature and invalid superblock, so
add a specific error code ENODATA for that. This also fixes failure of
fstest btrfs/198.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: cleanup cow block on error
Josef Bacik [Tue, 29 Sep 2020 12:53:54 +0000 (08:53 -0400)]
btrfs: cleanup cow block on error

commit 572c83acdcdafeb04e70aa46be1fa539310be20c upstream.

In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead
to a system lockup. It gets stuck trying to write back inodes, and the
write back thread was trying to lock an extent buffer:

  $ cat /proc/2143497/stack
  [<0>] __btrfs_tree_lock+0x108/0x250
  [<0>] lock_extent_buffer_for_io+0x35e/0x3a0
  [<0>] btree_write_cache_pages+0x15a/0x3b0
  [<0>] do_writepages+0x28/0xb0
  [<0>] __writeback_single_inode+0x54/0x5c0
  [<0>] writeback_sb_inodes+0x1e8/0x510
  [<0>] wb_writeback+0xcc/0x440
  [<0>] wb_workfn+0xd7/0x650
  [<0>] process_one_work+0x236/0x560
  [<0>] worker_thread+0x55/0x3c0
  [<0>] kthread+0x13a/0x150
  [<0>] ret_from_fork+0x1f/0x30

This is because we got an error while COWing a block, specifically here

        if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
                ret = btrfs_reloc_cow_block(trans, root, buf, cow);
                if (ret) {
                        btrfs_abort_transaction(trans, ret);
                        return ret;
                }
        }

  [16402.241552] BTRFS: Transaction aborted (error -2)
  [16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540
  [16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8
  [16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  [16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540
  [16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282
  [16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388
  [16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380
  [16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000
  [16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0
  [16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000
  [16402.256321] FS:  00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000
  [16402.256973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0
  [16402.257867] Call Trace:
  [16402.258072]  btrfs_cow_block+0x109/0x230
  [16402.258356]  btrfs_search_slot+0x530/0x9d0
  [16402.258655]  btrfs_lookup_file_extent+0x37/0x40
  [16402.259155]  __btrfs_drop_extents+0x13c/0xd60
  [16402.259628]  ? btrfs_block_rsv_migrate+0x4f/0xb0
  [16402.259949]  btrfs_replace_file_extents+0x190/0x820
  [16402.260873]  btrfs_clone+0x9ae/0xc00
  [16402.261139]  btrfs_extent_same_range+0x66/0x90
  [16402.261771]  btrfs_remap_file_range+0x353/0x3b1
  [16402.262333]  vfs_dedupe_file_range_one.part.0+0xd5/0x140
  [16402.262821]  vfs_dedupe_file_range+0x189/0x220
  [16402.263150]  do_vfs_ioctl+0x552/0x700
  [16402.263662]  __x64_sys_ioctl+0x62/0xb0
  [16402.264023]  do_syscall_64+0x33/0x40
  [16402.264364]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [16402.264862] RIP: 0033:0x7f542c7d15cb
  [16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb
  [16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003
  [16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64
  [16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
  [16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80
  [16402.271498] irq event stamp: 0
  [16402.271846] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
  [16402.273343] softirqs last  enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
  [16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [16402.274338] ---[ end trace 737874a5a41a8236 ]---
  [16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.276179] BTRFS info (device dm-9): forced readonly
  [16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry
  [16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
  [16402.280582] BTRFS info (device dm-9): balance: ended with status: -30

The problem here is that as soon as we allocate the new block it is
locked and marked dirty in the btree inode.  This means that we could
attempt to writeback this block and need to lock the extent buffer.
However we're not unlocking it here and thus we deadlock.

Fix this by unlocking the cow block if we have any errors inside of
__btrfs_cow_block, and also free it so we do not leak it.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: reschedule when cloning lots of extents
Johannes Thumshirn [Tue, 22 Sep 2020 08:27:29 +0000 (17:27 +0900)]
btrfs: reschedule when cloning lots of extents

commit 6b613cc97f0ace77f92f7bc112b8f6ad3f52baf8 upstream.

We have several occurrences of a soft lockup from fstest's generic/175
testcase, which look more or less like this one:

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030]
  Kernel panic - not syncing: softlockup: hung tasks
  CPU: 0 PID: 10030 Comm: xfs_io Tainted: G             L    5.9.0-rc5+ #768
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack+0x77/0xa0
   panic+0xfa/0x2cb
   watchdog_timer_fn.cold+0x85/0xa5
   ? lockup_detector_update_enable+0x50/0x50
   __hrtimer_run_queues+0x99/0x4c0
   ? recalibrate_cpu_khz+0x10/0x10
   hrtimer_run_queues+0x9f/0xb0
   update_process_times+0x28/0x80
   tick_handle_periodic+0x1b/0x60
   __sysvec_apic_timer_interrupt+0x76/0x210
   asm_call_on_stack+0x12/0x20
   </IRQ>
   sysvec_apic_timer_interrupt+0x7f/0x90
   asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs]
  RSP: 0018:ffffc90007123a58 EFLAGS: 00000282
  RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000
  RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032
  R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000
  R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0
   ? btrfs_tree_unlock+0x15b/0x1a0 [btrfs]
   btrfs_release_path+0x67/0x80 [btrfs]
   btrfs_insert_replace_extent+0x177/0x2c0 [btrfs]
   btrfs_replace_file_extents+0x472/0x7c0 [btrfs]
   btrfs_clone+0x9ba/0xbd0 [btrfs]
   btrfs_clone_files.isra.0+0xeb/0x140 [btrfs]
   ? file_update_time+0xcd/0x120
   btrfs_remap_file_range+0x322/0x3b0 [btrfs]
   do_clone_file_range+0xb7/0x1e0
   vfs_clone_file_range+0x30/0xa0
   ioctl_file_clone+0x8a/0xc0
   do_vfs_ioctl+0x5b2/0x6f0
   __x64_sys_ioctl+0x37/0xa0
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f87977fc247
  RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247
  RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003
  RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000
  R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000
  R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000
  Kernel Offset: disabled
  ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---

All of these lockup reports have the call chain btrfs_clone_files() ->
btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with
both source and destination extents locked and loops over the source
extent to create the clones.

Conditionally reschedule in the btrfs_clone() loop, to give some time back
to other processes.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: tree-checker: fix false alert caused by legacy btrfs root item
Qu Wenruo [Tue, 22 Sep 2020 02:37:01 +0000 (10:37 +0800)]
btrfs: tree-checker: fix false alert caused by legacy btrfs root item

commit 1465af12e254a68706e110846f59cf0f09683184 upstream.

Commit 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check")
introduced btrfs root item size check, however btrfs root item has two
versions, the legacy one which just ends before generation_v2 member, is
smaller than current btrfs root item size.

This caused btrfs kernel to reject valid but old tree root leaves.

Fix this problem by also allowing legacy root item, since kernel can
already handle them pretty well and upgrade to newer root item format
when needed.

Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check")
CC: stable@vger.kernel.org # 5.4+
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: use kvzalloc() to allocate clone_roots in btrfs_ioctl_send()
Denis Efremov [Mon, 21 Sep 2020 17:03:35 +0000 (20:03 +0300)]
btrfs: use kvzalloc() to allocate clone_roots in btrfs_ioctl_send()

commit 8eb2fd00153a3a96a19c62ac9c6d48c2efebe5e8 upstream.

btrfs_ioctl_send() used open-coded kvzalloc implementation earlier.
The code was accidentally replaced with kzalloc() call [1]. Restore
the original code by using kvzalloc() to allocate sctx->clone_roots.

[1] https://patchwork.kernel.org/patch/9757891/#20529627

Fixes: 818e010bf9d0 ("btrfs: replace opencoded kvzalloc with the helper")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: send, recompute reference path after orphanization of a directory
Filipe Manana [Mon, 21 Sep 2020 13:13:30 +0000 (14:13 +0100)]
btrfs: send, recompute reference path after orphanization of a directory

commit 9c2b4e0347067396ceb3ae929d6888c81d610259 upstream.

During an incremental send, when an inode has multiple new references we
might end up emitting rename operations for orphanizations that have a
source path that is no longer valid due to a previous orphanization of
some directory inode. This causes the receiver to fail since it tries
to rename a path that does not exists.

Example reproducer:

  $ cat reproducer.sh
  #!/bin/bash

  mkfs.btrfs -f /dev/sdi >/dev/null
  mount /dev/sdi /mnt/sdi

  touch /mnt/sdi/f1
  touch /mnt/sdi/f2
  mkdir /mnt/sdi/d1
  mkdir /mnt/sdi/d1/d2

  # Filesystem looks like:
  #
  # .                           (ino 256)
  # |----- f1                   (ino 257)
  # |----- f2                   (ino 258)
  # |----- d1/                  (ino 259)
  #        |----- d2/           (ino 260)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
  btrfs send -f /tmp/snap1.send /mnt/sdi/snap1

  # Now do a series of changes such that:
  #
  # *) inode 258 has one new hardlink and the previous name changed
  #
  # *) both names conflict with the old names of two other inodes:
  #
  #    1) the new name "d1" conflicts with the old name of inode 259,
  #       under directory inode 256 (root)
  #
  #    2) the new name "d2" conflicts with the old name of inode 260
  #       under directory inode 259
  #
  # *) inodes 259 and 260 now have the old names of inode 258
  #
  # *) inode 257 is now located under inode 260 - an inode with a number
  #    smaller than the inode (258) for which we created a second hard
  #    link and swapped its names with inodes 259 and 260
  #
  ln /mnt/sdi/f2 /mnt/sdi/d1/f2_link
  mv /mnt/sdi/f1 /mnt/sdi/d1/d2/f1

  # Swap d1 and f2.
  mv /mnt/sdi/d1 /mnt/sdi/tmp
  mv /mnt/sdi/f2 /mnt/sdi/d1
  mv /mnt/sdi/tmp /mnt/sdi/f2

  # Swap d2 and f2_link
  mv /mnt/sdi/f2/d2 /mnt/sdi/tmp
  mv /mnt/sdi/f2/f2_link /mnt/sdi/f2/d2
  mv /mnt/sdi/tmp /mnt/sdi/f2/f2_link

  # Filesystem now looks like:
  #
  # .                                (ino 256)
  # |----- d1                        (ino 258)
  # |----- f2/                       (ino 259)
  #        |----- f2_link/           (ino 260)
  #        |       |----- f1         (ino 257)
  #        |
  #        |----- d2                 (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
  btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2

  mkfs.btrfs -f /dev/sdj >/dev/null
  mount /dev/sdj /mnt/sdj

  btrfs receive -f /tmp/snap1.send /mnt/sdj
  btrfs receive -f /tmp/snap2.send /mnt/sdj

  umount /mnt/sdi
  umount /mnt/sdj

When executed the receive of the incremental stream fails:

  $ ./reproducer.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: rename d1/d2 -> o260-6-0 failed: No such file or directory

This happens because:

1) When processing inode 257 we end up computing the name for inode 259
   because it is an ancestor in the send snapshot, and at that point it
   still has its old name, "d1", from the parent snapshot because inode
   259 was not yet processed. We then cache that name, which is valid
   until we start processing inode 259 (or set the progress to 260 after
   processing its references);

2) Later we start processing inode 258 and collecting all its new
   references into the list sctx->new_refs. The first reference in the
   list happens to be the reference for name "d1" while the reference for
   name "d2" is next (the last element of the list).
   We compute the full path "d1/d2" for this second reference and store
   it in the reference (its ->full_path member). The path used for the
   new parent directory was "d1" and not "f2" because inode 259, the
   new parent, was not yet processed;

3) When we start processing the new references at process_recorded_refs()
   we start with the first reference in the list, for the new name "d1".
   Because there is a conflicting inode that was not yet processed, which
   is directory inode 259, we orphanize it, renaming it from "d1" to
   "o259-6-0";

4) Then we start processing the new reference for name "d2", and we
   realize it conflicts with the reference of inode 260 in the parent
   snapshot. So we issue an orphanization operation for inode 260 by
   emitting a rename operation with a destination path of "o260-6-0"
   and a source path of "d1/d2" - this source path is the value we
   stored in the reference earlier at step 2), corresponding to the
   ->full_path member of the reference, however that path is no longer
   valid due to the orphanization of the directory inode 259 in step 3).
   This makes the receiver fail since the path does not exists, it should
   have been "o259-6-0/d2".

Fix this by recomputing the full path of a reference before emitting an
orphanization if we previously orphanized any directory, since that
directory could be a parent in the new path. This is a rare scenario so
keeping it simple and not checking if that previously orphanized directory
is in fact an ancestor of the inode we are trying to orphanize.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: send, orphanize first all conflicting inodes when processing references
Filipe Manana [Mon, 21 Sep 2020 13:13:29 +0000 (14:13 +0100)]
btrfs: send, orphanize first all conflicting inodes when processing references

commit 98272bb77bf4cc20ed1ffca89832d713e70ebf09 upstream.

When doing an incremental send it is possible that when processing the new
references for an inode we end up issuing rename or link operations that
have an invalid path, which contains the orphanized name of a directory
before we actually orphanized it, causing the receiver to fail.

The following reproducer triggers such scenario:

  $ cat reproducer.sh
  #!/bin/bash

  mkfs.btrfs -f /dev/sdi >/dev/null
  mount /dev/sdi /mnt/sdi

  touch /mnt/sdi/a
  touch /mnt/sdi/b
  mkdir /mnt/sdi/testdir
  # We want "a" to have a lower inode number then "testdir" (257 vs 259).
  mv /mnt/sdi/a /mnt/sdi/testdir/a

  # Filesystem looks like:
  #
  # .                           (ino 256)
  # |----- testdir/             (ino 259)
  # |          |----- a         (ino 257)
  # |
  # |----- b                    (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
  btrfs send -f /tmp/snap1.send /mnt/sdi/snap1

  # Now rename 259 to "testdir_2", then change the name of 257 to
  # "testdir" and make it a direct descendant of the root inode (256).
  # Also create a new link for inode 257 with the old name of inode 258.
  # By swapping the names and location of several inodes and create a
  # nasty dependency chain of rename and link operations.
  mv /mnt/sdi/testdir/a /mnt/sdi/a2
  touch /mnt/sdi/testdir/a
  mv /mnt/sdi/b /mnt/sdi/b2
  ln /mnt/sdi/a2 /mnt/sdi/b
  mv /mnt/sdi/testdir /mnt/sdi/testdir_2
  mv /mnt/sdi/a2 /mnt/sdi/testdir

  # Filesystem now looks like:
  #
  # .                            (ino 256)
  # |----- testdir_2/            (ino 259)
  # |          |----- a          (ino 260)
  # |
  # |----- testdir               (ino 257)
  # |----- b                     (ino 257)
  # |----- b2                    (ino 258)

  btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
  btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2

  mkfs.btrfs -f /dev/sdj >/dev/null
  mount /dev/sdj /mnt/sdj

  btrfs receive -f /tmp/snap1.send /mnt/sdj
  btrfs receive -f /tmp/snap2.send /mnt/sdj

  umount /mnt/sdi
  umount /mnt/sdj

When running the reproducer, the receive of the incremental send stream
fails:

  $ ./reproducer.sh
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
  At subvol /mnt/sdi/snap1
  Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
  At subvol /mnt/sdi/snap2
  At subvol snap1
  At snapshot snap2
  ERROR: link b -> o259-6-0/a failed: No such file or directory

The problem happens because of the following:

1) Before we start iterating the list of new references for inode 257,
   we generate its current path and store it at @valid_path, done at
   the very beginning of process_recorded_refs(). The generated path
   is "o259-6-0/a", containing the orphanized name for inode 259;

2) Then we iterate over the list of new references, which has the
   references "b" and "testdir" in that specific order;

3) We process reference "b" first, because it is in the list before
   reference "testdir". We then issue a link operation to create
   the new reference "b" using a target path corresponding to the
   content at @valid_path, which corresponds to "o259-6-0/a".
   However we haven't yet orphanized inode 259, its name is still
   "testdir", and not "o259-6-0". The orphanization of 259 did not
   happen yet because we will process the reference named "testdir"
   for inode 257 only in the next iteration of the loop that goes
   over the list of new references.

Fix the issue by having a preliminar iteration over all the new references
at process_recorded_refs(). This iteration is responsible only for doing
the orphanization of other inodes that have and old reference that
conflicts with one of the new references of the inode we are currently
processing. The emission of rename and link operations happen now in the
next iteration of the new references.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: reschedule if necessary when logging directory items
Filipe Manana [Mon, 14 Sep 2020 14:27:50 +0000 (15:27 +0100)]
btrfs: reschedule if necessary when logging directory items

commit bb56f02f26fe23798edb1b2175707419b28c752a upstream.

Logging directories with many entries can take a significant amount of
time, and in some cases monopolize a cpu/core for a long time if the
logging task doesn't happen to block often enough.

Johannes and Lu Fengqi reported test case generic/041 triggering a soft
lockup when the kernel has CONFIG_SOFTLOCKUP_DETECTOR=y. For this test
case we log an inode with 3002 hard links, and because the test removed
one hard link before fsyncing the file, the inode logging causes the
parent directory do be logged as well, which has 6004 directory items to
log (3002 BTRFS_DIR_ITEM_KEY items plus 3002 BTRFS_DIR_INDEX_KEY items),
so it can take a significant amount of time and trigger the soft lockup.

So just make tree-log.c:log_dir_items() reschedule when necessary,
releasing the current search path before doing so and then resume from
where it was before the reschedule.

The stack trace produced when the soft lockup happens is the following:

[10480.277653] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xfs_io:28172]
[10480.279418] Modules linked in: dm_thin_pool dm_persistent_data (...)
[10480.284915] irq event stamp: 29646366
[10480.285987] hardirqs last  enabled at (29646365): [<ffffffff85249b66>] __slab_alloc.constprop.0+0x56/0x60
[10480.288482] hardirqs last disabled at (29646366): [<ffffffff8579b00d>] irqentry_enter+0x1d/0x50
[10480.290856] softirqs last  enabled at (4612): [<ffffffff85a00323>] __do_softirq+0x323/0x56c
[10480.293615] softirqs last disabled at (4483): [<ffffffff85800dbf>] asm_call_on_stack+0xf/0x20
[10480.296428] CPU: 2 PID: 28172 Comm: xfs_io Not tainted 5.9.0-rc4-default+ #1248
[10480.298948] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[10480.302455] RIP: 0010:__slab_alloc.constprop.0+0x19/0x60
[10480.304151] Code: 86 e8 31 75 21 00 66 66 2e 0f 1f 84 00 00 00 (...)
[10480.309558] RSP: 0018:ffffadbe09397a58 EFLAGS: 00000282
[10480.311179] RAX: ffff8a495ab92840 RBX: 0000000000000282 RCX: 0000000000000006
[10480.313242] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff85249b66
[10480.315260] RBP: ffff8a497d04b740 R08: 0000000000000001 R09: 0000000000000001
[10480.317229] R10: ffff8a497d044800 R11: ffff8a495ab93c40 R12: 0000000000000000
[10480.319169] R13: 0000000000000000 R14: 0000000000000c40 R15: ffffffffc01daf70
[10480.321104] FS:  00007fa1dc5c0e40(0000) GS:ffff8a497da00000(0000) knlGS:0000000000000000
[10480.323559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10480.325235] CR2: 00007fa1dc5befb8 CR3: 0000000004f8a006 CR4: 0000000000170ea0
[10480.327259] Call Trace:
[10480.328286]  ? overwrite_item+0x1f0/0x5a0 [btrfs]
[10480.329784]  __kmalloc+0x831/0xa20
[10480.331009]  ? btrfs_get_32+0xb0/0x1d0 [btrfs]
[10480.332464]  overwrite_item+0x1f0/0x5a0 [btrfs]
[10480.333948]  log_dir_items+0x2ee/0x570 [btrfs]
[10480.335413]  log_directory_changes+0x82/0xd0 [btrfs]
[10480.336926]  btrfs_log_inode+0xc9b/0xda0 [btrfs]
[10480.338374]  ? init_once+0x20/0x20 [btrfs]
[10480.339711]  btrfs_log_inode_parent+0x8d3/0xd10 [btrfs]
[10480.341257]  ? dget_parent+0x97/0x2e0
[10480.342480]  btrfs_log_dentry_safe+0x3a/0x50 [btrfs]
[10480.343977]  btrfs_sync_file+0x24b/0x5e0 [btrfs]
[10480.345381]  do_fsync+0x38/0x70
[10480.346483]  __x64_sys_fsync+0x10/0x20
[10480.347703]  do_syscall_64+0x2d/0x70
[10480.348891]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10480.350444] RIP: 0033:0x7fa1dc80970b
[10480.351642] Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 (...)
[10480.356952] RSP: 002b:00007fffb3d081d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[10480.359458] RAX: ffffffffffffffda RBX: 0000562d93d45e40 RCX: 00007fa1dc80970b
[10480.361426] RDX: 0000562d93d44ab0 RSI: 0000562d93d45e60 RDI: 0000000000000003
[10480.363367] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007fa1dc7b2a40
[10480.365317] R10: 0000562d93d0e366 R11: 0000000000000293 R12: 0000000000000001
[10480.367299] R13: 0000562d93d45290 R14: 0000562d93d45e40 R15: 0000562d93d45e60

Link: https://lore.kernel.org/linux-btrfs/20180713090216.GC575@fnst.localdomain/
Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
CC: stable@vger.kernel.org # 4.4+
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: tracepoints: output proper root owner for trace_find_free_extent()
Qu Wenruo [Tue, 28 Jul 2020 01:42:49 +0000 (09:42 +0800)]
btrfs: tracepoints: output proper root owner for trace_find_free_extent()

commit 437490fed3b0c9ae21af8f70e0f338d34560842b upstream.

The current trace event always output result like this:

 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
 find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)

T's saying we're allocating data extent for EXTENT tree, which is not
even possible.

It's because we always use EXTENT tree as the owner for
trace_find_free_extent() without using the @root from
btrfs_reserve_extent().

This patch will change the parameter to use proper @root for
trace_find_free_extent():

Now it looks much better:

 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=4096 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=7(CSUM_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=1(ROOT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)

Reported-by: Hans van Kranenburg <hans@knorrie.org>
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: sysfs: init devices outside of the chunk_mutex
Josef Bacik [Tue, 1 Sep 2020 12:09:01 +0000 (08:09 -0400)]
btrfs: sysfs: init devices outside of the chunk_mutex

commit ca10845a56856fff4de3804c85e6424d0f6d0cde upstream.

While running btrfs/061, btrfs/073, btrfs/078, or btrfs/178 we hit the
following lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-rc3+ #4 Not tainted
  ------------------------------------------------------
  kswapd0/100 is trying to acquire lock:
  ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

  but task is already holding lock:
  ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (fs_reclaim){+.+.}-{0:0}:
 fs_reclaim_acquire+0x65/0x80
 slab_pre_alloc_hook.constprop.0+0x20/0x200
 kmem_cache_alloc+0x37/0x270
 alloc_inode+0x82/0xb0
 iget_locked+0x10d/0x2c0
 kernfs_get_inode+0x1b/0x130
 kernfs_get_tree+0x136/0x240
 sysfs_get_tree+0x16/0x40
 vfs_get_tree+0x28/0xc0
 path_mount+0x434/0xc00
 __x64_sys_mount+0xe3/0x120
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (kernfs_mutex){+.+.}-{3:3}:
 __mutex_lock+0x7e/0x7e0
 kernfs_add_one+0x23/0x150
 kernfs_create_link+0x63/0xa0
 sysfs_do_create_link_sd+0x5e/0xd0
 btrfs_sysfs_add_devices_dir+0x81/0x130
 btrfs_init_new_device+0x67f/0x1250
 btrfs_ioctl+0x1ef/0x2e20
 __x64_sys_ioctl+0x83/0xb0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
 __mutex_lock+0x7e/0x7e0
 btrfs_chunk_alloc+0x125/0x3a0
 find_free_extent+0xdf6/0x1210
 btrfs_reserve_extent+0xb3/0x1b0
 btrfs_alloc_tree_block+0xb0/0x310
 alloc_tree_block_no_bg_flush+0x4a/0x60
 __btrfs_cow_block+0x11a/0x530
 btrfs_cow_block+0x104/0x220
 btrfs_search_slot+0x52e/0x9d0
 btrfs_insert_empty_items+0x64/0xb0
 btrfs_insert_delayed_items+0x90/0x4f0
 btrfs_commit_inode_delayed_items+0x93/0x140
 btrfs_log_inode+0x5de/0x2020
 btrfs_log_inode_parent+0x429/0xc90
 btrfs_log_new_name+0x95/0x9b
 btrfs_rename2+0xbb9/0x1800
 vfs_rename+0x64f/0x9f0
 do_renameat2+0x320/0x4e0
 __x64_sys_rename+0x1f/0x30
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
 __lock_acquire+0x119c/0x1fc0
 lock_acquire+0xa7/0x3d0
 __mutex_lock+0x7e/0x7e0
 __btrfs_release_delayed_node.part.0+0x3f/0x330
 btrfs_evict_inode+0x24c/0x500
 evict+0xcf/0x1f0
 dispose_list+0x48/0x70
 prune_icache_sb+0x44/0x50
 super_cache_scan+0x161/0x1e0
 do_shrink_slab+0x178/0x3c0
 shrink_slab+0x17c/0x290
 shrink_node+0x2b2/0x6d0
 balance_pgdat+0x30a/0x670
 kswapd+0x213/0x4c0
 kthread+0x138/0x160
 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

   Possible unsafe locking scenario:

 CPU0                    CPU1
 ----                    ----
    lock(fs_reclaim);
 lock(kernfs_mutex);
 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/100:
   #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
   #2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

  stack backtrace:
  CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb8
   check_noncircular+0x12d/0x150
   __lock_acquire+0x119c/0x1fc0
   lock_acquire+0xa7/0x3d0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   __mutex_lock+0x7e/0x7e0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? lock_acquire+0xa7/0x3d0
   ? find_held_lock+0x2b/0x80
   __btrfs_release_delayed_node.part.0+0x3f/0x330
   btrfs_evict_inode+0x24c/0x500
   evict+0xcf/0x1f0
   dispose_list+0x48/0x70
   prune_icache_sb+0x44/0x50
   super_cache_scan+0x161/0x1e0
   do_shrink_slab+0x178/0x3c0
   shrink_slab+0x17c/0x290
   shrink_node+0x2b2/0x6d0
   balance_pgdat+0x30a/0x670
   kswapd+0x213/0x4c0
   ? _raw_spin_unlock_irqrestore+0x41/0x50
   ? add_wait_queue_exclusive+0x70/0x70
   ? balance_pgdat+0x670/0x670
   kthread+0x138/0x160
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30

This happens because we are holding the chunk_mutex at the time of
adding in a new device.  However we only need to hold the
device_list_mutex, as we're going to iterate over the fs_devices
devices.  Move the sysfs init stuff outside of the chunk_mutex to get
rid of this lockdep splat.

CC: stable@vger.kernel.org # 4.4.x: f3cd2c58110dad14e: btrfs: sysfs, rename device_link add/remove functions
CC: stable@vger.kernel.org # 4.4.x
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: qgroup: fix qgroup meta rsv leak for subvolume operations
Qu Wenruo [Fri, 24 Jul 2020 06:46:10 +0000 (14:46 +0800)]
btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations

commit e85fde5162bf1b242cbd6daf7dba0f9b457d592b upstream.

[BUG]
When quota is enabled for TEST_DEV, generic/013 sometimes fails like this:

  generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg)

And with the following metadata leak:

  BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs]
  Call Trace:
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   __prepare_exit_to_usermode+0x1bc/0x1c0
   __syscall_return_slowpath+0x47/0x230
   do_syscall_64+0x64/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace a6cfd45ba80e4e06 ]---
  BTRFS error (device dm-3): qgroup reserved space leaked
  BTRFS info (device dm-3): disk space caching is enabled
  BTRFS info (device dm-3): has skinny extents

[CAUSE]
The qgroup preallocated meta rsv operations of that offending root are:

  btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
  btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
  btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152
  btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
  btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072

It's pretty obvious that, we reserve qgroup meta rsv in
btrfs_subvolume_reserve_metadata(), but doesn't have corresponding
release/convert calls in btrfs_subvolume_release_metadata().

This leads to the leakage.

[FIX]
To fix this bug, we should follow what we're doing in
btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and
add it to block_rsv->qgroup_rsv_reserved.

And free the qgroup reserved metadata space when releasing the
block_rsv.

To do this, we need to change the btrfs_subvolume_release_metadata() to
accept btrfs_root, and record the qgroup_to_release number, and call
btrfs_qgroup_convert_reserved_meta() for it.

Fixes: 733e03a0b26a ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: improve device scanning messages
Anand Jain [Thu, 3 Sep 2020 13:30:12 +0000 (21:30 +0800)]
btrfs: improve device scanning messages

commit 79dae17d8d44b2d15779e332180080af45df5352 upstream.

Systems booting without the initramfs seems to scan an unusual kind
of device path (/dev/root). And at a later time, the device is updated
to the correct path. We generally print the process name and PID of the
process scanning the device but we don't capture the same information if
the device path is rescanned with a different pathname.

The current message is too long, so drop the unnecessary UUID and add
process name and PID.

While at this also update the duplicate device warning to include the
process name and PID so the messages are consistent

CC: stable@vger.kernel.org # 4.19+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=89721
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agobtrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode
Qu Wenruo [Fri, 24 Jul 2020 06:46:09 +0000 (14:46 +0800)]
btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode

commit b4c5d8fdfff3e2b6c4fa4a5043e8946dff500f8c upstream.

For delayed inode facility, qgroup metadata is reserved for it, and
later freed.

However we're freeing more bytes than we reserved.
In btrfs_delayed_inode_reserve_metadata():

num_bytes = btrfs_calc_metadata_size(fs_info, 1);
...
ret = btrfs_qgroup_reserve_meta_prealloc(root,
fs_info->nodesize, true);
...
if (!ret) {
node->bytes_reserved = num_bytes;

But in btrfs_delayed_inode_release_metadata():

if (qgroup_free)
btrfs_qgroup_free_meta_prealloc(node->root,
node->bytes_reserved);
else
btrfs_qgroup_convert_reserved_meta(node->root,
node->bytes_reserved);

This means, we're always releasing more qgroup metadata rsv than we have
reserved.

This won't trigger selftest warning, as btrfs qgroup metadata rsv has
extra protection against cases like quota enabled half-way.

But we still need to fix this problem any way.

This patch will use the same num_bytes for qgroup metadata rsv so we
could handle it correctly.

Fixes: f218ea6c4792 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoPM: runtime: Remove link state checks in rpm_get/put_supplier()
Xiang Chen [Tue, 22 Sep 2020 13:11:06 +0000 (21:11 +0800)]
PM: runtime: Remove link state checks in rpm_get/put_supplier()

commit d12544fb2aa9944b180c35914031a8384ab082c1 upstream.

To support runtime PM for hisi SAS driver (the driver is in directory
drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
(consumer device) and hisi_hba->dev(supplier device) with flags
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.

After runtime suspended consumers and supplier, unload the dirver which
causes a hung.

We found that it called function device_release_driver_internal() to
release the supplier device (hisi_hba->dev), as the device link was
busy, it set the device link state to DL_STATE_SUPPLIER_UNBIND, and
then it called device_release_driver_internal() to release the consumer
device (scsi_device->sdev_gendev).

Then it would try to call pm_runtime_get_sync() to resume the consumer
device, but because consumer-supplier relation existed, it would try
to resume the supplier first, but as the link state was already
DL_STATE_SUPPLIER_UNBIND, so it skipped resuming the supplier and only
resumed the consumer which hanged (it sends IOs to resume scsi_device
while the SAS controller is suspended).

Simple flow is as follows:

device_release_driver_internal -> (supplier device)
    if device_links_busy ->
device_links_unbind_consumers ->
    ...
    WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
    device_release_driver_internal (consumer device)
    pm_runtime_get_sync -> (consumer device)
...
__rpm_callback ->
    rpm_get_suppliers ->
if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
...
    pm_runtime_clean_up_links
    ...

Correct suspend/resume ordering between a supplier device and its consumer
devices (resume the supplier device before resuming consumer devices, and
suspend consumer devices before suspending the supplier device) should be
guaranteed by runtime PM, but the state checks in rpm_get_supplier() and
rpm_put_supplier() break this rule, so remove them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
[ rjw: Subject and changelog edits ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoscsi: qla2xxx: Fix crash on session cleanup with unload
Quinn Tran [Tue, 29 Sep 2020 10:21:50 +0000 (03:21 -0700)]
scsi: qla2xxx: Fix crash on session cleanup with unload

commit 50457dab670f396557e60c07f086358460876353 upstream.

On unload, session cleanup prematurely gave the signal for driver unload
path to advance.

Link: https://lore.kernel.org/r/20200929102152.32278-6-njavali@marvell.com
Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoscsi: qla2xxx: Fix reset of MPI firmware
Arun Easi [Tue, 29 Sep 2020 10:21:49 +0000 (03:21 -0700)]
scsi: qla2xxx: Fix reset of MPI firmware

commit 3e6efab865ac943f4ec43913eb665695737112b0 upstream.

Normally, the MPI firmware is reset when an MPI dump is collected.  If an
unsaved MPI dump exists in the driver, though, an alternate mechanism is
used. This mechanism, which was not fully correct, is not recommended and
instead an MPI dump template walk is suggested to perform the MPI reset.

To allow for the MPI dump template walk, extra space is reserved in the MPI
dump buffer which gets used only when there is already an MPI dump in
place.

Link: https://lore.kernel.org/r/20200929102152.32278-5-njavali@marvell.com
Fixes: cbb01c2f2f63 ("scsi: qla2xxx: Fix MPI failure AEN (8200) handling")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoscsi: qla2xxx: Fix MPI reset needed message
Arun Easi [Tue, 29 Sep 2020 10:21:48 +0000 (03:21 -0700)]
scsi: qla2xxx: Fix MPI reset needed message

commit 7a6cdbd5e87515ebf6231b762ad903c7cff87b9c upstream.

When printing the message:

  "MPI Heartbeat stop. MPI reset is not needed.."

..the wrong register was checked leading to always printing that MPI reset
is not needed, even when it is needed. Fix the MPI reset message.

Link: https://lore.kernel.org/r/20200929102152.32278-4-njavali@marvell.com
Fixes: cbb01c2f2f63 ("scsi: qla2xxx: Fix MPI failure AEN (8200) handling")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoscsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
Helge Deller [Thu, 22 Oct 2020 09:00:05 +0000 (11:00 +0200)]
scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()

commit 2f4843b172c2c0360ee7792ad98025fae7baefde upstream.

The mptscsih_remove() function triggers a kernel oops if the Scsi_Host
pointer (ioc->sh) is NULL, as can be seen in this syslog:

 ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
 Begin: Waiting for root file system ...
 scsi host2: error handler thread failed to spawn, error = -4
 mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
 Backtrace:
  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
  [<0000000040add668>] really_probe+0x1bc/0x988
  [<0000000040ade704>] driver_probe_device+0x160/0x218
  [<0000000040adee24>] device_driver_attach+0x160/0x188
  [<0000000040adef90>] __driver_attach+0x144/0x320
  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
  [<0000000040adc138>] driver_attach+0x4c/0x80
  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
  [<0000000040ae0130>] driver_register+0xf4/0x298
  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]

This patch adds the necessary NULL-pointer checks.  Successfully tested on
a HP C8000 parisc workstation with buggy SCSI drives.

Link: https://lore.kernel.org/r/20201022090005.GA9000@ls3530.fritz.box
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agofs/kernel_read_file: Remove FIRMWARE_PREALLOC_BUFFER enum
Kees Cook [Fri, 2 Oct 2020 17:38:13 +0000 (10:38 -0700)]
fs/kernel_read_file: Remove FIRMWARE_PREALLOC_BUFFER enum

commit c307459b9d1fcb8bbf3ea5a4162979532322ef77 upstream.

FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
that are interested in filtering between types of things. The "how"
should be an internal detail made uninteresting to the LSMs.

Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agow1: mxc_w1: Fix timeout resolution problem leading to bus error
Martin Fuzzey [Wed, 30 Sep 2020 08:36:46 +0000 (10:36 +0200)]
w1: mxc_w1: Fix timeout resolution problem leading to bus error

commit c9723750a699c3bd465493ac2be8992b72ccb105 upstream.

On my platform (i.MX53) bus access sometimes fails with
w1_search: max_slave_count 64 reached, will continue next search.

The reason is the use of jiffies to implement a 200us timeout in
mxc_w1_ds2_touch_bit().
On some platforms the jiffies timer resolution is insufficient for this.

Fix by replacing jiffies by ktime_get().

For consistency apply the same change to the other use of jiffies in
mxc_w1_ds2_reset_bus().

Fixes: f80b2581a706 ("w1: mxc_w1: Optimize mxc_w1_ds2_touch_bit()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Link: https://lore.kernel.org/r/1601455030-6607-1-git-send-email-martin.fuzzey@flowbird.group
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoio-wq: assign NUMA node locality if appropriate
Jens Axboe [Thu, 15 Oct 2020 16:13:07 +0000 (10:13 -0600)]
io-wq: assign NUMA node locality if appropriate

commit a8b595b22d31f83b715511f59012f152a269d83b upstream.

There was an assumption that kthread_create_on_node() would properly set
NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we
do this when creating an io-wq context on NUMA.

Cc: stable@vger.kernel.org
Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoacpi-cpufreq: Honor _PSD table setting on new AMD CPUs
Wei Huang [Mon, 19 Oct 2020 03:57:41 +0000 (22:57 -0500)]
acpi-cpufreq: Honor _PSD table setting on new AMD CPUs

commit 5368512abe08a28525d9b24abbfc2a72493e8dba upstream.

acpi-cpufreq has a old quirk that overrides the _PSD table supplied by
BIOS on AMD CPUs. However the _PSD table of new AMD CPUs (Family 19h+)
now accurately reports the P-state dependency of CPU cores. Hence this
quirk needs to be fixed in order to support new CPUs' frequency control.

Fixes: acd316248205 ("acpi-cpufreq: Add quirk to disable _PSD usage on all AMD CPUs")
Signed-off-by: Wei Huang <wei.huang2@amd.com>
[ rjw: Subject edit ]
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
Rafael J. Wysocki [Mon, 5 Oct 2020 17:13:46 +0000 (19:13 +0200)]
ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()

commit e0e9ce390d7bc6a705653d4a8aa4ea92c9a65e53 upstream.

It turns out that in some cases there are EC events to flush in
acpi_ec_dispatch_gpe() even though the ec_no_wakeup kernel parameter
is set and the EC GPE is disabled while sleeping, so drop the
ec_no_wakeup check that prevents those events from being processed
from acpi_ec_dispatch_gpe().

Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: EC: PM: Flush EC work unconditionally after wakeup
Rafael J. Wysocki [Mon, 5 Oct 2020 17:13:15 +0000 (19:13 +0200)]
ACPI: EC: PM: Flush EC work unconditionally after wakeup

commit 5e92442bb4121562231e6daf8a2d1306cb5f8805 upstream.

Commit 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC
GPE is inactive") has been reported to cause some power button wakeup
events to be missed on some systems, so modify acpi_ec_dispatch_gpe()
to call acpi_ec_flush_work() unconditionally to effectively reverse
the changes made by that commit.

Also note that the problem which prompted commit 607b9df63057 is not
reproducible any more on the affected machine.

Fixes: 607b9df63057 ("ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive")
Reported-by: Raymond Tan <raymond.tan@intel.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoPCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
Lukas Wunner [Fri, 2 Oct 2020 05:10:12 +0000 (07:10 +0200)]
PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

commit c6e331312ebfb52b7186e5d82d517d68b4d2f2d8 upstream.

Recent laptops with dual AMD GPUs fail to suspend the discrete GPU, thus
causing lockups on system sleep and high power consumption at runtime.
The discrete GPU would normally be suspended to D3cold by turning off
ACPI _PR3 Power Resources of the Root Port above the GPU.

However on affected systems, the Root Port is hotplug-capable and
pci_bridge_d3_possible() only allows hotplug ports to go to D3 if they
belong to a Thunderbolt device or if the Root Port possesses a
"HotPlugSupportInD3" ACPI property.  Neither is the case on affected
laptops.  The reason for whitelisting only specific, known to work
hotplug ports for D3 is that there have been reports of SkyLake Xeon-SP
systems raising Hardware Error NMIs upon suspending their hotplug ports:
https://lore.kernel.org/linux-pci/20170503180426.GA4058@otc-nc-03/

But if a hotplug port is power manageable by ACPI (as can be detected
through presence of Power Resources and corresponding _PS0 and _PS3
methods) then it ought to be safe to suspend it to D3.  To this end,
amend acpi_pci_bridge_d3() to whitelist such ports for D3.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com>
Reported-and-tested-by: matoro <matoro@airmail.cc>
Reported-by: Aaron Zakhrov <aaron.zakhrov@gmail.com>
Reported-by: Michal Rostecki <mrostecki@suse.com>
Reported-by: Shai Coleman <git@shaicoleman.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: debug: don't allow debugging when ACPI is disabled
Jamie Iles [Mon, 12 Oct 2020 13:04:46 +0000 (14:04 +0100)]
ACPI: debug: don't allow debugging when ACPI is disabled

commit 0fada277147ffc6d694aa32162f51198d4f10d94 upstream.

If ACPI is disabled then loading the acpi_dbg module will result in the
following splat when lock debugging is enabled.

  DEBUG_LOCKS_WARN_ON(lock->magic != lock)
  WARNING: CPU: 0 PID: 1 at kernel/locking/mutex.c:938 __mutex_lock+0xa10/0x1290
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8+ #103
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x4d8
   show_stack+0x34/0x48
   dump_stack+0x174/0x1f8
   panic+0x360/0x7a0
   __warn+0x244/0x2ec
   report_bug+0x240/0x398
   bug_handler+0x50/0xc0
   call_break_hook+0x160/0x1d8
   brk_handler+0x30/0xc0
   do_debug_exception+0x184/0x340
   el1_dbg+0x48/0xb0
   el1_sync_handler+0x170/0x1c8
   el1_sync+0x80/0x100
   __mutex_lock+0xa10/0x1290
   mutex_lock_nested+0x6c/0xc0
   acpi_register_debugger+0x40/0x88
   acpi_aml_init+0xc4/0x114
   do_one_initcall+0x24c/0xb10
   kernel_init_freeable+0x690/0x728
   kernel_init+0x20/0x1e8
   ret_from_fork+0x10/0x18

This is because acpi_debugger.lock has not been initialized as
acpi_debugger_init() is not called when ACPI is disabled.  Fail module
loading to avoid this and any subsequent problems that might arise by
trying to debug AML when ACPI is disabled.

Fixes: 8cfb0cdf07e2 ("ACPI / debugger: Add IO interface to access debugger functionalities")
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: video: use ACPI backlight for HP 635 Notebook
Alex Hung [Sun, 13 Sep 2020 22:34:03 +0000 (16:34 -0600)]
ACPI: video: use ACPI backlight for HP 635 Notebook

commit b226faab4e7890bbbccdf794e8b94276414f9058 upstream.

The default backlight interface is AMD's radeon_bl0 which does not
work on this system, so use the ACPI backlight interface on it
instead.

BugLink: https://bugs.launchpad.net/bugs/1894667
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Alex Hung <alex.hung@canonical.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI / extlog: Check for RDMSR failure
Ben Hutchings [Sun, 27 Sep 2020 21:50:42 +0000 (22:50 +0100)]
ACPI / extlog: Check for RDMSR failure

commit 7cecb47f55e00282f972a1e0b09136c8cd938221 upstream.

extlog_init() uses rdmsrl() to read an MSR, which on older CPUs
provokes a error message at boot:

    unchecked MSR access error: RDMSR from 0x179 at rIP: 0xcd047307 (native_read_msr+0x7/0x40)

Use rdmsrl_safe() instead, and return -ENODEV if it fails.

Reported-by: jim@photojim.ca
References: https://bugs.debian.org/971058
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: button: fix handling lid state changes when input device closed
dmitry.torokhov@gmail.com [Mon, 5 Oct 2020 05:11:25 +0000 (22:11 -0700)]
ACPI: button: fix handling lid state changes when input device closed

commit 21988a8e51479ceffe7b0568b170effabb708dfe upstream.

The original intent of 84d3f6b76447 was to delay evaluating lid state until
all drivers have been loaded, with input device being opened from userspace
serving as a signal for this condition. Let's ensure that state updates
happen even if userspace closed (or in the future inhibited) input device.

Note that if we go through suspend/resume cycle we assume the system has
been fully initialized even if LID input device has not been opened yet.

This has a side-effect of fixing access to input->users outside of
input->mutex protections by the way of eliminating said accesses and using
driver private flag.

Fixes: 84d3f6b76447 ("ACPI / button: Delay acpi_lid_initialize_state() until first user space open")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoNFS: fix nfs_path in case of a rename retry
Ashish Sangwan [Mon, 5 Oct 2020 09:22:43 +0000 (02:22 -0700)]
NFS: fix nfs_path in case of a rename retry

commit 247db73560bc3e5aef6db50c443c3c0db115bc93 upstream.

We are generating incorrect path in case of rename retry because
we are restarting from wrong dentry. We should restart from the
dentry which was received in the call to nfs_path.

CC: stable@vger.kernel.org
Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoACPI: configfs: Add missing config_item_put() to fix refcount leak
Hanjun Guo [Fri, 18 Sep 2020 09:13:28 +0000 (17:13 +0800)]
ACPI: configfs: Add missing config_item_put() to fix refcount leak

commit 9a2e849fb6de471b82d19989a7944d3b7671793c upstream.

config_item_put() should be called in the drop_item callback, to
decrement refcount for the config item.

Fixes: 772bf1e2878ec ("ACPI: configfs: Unload SSDT on configfs entry removal")
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject edit ]
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agofs: Don't invalidate page buffers in block_write_full_page()
Jan Kara [Fri, 4 Sep 2020 08:58:51 +0000 (10:58 +0200)]
fs: Don't invalidate page buffers in block_write_full_page()

commit 6dbf7bb555981fb5faf7b691e8f6169fc2b2e63b upstream.

If block_write_full_page() is called for a page that is beyond current
inode size, it will truncate page buffers for the page and return 0.
This logic has been added in 2.5.62 in commit 81eb69062588 ("fix ext3
BUG due to race with truncate") in history.git tree to fix a problem
with ext3 in data=ordered mode. This particular problem doesn't exist
anymore because ext3 is long gone and ext4 handles ordered data
differently. Also normally buffers are invalidated by truncate code and
there's no need to specially handle this in ->writepage() code.

This invalidation of page buffers in block_write_full_page() is causing
issues to filesystems (e.g. ext4 or ocfs2) when block device is shrunk
under filesystem's hands and metadata buffers get discarded while being
tracked by the journalling layer. Although it is obviously "not
supported" it can cause kernel crashes like:

[ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at
+0000000000000008
[ 7986.697197] PGD 0 P4D 0
[ 7986.699724] Oops: 0002 [#1] SMP PTI
[ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G
+O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
[ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
[ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
[ 7986.810150] Call Trace:
[ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
[ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
[ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]

which is not great. The crash happens because bh->b_private is suddently
NULL although BH_JBD flag is still set (this is because
block_invalidatepage() cleared BH_Mapped flag and subsequent bh lookup
found buffer without BH_Mapped set, called init_page_buffers() which has
rewritten bh->b_private). So just remove the invalidation in
block_write_full_page().

Note that the buffer cache invalidation when block device changes size
is already careful to avoid similar problems by using
invalidate_mapping_pages() which skips busy buffers so it was only this
odd block_write_full_page() behavior that could tear down bdev buffers
under filesystem's hands.

Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomedia: uvcvideo: Fix uvc_ctrl_fixup_xu_info() not having any effect
Hans de Goede [Tue, 28 Jul 2020 11:22:08 +0000 (13:22 +0200)]
media: uvcvideo: Fix uvc_ctrl_fixup_xu_info() not having any effect

commit 93df48d37c3f03886d84831992926333e7810640 upstream.

uvc_ctrl_add_info() calls uvc_ctrl_get_flags() which will override
the fixed-up flags set by uvc_ctrl_fixup_xu_info().

uvc_ctrl_init_xu_ctrl() already calls uvc_ctrl_get_flags() before
calling uvc_ctrl_add_info(), so the uvc_ctrl_get_flags() call in
uvc_ctrl_add_info() is not necessary for xu ctrls.

This commit moves the uvc_ctrl_get_flags() call for normal controls
from uvc_ctrl_add_info() to uvc_ctrl_init_ctrl(), so that we no longer
call uvc_ctrl_get_flags() twice for xu controls and so that we no longer
override the fixed-up flags set by uvc_ctrl_fixup_xu_info().

This fixes the xu motor controls not working properly on a Logitech
046d:08cc, and presumably also on the other Logitech models which have
a quirk for this in the uvc_ctrl_fixup_xu_info() function.

Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agohwmon: (pmbus/max34440) Fix OC fault limits
Steve Foreman [Fri, 2 Oct 2020 16:35:38 +0000 (16:35 +0000)]
hwmon: (pmbus/max34440) Fix OC fault limits

commit 2b52278150c49559a472f2d6dd66f6f3b405378e upstream.

The max34* family have the IOUT_OC_WARN_LIMIT and IOUT_OC_CRIT_LIMIT
registers swapped.

Cc: stable@vger.kernel.org
Signed-off-by: Steve Foreman <foremans@google.com>
[groeck: Updated subject, use C comment style, tab after defines]
[groeck: Added missing break; statements (by alexandru.ardelean@analog.com)]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoleds: bcm6328, bcm6358: use devres LED registering function
Marek Behún [Thu, 17 Sep 2020 22:32:58 +0000 (00:32 +0200)]
leds: bcm6328, bcm6358: use devres LED registering function

commit ff5c89d44453e7ad99502b04bf798a3fc32c758b upstream.

These two drivers do not provide remove method and use devres for
allocation of other resources, yet they use led_classdev_register
instead of the devres variant, devm_led_classdev_register.

Fix this.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Cc: Álvaro Fernández Rojas <noltari@gmail.com>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoextcon: ptn5150: Fix usage of atomic GPIO with sleeping GPIO chips
Krzysztof Kozlowski [Mon, 17 Aug 2020 07:00:00 +0000 (09:00 +0200)]
extcon: ptn5150: Fix usage of atomic GPIO with sleeping GPIO chips

commit 6aaad58c872db062f7ea2761421ca748bd0931cc upstream.

The driver uses atomic version of gpiod_set_value() without any real
reason.  It is called in a workqueue under mutex so it could sleep
there.  Changing it to "can_sleep" flavor allows to use the driver with
all GPIO chips.

Fixes: 4ed754de2d66 ("extcon: Add support for ptn5150 extcon driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Vijai Kumar K <vijaikumar.kanagarajan@gmail.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agospi: sprd: Release DMA channel also on probe deferral
Krzysztof Kozlowski [Tue, 1 Sep 2020 15:27:03 +0000 (17:27 +0200)]
spi: sprd: Release DMA channel also on probe deferral

commit 687a2e76186dcfa42f22c14b655c3fb159839e79 upstream.

If dma_request_chan() for TX channel fails with EPROBE_DEFER, the RX
channel would not be released and on next re-probe it would be requested
second time.

Fixes: 386119bc7be9 ("spi: sprd: spi: sprd: Add DMA mode support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Chunyan Zhang <zhang.lyra@gmail.com>
Link: https://lore.kernel.org/r/20200901152713.18629-1-krzk@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agospi: spi-mtk-nor: fix timeout calculation overflow
Chuanhong Guo [Tue, 22 Sep 2020 11:49:02 +0000 (19:49 +0800)]
spi: spi-mtk-nor: fix timeout calculation overflow

commit 4cafaddedb5fbef9531202ee547784409fd0de33 upstream.

CLK_TO_US macro is used to calculate potential transfer time for various
timeout handling. However it overflows on transfer bigger than 512 bytes
because it first did (len * 8 * 1000000).
This controller typically operates at 45MHz. This patch did 2 things:
1. calculate clock / 1000000 first
2. add a 4M transfer size cap so that the final timeout in DMA reading
   doesn't overflow

Fixes: 881d1ee9fe81f ("spi: add support for mediatek spi-nor controller")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
Link: https://lore.kernel.org/r/20200922114905.2942859-1-gch981213@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoperf/x86/amd/ibs: Fix raw sample data accumulation
Kim Phillips [Tue, 8 Sep 2020 21:47:38 +0000 (16:47 -0500)]
perf/x86/amd/ibs: Fix raw sample data accumulation

commit 36e1be8ada994d509538b3b1d0af8b63c351e729 upstream.

Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode.
Don't accumulate them into raw sample user data in that case.

Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR.

Technically, there is an ABI change here with respect to the IBS raw
sample data format, but I don't see any perf driver version information
being included in perf.data file headers, but, existing users can detect
whether the size of the sample record has reduced by 8 bytes to
determine whether the IBS driver has this fix.

Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions")
Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoperf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
Kim Phillips [Tue, 8 Sep 2020 21:47:37 +0000 (16:47 -0500)]
perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()

commit 680d69635005ba0e58fe3f4c52fc162b8fc743b0 upstream.

get_ibs_op_count() adds hardware's current count (IbsOpCurCnt) bits
to its count regardless of hardware's valid status.

According to the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54,
if the counter rolls over, valid status is set, and the lower 7 bits
of IbsOpCurCnt are randomized by hardware.

Don't include those bits in the driver's event count.

Fixes: 8b1e13638d46 ("perf/x86-ibs: Fix usage of IBS op current count")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoperf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour
Kim Phillips [Tue, 8 Sep 2020 21:47:34 +0000 (16:47 -0500)]
perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour

commit c8fe99d0701fec9fb849ec880a86bc5592530496 upstream.

Commit 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for
F17h L3 PMCs") inadvertently changed the uncore driver's behaviour
wrt perf tool invocations with or without a CPU list, specified with
-C / --cpu=.

Change the behaviour of the driver to assume the former all-cpu (-a)
case, which is the more commonly desired default.  This fixes
'-a -A' invocations without explicit cpu lists (-C) to not count
L3 events only on behalf of the first thread of the first core
in the L3 domain.

BEFORE:

Activity performed by the first thread of the last core (CPU#43) in
CPU#40's L3 domain is not reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                 21,835      l3_request_g1.caching_l3_cache_accesses
CPU40                 87,066      l3_request_g1.caching_l3_cache_accesses
CPU44                 17,360      l3_request_g1.caching_l3_cache_accesses
...

AFTER:

The L3 domain activity is now reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                354,891      l3_request_g1.caching_l3_cache_accesses
CPU40              1,780,870      l3_request_g1.caching_l3_cache_accesses
CPU44                315,062      l3_request_g1.caching_l3_cache_accesses
...

Fixes: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-2-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoperf/x86/amd: Fix sampling Large Increment per Cycle events
Kim Phillips [Tue, 8 Sep 2020 21:47:35 +0000 (16:47 -0500)]
perf/x86/amd: Fix sampling Large Increment per Cycle events

commit 26e52558ead4b39c0e0fe7bf08f82f5a9777a412 upstream.

Commit 5738891229a2 ("perf/x86/amd: Add support for Large Increment
per Cycle Events") mistakenly zeroes the upper 16 bits of the count
in set_period().  That's fine for counting with perf stat, but not
sampling with perf record when only Large Increment events are being
sampled.  To enable sampling, we sign extend the upper 16 bits of the
merged counter pair as described in the Family 17h PPRs:

"Software wanting to preload a value to a merged counter pair writes the
high-order 16-bit value to the low-order 16 bits of the odd counter and
then writes the low-order 48-bit value to the even counter. Reading the
even counter of the merged counter pair returns the full 64-bit value."

Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoperf/x86/intel: Fix Ice Lake event constraint table
Kan Liang [Mon, 28 Sep 2020 13:47:26 +0000 (06:47 -0700)]
perf/x86/intel: Fix Ice Lake event constraint table

commit 010cb00265f150bf82b23c02ad1fb87ce5c781e1 upstream.

An error occues when sampling non-PEBS INST_RETIRED.PREC_DIST(0x01c0)
event.

  perf record -e cpu/event=0xc0,umask=0x01/ -- sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument)
  for event (cpu/event=0xc0,umask=0x01/).
  /bin/dmesg | grep -i perf may provide additional information.

The idxmsk64 of the event is set to 0. The event never be successfully
scheduled.

The event should be limit to the fixed counter 0.

Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Reported-by: Yi, Ammy <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200928134726.13090-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoselftests/x86/fsgsbase: Test PTRACE_PEEKUSER for GSBASE with invalid LDT GS
Andy Lutomirski [Wed, 26 Aug 2020 17:00:46 +0000 (10:00 -0700)]
selftests/x86/fsgsbase: Test PTRACE_PEEKUSER for GSBASE with invalid LDT GS

commit 1b9abd1755ad947d7c9913e92e7837b533124c90 upstream.

This tests commit:

  8ab49526b53d ("x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task")

Unpatched kernels will OOPS.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/c618ae86d1f757e01b1a8e79869f553cb88acf9a.1598461151.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoseccomp: Make duplicate listener detection non-racy
Jann Horn [Mon, 5 Oct 2020 01:44:01 +0000 (03:44 +0200)]
seccomp: Make duplicate listener detection non-racy

commit dfe719fef03d752f1682fa8aeddf30ba501c8555 upstream.

Currently, init_listener() tries to prevent adding a filter with
SECCOMP_FILTER_FLAG_NEW_LISTENER if one of the existing filters already
has a listener. However, this check happens without holding any lock that
would prevent another thread from concurrently installing a new filter
(potentially with a listener) on top of the ones we already have.

Theoretically, this is also a data race: The plain load from
current->seccomp.filter can race with concurrent writes to the same
location.

Fix it by moving the check into the region that holds the siglock to guard
against concurrent TSYNC.

(The "Fixes" tag points to the commit that introduced the theoretical
data race; concurrent installation of another filter with TSYNC only
became possible later, in commit 51891498f2da ("seccomp: allow TSYNC and
USER_NOTIF together").)

Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201005014401.490175-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomm: memcg/slab: uncharge during kmem_cache_free_bulk()
Bharata B Rao [Tue, 13 Oct 2020 23:53:09 +0000 (16:53 -0700)]
mm: memcg/slab: uncharge during kmem_cache_free_bulk()

commit d1b2cf6cb84a9bd0de6f151512648dd1af82f80f upstream.

Object cgroup charging is done for all the objects during allocation, but
during freeing, uncharging ends up happening for only one object in the
case of bulk allocation/freeing.

Fix this by having a separate call to uncharge all the objects from
kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take
care of bulk uncharging.

Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects"
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201009060423.390479-1-bharata@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agommc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
Raul E Rangel [Mon, 28 Sep 2020 21:59:20 +0000 (15:59 -0600)]
mmc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN

commit f23cc3ba491af77395cea3f9d51204398729f26b upstream.

This change fixes HS400 tuning for devices with invalid presets.

SDHCI presets are not currently used for eMMC HS/HS200/HS400, but are
used for DDR52. The HS400 retuning sequence is:

    HS400->DDR52->HS->HS200->Perform Tuning->HS->HS400

This means that when HS400 tuning happens, we transition through DDR52
for a very brief period. This causes presets to be enabled
unintentionally and stay enabled when transitioning back to HS200 or
HS400. Some firmware has invalid presets, so we end up with driver
strengths that can cause I/O problems.

Fixes: 34597a3f60b1 ("mmc: sdhci-acpi: Add support for ACPI HID of AMD Controller with HS400")
Signed-off-by: Raul E Rangel <rrangel@chromium.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200928154718.1.Icc21d4b2f354e83e26e57e270dc952f5fe0b0a40@changeid
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agommc: sdhci: Add LTR support for some Intel BYT based controllers
Adrian Hunter [Tue, 18 Aug 2020 10:45:08 +0000 (13:45 +0300)]
mmc: sdhci: Add LTR support for some Intel BYT based controllers

commit 46f4a69ec8ed6ab9f6a6172afe50df792c8bc1b6 upstream.

Some Intel BYT based host controllers support the setting of latency
tolerance.  Accordingly, implement the PM QoS ->set_latency_tolerance()
callback.  The raw register values are also exposed via debugfs.

Intel EHL controllers require this support.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: cb3a7d4a0aec4e ("mmc: sdhci-pci: Add support for Intel EHL")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200818104508.7149-1-adrian.hunter@intel.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomd/raid5: fix oops during stripe resizing
Song Liu [Mon, 5 Oct 2020 16:35:21 +0000 (09:35 -0700)]
md/raid5: fix oops during stripe resizing

commit b44c018cdf748b96b676ba09fdbc5b34fc443ada upstream.

KoWei reported crash during raid5 reshape:

[ 1032.252932] Oops: 0002 [#1] SMP PTI
[...]
[ 1032.252943] RIP: 0010:memcpy_erms+0x6/0x10
[...]
[ 1032.252947] RSP: 0018:ffffba1ac0c03b78 EFLAGS: 00010286
[ 1032.252949] RAX: 0000784ac0000000 RBX: ffff91bec3d09740 RCX: 0000000000001000
[ 1032.252951] RDX: 0000000000001000 RSI: ffff91be6781c000 RDI: 0000784ac0000000
[ 1032.252953] RBP: ffffba1ac0c03bd8 R08: 0000000000001000 R09: ffffba1ac0c03bf8
[ 1032.252954] R10: 0000000000000000 R11: 0000000000000000 R12: ffffba1ac0c03bf8
[ 1032.252955] R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000000000
[ 1032.252958] FS:  0000000000000000(0000) GS:ffff91becf500000(0000) knlGS:0000000000000000
[ 1032.252959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1032.252961] CR2: 0000784ac0000000 CR3: 000000031780a002 CR4: 00000000001606e0
[ 1032.252962] Call Trace:
[ 1032.252969]  ? async_memcpy+0x179/0x1000 [async_memcpy]
[ 1032.252977]  ? raid5_release_stripe+0x8e/0x110 [raid456]
[ 1032.252982]  handle_stripe_expansion+0x15a/0x1f0 [raid456]
[ 1032.252988]  handle_stripe+0x592/0x1270 [raid456]
[ 1032.252993]  handle_active_stripes.isra.0+0x3cb/0x5a0 [raid456]
[ 1032.252999]  raid5d+0x35c/0x550 [raid456]
[ 1032.253002]  ? schedule+0x42/0xb0
[ 1032.253006]  ? schedule_timeout+0x10e/0x160
[ 1032.253011]  md_thread+0x97/0x160
[ 1032.253015]  ? wait_woken+0x80/0x80
[ 1032.253019]  kthread+0x104/0x140
[ 1032.253022]  ? md_start_sync+0x60/0x60
[ 1032.253024]  ? kthread_park+0x90/0x90
[ 1032.253027]  ret_from_fork+0x35/0x40

This is because cache_size_mutex was unlocked too early in resize_stripes,
which races with grow_one_stripe() that grow_one_stripe() allocates a
stripe with wrong pool_size.

Fix this issue by unlocking cache_size_mutex after updating pool_size.

Cc: <stable@vger.kernel.org> # v4.4+
Reported-by: KoWei Sung <winders@amazon.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agomd: fix the checking of wrong work queue
Guoqing Jiang [Thu, 8 Oct 2020 03:19:09 +0000 (05:19 +0200)]
md: fix the checking of wrong work queue

commit cf0b9b4821a2955f8a23813ef8f422208ced9bd7 upstream.

It should check md_rdev_misc_wq instead of md_misc_wq.

Fixes: cc1ffe61c026 ("md: add new workqueue for delete rdev")
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoirqchip/loongson-htvec: Fix initial interrupt clearing
Huacai Chen [Fri, 11 Sep 2020 10:26:18 +0000 (18:26 +0800)]
irqchip/loongson-htvec: Fix initial interrupt clearing

commit 1d1e5630de78f7253ac24b92cee6427c3ff04d56 upstream.

In htvec_reset() only the first group of initial interrupts is cleared.
This sometimes causes spurious interrupts, so let's clear all groups.

While at it, fix the nearby comment that to match the reality of what
the driver does.

Fixes: 818e915fbac518e8c78e1877 ("irqchip: Add Loongson HyperTransport Vector support")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1599819978-13999-2-git-send-email-chenhc@lemote.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agovmlinux.lds.h: Add PGO and AutoFDO input sections
Nick Desaulniers [Fri, 21 Aug 2020 19:42:47 +0000 (12:42 -0700)]
vmlinux.lds.h: Add PGO and AutoFDO input sections

commit eff8728fe69880d3f7983bec3fb6cea4c306261f upstream.

Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too.

When compiling with profiling information (collected via PGO
instrumentations or AutoFDO sampling), Clang will separate code into
.text.hot, .text.unlikely, or .text.unknown sections based on profiling
information. After D79600 (clang-11), these sections will have a
trailing `.` suffix, ie.  .text.hot., .text.unlikely., .text.unknown..

When using -ffunction-sections together with profiling infomation,
either explicitly (FGKASLR) or implicitly (LTO), code may be placed in
sections following the convention:
.text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz>
where <foo>, <bar>, and <baz> are functions.  (This produces one section
per function; we generally try to merge these all back via linker script
so that we don't have 50k sections).

For the above cases, we need to teach our linker scripts that such
sections might exist and that we'd explicitly like them grouped
together, otherwise we can wind up with code outside of the
_stext/_etext boundaries that might not be mapped properly for some
architectures, resulting in boot failures.

If the linker script is not told about possible input sections, then
where the section is placed as output is a heuristic-laiden mess that's
non-portable between linkers (ie. BFD and LLD), and has resulted in many
hard to debug bugs.  Kees Cook is working on cleaning this up by adding
--orphan-handling=warn linker flag used in ARCH=powerpc to additional
architectures. In the case of linker scripts, borrowing from the Zen of
Python: explicit is better than implicit.

Also, ld.bfd's internal linker script considers .text.hot AND
.text.hot.* to be part of .text, as well as .text.unlikely and
.text.unlikely.*. I didn't see support for .text.unknown.*, and didn't
see Clang producing such code in our kernel builds, but I see code in
LLVM that can produce such section names if profiling information is
missing. That may point to a larger issue with generating or collecting
profiles, but I would much rather be safe and explicit than have to
debug yet another issue related to orphan section placement.

Reported-by: Jian Cai <jiancai@google.com>
Suggested-by: Fāng-ruì Sòng <maskray@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Luis Lozano <llozano@google.com>
Tested-by: Manoj Gupta <manojgupta@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655
Link: https://reviews.llvm.org/D79600
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760
Link: https://lore.kernel.org/r/20200821194310.3089815-7-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Debugged-by: Luis Lozano <llozano@google.com>
4 years agonvme-rdma: fix crash when connect rejected
Chao Leng [Mon, 12 Oct 2020 08:10:40 +0000 (16:10 +0800)]
nvme-rdma: fix crash when connect rejected

[ Upstream commit 43efdb8e870ee0f58633fd579aa5b5185bf5d39e ]

A crash can happened when a connect is rejected.   The host establishes
the connection after received ConnectReply, and then continues to send
the fabrics Connect command.  If the controller does not receive the
ReadyToUse capsule, host may receive a ConnectReject reply.

Call nvme_rdma_destroy_queue_ib after the host received the
RDMA_CM_EVENT_REJECTED event.  Then when the fabrics Connect command
times out, nvme_rdma_timeout calls nvme_rdma_complete_rq to fail the
request.  A crash happenes due to use after free in
nvme_rdma_complete_rq.

nvme_rdma_destroy_queue_ib is redundant when handling the
RDMA_CM_EVENT_REJECTED event as nvme_rdma_destroy_queue_ib is already
called in connection failure handler.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agosgl_alloc_order: fix memory leak
Douglas Gilbert [Thu, 15 Oct 2020 18:57:35 +0000 (14:57 -0400)]
sgl_alloc_order: fix memory leak

[ Upstream commit b2a182a40278bc5849730e66bca01a762188ed86 ]

sgl_alloc_order() can fail when 'length' is large on a memory
constrained system. When order > 0 it will potentially be
making several multi-page allocations with the later ones more
likely to fail than the earlier one. So it is important that
sgl_alloc_order() frees up any pages it has obtained before
returning NULL. In the case when order > 0 it calls the wrong
free page function and leaks. In testing the leak was
sufficient to bring down my 8 GiB laptop with OOM.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agonbd: make the config put is called before the notifying the waiter
Xiubo Li [Wed, 14 Oct 2020 02:45:14 +0000 (22:45 -0400)]
nbd: make the config put is called before the notifying the waiter

[ Upstream commit 87aac3a80af5cbad93e63250e8a1e19095ba0d30 ]

There has one race case for ceph's rbd-nbd tool. When do mapping
it may fail with EBUSY from ioctl(nbd, NBD_DO_IT), but actually
the nbd device has already unmaped.

It dues to if just after the wake_up(), the recv_work() is scheduled
out and defers calling the nbd_config_put(), though the map process
has exited the "nbd->recv_task" is not cleared.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoarm64: dts: qcom: kitakami: Temporarily disable SDHCI1
Konrad Dybcio [Fri, 14 Aug 2020 15:47:49 +0000 (17:47 +0200)]
arm64: dts: qcom: kitakami: Temporarily disable SDHCI1

[ Upstream commit e884fb6cc89dce1debeae33704edd7735a3d6d9c ]

There is an issue with Kitakami eMMCs dying when a quirk
isn't addressed. Until that happens, disable it.

Signed-off-by: Konrad Dybcio <konradybcio@gmail.com>
Link: https://lore.kernel.org/r/20200814154749.257837-1-konradybcio@gmail.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agofirmware: arm_scmi: Move scmi bus init and exit calls into the driver
Sudeep Holla [Mon, 7 Sep 2020 11:00:04 +0000 (12:00 +0100)]
firmware: arm_scmi: Move scmi bus init and exit calls into the driver

[ Upstream commit 5a2f0a0bdf201e2183904b6217f9c74774c961a8 ]

In preparation to enable building scmi as a single module, let us move
the scmi bus {de-,}initialisation call into the driver.

The main reason for this is to keep it simple instead of maintaining
it as separate modules and dealing with all possible initcall races
and deferred probe handling. We can move it as separate modules if
needed in future.

Link: https://lore.kernel.org/r/20200907195046.56615-3-sudeep.holla@arm.com
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobindings: soc: ti: soc: ringacc: remove ti,dma-ring-reset-quirk
Grygorii Strashko [Sat, 12 Sep 2020 04:30:23 +0000 (21:30 -0700)]
bindings: soc: ti: soc: ringacc: remove ti,dma-ring-reset-quirk

[ Upstream commit aee123f48f387ea62002cddb46c7cb04c96628df ]

Remove "ti,dma-ring-reset-quirk" DT property as proper w/a handling is
implemented now in Ringacc driver using SoC info.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agosoc: ti: k3: ringacc: add am65x sr2.0 support
Grygorii Strashko [Sat, 12 Sep 2020 04:29:56 +0000 (21:29 -0700)]
soc: ti: k3: ringacc: add am65x sr2.0 support

[ Upstream commit 95e7be062aea6d2e09116cd4d28957d310c04781 ]

The AM65x SR2.0 Ringacc has fixed errata i2023 "RINGACC, UDMA: RINGACC and
UDMA Ring State Interoperability Issue after Channel Teardown". This errata
also fixed for J271E SoC.

Use SOC bus data for K3 SoC identification and enable i2023 errate w/a only
for the AM65x SR1.0. This also makes obsolete "ti,dma-ring-reset-quirk" DT
property.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agosoc: qcom: rpmh-rsc: Sleep waiting for tcs slots to be free
Stephen Boyd [Fri, 24 Jul 2020 21:17:11 +0000 (14:17 -0700)]
soc: qcom: rpmh-rsc: Sleep waiting for tcs slots to be free

[ Upstream commit 2bc20f3c8487bd5bc4dd9ad2c06d2ba05fd4e838 ]

The busy loop in rpmh_rsc_send_data() is written with the assumption
that the udelay will be preempted by the tcs_tx_done() irq handler when
the TCS slots are all full. This doesn't hold true when the calling
thread is an irqthread and the tcs_tx_done() irq is also an irqthread.
That's because kernel irqthreads are SCHED_FIFO and thus need to
voluntarily give up priority by calling into the scheduler so that other
threads can run.

I see RCU stalls when I boot with irqthreads on the kernel commandline
because the modem remoteproc driver is trying to send an rpmh async
message from an irqthread that needs to give up the CPU for the rpmh
irqthread to run and clear out tcs slots.

 rcu: INFO: rcu_preempt self-detected stall on CPU
 rcu:     0-....: (1 GPs behind) idle=402/1/0x4000000000000002 softirq=2108/2109 fqs=4920
  (t=21016 jiffies g=2933 q=590)
 Task dump for CPU 0:
 irq/11-smp2p    R  running task        0   148      2 0x00000028
 Call trace:
  dump_backtrace+0x0/0x154
  show_stack+0x20/0x2c
  sched_show_task+0xfc/0x108
  dump_cpu_task+0x44/0x50
  rcu_dump_cpu_stacks+0xa4/0xf8
  rcu_sched_clock_irq+0x7dc/0xaa8
  update_process_times+0x30/0x54
  tick_sched_handle+0x50/0x64
  tick_sched_timer+0x4c/0x8c
  __hrtimer_run_queues+0x21c/0x36c
  hrtimer_interrupt+0xf0/0x22c
  arch_timer_handler_phys+0x40/0x50
  handle_percpu_devid_irq+0x114/0x25c
  __handle_domain_irq+0x84/0xc4
  gic_handle_irq+0xd0/0x178
  el1_irq+0xbc/0x180
  save_return_addr+0x18/0x28
  return_address+0x54/0x88
  preempt_count_sub+0x40/0x88
  _raw_spin_unlock_irqrestore+0x4c/0x6c
  ___ratelimit+0xd0/0x128
  rpmh_rsc_send_data+0x24c/0x378
  __rpmh_write+0x1b0/0x208
  rpmh_write_async+0x90/0xbc
  rpmhpd_send_corner+0x60/0x8c
  rpmhpd_aggregate_corner+0x8c/0x124
  rpmhpd_set_performance_state+0x8c/0xbc
  _genpd_set_performance_state+0xdc/0x1b8
  dev_pm_genpd_set_performance_state+0xb8/0xf8
  q6v5_pds_disable+0x34/0x60 [qcom_q6v5_mss]
  qcom_msa_handover+0x38/0x44 [qcom_q6v5_mss]
  q6v5_handover_interrupt+0x24/0x3c [qcom_q6v5]
  handle_nested_irq+0xd0/0x138
  qcom_smp2p_intr+0x188/0x200
  irq_thread_fn+0x2c/0x70
  irq_thread+0xfc/0x14c
  kthread+0x11c/0x12c
  ret_from_fork+0x10/0x18

This busy loop naturally lends itself to using a wait queue so that each
thread that tries to send a message will sleep waiting on the waitqueue
and only be woken up when a free slot is available. This should make
things more predictable too because the scheduler will be able to sleep
tasks that are waiting on a free tcs instead of the busy loop we
currently have today.

Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Maulik Shah <mkshah@codeaurora.org>
Cc: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200724211711.810009-1-sboyd@kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: align SPI GPIO node name with dtschema in Aries
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:40 +0000 (18:11 +0200)]
ARM: dts: s5pv210: align SPI GPIO node name with dtschema in Aries

[ Upstream commit 1ed7f6d0bab2f1794f1eb4ed032e90575552fd21 ]

The device tree schema expects SPI controller to be named "spi",
otherwise dtbs_check complain with a warning like:

  spi-gpio-0: $nodename:0: 'spi-gpio-0' does not match '^spi(@.*|-[0-9a-f])*$'

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-25-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: add RTC 32 KHz clock in Aries family
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:27 +0000 (18:11 +0200)]
ARM: dts: s5pv210: add RTC 32 KHz clock in Aries family

[ Upstream commit 086c4498b0cc87fdb09188f3da7056e898814948 ]

The S3C RTC requires 32768 Hz clock as input which is provided by PMIC.
However there is no such clock provider but rather a regulator driver
which registers the clock as a regulator.  This is an old driver which
will not be updated so add a workaround - a fixed-clock to fill missing
clock phandle reference in S3C RTC.

This fixes dtbs_check warnings:

  rtc@e2800000: clocks: [[2, 145]] is too short
  rtc@e2800000: clock-names: ['rtc'] is too short

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-12-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: remove dedicated 'audio-subsystem' node
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:24 +0000 (18:11 +0200)]
ARM: dts: s5pv210: remove dedicated 'audio-subsystem' node

[ Upstream commit 6c17a2974abf68a58517f75741b15c4aba42b4b8 ]

The 'audio-subsystem' node is an artificial creation, not representing
real hardware.  The hardware is described by its nodes - AUDSS clock
controller and I2S0.

Remove the 'audio-subsystem' node along with its undocumented compatible
to fix dtbs_check warnings like:

  audio-subsystem: $nodename:0: 'audio-subsystem' does not match '^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-9-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: move PMU node out of clock controller
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:23 +0000 (18:11 +0200)]
ARM: dts: s5pv210: move PMU node out of clock controller

[ Upstream commit bb98fff84ad1ea321823759edaba573a16fa02bd ]

The Power Management Unit (PMU) is a separate device which has little
common with clock controller.  Moving it to one level up (from clock
controller child to SoC) allows to remove fake simple-bus compatible and
dtbs_check warnings like:

  clock-controller@e0100000: $nodename:0:
    'clock-controller@e0100000' does not match '^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-8-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: move fixed clocks under root node
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:22 +0000 (18:11 +0200)]
ARM: dts: s5pv210: move fixed clocks under root node

[ Upstream commit d38cae370e5f2094cbc38db3082b8e9509ae52ce ]

The fixed clocks are kept under dedicated 'external-clocks' node, thus a
fake 'reg' was added.  This is not correct with dtschema as fixed-clock
binding does not have a 'reg' property.  Moving fixed clocks out of
'soc' to root node fixes multiple dtbs_check warnings:

  external-clocks: $nodename:0: 'external-clocks' does not match '^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'
  external-clocks: #size-cells:0:0: 0 is not one of [1, 2]
  external-clocks: oscillator@0:reg:0: [0] is too short
  external-clocks: oscillator@1:reg:0: [1] is too short
  external-clocks: 'ranges' is a required property
  oscillator@0: 'reg' does not match any of the regexes: 'pinctrl-[0-9]+'

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-7-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings
Krzysztof Kozlowski [Mon, 7 Sep 2020 16:11:21 +0000 (18:11 +0200)]
ARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings

[ Upstream commit ea4e792f3c8931fffec4d700cf6197d84e9f35a6 ]

There is no need to keep DMA controller nodes under AMBA bus node.
Remove the "amba" node to fix dtschema warnings like:

  amba: $nodename:0: 'amba' does not match '^([a-z][a-z0-9\\-]+-bus|bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Jonathan Bakker <xc-racer2@live.ca>
Link: https://lore.kernel.org/r/20200907161141.31034-6-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: s5pv210: Enable audio on Aries boards
Jonathan Bakker [Wed, 2 Sep 2020 00:38:58 +0000 (17:38 -0700)]
ARM: dts: s5pv210: Enable audio on Aries boards

[ Upstream commit cd972fe90008adf49de0790250c1275480ac5cdc ]

Both the Galaxy S and the Fascinate4G have a WM8994 codec, but they
differ slightly in their jack detection and micbias configuration.

Signed-off-by: Jonathan Bakker <xc-racer2@live.ca>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomemory: emif: Remove bogus debugfs error handling
Dan Carpenter [Wed, 26 Aug 2020 11:37:59 +0000 (14:37 +0300)]
memory: emif: Remove bogus debugfs error handling

[ Upstream commit fd22781648080cc400772b3c68aa6b059d2d5420 ]

Callers are generally not supposed to check the return values from
debugfs functions.  Debugfs functions never return NULL so this error
handling will never trigger.  (Historically debugfs functions used to
return a mix of NULL and error pointers but it was eventually deemed too
complicated for something which wasn't intended to be used in normal
situations).

Delete all the error handling.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Link: https://lore.kernel.org/r/20200826113759.GF393664@mwanda
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARM: dts: omap4: Fix sgx clock rate for 4430
Tony Lindgren [Tue, 10 Mar 2020 21:02:48 +0000 (14:02 -0700)]
ARM: dts: omap4: Fix sgx clock rate for 4430

[ Upstream commit 19d3e9a0bdd57b90175f30390edeb06851f5f9f3 ]

We currently have a different clock rate for droid4 compared to the
stock v3.0.8 based Android Linux kernel:

# cat /sys/kernel/debug/clk/dpll_*_m7x2_ck/clk_rate
266666667
307200000
# cat /sys/kernel/debug/clk/l3_gfx_cm:clk:0000:0/clk_rate
307200000

Let's fix this by configuring sgx to use 153.6 MHz instead of 307.2 MHz.
Looks like also at least duover needs this change to avoid hangs, so
let's apply it for all 4430.

This helps a bit with thermal issues that seem to be related to memory
corruption when using sgx. It seems that other driver related issues
still remain though.

Cc: Arthur Demchenkov <spinal.by@gmail.com>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoarm64: dts: renesas: ulcb: add full-pwr-cycle-in-suspend into eMMC nodes
Yoshihiro Shimoda [Fri, 17 Jul 2020 12:33:21 +0000 (21:33 +0900)]
arm64: dts: renesas: ulcb: add full-pwr-cycle-in-suspend into eMMC nodes

[ Upstream commit 992d7a8b88c83c05664b649fc54501ce58e19132 ]

Add full-pwr-cycle-in-suspend property to do a graceful shutdown of
the eMMC device in system suspend.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/1594989201-24228-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agocifs: handle -EINTR in cifs_setattr
Ronnie Sahlberg [Thu, 8 Oct 2020 23:32:56 +0000 (09:32 +1000)]
cifs: handle -EINTR in cifs_setattr

[ Upstream commit c6cc4c5a72505a0ecefc9b413f16bec512f38078 ]

RHBZ: 1848178

Some calls that set attributes, like utimensat(), are not supposed to return
-EINTR and thus do not have handlers for this in glibc which causes us
to leak -EINTR to the applications which are also unprepared to handle it.

For example tar will break if utimensat() return -EINTR and abort unpacking
the archive. Other applications may break too.

To handle this we add checks, and retry, for -EINTR in cifs_setattr()

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoHandle STATUS_IO_TIMEOUT gracefully
Rohith Surabattula [Fri, 18 Sep 2020 05:37:28 +0000 (05:37 +0000)]
Handle STATUS_IO_TIMEOUT gracefully

[ Upstream commit 8e670f77c4a55013db6d23b962f9bf6673a5e7b6 ]

Currently STATUS_IO_TIMEOUT is not treated as retriable error.
It is currently mapped to ETIMEDOUT and returned to userspace
for most system calls. STATUS_IO_TIMEOUT is returned by server
in case of unavailability or throttling errors.

This patch will map the STATUS_IO_TIMEOUT to EAGAIN, so that it
can be retried. Also, added a check to drop the connection to
not overload the server in case of ongoing unavailability.

Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agogfs2: add validation checks for size of superblock
Anant Thazhemadam [Wed, 14 Oct 2020 16:31:09 +0000 (22:01 +0530)]
gfs2: add validation checks for size of superblock

[ Upstream commit 0ddc5154b24c96f20e94d653b0a814438de6032b ]

In gfs2_check_sb(), no validation checks are performed with regards to
the size of the superblock.
syzkaller detected a slab-out-of-bounds bug that was primarily caused
because the block size for a superblock was set to zero.
A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
Performing validation checks and ensuring that the size of the superblock
is valid fixes this bug.

Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Suggested-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
[Minor code reordering.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agogfs2: use-after-free in sysfs deregistration
Jamie Iles [Mon, 12 Oct 2020 13:13:09 +0000 (14:13 +0100)]
gfs2: use-after-free in sysfs deregistration

[ Upstream commit c2a04b02c060c4858762edce4674d5cba3e5a96f ]

syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:

  Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228

  CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events kobject_delayed_cleanup
  Call trace:
   dump_backtrace+0x0/0x4d8
   show_stack+0x34/0x48
   dump_stack+0x174/0x1f8
   print_address_description.constprop.0+0x5c/0x550
   kasan_report+0x13c/0x1c0
   __asan_report_load1_noabort+0x34/0x60
   memcmp+0xd0/0xd8
   gfs2_uevent+0xc4/0x188
   kobject_uevent_env+0x54c/0x1240
   kobject_uevent+0x2c/0x40
   __kobject_del+0x190/0x1d8
   kobject_delayed_cleanup+0x2bc/0x3b8
   process_one_work+0x96c/0x18c0
   worker_thread+0x3f0/0xc30
   kthread+0x390/0x498
   ret_from_fork+0x10/0x18

  Allocated by task 1110:
   kasan_save_stack+0x28/0x58
   __kasan_kmalloc.isra.0+0xc8/0xe8
   kasan_kmalloc+0x10/0x20
   kmem_cache_alloc_trace+0x1d8/0x2f0
   alloc_super+0x64/0x8c0
   sget_fc+0x110/0x620
   get_tree_bdev+0x190/0x648
   gfs2_get_tree+0x50/0x228
   vfs_get_tree+0x84/0x2e8
   path_mount+0x1134/0x1da8
   do_mount+0x124/0x138
   __arm64_sys_mount+0x164/0x238
   el0_svc_common.constprop.0+0x15c/0x598
   do_el0_svc+0x60/0x150
   el0_svc+0x34/0xb0
   el0_sync_handler+0xc8/0x5b4
   el0_sync+0x15c/0x180

  Freed by task 228:
   kasan_save_stack+0x28/0x58
   kasan_set_track+0x28/0x40
   kasan_set_free_info+0x24/0x48
   __kasan_slab_free+0x118/0x190
   kasan_slab_free+0x14/0x20
   slab_free_freelist_hook+0x6c/0x210
   kfree+0x13c/0x460

Use the same pattern as f2fs + ext4 where the kobject destruction must
complete before allowing the FS itself to be freed.  This means that we
need an explicit free_sbd in the callers.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[Also go to fail_free when init_names fails.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agogfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
Andrew Price [Wed, 7 Oct 2020 11:30:58 +0000 (12:30 +0100)]
gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump

[ Upstream commit 0e539ca1bbbe85a86549c97a30a765ada4a09df9 ]

When an rindex entry is found to be corrupt, compute_bitstructs() calls
gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:

    gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);

gfs2_rgrp_dump then dereferences the gl without checking it and we get

    BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280

because there's no rgrp glock involved while reading the rindex on mount.

Fix this by changing gfs2_rgrp_dump to take an rgrp argument.

Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agogfs2: call truncate_inode_pages_final for address space glocks
Bob Peterson [Wed, 16 Sep 2020 16:06:23 +0000 (11:06 -0500)]
gfs2: call truncate_inode_pages_final for address space glocks

[ Upstream commit ee1e2c773e4f4ce2213f9d77cc703b669ca6fa3f ]

Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoscsi: core: Clean up allocation and freeing of sgtables
Christoph Hellwig [Mon, 5 Oct 2020 08:41:28 +0000 (10:41 +0200)]
scsi: core: Clean up allocation and freeing of sgtables

[ Upstream commit 7007e9dd56767a95de0947b3f7599bcc2f21687f ]

Rename scsi_init_io() to scsi_alloc_sgtables(), and ensure callers call
scsi_free_sgtables() to cleanup failures close to scsi_init_io() instead of
leaking it down the generic I/O submission path.

Link: https://lore.kernel.org/r/20201005084130.143273-9-hch@lst.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoKVM: PPC: Book3S HV: Do not allocate HPT for a nested guest
Fabiano Rosas [Fri, 11 Sep 2020 04:16:07 +0000 (01:16 -0300)]
KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest

[ Upstream commit 05e6295dc7de859c9d56334805485c4d20bebf25 ]

The current nested KVM code does not support HPT guests. This is
informed/enforced in some ways:

- Hosts < P9 will not be able to enable the nested HV feature;

- The nested hypervisor MMU capabilities will not contain
  KVM_CAP_PPC_MMU_HASH_V3;

- QEMU reflects the MMU capabilities in the
  'ibm,arch-vec-5-platform-support' device-tree property;

- The nested guest, at 'prom_parse_mmu_model' ignores the
  'disable_radix' kernel command line option if HPT is not supported;

- The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.

There is, however, still a way to start a HPT guest by using
max-compat-cpu=power8 at the QEMU machine options. This leads to the
guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
ioctl.

With the guest set to hash, the nested hypervisor goes through the
entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
crashes when it tries to execute an hypervisor-privileged (mtspr
HDEC) instruction at __kvmppc_vcore_entry:

root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...

<snip>
[  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
[  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
[  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
[  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
[  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
               GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
               GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
               GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
               GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
               GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
               GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
               GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
               GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
[  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
[  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
[  538.544173] Call Trace:
[  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
[  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
[  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
[  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
[  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
[  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
[  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
[  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
[  538.544787] Instruction dump:
[  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
[  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba67d083a14 f90d10a0 480104fd
[  538.544953] ---[ end trace 74423e2b948c2e0c ]---

This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
the nested hypervisor, causing QEMU to abort.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoext4: Detect already used quota file early
Jan Kara [Thu, 15 Oct 2020 11:03:30 +0000 (13:03 +0200)]
ext4: Detect already used quota file early

[ Upstream commit e0770e91424f694b461141cbc99adf6b23006b60 ]

When we try to use file already used as a quota file again (for the same
or different quota type), strange things can happen. At the very least
lockdep annotations may be wrong but also inode flags may be wrongly set
/ reset. When the file is used for two quota types at once we can even
corrupt the file and likely crash the kernel. Catch all these cases by
checking whether passed file is already used as quota file and bail
early in that case.

This fixes occasional generic/219 failure due to lockdep complaint.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201015110330.28716-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agojbd2: avoid transaction reuse after reformatting
changfengnan [Mon, 12 Oct 2020 16:49:00 +0000 (18:49 +0200)]
jbd2: avoid transaction reuse after reformatting

[ Upstream commit fc750a3b44bdccb9fb96d6abbc48a9b8e480ce7b ]

When ext4 is formatted with lazy_journal_init=1 and transactions from
the previous filesystem are still on disk, it is possible that they are
considered during a recovery after a crash. Because the checksum seed
has changed, the CRC check will fail, and the journal recovery fails
with checksum error although the journal is otherwise perfectly valid.
Fix the problem by checking commit block time stamps to determine
whether the data in the journal block is just stale or whether it is
indeed corrupt.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Fengnan Chang <changfengnan@hikvision.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agodrivers: watchdog: rdc321x_wdt: Fix race condition bugs
Madhuparna Bhowmik [Fri, 7 Aug 2020 11:29:02 +0000 (16:59 +0530)]
drivers: watchdog: rdc321x_wdt: Fix race condition bugs

[ Upstream commit 4b2e7f99cdd314263c9d172bc17193b8b6bba463 ]

In rdc321x_wdt_probe(), rdc321x_wdt_device.queue is initialized
after misc_register(), hence if ioctl is called before its
initialization which can call rdc321x_wdt_start() function,
it will see an uninitialized value of rdc321x_wdt_device.queue,
hence initialize it before misc_register().
Also, rdc321x_wdt_device.default_ticks is accessed in reset()
function called from write callback, thus initialize it before
misc_register().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20200807112902.28764-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoceph: encode inodes' parent/d_name in cap reconnect message
Yan, Zheng [Tue, 11 Aug 2020 07:23:03 +0000 (15:23 +0800)]
ceph: encode inodes' parent/d_name in cap reconnect message

[ Upstream commit a33f6432b3a63a4909dbbb0967f7c9df8ff2de91 ]

Since nautilus, MDS tracks dirfrags whose child inodes have caps in open
file table. When MDS recovers, it prefetches all of these dirfrags. This
avoids using backtrace to load inodes. But dirfrags prefetch may load
lots of useless inodes into cache, and make MDS run out of memory.

Recent MDS adds an option that disables dirfrags prefetch. When dirfrags
prefetch is disabled. Recovering MDS only prefetches corresponding dir
inodes. Including inodes' parent/d_name in cap reconnect message can
help MDS to load inodes into its cache.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agonet: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
Anant Thazhemadam [Mon, 12 Oct 2020 04:24:04 +0000 (09:54 +0530)]
net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid

[ Upstream commit 7ca1db21ef8e0e6725b4d25deed1ca196f7efb28 ]

In p9_fd_create_unix, checking is performed to see if the addr (passed
as an argument) is NULL or not.
However, no check is performed to see if addr is a valid address, i.e.,
it doesn't entirely consist of only 0's.
The initialization of sun_server.sun_path to be equal to this faulty
addr value leads to an uninitialized variable, as detected by KMSAN.
Checking for this (faulty addr) and returning a negative error number
appropriately, resolves this issue.

Link: http://lkml.kernel.org/r/20201012042404.2508-1-anant.thazhemadam@gmail.com
Reported-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com
Tested-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agonfsd4: remove check_conflicting_opens warning
J. Bruce Fields [Fri, 25 Sep 2020 14:09:58 +0000 (10:09 -0400)]
nfsd4: remove check_conflicting_opens warning

[ Upstream commit 50747dd5e47bde3b7d7f839c84d0d3b554090497 ]

There are actually rare races where this is possible (e.g. if a new open
intervenes between the read of i_writecount and the fi_fds).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agonfsd: rename delegation related tracepoints to make them less confusing
Hou Tao [Fri, 28 Aug 2020 07:02:55 +0000 (15:02 +0800)]
nfsd: rename delegation related tracepoints to make them less confusing

[ Upstream commit 3caf91757ced158e6c4a44d8b105bd7b3e1767d8 ]

Now when a read delegation is given, two delegation related traces
will be printed:

    nfsd_deleg_open: client 5f45b854:e6058001 stateid 00000030:00000001
    nfsd_deleg_none: client 5f45b854:e6058001 stateid 0000002f:00000001

Although the intention is to let developers know two stateid are
returned, the traces are confusing about whether or not a read delegation
is handled out. So renaming trace_nfsd_deleg_none() to trace_nfsd_open()
and trace_nfsd_deleg_open() to trace_nfsd_deleg_read() to make
the intension clearer.

The patched traces will be:

    nfsd_deleg_read: client 5f48a967:b55b21cd stateid 00000003:00000001
    nfsd_open: client 5f48a967:b55b21cd stateid 00000002:00000001

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoclk: ti: clockdomain: fix static checker warning
Tero Kristo [Mon, 7 Sep 2020 08:25:59 +0000 (11:25 +0300)]
clk: ti: clockdomain: fix static checker warning

[ Upstream commit b7a7943fe291b983b104bcbd2f16e8e896f56590 ]

Fix a memory leak induced by not calling clk_put after doing of_clk_get.

Reported-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Link: https://lore.kernel.org/r/20200907082600.454-3-t-kristo@ti.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI/ACPI: Add Ampere Altra SOC MCFG quirk
Tuan Phan [Thu, 6 Aug 2020 21:57:34 +0000 (14:57 -0700)]
PCI/ACPI: Add Ampere Altra SOC MCFG quirk

[ Upstream commit 877c1a5f79c6984bbe3f2924234c08e2f4f1acd5 ]

Ampere Altra SOC supports only 32-bit ECAM reads.  Add an MCFG quirk for
the platform.

Link: https://lore.kernel.org/r/1596751055-12316-1-git-send-email-tuanphan@os.amperecomputing.com
Signed-off-by: Tuan Phan <tuanphan@os.amperecomputing.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agorpmsg: glink: Use complete_all for open states
Chris Lew [Wed, 24 Jun 2020 16:45:18 +0000 (22:15 +0530)]
rpmsg: glink: Use complete_all for open states

[ Upstream commit 4fcdaf6e28d11e2f3820d54dd23cd12a47ddd44e ]

The open_req and open_ack completion variables are the state variables
to represet a remote channel as open. Use complete_all so there are no
races with waiters and using completion_done.

Signed-off-by: Chris Lew <clew@codeaurora.org>
Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org>
Link: https://lore.kernel.org/r/1593017121-7953-2-git-send-email-deesin@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobnxt_en: Log unknown link speed appropriately.
Michael Chan [Mon, 12 Oct 2020 09:10:51 +0000 (05:10 -0400)]
bnxt_en: Log unknown link speed appropriately.

[ Upstream commit 8eddb3e7ce124dd6375d3664f1aae13873318b0f ]

If the VF virtual link is set to always enabled, the speed may be
unknown when the physical link is down.  The driver currently logs
the link speed as 4294967295 Mbps which is SPEED_UNKNOWN.  Modify
the link up log message as "speed unknown" which makes more sense.

Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-7-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agof2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
Chao Yu [Fri, 9 Oct 2020 02:40:48 +0000 (10:40 +0800)]
f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode

[ Upstream commit d662fad143c0470ad7f46ea7b02da539f613d7d7 ]

If compressed inode has inconsistent fields on i_compress_algorithm,
i_compr_blocks and i_log_cluster_size, we missed to set SBI_NEED_FSCK
to notice fsck to repair the inode, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomd/bitmap: md_bitmap_get_counter returns wrong blocks
Zhao Heming [Mon, 5 Oct 2020 16:00:24 +0000 (00:00 +0800)]
md/bitmap: md_bitmap_get_counter returns wrong blocks

[ Upstream commit d837f7277f56e70d82b3a4a037d744854e62f387 ]

md_bitmap_get_counter() has code:

```
    if (bitmap->bp[page].hijacked ||
        bitmap->bp[page].map == NULL)
        csize = ((sector_t)1) << (bitmap->chunkshift +
                      PAGE_COUNTER_SHIFT - 1);
```

The minus 1 is wrong, this branch should report 2048 bits of space.
With "-1" action, this only report 1024 bit of space.

This bug code returns wrong blocks, but it doesn't inflence bitmap logic:
1. Most callers focus this function return value (the counter of offset),
   not the parameter blocks.
2. The bug is only triggered when hijacked is true or map is NULL.
   the hijacked true condition is very rare.
   the "map == null" only true when array is creating or resizing.
3. Even the caller gets wrong blocks, current code makes caller just to
   call md_bitmap_get_counter() one more time.

Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobtrfs: fix replace of seed device
Anand Jain [Fri, 4 Sep 2020 17:34:22 +0000 (01:34 +0800)]
btrfs: fix replace of seed device

[ Upstream commit c6a5d954950c5031444173ad2195efc163afcac9 ]

If you replace a seed device in a sprouted fs, it appears to have
successfully replaced the seed device, but if you look closely, it
didn't.  Here is an example.

  $ mkfs.btrfs /dev/sda
  $ btrfstune -S1 /dev/sda
  $ mount /dev/sda /btrfs
  $ btrfs device add /dev/sdb /btrfs
  $ umount /btrfs
  $ btrfs device scan --forget
  $ mount -o device=/dev/sda /dev/sdb /btrfs
  $ btrfs replace start -f /dev/sda /dev/sdc /btrfs
  $ echo $?
  0

  BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc started
  BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc finished

  $ btrfs fi show
  Label: none  uuid: ab2c88b7-be81-4a7e-9849-c3666e7f9f4f
  Total devices 2 FS bytes used 256.00KiB
  devid    1 size 3.00GiB used 520.00MiB path /dev/sdc
  devid    2 size 3.00GiB used 896.00MiB path /dev/sdb

  Label: none  uuid: 10bd3202-0415-43af-96a8-d5409f310a7e
  Total devices 1 FS bytes used 128.00KiB
  devid    1 size 3.00GiB used 536.00MiB path /dev/sda

So as per the replace start command and kernel log replace was successful.
Now let's try to clean mount.

  $ umount /btrfs
  $ btrfs device scan --forget

  $ mount -o device=/dev/sdc /dev/sdb /btrfs
  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error.

  [  636.157517] BTRFS error (device sdc): failed to read chunk tree: -2
  [  636.180177] BTRFS error (device sdc): open_ctree failed

That's because per dev items it is still looking for the original seed
device.

 $ btrfs inspect-internal dump-tree -d /dev/sdb

item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
devid 1 total_bytes 3221225472 bytes_used 545259520
io_align 4096 io_width 4096 sector_size 4096 type 0
generation 6 start_offset 0 dev_group 0
seek_speed 0 bandwidth 0
uuid 59368f50-9af2-4b17-91da-8a783cc418d4  <--- seed uuid
fsid 10bd3202-0415-43af-96a8-d5409f310a7e  <--- seed fsid
item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
devid 2 total_bytes 3221225472 bytes_used 939524096
io_align 4096 io_width 4096 sector_size 4096 type 0
generation 0 start_offset 0 dev_group 0
seek_speed 0 bandwidth 0
uuid 56a0a6bc-4630-4998-8daf-3c3030c4256a  <- sprout uuid
fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f <- sprout fsid

But the replaced target has the following uuid+fsid in its superblock
which doesn't match with the expected uuid+fsid in its devitem.

  $ btrfs in dump-super /dev/sdc | egrep '^generation|dev_item.uuid|dev_item.fsid|devid'
  generation 20
  dev_item.uuid 59368f50-9af2-4b17-91da-8a783cc418d4
  dev_item.fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f [match]
  dev_item.devid 1

So if you provide the original seed device the mount shall be
successful.  Which so long happening in the test case btrfs/163.

  $ btrfs device scan --forget
  $ mount -o device=/dev/sda /dev/sdb /btrfs

Fix in this patch:
If a seed is not sprouted then there is no replacement of it, because of
its read-only filesystem with a read-only device. Similarly, in the case
of a sprouted filesystem, the seed device is still read only. So, mark
it as you can't replace a seed device, you can only add a new device and
then delete the seed device. If replace is attempted then returns
-EINVAL.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoblock: Consider only dispatched requests for inflight statistic
Gabriel Krisman Bertazi [Tue, 6 Oct 2020 19:41:25 +0000 (15:41 -0400)]
block: Consider only dispatched requests for inflight statistic

[ Upstream commit a926c7afffcc0f2e35e6acbccb16921bacf34617 ]

According to Documentation/block/stat.rst, inflight should not include
I/O requests that are in the queue but not yet dispatched to the device,
but blk-mq identifies as inflight any request that has a tag allocated,
which, for queues without elevator, happens at request allocation time
and before it is queued in the ctx (default case in blk_mq_submit_bio).

In addition, current behavior is different for queues with elevator from
queues without it, since for the former the driver tag is allocated at
dispatch time.  A more precise approach would be to only consider
requests with state MQ_RQ_IN_FLIGHT.

This effectively reverts commit 6131837b1de6 ("blk-mq: count allocated
but not started requests in iostats inflight") to consolidate blk-mq
behavior with itself (elevator case) and with original documentation,
but it differs from the behavior used by the legacy path.

This version differs from v1 by using blk_mq_rq_state to access the
state attribute.  Avoid using blk_mq_request_started, which was
suggested, since we don't want to include MQ_RQ_COMPLETE.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoARC: [dts] fix the errors detected by dtbs_check
Zhen Lei [Thu, 24 Sep 2020 07:17:54 +0000 (15:17 +0800)]
ARC: [dts] fix the errors detected by dtbs_check

[ Upstream commit 05b1be68c4d6d76970025e6139bfd735c2256ee5 ]

xxx/arc/boot/dts/axs101.dt.yaml: dw-apb-ictl@e0012000: $nodename:0: \
'dw-apb-ictl@e0012000' does not match '^interrupt-controller(@[0-9a-f,]+)*$'
 From schema: xxx/interrupt-controller/snps,dw-apb-ictl.yaml

The node name of the interrupt controller must start with
"interrupt-controller" instead of "dw-apb-ictl".

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agodrm/amd/display: Avoid set zero in the requested clk
Rodrigo Siqueira [Wed, 30 Sep 2020 17:57:54 +0000 (13:57 -0400)]
drm/amd/display: Avoid set zero in the requested clk

[ Upstream commit 2f8be0e516803cc3fd87c1671247896571a5a8fb ]

[Why]
Sometimes CRTCs can be disabled due to display unplugging or temporarily
transition in the userspace; in these circumstances, DCE tries to set
the minimum clock threshold. When we have this situation, the function
bw_calcs is invoked with number_of_displays set to zero, making DCE set
dispclk_khz and sclk_khz to zero. For these reasons, we have seen some
ATOM bios errors that look like:

[drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than
5secs aborting
[drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck
executing EA8A (len 761, WS 0, PS 0) @ 0xEABA

[How]
This error happens due to an attempt to optimize the bandwidth using the
sclk, and the dispclk clock set to zero. Technically we handle this in
the function dce112_set_clock, but we are not considering the case that
this value is set to zero. This commit fixes this issue by ensuring that
we never set a minimum value below the minimum clock threshold.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agodrm/amd/display: HDMI remote sink need mode validation for Linux
Fangzhi Zuo [Mon, 21 Sep 2020 21:52:43 +0000 (17:52 -0400)]
drm/amd/display: HDMI remote sink need mode validation for Linux

[ Upstream commit 95d620adb48f7728e67d82f56f756e8d451cf8d2 ]

[Why]
Currently mode validation is bypassed if remote sink exists. That
leads to mode set issue when a BW bottle neck exists in the link path,
e.g., a DP-to-HDMI converter that only supports HDMI 1.4.

Any invalid mode passed to Linux user space will cause the modeset
failure due to limitation of Linux user space implementation.

[How]
Mode validation is skipped only if in edid override. For real remote
sink, clock limit check should be done for HDMI remote sink.

Have HDMI related remote sink going through mode validation to
elimiate modes which pixel clock exceeds BW limitation.

Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopower: supply: test_power: add missing newlines when printing parameters by sysfs
Xiongfeng Wang [Fri, 4 Sep 2020 06:09:58 +0000 (14:09 +0800)]
power: supply: test_power: add missing newlines when printing parameters by sysfs

[ Upstream commit c07fa6c1631333f02750cf59f22b615d768b4d8f ]

When I cat some module parameters by sysfs, it displays as follows.
It's better to add a newline for easy reading.

root@syzkaller:~# cd /sys/module/test_power/parameters/
root@syzkaller:/sys/module/test_power/parameters# cat ac_online
onroot@syzkaller:/sys/module/test_power/parameters# cat battery_present
trueroot@syzkaller:/sys/module/test_power/parameters# cat battery_health
goodroot@syzkaller:/sys/module/test_power/parameters# cat battery_status
dischargingroot@syzkaller:/sys/module/test_power/parameters# cat battery_technology
LIONroot@syzkaller:/sys/module/test_power/parameters# cat usb_online
onroot@syzkaller:/sys/module/test_power/parameters#

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
Jonathan Cameron [Wed, 30 Sep 2020 14:05:45 +0000 (22:05 +0800)]
ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3

[ Upstream commit 2c5b9bde95c96942f2873cea6ef383c02800e4a8 ]

In ACPI 6.3, the Memory Proximity Domain Attributes Structure
changed substantially.  One of those changes was that the flag
for "Memory Proximity Domain field is valid" was deprecated.

This was because the field "Proximity Domain for the Memory"
became a required field and hence having a validity flag makes
no sense.

So the correct logic is to always assume the field is there.
Current code assumes it never is.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobus/fsl_mc: Do not rely on caller to provide non NULL mc_io
Diana Craciun [Tue, 29 Sep 2020 08:54:38 +0000 (11:54 +0300)]
bus/fsl_mc: Do not rely on caller to provide non NULL mc_io

[ Upstream commit 5026cf605143e764e1785bbf9158559d17f8d260 ]

Before destroying the mc_io, check first that it was
allocated.

Reviewed-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Link: https://lore.kernel.org/r/20200929085441.17448-11-diana.craciun@oss.nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agobus: mhi: core: Abort suspends due to outgoing pending packets
Bhaumik Bhatt [Tue, 29 Sep 2020 17:52:02 +0000 (23:22 +0530)]
bus: mhi: core: Abort suspends due to outgoing pending packets

[ Upstream commit 515847c557dd33167be86cb429fc0674a331bc88 ]

Add the missing check to abort suspends if a client driver has pending
outgoing packets to send to the device. This allows better utilization
of the MHI bus wherein clients on the host are not left waiting for
longer suspend or resume cycles to finish for data transfers.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200929175218.8178-4-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>