]> www.infradead.org Git - users/hch/block.git/log
users/hch/block.git
4 years agoscripts: Fix typo in headers_install.sh
Masanari Iida [Tue, 16 Jun 2020 12:51:32 +0000 (21:51 +0900)]
scripts: Fix typo in headers_install.sh

This patch fixes a spelling typo in scripts/headers_install.sh

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: unify cc-option and as-option
Masahiro Yamada [Sun, 14 Jun 2020 14:43:41 +0000 (23:43 +0900)]
kconfig: unify cc-option and as-option

cc-option and as-option are almost the same; both pass the flag to
$(CC). The main difference is the cc-option stops before the assemble
stage (-S option) whereas as-option stops after (-c option).

I chose -S because it is slightly faster, but $(cc-option,-gz=zlib)
returns a wrong result (https://lkml.org/lkml/2020/6/9/1529).
It has been fixed by commit 7b16994437c7 ("Makefile: Improve compressed
debug info support detection"), but the assembler should always be
invoked for more reliable compiler option tests.

However, you cannot simply replace -S with -c because the following
code in lib/Kconfig.debug would break:

    depends on $(cc-option,-gsplit-dwarf)

The combination of -c and -gsplit-dwarf does not accept /dev/null as
output.

  $ cat /dev/null | gcc -gsplit-dwarf -S -x c - -o /dev/null
  $ echo $?
  0

  $ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o /dev/null
  objcopy: Warning: '/dev/null' is not an ordinary file
  $ echo $?
  1

  $ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o tmp.o
  $ echo $?
  0

There is another flag that creates an separate file based on the
object file path:

  $ cat /dev/null | gcc -ftest-coverage -c -x c - -o /dev/null
  <stdin>:1: error: cannot open /dev/null.gcno

So, we cannot use /dev/null to sink the output.

Align the cc-option implementation with scripts/Kbuild.include.

With -c option used in cc-option, as-option is unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
4 years agotools/bootconfig: Add testcase for show-command and quotes test
Masami Hiramatsu [Tue, 16 Jun 2020 10:14:34 +0000 (19:14 +0900)]
tools/bootconfig: Add testcase for show-command and quotes test

Add testcases for the return value of the command to show
bootconfig in initrd, and double/single quotes selecting.

Link: http://lkml.kernel.org/r/159230247428.65555.2109472942519215104.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
Masami Hiramatsu [Tue, 16 Jun 2020 10:14:25 +0000 (19:14 +0900)]
tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig

Fix bootconfig to return 0 if succeeded to show the bootconfig
in initrd. Without this fix, "bootconfig INITRD" command
returns !0 even if the command succeeded to show the bootconfig.

Link: http://lkml.kernel.org/r/159230246566.65555.11891772258543514487.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotools/bootconfig: Fix to use correct quotes for value
Masami Hiramatsu [Tue, 16 Jun 2020 10:14:17 +0000 (19:14 +0900)]
tools/bootconfig: Fix to use correct quotes for value

Fix bootconfig tool to select double or single quotes
correctly according to the value.

If a bootconfig value includes a double quote character,
we must use single-quotes to quote that value.

Link: http://lkml.kernel.org/r/159230245697.65555.12444299015852932304.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agoproc/bootconfig: Fix to use correct quotes for value
Masami Hiramatsu [Tue, 16 Jun 2020 10:14:08 +0000 (19:14 +0900)]
proc/bootconfig: Fix to use correct quotes for value

Fix /proc/bootconfig to select double or single quotes
corrctly according to the value.

If a bootconfig value includes a double quote character,
we must use single-quotes to quote that value.

This modifies if() condition and blocks for avoiding
double-quote in value check in 2 places. Anyway, since
xbc_array_for_each_value() can handle the array which
has a single node correctly.
Thus,

if (vnode && xbc_node_is_array(vnode)) {
xbc_array_for_each_value(vnode) /* vnode->next != NULL */
...
} else {
snprintf(val); /* val is an empty string if !vnode */
}

is equivalent to

if (vnode) {
xbc_array_for_each_value(vnode) /* vnode->next can be NULL */
...
} else {
snprintf(""); /* value is always empty */
}

Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Remove unused event variable in tracing_iter_reset
YangHui [Tue, 16 Jun 2020 03:36:46 +0000 (11:36 +0800)]
tracing: Remove unused event variable in tracing_iter_reset

We do not use the event variable, just remove it.

Signed-off-by: YangHui <yanghui.def@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing/probe: Fix memleak in fetch_op_data operations
Vamshi K Sthambamkadi [Mon, 15 Jun 2020 14:30:38 +0000 (20:00 +0530)]
tracing/probe: Fix memleak in fetch_op_data operations

kmemleak report:
    [<57dcc2ca>] __kmalloc_track_caller+0x139/0x2b0
    [<f1c45d0f>] kstrndup+0x37/0x80
    [<f9761eb0>] parse_probe_arg.isra.7+0x3cc/0x630
    [<055bf2ba>] traceprobe_parse_probe_arg+0x2f5/0x810
    [<655a7766>] trace_kprobe_create+0x2ca/0x950
    [<4fc6a02a>] create_or_delete_trace_kprobe+0xf/0x30
    [<6d1c8a52>] trace_run_command+0x67/0x80
    [<be812cc0>] trace_parse_run_command+0xa7/0x140
    [<aecfe401>] probes_write+0x10/0x20
    [<2027641c>] __vfs_write+0x30/0x1e0
    [<6a4aeee1>] vfs_write+0x96/0x1b0
    [<3517fb7d>] ksys_write+0x53/0xc0
    [<dad91db7>] __ia32_sys_write+0x15/0x20
    [<da347f64>] do_syscall_32_irqs_on+0x3d/0x260
    [<fd0b7e7d>] do_fast_syscall_32+0x39/0xb0
    [<ea5ae810>] entry_SYSENTER_32+0xaf/0x102

Post parse_probe_arg(), the FETCH_OP_DATA operation type is overwritten
to FETCH_OP_ST_STRING, as a result memory is never freed since
traceprobe_free_probe_arg() iterates only over SYMBOL and DATA op types

Setup fetch string operation correctly after fetch_op_data operation.

Link: https://lkml.kernel.org/r/20200615143034.GA1734@cosmos
Cc: stable@vger.kernel.org
Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotrace: Fix typo in allocate_ftrace_ops()'s comment
Wei Yang [Wed, 10 Jun 2020 03:32:51 +0000 (11:32 +0800)]
trace: Fix typo in allocate_ftrace_ops()'s comment

No functional change, just correct the word.

Link: https://lkml.kernel.org/r/20200610033251.31713-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Make ftrace packed events have align of 1
Steven Rostedt (VMware) [Wed, 10 Jun 2020 02:00:41 +0000 (22:00 -0400)]
tracing: Make ftrace packed events have align of 1

When using trace-cmd on 5.6-rt for the function graph tracer, the output was
corrupted. It gave output like this:

 funcgraph_entry:       func=0xffffffff depth=38982
 funcgraph_entry:       func=0x1ffffffff depth=16044
 funcgraph_exit:        func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084
 funcgraph_exit:        func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702
 funcgraph_entry:       func=0xffffffff depth=85798
 funcgraph_entry:       func=0x1ffffffff depth=12044

The reason was because the tracefs/events/ftrace/funcgraph_entry/exit format
file was incorrect. The -rt kernel adds more common fields to the trace
events. Namely, common_migrate_disable and common_preempt_lazy_count. Each
is one byte in size. This changes the alignment of the normal payload. Most
events are aligned normally, but the function and function graph events are
defined with a "PACKED" macro, that packs their payload. As the offsets
displayed in the format files are now calculated by an aligned field, the
aligned field for function and function graph events should be 1, not their
normal alignment.

With aligning of the funcgraph_entry event, the format file has:

        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:unsigned char common_migrate_disable;     offset:8;       size:1; signed:0;
        field:unsigned char common_preempt_lazy_count;  offset:9;       size:1; signed:0;

        field:unsigned long func;       offset:16;      size:8; signed:0;
        field:int depth;        offset:24;      size:4; signed:1;

But the actual alignment is:

field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;

field:unsigned long func; offset:12; size:8; signed:0;
field:int depth; offset:20; size:4; signed:1;

Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home
Cc: stable@vger.kernel.org
Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agosample-trace-array: Remove trace_array 'sample-instance'
Kefeng Wang [Tue, 9 Jun 2020 13:52:00 +0000 (13:52 +0000)]
sample-trace-array: Remove trace_array 'sample-instance'

Remove trace_array 'sample-instance' if kthread_run fails
in sample_trace_array_init().

Link: https://lkml.kernel.org/r/20200609135200.2206726-1-wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Reviewed-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agosample-trace-array: Fix sleeping function called from invalid context
Kefeng Wang [Wed, 10 Jun 2020 01:12:44 +0000 (01:12 +0000)]
sample-trace-array: Fix sleeping function called from invalid context

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
 1 lock held by swapper/5/0:
  #0: ffff80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, at: call_timer_fn+0x8/0x3e0
 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x1a0
  show_stack+0x20/0x30
  dump_stack+0xe4/0x150
  ___might_sleep+0x160/0x200
  __might_sleep+0x58/0x90
  __mutex_lock+0x64/0x948
  mutex_lock_nested+0x3c/0x58
  __ftrace_set_clr_event+0x44/0x88
  trace_array_set_clr_event+0x24/0x38
  mytimer_handler+0x34/0x40 [sample_trace_array]

mutex_lock() will be called in interrupt context, using workqueue to fix it.

Link: https://lkml.kernel.org/r/20200610011244.2209486-1-wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Reviewed-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokretprobe: Prevent triggering kretprobe from within kprobe_flush_task
Jiri Olsa [Tue, 12 May 2020 08:03:18 +0000 (17:03 +0900)]
kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is in outside kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.

Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.

Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2
Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokprobes: Remove redundant arch_disarm_kprobe() call
Masami Hiramatsu [Tue, 12 May 2020 08:03:07 +0000 (17:03 +0900)]
kprobes: Remove redundant arch_disarm_kprobe() call

Fix to remove redundant arch_disarm_kprobe() call in
force_unoptimize_kprobe(). This arch_disarm_kprobe()
will be invoked if the kprobe is optimized but disabled,
but that means the kprobe (optprobe) is unused (and
unoptimized) state.

In that case, unoptimize_kprobe() puts it in freeing_list
and kprobe_optimizer (do_unoptimize_kprobes()) automatically
disarm it. Thus this arch_disarm_kprobe() is redundant.

Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
Masami Hiramatsu [Tue, 12 May 2020 08:02:56 +0000 (17:02 +0900)]
kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex

In kprobe_optimizer() kick_kprobe_optimizer() is called
without kprobe_mutex, but this can race with other caller
which is protected by kprobe_mutex.

To fix that, expand kprobe_mutex protected area to protect
kick_kprobe_optimizer() call.

Link: http://lkml.kernel.org/r/158927057586.27680.5036330063955940456.stgit@devnote2
Fixes: cd7ebe2298ff ("kprobes: Use text_poke_smp_batch for optimizing")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ziqian SUN <zsun@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokprobes: Use non RCU traversal APIs on kprobe_tables if possible
Masami Hiramatsu [Tue, 12 May 2020 08:02:44 +0000 (17:02 +0900)]
kprobes: Use non RCU traversal APIs on kprobe_tables if possible

Current kprobes uses RCU traversal APIs on kprobe_tables
even if it is safe because kprobe_mutex is locked.

Make those traversals to non-RCU APIs where the kprobe_mutex
is locked.

Link: http://lkml.kernel.org/r/158927056452.27680.9710575332163005121.stgit@devnote2
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokprobes: Suppress the suspicious RCU warning on kprobes
Masami Hiramatsu [Tue, 12 May 2020 08:02:33 +0000 (17:02 +0900)]
kprobes: Suppress the suspicious RCU warning on kprobes

Anders reported that the lockdep warns that suspicious
RCU list usage in register_kprobe() (detected by
CONFIG_PROVE_RCU_LIST.) This is because get_kprobe()
access kprobe_table[] by hlist_for_each_entry_rcu()
without rcu_read_lock.

If we call get_kprobe() from the breakpoint handler context,
it is run with preempt disabled, so this is not a problem.
But in other cases, instead of rcu_read_lock(), we locks
kprobe_mutex so that the kprobe_table[] is not updated.
So, current code is safe, but still not good from the view
point of RCU.

Joel suggested that we can silent that warning by passing
lockdep_is_held() to the last argument of
hlist_for_each_entry_rcu().

Add lockdep_is_held(&kprobe_mutex) at the end of the
hlist_for_each_entry_rcu() to suppress the warning.

Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agorecordmcount: support >64k sections
Sami Tolvanen [Fri, 24 Apr 2020 19:30:46 +0000 (12:30 -0700)]
recordmcount: support >64k sections

When compiling a kernel with Clang and LTO, we need to run
recordmcount on vmlinux.o with a large number of sections, which
currently fails as the program doesn't understand extended
section indexes. This change adds support for processing binaries
with >64k sections.

Link: https://lkml.kernel.org/r/20200424193046.160744-1-samitolvanen@google.com
Link: https://lore.kernel.org/lkml/CAK7LNARbZhoaA=Nnuw0=gBrkuKbr_4Ng_Ei57uafujZf7Xazgw@mail.gmail.com/
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Matt Helsley <mhelsley@vmware.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agokbuild: improve cc-option to clean up all temporary files
Masahiro Yamada [Sun, 14 Jun 2020 14:43:40 +0000 (23:43 +0900)]
kbuild: improve cc-option to clean up all temporary files

When cc-option and friends evaluate compiler flags, the temporary file
$$TMP is created as an output object, and automatically cleaned up.
The actual file path of $$TMP is .<pid>.tmp, here <pid> is the process
ID of $(shell ...) invoked from cc-option. (Please note $$$$ is the
escape sequence of $$).

Such garbage files are cleaned up in most cases, but some compiler flags
create additional output files.

For example, -gsplit-dwarf creates a .dwo file.

When CONFIG_DEBUG_INFO_SPLIT=y, you will see a bunch of .<pid>.dwo files
left in the top of build directories. You may not notice them unless you
do 'ls -a', but the garbage files will increase every time you run 'make'.

This commit changes the temporary object path to .tmp_<pid>/tmp, and
removes .tmp_<pid> directory when exiting. Separate build artifacts such
as *.dwo will be cleaned up all together because their file paths are
usually determined based on the base name of the object.

Another example is -ftest-coverage, which outputs the coverage data into
<base-name-of-object>.gcno

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Wed, 17 Jun 2020 00:44:54 +0000 (17:44 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Don't get per-cpu pointer with preemption enabled in nft_set_pipapo,
    fix from Stefano Brivio.

 2) Fix memory leak in ctnetlink, from Pablo Neira Ayuso.

 3) Multiple definitions of MPTCP_PM_MAX_ADDR, from Geliang Tang.

 4) Accidently disabling NAPI in non-error paths of macb_open(), from
    Charles Keepax.

 5) Fix races between alx_stop and alx_remove, from Zekun Shen.

 6) We forget to re-enable SRIOV during resume in bnxt_en driver, from
    Michael Chan.

 7) Fix memory leak in ipv6_mc_destroy_dev(), from Wang Hai.

 8) rxtx stats use wrong index in mvpp2 driver, from Sven Auhagen.

 9) Fix memory leak in mptcp_subflow_create_socket error path, from Wei
    Yongjun.

10) We should not adjust the TCP window advertised when sending dup acks
    in non-SACK mode, because it won't be counted as a dup by the sender
    if the window size changes. From Eric Dumazet.

11) Destroy the right number of queues during remove in mvpp2 driver,
    from Sven Auhagen.

12) Various WOL and PM fixes to e1000 driver, from Chen Yu, Vaibhav
    Gupta, and Arnd Bergmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  e1000e: fix unused-function warning
  e1000: use generic power management
  e1000e: Do not wake up the system via WOL if device wakeup is disabled
  lan743x: add MODULE_DEVICE_TABLE for module loading alias
  mlxsw: spectrum: Adjust headroom buffers for 8x ports
  bareudp: Fixed configuration to avoid having garbage values
  mvpp2: remove module bugfix
  tcp: grow window for OOO packets only for SACK flows
  mptcp: fix memory leak in mptcp_subflow_create_socket()
  netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline
  net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline
  net: dsa: sja1105: fix PTP timestamping with large tc-taprio cycles
  mvpp2: ethtool rxtx stats fix
  MAINTAINERS: switch to my private email for Renesas Ethernet drivers
  rocker: fix incorrect error handling in dma_rings_init
  test_objagg: Fix potential memory leak in error handling
  net: ethernet: mtk-star-emac: simplify interrupt handling
  mld: fix memory leak in ipv6_mc_destroy_dev()
  bnxt_en: Return from timer if interface is not in open state.
  bnxt_en: Fix AER reset logic on 57500 chips.
  ...

4 years agoMerge tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowe...
Linus Torvalds [Wed, 17 Jun 2020 00:40:51 +0000 (17:40 -0700)]
Merge tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "I've managed to get xfstests kind of working with afs. Here are a set
  of patches that fix most of the bugs found.

  There are a number of primary issues:

   - Incorrect handling of mtime and non-handling of ctime. It might be
     argued, that the latter isn't a bug since the AFS protocol doesn't
     support ctime, but I should probably still update it locally.

   - Shared-write mmap, truncate and writeback bugs. This includes not
     changing i_size under the callback lock, overwriting local i_size
     with the reply from the server after a partial writeback, not
     limiting the writeback from an mmapped page to EOF.

   - Checks for an abort code indicating that the primary vnode in an
     operation was deleted by a third-party are done in the wrong place.

   - Silly rename bugs. This includes an incomplete conversion to the
     new operation handling, duplicate nlink handling, nlink changing
     not being done inside the callback lock and insufficient handling
     of third-party conflicting directory changes.

  And some secondary ones:

   - The UAEOVERFLOW abort code should map to EOVERFLOW not EREMOTEIO.

   - Remove a couple of unused or incompletely used bits.

   - Remove a couple of redundant success checks.

  These seem to fix all the data-corruption bugs found by

./check -afs -g quick

  along with the obvious silly rename bugs and time bugs.

  There are still some test failures, but they seem to fall into two
  classes: firstly, the authentication/security model is different to
  the standard UNIX model and permission is arbitrated by the server and
  cached locally; and secondly, there are a number of features that AFS
  does not support (such as mknod). But in these cases, the tests
  themselves need to be adapted or skipped.

  Using the in-kernel afs client with xfstests also found a bug in the
  AuriStor AFS server that has been fixed for a future release"

* tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix silly rename
  afs: afs_vnode_commit_status() doesn't need to check the RPC error
  afs: Fix use of afs_check_for_remote_deletion()
  afs: Remove afs_operation::abort_code
  afs: Fix yfs_fs_fetch_status() to honour vnode selector
  afs: Remove yfs_fs_fetch_file_status() as it's not used
  afs: Fix the mapping of the UAEOVERFLOW abort code
  afs: Fix truncation issues and mmap writeback size
  afs: Concoct ctimes
  afs: Fix EOF corruption
  afs: afs_write_end() should change i_size under the right lock
  afs: Fix non-setting of mtime when writing into mmap

4 years agoDocumentation: remove SH-5 index entries
Randy Dunlap [Mon, 15 Jun 2020 02:59:07 +0000 (19:59 -0700)]
Documentation: remove SH-5 index entries

Remove SH-5 documentation index entries following the removal
of SH-5 source code.

Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
Error: Cannot open file ../arch/sh/include/asm/tlb_64.h
Error: Cannot open file ../arch/sh/include/asm/tlb_64.h

Fixes: 3b69e8b45711 ("Merge tag 'sh-for-5.8' of git://git.libc.org/linux-sh")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Rich Felker <dalias@libc.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: ysato@users.sourceforge.jp
Cc: linux-sh@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoselinux: fix a double free in cond_read_node()/cond_read_list()
Tom Rix [Mon, 15 Jun 2020 20:45:48 +0000 (13:45 -0700)]
selinux: fix a double free in cond_read_node()/cond_read_list()

Clang static analysis reports this double free error

security/selinux/ss/conditional.c:139:2: warning: Attempt to free released memory [unix.Malloc]
        kfree(node->expr.nodes);
        ^~~~~~~~~~~~~~~~~~~~~~~

When cond_read_node fails, it calls cond_node_destroy which frees the
node but does not poison the entry in the node list.  So when it
returns to its caller cond_read_list, cond_read_list deletes the
partial list.  The latest entry in the list will be deleted twice.

So instead of freeing the node in cond_read_node, let list freeing in
code_read_list handle the freeing the problem node along with all of the
earlier nodes.

Because cond_read_node no longer does any error handling, the goto's
the error case are redundant.  Instead just return the error code.

Cc: stable@vger.kernel.org
Fixes: 60abd3181db2 ("selinux: convert cond_list to array")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
4 years agoMerge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Wed, 17 Jun 2020 00:23:57 +0000 (17:23 -0700)]
Merge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members.

  Notice that all of these patches have been baking in linux-next for
  two development cycles now.

  There is a regular need in the kernel to provide a way to declare
  having a dynamically sized set of trailing elements in a structure.
  Kernel code should always use “flexible array members”[1] for these
  cases. The older style of one-element or zero-length arrays should no
  longer be used[2].

  C99 introduced “flexible array members”, which lacks a numeric size
  for the array declaration entirely:

        struct something {
                size_t count;
                struct foo items[];
        };

  This is the way the kernel expects dynamically sized trailing elements
  to be declared. It allows the compiler to generate errors when the
  flexible array does not occur last in the structure, which helps to
  prevent some kind of undefined behavior[3] bugs from being
  inadvertently introduced to the codebase.

  It also allows the compiler to correctly analyze array sizes (via
  sizeof(), CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For
  instance, there is no mechanism that warns us that the following
  application of the sizeof() operator to a zero-length array always
  results in zero:

        struct something {
                size_t count;
                struct foo items[0];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items) * instance->count;
        memcpy(instance->items, source, size);

  At the last line of code above, size turns out to be zero, when one
  might have thought it represents the total size in bytes of the
  dynamic memory recently allocated for the trailing array items. Here
  are a couple examples of this issue[4][5].

  Instead, flexible array members have incomplete type, and so the
  sizeof() operator may not be applied[6], so any misuse of such
  operators will be immediately noticed at build time.

  The cleanest and least error-prone way to implement this is through
  the use of a flexible array member:

        struct something {
                size_t count;
                struct foo items[];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items[0]) * instance->count;
        memcpy(instance->items, source, size);

  instead"

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
[4] commit f2cd32a443da ("rndis_wlan: Remove logically dead code")
[5] commit ab91c2a89f86 ("tpm: eventlog: Replace zero-length array with flexible-array member")
[6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

* tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (41 commits)
  w1: Replace zero-length array with flexible-array
  tracing/probe: Replace zero-length array with flexible-array
  soc: ti: Replace zero-length array with flexible-array
  tifm: Replace zero-length array with flexible-array
  dmaengine: tegra-apb: Replace zero-length array with flexible-array
  stm class: Replace zero-length array with flexible-array
  Squashfs: Replace zero-length array with flexible-array
  ASoC: SOF: Replace zero-length array with flexible-array
  ima: Replace zero-length array with flexible-array
  sctp: Replace zero-length array with flexible-array
  phy: samsung: Replace zero-length array with flexible-array
  RxRPC: Replace zero-length array with flexible-array
  rapidio: Replace zero-length array with flexible-array
  media: pwc: Replace zero-length array with flexible-array
  firmware: pcdp: Replace zero-length array with flexible-array
  oprofile: Replace zero-length array with flexible-array
  block: Replace zero-length array with flexible-array
  tools/testing/nvdimm: Replace zero-length array with flexible-array
  libata: Replace zero-length array with flexible-array
  kprobes: Replace zero-length array with flexible-array
  ...

4 years agox86/purgatory: Add -fno-stack-protector
Arvind Sankar [Tue, 16 Jun 2020 22:25:47 +0000 (18:25 -0400)]
x86/purgatory: Add -fno-stack-protector

The purgatory Makefile removes -fstack-protector options if they were
configured in, but does not currently add -fno-stack-protector.

If gcc was configured with the --enable-default-ssp configure option,
this results in the stack protector still being enabled for the
purgatory (absent distro-specific specs files that might disable it
again for freestanding compilations), if the main kernel is being
compiled with stack protection enabled (if it's disabled for the main
kernel, the top-level Makefile will add -fno-stack-protector).

This will break the build since commit
  e4160b2e4b02 ("x86/purgatory: Fail the build if purgatory.ro has missing symbols")
and prior to that would have caused runtime failure when trying to use
kexec.

Explicitly add -fno-stack-protector to avoid this, as done in other
Makefiles that need to disable the stack protector.

Reported-by: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Tue, 16 Jun 2020 23:16:24 +0000 (16:16 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2020-06-16

This series contains fixes to e1000 and e1000e.

Chen fixes an e1000e issue where systems could be waken via WoL, even
though the user has disabled the wakeup bit via sysfs.

Vaibhav Gupta updates the e1000 driver to clean up the legacy Power
Management hooks.

Arnd Bergmann cleans up the inconsistent use CONFIG_PM_SLEEP
preprocessor tags, which also resolves the compiler warnings about the
possibility of unused structure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoe1000e: fix unused-function warning
Arnd Bergmann [Wed, 27 May 2020 13:47:00 +0000 (15:47 +0200)]
e1000e: fix unused-function warning

The CONFIG_PM_SLEEP #ifdef checks in this file are inconsistent,
leading to a warning about sometimes unused function:

drivers/net/ethernet/intel/e1000e/netdev.c:137:13: error: unused function 'e1000e_check_me' [-Werror,-Wunused-function]

Rather than adding more #ifdefs, just remove them completely
and mark the PM functions as __maybe_unused to let the compiler
work it out on it own.

Fixes: e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME systems")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoe1000: use generic power management
Vaibhav Gupta [Mon, 25 May 2020 12:27:10 +0000 (17:57 +0530)]
e1000: use generic power management

With legacy PM hooks, it was the responsibility of a driver to manage PCI
states and also the device's power state. The generic approach is to let PCI
core handle the work.

e1000_suspend() calls __e1000_shutdown() to perform intermediate tasks.
__e1000_shutdown() modifies the value of "wake" (device should be wakeup
enabled or not), responsible for controlling the flow of legacy PM.

Since, PCI core has no idea about the value of "wake", new code for generic
PM may produce unexpected results. Thus, use "device_set_wakeup_enable()"
to wakeup-enable the device accordingly.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoe1000e: Do not wake up the system via WOL if device wakeup is disabled
Chen Yu [Thu, 21 May 2020 17:59:00 +0000 (01:59 +0800)]
e1000e: Do not wake up the system via WOL if device wakeup is disabled

Currently the system will be woken up via WOL(Wake On LAN) even if the
device wakeup ability has been disabled via sysfs:
 cat /sys/devices/pci0000:00/0000:00:1f.6/power/wakeup
 disabled

The system should not be woken up if the user has explicitly
disabled the wake up ability for this device.

This patch clears the WOL ability of this network device if the
user has disabled the wake up ability in sysfs.

Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver")
Reported-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agolan743x: add MODULE_DEVICE_TABLE for module loading alias
Tim Harvey [Tue, 16 Jun 2020 15:31:58 +0000 (08:31 -0700)]
lan743x: add MODULE_DEVICE_TABLE for module loading alias

Without a MODULE_DEVICE_TABLE the attributes are missing that create
an alias for auto-loading the module in userspace via hotplug.

Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoafs: Fix silly rename
David Howells [Mon, 15 Jun 2020 16:36:58 +0000 (17:36 +0100)]
afs: Fix silly rename

Fix AFS's silly rename by the following means:

 (1) Set the destination directory in afs_do_silly_rename() so as to avoid
     misbehaviour and indicate that the directory data version will
     increment by 1 so as to avoid warnings about unexpected changes in the
     DV.  Also indicate that the ctime should be updated to avoid xfstest
     grumbling.

 (2) Note when the server indicates that a directory changed more than we
     expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
     third party change, checking on successful completion of unlink and
     rename.

     The problem is that the FS.RemoveFile RPC op doesn't report the status
     of the unlinked file, though YFS.RemoveFile2 does.  This can be
     mitigated by the assumption that if the directory DV cranked by
     exactly 1, we can be sure we removed one link from the file; further,
     ordinarily in AFS, files cannot be hardlinked across directories, so
     if we reduce nlink to 0, the file is deleted.

     However, if the directory DV jumps by more than 1, we cannot know if a
     third party intervened by adding or removing a link on the file we
     just removed a link from.

     The same also goes for any vnode that is at the destination of the
     FS.Rename RPC op.

 (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
     section along with the other attribute updates if ->op_unlinked is set
     on the descriptor for the appropriate vnode.

 (4) Issue a follow up status fetch to the unlinked file in the event of a
     third party conflict that makes it impossible for us to know if we
     actually deleted the file or not.

 (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
     the user about the nlink of a silly deleted file so that it appears as
     0, not 1.

Found with the generic/035 and generic/084 xfstests.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
4 years agomlxsw: spectrum: Adjust headroom buffers for 8x ports
Ido Schimmel [Tue, 16 Jun 2020 07:14:58 +0000 (10:14 +0300)]
mlxsw: spectrum: Adjust headroom buffers for 8x ports

The port's headroom buffers are used to store packets while they
traverse the device's pipeline and also to store packets that are egress
mirrored.

On Spectrum-3, ports with eight lanes use two headroom buffers between
which the configured headroom size is split.

In order to prevent packet loss, multiply the calculated headroom size
by two for 8x ports.

Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobareudp: Fixed configuration to avoid having garbage values
Martin [Tue, 16 Jun 2020 05:48:58 +0000 (11:18 +0530)]
bareudp: Fixed configuration to avoid having garbage values

Code to initialize the conf structure while gathering the configuration
of the device was missing.

Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Martin <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomvpp2: remove module bugfix
Sven Auhagen [Tue, 16 Jun 2020 04:35:29 +0000 (06:35 +0200)]
mvpp2: remove module bugfix

The remove function does not destroy all
BM Pools when per cpu pool is active.

When reloading the mvpp2 as a module the BM Pools
are still active in hardware and due to the bug
have twice the size now old + new.

This eventually leads to a kernel crash.

v2:
* add Fixes tag

Fixes: 7d04b0b13b11 ("mvpp2: percpu buffers")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: grow window for OOO packets only for SACK flows
Eric Dumazet [Tue, 16 Jun 2020 03:37:07 +0000 (20:37 -0700)]
tcp: grow window for OOO packets only for SACK flows

Back in 2013, we made a change that broke fast retransmit
for non SACK flows.

Indeed, for these flows, a sender needs to receive three duplicate
ACK before starting fast retransmit. Sending ACK with different
receive window do not count.

Even if enabling SACK is strongly recommended these days,
there still are some cases where it has to be disabled.

Not increasing the window seems better than having to
rely on RTO.

After the fix, following packetdrill test gives :

// Initialize connection
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
   +0 < . 1:1(0) ack 1 win 514

   +0 accept(3, ..., ...) = 4

   +0 < . 1:1001(1000) ack 1 win 514
// Quick ack
   +0 > . 1:1(0) ack 1001 win 264

   +0 < . 2001:3001(1000) ack 1 win 514
// DUPACK : Normally we should not change the window
   +0 > . 1:1(0) ack 1001 win 264

   +0 < . 3001:4001(1000) ack 1 win 514
// DUPACK : Normally we should not change the window
   +0 > . 1:1(0) ack 1001 win 264

   +0 < . 4001:5001(1000) ack 1 win 514
// DUPACK : Normally we should not change the window
    +0 > . 1:1(0) ack 1001 win 264

   +0 < . 1001:2001(1000) ack 1 win 514
// Hole is repaired.
   +0 > . 1:1(0) ack 5001 win 272

Fixes: 4e4f1fc22681 ("tcp: properly increase rcv_ssthresh for ofo packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mfd-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Tue, 16 Jun 2020 18:07:02 +0000 (11:07 -0700)]
Merge tag 'mfd-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD fix from Lee Jones:
 "Fix NULL pointer dereference in mt6360 driver"

* tag 'mfd-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: mt6360: Fix register driver NULL pointer by adding driver name

4 years agoarm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n
Will Deacon [Tue, 16 Jun 2020 17:29:11 +0000 (18:29 +0100)]
arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n

When I squashed the 'allnoconfig' compiler warning about the
set_sve_default_vl() function being defined but not used in commit
1e570f512cbd ("arm64/sve: Eliminate data races on sve_default_vl"), I
accidentally broke the build for configs where ARM64_SVE is enabled, but
SYSCTL is not.

Fix this by only compiling the SVE sysctl support if both CONFIG_SVE=y
and CONFIG_SYSCTL=y.

Cc: Dave Martin <Dave.Martin@arm.com>
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200616131808.GA1040@lca.pw
Signed-off-by: Will Deacon <will@kernel.org>
4 years agobtrfs: use kfree() in btrfs_ioctl_get_subvol_info()
Waiman Long [Tue, 16 Jun 2020 15:31:59 +0000 (11:31 -0400)]
btrfs: use kfree() in btrfs_ioctl_get_subvol_info()

In btrfs_ioctl_get_subvol_info(), there is a classic case where kzalloc()
was incorrectly paired with kzfree(). According to David Sterba, there
isn't any sensitive information in the subvol_info that needs to be
cleared before freeing. So kzfree() isn't really needed, use kfree()
instead.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO
Filipe Manana [Mon, 15 Jun 2020 17:49:39 +0000 (18:49 +0100)]
btrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO

A RWF_NOWAIT write is not supposed to wait on filesystem locks that can be
held for a long time or for ongoing IO to complete.

However when calling check_can_nocow(), if the inode has prealloc extents
or has the NOCOW flag set, we can block on extent (file range) locks
through the call to btrfs_lock_and_flush_ordered_range(). Such lock can
take a significant amount of time to be available. For example, a fiemap
task may be running, and iterating through the entire file range checking
all extents and doing backref walking to determine if they are shared,
or a readpage operation may be in progress.

Also at btrfs_lock_and_flush_ordered_range(), called by check_can_nocow(),
after locking the file range we wait for any existing ordered extent that
is in progress to complete. Another operation that can take a significant
amount of time and defeat the purpose of RWF_NOWAIT.

So fix this by trying to lock the file range and if it's currently locked
return -EAGAIN to user space. If we are able to lock the file range without
waiting and there is an ordered extent in the range, return -EAGAIN as
well, instead of waiting for it to complete. Finally, don't bother trying
to lock the snapshot lock of the root when attempting a RWF_NOWAIT write,
as that is only important for buffered writes.

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix RWF_NOWAIT write not failling when we need to cow
Filipe Manana [Mon, 15 Jun 2020 17:49:13 +0000 (18:49 +0100)]
btrfs: fix RWF_NOWAIT write not failling when we need to cow

If we attempt to do a RWF_NOWAIT write against a file range for which we
can only do NOCOW for a part of it, due to the existence of holes or
shared extents for example, we proceed with the write as if it were
possible to NOCOW the whole range.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/sdj/bar
  $ chattr +C /mnt/sdj/bar

  $ xfs_io -d -c "pwrite -S 0xab -b 256K 0 256K" /mnt/bar
  wrote 262144/262144 bytes at offset 0
  256 KiB, 1 ops; 0.0003 sec (694.444 MiB/sec and 2777.7778 ops/sec)

  $ xfs_io -c "fpunch 64K 64K" /mnt/bar
  $ sync

  $ xfs_io -d -c "pwrite -N -V 1 -b 128K -S 0xfe 0 128K" /mnt/bar
  wrote 131072/131072 bytes at offset 0
  128 KiB, 1 ops; 0.0007 sec (160.051 MiB/sec and 1280.4097 ops/sec)

This last write should fail with -EAGAIN since the file range from 64K to
128K is a hole. On xfs it fails, as expected, but on ext4 it currently
succeeds because apparently it is expensive to check if there are extents
allocated for the whole range, but I'll check with the ext4 people.

Fix the issue by checking if check_can_nocow() returns a number of
NOCOW'able bytes smaller then the requested number of bytes, and if it
does return -EAGAIN.

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof
Filipe Manana [Mon, 15 Jun 2020 17:48:58 +0000 (18:48 +0100)]
btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof

If we attempt to write to prealloc extent located after eof using a
RWF_NOWAIT write, we always fail with -EAGAIN.

We do actually check if we have an allocated extent for the write at
the start of btrfs_file_write_iter() through a call to check_can_nocow(),
but later when we go into the actual direct IO write path we simply
return -EAGAIN if the write starts at or beyond EOF.

Trivial to reproduce:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/foo
  $ chattr +C /mnt/foo

  $ xfs_io -d -c "pwrite -S 0xab 0 64K" /mnt/foo
  wrote 65536/65536 bytes at offset 0
  64 KiB, 16 ops; 0.0004 sec (135.575 MiB/sec and 34707.1584 ops/sec)

  $ xfs_io -c "falloc -k 64K 1M" /mnt/foo

  $ xfs_io -d -c "pwrite -N -V 1 -S 0xfe -b 64K 64K 64K" /mnt/foo
  pwrite: Resource temporarily unavailable

On xfs and ext4 the write succeeds, as expected.

Fix this by removing the wrong check at btrfs_direct_IO().

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix hang on snapshot creation after RWF_NOWAIT write
Filipe Manana [Mon, 15 Jun 2020 17:46:01 +0000 (18:46 +0100)]
btrfs: fix hang on snapshot creation after RWF_NOWAIT write

If we do a successful RWF_NOWAIT write we end up locking the snapshot lock
of the inode, through a call to check_can_nocow(), but we never unlock it.

This means the next attempt to create a snapshot on the subvolume will
hang forever.

Trivial reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/foobar
  $ chattr +C /mnt/foobar
  $ xfs_io -d -c "pwrite -S 0xab 0 64K" /mnt/foobar
  $ xfs_io -d -c "pwrite -N -V 1 -S 0xfe 0 64K" /mnt/foobar

  $ btrfs subvolume snapshot -r /mnt /mnt/snap
    --> hangs

Fix this by unlocking the snapshot lock if check_can_nocow() returned
success.

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: check if a log root exists before locking the log_mutex on unlink
Filipe Manana [Mon, 15 Jun 2020 09:38:44 +0000 (10:38 +0100)]
btrfs: check if a log root exists before locking the log_mutex on unlink

This brings back an optimization that commit e678934cbe5f02 ("btrfs:
Remove unnecessary check from join_running_log_trans") removed, but in
a different form. So it's almost equivalent to a revert.

That commit removed an optimization where we avoid locking a root's
log_mutex when there is no log tree created in the current transaction.
The affected code path is triggered through unlink operations.

That commit was based on the assumption that the optimization was not
necessary because we used to have the following checks when the patch
was authored:

  int btrfs_del_dir_entries_in_log(...)
  {
        (...)
        if (dir->logged_trans < trans->transid)
            return 0;

        ret = join_running_log_trans(root);
        (...)
   }

   int btrfs_del_inode_ref_in_log(...)
   {
        (...)
        if (inode->logged_trans < trans->transid)
            return 0;

        ret = join_running_log_trans(root);
        (...)
   }

However before that patch was merged, another patch was merged first which
replaced those checks because they were buggy.

That other patch corresponds to commit 803f0f64d17769 ("Btrfs: fix fsync
not persisting dentry deletions due to inode evictions"). The assumption
that if the logged_trans field of an inode had a smaller value then the
current transaction's generation (transid) meant that the inode was not
logged in the current transaction was only correct if the inode was not
evicted and reloaded in the current transaction. So the corresponding bug
fix changed those checks and replaced them with the following helper
function:

  static bool inode_logged(struct btrfs_trans_handle *trans,
                           struct btrfs_inode *inode)
  {
        if (inode->logged_trans == trans->transid)
                return true;

        if (inode->last_trans == trans->transid &&
            test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
            !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
                return true;

        return false;
  }

So if we have a subvolume without a log tree in the current transaction
(because we had no fsyncs), every time we unlink an inode we can end up
trying to lock the log_mutex of the root through join_running_log_trans()
twice, once for the inode being unlinked (by btrfs_del_inode_ref_in_log())
and once for the parent directory (with btrfs_del_dir_entries_in_log()).

This means if we have several unlink operations happening in parallel for
inodes in the same subvolume, and the those inodes and/or their parent
inode were changed in the current transaction, we end up having a lot of
contention on the log_mutex.

The test robots from intel reported a -30.7% performance regression for
a REAIM test after commit e678934cbe5f02 ("btrfs: Remove unnecessary check
from join_running_log_trans").

So just bring back the optimization to join_running_log_trans() where we
check first if a log root exists before trying to lock the log_mutex. This
is done by checking for a bit that is set on the root when a log tree is
created and removed when a log tree is freed (at transaction commit time).

Commit e678934cbe5f02 ("btrfs: Remove unnecessary check from
join_running_log_trans") was merged in the 5.4 merge window while commit
803f0f64d17769 ("Btrfs: fix fsync not persisting dentry deletions due to
inode evictions") was merged in the 5.3 merge window. But the first
commit was actually authored before the second commit (May 23 2019 vs
June 19 2019).

Reported-by: kernel test robot <rong.a.chen@intel.com>
Link: https://lore.kernel.org/lkml/20200611090233.GL12456@shao2-debian/
Fixes: e678934cbe5f02 ("btrfs: Remove unnecessary check from join_running_log_trans")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix bytes_may_use underflow when running balance and scrub in parallel
Filipe Manana [Mon, 8 Jun 2020 12:33:05 +0000 (13:33 +0100)]
btrfs: fix bytes_may_use underflow when running balance and scrub in parallel

When balance and scrub are running in parallel it is possible to end up
with an underflow of the bytes_may_use counter of the data space_info
object, which triggers a warning like the following:

   [134243.793196] BTRFS info (device sdc): relocating block group 1104150528 flags data
   [134243.806891] ------------[ cut here ]------------
   [134243.807561] WARNING: CPU: 1 PID: 26884 at fs/btrfs/space-info.h:125 btrfs_add_reserved_bytes+0x1da/0x280 [btrfs]
   [134243.808819] Modules linked in: btrfs blake2b_generic xor (...)
   [134243.815779] CPU: 1 PID: 26884 Comm: kworker/u8:8 Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
   [134243.816944] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
   [134243.818389] Workqueue: writeback wb_workfn (flush-btrfs-108483)
   [134243.819186] RIP: 0010:btrfs_add_reserved_bytes+0x1da/0x280 [btrfs]
   [134243.819963] Code: 0b f2 85 (...)
   [134243.822271] RSP: 0018:ffffa4160aae7510 EFLAGS: 00010287
   [134243.822929] RAX: 000000000000c000 RBX: ffff96159a8c1000 RCX: 0000000000000000
   [134243.823816] RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffff96158067a810
   [134243.824742] RBP: ffff96158067a800 R08: 0000000000000001 R09: 0000000000000000
   [134243.825636] R10: ffff961501432a40 R11: 0000000000000000 R12: 000000000000c000
   [134243.826532] R13: 0000000000000001 R14: ffffffffffff4000 R15: ffff96158067a810
   [134243.827432] FS:  0000000000000000(0000) GS:ffff9615baa00000(0000) knlGS:0000000000000000
   [134243.828451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [134243.829184] CR2: 000055bd7e414000 CR3: 00000001077be004 CR4: 00000000003606e0
   [134243.830083] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [134243.830975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [134243.831867] Call Trace:
   [134243.832211]  find_free_extent+0x4a0/0x16c0 [btrfs]
   [134243.832846]  btrfs_reserve_extent+0x91/0x180 [btrfs]
   [134243.833487]  cow_file_range+0x12d/0x490 [btrfs]
   [134243.834080]  fallback_to_cow+0x82/0x1b0 [btrfs]
   [134243.834689]  ? release_extent_buffer+0x121/0x170 [btrfs]
   [134243.835370]  run_delalloc_nocow+0x33f/0xa30 [btrfs]
   [134243.836032]  btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs]
   [134243.836725]  ? find_lock_delalloc_range+0x221/0x250 [btrfs]
   [134243.837450]  writepage_delalloc+0xe8/0x150 [btrfs]
   [134243.838059]  __extent_writepage+0xe8/0x4c0 [btrfs]
   [134243.838674]  extent_write_cache_pages+0x237/0x530 [btrfs]
   [134243.839364]  extent_writepages+0x44/0xa0 [btrfs]
   [134243.839946]  do_writepages+0x23/0x80
   [134243.840401]  __writeback_single_inode+0x59/0x700
   [134243.841006]  writeback_sb_inodes+0x267/0x5f0
   [134243.841548]  __writeback_inodes_wb+0x87/0xe0
   [134243.842091]  wb_writeback+0x382/0x590
   [134243.842574]  ? wb_workfn+0x4a2/0x6c0
   [134243.843030]  wb_workfn+0x4a2/0x6c0
   [134243.843468]  process_one_work+0x26d/0x6a0
   [134243.843978]  worker_thread+0x4f/0x3e0
   [134243.844452]  ? process_one_work+0x6a0/0x6a0
   [134243.844981]  kthread+0x103/0x140
   [134243.845400]  ? kthread_create_worker_on_cpu+0x70/0x70
   [134243.846030]  ret_from_fork+0x3a/0x50
   [134243.846494] irq event stamp: 0
   [134243.846892] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
   [134243.847682] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
   [134243.848687] softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
   [134243.849913] softirqs last disabled at (0): [<0000000000000000>] 0x0
   [134243.850698] ---[ end trace bd7c03622e0b0a96 ]---
   [134243.851335] ------------[ cut here ]------------

When relocating a data block group, for each extent allocated in the
block group we preallocate another extent with the same size for the
data relocation inode (we do it at prealloc_file_extent_cluster()).
We reserve space by calling btrfs_check_data_free_space(), which ends
up incrementing the data space_info's bytes_may_use counter, and
then call btrfs_prealloc_file_range() to allocate the extent, which
always decrements the bytes_may_use counter by the same amount.

The expectation is that writeback of the data relocation inode always
follows a NOCOW path, by writing into the preallocated extents. However,
when starting writeback we might end up falling back into the COW path,
because the block group that contains the preallocated extent was turned
into RO mode by a scrub running in parallel. The COW path then calls the
extent allocator which ends up calling btrfs_add_reserved_bytes(), and
this function decrements the bytes_may_use counter of the data space_info
object by an amount corresponding to the size of the allocated extent,
despite we haven't previously incremented it. When the counter currently
has a value smaller then the allocated extent we reset the counter to 0
and emit a warning, otherwise we just decrement it and slowly mess up
with this counter which is crucial for space reservation, the end result
can be granting reserved space to tasks when there isn't really enough
free space, and having the tasks fail later in critical places where
error handling consists of a transaction abort or hitting a BUG_ON().

Fix this by making sure that if we fallback to the COW path for a data
relocation inode, we increment the bytes_may_use counter of the data
space_info object. The COW path will then decrement it at
btrfs_add_reserved_bytes() on success or through its error handling part
by a call to extent_clear_unlock_delalloc() (which ends up calling
btrfs_clear_delalloc_extent() that does the decrement operation) in case
of an error.

Test case btrfs/061 from fstests could sporadically trigger this.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix data block group relocation failure due to concurrent scrub
Filipe Manana [Mon, 8 Jun 2020 12:32:55 +0000 (13:32 +0100)]
btrfs: fix data block group relocation failure due to concurrent scrub

When running relocation of a data block group while scrub is running in
parallel, it is possible that the relocation will fail and abort the
current transaction with an -EINVAL error:

   [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents
   [134243.999871] ------------[ cut here ]------------
   [134244.000741] BTRFS: Transaction aborted (error -22)
   [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs]
   [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
   [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
   [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
   [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs]
   [134244.017151] Code: 48 c7 c7 (...)
   [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286
   [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000
   [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001
   [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001
   [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08
   [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000
   [134244.028024] FS:  00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000
   [134244.029491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0
   [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [134244.034484] Call Trace:
   [134244.034984]  btrfs_cow_block+0x12b/0x2b0 [btrfs]
   [134244.035859]  do_relocation+0x30b/0x790 [btrfs]
   [134244.036681]  ? do_raw_spin_unlock+0x49/0xc0
   [134244.037460]  ? _raw_spin_unlock+0x29/0x40
   [134244.038235]  relocate_tree_blocks+0x37b/0x730 [btrfs]
   [134244.039245]  relocate_block_group+0x388/0x770 [btrfs]
   [134244.040228]  btrfs_relocate_block_group+0x161/0x2e0 [btrfs]
   [134244.041323]  btrfs_relocate_chunk+0x36/0x110 [btrfs]
   [134244.041345]  btrfs_balance+0xc06/0x1860 [btrfs]
   [134244.043382]  ? btrfs_ioctl_balance+0x27c/0x310 [btrfs]
   [134244.045586]  btrfs_ioctl_balance+0x1ed/0x310 [btrfs]
   [134244.045611]  btrfs_ioctl+0x1880/0x3760 [btrfs]
   [134244.049043]  ? do_raw_spin_unlock+0x49/0xc0
   [134244.049838]  ? _raw_spin_unlock+0x29/0x40
   [134244.050587]  ? __handle_mm_fault+0x11b3/0x14b0
   [134244.051417]  ? ksys_ioctl+0x92/0xb0
   [134244.052070]  ksys_ioctl+0x92/0xb0
   [134244.052701]  ? trace_hardirqs_off_thunk+0x1a/0x1c
   [134244.053511]  __x64_sys_ioctl+0x16/0x20
   [134244.054206]  do_syscall_64+0x5c/0x280
   [134244.054891]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   [134244.055819] RIP: 0033:0x7f29b51c9dd7
   [134244.056491] Code: 00 00 00 (...)
   [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
   [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7
   [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003
   [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000
   [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a
   [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0
   [134244.067626] irq event stamp: 0
   [134244.068202] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
   [134244.069351] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
   [134244.070909] softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
   [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0
   [134244.073432] ---[ end trace bd7c03622e0b0a99 ]---

The -EINVAL error comes from the following chain of function calls:

  __btrfs_cow_block() <-- aborts the transaction
    btrfs_reloc_cow_block()
      replace_file_extents()
        get_new_location() <-- returns -EINVAL

When relocating a data block group, for each allocated extent of the block
group, we preallocate another extent (at prealloc_file_extent_cluster()),
associated with the data relocation inode, and then dirty all its pages.
These preallocated extents have, and must have, the same size that extents
from the data block group being relocated have.

Later before we start the relocation stage that updates pointers (bytenr
field of file extent items) to point to the the new extents, we trigger
writeback for the data relocation inode. The expectation is that writeback
will write the pages to the previously preallocated extents, that it
follows the NOCOW path. That is generally the case, however, if a scrub
is running it may have turned the block group that contains those extents
into RO mode, in which case writeback falls back to the COW path.

However in the COW path instead of allocating exactly one extent with the
expected size, the allocator may end up allocating several smaller extents
due to free space fragmentation - because we tell it at cow_file_range()
that the minimum allocation size can match the filesystem's sector size.
This later breaks the relocation's expectation that an extent associated
to a file extent item in the data relocation inode has the same size as
the respective extent pointed by a file extent item in another tree - in
this case the extent to which the relocation inode poins to is smaller,
causing relocation.c:get_new_location() to return -EINVAL.

For example, if we are relocating a data block group X that has a logical
address of X and the block group has an extent allocated at the logical
address X + 128KiB with a size of 64KiB:

1) At prealloc_file_extent_cluster() we allocate an extent for the data
   relocation inode with a size of 64KiB and associate it to the file
   offset 128KiB (X + 128KiB - X) of the data relocation inode. This
   preallocated extent was allocated at block group Z;

2) A scrub running in parallel turns block group Z into RO mode and
   starts scrubing its extents;

3) Relocation triggers writeback for the data relocation inode;

4) When running delalloc (btrfs_run_delalloc_range()), we try first the
   NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC
   set in its flags. However, because block group Z is in RO mode, the
   NOCOW path (run_delalloc_nocow()) falls back into the COW path, by
   calling cow_file_range();

5) At cow_file_range(), in the first iteration of the while loop we call
   btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum
   allocation size of 4KiB (fs_info->sectorsize). Due to free space
   fragmentation, btrfs_reserve_extent() ends up allocating two extents
   of 32KiB each, each one on a different iteration of that while loop;

6) Writeback of the data relocation inode completes;

7) Relocation proceeds and ends up at relocation.c:replace_file_extents(),
   with a leaf which has a file extent item that points to the data extent
   from block group X, that has a logical address (bytenr) of X + 128KiB
   and a size of 64KiB. Then it calls get_new_location(), which does a
   lookup in the data relocation tree for a file extent item starting at
   offset 128KiB (X + 128KiB - X) and belonging to the data relocation
   inode. It finds a corresponding file extent item, however that item
   points to an extent that has a size of 32KiB, which doesn't match the
   expected size of 64KiB, resuling in -EINVAL being returned from this
   function and propagated up to __btrfs_cow_block(), which aborts the
   current transaction.

To fix this make sure that at cow_file_range() when we call the allocator
we pass it a minimum allocation size corresponding the desired extent size
if the inode belongs to the data relocation tree, otherwise pass it the
filesystem's sector size as the minimum allocation size.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix race between block group removal and block group creation
Filipe Manana [Mon, 1 Jun 2020 18:12:19 +0000 (19:12 +0100)]
btrfs: fix race between block group removal and block group creation

There is a race between block group removal and block group creation
when the removal is completed by a task running fitrim or scrub. When
this happens we end up failing the block group creation with an error
-EEXIST since we attempt to insert a duplicate block group item key
in the extent tree. That results in a transaction abort.

The race happens like this:

1) Task A is doing a fitrim, and at btrfs_trim_block_group() it freezes
   block group X with btrfs_freeze_block_group() (until very recently
   that was named btrfs_get_block_group_trimming());

2) Task B starts removing block group X, either because it's now unused
   or due to relocation for example. So at btrfs_remove_block_group(),
   while holding the chunk mutex and the block group's lock, it sets
   the 'removed' flag of the block group and it sets the local variable
   'remove_em' to false, because the block group is currently frozen
   (its 'frozen' counter is > 0, until very recently this counter was
   named 'trimming');

3) Task B unlocks the block group and the chunk mutex;

4) Task A is done trimming the block group and unfreezes the block group
   by calling btrfs_unfreeze_block_group() (until very recently this was
   named btrfs_put_block_group_trimming()). In this function we lock the
   block group and set the local variable 'cleanup' to true because we
   were able to decrement the block group's 'frozen' counter down to 0 and
   the flag 'removed' is set in the block group.

   Since 'cleanup' is set to true, it locks the chunk mutex and removes
   the extent mapping representing the block group from the mapping tree;

5) Task C allocates a new block group Y and it picks up the logical address
   that block group X had as the logical address for Y, because X was the
   block group with the highest logical address and now the second block
   group with the highest logical address, the last in the fs mapping tree,
   ends at an offset corresponding to block group X's logical address (this
   logical address selection is done at volumes.c:find_next_chunk()).

   At this point the new block group Y does not have yet its item added
   to the extent tree (nor the corresponding device extent items and
   chunk item in the device and chunk trees). The new group Y is added to
   the list of pending block groups in the transaction handle;

6) Before task B proceeds to removing the block group item for block
   group X from the extent tree, which has a key matching:

   (X logical offset, BTRFS_BLOCK_GROUP_ITEM_KEY, length)

   task C while ending its transaction handle calls
   btrfs_create_pending_block_groups(), which finds block group Y and
   tries to insert the block group item for Y into the exten tree, which
   fails with -EEXIST since logical offset is the same that X had and
   task B hasn't yet deleted the key from the extent tree.
   This failure results in a transaction abort, producing a stack like
   the following:

------------[ cut here ]------------
 BTRFS: Transaction aborted (error -17)
 WARNING: CPU: 2 PID: 19736 at fs/btrfs/block-group.c:2074 btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
 Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
 CPU: 2 PID: 19736 Comm: fsstress Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
 RIP: 0010:btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
 Code: ff ff ff 48 8b 55 50 f0 48 (...)
 RSP: 0018:ffffa4160a1c7d58 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff961581909d98 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: ffffffffb3d63990 RDI: 0000000000000001
 RBP: ffff9614f3356a58 R08: 0000000000000000 R09: 0000000000000001
 R10: ffff9615b65b0040 R11: 0000000000000000 R12: ffff961581909c10
 R13: ffff9615b0c32000 R14: ffff9614f3356ab0 R15: ffff9614be779000
 FS:  00007f2ce2841e80(0000) GS:ffff9615bae00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000555f18780000 CR3: 0000000131d34005 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  btrfs_start_dirty_block_groups+0x398/0x4e0 [btrfs]
  btrfs_commit_transaction+0xd0/0xc50 [btrfs]
  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
  ? __ia32_sys_fdatasync+0x20/0x20
  iterate_supers+0xdb/0x180
  ksys_sync+0x60/0xb0
  __ia32_sys_sync+0xa/0x10
  do_syscall_64+0x5c/0x280
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f2ce1d4d5b7
 Code: 83 c4 08 48 3d 01 (...)
 RSP: 002b:00007ffd8b558c58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
 RAX: ffffffffffffffda RBX: 000000000000002c RCX: 00007f2ce1d4d5b7
 RDX: 00000000ffffffff RSI: 00000000186ba07b RDI: 000000000000002c
 RBP: 0000555f17b9e520 R08: 0000000000000012 R09: 000000000000ce00
 R10: 0000000000000078 R11: 0000000000000202 R12: 0000000000000032
 R13: 0000000051eb851f R14: 00007ffd8b558cd0 R15: 0000555f1798ec20
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
 softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 ---[ end trace bd7c03622e0b0a9c ]---

Fix this simply by making btrfs_remove_block_group() remove the block
group's item from the extent tree before it flags the block group as
removed. Also make the free space deletion from the free space tree
before flagging the block group as removed, to avoid a similar race
with adding and removing free space entries for the free space tree.

Fixes: 04216820fe83d5 ("Btrfs: fix race between fs trimming and block group remove/allocation")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agobtrfs: fix a block group ref counter leak after failure to remove block group
Filipe Manana [Mon, 1 Jun 2020 18:12:06 +0000 (19:12 +0100)]
btrfs: fix a block group ref counter leak after failure to remove block group

When removing a block group, if we fail to delete the block group's item
from the extent tree, we jump to the 'out' label and end up decrementing
the block group's reference count once only (by 1), resulting in a counter
leak because the block group at that point was already removed from the
block group cache rbtree - so we have to decrement the reference count
twice, once for the rbtree and once for our lookup at the start of the
function.

There is a second bug where if removing the free space tree entries (the
call to remove_block_group_free_space()) fails we end up jumping to the
'out_put_group' label but end up decrementing the reference count only
once, when we should have done it twice, since we have already removed
the block group from the block group cache rbtree. This happens because
the reference count decrement for the rbtree reference happens after
attempting to remove the free space tree entries, which is far away from
the place where we remove the block group from the rbtree.

To make things less error prone, decrement the reference count for the
rbtree immediately after removing the block group from it. This also
eleminates the need for two different exit labels on error, renaming
'out_put_label' to just 'out' and removing the old 'out'.

Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agoselftests/ftrace: Support ":README" suffix for requires
Masami Hiramatsu [Wed, 3 Jun 2020 02:41:09 +0000 (11:41 +0900)]
selftests/ftrace: Support ":README" suffix for requires

Add ":README" suffix support for the requires list, so that
the testcase can list up the required string for README file
to the requires list.

Note that the required string is treated as a fixed string,
instead of regular expression. Also, the testcase can specify
a string containing spaces with quotes. E.g.

# requires: "place: [<module>:]<symbol>":README

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoselftests/ftrace: Support ":tracer" suffix for requires
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:59 +0000 (11:40 +0900)]
selftests/ftrace: Support ":tracer" suffix for requires

Add ":tracer" suffix support for the requires list, so that
the testcase can list up the required tracer (e.g. function)
to the requires list.

For example, if the testcase requires function_graph tracer,
it can write requires list as below instead of checking
available_tracers.

# requires: function_graph:tracer

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoselftests/ftrace: Convert check_filter_file() with requires list
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:49 +0000 (11:40 +0900)]
selftests/ftrace: Convert check_filter_file() with requires list

Since check_filter_file() is basically checking the filter
tracefs file, we can convert it into requires list.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoselftests/ftrace: Convert required interface checks into requires list
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:38 +0000 (11:40 +0900)]
selftests/ftrace: Convert required interface checks into requires list

Convert the required tracefs interface checking code with
requires: list.

Fixed merge conflicts in trigger-hist.tc and trigger-trace-marker-hist.tc
Shuah Khan <skhan@linuxfoundation.org>

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoblock: Fix use-after-free in blkdev_get()
Jason Yan [Tue, 16 Jun 2020 12:16:55 +0000 (20:16 +0800)]
block: Fix use-after-free in blkdev_get()

In blkdev_get() we call __blkdev_get() to do some internal jobs and if
there is some errors in __blkdev_get(), the bdput() is called which
means we have released the refcount of the bdev (actually the refcount of
the bdev inode). This means we cannot access bdev after that point. But
acctually bdev is still accessed in blkdev_get() after calling
__blkdev_get(). This results in use-after-free if the refcount is the
last one we released in __blkdev_get(). Let's take a look at the
following scenerio:

  CPU0            CPU1                    CPU2
blkdev_open     blkdev_open           Remove disk
                  bd_acquire
  blkdev_get
    __blkdev_get      del_gendisk
bdev_unhash_inode
  bd_acquire          bdev_get_gendisk
    bd_forget           failed because of unhashed
  bdput
              bdput (the last one)
        bdev_evict_inode

       access bdev => use after free

[  459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[  459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
[  459.352347]
[  459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[  459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  459.354947] Call Trace:
[  459.355337]  dump_stack+0x111/0x19e
[  459.355879]  ? __lock_acquire+0x24c1/0x31b0
[  459.356523]  print_address_description+0x60/0x223
[  459.357248]  ? __lock_acquire+0x24c1/0x31b0
[  459.357887]  kasan_report.cold+0xae/0x2d8
[  459.358503]  __lock_acquire+0x24c1/0x31b0
[  459.359120]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.359784]  ? lockdep_hardirqs_on+0x37b/0x580
[  459.360465]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.361123]  ? finish_task_switch+0x125/0x600
[  459.361812]  ? finish_task_switch+0xee/0x600
[  459.362471]  ? mark_held_locks+0xf0/0xf0
[  459.363108]  ? __schedule+0x96f/0x21d0
[  459.363716]  lock_acquire+0x111/0x320
[  459.364285]  ? blkdev_get+0xce/0xbe0
[  459.364846]  ? blkdev_get+0xce/0xbe0
[  459.365390]  __mutex_lock+0xf9/0x12a0
[  459.365948]  ? blkdev_get+0xce/0xbe0
[  459.366493]  ? bdev_evict_inode+0x1f0/0x1f0
[  459.367130]  ? blkdev_get+0xce/0xbe0
[  459.367678]  ? destroy_inode+0xbc/0x110
[  459.368261]  ? mutex_trylock+0x1a0/0x1a0
[  459.368867]  ? __blkdev_get+0x3e6/0x1280
[  459.369463]  ? bdev_disk_changed+0x1d0/0x1d0
[  459.370114]  ? blkdev_get+0xce/0xbe0
[  459.370656]  blkdev_get+0xce/0xbe0
[  459.371178]  ? find_held_lock+0x2c/0x110
[  459.371774]  ? __blkdev_get+0x1280/0x1280
[  459.372383]  ? lock_downgrade+0x680/0x680
[  459.373002]  ? lock_acquire+0x111/0x320
[  459.373587]  ? bd_acquire+0x21/0x2c0
[  459.374134]  ? do_raw_spin_unlock+0x4f/0x250
[  459.374780]  blkdev_open+0x202/0x290
[  459.375325]  do_dentry_open+0x49e/0x1050
[  459.375924]  ? blkdev_get_by_dev+0x70/0x70
[  459.376543]  ? __x64_sys_fchdir+0x1f0/0x1f0
[  459.377192]  ? inode_permission+0xbe/0x3a0
[  459.377818]  path_openat+0x148c/0x3f50
[  459.378392]  ? kmem_cache_alloc+0xd5/0x280
[  459.379016]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.379802]  ? path_lookupat.isra.0+0x900/0x900
[  459.380489]  ? __lock_is_held+0xad/0x140
[  459.381093]  do_filp_open+0x1a1/0x280
[  459.381654]  ? may_open_dev+0xf0/0xf0
[  459.382214]  ? find_held_lock+0x2c/0x110
[  459.382816]  ? lock_downgrade+0x680/0x680
[  459.383425]  ? __lock_is_held+0xad/0x140
[  459.384024]  ? do_raw_spin_unlock+0x4f/0x250
[  459.384668]  ? _raw_spin_unlock+0x1f/0x30
[  459.385280]  ? __alloc_fd+0x448/0x560
[  459.385841]  do_sys_open+0x3c3/0x500
[  459.386386]  ? filp_open+0x70/0x70
[  459.386911]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  459.387610]  ? trace_hardirqs_off_caller+0x55/0x1c0
[  459.388342]  ? do_syscall_64+0x1a/0x520
[  459.388930]  do_syscall_64+0xc3/0x520
[  459.389490]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.390248] RIP: 0033:0x416211
[  459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
   05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
      01
[  459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[  459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
[  459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
[  459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
[  459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
[  459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
[  459.400168]
[  459.400430] Allocated by task 20132:
[  459.401038]  kasan_kmalloc+0xbf/0xe0
[  459.401652]  kmem_cache_alloc+0xd5/0x280
[  459.402330]  bdev_alloc_inode+0x18/0x40
[  459.402970]  alloc_inode+0x5f/0x180
[  459.403510]  iget5_locked+0x57/0xd0
[  459.404095]  bdget+0x94/0x4e0
[  459.404607]  bd_acquire+0xfa/0x2c0
[  459.405113]  blkdev_open+0x110/0x290
[  459.405702]  do_dentry_open+0x49e/0x1050
[  459.406340]  path_openat+0x148c/0x3f50
[  459.406926]  do_filp_open+0x1a1/0x280
[  459.407471]  do_sys_open+0x3c3/0x500
[  459.408010]  do_syscall_64+0xc3/0x520
[  459.408572]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.409415]
[  459.409679] Freed by task 1262:
[  459.410212]  __kasan_slab_free+0x129/0x170
[  459.410919]  kmem_cache_free+0xb2/0x2a0
[  459.411564]  rcu_process_callbacks+0xbb2/0x2320
[  459.412318]  __do_softirq+0x225/0x8ac

Fix this by delaying bdput() to the end of blkdev_get() which means we
have finished accessing bdev.

Fixes: 77ea887e433a ("implement in-kernel gendisk events handling")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoarm64: pgtable: Clear the GP bit for non-executable kernel pages
Will Deacon [Mon, 15 Jun 2020 15:27:43 +0000 (16:27 +0100)]
arm64: pgtable: Clear the GP bit for non-executable kernel pages

Commit cca98e9f8b5e ("mm: enforce that vmap can't map pages executable")
introduced 'pgprot_nx(prot)' for arm64 but collided silently with the
BTI support during the merge window, which endeavours to clear the GP
bit for non-executable kernel mappings in set_memory_nx().

For consistency between the two APIs, clear the GP bit in pgprot_nx().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200615154642.3579-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoafs: afs_vnode_commit_status() doesn't need to check the RPC error
David Howells [Mon, 15 Jun 2020 23:52:30 +0000 (00:52 +0100)]
afs: afs_vnode_commit_status() doesn't need to check the RPC error

afs_vnode_commit_status() is only ever called if op->error is 0, so remove
the op->error checks from the function.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
4 years agoafs: Fix use of afs_check_for_remote_deletion()
David Howells [Mon, 15 Jun 2020 23:34:09 +0000 (00:34 +0100)]
afs: Fix use of afs_check_for_remote_deletion()

afs_check_for_remote_deletion() checks to see if error ENOENT is returned
by the server in response to an operation and, if so, marks the primary
vnode as having been deleted as the FID is no longer valid.

However, it's being called from the operation success functions, where no
abort has happened - and if an inline abort is recorded, it's handled by
afs_vnode_commit_status().

Fix this by actually calling the operation aborted method if provided and
having that point to afs_check_for_remote_deletion().

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
4 years agoafs: Remove afs_operation::abort_code
David Howells [Mon, 15 Jun 2020 23:25:56 +0000 (00:25 +0100)]
afs: Remove afs_operation::abort_code

Remove afs_operation::abort_code as it's read but never set.  Use
ac.abort_code instead.

Signed-off-by: David Howells <dhowells@redhat.com>
4 years agoafs: Fix yfs_fs_fetch_status() to honour vnode selector
David Howells [Mon, 15 Jun 2020 23:23:12 +0000 (00:23 +0100)]
afs: Fix yfs_fs_fetch_status() to honour vnode selector

Fix yfs_fs_fetch_status() to honour the vnode selector in
op->fetch_status.which as does afs_fs_fetch_status() that allows
afs_do_lookup() to use this as an alternative to the InlineBulkStatus RPC
call if not implemented by the server.

This doesn't matter in the current code as YFS servers always implement
InlineBulkStatus, but a subsequent will call it on YFS servers too in some
circumstances.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
4 years agoafs: Remove yfs_fs_fetch_file_status() as it's not used
David Howells [Mon, 15 Jun 2020 23:18:09 +0000 (00:18 +0100)]
afs: Remove yfs_fs_fetch_file_status() as it's not used

Remove yfs_fs_fetch_file_status() as it's no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
4 years agoselftests/ftrace: Add "requires:" list support
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:28 +0000 (11:40 +0900)]
selftests/ftrace: Add "requires:" list support

Introduce "requires:" list to check required ftrace interface
for each test. This will simplify the interface checking code
and unify the error message. Another good point is, it can
skip the ftrace initializing.

Note that this requires list must be written as a shell
comment.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoselftests/ftrace: Return unsupported for the unconfigured features
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:19 +0000 (11:40 +0900)]
selftests/ftrace: Return unsupported for the unconfigured features

As same as other test cases, return unsupported if kprobe_events
or argument access feature are not found.

There can be a new arch which does not port those features yet,
and an older kernel which doesn't support it.
Those can not enable the features.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agoselftests/ftrace: Allow ":" in description
Masami Hiramatsu [Wed, 3 Jun 2020 02:40:10 +0000 (11:40 +0900)]
selftests/ftrace: Allow ":" in description

Allow ":" in the description line. Currently if there is ":"
in the test description line, the description is cut at that
point, but that was unintended.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4 years agolibceph: don't omit used_replica in target_copy()
Ilya Dryomov [Tue, 9 Jun 2020 09:59:08 +0000 (11:59 +0200)]
libceph: don't omit used_replica in target_copy()

Currently target_copy() is used only for sending linger pings, so
this doesn't come up, but generally omitting used_replica can hang
the client as we wouldn't notice the acting set change (legacy_change
in calc_target()) or trigger a warning in handle_reply().

Fixes: 117d96a04f00 ("libceph: support for balanced and localized reads")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
4 years agolibceph: don't omit recovery_deletes in target_copy()
Ilya Dryomov [Tue, 9 Jun 2020 09:57:56 +0000 (11:57 +0200)]
libceph: don't omit recovery_deletes in target_copy()

Currently target_copy() is used only for sending linger pings, so
this doesn't come up, but generally omitting recovery_deletes can
result in unneeded resends (force_resend in calc_target()).

Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
4 years agolibceph: move away from global osd_req_flags
Ilya Dryomov [Thu, 4 Jun 2020 09:12:34 +0000 (11:12 +0200)]
libceph: move away from global osd_req_flags

osd_req_flags is overly general and doesn't suit its only user
(read_from_replica option) well:

- applying osd_req_flags in account_request() affects all OSD
  requests, including linger (i.e. watch and notify).  However,
  linger requests should always go to the primary even though
  some of them are reads (e.g. notify has side effects but it
  is a read because it doesn't result in mutation on the OSDs).

- calls to class methods that are reads are allowed to go to
  the replica, but most such calls issued for "rbd map" and/or
  exclusive lock transitions are requested to be resent to the
  primary via EAGAIN, doubling the latency.

Get rid of global osd_req_flags and set read_from_replica flag
only on specific OSD requests instead.

Fixes: 8ad44d5e0d1e ("libceph: read_from_replica option")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
4 years agos390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES
Heiko Carstens [Wed, 10 Jun 2020 08:36:05 +0000 (10:36 +0200)]
s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES

Qian Cai reported:
"""
When NUMA=n and nr_node_ids=2, in apply_wqattrs_prepare(), it has,

for_each_node(node) {
        if (wq_calc_node_cpumask(...

where it will trigger a booting warning,

WARNING: workqueue cpumask: online intersect > possible intersect

because it found 2 nodes and wq_numa_possible_cpumask[1] is an empty
cpumask.
"""

Let NODES_SHIFT depend on NEED_MULTIPLE_NODES like it is done
on other architectures in order to fix this.

Fixes: 701dc81e7412 ("s390/mm: remove fake numa support")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/vdso: fix vDSO clock_getres()
Vincenzo Frascino [Tue, 24 Mar 2020 12:10:27 +0000 (12:10 +0000)]
s390/vdso: fix vDSO clock_getres()

clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
    sec = 0;
    ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

Fix the s390 vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.

Link: https://lkml.kernel.org/r/20200324121027.21665-1-vincenzo.frascino@arm.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[heiko.carstens@de.ibm.com: use llgf for proper zero extension]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/vdso: Use $(LD) instead of $(CC) to link vDSO
Nathan Chancellor [Tue, 2 Jun 2020 19:25:24 +0000 (12:25 -0700)]
s390/vdso: Use $(LD) instead of $(CC) to link vDSO

Currently, the VDSO is being linked through $(CC). This does not match
how the rest of the kernel links objects, which is through the $(LD)
variable.

When clang is built in a default configuration, it first attempts to use
the target triple's default linker, which is just ld. However, the user
can override this through the CLANG_DEFAULT_LINKER cmake define so that
clang uses another linker by default, such as LLVM's own linker, ld.lld.
This can be useful to get more optimized links across various different
projects.

However, this is problematic for the s390 vDSO because ld.lld does not
have any s390 emulatiom support:

https://github.com/llvm/llvm-project/blob/llvmorg-10.0.1-rc1/lld/ELF/Driver.cpp#L132-L150

Thus, if a user is using a toolchain with ld.lld as the default, they
will see an error, even if they have specified ld.bfd through the LD
make variable:

$ make -j"$(nproc)" -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- LLVM=1 \
                       LD=s390x-linux-gnu-ld \
                       defconfig arch/s390/kernel/vdso64/
ld.lld: error: unknown emulation: elf64_s390
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

Normally, '-fuse-ld=bfd' could be used to get around this; however, this
can be fragile, depending on paths and variable naming. The cleaner
solution for the kernel is to take advantage of the fact that $(LD) can
be invoked directly, which bypasses the heuristics of $(CC) and respects
the user's choice. Similar changes have been done for ARM, ARM64, and
MIPS.

Link: https://lkml.kernel.org/r/20200602192523.32758-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1041
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
[heiko.carstens@de.ibm.com: add --build-id flag]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/protvirt: use scnprintf() instead of snprintf()
Chen Zhou [Sat, 9 May 2020 08:56:08 +0000 (16:56 +0800)]
s390/protvirt: use scnprintf() instead of snprintf()

snprintf() returns the number of bytes that would be written,
which may be greater than the the actual length to be written.

uv_query_facilities() should return the number of bytes printed
into the buffer. This is the return value of scnprintf().
The other functions are the same.

Link: https://lkml.kernel.org/r/20200509085608.41061-4-chenzhou10@huawei.com
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390: use scnprintf() in sys_##_prefix##_##_name##_show
Chen Zhou [Sat, 9 May 2020 08:56:07 +0000 (16:56 +0800)]
s390: use scnprintf() in sys_##_prefix##_##_name##_show

snprintf() returns the number of bytes that would be written,
which may be greater than the the actual length to be written.

show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().

Link: https://lkml.kernel.org/r/20200509085608.41061-3-chenzhou10@huawei.com
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/crypto: use scnprintf() instead of snprintf()
Chen Zhou [Sat, 9 May 2020 08:56:06 +0000 (16:56 +0800)]
s390/crypto: use scnprintf() instead of snprintf()

snprintf() returns the number of bytes that would be written,
which may be greater than the the actual length to be written.

show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().

Link: https://lkml.kernel.org/r/20200509085608.41061-2-chenzhou10@huawei.com
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/zcrypt: use kzalloc
Zou Wei [Tue, 21 Apr 2020 12:35:48 +0000 (20:35 +0800)]
s390/zcrypt: use kzalloc

This patch fixes below warning reported by coccicheck

drivers/s390/crypto/zcrypt_ep11misc.c:198:8-15: WARNING:
kzalloc should be used for cprb, instead of kmalloc/memset

Link: https://lkml.kernel.org/r/1587472548-105240-1-git-send-email-zou_wei@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/virtio: remove unused pm callbacks
Cornelia Huck [Tue, 26 May 2020 09:36:29 +0000 (11:36 +0200)]
s390/virtio: remove unused pm callbacks

Support for hibernation on s390 has been recently been removed with
commit 394216275c7d ("s390: remove broken hibernate / power management
support"), no need to keep unused code around.

Link: https://lkml.kernel.org/r/20200526093629.257649-1-cohuck@redhat.com
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/qdio: reduce SLSB writes during Input Queue processing
Julian Wiedmann [Tue, 21 Apr 2020 10:35:00 +0000 (11:35 +0100)]
s390/qdio: reduce SLSB writes during Input Queue processing

Streamline the processing of QDIO Input Queues, and remove some
intermittent SLSB updates (no deleting of old ACKs, no redundant
transitions through NOT_INIT).

Rather than counting ACKs, we now keep track of the whole batch of
SBALs that were completed during the current polling cycle.
Most completed SBALs stay in their initial state (ie. PRIMED or ERROR),
except that the most recent SBAL in each sub-run is ACKed for
IRQ reduction.

The only logic changes happen in inbound_handle_work(), the other
delta is just a renaming of the variables that track the SBAL batch.

Note that in particular we don't need to flip the _oldest_ SBAL to
an idle state (eg. NOT_INIT or ACKed) as a guard against catching our
own tail. Since get_inbound_buffer_frontier() will never scan more than
the remaining nr_buf_used SBALs, this scenario just doesn't occur.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agoselftests/seccomp: s390 shares the syscall and return value register
Sven Schnelle [Mon, 9 Mar 2020 15:56:53 +0000 (16:56 +0100)]
selftests/seccomp: s390 shares the syscall and return value register

s390 cannot set syscall number and reture code at the same time,
so set the appropriate flag to indicate it.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/ptrace: fix setting syscall number
Sven Schnelle [Mon, 9 Mar 2020 15:44:50 +0000 (16:44 +0100)]
s390/ptrace: fix setting syscall number

When strace wants to update the syscall number, it sets GPR2
to the desired number and updates the GPR via PTRACE_SETREGSET.
It doesn't update regs->int_code which would cause the old syscall
executed on syscall restart. As we cannot change the ptrace ABI and
don't have a field for the interruption code, check whether the tracee
is in a syscall and the last instruction was svc. In that case assume
that the tracer wants to update the syscall number and copy the GPR2
value to regs->int_code.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/ptrace: pass invalid syscall numbers to tracing
Sven Schnelle [Fri, 6 Mar 2020 12:19:34 +0000 (13:19 +0100)]
s390/ptrace: pass invalid syscall numbers to tracing

tracing expects to see invalid syscalls, so pass it through.
The syscall path in entry.S checks the syscall number before
looking up the handler, so it is still safe.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/ptrace: return -ENOSYS when invalid syscall is supplied
Sven Schnelle [Fri, 6 Mar 2020 12:18:31 +0000 (13:18 +0100)]
s390/ptrace: return -ENOSYS when invalid syscall is supplied

The current code returns the syscall number which an invalid
syscall number is supplied and tracing is enabled. This makes
the strace testsuite fail.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/seccomp: pass syscall arguments via seccomp_data
Sven Schnelle [Wed, 4 Mar 2020 15:07:34 +0000 (16:07 +0100)]
s390/seccomp: pass syscall arguments via seccomp_data

Use __secure_computing() and pass the register data via
seccomp_data so secure computing doesn't have to fetch it
again.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agos390/qdio: fine-tune SLSB update
Julian Wiedmann [Fri, 8 May 2020 15:00:21 +0000 (17:00 +0200)]
s390/qdio: fine-tune SLSB update

xchg() for a single-byte location assembles to a 4-byte Compare&Swap,
wrapped into a non-trivial amount of retry code that deals with
concurrent modifications to the unaffected bytes.

Change it to a simple byte-store, but preserve the memory ordering
semantics that the CS provided.
This simplifies the generated code for a hot path, and in theory also
allows us to amortize the memory barriers over multiple SLSB updates.

CC: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agodrm/i915/display: Fix the encoder type check
Vandita Kulkarni [Fri, 12 Jun 2020 08:22:37 +0000 (13:52 +0530)]
drm/i915/display: Fix the encoder type check

For all ddi, encoder->type holds output type as ddi,
assigning it to individual o/p types is no more valid.

Fixes: 362bfb995b78 ("drm/i915/tgl: Add DKL PHY vswing table for HDMI")
v2: Rebase, no functional change.

Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612082237.11886-1-vandita.kulkarni@intel.com
(cherry picked from commit 94641eb6c69682884abbecf22fe5b7c185af6a06)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/icl+: Fix hotplug interrupt disabling after storm detection
Imre Deak [Fri, 12 Jun 2020 12:17:31 +0000 (15:17 +0300)]
drm/i915/icl+: Fix hotplug interrupt disabling after storm detection

Atm, hotplug interrupts on TypeC ports are left enabled after detecting
an interrupt storm, fix this.

Reported-by: Kunal Joshi <kunal1.joshi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/351
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/1964
Cc: Kunal Joshi <kunal1.joshi@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612121731.19596-1-imre.deak@intel.com
(cherry picked from commit 587a87b9d7e94927edcdea018565bc1939381eb1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move gen4 GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 08:01:40 +0000 (09:01 +0100)]
drm/i915/gt: Move gen4 GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-6-chris@chris-wilson.co.uk
(cherry picked from commit 2bcefd0d263ab4a72f0d61921ae6b0dc81606551)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move ilk GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 08:01:39 +0000 (09:01 +0100)]
drm/i915/gt: Move ilk GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-5-chris@chris-wilson.co.uk
(cherry picked from commit 806a45c0838d253e306a6384057e851b65d11099)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move snb GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 08:01:38 +0000 (09:01 +0100)]
drm/i915/gt: Move snb GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-4-chris@chris-wilson.co.uk
(cherry picked from commit c3b93a943f2c9ee4a106db100a2fc3b2f126bfc5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move vlv GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 08:01:37 +0000 (09:01 +0100)]
drm/i915/gt: Move vlv GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-3-chris@chris-wilson.co.uk
(cherry picked from commit 7331c356b6d2d8a01422cacab27478a1dba9fa2a)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move ivb GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 08:01:36 +0000 (09:01 +0100)]
drm/i915/gt: Move ivb GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-2-chris@chris-wilson.co.uk
(cherry picked from commit 19f1f627b33385a2f0855cbc7d33d86d7f4a1e78)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Move hsw GT workarounds from init_clock_gating to workarounds
Chris Wilson [Thu, 11 Jun 2020 09:30:15 +0000 (10:30 +0100)]
drm/i915/gt: Move hsw GT workarounds from init_clock_gating to workarounds

Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

v2: Leave HSW_SCRATCH to set an explicit value, not or in our disable
bit.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2011
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611093015.11370-1-chris@chris-wilson.co.uk
(cherry picked from commit f93ec5fb563779bda4501890b1854526de58e0f1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/icl: Disable DIP on MST ports with the transcoder clock still on
Imre Deak [Tue, 9 Jun 2020 22:06:16 +0000 (01:06 +0300)]
drm/i915/icl: Disable DIP on MST ports with the transcoder clock still on

According to BSpec the Data Island Packet should be disabled after
disabling the transcoder, but before the transcoder clock select is set
to none. On an ICL RVP, daisy-chained MST config not following this
leads to a hang with the following MCE when disabling the output:

[  870.948739] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: ba00000011000402
[  871.019212] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81aca652> {poll_idle+0x92/0xb0}
[  871.019212] mce: [Hardware Error]: TSC 135a261fe61
[  871.019212] mce: [Hardware Error]: PROCESSOR 0:706e5 TIME 1591739604 SOCKET 0 APIC 0 microcode 20
[  871.019212] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[  871.019212] mce: [Hardware Error]: Machine check: Processor context corrupt
[  871.019212] Kernel panic - not syncing: Fatal machine check
[  871.019212] Kernel Offset: disabled

Bspec: 4287

Fixes: fa37a213275c ("drm/i915: Stop sending DP SDPs on ddi disable")
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609220616.6015-1-imre.deak@intel.com
(cherry picked from commit c980216dd224c52b5c70172753c209b653d84958)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Incrementally check for rewinding
Chris Wilson [Tue, 9 Jun 2020 15:17:23 +0000 (16:17 +0100)]
drm/i915/gt: Incrementally check for rewinding

In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.

However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.

Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
(cherry picked from commit e36ba817fa966f81fb1c8d16f3721b5a644b2fa9)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/tc: fix the reset of ln0
Khaled Almahallawy [Mon, 8 Jun 2020 20:45:37 +0000 (13:45 -0700)]
drm/i915/tc: fix the reset of ln0

Setting ln0 similar to ln1

Fixes: 3b51be4e4061b ("drm/i915/tc: Update DP_MODE programming")
Cc: <stable@vger.kernel.org> # v5.5+
Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200608204537.28468-1-khaled.almahallawy@intel.com
(cherry picked from commit 4f72a8ee819d57d7329d88f487a2fc9b45153177)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/gt: Prevent timeslicing into unpreemptable requests
Chris Wilson [Wed, 27 May 2020 16:24:18 +0000 (17:24 +0100)]
drm/i915/gt: Prevent timeslicing into unpreemptable requests

We have a I915_REQUEST_NOPREEMPT flag that we set when we must prevent
the HW from preempting during the course of this request. We need to
honour this flag and protect the HW even if we have a heartbeat request,
or other maximum priority barrier, pending. As such, restrict the
timeslicing check to avoid preempting into the topmost priority band,
leaving the unpreemptable requests in blissful peace running
uninterrupted on the HW.

v2: Set the I915_PRIORITY_BARRIER to be less than
I915_PRIORITY_UNPREEMPTABLE so that we never submit a request
(heartbeat or barrier) that can legitimately preempt the current
non-premptable request.

Fixes: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527162418.24755-1-chris@chris-wilson.co.uk
(cherry picked from commit b72f02d78e4f257761ed003444ae52083f962076)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agodrm/i915/selftests: Restore to default heartbeat
Chris Wilson [Tue, 19 May 2020 06:31:14 +0000 (07:31 +0100)]
drm/i915/selftests: Restore to default heartbeat

Since we temporarily disable the heartbeat and restore back to the
default value, we can use the stored defaults on the engine and avoid
using a local.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519063123.20673-3-chris@chris-wilson.co.uk
(cherry picked from commit 3a230a554dbbc6cd5016cf1b56ee77cfcd48c7d8)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 years agomfd: mt6360: Fix register driver NULL pointer by adding driver name
Gene Chen [Mon, 8 Jun 2020 09:38:45 +0000 (17:38 +0800)]
mfd: mt6360: Fix register driver NULL pointer by adding driver name

The driver name was accidentally removed when .probe() by was replaced
by .probe_new() during an early patch review.

[  121.243012] EAX: c2a8bc64 EBX: 00000000 ECX: 00000000 EDX: 00000000
[  121.243012] ESI: c2a8bc79 EDI: 00000000 EBP: e54bdea8 ESP: e54bdea0
[  121.243012] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010286
[  121.243012] CR0: 80050033 CR2: 00000000 CR3: 02ec3000 CR4: 000006b0
[  121.243012] Call Trace:
[  121.243012]  kset_find_obj+0x3d/0xc0
[  121.243012]  driver_find+0x16/0x40
[  121.243012]  driver_register+0x49/0x100
[  121.243012]  ? i2c_for_each_dev+0x39/0x50
[  121.243012]  ? __process_new_adapter+0x20/0x20
[  121.243012]  ? cht_wc_driver_init+0x11/0x11
[  121.243012]  i2c_register_driver+0x30/0x80
[  121.243012]  ? intel_lpss_pci_driver_init+0x16/0x16
[  121.243012]  mt6360_pmu_driver_init+0xf/0x11
[  121.243012]  do_one_initcall+0x33/0x1a0
[  121.243012]  ? parse_args+0x1eb/0x3d0
[  121.243012]  ? __might_sleep+0x31/0x90
[  121.243012]  ? kernel_init_freeable+0x10a/0x17f
[  121.243012]  kernel_init_freeable+0x12c/0x17f
[  121.243012]  ? rest_init+0x110/0x110
[  121.243012]  kernel_init+0xb/0x100
[  121.243012]  ? schedule_tail_wrapper+0x9/0xc
[  121.243012]  ret_from_fork+0x19/0x24
[  121.243012] Modules linked in:
[  121.243012] CR2: 0000000000000000
[  121.243012] random: get_random_bytes called from init_oops_id+0x3a/0x40 with crng_init=0
[  121.243012] ---[ end trace 38a803400f1a2bee ]---
[  121.243012] EIP: strcmp+0x11/0x30

Fixes: 7edd363421dab ("mfd: Add support for PMIC MT6360")
Signed-off-by: Gene Chen <gene_chen@richtek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@kernel.org>
[Lee: Taking the opportunity to fix the compatible string too 's/_/-/']
Signed-off-by: Lee Jones <lee.jones@linaro.org>
4 years agopinctrl: mcp23s08: Split to three parts: fix ptr_ret.cocci warnings
kernel test robot [Mon, 8 Jun 2020 01:02:53 +0000 (09:02 +0800)]
pinctrl: mcp23s08: Split to three parts: fix ptr_ret.cocci warnings

drivers/pinctrl/pinctrl-mcp23s08_spi.c:129:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 0f04a81784fe ("pinctrl: mcp23s08: Split to three parts: core, I²C, SPI")
Signed-off-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200608010253.GA79576@44f7ab9e8d59
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
4 years agopinctrl: tegra: Use noirq suspend/resume callbacks
Vidya Sagar [Thu, 4 Jun 2020 17:49:35 +0000 (23:19 +0530)]
pinctrl: tegra: Use noirq suspend/resume callbacks

Use noirq suspend/resume callbacks as other drivers which implement
noirq suspend/resume callbacks (Ex:- PCIe) depend on pinctrl driver to
configure the signals used by their respective devices in the noirq phase.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20200604174935.26560-1-vidyas@nvidia.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
4 years agopinctrl: qcom: spmi-gpio: fix warning about irq chip reusage
Dmitry Baryshkov [Thu, 4 Jun 2020 00:28:17 +0000 (03:28 +0300)]
pinctrl: qcom: spmi-gpio: fix warning about irq chip reusage

Fix the following warnings caused by reusage of the same irq_chip
instance for all spmi-gpio gpio_irq_chip instances. Instead embed
irq_chip into pmic_gpio_state struct.

gpio gpiochip2: (c440000.qcom,spmi:pmic@2:gpio@c000): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip3: (c440000.qcom,spmi:pmic@4:gpio@c000): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip4: (c440000.qcom,spmi:pmic@a:gpio@c000): detected irqchip that is shared with multiple gpiochips: please fix the driver.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200604002817.667160-1-dmitry.baryshkov@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
4 years agow1: Replace zero-length array with flexible-array
Gustavo A. R. Silva [Thu, 28 May 2020 14:35:11 +0000 (09:35 -0500)]
w1: Replace zero-length array with flexible-array

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agotracing/probe: Replace zero-length array with flexible-array
Gustavo A. R. Silva [Thu, 28 May 2020 14:35:11 +0000 (09:35 -0500)]
tracing/probe: Replace zero-length array with flexible-array

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agosoc: ti: Replace zero-length array with flexible-array
Gustavo A. R. Silva [Thu, 28 May 2020 14:35:11 +0000 (09:35 -0500)]
soc: ti: Replace zero-length array with flexible-array

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agotifm: Replace zero-length array with flexible-array
Gustavo A. R. Silva [Thu, 28 May 2020 14:35:11 +0000 (09:35 -0500)]
tifm: Replace zero-length array with flexible-array

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>