]> www.infradead.org Git - nvme.git/log
nvme.git
4 years agoperf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events
Namhyung Kim [Tue, 2 Feb 2021 09:01:18 +0000 (18:01 +0900)]
perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events

Like in __event__synthesize_thread(), I think it's better to use
scandir() instead of the readdir() loop.  In case some malicious task
continues to create new threads, the readdir() loop will run over and
over to collect tids.  The scandir() also has the problem but the window
is much smaller since it doesn't do much work during the iteration.

Also add filter_task() function as we only care the tasks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads
Namhyung Kim [Tue, 2 Feb 2021 09:01:17 +0000 (18:01 +0900)]
perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads

To synthesize information to resolve sample IPs, it needs to scan task
and mmap info from the /proc filesystem.  For each process, it opens
(and reads) status and maps file respectively.  But as kernel threads
don't have memory maps so we can skip the maps file.

To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file.
It's about the peak virtual memory usage so only user-level tasks have
that.  Note that it's possible to miss the line due to partial reads.
So we should double-check if it's a really kernel thread when there's no
VmPeak line.

Thus check "Threads:" line (which follows the VmPeak line whether or not
it exists) to be sure it's read enough data - just in case of deeply
nested pid namespaces or large number of supplementary groups are
involved.

This is for user process:

  $ head -40 /proc/1/status
  Name: systemd
  Umask: 0000
  State: S (sleeping)
  Tgid: 1
  Ngid: 0
  Pid: 1
  PPid: 0
  TracerPid: 0
  Uid: 0 0 0 0
  Gid: 0 0 0 0
  FDSize: 256
  Groups:
  NStgid: 1
  NSpid: 1
  NSpgid: 1
  NSsid: 1
  VmPeak:   234192 kB           <-- here
  VmSize:   169964 kB
  VmLck:        0 kB
  VmPin:        0 kB
  VmHWM:    29528 kB
  VmRSS:     6104 kB
  RssAnon:     2756 kB
  RssFile:     3348 kB
  RssShmem:        0 kB
  VmData:    19776 kB
  VmStk:     1036 kB
  VmExe:      784 kB
  VmLib:     9532 kB
  VmPTE:      116 kB
  VmSwap:     2400 kB
  HugetlbPages:        0 kB
  CoreDumping: 0
  THP_enabled: 1
  Threads: 1                     <-- and here
  SigQ: 1/62808
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000
  SigBlk: 7be3c0fe28014a03
  SigIgn: 0000000000001000

And this is for kernel thread:

  $ head -20 /proc/2/status
  Name: kthreadd
  Umask: 0000
  State: S (sleeping)
  Tgid: 2
  Ngid: 0
  Pid: 2
  PPid: 0
  TracerPid: 0
  Uid: 0 0 0 0
  Gid: 0 0 0 0
  FDSize: 64
  Groups:
  NStgid: 2
  NSpid: 2
  NSpgid: 0
  NSsid: 0
  Threads: 1                     <-- here
  SigQ: 1/62808
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis
Namhyung Kim [Tue, 2 Feb 2021 09:01:16 +0000 (18:01 +0900)]
perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis

To save memory usage, it needs to reduce the number of entries in the
proc filesystem.  It's using /proc/<PID>/task directory to traverse
threads in the process and then kernel creates /proc/<PID>/task/<TID>
entries.

After that it checks the thread info using the /proc/<TID>/status file
rather than /proc/<PID>/task/<TID>/status.  As far as I can see, they
are the same and contain all the info we need.

Using the latter eliminates the unnecessary /proc/<TID> entry.  This can
be useful especially a large number of threads are used in the system.
In my experiment around 1KB of memory on average was saved for each
thread (which is not a thread group leader).

To do this, pass both pid and tid to perf_event_prepare_comm() if it
knows them.  In case it doesn't know, passing 0 as pid will do the old
way.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf vendor events arm64: Reference common and uarch events for A76
John Garry [Thu, 28 Jan 2021 12:00:36 +0000 (20:00 +0800)]
perf vendor events arm64: Reference common and uarch events for A76

Reduce duplication in the JSONs by referencing standard events from
armv8-common-and-microarch.json

In general the "PublicDescription" fields are not modified when somewhat
significantly worded differently than the standard.

Apart from that, description and names for events slightly different to
standard are changed (to standard) for consistency.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-5-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf vendor events arm64: Reference common and uarch events for Ampere eMag
John Garry [Thu, 28 Jan 2021 12:00:35 +0000 (20:00 +0800)]
perf vendor events arm64: Reference common and uarch events for Ampere eMag

Reduce duplication in the JSONs by referencing standard events from
armv8-common-and-microarch.json

In general the "PublicDescription" fields are not modified when somewhat
significantly worded differently than the standard.

Apart from that, description and names for events slightly different to
standard are changed (to standard) for consistency.

Note that names for events 0x34 and 0x35 are non-standard and remain
unchanged. Those events came from the following originally:

  https://github.com/AmpereComputing/ampere-centos-kernel/blob/4c2479c67bbcf35b35224db12a092b33682b181c/Documentation/arm64/eMAG-ARM-CoreImpDefined.pdf

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: mathieu.poirier@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-4-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf vendor events arm64: Add common and uarch event JSON
John Garry [Thu, 28 Jan 2021 12:00:34 +0000 (20:00 +0800)]
perf vendor events arm64: Add common and uarch event JSON

Add a common and microarch JSON, which can be referenced from CPU JSONs.

For now, brief and public description are as event brief event
description from the ARMv8 ARM [0], D7-11.

The list of events is not complete, as not all events will be referenced
yet.

Reference document is at the following:

[0] https://documentation-service.arm.com/static/5fa3bd1eb209f547eebd4141?token=

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf vendor events arm64: Fix Ampere eMag event typo
John Garry [Thu, 28 Jan 2021 12:00:33 +0000 (20:00 +0800)]
perf vendor events arm64: Fix Ampere eMag event typo

The "briefdescription" for event 0x35 has a typo - fix it.

Fixes: d35c595bf005 ("perf vendor events arm64: Revise core JSON events for eMAG")
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf script: Support DSO filter like in other perf tools
Jin Yao [Sun, 24 Jan 2021 23:27:50 +0000 (07:27 +0800)]
perf script: Support DSO filter like in other perf tools

Other perf tool builtins already supported a DSO filter.

For example:

  $ perf report --dsos a,b,c

which only considers symbols in these dsos.

Now the DSO filter is supported in 'perf script':

  root@kbl-ppc:~# ./perf script --dsos "[kernel.kallsyms]"
            perf 18123 [000] 6142863.075104:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075107:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075108:         10   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075109:        273   cycles:  ffffffff9ca7730a native_write_msr+0xa ([kernel.kallsyms])
            perf 18123 [000] 6142863.075110:       7684   cycles:  ffffffff9ca3c9c0 native_sched_clock+0x50 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075112:     213017   cycles:  ffffffff9d765a92 syscall_exit_to_user_mode+0x32 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075156:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075158:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075159:         17   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])

Committer testing:

  $ perf script
                ls 2364888 29303.010949:          1 cycles:u:  ffffffffa4bbc6a9 [unknown] ([unknown])
                ls 2364888 29303.010957:          1 cycles:u:  ffffffffa429ef48 [unknown] ([unknown])
                ls 2364888 29303.010961:          1 cycles:u:  ffffffffa4260133 [unknown] ([unknown])
                ls 2364888 29303.010964:          5 cycles:u:  ffffffffa429efad [unknown] ([unknown])
                ls 2364888 29303.010967:         41 cycles:u:  ffffffffa42a4586 [unknown] ([unknown])
                ls 2364888 29303.010972:        435 cycles:u:  ffffffffa429efe0 [unknown] ([unknown])
                ls 2364888 29303.010978:       5142 cycles:u:      7f9b95bc2abf __GI___tunables_init+0x11f (/usr/lib64/ld-2.32.so)
                ls 2364888 29303.011006:      38551 cycles:u:  ffffffffa4290f61 [unknown] ([unknown])
                ls 2364888 29303.011486:     238234 cycles:u:      7f9b95bb7741 _dl_relocate_object+0xa71 (/usr/lib64/ld-2.32.so)
                ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
  $

Before:

  $ perf script --dsos /usr/lib64/libc-2.32.so |& head -5
    Error: unknown option `dsos'

   Usage: perf script [<options>]
      or: perf script [<options>] record <script> [<record-options>] <command>
      or: perf script [<options>] report <script> [script-args]
  $

After:

  $ perf script --dsos /usr/lib64/libc-2.32.so
                ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
  $

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210124232750.19170-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Fix DSO filtering when not finding a map for a sampled address
Arnaldo Carvalho de Melo [Thu, 28 Jan 2021 12:52:47 +0000 (09:52 -0300)]
perf tools: Fix DSO filtering when not finding a map for a sampled address

When we lookup an address and don't find a map we should filter that
sample if the user specified a list of --dso entries to filter on, fix
it.

Before:

  $ perf script
             sleep 274800  2843.556162:          1 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
             sleep 274800  2843.556168:          1 cycles:u:  ffffffffbb2b047d [unknown] ([unknown])
             sleep 274800  2843.556171:          1 cycles:u:  ffffffffbb2706b2 [unknown] ([unknown])
             sleep 274800  2843.556174:          6 cycles:u:  ffffffffbb2b0267 [unknown] ([unknown])
             sleep 274800  2843.556176:         59 cycles:u:  ffffffffbb2b03b1 [unknown] ([unknown])
             sleep 274800  2843.556180:        691 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
             sleep 274800  2843.556189:       9160 cycles:u:      7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so)
             sleep 274800  2843.556312:      86937 cycles:u:      7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so)
  $

So we have some samples we somehow didn't find in a map for, if we now
do:

  $ perf report --stdio --dso /usr/lib64/ld-2.32.so
  # dso: /usr/lib64/ld-2.32.so
  #
  # Total Lost Samples: 0
  #
  # Samples: 8  of event 'cycles:u'
  # Event count (approx.): 96856
  #
  # Overhead  Command  Symbol
  # ........  .......  ........................
  #
      89.76%  sleep    [.] _dl_lookup_symbol_x
       9.46%  sleep    [.] __GI___tunables_init
       0.71%  sleep    [k] 0xffffffffbb26bff4
       0.06%  sleep    [k] 0xffffffffbb2b03b1
       0.01%  sleep    [k] 0xffffffffbb2b0267
       0.00%  sleep    [k] 0xffffffffbb2706b2
       0.00%  sleep    [k] 0xffffffffbb2b047d
  $

After this patch we get the right output with just entries for the DSOs
specified in --dso:

  $ perf report --stdio --dso /usr/lib64/ld-2.32.so
  # dso: /usr/lib64/ld-2.32.so
  #
  # Total Lost Samples: 0
  #
  # Samples: 8  of event 'cycles:u'
  # Event count (approx.): 96856
  #
  # Overhead  Command  Symbol
  # ........  .......  ........................
  #
      89.76%  sleep    [.] _dl_lookup_symbol_x
       9.46%  sleep    [.] __GI___tunables_init
  $
  #

Fixes: 96415e4d3f5fdf9c ("perf symbols: Avoid unnecessary symbol loading when dso list is specified")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210128131209.GD775562@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf stat: Add Topdown metrics events as default events
Kan Liang [Thu, 21 Jan 2021 13:37:52 +0000 (05:37 -0800)]
perf stat: Add Topdown metrics events as default events

The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. From the Ice Lake and later platforms, the
Topdown information can be retrieved from the dedicated "metrics"
register, which isn't impacted by other events. Also, the Topdown
metrics support both per thread/process and per core measuring.  Adding
Topdown metrics events as default events can enrich the default
measuring information, and would not cost any extra multiplexing.

Introduce arch_evlist__add_default_attrs() to allow architecture
specific default events. Add the Topdown metrics events in the X86
specific arch_evlist__add_default_attrs(). Other architectures can add
their own default events later separately.

With the patch:

 $ perf stat sleep 1

 Performance counter stats for 'sleep 1':

           0.82 msec task-clock:u              #    0.001 CPUs utilized
              0      context-switches:u        #    0.000 K/sec
              0      cpu-migrations:u          #    0.000 K/sec
             61      page-faults:u             #    0.074 M/sec
        319,941      cycles:u                  #    0.388 GHz
        242,802      instructions:u            #    0.76  insn per cycle
         54,380      branches:u                #   66.028 M/sec
          4,043      branch-misses:u           #    7.43% of all branches
      1,585,555      slots:u                   # 1925.189 M/sec
        238,941      topdown-retiring:u        #     15.0% retiring
        410,378      topdown-bad-spec:u        #     25.8% bad speculation
        634,222      topdown-fe-bound:u        #     39.9% frontend bound
        304,675      topdown-be-bound:u        #     19.2% backend bound

       1.001791625 seconds time elapsed

       0.000000000 seconds user
       0.001572000 seconds sys

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210121133752.118327-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Add parse-metric memory bandwidth testcase
John Garry [Mon, 25 Jan 2021 12:47:22 +0000 (20:47 +0800)]
perf test: Add parse-metric memory bandwidth testcase

Event duration_time in a metric expression requires special handling.

Improve test coverage by including a metric whose expression includes
duration_time. The actual metric is a copied from the L1D_Cache_Fill_BW
metric on my broadwell machine.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: http://lore.kernel.org/lkml/1611578842-5749-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge remote-tracking branch 'torvalds/master' into perf/core
Arnaldo Carvalho de Melo [Wed, 27 Jan 2021 19:48:04 +0000 (16:48 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf/core

To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Wed, 27 Jan 2021 19:06:15 +0000 (11:06 -0800)]
Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Two small fixes:

   - Fix linking error with 64-bit kernel when modules are disabled,
     reported by kernel test robot

   - Remove leftover reference to power_tasklet, by Davidlohr Bueso"

* 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
  parisc: Remove leftover reference to the power_tasklet

4 years agomailmap: remove the "repo-abbrev" comment
Ævar Arnfjörð Bjarmason [Tue, 26 Jan 2021 00:04:38 +0000 (01:04 +0100)]
mailmap: remove the "repo-abbrev" comment

Remove the magical "repo-abbrev" comment added when this file was
introduced in e0ab1ec9fcd3 ([PATCH] add .mailmap for proper
git-shortlog output, 2007-02-14).

It's been an undocumented feature of git-shortlog(1), originally added
to git for Linus's use. Since then he's no longer using it[1], and
I've removed the feature in git.git's 4e168333a87 (shortlog: remove
unused(?) "repo-abbrev" feature, 2021-01-12). It's on the "master"
branch, but not yet in a release version.

Let's also remove it from linux.git, both as a heads-up to any
potential users of it in linux.git whose use would be broken sooner
than later by git itself, and because it'll eventually be entirely
redundant.

1. https://lore.kernel.org/git/CAHk-=wixHyBKZVUcxq+NCWMbkrX0xnppb7UCopRWw1+oExYpYw@mail.gmail.com/

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoparisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
Helge Deller [Tue, 26 Jan 2021 19:16:21 +0000 (20:16 +0100)]
parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES

When building a kernel without module support, the CONFIG_MLONGCALL option
needs to be enabled in order to reach symbols which are outside of a 22-bit
branch.

This patch changes the autodetection in the Kconfig script to always enable
CONFIG_MLONGCALL when modules are disabled and uses a far call to
preempt_schedule_irq() in intr_do_preempt() to reach the symbol in all cases.

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org # v5.6+
4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Tue, 26 Jan 2021 19:10:14 +0000 (11:10 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 -  x86 bugfixes

 - Documentation fixes

 - Avoid performance regression due to SEV-ES patches

 - ARM:
     - Don't allow tagged pointers to point to memslots
     - Filter out ARMv8.1+ PMU events on v8.0 hardware
     - Hide PMU registers from userspace when no PMU is configured
     - More PMU cleanups
     - Don't try to handle broken PSCI firmware
     - More sys_reg() to reg_to_encoding() conversions

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
  KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"
  KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
  KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
  kvm: tracing: Fix unmatched kvm_entry and kvm_exit events
  KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG
  KVM: x86: get smi pending status correctly
  KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
  KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
  KVM: x86: Add more protection against undefined behavior in rsvd_bits()
  KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
  KVM: Forbid the use of tagged userspace addresses for memslots
  KVM: arm64: Filter out v8.1+ events on v8.0 HW
  KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag
  KVM: arm64: Use the reg_to_encoding() macro instead of sys_reg()
  KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
  KVM: arm64: Simplify handling of absent PMU system registers
  KVM: arm64: Hide PMU registers from userspace when not available

4 years agoMerge tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Tue, 26 Jan 2021 19:03:30 +0000 (11:03 -0800)]
Merge tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "One new device ID here, plus an error handling fix - nothing
  remarkable in either"

* tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spidev: Add cisco device compatible
  spi: altera: Fix memory leak on error path

4 years agoMerge tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 26 Jan 2021 18:59:01 +0000 (10:59 -0800)]
Merge tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "The main thing here is a change to make sure that we don't try to
  double resolve the supply of a regulator if we have two probes going
  on simultaneously, plus an incremental fix on top of that to resolve a
  lockdep issue it introduced.

  There's also a patch from Dmitry Osipenko adding stubs for some
  functions to avoid build issues in consumers in some configurations"

* tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix lockdep warning resolving supplies
  regulator: consumer: Add missing stubs to regulator/consumer.h
  regulator: core: avoid regulator_resolve_supply() race condition

4 years agoparisc: Remove leftover reference to the power_tasklet
Davidlohr Bueso [Fri, 15 Jan 2021 00:14:48 +0000 (16:14 -0800)]
parisc: Remove leftover reference to the power_tasklet

This was removed long ago, back in:

     6e16d9409e1 ([PARISC] Convert soft power switch driver to kthread)

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>
4 years agoRevert "mm: fix initialization of struct page for holes in memory layout"
Linus Torvalds [Tue, 26 Jan 2021 18:39:46 +0000 (10:39 -0800)]
Revert "mm: fix initialization of struct page for holes in memory layout"

This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399.

Chris Wilson reports that it causes boot problems:

 "We have half a dozen or so different machines in CI that are silently
  failing to boot, that we believe is bisected to this patch"

and the CI team confirmed that a revert fixed the issues.

The cause is unknown for now, so let's revert it.

Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoKVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
Paolo Bonzini [Fri, 8 Jan 2021 16:43:08 +0000 (11:43 -0500)]
KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode.  Therefore we cannot
WARN in that case.

However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.

Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"
Sean Christopherson [Fri, 22 Jan 2021 23:50:48 +0000 (15:50 -0800)]
KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"

Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty.  Per
commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available
GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.

This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
Sean Christopherson [Fri, 22 Jan 2021 23:50:47 +0000 (15:50 -0800)]
KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest

Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty.  The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

  The guest and hypervisor can supply additional state if desired but
  must not rely on that additional state being provided.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
Maxim Levitsky [Thu, 14 Jan 2021 20:54:47 +0000 (22:54 +0200)]
KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration

Even when we are outside the nested guest, some vmcs02 fields
may not be in sync vs vmcs12.  This is intentional, even across
nested VM-exit, because the sync can be delayed until the nested
hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
rarely accessed fields.

However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
before the vmcs12 contents are copied to userspace.

Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agokvm: tracing: Fix unmatched kvm_entry and kvm_exit events
Lorenzo Brescia [Wed, 23 Dec 2020 14:45:07 +0000 (14:45 +0000)]
kvm: tracing: Fix unmatched kvm_entry and kvm_exit events

On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:

 CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
 CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
 CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).

A trace obtained with this patch applied looks like this:

 CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
 CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.

Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG
Zenghui Yu [Tue, 8 Dec 2020 04:34:39 +0000 (12:34 +0800)]
KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG

Update various words, including the wrong parameter name and the vague
description of the usage of "slot" field.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Message-Id: <20201208043439.895-1-yuzenghui@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: get smi pending status correctly
Jay Zhou [Mon, 18 Jan 2021 08:47:20 +0000 (16:47 +0800)]
KVM: x86: get smi pending status correctly

The injection process of smi has two steps:

    Qemu                        KVM
Step1:
    cpu->interrupt_request &= \
        ~CPU_INTERRUPT_SMI;
    kvm_vcpu_ioctl(cpu, KVM_SMI)

                                call kvm_vcpu_ioctl_smi() and
                                kvm_make_request(KVM_REQ_SMI, vcpu);

Step2:
    kvm_vcpu_ioctl(cpu, KVM_RUN, 0)

                                call process_smi() if
                                kvm_check_request(KVM_REQ_SMI, vcpu) is
                                true, mark vcpu->arch.smi_pending = true;

The vcpu->arch.smi_pending will be set true in step2, unfortunately if
vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
set and vcpu has to exit to Qemu immediately during step2 before mark
vcpu->arch.smi_pending true.
During VM migration, Qemu will get the smi pending status from KVM using
KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
will be lost.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
Like Xu [Wed, 30 Dec 2020 08:19:16 +0000 (16:19 +0800)]
KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]

The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
0x0300 in the intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
Like Xu [Mon, 18 Jan 2021 02:58:00 +0000 (10:58 +0800)]
KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()

Since we know vPMU will not work properly when (1) the guest bit_width(s)
of the [gp|fixed] counters are greater than the host ones, or (2) guest
requested architectural events exceeds the range supported by the host, so
we can setup a smaller left shift value and refresh the guest cpuid entry,
thus fixing the following UBSAN shift-out-of-bounds warning:

shift exponent 197 is too large for 64-bit type 'long long unsigned int'

Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
 kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
 kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
 kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
 kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: Add more protection against undefined behavior in rsvd_bits()
Sean Christopherson [Wed, 13 Jan 2021 20:45:15 +0000 (12:45 -0800)]
KVM: x86: Add more protection against undefined behavior in rsvd_bits()

Add compile-time asserts in rsvd_bits() to guard against KVM passing in
garbage hardcoded values, and cap the upper bound at '63' for dynamic
values to prevent generating a mask that would overflow a u64.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210113204515.3473079-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
Quentin Perret [Fri, 8 Jan 2021 16:53:49 +0000 (16:53 +0000)]
KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM

The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM
as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM
ioctl.

Fixes: e5d83c74a580 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic")
Signed-off-by: Quentin Perret <qperret@google.com>
Message-Id: <20210108165349.747359-1-qperret@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Mon, 25 Jan 2021 23:52:01 +0000 (18:52 -0500)]
Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.11, take #2

- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions

4 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 25 Jan 2021 23:26:51 +0000 (15:26 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix a regression in the cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvel/cesa - Fix tdma descriptor on 64-bit

4 years agofs/pipe: allow sendfile() to pipe again
Johannes Berg [Mon, 25 Jan 2021 09:16:15 +0000 (10:16 +0100)]
fs/pipe: allow sendfile() to pipe again

After commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.

Cc: stable@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoCommit 9bb48c82aced ("tty: implement write_iter") converted the tty
Sami Tolvanen [Mon, 25 Jan 2021 19:09:25 +0000 (11:09 -0800)]
Commit 9bb48c82aced ("tty: implement write_iter") converted the tty
layer to use write_iter. Fix the redirected_tty_write declaration
also in n_tty and change the comparisons to use write_iter instead of
write.

[ Also moved the declaration of redirected_tty_write() to the proper
  location in a header file. The reason for the bug was the bogus extern
  declaration in n_tty.c silently not matching the changed definition in
  tty_io.c, and because it wasn't in a shared header file, there was no
  cross-checking of the declaration.

  Sami noticed because Clang's Control Flow Integrity checking ended up
  incidentally noticing the inconsistent declaration.    - Linus ]

Fixes: 9bb48c82aced ("tty: implement write_iter")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 25 Jan 2021 18:19:40 +0000 (10:19 -0800)]
Merge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:
 "The fix of a potential buffer overflow in 5.11-rc5 introduced another
  one. The trailing '\0' might be written up to the message "len" past
  the buffer. Fortunately, it is not that easy to hit.

  Most readers use 1kB buffers for a single message. Typical messages
  fit into the temporary buffer with enough reserve.

  Also readers do not rely on the '\0'. It is related to the previous
  fix. Some readers required the space for the trailing '\0'. We decided
  to write it there to avoid such regressions in the future.

  The most realistic victims are dumpers using kmsg_dump_get_buffer().
  They are filling the entire buffer with as many messages as possible.
  They are typically used when handling panic()"

* tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix string termination for record_print_text()

4 years agoMerge branch 'printk-rework' into for-linus
Petr Mladek [Mon, 25 Jan 2021 13:29:35 +0000 (14:29 +0100)]
Merge branch 'printk-rework' into for-linus

4 years agospidev: Add cisco device compatible
Daniel Walker [Thu, 21 Jan 2021 23:12:36 +0000 (15:12 -0800)]
spidev: Add cisco device compatible

Add compatible string for Cisco device present on the Cisco Petra
platform.

Signed-off-by: Daniel Walker <danielwa@cisco.com>
Cc: xe-linux-external@cisco.com
Link: https://lore.kernel.org/r/20210121231237.30664-2-danielwa@cisco.com
Signed-off-by: Mark Brown <broonie@kernel.org>
4 years agoprintk: fix string termination for record_print_text()
John Ogness [Sun, 24 Jan 2021 20:27:28 +0000 (21:33 +0106)]
printk: fix string termination for record_print_text()

Commit f0e386ee0c0b ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee0c0b ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
4 years agoLinux 5.11-rc5
Linus Torvalds [Mon, 25 Jan 2021 00:47:14 +0000 (16:47 -0800)]
Linux 5.11-rc5

4 years agoMerge tag 'sh-for-5.11' of git://git.libc.org/linux-sh
Linus Torvalds [Sun, 24 Jan 2021 21:52:02 +0000 (13:52 -0800)]
Merge tag 'sh-for-5.11' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker:
 "Cleanup and warning fixes"

* tag 'sh-for-5.11' of git://git.libc.org/linux-sh:
  sh/intc: Restore devm_ioremap() alignment
  sh: mach-sh03: remove duplicate include
  arch: sh: remove duplicate include
  sh: Drop ARCH_NR_GPIOS definition
  sh: Remove unused HAVE_COPY_THREAD_TLS macro
  sh: remove CONFIG_IDE from most defconfig
  sh: mm: Convert to DEFINE_SHOW_ATTRIBUTE
  sh: intc: Convert to DEFINE_SHOW_ATTRIBUTE
  arch/sh: hyphenate Non-Uniform in Kconfig prompt
  sh: dma: fix kconfig dependency for G2_DMA

4 years agoMerge tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 24 Jan 2021 20:30:14 +0000 (12:30 -0800)]
Merge tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Still need a final cancelation fix that isn't quite done done,
  expected in the next day or two. That said, this contains:

   - Wakeup fix for IOPOLL requests

   - SQPOLL split close op handling fix

   - Ensure that any use of io_uring fd itself is marked as inflight

   - Short non-regular file read fix (Pavel)

   - Fix up bad false positive warning (Pavel)

   - SQPOLL fixes (Pavel)

   - In-flight removal fix (Pavel)"

* tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
  io_uring: account io_uring internal files as REQ_F_INFLIGHT
  io_uring: fix sleeping under spin in __io_clean_op
  io_uring: fix short read retries for non-reg files
  io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
  io_uring: fix skipping disabling sqo on exec
  io_uring: fix uring_flush in exit_files() warning
  io_uring: fix false positive sqo warning on flush
  io_uring: iopoll requests should also wake task ->in_idle state

4 years agoMerge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 24 Jan 2021 20:24:35 +0000 (12:24 -0800)]
Merge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph:
      - fix a status code in nvmet (Chaitanya Kulkarni)
      - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
      - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
      - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
      - fix a double DMA unmap in nvme-pci

 - lightnvm error path leak fix (Pan)

 - MD pull request from Song:
      - Flush request fix (Xiao)

* tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
  lightnvm: fix memory leak when submit fails
  nvme-pci: fix error unwind in nvme_map_data
  nvme-pci: refactor nvme_unmap_data
  md: Set prev_flush_start and flush_bio in an atomic way
  nvmet: set right status on error in id-ns handler
  nvme-pci: allow use of cmb on v1.4 controllers
  nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
  nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
  nvme: check the PRINFO bit before deciding the host buffer length

4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 24 Jan 2021 20:16:34 +0000 (12:16 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "18 patches.

  Subsystems affected by this patch series: mm (pagealloc, memcg, kasan,
  memory-failure, and highmem), ubsan, proc, and MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MAINTAINERS: add a couple more files to the Clang/LLVM section
  proc_sysctl: fix oops caused by incorrect command parameters
  powerpc/mm/highmem: use __set_pte_at() for kmap_local()
  mips/mm/highmem: use set_pte() for kmap_local()
  mm/highmem: prepare for overriding set_pte_at()
  sparc/mm/highmem: flush cache and TLB
  mm: fix page reference leak in soft_offline_page()
  ubsan: disable unsigned-overflow check for i386
  kasan, mm: fix resetting page_alloc tags for HW_TAGS
  kasan, mm: fix conflicts with init_on_alloc/free
  kasan: fix HW_TAGS boot parameters
  kasan: fix incorrect arguments passing in kasan_add_zero_shadow
  kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
  mm: fix numa stats for thp migration
  mm: memcg: fix memcg file_dirty numa stat
  mm: memcg/slab: optimize objcg stock draining
  mm: fix initialization of struct page for holes in memory layout
  x86/setup: don't remove E820_TYPE_RAM for pfn 0

4 years agoMerge tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 24 Jan 2021 19:26:46 +0000 (11:26 -0800)]
Merge tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for 5.11-rc5:

   - habanalabs driver fixes

   - phy driver fixes

   - hwtracing driver fixes

   - rtsx cardreader driver fix

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: rtsx: init value of aspm_enabled
  habanalabs: disable FW events on device removal
  habanalabs: fix backward compatibility of idle check
  habanalabs: zero pci counters packet before submit to FW
  intel_th: pci: Add Alder Lake-P support
  stm class: Fix module init return on allocation failure
  habanalabs: prevent soft lockup during unmap
  habanalabs: fix reset process in case of failures
  habanalabs: fix dma_addr passed to dma_mmap_coherent
  phy: mediatek: allow compile-testing the dsi phy
  phy: cpcap-usb: Fix warning for missing regulator_disable
  PHY: Ingenic: fix unconditional build of phy-ingenic-usb

4 years agoMerge tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 24 Jan 2021 19:05:48 +0000 (11:05 -0800)]
Merge tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core fixes for 5.11-rc5 that resolve some
  reported problems:

   - revert of a -rc1 patch that was causing problems with some machines

   - device link device name collision problem fix (busses only have to
     name devices unique to their bus, not unique to all busses)

   - kernfs splice bugfixes to resolve firmware loading problems for
     Qualcomm systems.

   - other tiny driver core fixes for minor issues reported.

  All of these have been in linux-next with no reported problems"

* tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Fix device link device name collision
  driver core: Extend device_is_dependent()
  kernfs: wire up ->splice_read and ->splice_write
  kernfs: implement ->write_iter
  kernfs: implement ->read_iter
  Revert "driver core: Reorder devices on successful probe"
  Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
  drivers core: Free dma_range_map when driver probe failed

4 years agoMerge tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 24 Jan 2021 19:02:01 +0000 (11:02 -0800)]
Merge tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
 "Here are some IIO driver fixes for 5.11-rc5 to resolve some reported
  problems.

  Nothing major, just a few small fixes, all of these have been in
  linux-next for a while and full details are in the shortlog"

* tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: sx9310: Fix semtech,avg-pos-strength setting when > 16
  iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
  iio: ad5504: Fix setting power-down state
  counter:ti-eqep: remove floor
  drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
  iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
  dt-bindings: iio: accel: bma255: Fix bmc150/bmi055 compatible
  iio: sx9310: Off by one in sx9310_read_thresh()

4 years agoMerge tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 24 Jan 2021 18:56:45 +0000 (10:56 -0800)]
Merge tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are three small tty/serial fixes for 5.11-rc5 to resolve reported
  problems:

   - two patches to fix up writing to ttys with splice

   - mvebu-uart driver fix for reported problem

  All of these have been in linux-next with no reported problems"

* tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: fix up hung_up_tty_write() conversion
  tty: implement write_iter
  serial: mvebu-uart: fix tx lost characters at power off

4 years agoMerge tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 24 Jan 2021 18:54:54 +0000 (10:54 -0800)]
Merge tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.11-rc5.  They resolve:

   - xhci issues for some reported problems

   - ehci driver issue for one specific device

   - USB gadget fixes for some reported problems

   - cdns3 driver fixes for issues reported

   - MAINTAINERS file update

   - thunderbolt minor fix

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: bdc: Make bdc pci driver depend on BROKEN
  xhci: tegra: Delay for disabling LFPS detector
  xhci: make sure TRB is fully written before giving it to the controller
  usb: udc: core: Use lock when write to soft_connect
  USB: gadget: dummy-hcd: Fix errors in port-reset handling
  usb: gadget: aspeed: fix stop dma register setting.
  USB: ehci: fix an interrupt calltrace error
  ehci: fix EHCI host controller initialization sequence
  MAINTAINERS: update Peter Chen's email address
  thunderbolt: Drop duplicated 0x prefix from format string
  MAINTAINERS: Update address for Cadence USB3 driver
  usb: cdns3: imx: improve driver .remove API
  usb: cdns3: imx: fix can't create core device the second time issue
  usb: cdns3: imx: fix writing read-only memory issue

4 years agoMAINTAINERS: add a couple more files to the Clang/LLVM section
Nathan Chancellor [Sun, 24 Jan 2021 05:02:21 +0000 (21:02 -0800)]
MAINTAINERS: add a couple more files to the Clang/LLVM section

The K: entry should ensure that Nick and I always get CC'd on patches that
touch these files but it is better to be explicit rather than implicit.

Link: https://lkml.kernel.org/r/20210114004059.2129921-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoproc_sysctl: fix oops caused by incorrect command parameters
Xiaoming Ni [Sun, 24 Jan 2021 05:02:16 +0000 (21:02 -0800)]
proc_sysctl: fix oops caused by incorrect command parameters

The process_sysctl_arg() does not check whether val is empty before
invoking strlen(val).  If the command line parameter () is incorrectly
configured and val is empty, oops is triggered.

For example:
  "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is
  triggered. The call stack is as follows:
    Kernel command line: .... hung_task_panic
    ......
    Call trace:
    __pi_strlen+0x10/0x98
    parse_args+0x278/0x344
    do_sysctl_args+0x8c/0xfc
    kernel_init+0x5c/0xf4
    ret_from_fork+0x10/0x30

To fix it, check whether "val" is empty when "phram" is a sysctl field.
Error codes are returned in the failure branch, and error logs are
generated by parse_args().

Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agopowerpc/mm/highmem: use __set_pte_at() for kmap_local()
Thomas Gleixner [Sun, 24 Jan 2021 05:02:11 +0000 (21:02 -0800)]
powerpc/mm/highmem: use __set_pte_at() for kmap_local()

The original PowerPC highmem mapping function used __set_pte_at() to
denote that the mapping is per CPU.  This got lost with the conversion
to the generic implementation.

Override the default map function.

Link: https://lkml.kernel.org/r/20210112170411.281464308@linutronix.de
Fixes: 47da42b27a56 ("powerpc/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomips/mm/highmem: use set_pte() for kmap_local()
Thomas Gleixner [Sun, 24 Jan 2021 05:02:07 +0000 (21:02 -0800)]
mips/mm/highmem: use set_pte() for kmap_local()

set_pte_at() on MIPS invokes update_cache() which might recurse into
kmap_local().

Use set_pte() like the original MIPS highmem implementation did.

Link: https://lkml.kernel.org/r/20210112170411.187513575@linutronix.de
Fixes: a4c33e83bca1 ("mips/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Paul Cercueil <paul@crapouillou.net>
Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/highmem: prepare for overriding set_pte_at()
Thomas Gleixner [Sun, 24 Jan 2021 05:02:02 +0000 (21:02 -0800)]
mm/highmem: prepare for overriding set_pte_at()

The generic kmap_local() map function uses set_pte_at(), but MIPS requires
set_pte() and PowerPC wants __set_pte_at().

Provide arch_kmap_local_set_pte() and default it to set_pte_at().

Link: https://lkml.kernel.org/r/20210112170411.056306194@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agosparc/mm/highmem: flush cache and TLB
Thomas Gleixner [Sun, 24 Jan 2021 05:01:57 +0000 (21:01 -0800)]
sparc/mm/highmem: flush cache and TLB

Patch series "mm/highmem: Fix fallout from generic kmap_local
conversions".

The kmap_local conversion wreckaged sparc, mips and powerpc as it missed
some of the details in the original implementation.

This patch (of 4):

The recent conversion to the generic kmap_local infrastructure failed to
assign the proper pre/post map/unmap flush operations for sparc.

Sparc requires cache flush before map/unmap and tlb flush afterwards.

Link: https://lkml.kernel.org/r/20210112170136.078559026@linutronix.de
Link: https://lkml.kernel.org/r/20210112170410.905976187@linutronix.de
Fixes: 3293efa97807 ("sparc/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: fix page reference leak in soft_offline_page()
Dan Williams [Sun, 24 Jan 2021 05:01:52 +0000 (21:01 -0800)]
mm: fix page reference leak in soft_offline_page()

The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.

Note the direct sysfs-path to soft_offline_page() does not perform a
get_user_pages() lookup.

When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn the kernel hangs at dax-device shutdown due to a leaked reference.

Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoubsan: disable unsigned-overflow check for i386
Arnd Bergmann [Sun, 24 Jan 2021 05:01:48 +0000 (21:01 -0800)]
ubsan: disable unsigned-overflow check for i386

Building ubsan kernels even for compile-testing introduced these
warnings in my randconfig environment:

  crypto/blake2b_generic.c:98:13: error: stack frame size of 9636 bytes in function 'blake2b_compress' [-Werror,-Wframe-larger-than=]
  static void blake2b_compress(struct blake2b_state *S,

  crypto/sha512_generic.c:151:13: error: stack frame size of 1292 bytes in function 'sha512_generic_block_fn' [-Werror,-Wframe-larger-than=]
  static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,

  lib/crypto/curve25519-fiat32.c:312:22: error: stack frame size of 2180 bytes in function 'fe_mul_impl' [-Werror,-Wframe-larger-than=]
  static noinline void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])

  lib/crypto/curve25519-fiat32.c:444:22: error: stack frame size of 1588 bytes in function 'fe_sqr_impl' [-Werror,-Wframe-larger-than=]
  static noinline void fe_sqr_impl(u32 out[10], const u32 in1[10])

Further testing showed that this is caused by
-fsanitize=unsigned-integer-overflow, but is isolated to the 32-bit x86
architecture.

The one in blake2b immediately overflows the 8KB stack area
architectures, so better ensure this never happens by disabling the
option for 32-bit x86.

Link: https://lkml.kernel.org/r/20210112202922.2454435-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20201230154749.746641-1-arnd@kernel.org/
Fixes: d0a3ac549f38 ("ubsan: enable for all*config builds")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Marco Elver <elver@google.com>
Cc: George Popescu <georgepope@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan, mm: fix resetting page_alloc tags for HW_TAGS
Andrey Konovalov [Sun, 24 Jan 2021 05:01:43 +0000 (21:01 -0800)]
kasan, mm: fix resetting page_alloc tags for HW_TAGS

A previous commit added resetting KASAN page tags to
kernel_init_free_pages() to avoid false-positives due to accesses to
metadata with the hardware tag-based mode.

That commit did reset page tags before the metadata access, but didn't
restore them after.  As the result, KASAN fails to detect bad accesses
to page_alloc allocations on some configurations.

Fix this by recovering the tag after the metadata access.

Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan, mm: fix conflicts with init_on_alloc/free
Andrey Konovalov [Sun, 24 Jan 2021 05:01:38 +0000 (21:01 -0800)]
kasan, mm: fix conflicts with init_on_alloc/free

A few places where SLUB accesses object's data or metadata were missed
in a previous patch.  This leads to false positives with hardware
tag-based KASAN when bulk allocations are used with init_on_alloc/free.

Fix the false-positives by resetting pointer tags during these accesses.

(The kasan_reset_tag call is removed from slab_alloc_node, as it's added
 into maybe_wipe_obj_freeptr.)

Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f67 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan: fix HW_TAGS boot parameters
Andrey Konovalov [Sun, 24 Jan 2021 05:01:34 +0000 (21:01 -0800)]
kasan: fix HW_TAGS boot parameters

The initially proposed KASAN command line parameters are redundant.

This change drops the complex "kasan.mode=off/prod/full" parameter and
adds a simpler kill switch "kasan=off/on" instead.  The new parameter
together with the already existing ones provides a cleaner way to
express the same set of features.

The full set of parameters with this change:

  kasan=off/on             - whether KASAN is enabled
  kasan.fault=report/panic - whether to only print a report or also panic
  kasan.stacktrace=off/on  - whether to collect alloc/free stack traces

Default values:

  kasan=on
  kasan.fault=report
  kasan.stacktrace=on  (if CONFIG_DEBUG_KERNEL=y)
  kasan.stacktrace=off (otherwise)

Link: https://linux-review.googlesource.com/id/Ib3694ed90b1e8ccac6cf77dfd301847af4aba7b8
Link: https://lkml.kernel.org/r/4e9c4a4bdcadc168317deb2419144582a9be6e61.1610736745.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan: fix incorrect arguments passing in kasan_add_zero_shadow
Lecopzer Chen [Sun, 24 Jan 2021 05:01:29 +0000 (21:01 -0800)]
kasan: fix incorrect arguments passing in kasan_add_zero_shadow

kasan_remove_zero_shadow() shall use original virtual address, start and
size, instead of shadow address.

Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com
Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
Lecopzer Chen [Sun, 24 Jan 2021 05:01:25 +0000 (21:01 -0800)]
kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow

During testing kasan_populate_early_shadow and kasan_remove_zero_shadow,
if the shadow start and end address in kasan_remove_zero_shadow() is not
aligned to PMD_SIZE, the remain unaligned PTE won't be removed.

In the test case for kasan_remove_zero_shadow():

    shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000

    3-level page table:
      PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K

0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in
kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next
address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE.

In the correct condition, this should fallback to the next level
kasan_remove_pmd_table() but the condition flow always continue to skip
the unaligned part.

Fix by correcting the condition when next and addr are neither aligned.

Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 24 Jan 2021 18:24:20 +0000 (10:24 -0800)]
Merge tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.

 - Fix to not lose IPIs on bcm2836.

 - Fix for a bogus marking of ITS devices as shared due to unitialized
   stack variable.

 - Clear a phantom interrupt on qcom-pdc to unblock suspend.

 - Small cleanups, warning and build fixes.

* tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Export irq_check_status_bit()
  irqchip/mips-cpu: Set IPI domain parent chip
  irqchip/pruss: Simplify the TI_PRUSS_INTC Kconfig
  irqchip/loongson-liointc: Fix build warnings
  driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
  irqchip/bcm2836: Fix IPI acknowledgement after conversion to handle_percpu_devid_irq
  irqchip/irq-sl28cpld: Convert comma to semicolon
  genirq/msi: Initialize msi_alloc_info before calling msi_domain_prepare_irqs()

4 years agoMerge tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 24 Jan 2021 18:17:03 +0000 (10:17 -0800)]
Merge tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Adjust objtool to handle a recent binutils change to not generate
   unused symbols anymore.

 - Revert the fail-the-build-on-fatal-errors objtool strategy for now
   due to the ever-increasing matrix of supported toolchains/plugins and
   them causing too many such fatal errors currently.

 - Do not add empty symbols to objdump's rbtree to accommodate clang
   removing section symbols.

* tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Don't fail on missing symbol table
  objtool: Don't fail the kernel build on fatal errors
  objtool: Don't add empty symbols to the rbtree

4 years agoMerge tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 24 Jan 2021 18:09:20 +0000 (10:09 -0800)]
Merge tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Correct the marking of kthreads which are supposed to run on a
   specific, single CPU vs such which are affine to only one CPU, mark
   per-cpu workqueue threads as such and make sure that marking
   "survives" CPU hotplug. Fix CPU hotplug issues with such kthreads.

 - A fix to not push away tasks on CPUs coming online.

 - Have workqueue CPU hotplug code use cpu_possible_mask when breaking
   affinity on CPU offlining so that pending workers can finish on newly
   arrived onlined CPUs too.

 - Dump tasks which haven't vacated a CPU which is currently being
   unplugged.

 - Register a special scale invariance callback which gets called on
   resume from RAM to read out APERF/MPERF after resume and thus make
   the schedutil scaling governor more precise.

* tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Relax the set_cpus_allowed_ptr() semantics
  sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
  sched: Prepare to use balance_push in ttwu()
  workqueue: Restrict affinity change to rescuer
  workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
  kthread: Extract KTHREAD_IS_PER_CPU
  sched: Don't run cpu-online with balance_push() enabled
  workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
  sched/core: Print out straggler tasks in sched_cpu_dying()
  x86: PM: Register syscore_ops for scale invariance

4 years agoMerge tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 24 Jan 2021 17:58:38 +0000 (09:58 -0800)]
Merge tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Borislav Petkov:

 - Fix an integer overflow in the NTP RTC synchronization which led to
   the latter happening every 2 seconds instead of the intended every 11
   minutes.

 - Get rid of now unused get_seconds().

* tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ntp: Fix RTC synchronization on 32-bit platforms
  timekeeping: Remove unused get_seconds()

4 years agoMerge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 24 Jan 2021 17:46:05 +0000 (09:46 -0800)]
Merge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a new Intel model number for Alder Lake

 - Differentiate which aspects of the FPU state get saved/restored when
   the FPU is used in-kernel and fix a boot crash on K7 due to early
   MXCSR access before CR4.OSFXSR is even set.

 - A couple of noinstr annotation fixes

 - Correct die ID setting on AMD for users of topology information which
   need the correct die ID

 - A SEV-ES fix to handle string port IO to/from kernel memory properly

* tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add another Alder Lake CPU to the Intel family
  x86/mmx: Use KFPU_387 for MMX string operations
  x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  x86/topology: Make __max_die_per_package available unconditionally
  x86: __always_inline __{rd,wr}msr()
  x86/mce: Remove explicit/superfluous tracing
  locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
  locking/lockdep: Cure noinstr fail
  x86/sev: Fix nonistr violation
  x86/entry: Fix noinstr fail
  x86/cpu/amd: Set __max_die_per_package on AMD
  x86/sev-es: Handle string port IO to kernel memory properly

4 years agoMerge tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 24 Jan 2021 17:40:51 +0000 (09:40 -0800)]
Merge tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bad interaction between the scv handling and the fallback L1D
   flush, which could lead to user register corruption. Only affects
   people using scv (~no one) on machines with old firmware that are
   missing the L1D flush.

 - Two small selftest fixes.

Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das,
and Tulio Magno Quites Machado Filho.

* tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: fix scv entry fallback flush vs interrupt
  selftests/powerpc: Only test lwm/stmw on big endian
  selftests/powerpc: Fix exit status of pkey tests

4 years agoMerge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 24 Jan 2021 17:35:28 +0000 (09:35 -0800)]
Merge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull misc fixes from Christian Brauner:

 - Jann reported sparse complaints because of a missing __user
   annotation in a helper we added way back when we added
   pidfd_send_signal() to avoid compat syscall handling. Fix it.

 - Yanfei replaces a reference in a comment to the _do_fork() helper I
   removed a while ago with a reference to the new kernel_clone()
   replacement

 - Alexander Guril added a simple coding style fix

* tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  kthread: remove comments about old _do_fork() helper
  Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
  signal: Add missing __user annotation to copy_siginfo_from_user_any

4 years agoMerge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 24 Jan 2021 17:27:14 +0000 (09:27 -0800)]
Merge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "An important signal handling patch for stable, and two small cleanup
  patches"

* tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not fail __smb_send_rqst if non-fatal signals are pending
  fs/cifs: Simplify bool comparison.
  fs/cifs: Assign boolean values to a bool variable

4 years agomm: fix numa stats for thp migration
Shakeel Butt [Sun, 24 Jan 2021 05:01:15 +0000 (21:01 -0800)]
mm: fix numa stats for thp migration

Currently the kernel is not correctly updating the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.

For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING, although at the moment
there is no need to handle THP migration as kernel still does not have
write support for file THP but to be more future proof, this patch adds
the THP support for those stats as well.

Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: memcg: fix memcg file_dirty numa stat
Shakeel Butt [Sun, 24 Jan 2021 05:01:11 +0000 (21:01 -0800)]
mm: memcg: fix memcg file_dirty numa stat

The kernel updates the per-node NR_FILE_DIRTY stats on page migration
but not the memcg numa stats.

That was not an issue until recently the commit 5f9a4f4a7096 ("mm:
memcontrol: add the missing numa_stat interface for cgroup v2") exposed
numa stats for the memcg.

So fix the file_dirty per-memcg numa stat.

Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: memcg/slab: optimize objcg stock draining
Roman Gushchin [Sun, 24 Jan 2021 05:01:07 +0000 (21:01 -0800)]
mm: memcg/slab: optimize objcg stock draining

Imran Khan reported a 16% regression in hackbench results caused by the
commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
instead of pages").  The regression is noticeable in the case of a
consequent allocation of several relatively large slab objects, e.g.
skb's.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
to a number of atomic operations in page_counter_uncharge().

The corresponding call graph is below (provided by Imran Khan):

  |__alloc_skb
  |    |
  |    |__kmalloc_reserve.isra.61
  |    |    |
  |    |    |__kmalloc_node_track_caller
  |    |    |    |
  |    |    |    |slab_pre_alloc_hook.constprop.88
  |    |    |     obj_cgroup_charge
  |    |    |    |    |
  |    |    |    |    |__memcg_kmem_charge
  |    |    |    |    |    |
  |    |    |    |    |    |page_counter_try_charge
  |    |    |    |    |
  |    |    |    |    |refill_obj_stock
  |    |    |    |    |    |
  |    |    |    |    |    |drain_obj_stock.isra.68
  |    |    |    |    |    |    |
  |    |    |    |    |    |    |__memcg_kmem_uncharge
  |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |page_counter_uncharge
  |    |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |    |page_counter_cancel
  |    |    |    |
  |    |    |    |
  |    |    |    |__slab_alloc
  |    |    |    |    |
  |    |    |    |    |___slab_alloc
  |    |    |    |    |
  |    |    |    |slab_post_alloc_hook

Instead of directly uncharging the accounted kernel memory, it's
possible to refill the generic page-sized per-cpu stock instead.  It's a
much faster operation, especially on a default hierarchy.  As a bonus,
__memcg_kmem_uncharge_page() will also get faster, so the freeing of
page-sized kernel allocations (e.g.  large kmallocs) will become faster.

A similar change has been done earlier for the socket memory by the
commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket
memory uncharging").

Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Tested-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutn <mkoutny@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: fix initialization of struct page for holes in memory layout
Mike Rapoport [Sun, 24 Jan 2021 05:01:02 +0000 (21:01 -0800)]
mm: fix initialization of struct page for holes in memory layout

There could be struct pages that are not backed by actual physical
memory.  This can happen when the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized using init_unavailable_mem()
function that iterates through PFNs in holes in memblock.memory and if
there is a struct page corresponding to a PFN, the fields if this page
are set to default values and the page is marked as Reserved.

init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.

On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration below:

# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM

unset zone link in struct page will trigger

VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.

Update init_unavailable_mem() to use zone constraints defined by an
architecture to properly setup the zone link and use node ID of the
adjacent range in memblock.memory to set the node link.

Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agox86/setup: don't remove E820_TYPE_RAM for pfn 0
Mike Rapoport [Sun, 24 Jan 2021 05:00:57 +0000 (21:00 -0800)]
x86/setup: don't remove E820_TYPE_RAM for pfn 0

Patch series "mm: fix initialization of struct page for holes in  memory layout", v3.

Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") exposed several issues with the memory map
initialization and these patches fix those issues.

Initially there were crashes during compaction that Qian Cai reported
back in April [1].  It seemed back then that the problem was fixed, but
a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an
additional discussion at [3].

[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org

This patch (of 2):

The first 4Kb of memory is a BIOS owned area and to avoid its allocation
for the kernel it was not listed in e820 tables as memory.  As the result,
pfn 0 was never recognised by the generic memory management and it is not
a part of neither node 0 nor ZONE_DMA.

If set_pfnblock_flags_mask() would be ever called for the pageblock
corresponding to the first 2Mbytes of memory, having pfn 0 outside of
ZONE_DMA would trigger

VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

Along with reserving the first 4Kb in e820 tables, several first pages are
reserved with memblock in several places during setup_arch().  These
reservations are enough to ensure the kernel does not touch the BIOS area
and it is not necessary to remove E820_TYPE_RAM for pfn 0.

Remove the update of e820 table that changes the type of pfn 0 and move
the comment describing why it was done to trim_low_memory_range() that
reserves the beginning of the memory.

Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoio_uring: account io_uring internal files as REQ_F_INFLIGHT
Jens Axboe [Sat, 23 Jan 2021 22:49:31 +0000 (15:49 -0700)]
io_uring: account io_uring internal files as REQ_F_INFLIGHT

We need to actively cancel anything that introduces a potential circular
loop, where io_uring holds a reference to itself. If the file in question
is an io_uring file, then add the request to the inflight list.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoio_uring: fix sleeping under spin in __io_clean_op
Pavel Begunkov [Sun, 24 Jan 2021 15:08:14 +0000 (15:08 +0000)]
io_uring: fix sleeping under spin in __io_clean_op

[   27.629441] BUG: sleeping function called from invalid context
at fs/file.c:402
[   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0,
pid: 1012, name: io_wqe_worker-0
[   27.633220] 1 lock held by io_wqe_worker-0/1012:
[   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock)
{....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
[   27.649249] Call Trace:
[   27.649874]  dump_stack+0xac/0xe3
[   27.650666]  ___might_sleep+0x284/0x2c0
[   27.651566]  put_files_struct+0xb8/0x120
[   27.652481]  __io_clean_op+0x10c/0x2a0
[   27.653362]  __io_cqring_fill_event+0x2c1/0x350
[   27.654399]  __io_req_complete.part.102+0x41/0x70
[   27.655464]  io_openat2+0x151/0x300
[   27.656297]  io_issue_sqe+0x6c/0x14e0
[   27.660991]  io_wq_submit_work+0x7f/0x240
[   27.662890]  io_worker_handle_work+0x501/0x8a0
[   27.664836]  io_wqe_worker+0x158/0x520
[   27.667726]  kthread+0x134/0x180
[   27.669641]  ret_from_fork+0x1f/0x30

Instead of cleaning files on overflow, return back overflow cancellation
into io_uring_cancel_files(). Previously it was racy to clean
REQ_F_OVERFLOW flag, but we got rid of it, and can do it through
repetitive attempts targeting all matching requests.

Reported-by: Abaci <abaci@linux.alibaba.com>
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge branch 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Linus Torvalds [Sat, 23 Jan 2021 20:02:58 +0000 (12:02 -0800)]
Merge branch 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull mtd fixes from Miquel Raynal.

* 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: omap: Use BCH private fields in the specific OOB layout
  mtd: spinand: Fix MTD_OPS_AUTO_OOB requests
  mtd: rawnand: intel: check the mtd name only after setting the variable
  mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
  mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload

4 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 23 Jan 2021 19:43:02 +0000 (11:43 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Another bunch  of driver fixes"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sprd: depend on COMMON_CLK to fix compile tests
  Revert "i2c: imx: Remove unused .id_table support"
  i2c: octeon: check correct size of maximum RECV_LEN packet
  i2c: tegra: Create i2c_writesl_vi() to use with VI I2C for filling TX FIFO
  i2c: bpmp-tegra: Ignore unknown I2C_M flags
  i2c: tegra: Wait for config load atomically while in ISR

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 23 Jan 2021 19:35:02 +0000 (11:35 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Twelve minor fixes, all in drivers or doc.

  Most of the fixes are pretty obvious (although we had two goes to get
  the UFS sysfs doc right) and the biggest change is in the ufs driver
  which they've extensively tested"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ibmvfc: Set default timeout to avoid crash during migration
  scsi: target: tcmu: Fix use-after-free of se_cmd->priv
  scsi: fnic: Fix memleak in vnic_dev_init_devcmd2
  scsi: libfc: Avoid invoking response handler twice if ep is already completed
  scsi: scsi_transport_srp: Don't block target in failfast state
  scsi: docs: ABI: sysfs-driver-ufs: Rectify table formatting
  scsi: ufs: Fix tm request when non-fatal error happens
  scsi: ufs: Fix livelock of ufshcd_clear_ua_wluns()
  scsi: ibmvfc: Fix missing cast of ibmvfc_event pointer to u64 handle
  scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM
  scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression
  scsi: docs: ABI: sysfs-driver-ufs: Add DeepSleep power mode

4 years agoMerge tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sat, 23 Jan 2021 19:25:33 +0000 (11:25 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fixes from Shuah :
 "Five fixes to the kunit tool and documentation from Daniel Latypov and
  David Gow"

* tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: tool: move kunitconfig parsing into __init__, make it optional
  kunit: tool: fix minor typing issue with None status
  kunit: tool: surface and address more typing issues
  Documentation: kunit: include example of a parameterized test
  kunit: tool: Fix spelling of "diagnostic" in kunit_parser

4 years agocifs: do not fail __smb_send_rqst if non-fatal signals are pending
Ronnie Sahlberg [Wed, 20 Jan 2021 22:22:48 +0000 (08:22 +1000)]
cifs: do not fail __smb_send_rqst if non-fatal signals are pending

RHBZ 1848178

The original intent of returning an error in this function
in the patch:
  "CIFS: Mask off signals when sending SMB packets"
was to avoid interrupting packet send in the middle of
sending the data (and thus breaking an SMB connection),
but we also don't want to fail the request for non-fatal
signals even before we have had a chance to try to
send it (the reported problem could be reproduced e.g.
by exiting a child process when the parent process was in
the midst of calling futimens to update a file's timestamps).

In addition, since the signal may remain pending when we enter the
sending loop, we may end up not sending the whole packet before
TCP buffers become full. In this case the code returns -EINTR
but what we need here is to return -ERESTARTSYS instead to
allow system calls to be restarted.

Fixes: b30c74c73c78 ("CIFS: Mask off signals when sending SMB packets")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agoMerge tag 'for-5.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 22 Jan 2021 22:31:00 +0000 (14:31 -0800)]
Merge tag 'for-5.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM integrity crash if "recalculate" used without "internal_hash"

 - Fix DM integrity "recalculate" support to prevent recalculating
   checksums if we use internal_hash or journal_hash with a key (e.g.
   HMAC). Use of crypto as a means to prevent malicious corruption
   requires further changes and was never a design goal for
   dm-integrity's primary usecase of detecting accidental corruption.

 - Fix a benign dm-crypt copy-and-paste bug introduced as part of a fix
   that was merged for 5.11-rc4.

 - Fix DM core's dm_get_device() to avoid filesystem lookup to get block
   device (if possible).

* tag 'for-5.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: avoid filesystem lookup in dm_get_dev_t()
  dm crypt: fix copy and paste bug in crypt_alloc_req_aead
  dm integrity: conditionally disable "recalculate" feature
  dm integrity: fix a crash if "recalculate" used without "internal_hash"

4 years agoMerge tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 22 Jan 2021 21:55:00 +0000 (13:55 -0800)]
Merge tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix id index used in Intel PT for heterogeneous systems

 - Fix overrun issue in 'perf script' for dynamically-allocated PMU type
   number

 - Fix 'perf stat' metrics containing the 'duration_time' synthetic
   event

 - Fix system PMU 'perf stat' metrics

* tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf script: Fix overrun issue for dynamically-allocated PMU type number
  perf metricgroup: Fix system PMU metrics
  perf metricgroup: Fix for metrics containing duration_time
  perf evlist: Fix id index for heterogeneous systems

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 22 Jan 2021 21:51:17 +0000 (13:51 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Correctly mask out bits 63:60 in a kernel tag check fault address
   (specified as unknown by the architecture). Previously they were just
   zeroed but for kernel pointers they need to be all ones.

 - Fix a panic (unexpected kernel BRK exception) caused by kprobes being
   reentered due to an interrupt.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kprobes: Fix Uexpected kernel BRK exception at EL1
  kasan, arm64: fix pointer tags in KASAN reports

4 years agoMerge tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 22 Jan 2021 21:47:25 +0000 (13:47 -0800)]
Merge tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A patch to zero out sensitive cryptographic data and two minor
  cleanups prompted by the fact that a bunch of code was moved in this
  cycle"

* tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client:
  libceph: fix "Boolean result is used in bitwise operation" warning
  libceph, ceph: disambiguate ceph_connection_operations handlers
  libceph: zero out session key and connection secret

4 years agoMerge tag 'fixes-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Fri, 22 Jan 2021 21:45:52 +0000 (13:45 -0800)]
Merge tag 'fixes-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull typo fix from Mike Rapoport:
 "Fix typo in comment of memblock_phys_alloc_try_nid()"

* tag 'fixes-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm/memblock: Fix typo in comment of memblock_phys_alloc_try_nid()

4 years agoMerge tag 'mmc-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 22 Jan 2021 21:43:42 +0000 (13:43 -0800)]
Merge tag 'mmc-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix initialization of block size when ext_csd isn't present

  MMC host:
   - sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
   - sdhci-of-dwcmshc: Fix request accessing RPMB
   - sdhci-xenon: Fix 1.8v regulator stabilization"

* tag 'mmc-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: don't initialize block size from ext_csd if not present
  mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
  mmc: sdhci-xenon: fix 1.8v regulator stabilization
  mmc: sdhci-of-dwcmshc: fix rpmb access

4 years agoMerge tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 22 Jan 2021 21:38:40 +0000 (13:38 -0800)]
Merge tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "A small collection of bug-fixes and model-specific quirks"

* tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control
  platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors
  platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list
  platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
  platform/x86: amd-pmc: Fix CONFIG_DEBUG_FS check
  platform/x86: thinkpad_acpi: correct palmsensor error checking
  platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352
  platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet
  platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes
  platform/surface: SURFACE_PLATFORMS should depend on ACPI
  platform/surface: surface_gpe: Fix non-PM_SLEEP build warnings
  tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency
  tools/power/x86/intel-speed-select: Set scaling_max_freq to base_frequency

4 years agoio_uring: fix short read retries for non-reg files
Pavel Begunkov [Thu, 21 Jan 2021 12:01:08 +0000 (12:01 +0000)]
io_uring: fix short read retries for non-reg files

Sockets and other non-regular files may actually expect short reads to
happen, don't retry reads for them. Because non-reg files don't set
FMODE_BUF_RASYNC and so it won't do second/retry do_read, we can filter
out those cases after first do_read() attempt with ret>0.

Cc: stable@vger.kernel.org # 5.9+
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoio_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
Jens Axboe [Tue, 19 Jan 2021 17:10:54 +0000 (10:10 -0700)]
io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state

IORING_OP_CLOSE is special in terms of cancelation, since it has an
intermediate state where we've removed the file descriptor but hasn't
closed the file yet. For that reason, it's currently marked with
IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
is always run even if canceled, to prevent leaving us with a live file
but an fd that is gone. However, with SQPOLL, since a cancel request
doesn't carry any resources on behalf of the request being canceled, if
we cancel before any of the close op has been run, we can end up with
io-wq not having the ->files assigned. This can result in the following
oops reported by Joseph:

BUG: kernel NULL pointer dereference, address: 00000000000000d8
PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__lock_acquire+0x19d/0x18c0
Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
RSP: 0018:ffffc90001933828 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 lock_acquire+0x31a/0x440
 ? close_fd_get_file+0x39/0x160
 ? __lock_acquire+0x647/0x18c0
 _raw_spin_lock+0x2c/0x40
 ? close_fd_get_file+0x39/0x160
 close_fd_get_file+0x39/0x160
 io_issue_sqe+0x1334/0x14e0
 ? lock_acquire+0x31a/0x440
 ? __io_free_req+0xcf/0x2e0
 ? __io_free_req+0x175/0x2e0
 ? find_held_lock+0x28/0xb0
 ? io_wq_submit_work+0x7f/0x240
 io_wq_submit_work+0x7f/0x240
 io_wq_cancel_cb+0x161/0x580
 ? io_wqe_wake_worker+0x114/0x360
 ? io_uring_get_socket+0x40/0x40
 io_async_find_and_cancel+0x3b/0x140
 io_issue_sqe+0xbe1/0x14e0
 ? __lock_acquire+0x647/0x18c0
 ? __io_queue_sqe+0x10b/0x5f0
 __io_queue_sqe+0x10b/0x5f0
 ? io_req_prep+0xdb/0x1150
 ? mark_held_locks+0x6d/0xb0
 ? mark_held_locks+0x6d/0xb0
 ? io_queue_sqe+0x235/0x4b0
 io_queue_sqe+0x235/0x4b0
 io_submit_sqes+0xd7e/0x12a0
 ? _raw_spin_unlock_irq+0x24/0x30
 ? io_sq_thread+0x3ae/0x940
 io_sq_thread+0x207/0x940
 ? do_wait_intr_irq+0xc0/0xc0
 ? __ia32_sys_io_uring_enter+0x650/0x650
 kthread+0x134/0x180
 ? kthread_create_worker_on_cpu+0x90/0x90
 ret_from_fork+0x1f/0x30

Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified
the fdtable. Canceling before this point is totally fine, and running
it in the io-wq context _after_ that point is also fine.

For 5.12, we'll handle this internally and get rid of the no-cancel
flag, as IORING_OP_CLOSE is the only user of it.

Cc: stable@vger.kernel.org
Fixes: b5dba59e0cf7 ("io_uring: add support for IORING_OP_CLOSE")
Reported-by: "Abaci <abaci@linux.alibaba.com>"
Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoarm64: kprobes: Fix Uexpected kernel BRK exception at EL1
Qais Yousef [Fri, 22 Jan 2021 11:09:09 +0000 (11:09 +0000)]
arm64: kprobes: Fix Uexpected kernel BRK exception at EL1

I was hitting the below panic continuously when attaching kprobes to
scheduler functions

[  159.045212] Unexpected kernel BRK exception at EL1
[  159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP
[  159.059954] Modules linked in:
[  159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56
[rt-app] <notice> [1] Exiting.[  159.071166] Hardware name: ARM Juno development board (r2) (DT)
[  159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--)

[  159.085723] pc : 0xffff80001624501c
[  159.089377] lr : attach_entity_load_avg+0x2ac/0x350
[  159.094271] sp : ffff80001622b640
[rt-app] <notice> [0] Exiting.[  159.097591] x29: ffff80001622b640 x28: 0000000000000001
[  159.105515] x27: 0000000000000049 x26: ffff000800b79980

[  159.110847] x25: ffff00097ef37840 x24: 0000000000000000
[  159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90
[  159.121663] x21: ffff00097ef37700 x20: ffff800010119170
[rt-app] <notice> [11] Exiting.[  159.126995] x19: ffff00097ef37840 x18: 000000000000000e
[  159.135003] x17: 0000000000000001 x16: 0000000000000019
[  159.140335] x15: 0000000000000000 x14: 0000000000000000
[  159.145666] x13: 0000000000000002 x12: 0000000000000002
[  159.150996] x11: ffff80001592f9f0 x10: 0000000000000060
[  159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1
[  159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000
[  159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000
[  159.172321] x3 : ffff000800328948 x2 : 0000000000000000
[  159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90
[  159.182983] Call trace:
[  159.185433]  0xffff80001624501c
[  159.188581]  update_load_avg+0x2d0/0x778
[  159.192516]  enqueue_task_fair+0x134/0xe20
[  159.196625]  enqueue_task+0x4c/0x2c8
[  159.200211]  ttwu_do_activate+0x70/0x138
[  159.204147]  sched_ttwu_pending+0xbc/0x160
[  159.208253]  flush_smp_call_function_queue+0x16c/0x320
[  159.213408]  generic_smp_call_function_single_interrupt+0x1c/0x28
[  159.219521]  ipi_handler+0x1e8/0x3c8
[  159.223106]  handle_percpu_devid_irq+0xd8/0x460
[  159.227650]  generic_handle_irq+0x38/0x50
[  159.231672]  __handle_domain_irq+0x6c/0xc8
[  159.235781]  gic_handle_irq+0xcc/0xf0
[  159.239452]  el1_irq+0xb4/0x180
[  159.242600]  rcu_is_watching+0x28/0x70
[  159.246359]  rcu_read_lock_held_common+0x44/0x88
[  159.250991]  rcu_read_lock_any_held+0x30/0xc0
[  159.255360]  kretprobe_dispatcher+0xc4/0xf0
[  159.259555]  __kretprobe_trampoline_handler+0xc0/0x150
[  159.264710]  trampoline_probe_handler+0x38/0x58
[  159.269255]  kretprobe_trampoline+0x70/0xc4
[  159.273450]  run_rebalance_domains+0x54/0x80
[  159.277734]  __do_softirq+0x164/0x684
[  159.281406]  irq_exit+0x198/0x1b8
[  159.284731]  __handle_domain_irq+0x70/0xc8
[  159.288840]  gic_handle_irq+0xb0/0xf0
[  159.292510]  el1_irq+0xb4/0x180
[  159.295658]  arch_cpu_idle+0x18/0x28
[  159.299245]  default_idle_call+0x9c/0x3e8
[  159.303265]  do_idle+0x25c/0x2a8
[  159.306502]  cpu_startup_entry+0x2c/0x78
[  159.310436]  secondary_start_kernel+0x160/0x198
[  159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0)

After a bit of head scratching and debugging it turned out that it is
due to kprobe handler being interrupted by a tick that causes us to go
into (I think another) kprobe handler.

The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR
which leads to the Unexpected kernel BRK exception.

Reverting commit ba090f9cafd5 ("arm64: kprobes: Remove redundant
kprobe_step_ctx") seemed to fix the problem for me.

Further analysis showed that kcb->kprobe_status is set to
KPROBE_REENTER when the error occurs. By teaching
kprobe_breakpoint_ss_handler() to handle this status I can no  longer
reproduce the problem.

Fixes: ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agosched: Relax the set_cpus_allowed_ptr() semantics
Peter Zijlstra [Sat, 16 Jan 2021 10:56:37 +0000 (11:56 +0100)]
sched: Relax the set_cpus_allowed_ptr() semantics

Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu
tasks to retain during CPU offline, we can relax the warning in
set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at
the last minute will get pushed off before it can run.

While during CPU online there is no harm, and actual benefit, to
allowing kthreads back on early, it simplifies hotplug code and fixes
a number of outstanding races.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
4 years agosched: Fix CPU hotplug / tighten is_per_cpu_kthread()
Peter Zijlstra [Tue, 12 Jan 2021 10:28:16 +0000 (11:28 +0100)]
sched: Fix CPU hotplug / tighten is_per_cpu_kthread()

Prior to commit 1cf12e08bc4d ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU and
break affinity and force them off at the very end.

This scheme had to change in order to enable migrate_disable(). One
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least: idle,
hotplug and stop threads at any point before stop_machine, we can't
break affinity and/or push those away.

Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no long breaks affinity or migrates any of
them (which then includes the critical ones above).

However, there's an important difference between per-cpu kthreads and
kthreads that happen to have a single CPU affinity which is lost. The
latter class very much relies on the forced affinity breaking and
migration semantics previously provided.

Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.

Fixes: 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
4 years agosched: Prepare to use balance_push in ttwu()
Peter Zijlstra [Wed, 20 Jan 2021 14:05:41 +0000 (15:05 +0100)]
sched: Prepare to use balance_push in ttwu()

In preparation of using the balance_push state in ttwu() we need it to
provide a reliable and consistent state.

The immediate problem is that rq->balance_callback gets cleared every
schedule() and then re-set in the balance_push_callback() itself. This
is not a reliable signal, so add a variable that stays set during the
entire time.

Also move setting it before the synchronize_rcu() in
sched_cpu_deactivate(), such that we get guaranteed visibility to
ttwu(), which is a preempt-disable region.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
4 years agoworkqueue: Restrict affinity change to rescuer
Peter Zijlstra [Fri, 15 Jan 2021 18:08:36 +0000 (19:08 +0100)]
workqueue: Restrict affinity change to rescuer

create_worker() will already set the right affinity using
kthread_bind_mask(), this means only the rescuer will need to change
it's affinity.

Howveer, while in cpu-hot-unplug a regular task is not allowed to run
on online&&!active as it would be pushed away quite agressively. We
need KTHREAD_IS_PER_CPU to survive in that environment.

Therefore set the affinity after getting that magic flag.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org
4 years agoworkqueue: Tag bound workers with KTHREAD_IS_PER_CPU
Peter Zijlstra [Tue, 12 Jan 2021 10:26:49 +0000 (11:26 +0100)]
workqueue: Tag bound workers with KTHREAD_IS_PER_CPU

Mark the per-cpu workqueue workers as KTHREAD_IS_PER_CPU.

Workqueues have unfortunate semantics in that per-cpu workers are not
default flushed and parked during hotplug, however a subset does
manual flush on hotplug and hard relies on them for correctness.

Therefore play silly games..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.693465814@infradead.org
4 years agokthread: Extract KTHREAD_IS_PER_CPU
Peter Zijlstra [Tue, 12 Jan 2021 10:24:04 +0000 (11:24 +0100)]
kthread: Extract KTHREAD_IS_PER_CPU

There is a need to distinguish geniune per-cpu kthreads from kthreads
that happen to have a single CPU affinity.

Geniune per-cpu kthreads are kthreads that are CPU affine for
correctness, these will obviously have PF_KTHREAD set, but must also
have PF_NO_SETAFFINITY set, lest userspace modify their affinity and
ruins things.

However, these two things are not sufficient, PF_NO_SETAFFINITY is
also set on other tasks that have their affinities controlled through
other means, like for instance workqueues.

Therefore another bit is needed; it turns out kthread_create_per_cpu()
already has such a bit: KTHREAD_IS_PER_CPU, which is used to make
kthread_park()/kthread_unpark() work correctly.

Expose this flag and remove the implicit setting of it from
kthread_create_on_cpu(); the io_uring usage of it seems dubious at
best.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
4 years agosched: Don't run cpu-online with balance_push() enabled
Peter Zijlstra [Fri, 15 Jan 2021 17:17:45 +0000 (18:17 +0100)]
sched: Don't run cpu-online with balance_push() enabled

We don't need to push away tasks when we come online, mark the push
complete right before the CPU dies.

XXX hotplug state machine has trouble with rollback here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.415606087@infradead.org
4 years agoworkqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
Lai Jiangshan [Mon, 11 Jan 2021 15:26:33 +0000 (23:26 +0800)]
workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity

The scheduler won't break affinity for us any more, and we should
"emulate" the same behavior when the scheduler breaks affinity for
us.  The behavior is "changing the cpumask to cpu_possible_mask".

And there might be some other CPUs online later while the worker is
still running with the pending work items.  The worker should be allowed
to use the later online CPUs as before and process the work items ASAP.
If we use cpu_active_mask here, we can't achieve this goal but
using cpu_possible_mask can.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com