www.infradead.org Git - users/jedix/linux-maple.git/log
11 months ago perf docs: Refine the description for the buffer size
Leo Yan [Mon, 12 Aug 2024 09:34:59 +0000 (10:34 +0100)]
perf docs: Refine the description for the buffer size

The current description of the AUX trace buffer size is misleading. When
a user specifies the option '-m,512M', it represents a size value in
bytes (512MiB), not 512M pages (512M x 4KiB, given a 4KiB page size).

Make the document clear that the normal buffer and the AUX tracing
buffer share the same semantics. Sync the documents so that the text is
consistent.
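
As a rough illustration of the arithmetic (a sketch only: the helper
below is hypothetical and is not perf's own option parser, and the 4KiB
page size is an assumption), the two readings of '512M' differ by a
factor of the page size:

```python
import re

# Hypothetical helper, not perf's own parser: converts a size string such
# as "512M" into bytes using binary (KiB/MiB/GiB) multipliers.
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
PAGE_SIZE = 4096  # assumed 4KiB page


def size_to_bytes(text):
    m = re.fullmatch(r"(\d+)([BKMG])", text.strip().upper())
    if m is None:
        raise ValueError(f"unsupported size string: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


aux_bytes = size_to_bytes("512M")       # intended reading: a size in bytes
aux_pages = aux_bytes // PAGE_SIZE      # the same buffer counted in pages
misread_bytes = aux_bytes * PAGE_SIZE   # misleading reading: 512M pages

print(aux_bytes)      # 536870912 (512MiB)
print(aux_pages)      # 131072
print(misread_bytes)  # 2199023255552 (2TiB)
```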

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf script: add --addr2line option
Martin Liška [Fri, 19 Jul 2024 10:57:08 +0000 (12:57 +0200)]
perf script: add --addr2line option

As with other subcommands (like report and top), it would be handy to
provide a path for the addr2line command.

Signed-off-by: Martin Liska <martin.liska@hey.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf tests pmu: Initialize all fields of test_pmu variable
Arnaldo Carvalho de Melo [Mon, 12 Aug 2024 12:57:20 +0000 (09:57 -0300)]
perf tests pmu: Initialize all fields of test_pmu variable

Instead of explicitly initializing just the .name and .alias_name
fields, use named struct member initialization for just the non-NULL
.name field; the compiler will initialize all the other fields that are
not explicitly initialized to NULL.

This makes the code more robust, avoiding the error recently fixed when
the .alias_name was used and contained a random value.

Reviewed-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/lkml/e26941f9-f86c-4f2e-b812-20c49fb2c0d3@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate-data: Support --skip-empty option
Namhyung Kim [Wed, 7 Aug 2024 06:17:13 +0000 (23:17 -0700)]
perf annotate-data: Support --skip-empty option

The --skip-empty option hides dummy events in a group.  Like the other
output modes in 'perf report' and 'perf annotate', the data-type
profiling output should support the option.

Committer testing:

With dummy:

  root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
  Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
   event[0] = cpu_atom/mem-loads,ldlat=30/P
   event[1] = cpu_atom/mem-stores/P
   event[2] = dummy:u
  ============================================================================
                   Percent     offset       size  field
    100.00  100.00    0.00          0         40  pthread_mutex_t  {
    100.00  100.00    0.00          0         40      struct __pthread_mutex_s __data {
     45.21   84.54    0.00          0          4          int __lock;
      0.00    0.00    0.00          4          4          unsigned int __count;
      0.00    1.83    0.00          8          4          int __owner;
      5.19   10.65    0.00         12          4          unsigned int __nusers;
     49.61    2.97    0.00         16          4          int __kind;
      0.00    0.00    0.00         20          2          short int __spins;
      0.00    0.00    0.00         22          2          short int __elision;
      0.00    0.00    0.00         24         16          __pthread_list_t __list {
      0.00    0.00    0.00         24          8              struct __pthread_internal_list* __prev;
      0.00    0.00    0.00         32          8              struct __pthread_internal_list* __next;
                                                          };
                                                      };
      0.00    0.00    0.00          0          0      char[] __size;
     45.21   84.54    0.00          0          8      long int __align;
                                                };

Skipping it:

  root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
  Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
   event[0] = cpu_atom/mem-loads,ldlat=30/P
   event[1] = cpu_atom/mem-stores/P
  ============================================================================
           Percent     offset       size  field
    100.00  100.00          0         40  pthread_mutex_t  {
    100.00  100.00          0         40      struct __pthread_mutex_s __data {
     45.21   84.54          0          4          int __lock;
      0.00    0.00          4          4          unsigned int __count;
      0.00    1.83          8          4          int __owner;
      5.19   10.65         12          4          unsigned int __nusers;
     49.61    2.97         16          4          int __kind;
      0.00    0.00         20          2          short int __spins;
      0.00    0.00         22          2          short int __elision;
      0.00    0.00         24         16          __pthread_list_t __list {
      0.00    0.00         24          8              struct __pthread_internal_list* __prev;
      0.00    0.00         32          8              struct __pthread_internal_list* __next;
                                                  };
                                              };
      0.00    0.00          0          0      char[] __size;
     45.21   84.54          0          8      long int __align;
                                          };

  Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples):
  root@number:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807061713.1642924-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Fix --group behavior when leader has no samples
Namhyung Kim [Wed, 7 Aug 2024 06:15:55 +0000 (23:15 -0700)]
perf annotate: Fix --group behavior when leader has no samples

When --group option is used, it should display all events together.  But
the current logic only checks if the first (leader) event has samples or
not.  Let's check the member events as well.

It also failed to put the linked samples from member evsels into the
output RB-tree so that they can be displayed in the output.

For example, consider this event list:

  $ ./perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

It has three events, but the 'path_put' function has samples only for
the mem-stores (second) event.

  $ sudo ./perf annotate --stdio -f path_put
   Percent |      Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
  ----------------------------------------------------------------------------------------------------------
           : 0                0xffffffffae600020 <path_put>:
      0.00 :   ffffffffae600020:       endbr64
      0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
     91.22 :   ffffffffae600029:       pushq   %rbx
      0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      8.78 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00 :   ffffffffae600039:       popq    %rbx
      0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00 :   ffffffffae60003f:       nop

Therefore, nothing was shown when the --group option was used, since the
leader ("mem-loads") event has no samples.  Now the member events are
checked as well.

Before:
  $ sudo ./perf annotate --stdio -f --group path_put
  (no output)

After:
  $ sudo ./perf annotate --stdio -f --group path_put
   Percent                 |      Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
  -------------------------------------------------------------------------------------------------------------------------------------------------------------
                           : 0                0xffffffffae600020 <path_put>:
      0.00    0.00    0.00 :   ffffffffae600020:       endbr64
      0.00    0.00    0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
      0.00   91.22    0.00 :   ffffffffae600029:       pushq   %rbx
      0.00    0.00    0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00    0.00    0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      0.00    8.78    0.00 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00    0.00    0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00    0.00    0.00 :   ffffffffae600039:       popq    %rbx
      0.00    0.00    0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00    0.00    0.00 :   ffffffffae60003f:       nop
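
The change in the check can be sketched as follows. This is a minimal
model, assuming each event in the group is reduced to a per-symbol
sample count; the function names are illustrative and are not perf's
data structures or code:

```python
# Hypothetical model: each event in a group is reduced to its sample count
# for one symbol. These names are illustrative, not perf's own.
def leader_only_has_samples(counts):
    """Old behaviour as described: look at the leader (first event) only."""
    return counts[0] > 0


def group_has_samples(counts):
    """New behaviour as described: any event in the group may have samples."""
    return any(n > 0 for n in counts)


# path_put in the example above: mem-loads (leader), mem-stores, dummy
path_put = [0, 2, 0]
print(leader_only_has_samples(path_put))  # False
print(group_has_samples(path_put))        # True
```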

Committer testing:

Before:

  root@number:~# perf annotate --group --stdio2 clear_page_erms
  root@number:~#

After:

  root@number:~# perf annotate --group --stdio2 clear_page_erms
  Samples: 125  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period]
  clear_page_erms() /proc/kcore
  Percent                      0xffffffff990c6cc0 <clear_page_erms>:
                                 endbr64
                                 movl    $0x1000,%ecx
                                 xorl    %eax,%eax
     0.00  100.00    0.00        rep     stosb %al, (%rdi)
                               ← retq
                                 int3
                                 int3
                                 int3
                                 int3
                                 nop
                                 nop
  root@number:~#

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf tools: Create source symlink in perf object dir
Andi Kleen [Wed, 7 Aug 2024 23:18:20 +0000 (16:18 -0700)]
perf tools: Create source symlink in perf object dir

Create a source symlink to the original source in the objdir.

This is similar to what the main kernel build script does.

Committer testing:

  ⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
  <SNIP>
  ⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source
  lrwxrwxrwx. 1 acme acme 41 Aug  9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf
  ⬢[acme@toolbox perf-tools-next]$

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240807231823.898979-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf debuginfo: Fix the build with !HAVE_DWARF_SUPPORT
Arnaldo Carvalho de Melo [Fri, 9 Aug 2024 14:32:45 +0000 (11:32 -0300)]
perf debuginfo: Fix the build with !HAVE_DWARF_SUPPORT

In that case we have a set of placeholder functions, one of them uses a
'Dwarf_Addr' type that is not present as it is defined in the missing
DWARF libraries, so provide a placeholder typedef for that as well.

The build error before this patch:

  In file included from util/annotate.c:28:
  util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
     44 |                                              Dwarf_Addr *offs __maybe_unused,
        |                                              ^~~~~~~~~~
  make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
  make[6]: *** Waiting for unfinished jobs....

Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf script python: Add the 'ins_lat' field to event handler
Zixian Cai [Fri, 9 Aug 2024 08:01:36 +0000 (08:01 +0000)]
perf script python: Add the 'ins_lat' field to event handler

For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in 'ins_lat', while the cache latency
is stored in 'weight'.

This patch reports the 'ins_lat' field for Python scripting.
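
A handler could then read both values along these lines. This is only
a sketch: the assumption that 'ins_lat' sits next to 'weight' in the
"sample" sub-dictionary is not confirmed by the text above, and the
event values are made up:

```python
# Sketch of a perf-script style handler. The dictionary layout
# (param_dict["sample"]) is an assumption about the generated script.
def process_event(param_dict):
    sample = param_dict["sample"]
    # 'weight' carries the cache latency and 'ins_lat' the instruction
    # latency on PMUs that report both; default to 0 when absent.
    return sample.get("weight", 0), sample.get("ins_lat", 0)


# Made-up values, only to exercise the handler.
fake_event = {"sample": {"weight": 35, "ins_lat": 48}}
print(process_event(fake_event))  # (35, 48)
```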

Committer testing:

On a Raptor Lake Refresh Intel machine (14th gen):

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name : Intel(R) Core(TM) i7-14700K
  root@number:~# perf mem record -a sleep 5
  Memory events are enabled on a subset of CPUs: 16-27
  [ perf record: Woken up 85 times to write data ]
  [ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
  root@number:~# perf evlist -v
  cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
  cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  root@number:~#

Now generate a Python script, then use it to dump the dictionary, which
should now have the 'ins_lat' field:

  root@number:~# perf script --gen python
  generated Python script: perf-script.py
  root@number:~# vim perf-script.py
  root@number:~# perf script -s perf-script.py | head -40
  in trace_begin
  in trace_end
  root@number:~# vim perf-script.py

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf test shell lbr: Support hybrid x86 systems too
Arnaldo Carvalho de Melo [Thu, 8 Aug 2024 14:26:13 +0000 (11:26 -0300)]
perf test shell lbr: Support hybrid x86 systems too

Running on a:

  root@x1:~# grep 'model name' -m1 /proc/cpuinfo
  model name : 13th Gen Intel(R) Core(TM) i7-1365U
  root@x1:~#

It skips all the tests with:

  root@x1:~# perf test -vvvv LBR
   97: perf record LBR tests:
  --- start ---
  test child forked, pid 2033388
  Skip: only x86 CPUs support LBR
  ---- end(-2) ----
   97: perf record LBR tests                                           : Skip
  root@x1:~#

This is because the test checks for the /sys/devices/cpu/caps/branches
file, which isn't present on this system. Instead we have:

  root@x1:~# ls -la /sys/devices/cpu*/caps/branches
  -r--r--r--. 1 root root 4096 Aug  8 11:22 /sys/devices/cpu_atom/caps/branches
  -r--r--r--. 1 root root 4096 Aug  8 11:21 /sys/devices/cpu_core/caps/branches
  root@x1:~#

If we also check for one of those,
/sys/devices/cpu_core/caps/branches, the tests are no longer skipped on
these x86 Intel hybrid systems, and all of them pass:

  root@x1:~# perf test -vvvv LBR
   97: perf record LBR tests:
  --- start ---
  test child forked, pid 2034956
  LBR callgraph
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ]
  LBR callgraph [Success]
  LBR any branch test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ]
  LBR any branch test: 8071 samples
  LBR any branch test [Success]
  LBR any call test
  [ perf record: Woken up 23 times to write data ]
  [ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ]
  LBR any call test: 8092 samples
  LBR any call test [Success]
  LBR any ret test
  [ perf record: Woken up 24 times to write data ]
  [ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ]
  LBR any ret test: 8093 samples
  LBR any ret test [Success]
  LBR any indirect call test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ]
  LBR any indirect call test: 8067 samples
  LBR any indirect call test [Success]
  LBR any indirect jump test
  [ perf record: Woken up 12 times to write data ]
  [ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ]
  LBR any indirect jump test: 8061 samples
  LBR any indirect jump test [Success]
  LBR direct calls test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ]
  LBR direct calls test: 8076 samples
  LBR direct calls test [Success]
  LBR any indirect user call test
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ]
  LBR any indirect user call test: 8079 samples
  LBR any indirect user call test [Success]
  LBR system wide any branch test
  [ perf record: Woken up 26 times to write data ]
  [ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ]
  LBR system wide any branch test: 9209 samples
  LBR system wide any branch test [Success]
  LBR system wide any call test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ]
  LBR system wide any call test: 9333 samples
  LBR system wide any call test [Success]
  LBR parallel any branch test
  LBR parallel any call test
  LBR parallel any ret test
  LBR parallel any indirect call test
  LBR parallel any indirect jump test
  LBR parallel direct calls test
  LBR parallel system wide any branch test
  LBR parallel any indirect user call test
  LBR parallel system wide any call test
  [ perf record: Woken up 9 times to write data ]
  [ perf record: Woken up 51 times to write data ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Woken up 559 times to write data ]
  [ perf record: Woken up 14 times to write data ]
  [ perf record: Woken up 17 times to write data ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Woken up 11 times to write data ]
  [ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ]
  [ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ]
  [ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ]
  [ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ]
  [ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ]
  [ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ]
  [ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ]
  LBR parallel any indirect user call test: 1909 samples
  [ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ]
  LBR parallel any indirect user call test [Success]
  [ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ]
  LBR parallel any indirect jump test: 4311 samples
  LBR parallel any indirect jump test [Success]
  LBR parallel direct calls test: 3033 samples
  LBR parallel direct calls test [Success]
  LBR parallel any indirect call test: 1742 samples
  LBR parallel any indirect call test [Success]
  LBR parallel any call test: 4827 samples
  LBR parallel any call test [Success]
  LBR parallel any branch test: 6130 samples
  LBR parallel any branch test [Success]
  LBR parallel system wide any branch test: 29662 samples
  LBR parallel any ret test: 3971 samples
  LBR parallel any ret test [Success]
  LBR parallel system wide any branch test [Success]
  LBR parallel system wide any call test: 29576 samples
  LBR parallel system wide any call test [Success]
  ---- end(0) ----
   97: perf record LBR tests                                           : Ok
  root@x1:~#
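
The check described above amounts to probing more than one sysfs path.
A minimal sketch, assuming a Python rendering of what the shell test
does (the helper and the injectable 'exists' callback are hypothetical;
only the two paths come from this message):

```python
import os

# Paths taken from the commit message above; the helper itself is a
# hypothetical Python rendering, not the shell test's actual code.
CANDIDATES = (
    "/sys/devices/cpu/caps/branches",
    "/sys/devices/cpu_core/caps/branches",
)


def lbr_caps_file(candidates=CANDIDATES, exists=os.path.exists):
    """Return the first candidate that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None


def fake_hybrid_exists(path):
    # Simulates a hybrid system where only the cpu_core PMU directory exists.
    return path.startswith("/sys/devices/cpu_core/")


print(lbr_caps_file(exists=fake_hybrid_exists))
# /sys/devices/cpu_core/caps/branches
```

Passing 'exists' as a parameter keeps the sketch testable on machines
without these sysfs files.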

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf test: Add set of perf record LBR tests
Ian Rogers [Thu, 8 Aug 2024 05:46:44 +0000 (22:46 -0700)]
perf test: Add set of perf record LBR tests

Add coverage for LBR operations and LBR callgraph.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf callchain: Fix stitch LBR memory leaks
Ian Rogers [Thu, 8 Aug 2024 05:46:43 +0000 (22:46 -0700)]
perf callchain: Fix stitch LBR memory leaks

The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.

Do the same for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.

Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.

Fix resolve_lbr_callchain_sample() by replacing list_replace_init() with
list_splice_init(), so the whole list is moved and nodes aren't leaked.
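
The difference between the two list operations can be pictured with a
loose analogy. In this sketch Python lists stand in for the kernel-style
linked lists and the node names are placeholders; it is not the perf
code, and in C the dropped nodes would be leaked rather than garbage
collected:

```python
# Python lists stand in for the kernel-style linked lists; this is an
# analogy for the two list operations, not the perf code.
def replace_init(src, dst):
    """Like list_replace_init(): dst takes over src's nodes, and whatever
    dst held before is no longer reachable through it."""
    dropped = list(dst)
    dst[:] = src
    src.clear()
    return dropped


def splice_init(src, dst):
    """Like list_splice_init(): src's nodes are joined onto the front of dst."""
    dst[:0] = src
    src.clear()


free_list = ["old_a", "old_b"]
print(replace_init(["n1", "n2"], free_list))  # ['old_a', 'old_b'] are lost
print(free_list)                              # ['n1', 'n2']

free_list = ["old_a", "old_b"]
splice_init(["n1", "n2"], free_list)
print(free_list)  # ['n1', 'n2', 'old_a', 'old_b']
```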

The memory leaks can be reproduced with a leak sanitizer build of perf,
by running perf report as follows:

  ```
  $ perf record -e cycles --call-graph lbr perf test -w thloop
  $ perf report --stitch-lbr
  ```

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <irogers@google.com>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf test pmu: Set uninitialized PMU alias to null
Veronika Molnarova [Thu, 8 Aug 2024 10:37:49 +0000 (12:37 +0200)]
perf test pmu: Set uninitialized PMU alias to null

Commit 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard
support") adds a test case "PMU cmdline match" that covers PMU name
wildcard support provided by function perf_pmu__match().

The test covers a wide range of supported PMU name matching combinations
but omits one case: when perf_pmu__match() cannot match the PMU name
against the wildcard, it tries to match the PMU's alias. However, this
variable is not set up, causing the test case to fail when run with
subprocesses or to segfault when run as a single process.

  ./perf test -vv 9
    9: Sysfs PMU tests                                :
    9.1: Parsing with PMU format directory            : Ok
    9.2: Parsing with PMU event                       : Ok
    9.3: PMU event names                              : Ok
    9.4: PMU name combining                           : Ok
    9.5: PMU name comparison                          : Ok
    9.6: PMU cmdline match                            : FAILED!

  ./perf test -F 9
    9.1: Parsing with PMU format directory            : Ok
    9.2: Parsing with PMU event                       : Ok
    9.3: PMU event names                              : Ok
    9.4: PMU name combining                           : Ok
    9.5: PMU name comparison                          : Ok
  Segmentation fault (core dumped)

Initialize the PMU alias to null for all tests of perf_pmu__match()
as this functionality is not being tested and the alias matching works
exactly the same as the matching of the PMU name.

  ./perf test -F 9
    9.1: Parsing with PMU format directory                             : Ok
    9.2: Parsing with PMU event                                        : Ok
    9.3: PMU event names                                               : Ok
    9.4: PMU name combining                                            : Ok
    9.5: PMU name comparison                                           : Ok
    9.6: PMU cmdline match                                             : Ok
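
The fallback from name to alias can be sketched as below. This is a
simplified stand-in that assumes plain glob matching; it is not the
real perf_pmu__match(), and the PMU names in the last call are made up:

```python
from fnmatch import fnmatch


# Hypothetical stand-in for the matching described above: try the PMU
# name first, then the alias, and skip the alias when it is None.
def pmu_match(name, alias_name, pattern):
    if fnmatch(name, pattern):
        return True
    return alias_name is not None and fnmatch(alias_name, pattern)


print(pmu_match("cpu_core", None, "cpu_*"))         # True
print(pmu_match("cpu_core", None, "uncore_*"))      # False
print(pmu_match("pmu0", "uncore_imc", "uncore_*"))  # True
```

With the alias explicitly set to None, the second call simply reports
no match instead of reading an uninitialized value.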

Fixes: 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard support")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf tests ftrace: Add pattern check for time, count
Arnaldo Carvalho de Melo [Thu, 8 Aug 2024 12:59:40 +0000 (09:59 -0300)]
perf tests ftrace: Add pattern check for time, count

In 'perf ftrace profile sleep 0.1' we know that there will be a specific
kernel function that takes a bit more than 0.1 seconds and is called
just once, so we can add a check for that to validate more than just the
presence of some functions in the profile.
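
Such a check could look roughly like this. It is a sketch in Python
rather than the shell test's actual pattern, and it assumes the
'perf ftrace profile' column layout of total, avg and max times in us,
then count and function name; the regular expression and the 100000 us
threshold are illustrative choices:

```python
import re

# One line in the assumed 'perf ftrace profile' layout:
# total, avg, max (all in us), count, function name.
LINE = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s+(\S+)\s*$")


def slept_once(output, min_total_us=100000.0):
    """True if some function ran exactly once for at least min_total_us."""
    for line in output.splitlines():
        m = LINE.match(line)
        if m and int(m.group(4)) == 1 and float(m.group(1)) >= min_total_us:
            return True
    return False


sample = """\
# Total (us)   Avg (us)   Max (us)      Count   Function
  100136.400 100136.400 100136.400          1   __x64_sys_clock_nanosleep
     166.868     55.623     80.299          3   scheduler_tick
"""
print(slept_once(sample))  # True
```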

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf test: Add a new shell test for perf ftrace
Namhyung Kim [Thu, 8 Aug 2024 04:49:54 +0000 (21:49 -0700)]
perf test: Add a new shell test for perf ftrace

  $ sudo ./perf test ftrace -vv
   86: perf ftrace tests:
  --- start ---
  test child forked, pid 1772223
  perf ftrace list test
  syscalls for sleep:
  __x64_sys_nanosleep
  __ia32_sys_nanosleep
  __x64_sys_clock_nanosleep
  __ia32_sys_clock_nanosleep
  perf ftrace list test  [Success]
  perf ftrace trace test
  # tracer: function_graph
  #
  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   0)               |  __x64_sys_clock_nanosleep() {
   0)               |    common_nsleep() {
   0)               |      hrtimer_nanosleep() {
   0)               |        do_nanosleep() {
  perf ftrace trace test  [Success]
  perf ftrace latency test
  target function: __x64_sys_clock_nanosleep
  #   DURATION     |      COUNT | GRAPH                                          |
      32 - 64   ms |          1 | ############################################## |
  perf ftrace latency test  [Success]
  perf ftrace profile test
  # Total (us)   Avg (us)   Max (us)      Count   Function
    100136.400 100136.400 100136.400          1   __x64_sys_clock_nanosleep
    100135.200 100135.200 100135.200          1   common_nsleep
    100134.700 100134.700 100134.700          1   hrtimer_nanosleep
    100133.700 100133.700 100133.700          1   do_nanosleep
    100130.600 100130.600 100130.600          1   schedule
       166.868     55.623     80.299          3   scheduler_tick
         5.926      5.926      5.926          1   native_smp_send_reschedule
       301.941    301.941    301.941          1   __x64_sys_execve
       295.786    295.786    295.786          1   do_execveat_common.isra.0
        71.397     35.699     46.403          2   bprm_execve
         2.519      1.260      1.547          2   sched_mm_cid_before_execve
         1.098      0.549      0.686          2   sched_mm_cid_after_execve
  perf ftrace profile test  [Success]
  ---- end(0) ----
   86: perf ftrace tests                                               : Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240808044954.1775333-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate-data: Show typedef names properly
Namhyung Kim [Wed, 7 Aug 2024 22:31:29 +0000 (15:31 -0700)]
perf annotate-data: Show typedef names properly

die_get_typename() resolves a typedef down to the original type.  But
sometimes the original type is an unnamed struct, which makes the output
confusing and hard to read.

This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed struct was reduced.  Also u32, u64 and size_t were split
from the base types.

  --- b   2024-08-01 17:02:34.307809952 -0700
  +++ a   2024-08-07 14:17:05.245853999 -0700
  -     2.40%  long unsigned int
  +     2.26%  long unsigned int
  -     1.56%  unsigned int
  +     1.27%  unsigned int
  -     0.98%  struct
  -     0.79%  long long unsigned int
  +     0.58%  long long unsigned int
  +     0.36%  struct
  +     0.27%  atomic64_t
  +     0.22%  u32
  +     0.21%  u64
  +     0.19%  atomic_t
  +     0.13%  size_t
  -     0.08%  struct seqcount_spinlock
  +     0.08%  seqcount_spinlock_t
  +     0.08%  sigset_t
  +     0.08%  __poll_t

Let's use the typedef name directly and use the resolved type only to
get the size of the type.
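
A minimal sketch of the idea, written in Python rather than the C used by
perf. The Type class and its fields are hypothetical stand-ins for DWARF
type DIEs and are not part of perf; the point is only that the displayed
name comes from the outermost type while the size comes from the resolved
one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Type:
    """Hypothetical stand-in for a DWARF type DIE."""
    name: Optional[str]              # None models an unnamed struct
    kind: str                        # "typedef", "struct", "base", ...
    size: int = 0                    # byte size; unused for typedefs
    target: Optional["Type"] = None  # what a typedef points to


def resolve(t: Type) -> Type:
    """Follow typedef links until a non-typedef type is reached."""
    while t.kind == "typedef" and t.target is not None:
        t = t.target
    return t


def display_name_and_size(t: Type) -> Tuple[str, int]:
    """Take the name from the outermost type, the size from the resolved one."""
    resolved = resolve(t)
    name = t.name or resolved.name or resolved.kind
    return name, resolved.size


anon = Type(name=None, kind="struct", size=8)
atomic64_t = Type(name="atomic64_t", kind="typedef", target=anon)
print(display_name_and_size(atomic64_t))  # ('atomic64_t', 8)
```

Without the typedef name, the same sample would be reported under the
bare "struct" bucket, which is what the diff above shows shrinking.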

Committer testing:

  root@x1:~# diff -u before after | head -30
  --- before 2024-08-08 09:35:13.917325041 -0300
  +++ after 2024-08-08 09:37:35.312257905 -0300
  @@ -10,25 +10,27 @@
   # ........  .........
   #
       79.40%  (unknown)
  -     2.28%  union
        1.96%  (stack operation)
  -     1.24%  struct
  +     1.87%  pthread_mutex_t
        0.99%  u32[]
  -     0.92%  unsigned int
        0.77%  struct task_struct
  +     0.75%  U32
        0.75%  struct pcpu_hot
        0.63%  struct qspinlock
  +     0.61%  atomic_t
        0.59%  struct list_head
  -     0.58%  int
        0.53%  struct cfs_rq
        0.51%  BYTE*
  -     0.48%  unsigned char
  +     0.48%  BYTE
        0.48%  long unsigned int
        0.46%  struct rq
        0.41%  struct worker
        0.41%  struct memcg_vmstats_percpu
  +     0.41%  pthread_cond_t
        0.37%  _Bool
  +     0.36%  int
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Cache debuginfo for data type profiling
Namhyung Kim [Mon, 5 Aug 2024 23:46:48 +0000 (16:46 -0700)]
perf annotate: Cache debuginfo for data type profiling

find_data_type() creates and deletes a debuginfo structure whenever it
tries to find the data type for a sample.  This is inefficient, since it
most likely accesses the same binary again and again.

Let's add a single-entry cache for the debug info structure of the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
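
The caching pattern can be sketched as follows, in Python rather than the
C used by perf. DebugInfoCache, open_fn and close_fn are hypothetical
names for illustration; the real change lives in find_data_type() and is
not reproduced here.

```python
class DebugInfoCache:
    """Single-entry cache keyed by DSO name (hypothetical model)."""

    def __init__(self, open_fn, close_fn):
        self._open = open_fn
        self._close = close_fn
        self._dso = None
        self._dbg = None

    def get(self, dso):
        if self._dbg is not None and self._dso == dso:
            return self._dbg        # hit: same DSO as the previous sample
        if self._dbg is not None:
            self._close(self._dbg)  # miss: drop the old entry first
        self._dso, self._dbg = dso, self._open(dso)
        return self._dbg


opened = []  # records every expensive open
cache = DebugInfoCache(open_fn=lambda d: opened.append(d) or f"dbg:{d}",
                       close_fn=lambda dbg: None)
for dso in ["vmlinux", "vmlinux", "vmlinux", "libc.so.6", "libc.so.6"]:
    cache.get(dso)
print(opened)  # ['vmlinux', 'libc.so.6']
```

Five lookups cost two opens here because consecutive samples tend to hit
the same DSO, which is the access pattern the commit relies on.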

Note that this will introduce a little difference in the output due to
the order of checking stack operations.  It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check.  So it'll report stack operations in DSOs without debug
info as unknown.  But I think it's ok and better to have the checking
near the caching logic.

Committer testing:

  root@x1:~# perf mem record -a sleep 5s
  root@x1:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u
  root@x1:~# diff -u before after
  --- before 2024-08-08 09:33:53.880780784 -0300
  +++ after 2024-08-08 09:35:13.917325041 -0300
  @@ -81,8 +81,8 @@
   # Overhead  Data Type
   # ........  .........
   #
  -    55.43%  (unknown)
  -    11.61%  (stack operation)
  +    55.56%  (unknown)
  +    11.48%  (stack operation)
        4.93%  struct pcpu_hot
        3.26%  unsigned int
        2.48%  struct

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf hist: Fix reference counting of branch_info
Ian Rogers [Wed, 7 Aug 2024 06:51:36 +0000 (23:51 -0700)]
perf hist: Fix reference counting of branch_info

iter_finish_branch_entry() doesn't put the branch_info from/to map
elements, creating memory leaks. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Properly clearing up the map_symbols creates map reference count
issues, so resolve those as well. Resolving this issue doesn't improve
peak heap consumption for the test above.
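
The get/put balance at stake can be sketched as below, in Python rather
than the C used by perf. The Map class and the two helper functions are
hypothetical models, not the perf implementation; they only show that
every reference taken while preparing a branch entry needs a matching put
when the entry is finished.

```python
class Map:
    """Hypothetical reference-counted object."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1  # the creator holds the first reference

    def get(self):
        self.refcnt += 1
        return self

    def put(self):
        assert self.refcnt > 0, "put without a matching get"
        self.refcnt -= 1


def prepare_branch_entry(src, dst):
    """Take one reference on each side of the branch."""
    return {"from": src.get(), "to": dst.get()}


def finish_branch_entry(entry):
    """Drop both references; skipping this is the leak."""
    entry["from"].put()
    entry["to"].put()
    entry.clear()


src, dst = Map("a"), Map("b")
for _ in range(3):
    entry = prepare_branch_entry(src, dst)
    finish_branch_entry(entry)
print(src.refcnt, dst.refcnt)  # 1 1
```

If finish_branch_entry() skipped the two put() calls, both counts would
grow by one per sample and never return to 1.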

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago Merge remote-tracking branch 'torvalds/master' into perf-tools-next
Arnaldo Carvalho de Melo [Tue, 6 Aug 2024 17:01:06 +0000 (14:01 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf-tools-next

To pick up a patch that, albeit being for the tools/perf/ directory, went
through a different tree and ended up breaking some recent tests
introduced in the perf-tools-next tree to validate duplicate events in
the JSON performance event files.

Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago Merge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 6 Aug 2024 14:52:10 +0000 (07:52 -0700)]
Merge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes:

   - Fix ACPI notifier racing with itself (intel-vbtn)

   - Initialize local variable to cover a timeout corner case
     (intel/ifs)

   - WMI docs spelling

  New device IDs:

   - amd/{pmc,pmf}: AMD 1Ah model 60h series.

   - amd/pmf: SPS quirk support for ASUS ROG Ally X"

* tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel/ifs: Initialize union ifs_status to zero
  platform/x86: msi-wmi-platform: Fix spelling mistakes
  platform/x86/amd/pmf: Add new ACPI ID AMDI0107
  platform/x86/amd/pmc: Send OS_HINT command for new AMD platform
  platform/x86/amd: pmf: Add quirk for ROG Ally X
  platform/x86: intel-vbtn: Protect ACPI notify handler against recursion

11 months ago perf jevents.py: Ensure event names aren't duplicated
Ian Rogers [Mon, 5 Aug 2024 19:44:24 +0000 (12:44 -0700)]
perf jevents.py: Ensure event names aren't duplicated

Duplicate event names break invariants in 'perf list'. Assert that an
event name isn't duplicated so that broken JSON won't build.
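
A minimal sketch of such a check. This is not the code added to
jevents.py; assert_unique_event_names() is a hypothetical helper, and
treating names as case-sensitive is an assumption made here for
simplicity.

```python
def assert_unique_event_names(events):
    """Fail as soon as two entries share the same "EventName"."""
    seen = set()
    for event in events:
        name = event["EventName"]
        assert name not in seen, f"duplicate event name: {name}"
        seen.add(name)


# Passes silently: all names differ.
assert_unique_event_names([{"EventName": "A"}, {"EventName": "B"}])

# Raises AssertionError, which would stop the build.
try:
    assert_unique_event_names([{"EventName": "OP_SPEC"},
                               {"EventName": "OP_SPEC"}])
except AssertionError as err:
    print(err)  # duplicate event name: OP_SPEC
```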

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf pmu-events: Remove duplicated ampereone event
Ian Rogers [Mon, 5 Aug 2024 19:44:22 +0000 (12:44 -0700)]
perf pmu-events: Remove duplicated ampereone event

OP_SPEC appears twice in the file, which will break invariants in
'perf list' as discussed in this thread:

  https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf pmu-events: Change dependencies for empty-pmu-events.c test
Ian Rogers [Mon, 5 Aug 2024 19:44:21 +0000 (12:44 -0700)]
perf pmu-events: Change dependencies for empty-pmu-events.c test

Switch from $? (all the prerequisites that are newer than the target)
to $^ (all the prerequisites).  With $?, touching only jevents.py means
that empty-pmu-events.c isn't passed to the diff command, breaking the
build.
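
The difference between the two automatic variables can be modelled with a
small Python sketch. The timestamps, the prerequisite order and the
function names are hypothetical, and the model is simplified (for
example, it does not deduplicate prerequisites the way $^ does).

```python
def newer_than_target(target_mtime, prereqs):
    """Model of $?: prerequisites newer than the target.

    A missing target (None) makes every prerequisite count as newer.
    """
    return [name for name, mtime in prereqs
            if target_mtime is None or mtime > target_mtime]


def all_prerequisites(prereqs):
    """Model of $^: every prerequisite, regardless of timestamps."""
    return [name for name, _ in prereqs]


# Hypothetical timestamps: only jevents.py was touched after the last build.
prereqs = [("empty-pmu-events.c", 5), ("jevents.py", 20)]
target_mtime = 10

print(newer_than_target(target_mtime, prereqs))  # ['jevents.py']
print(all_prerequisites(prereqs))  # ['empty-pmu-events.c', 'jevents.py']
```

With the $? model, the file that diff needs is missing from the list
whenever it is older than the target.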

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf test: Add build test for JEVENTS_ARCH=all
Ian Rogers [Mon, 5 Aug 2024 19:44:20 +0000 (12:44 -0700)]
perf test: Add build test for JEVENTS_ARCH=all

Building with JEVENTS_ARCH=all builds all CPU types and allows things
like assertions to check the validity of the input JSON.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Mon, 5 Aug 2024 21:31:12 +0000 (14:31 -0700)]
Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "A single fix to the conditional in ksft.py script which incorrectly
  flags a test suite failed when there are skipped tests in the mix.

  The logic is fixed to take skipped tests into account and report the
  test as passed"

* tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: ksft: Fix finished() helper exit code on skipped tests

11 months ago perf annotate: Add --skip-empty option
Namhyung Kim [Sat, 3 Aug 2024 21:13:32 +0000 (14:13 -0700)]
perf annotate: Add --skip-empty option

Like in 'perf report', we want to hide empty events in the 'perf annotate'
output.  This keeps the two commands consistent when the option is set.

For example, the following command would use 3 events including dummy.

  $ perf mem record -a -- perf test -w noploop

  $ perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

Just using perf annotate with --group will show all 3 events.

  $ perf annotate --group --stdio | head
   Percent                 | Source code & Disassembly of ...
  --------------------------------------------------------------
                           : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d

Now with --skip-empty, it'll hide the last dummy event.

  $ perf annotate --group --stdio --skip-empty | head
   Percent         | Source code & Disassembly of ...
  ------------------------------------------------------
                   : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00 :    e06d:       movl    %edx, %r13d
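
The column selection can be sketched as follows, in Python rather than
the C used by perf. visible_events() is a hypothetical helper and the
sample counts are made up for illustration; only the fact that dummy:u
has no samples matters.

```python
def visible_events(events, skip_empty):
    """Return the event names that get a percent column.

    events is a list of (name, nr_samples) pairs.
    """
    return [name for name, nr_samples in events
            if not skip_empty or nr_samples > 0]


# Sample counts are made up for illustration; only dummy:u is empty.
events = [("cpu/mem-loads,ldlat=30/P", 12),
          ("cpu/mem-stores/P", 8),
          ("dummy:u", 0)]

print(len(visible_events(events, skip_empty=False)))  # 3
print(visible_events(events, skip_empty=True))
# ['cpu/mem-loads,ldlat=30/P', 'cpu/mem-stores/P']
```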

Committer testing:

  root@x1:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u
  root@x1:~#

Before:

  root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
  Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
  do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
  Percent                       0x9900 <do_lookup_x>:
                                  pushq      %rbp
                                  movq       %rsp,%rbp
                                  pushq      %r15
                                  pushq      %r14
                                  pushq      %r13
                                  pushq      %r12
                                  pushq      %rbx
                                  subq       $0x88,%rsp
                                  movq       %rdi,-0x50(%rbp)
                                  movl       8(%r9),%edi
                                  movq       0x10(%rbp),%r12
                                  movq       0x28(%rbp),%r10
                                  movq       %rdx,-0x70(%rbp)
                                  movq       %rcx,-0x58(%rbp)
                                  movq       %rdi,%r11
     0.00    5.73    0.00         movq       %r8,-0x68(%rbp)
                                  movq       (%r9),%r8
                                  movl       %esi,%eax
     8.30    0.00    0.00         movl       0x30(%rbp),%r9d
                                  movl       %esi,%r15d
                                  shrl       $6, %eax
                                  movq       %r8,%r13
  root@x1:~#

After:

  root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
  Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
  do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
  Percent               0x9900 <do_lookup_x>:
                          pushq      %rbp
                          movq       %rsp,%rbp
                          pushq      %r15
                          pushq      %r14
                          pushq      %r13
                          pushq      %r12
                          pushq      %rbx
                          subq       $0x88,%rsp
                          movq       %rdi,-0x50(%rbp)
                          movl       8(%r9),%edi
                          movq       0x10(%rbp),%r12
                          movq       0x28(%rbp),%r10
                          movq       %rdx,-0x70(%rbp)
                          movq       %rcx,-0x58(%rbp)
                          movq       %rdi,%r11
     0.00    5.73         movq       %r8,-0x68(%rbp)
                          movq       (%r9),%r8
                          movl       %esi,%eax
     8.30    0.00         movl       0x30(%rbp),%r9d
                          movl       %esi,%r15d
                          shrl       $6, %eax
                          movq       %r8,%r13
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Set al->data_nr using the notes->src->nr_events
Namhyung Kim [Sat, 3 Aug 2024 21:13:31 +0000 (14:13 -0700)]
perf annotate: Set al->data_nr using the notes->src->nr_events

This is a preparation to support skipping empty events.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Use annotation__pcnt_width() consistently
Namhyung Kim [Sat, 3 Aug 2024 21:13:30 +0000 (14:13 -0700)]
perf annotate: Use annotation__pcnt_width() consistently

The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly.  Use this
function consistently so that we can make sure it has similar output
in different modes.  But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.

Let's use 8 and adjust the print width in __annotation_line__write()
properly.
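
As a rough illustration of the arithmetic, assuming 8 columns per event
and ignoring any other display modes, the width of the percent area and
the per-event formatting could look like this. The Python names are
hypothetical; the real logic is in annotation__pcnt_width() and
__annotation_line__write().

```python
PCNT_WIDTH = 8  # columns per event, as chosen in this commit


def pcnt_width(nr_events):
    """Total width of the percent area for an event group."""
    return PCNT_WIDTH * nr_events


def format_percents(percents):
    """Right-align each percentage in its own 8-column slot."""
    return "".join(f"{p:{PCNT_WIDTH}.2f}" for p in percents)


line = format_percents([0.0, 5.73])
print(repr(line))                   # '    0.00    5.73'
print(len(line) == pcnt_width(2))   # True
```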

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Set notes->src->nr_events early
Namhyung Kim [Sat, 3 Aug 2024 21:13:29 +0000 (14:13 -0700)]
perf annotate: Set notes->src->nr_events early

We want to use it in different places, so make sure it is set properly
in symbol__annotate() before creating the disasm lines.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf annotate: Use al->data_nr if possible
Namhyung Kim [Sat, 3 Aug 2024 21:13:28 +0000 (14:13 -0700)]
perf annotate: Use al->data_nr if possible

The data_nr keeps the number of entries in al->data[], so use it when
iterating over the array.  The notes->src->nr_events should have the
same number, but it is more natural to use al->data_nr.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago Merge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 5 Aug 2024 16:23:00 +0000 (09:23 -0700)]
Merge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:
 "Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled,
  that can cause a crash. It hasn't been found earlier due to quite
  specific conditions necessary (OOM during kmem_cache_alloc_bulk())"

* tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm, slub: do not call do_slab_free for kfence object

11 months ago tools build: Correct bpf fixdep dependencies
Brian Norris [Mon, 15 Jul 2024 20:32:44 +0000 (13:32 -0700)]
tools build: Correct bpf fixdep dependencies

The dependencies in tools/lib/bpf/Makefile are incorrect. Before we
recurse to build $(BPF_IN_STATIC), we need to build its 'fixdep'
executable.

I can't use the usual shortcut from Makefile.include:

  <target>: <sources> fixdep

because its 'fixdep' target relies on $(OUTPUT), and $(OUTPUT) differs
in the parent 'make' versus the child 'make' -- so I imitate it via
open-coding.

I tweak a few $(MAKE) invocations while I'm at it, because
1. I'm adding a new recursive make; and
2. these recursive 'make's print spurious lines about files that are "up
   to date" (which isn't normally a feature in Kbuild subtargets) or
   "jobserver not available" (see [1])

I also need to tweak the assignment of the OUTPUT variable, so that
relative path builds work. For example, for 'make tools/lib/bpf', OUTPUT
is unset, and is usually treated as "cwd" -- but recursive make will
change cwd and so OUTPUT has a new meaning. For consistency, I ensure
OUTPUT is always an absolute path.
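
Why a relative or unset OUTPUT changes meaning across a recursive make
can be illustrated with a small Python model. The paths and the
absolute_output() helper are hypothetical; the actual fix is done in the
Makefiles and is not reproduced here.

```python
import posixpath


def absolute_output(output, cwd):
    """Anchor OUTPUT to a given working directory.

    An unset or empty OUTPUT means "the current directory".
    """
    return posixpath.normpath(posixpath.join(cwd, output or "."))


top = "/src/linux"                # hypothetical source tree
sub = "/src/linux/tools/lib/bpf"  # cwd of the recursive make

# Relative or unset: the meaning depends on who evaluates it.
print(absolute_output("", top))   # /src/linux
print(absolute_output("", sub))   # /src/linux/tools/lib/bpf

# Resolved once in the parent and passed down: stable in the child.
fixed = absolute_output("", top)
print(absolute_output(fixed, sub))  # /src/linux
```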

And $(Q) gets a backup definition in tools/build/Makefile.include,
because Makefile.include is sometimes included without
tools/build/Makefile, so the "quiet command" stuff doesn't actually work
consistently without it.

After this change, top-level builds result in an empty grep result from:

  $ grep 'cannot find fixdep' $(find tools/ -name '*.cmd')

[1] https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html
If we're not using $(MAKE) directly, then we need to use more '+'.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240715203325.3832977-4-briannorris@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago tools build: Avoid circular .fixdep-in.o.cmd issues
Brian Norris [Mon, 15 Jul 2024 20:32:43 +0000 (13:32 -0700)]
tools build: Avoid circular .fixdep-in.o.cmd issues

The 'fixdep' tool is used to post-process dependency files for various
reasons, and it runs after every object file generation command. This
even includes 'fixdep' itself.

In Kbuild, this isn't actually a problem, because it uses a single
command to generate fixdep (a compile-and-link command on fixdep.c), and
afterward runs the fixdep command on the accompanying .fixdep.cmd file.

In tools/ builds (which notably is maintained separately from Kbuild),
fixdep is generated in several phases:

 1. fixdep.c -> fixdep-in.o
 2. fixdep-in.o -> fixdep

Thus, fixdep is not available in the post-processing for step 1, and
instead, we generate .cmd files that look like:

  ## from tools/objtool/libsubcmd/.fixdep.o.cmd
  # cannot find fixdep (/path/to/linux/tools/objtool/libsubcmd//fixdep)
  [...]

These invalid .cmd files are benign in some respects, but cause problems
in others (such as the linked reports).

Because the tools/ build system is rather complicated in its own right
(and pointedly different than Kbuild), I choose to simply open-code the
rule for building fixdep, and avoid the recursive-make indirection that
produces the problem in the first place.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/all/Zk-C5Eg84yt6_nml@google.com/
Link: https://lore.kernel.org/r/20240715203325.3832977-3-briannorris@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago tools build: Correct libsubcmd fixdep dependencies
Brian Norris [Mon, 15 Jul 2024 20:32:42 +0000 (13:32 -0700)]
tools build: Correct libsubcmd fixdep dependencies

All built targets need fixdep to be built first, before handling object
dependencies [1]. We're missing one such dependency before the libsubcmd
target.

This resolves .cmd file generation issues such that the following
sequence produces many fewer results:

  $ git clean -xfd tools/
  $ make tools/objtool
  $ grep "cannot find fixdep" $(find tools/objtool -name '*.cmd')

In particular, only a buggy tools/objtool/libsubcmd/.fixdep.o.cmd
remains, due to circular dependencies of fixdep on itself.

Such incomplete .cmd files don't usually cause a direct problem, since
they're designed to fail "open", but they can cause some subtle problems
that would otherwise be handled by proper fixdep'd dependency files. [2]

[1] This problem is better described in commit abb26210a395 ("perf
tools: Force fixdep compilation at the start of the build"). I don't
apply its solution here, because additional recursive make can be a bit
of overkill.

[2] Example failure case:

  cp -arl linux-src linux-src2
  cd linux-src2
  make O=/path/to/out
  cd ../linux-src
  rm -rf ../linux-src2
  make O=/path/to/out

Previously, we'd see errors like:

  make[6]: *** No rule to make target
  '/path/to/linux-src2/tools/include/linux/compiler.h', needed by
  '/path/to/out/tools/bpf/resolve_btfids/libsubcmd/exec-cmd.o'.  Stop.

Now, the properly-fixdep'd .cmd files will ignore a missing
/path/to/linux-src2/...

Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/all/ZGVi9HbI43R5trN8@bhelgaas/
Link: https://lore.kernel.org/all/Zk-C5Eg84yt6_nml@google.com/
Link: https://lore.kernel.org/r/20240715203325.3832977-2-briannorris@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf mem: Update documentation for new options
Namhyung Kim [Fri, 2 Aug 2024 18:09:13 +0000 (11:09 -0700)]
perf mem: Update documentation for new options

Add a common options section and move some items to the section.  Also
add description of new options to report options.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoLinux 6.11-rc2
Linus Torvalds [Sun, 4 Aug 2024 20:50:53 +0000 (13:50 -0700)]
Linux 6.11-rc2

11 months agoprofiling: remove profile=sleep support
Tetsuo Handa [Sun, 4 Aug 2024 09:48:10 +0000 (18:48 +0900)]
profiling: remove profile=sleep support

The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan()
to keep task blocked").

Booting with the 'profile=sleep' kernel command line option added or
executing

  # echo -n sleep > /sys/kernel/profiling

after boot causes the system to lock up.

Lockdep reports

  kthreadd/3 is trying to acquire lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70

  but task is already holding lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370

with the call trace being

   lock_acquire+0xc8/0x2f0
   get_wchan+0x32/0x70
   __update_stats_enqueue_sleeper+0x151/0x430
   enqueue_entity+0x4b0/0x520
   enqueue_task_fair+0x92/0x6b0
   ttwu_do_activate+0x73/0x140
   try_to_wake_up+0x213/0x370
   swake_up_locked+0x20/0x50
   complete+0x2f/0x40
   kthread+0xfb/0x180

However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.

Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agoMerge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:57:08 +0000 (08:57 -0700)]
Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.

   A recent change in the ACPI code which consolidated code paths moved
   the invocation of init_freq_invariance_cppc() to a CPU hotplug
   handler. The first invocation on AMD CPUs ends up enabling a static
   branch, which deadlocks because the static branch enable tries to
   acquire cpu_hotplug_lock but that lock is already write-held by the
   hotplug machinery.

   Use static_branch_enable_cpuslocked() instead and take the hotplug
   lock read for the Intel code path which is invoked from the
   architecture code outside of the CPU hotplug operations.

 - Fix the number of reserved bits in the sev_config structure bit field
   so that the bitfield does not exceed 64 bit.

 - Add missing Zen5 model numbers

 - Fix the alignment assumptions of pti_clone_pgtable() and
   clone_entry_text() on 32-bit:

   The code assumes PMD aligned code sections, but on 32-bit the kernel
   entry text is not PMD aligned. So depending on the code size and
   location, which is configuration and compiler dependent, entry text
   can cross a PMD boundary. As the start is not PMD aligned, adding PMD
   size to the start address yields an address larger than the end
   address, which results in partially mapped entry code for user space.
   That causes endless recursion on the first entry from userspace
   (usually #PF).

   Cure this by aligning the start address in the addition so it ends up
   at the next PMD start address.

   clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
   eventually be PTE mapped, which causes a map fail because the PMD for
   the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
   the clone() invocation which resolves to PTE on 32-bit and PMD on
   64-bit.

 - Zero the 8-byte case for get_user() on range check failure on 32-bit

   The recent consolidation of the 8-byte get_user() case broke the
   zeroing in the failure case again. Establish it by clearing ECX
   before the range check and not afterwards, as that obviously can't be
   reached when the range check fails.
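
The PMD alignment item above can be illustrated with a small userspace
sketch. It is not the kernel code: PMD_SIZE is assumed to be 2 MiB here,
and next_unaligned(), next_aligned() and pmds_visited() are hypothetical
helpers that only model how a walk over [start, end) steps from one PMD
to the next.

```c
/* Assumed PMD size for illustration only (2 MiB). */
#define PMD_SIZE 0x200000UL

/* Old stepping: add PMD_SIZE to a possibly unaligned address. */
unsigned long next_unaligned(unsigned long addr)
{
	return addr + PMD_SIZE;
}

/* Fixed stepping: align down first, so the result is the next PMD start. */
unsigned long next_aligned(unsigned long addr)
{
	return (addr & ~(PMD_SIZE - 1)) + PMD_SIZE;
}

/* Count how many PMDs a walk over [start, end) actually touches. */
unsigned int pmds_visited(unsigned long start, unsigned long end, int align)
{
	unsigned int n = 0;
	unsigned long addr = start;

	while (addr < end) {
		n++;	/* the PMD covering addr would be cloned here */
		addr = align ? next_aligned(addr) : next_unaligned(addr);
	}
	return n;
}
```

With start = 0x1ff000 and end = 0x201000 the range crosses the PMD
boundary at 0x200000. The unaligned walk jumps straight to 0x3ff000 and
never visits the second PMD, while the aligned walk visits both.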

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock

11 months agoMerge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:50:16 +0000 (08:50 -0700)]
Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two fixes for the timer/clocksource code:

   - The recent fix to make the take over of the broadcast timer more
     reliable retrieves a per CPU pointer in preemptible context.

     This went unnoticed in testing as some compilers hoist the access
     into the non-preemptible section where the pointer is actually
     used, but obviously compilers can rightfully invoke it where the
     code put it.

     Move it into the non-preemptible section right to the actual usage
     site to cure it.

   - The clocksource watchdog is supposed to emit a warning when the
     retry count is greater than one and the number of retries reaches
     the limit.

     The condition is backwards and always warns when the count is
     greater than one. Fix up the condition to prevent spamming dmesg"

* tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
  tick/broadcast: Move per CPU pointer access into the atomic section

11 months agoMerge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:46:14 +0000 (08:46 -0700)]
Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:

 - When stime is larger than rtime due to accounting imprecision, then
   utime = rtime - stime becomes negative. As this is unsigned math, the
   result becomes a huge positive number.

   Cure it by resetting stime to rtime in that case, so utime becomes 0.

 - Restore consistent state when sched_cpu_deactivate() fails.

   When offlining a CPU fails in sched_cpu_deactivate() after the SMT
   present counter has been decremented, then the function aborts but
   fails to increment the SMT present counter and leaves it imbalanced.
   Consecutive operations cause it to underflow. Add the missing fixup
   for the error path.

   For SMT accounting the runqueue needs to be marked online again in the
   error exit path to restore consistent state.
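
The unsigned wrap-around described in the first item can be shown with a
minimal sketch. utime_naive() and utime_clamped() are hypothetical names,
not the kernel's cputime functions; only the arithmetic and the
"reset stime to rtime" cure come from the text above.

```c
#include <stdint.h>

/* Naive split: wraps to a huge value when stime > rtime (unsigned math). */
uint64_t utime_naive(uint64_t rtime, uint64_t stime)
{
	return rtime - stime;
}

/* Cure as described above: reset stime to rtime, so utime becomes 0. */
uint64_t utime_clamped(uint64_t rtime, uint64_t stime)
{
	if (stime > rtime)
		stime = rtime;
	return rtime - stime;
}
```

For rtime = 100 and stime = 101 the naive version yields UINT64_MAX,
while the clamped version yields 0.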

* tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
  sched/core: Introduce sched_set_rq_on/offline() helper
  sched/smt: Fix unbalance sched_smt_present dec/inc
  sched/smt: Introduce sched_smt_present_inc/dec() helper
  sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

11 months agoMerge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:42:18 +0000 (08:42 -0700)]
Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Thomas Gleixner:

 - Move the smp_processor_id() invocation back into the non-preemptible
   region, so that the result is valid to use

 - Add the missing package C2 residency counters for Sierra Forest CPUs
   to make the newly added support actually useful

* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix smp_processor_id()-in-preemptible warnings
  perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest

11 months agoMerge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:36:57 +0000 (08:36 -0700)]
Merge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A couple of fixes for interrupt chip drivers:

   - Make sure to skip the clear register space in the MBIGEN driver
     when calculating the node register index. Otherwise the clear
     register is clobbered and the wrong node registers are accessed.

   - Fix a signed/unsigned confusion in the loongarch CPU driver which
     converts an error code to a huge "valid" interrupt number.

   - Convert the meson GPIO interrupt controller lock to a raw spinlock
     so it works on RT.

   - Add a missing static to an internal function in the pic32 EVIC
     driver"
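
The signed/unsigned confusion in the second item follows a common
pattern, sketched below. register_gsi_stub() is a hypothetical stand-in
for a call that returns a negative errno on failure, and returning 0 for
"no interrupt" in the fixed variant is an assumption of this sketch, not
something stated in the pull message.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in: negative errno on failure, IRQ number otherwise. */
int register_gsi_stub(int fail)
{
	return fail ? -EINVAL : 42;
}

/* Buggy pattern: the negative error is stored in an unsigned variable. */
uint32_t gsi_to_irq_buggy(int fail)
{
	uint32_t irq = register_gsi_stub(fail);

	return irq;	/* -EINVAL turns into a huge "valid" number */
}

/* Fixed pattern: keep the result signed and map failure to 0. */
uint32_t gsi_to_irq_fixed(int fail)
{
	int irq = register_gsi_stub(fail);

	return irq > 0 ? (uint32_t)irq : 0;
}
```

A caller that only checks for a non-zero result would accept the buggy
return value as a real interrupt number.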

* tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mbigen: Fix mbigen node address layout
  irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
  irqchip/irq-pic32-evic: Add missing 'static' to internal function
  irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()

11 months agoMerge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:32:31 +0000 (08:32 -0700)]
Merge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two fixes for locking and jump labels:

   - Ensure that the atomic_cmpxchg() conditions are correct and do not
     evaluate to true on any non-zero value except 1. The missing
     check of the return value leads to inconsistent state of the jump
     label counter.

   - Add a missing type conversion in the paravirt spinlock code which
     makes loongson build again"
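
The cmpxchg item can be sketched with C11 atomics. This assumes the bug
is the common pattern of testing the old value returned by a
compare-and-exchange for truthiness instead of comparing it against the
expected value. cmpxchg_old() is a hypothetical helper that mimics the
"returns the old value" convention; it is not the kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: compare-and-exchange that returns the old value. */
int cmpxchg_old(atomic_int *v, int expected, int desired)
{
	/* On failure, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(v, &expected, desired);
	return expected;
}

/* Buggy test: any non-zero old value is treated as a 1 -> 0 transition. */
bool dec_to_zero_buggy(atomic_int *v)
{
	return cmpxchg_old(v, 1, 0);
}

/* Fixed test: only a real 1 -> 0 transition reports success. */
bool dec_to_zero_fixed(atomic_int *v)
{
	return cmpxchg_old(v, 1, 0) == 1;
}
```

With a counter value of 2 the buggy test reports success although the
counter is left unchanged, so the caller's view and the counter diverge.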

* tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jump_label: Fix the fix, brown paper bags galore
  locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()

11 months agoarm: dts: arm: versatile-ab: Fix duplicate clock node name
Rob Herring (Arm) [Tue, 30 Jul 2024 21:00:30 +0000 (15:00 -0600)]
arm: dts: arm: versatile-ab: Fix duplicate clock node name

Commit 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and
regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes
(at different paths).

The kernel can't handle these duplicate names even though they are at
different paths.  Fix this by renaming one of the nodes to "clock-pclk".

This name is aligned with other Arm boards (those didn't have a known
frequency to use in the node name).

Fixes: 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agoMerge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 4 Aug 2024 15:18:40 +0000 (08:18 -0700)]
Merge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two reparse point fixes

 - minor cleanup

 - additional trace point (to help debug a recent problem)

* tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp
  smb3: add dynamic tracepoints for shutdown ioctl
  cifs: Remove cifs_aio_ctx
  smb: client: handle lack of FSCTL_GET_REPARSE_POINT support

11 months agoMerge tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sun, 4 Aug 2024 15:12:33 +0000 (08:12 -0700)]
Merge tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - two Kconfig fixes

 - one fix for the UVC driver addressing probe-time detection of UVC
   custom controls

 - one fix related to PDF generation

* tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: v4l: Fix missing tabular column hint for Y14P format
  media: intel/ipu6: select AUXILIARY_BUS in Kconfig
  media: ipu-bridge: fix ipu6 Kconfig dependencies
  media: uvcvideo: Fix custom control mapping probing

11 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 3 Aug 2024 22:12:56 +0000 (15:12 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "One core change that reverts the double message print patch in sd.c
  (it was causing regressions on embedded systems).

  The rest are driver fixes in ufs, mpt3sas and mpi3mr"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: exynos: Don't resume FMP when crypto support is disabled
  scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
  scsi: mpi3mr: Avoid IOMMU page faults on REPORT ZONES
  scsi: ufs: core: Do not set link to OFF state while waking up from hibernation
  scsi: Revert "scsi: sd: Do not repeat the starting disk message"
  scsi: ufs: core: Fix deadlock during RTC update
  scsi: ufs: core: Bypass quick recovery if force reset is needed
  scsi: ufs: core: Check LSDBS cap when !mcq

11 months agoMerge tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 3 Aug 2024 16:09:25 +0000 (09:09 -0700)]
Merge tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Fix memory leak when corruption is detected during scrubbing parent
   pointers

 - Allow SECURE namespace xattrs to use the reserved block pool in order
   to prevent ENOSPC

 - Save stack space by passing tracepoint's char array to file_path()
   instead of another stack variable

 - Remove unused parameter in macro XFS_DQUOT_LOGRES

 - Replace comma with semicolon in a couple of places

* tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: convert comma to semicolon
  xfs: convert comma to semicolon
  xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
  xfs: fix file_path handling in tracepoints
  xfs: allow SECURE namespace xattrs to use reserved block pool
  xfs: fix a memory leak

11 months agoMerge tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 3 Aug 2024 16:03:21 +0000 (09:03 -0700)]
Merge tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc architecture fixes from Helge Deller:

 - fix unaligned memory accesses when calling BPF functions

 - adjust memory size constants to fix possible DMA corruptions

* tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix a possible DMA corruption
  parisc: fix unaligned accesses in BPF

11 months agoruntime constants: deal with old decrepit linkers
Linus Torvalds [Sat, 3 Aug 2024 01:12:06 +0000 (18:12 -0700)]
runtime constants: deal with old decrepit linkers

The runtime constants linker script depended on documented linker
behavior [1]:

 "If an output section’s name is the same as the input section’s name
  and is representable as a C identifier, then the linker will
  automatically PROVIDE two symbols: __start_SECNAME and __stop_SECNAME,
  where SECNAME is the name of the section. These indicate the start
  address and end address of the output section respectively"

to just automatically define the symbol names for the bounds of the
runtime constant arrays.

It turns out that this isn't actually something we can rely on, with old
linkers not generating these automatic symbols.  It looks to have been
introduced in binutils-2.29 back in 2017, and we still support building
with versions all the way back to binutils-2.25 (from 2015).

And yes, Oleg actually seems to be using such ancient versions of
binutils.

So instead of depending on the implicit symbols from "section names
match and are representable C identifiers", just do this all manually.
It's not like it causes us any extra pain, we already have to do that
for all the other sections that we use that often have special
characters in them.
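
For reference, the implicit-symbol behavior the old linker script relied
on looks roughly like this from C. It is a sketch, not the kernel's
runtime constants code: the section name demo_consts and the two entries
are made up, and it assumes an ELF target with a linker new enough to
provide the __start_/__stop_ symbols.

```c
#include <stddef.h>

/* Put objects into a section whose name is a valid C identifier. */
#define DEMO_ENTRY __attribute__((section("demo_consts"), used))

static const unsigned long demo_a DEMO_ENTRY = 1;
static const unsigned long demo_b DEMO_ENTRY = 2;

/* Provided implicitly by the linker; absent with the old linkers above. */
extern const unsigned long __start_demo_consts[];
extern const unsigned long __stop_demo_consts[];

/* Number of entries the linker collected into the section. */
size_t demo_count(void)
{
	return (size_t)(__stop_demo_consts - __start_demo_consts);
}
```

With a linker that does not generate these symbols, the build fails with
undefined references, which is why the bounds are now defined manually.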

Reported-and-tested-by: Oleg Nesterov <oleg@redhat.com>
Link: https://sourceware.org/binutils/docs/ld/Input-Section-Example.html
Link: https://lore.kernel.org/all/20240802114518.GA20924@redhat.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agoMerge tag 'tags/fixes-media-uvc-20230722' of git://git.kernel.org/pub/scm/linux/kerne...
Hans Verkuil [Sat, 3 Aug 2024 09:01:04 +0000 (11:01 +0200)]
Merge tag 'tags/fixes-media-uvc-20230722' of git://git.kernel.org/pub/scm/linux/kernel/git/pinchartl/linux.git

uvcvideo v6.11 regression fix: fix custom control mapping probing

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
11 months agoMerge tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 2 Aug 2024 21:18:31 +0000 (14:18 -0700)]
Merge tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
 "Two minor tweaks for the NAPI handling, both from Olivier:

   - Kill two unused list definitions

   - Ensure that multishot NAPI doesn't age away"

* tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux:
  io_uring: remove unused local list heads in NAPI functions
  io_uring: keep multishot request NAPI timeout current

11 months agoMerge tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 2 Aug 2024 21:10:11 +0000 (14:10 -0700)]
Merge tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These fix a few issues related to the MSI IRQs management in the
  int340x thermal driver, fix a thermal core issue that may lead to
  missing trip point crossing events and update the thermal core
  documentation.

  Specifics:

   - Fix MSI error path cleanup in int340x, allow it to work with a
     subset of thermal MSI IRQs if some of them are not working and make
     it free all MSI IRQs on module exit (Srinivas Pandruvada)

   - Fix a thermal core issue that may lead to missing trip point
     crossing events in some cases when thermal_zone_set_trips() is used
     and update the thermal core documentation (Rafael Wysocki)"

* tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Update thermal zone registration documentation
  thermal: trip: Avoid skipping trips in thermal_zone_set_trips()
  thermal: intel: int340x: Free MSI IRQ vectors on module exit
  thermal: intel: int340x: Allow limited thermal MSI support
  thermal: intel: int340x: Fix kernel warning during MSI cleanup

11 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 2 Aug 2024 20:46:43 +0000 (13:46 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Expand the speculative SSBS errata workaround to more CPUs

 - Ensure jump label changes are visible to all CPUs with a
   kick_all_cpus_sync() (and also enable jump label batching as part of
   the fix)

 - The shadow call stack sanitiser is currently incompatible with Rust,
   make CONFIG_RUST conditional on !CONFIG_SHADOW_CALL_STACK

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
  rust: SHADOW_CALL_STACK is incompatible with Rust
  arm64: errata: Expand speculative SSBS workaround (again)
  arm64: cputype: Add Cortex-A725 definitions
  arm64: cputype: Add Cortex-X1C definitions

11 months agoMerge tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 2 Aug 2024 17:33:06 +0000 (10:33 -0700)]
Merge tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a potential hang in the MDS when cap revocation races with
  the client releasing the caps in question, marked for stable"

* tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client:
  ceph: force sending a cap update msg back to MDS for revoke op

11 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 2 Aug 2024 17:17:49 +0000 (10:17 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The bulk of the changes here is a largish change to guest_memfd,
  delaying the clearing and encryption of guest-private pages until they
  are actually added to guest page tables. This started as "let's make
  it impossible to misuse the API" for SEV-SNP; but then it ballooned a
  bit.

  The new logic is generally simpler and more ready for hugepage support
  in guest_memfd.

  Summary:

   - fix latent bug in how usage of large pages is determined for
     confidential VMs

   - fix "underline too short" in docs

   - eliminate log spam from limited APIC timer periods

   - disallow pre-faulting of memory before SEV-SNP VMs are initialized

   - delay clearing and encrypting private memory until it is added to
     guest page tables

   - this change also enables another small cleanup: the checks in
     SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can
     now be moved into the common kvm_gmem_populate() function

   - fix compilation error that the RISC-V merge introduced in selftests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: fix determination of max NPT mapping level for private pages
  KVM: riscv: selftests: Fix compile error
  KVM: guest_memfd: abstract how prepared folios are recorded
  KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
  KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
  KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
  KVM: guest_memfd: move check for already-populated page to common code
  KVM: remove kvm_arch_gmem_prepare_needed()
  KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
  KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
  KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
  KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
  KVM: guest_memfd: do not go through struct page
  KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
  KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
  KVM: x86: disallow pre-fault for SNP VMs before initialization
  KVM: Documentation: Fix title underline too short warning
  KVM: x86: Eliminate log spam from limited APIC timer periods

11 months agoMerge branch 'kvm-fixes' into HEAD
Paolo Bonzini [Fri, 2 Aug 2024 16:31:48 +0000 (12:31 -0400)]
Merge branch 'kvm-fixes' into HEAD

* fix latent bug in how usage of large pages is determined for
  confidential VMs

* fix "underline too short" in docs

* eliminate log spam from limited APIC timer periods

* disallow pre-faulting of memory before SEV-SNP VMs are initialized

* delay clearing and encrypting private memory until it is added to
  guest page tables

* this change also enables another small cleanup: the checks in
  SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
  can now be moved into the common kvm_gmem_populate() function

11 months agoMerge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 2 Aug 2024 16:33:35 +0000 (09:33 -0700)]
Merge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to avoid dropping some of the internal pseudo-extensions, which
   breaks *envcfg dependency parsing

 - The kernel entry address is now aligned in purgatory, which avoids a
   misaligned load that can lead to crash on systems that don't support
   misaligned accesses early in boot

 - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
   perf JSON configurations, one of them has been updated to
   FW_SFENCE_VMA_ASID_SENT

 - The starfive cache driver is now restricted to 64-bit systems, as it
   isn't 32-bit clean

 - A fix to avoid aliasing legacy-mode perf counters with software
   perf counters

 - VM_FAULT_SIGSEGV is now handled in the page fault code

 - A fix for stalls during CPU hotplug due to IPIs being disabled

 - A fix for memblock bounds checking. This manifests as a crash on
   systems with discontinuous memory maps that have regions that don't
   fit in the linear map

* tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix linear mapping checks for non-contiguous memory regions
  RISC-V: Enable the IPI before workqueue_online_cpu()
  riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
  perf: riscv: Fix selecting counters in legacy mode
  cache: StarFive: Require a 64-bit system
  perf arch events: Fix duplicate RISC-V SBI firmware event name
  riscv/purgatory: align riscv_kernel_entry
  riscv: cpufeature: Do not drop Linux-internal extensions

11 months agoMerge tag 'kvm-riscv-fixes-6.11-1' of https://github.com/kvm-riscv/linux into HEAD
Paolo Bonzini [Fri, 2 Aug 2024 16:31:29 +0000 (12:31 -0400)]
Merge tag 'kvm-riscv-fixes-6.11-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 6.11, take #1

- Fix compile error in get-reg-list selftests

11 months agoMerge tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 2 Aug 2024 16:29:54 +0000 (09:29 -0700)]
Merge tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - remove unused empty CPU alternatives header file

 - fix recently and erroneously removed exception handling when loading
   an invalid floating point register

 - ptdump fixes to reflect the recent changes due to the uncoupling of
   physical vs virtual kernel address spaces

 - changes to avoid the unnecessary splitting of large pages in kernel
   mappings

 - add the missing MODULE_DESCRIPTION for the CIO modules

* tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Keep inittext section writable
  s390/vmlinux.lds.S: Move ro_after_init section behind rodata section
  s390/mm: Get rid of RELOC_HIDE()
  s390/mm/ptdump: Improve sorting of markers
  s390/mm/ptdump: Add support for relocated lowcore mapping
  s390/mm/ptdump: Fix handling of identity mapping area
  s390/cio: Add missing MODULE_DESCRIPTION() macros
  s390/alternatives: Remove unused empty header file
  s390/fpu: Re-add exception handling in load_fpu_state()

11 months agoclocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
Paul E. McKenney [Fri, 2 Aug 2024 15:46:15 +0000 (08:46 -0700)]
clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()

The current "nretries > 1 || nretries >= max_retries" check in
cs_watchdog_read() will always evaluate to true, and thus pr_warn(), if
nretries is greater than 1.  The intent is instead to never warn on the
first try, but otherwise warn if the successful retry was the last retry.

Therefore, change that "||" to "&&".
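
The difference between the two conditions can be seen in isolation with
the following sketch. Only the two boolean expressions are taken from the
text above; warn_old() and warn_new() are hypothetical wrappers, not
functions from the clocksource code.

```c
#include <stdbool.h>

/* Old check: true for every nretries > 1, whatever max_retries is. */
bool warn_old(int nretries, int max_retries)
{
	return nretries > 1 || nretries >= max_retries;
}

/* New check: never on the first try, only when the limit was reached. */
bool warn_new(int nretries, int max_retries)
{
	return nretries > 1 && nretries >= max_retries;
}
```

For nretries = 2 and max_retries = 5 the old check warns and the new one
stays silent. Both warn once nretries reaches 5.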

Fixes: db3a34e17433 ("clocksource: Retry clock read if long delays detected")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240802154618.4149953-2-paulmck@kernel.org
11 months agoMerge tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 2 Aug 2024 16:14:48 +0000 (09:14 -0700)]
Merge tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
 "These are three important bug fixes for the cross-architecture tree,
  fixing a regression with the new syscall.tbl file, the inconsistent
  numbering for the new uretprobe syscall and a bug with iowrite64be on
  alpha"

* tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  syscalls: fix syscall macros for newfstat/newfstatat
  uretprobe: change syscall number, again
  alpha: fix ioread64be()/iowrite64be() helpers

11 months agoMerge tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 2 Aug 2024 16:04:57 +0000 (09:04 -0700)]
Merge tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A small collection of fixes:

   - Revert of FireWire changes that caused a long-time regression

   - Another long-time regression fix for AMD HDMI

   - MIDI2 UMP fixes

   - HD-audio Conexant codec fixes and a quirk"

* tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Conditionally use snooping for AMD HDMI
  ALSA: usb-audio: Correct surround channels in UAC1 channel map
  ALSA: seq: ump: Explicitly reset RPN with Null RPN
  ALSA: seq: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
  ALSA: seq: ump: Use the common RPN/bank conversion context
  ALSA: ump: Explicitly reset RPN with Null RPN
  ALSA: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
  Revert "ALSA: firewire-lib: operate for period elapse event in process context"
  Revert "ALSA: firewire-lib: obsolete workqueue for period update"
  ALSA: hda/realtek: Add quirk for Acer Aspire E5-574G
  ALSA: seq: ump: Optimize conversions from SysEx to UMP
  ALSA: hda/conexant: Mute speakers at suspend / shutdown
  ALSA: hda/generic: Add a helper to mute speakers at suspend/shutdown
  ALSA: hda: conexant: Fix headset auto detect fail in the polling mode

11 months agoMerge tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 2 Aug 2024 15:59:09 +0000 (08:59 -0700)]
Merge tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Regular weekly fixes. This is a bit larger than usual but doesn't seem
  too crazy.

  Most of it is vmwgfx changes that fix a bunch of issues with wayland
  userspaces with dma-buf/external buffers and modesetting fixes.

  Otherwise it's kinda spread out, v3d fixes some new ioctls, nouveau
  has regression revert and fixes, amdgpu, i915 and ast have some small
  fixes, and some core fixes spread about.

  client:
   - fix error code

  atomic:
   - allow damage clips with async flips
   - allow explicit sync with async flips

  kselftests:
   - fix dmabuf-heaps test

  panic:
   - fix schedule_work in panic paths

  panel:
   - fix OrangePi Neo orientation

  gpuvm:
   - fix missing dependency

  amdgpu:
   - SMU 14.x update
   - Fix contiguous VRAM handling for IB parsing
   - GFX 12 fix
   - Regression fix for old APUs

  i915:
   - Static analysis fix for int overflow
   - Fix for HDCP2_STREAM_STATUS macro and removal of PWR_CLK_STATE for gen12

  nouveau:
   - revert busy wait change that caused a resume regression
   - fix buffer placement fault on dynamic pm s/r
   - fix refcount underflow

  ast:
   - fix black screen on resume
   - wake during connector status detect

  v3d:
   - fix issues with perf/timestamp ioctls

  vmwgfx:
   - fix deadlock in dma-buf fence polling
   - fix screen surface refcounting
   - fix dumb buffer handling
   - fix support for external buffers
   - fix overlay with screen targets
   - trigger modeset on screen moves"

* tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
  Revert "nouveau: rip out busy fence waits"
  nouveau: set placement to original placement on uvmm validate.
  drm/atomic: Allow userspace to use damage clips with async flips
  drm/atomic: Allow userspace to use explicit sync with atomic async flips
  drm/i915: Fix possible int overflow in skl_ddi_calculate_wrpll()
  drm/i915/hdcp: Fix HDCP2_STREAM_STATUS macro
  drm/ast: astdp: Wake up during connector status detection
  i915/perf: Remove code to update PWR_CLK_STATE for gen12
  kselftests: dmabuf-heaps: Ensure the driver name is null-terminated
  drm/client: Fix error code in drm_client_buffer_vmap_local()
  drm/amdgpu: Fix APU handling in amdgpu_pm_load_smu_firmware()
  drm/amdgpu: increase mes log buffer size for gfx12
  drm/amdgpu: fix contiguous handling for IB parsing v2
  drm/amdgpu/pm: support gpu_metrics sysfs interface for smu v14.0.2/3
  drm/vmwgfx: Trigger a modeset when the screen moves
  drm/vmwgfx: Fix overlay when using Screen Targets
  drm/vmwgfx: Add basic support for external buffers
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  ...

11 months agocifs: update internal version number
Steve French [Fri, 26 Jul 2024 23:44:16 +0000 (18:44 -0500)]
cifs: update internal version number

To 2.50

Signed-off-by: Steve French <stfrench@microsoft.com>
11 months agosmb: client: fix FSCTL_GET_REPARSE_POINT against NetApp
Paulo Alcantara [Thu, 1 Aug 2024 21:12:39 +0000 (18:12 -0300)]
smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp

NetApp server requires the file to be open with FILE_READ_EA access in
order to support FSCTL_GET_REPARSE_POINT, otherwise it will return
STATUS_INVALID_DEVICE_REQUEST.  It doesn't make any sense because
there's no requirement for FILE_READ_EA bit to be set nor
STATUS_INVALID_DEVICE_REQUEST being used for something other than
"unsupported reparse points" in MS-FSA.

To fix it and improve compatibility, set FILE_READ_EA & SYNCHRONIZE
bits to match what Windows client currently does.

Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 months agosmb3: add dynamic tracepoints for shutdown ioctl
Steve French [Tue, 30 Jul 2024 05:26:21 +0000 (00:26 -0500)]
smb3: add dynamic tracepoints for shutdown ioctl

For debugging an umount failure in xfstests generic/043 generic/044 in some
configurations, we needed more information on the shutdown ioctl which
was suspected of being related to the cause, so tracepoints are added
in this patch e.g.

  "trace-cmd record -e smb3_shutdown_enter -e smb3_shutdown_done -e smb3_shutdown_err"

Sample output:
  godown-47084   [011] .....  3313.756965: smb3_shutdown_enter: flags=0x1 tid=0x733b3e75
  godown-47084   [011] .....  3313.756968: smb3_shutdown_done: flags=0x1 tid=0x733b3e75

Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 months agocifs: Remove cifs_aio_ctx
David Howells [Wed, 31 Jul 2024 10:30:00 +0000 (11:30 +0100)]
cifs: Remove cifs_aio_ctx

Remove struct cifs_aio_ctx and its associated alloc/release functions as it
is no longer used, the functions being taken over by netfslib.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
11 months agosmb: client: handle lack of FSCTL_GET_REPARSE_POINT support
Paulo Alcantara [Wed, 31 Jul 2024 13:23:39 +0000 (10:23 -0300)]
smb: client: handle lack of FSCTL_GET_REPARSE_POINT support

As per MS-FSA 2.1.5.10.14, support for FSCTL_GET_REPARSE_POINT is
optional and if the server doesn't support it,
STATUS_INVALID_DEVICE_REQUEST must be returned for the operation.

If we find files with reparse points and we can't read them due to
lack of client or server support, just ignore it and then treat them
as regular files or junctions.

Fixes: 5f71ebc41294 ("smb: client: parse reparse point flag in create response")
Reported-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
11 months agoMerge tag 'ata-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata...
Linus Torvalds [Fri, 2 Aug 2024 15:54:16 +0000 (08:54 -0700)]
Merge tag 'ata-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

 - Add missing power-domains property to the device tree bindings for
   the Rockchip Designware AHCI adapter (from Heiko)

* tag 'ata-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  dt-bindings: ata: rockchip-dwc-ahci: add missing power-domains

11 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Fri, 2 Aug 2024 15:52:27 +0000 (08:52 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "do_dup2() out-of-bounds array speculation fix"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  protect the fetch of ->fd[fd] in do_dup2() from mispredictions

11 months agoarm64: jump_label: Ensure patched jump_labels are visible to all CPUs
Will Deacon [Wed, 31 Jul 2024 13:36:01 +0000 (14:36 +0100)]
arm64: jump_label: Ensure patched jump_labels are visible to all CPUs

Although the Arm architecture permits concurrent modification and
execution of NOP and branch instructions, it still requires some
synchronisation to ensure that other CPUs consistently execute the newly
written instruction:

 >  When the modified instructions are observable, each PE that is
 >  executing the modified instructions must execute an ISB or perform a
 >  context synchronizing event to ensure execution of the modified
 >  instructions

Prior to commit f6cc0c501649 ("arm64: Avoid calling stop_machine() when
patching jump labels"), the arm64 jump_label patching machinery
performed synchronisation using stop_machine() after each modification,
however this was problematic when flipping static keys from atomic
contexts (namely, the arm_arch_timer CPU hotplug startup notifier) and
so we switched to the _nosync() patching routines to avoid "scheduling
while atomic" BUG()s during boot.

In hindsight, the analysis of the issue in f6cc0c501649 isn't quite
right: it cites the use of IPIs in the default patching routines as the
cause of the lockup, whereas stop_machine() does not rely on IPIs and
the I-cache invalidation is performed using __flush_icache_range(),
which elides the call to kick_all_cpus_sync(). In fact, the blocking
wait for other CPUs is what triggers the BUG() and the problem remains
even after f6cc0c501649, for example because we could block on the
jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
avoid the static key entirely in commit a862fc2254bd
("clocksource/arm_arch_timer: Remove use of workaround static key").

This all leaves the jump_label patching code in a funny situation on
arm64 as we do not synchronise with other CPUs to reduce the likelihood
of a bug which no longer exists. Consequently, toggling a static key on
one CPU cannot be assumed to take effect on other CPUs, leading to
potential issues, for example with missing preempt notifiers.

Rather than revert f6cc0c501649 and go back to stop_machine() for each
patch site, implement arch_jump_label_transform_apply() and kick all
the other CPUs with an IPI at the end of patching.

Cc: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Fixes: f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 months agosyscalls: fix syscall macros for newfstat/newfstatat
Arnd Bergmann [Thu, 1 Aug 2024 12:27:23 +0000 (14:27 +0200)]
syscalls: fix syscall macros for newfstat/newfstatat

The __NR_newfstat and __NR_newfstatat macros accidentally got renamed
in the conversion to the syscall.tbl format, dropping the 'new' portion
of the name.

In an unrelated change, the two syscalls are no longer architecture
specific but are once more defined on all 64-bit architectures, so the
'newstat' ABI keyword can be dropped from the table as a simplification.

Fixes: 4fe53bf2ba0a ("syscalls: add generic scripts/syscall.tbl")
Closes: https://lore.kernel.org/lkml/838053e0-b186-4e9f-9668-9a3384a71f23@app.fastmail.com/T/#t
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
11 months agouretprobe: change syscall number, again
Arnd Bergmann [Tue, 30 Jul 2024 15:30:40 +0000 (17:30 +0200)]
uretprobe: change syscall number, again

Despite multiple attempts to get the syscall number assignment right
for the newly added uretprobe syscall, we ended up with a bit of a mess:

 - The number is defined as 467 based on the assumption that the
   xattrat family of syscalls would use 463 through 466, but those
   did not make it into 6.11.

 - The include/uapi/asm-generic/unistd.h file still lists the number
   463, but the new scripts/syscall.tbl that was supposed to have the
   same data lists 467 instead as the number for arc, arm64, csky,
   hexagon, loongarch, nios2, openrisc and riscv. None of these
   architectures actually provide a uretprobe syscall.

 - All the other architectures (powerpc, arm, mips, ...) don't list
   this syscall at all.

There are two ways to make it consistent again: either list it with
the same syscall number on all architectures, or only list it on x86
but not in scripts/syscall.tbl and asm-generic/unistd.h.

Based on the most recent discussion, it seems like we won't need it
anywhere else, so just remove the inconsistent assignment and instead
move the x86 number to the next available one in the architecture
specific range, which is 335.

Fixes: 5c28424e9a34 ("syscalls: Fix to add sys_uretprobe to syscall.tbl")
Fixes: 190fec72df4a ("uprobe: Wire up uretprobe system call")
Fixes: 63ded110979b ("uprobe: Change uretprobe syscall scope and number")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
11 months agothermal: core: Update thermal zone registration documentation
Rafael J. Wysocki [Thu, 1 Aug 2024 16:39:28 +0000 (18:39 +0200)]
thermal: core: Update thermal zone registration documentation

The thermal sysfs API document is outdated.  One of the problems with
it is that it still documents thermal_zone_device_register() which
does not exist any more and it does not reflect the current thermal
zone operations definition.

Replace the thermal_zone_device_register() description in it with
a thermal_zone_device_register_with_trips() description, including
an update of the thermal zone operations list.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2767845.mvXUDI8C0e@rjwysocki.net
11 months agoRevert "nouveau: rip out busy fence waits"
Dave Airlie [Fri, 2 Aug 2024 04:38:28 +0000 (14:38 +1000)]
Revert "nouveau: rip out busy fence waits"

This reverts commit d45bb9c5f7a6f7b6e47939856b28cb1da0cdc119.

Just got a report that this causes some suspend/resume issues,
so back it out and I'll investigate it later.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
11 months agoMerge tag 'drm-misc-fixes-2024-08-01' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 2 Aug 2024 02:14:28 +0000 (12:14 +1000)]
Merge tag 'drm-misc-fixes-2024-08-01' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A couple drm_panic fixes, several v3d fixes to increase the new timestamp API
safety, several fixes for vmwgfx for various modesetting issues, PM fixes
for ast, async flips improvements and two fixes for nouveau to fix
resource refcounting and buffer placement.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240801-interesting-antique-bat-2fe4c0@houat
11 months agoMerge tag 'drm-intel-fixes-2024-08-01' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Fri, 2 Aug 2024 01:19:14 +0000 (11:19 +1000)]
Merge tag 'drm-intel-fixes-2024-08-01' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Static analysis fix for int overflow
- Fix for HDCP2_STREAM_STATUS macro and removal of PWR_CLK_STATE for gen12

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZqslBkcZlInYdYgm@jlahtine-mobl.ger.corp.intel.com
11 months agoMerge tag 'amd-drm-fixes-6.11-2024-07-27' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 1 Aug 2024 22:21:34 +0000 (08:21 +1000)]
Merge tag 'amd-drm-fixes-6.11-2024-07-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.11-2024-07-27:

amdgpu:
- SMU 14.x update
- Fix contiguous VRAM handling for IB parsing
- GFX 12 fix
- Regression fix for old APUs

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240728025407.2115881-1-alexander.deucher@amd.com
11 months agoperf mem: Add -T/--data-type option to report subcommand
Namhyung Kim [Wed, 31 Jul 2024 23:55:05 +0000 (16:55 -0700)]
perf mem: Add -T/--data-type option to report subcommand

This is just a shortcut to have 'type' in the sort key and use more
compact output format like below.

  $ perf mem report -T
  ...
  #
  # Overhead       Samples  Memory access                            Snoop         TLB access              Data Type
  # ........  ............  .......................................  ............  ......................  .........
  #
      14.84%            22  L1 hit                                   None          L1 or L2 hit            (unknown)
       7.68%             8  LFB/MAB hit                              None          L1 or L2 hit            (unknown)
       7.17%             3  RAM hit                                  Hit           L2 miss                 (unknown)
       6.29%            12  L1 hit                                   None          L1 or L2 hit            (stack operation)
       4.85%             5  RAM hit                                  Hit           L1 or L2 hit            (unknown)
       3.97%             5  LFB/MAB hit                              None          L1 or L2 hit            struct psi_group_cpu
       3.18%             3  LFB/MAB hit                              None          L1 or L2 hit            (stack operation)
       2.58%             3  L1 hit                                   None          L1 or L2 hit            unsigned int
       2.36%             2  L1 hit                                   None          L1 or L2 hit            struct
       2.31%             2  L1 hit                                   None          L1 or L2 hit            struct psi_group_cpu
  ...

Users can also use their own sort keys, and the -T option makes sure
the 'type' sort key is at the end.

  $ perf mem report -T -s mem

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf mem: Add -s/--sort option
Namhyung Kim [Wed, 31 Jul 2024 23:55:04 +0000 (16:55 -0700)]
perf mem: Add -s/--sort option

So that users can set the sort key manually as they want.

  $ perf mem report -s
   Error: switch `s' requires a value
   Usage: perf mem report [<options>]

      -s, --sort <key[,key2...]>
                          sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys
     overhead_guest_us overhead_children sample period
     weight1 weight2 weight3 ins_lat retire_lat p_stage_cyc
     pid comm dso symbol parent cpu socket srcline srcfile
     local_weight weight transaction trace symbol_size
     dso_size cgroup cgroup_id ipc_null time code_page_size
     local_ins_lat ins_lat local_p_stage_cyc p_stage_cyc
     addr local_retire_lat retire_lat simd type typeoff
     symoff symbol_daddr dso_daddr locked tlb mem snoop
     dcacheline symbol_iaddr phys_daddr data_page_size
     blocked

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf tools: Add mode argument to sort_help()
Namhyung Kim [Wed, 31 Jul 2024 23:55:03 +0000 (16:55 -0700)]
perf tools: Add mode argument to sort_help()

Some sort keys are meaningful only in a specific mode - like branch
stack and memory (data-src).  Add the mode to skip unnecessary ones.
This will be used for 'perf mem report' later.

While at it, change the prefix for the -F/--fields option to remove
the duplicate part.

Before:

  $ perf report -F
   Error: switch `F' requires a value
   Usage: perf report [<options>]

      -F, --fields <key[,keys...]>
     output field(s): overhead period sample  overhead overhead_sys
     overhead_us overhead_guest_sys overhead_guest_us overhead_children
     sample period weight1 weight2 weight3 ins_lat retire_lat
     ...
After:

  $ perf report -F
   Error: switch `F' requires a value
   Usage: perf report [<options>]

      -F, --fields <key[,keys...]>
     output field(s): overhead overhead_sys overhead_us
     overhead_guest_sys overhead_guest_us overhead_children
     sample period weight1 weight2 weight3 ins_lat retire_lat
     ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf mem: Rework command option handling
Namhyung Kim [Wed, 31 Jul 2024 23:55:02 +0000 (16:55 -0700)]
perf mem: Rework command option handling

Split the common option and ones for record or report.  Otherwise -U in
the record option cannot be used because it clashes with the one in the
common (or report) option.  Also rename report_events() to __cmd_report()
to follow the convention and to be in sync with the record part.

Also set the flag PARSE_OPT_STOP_AT_NON_OPTION for the common option so
that it can show the help message in the subcommand like below:

  $ perf mem record -h

   Usage: perf mem record [<options>] [<command>]
      or: perf mem record [<options>] -- <command> [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -e, --event <event>   event selector. use 'perf mem record -e list' to list available events
      -f, --force           don't complain, do it
      -K, --all-kernel      collect only kernel level data
      -p, --phys-data       Record/Report sample physical addresses
      -t, --type <type>     memory operations(load,store) Default load,store
      -U, --all-user        collect only user level data
      -v, --verbose         be more verbose (show counter open errors, etc)
          --data-page-size  Record/Report sample data address page size
          --ldlat <n>       mem-loads latency
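
The reason for splitting the option tables can be illustrated with
Python's argparse, which is not what perf uses (perf has its own
parse-options code). In this sketch each subcommand owns its table, so
-U can mean different things for record and report. The report-side
meaning shown here (--hide-unresolved) and the program name are
assumptions, not something this commit states.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a toy 'mem' command with one option table per subcommand."""
    parser = argparse.ArgumentParser(prog="mem-sketch")  # hypothetical name
    sub = parser.add_subparsers(dest="cmd", required=True)

    # record: -U selects user level data only
    record = sub.add_parser("record")
    record.add_argument("-U", "--all-user", action="store_true")

    # report: -U is free to mean something else (assumed meaning)
    report = sub.add_parser("report")
    report.add_argument("-U", "--hide-unresolved", action="store_true")
    return parser

args = build_parser().parse_args(["record", "-U"])
print(args.cmd, args.all_user)
```

With a single shared table, the second -U registration would collide
with the first, which is the clash described above.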

Cc: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf mem: Free the allocated sort string, fixing a leak
Namhyung Kim [Wed, 31 Jul 2024 23:55:01 +0000 (16:55 -0700)]
perf mem: Free the allocated sort string, fixing a leak

The get_sort_order() returns either a new string (from strdup) or NULL
but it never gets freed.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 2e7f545096f954a9 ("perf mem: Factor out a function to generate sort order")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-3-namhyung@kernel.org
[ Added Fixes tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf hist: Correct hist_entry->mem_info refcounts
Namhyung Kim [Wed, 31 Jul 2024 23:55:00 +0000 (16:55 -0700)]
perf hist: Correct hist_entry->mem_info refcounts

The 'struct mem_info' is created by iter_prepare_mem_entry() at the
beginning and destroyed by iter_finish_mem_entry() at the end.

So if it's used in a new hist_entry, it should be cloned.

Simplify (hopefully) the logic by adding some helper functions and by
not holding the refcount in the temporary entry.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf python: Remove PYTHON_PERF ifdefs
Ian Rogers [Wed, 31 Jul 2024 23:00:05 +0000 (16:00 -0700)]
perf python: Remove PYTHON_PERF ifdefs

When perf code was compiled one way for the binary and another for the
python module, the PYTHON_PERF ifdef was used to remove some code from
the python module.

Since switching to building the perf code as a series of libraries, with
the same libraries being used for the python module, the ifdefs became
unused as PYTHON_PERF is never defined. As such remove the ifdefs.

Fixes: 9dabf4003423c8d3 ("perf python: Switch module to linking libraries from building source")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731230005.12295-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf jevents: Autogenerate empty-pmu-events.c
Ian Rogers [Tue, 30 Jul 2024 19:17:44 +0000 (12:17 -0700)]
perf jevents: Autogenerate empty-pmu-events.c

empty-pmu-events.c exists so that builds may occur without python
being installed on a system. Manually updating empty-pmu-events.c to
be in sync with jevents.py is a pain, let's use jevents.py to generate
empty-pmu-events.c.

1) change jevents.py so that an arch and model of none cause
   generation of a pmu-events.c without any json. Add a SPDX and
   autogenerated warning to the start of the file.

2) change Build so that if a generated pmu-events.c for arch none and
   model none doesn't match empty-pmu-events.c the build fails with a
   cat of the differences. Update Makefile.perf to clean up the files
   used for this.

3) update empty-pmu-events.c to match the output of jevents.py with
   arch and model of none.

Committer notes:

The first paragraph is confusing, so I asked and Ian further clarified:

 ---
The requirement for python hasn't changed.

Case 1: no python or NO_JEVENTS=1
Build happens using empty-pmu-events.c that is checked in, no python
is required.

Case 2: python
pmu-events.c is created by jevents.py (requiring python) and then built.
This change adds a step where the empty-pmu-events.c is created using
jevents.py and that file is diffed against the checked in version.

This stops the checked in empty-pmu-events.c diverging if changes are
made to jevents.py. If the diff causes the build to fail then you just
copy the diff empty-pmu-events.c over the checked in one.
 ---
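
The check described in Case 2 amounts to a diff that fails the build.
A minimal Python sketch of that idea follows. The function names and
file labels are hypothetical, and the real check lives in the perf
Build rules rather than in Python.

```python
import difflib

def check_in_sync(checked_in: str, generated: str) -> str:
    """Return a unified diff; an empty string means the files match."""
    diff = difflib.unified_diff(
        checked_in.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="empty-pmu-events.c (checked in)",
        tofile="empty-pmu-events.c (generated)",
    )
    return "".join(diff)

def build_step(checked_in: str, generated: str) -> None:
    """Fail the build, showing the differences, if the two copies diverge."""
    diff = check_in_sync(checked_in, generated)
    if diff:
        raise SystemExit("empty-pmu-events.c is out of date:\n" + diff)
```

When build_step() fails, the remedy is the one given above: copy the
generated file over the checked in one.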

Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Sang <oliver.sang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philip Li <philip.li@intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240730191744.3097329-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoperf bpf: Move BPF disassembly routines to separate file to avoid clash with capstone...
Arnaldo Carvalho de Melo [Wed, 31 Jul 2024 14:58:56 +0000 (11:58 -0300)]
perf bpf: Move BPF disassembly routines to separate file to avoid clash with capstone bpf headers

There is a clash of the libbpf and capstone libraries, that ends up
with:

  In file included from /usr/include/capstone/capstone.h:325,
                   from util/disasm.c:1513:
  /usr/include/capstone/bpf.h:94:14: error: ‘bpf_insn’ defined as wrong kind of tag
     94 | typedef enum bpf_insn {

So far we're just trying to avoid this by not having both headers
included in the same .c or .h file, do it one more time by moving the
BPF disassembly routines from util/disasm.c to util/disasm_bpf.c.

This is only being hit when building with BUILD_NONDISTRO=1, i.e.
building with binutils-devel, that isn't in the default build due to
a licensing clash. We need to reimplement what is now isolated in
util/disasm_bpf.c using some other library to have the BPF annotation
feature that is now only available with BUILD_NONDISTRO=1.

Fixes: 6d17edc113de1e21 ("perf annotate: Use libcapstone to disassemble")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZqpUSKPxMwaQKORr@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agoprotect the fetch of ->fd[fd] in do_dup2() from mispredictions
Al Viro [Thu, 1 Aug 2024 19:22:22 +0000 (15:22 -0400)]
protect the fetch of ->fd[fd] in do_dup2() from mispredictions

both callers have verified that fd is not greater than ->max_fds;
however, misprediction might end up with
        tofree = fdt->fd[fd];
being speculatively executed.  That's wrong for the same reasons
why it's wrong in close_fd()/file_close_fd_locked(); the same
solution applies - array_index_nospec(fd, fdt->max_fds) could differ
from fd only in case of speculative execution on mispredicted path.
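
As an illustration of that last point, here is a small Python model of
the clamping. It follows the arithmetic of the generic C fallback for
array_index_mask_nospec() as I understand it (architectures may
override it), with 64-bit wraparound emulated by hand. The real helper
matters because the CPU cannot speculate past a data dependency, which
Python does not model; this only shows that in-bounds values pass
through unchanged.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned long

def array_index_mask_nospec(index: int, size: int) -> int:
    """All ones when index < size, zero otherwise, computed without a branch."""
    v = (index | (size - 1 - index)) & MASK64
    top_bit_clear = ((~v) & MASK64) >> 63      # 1 if in bounds, else 0
    return (-top_bit_clear) & MASK64

def array_index_nospec(index: int, size: int) -> int:
    """Clamp index to 0 when it is out of bounds; otherwise return it unchanged."""
    return index & array_index_mask_nospec(index, size)

max_fds = 64                                   # illustrative table size
print(array_index_nospec(10, max_fds))         # in bounds: unchanged
print(array_index_nospec(64, max_fds))         # out of bounds: clamped to 0
```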

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 months agox86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
David Gow [Wed, 31 Jul 2024 07:30:29 +0000 (15:30 +0800)]
x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit

While zeroing the upper 32 bits of an 8-byte getuser on 32-bit x86 was
fixed by commit 8c860ed825cb ("x86/uaccess: Fix missed zeroing of ia32 u64
get_user() range checking") it was broken again in commit 8a2462df1547
("x86/uaccess: Improve the 8-byte getuser() case").

This is because the register which holds the upper 32 bits (%ecx) is being
cleared _after_ the check_range, so if the range check fails, %ecx is never
cleared.

This can be reproduced with:
./tools/testing/kunit/kunit.py run --arch i386 usercopy

Instead, clear %ecx _before_ check_range in the 8-byte case. This
reintroduces a bit of the ugliness we were trying to avoid by adding
another #ifndef CONFIG_X86_64, but at least keeps check_range from needing
a separate bad_get_user_8 jump.
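
The ordering problem can be shown with a toy Python model of the 32-bit
register pair. Everything here is illustrative: the dictionary stands
in for registers, range_ok for the outcome of check_range, and the
failure path is assumed to zero only the low half (%edx) and return
-EFAULT.

```python
EFAULT = 14

def get_user_8(regs, range_ok, value, clear_before_check):
    """Model an 8-byte get_user on 32-bit: result in ecx:edx, status in eax."""
    if clear_before_check:
        regs["ecx"] = 0                 # fixed ordering: zero the high half first
    if not range_ok:                    # check_range failed
        regs["edx"] = 0                 # assumed: failure path zeroes the low half only
        regs["eax"] = -EFAULT
        return regs
    if not clear_before_check:
        regs["ecx"] = 0                 # broken ordering: never reached on failure
    regs["edx"] = value & 0xFFFFFFFF
    regs["ecx"] = value >> 32
    regs["eax"] = 0
    return regs

stale = {"eax": 0, "edx": 0x1111, "ecx": 0xDEADBEEF}
broken = get_user_8(dict(stale), range_ok=False, value=0, clear_before_check=False)
fixed = get_user_8(dict(stale), range_ok=False, value=0, clear_before_check=True)
print(hex(broken["ecx"]), hex(fixed["ecx"]))   # 0xdeadbeef 0x0
```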

Fixes: 8a2462df1547 ("x86/uaccess: Improve the 8-byte getuser() case")
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/20240731073031.4045579-1-davidgow@google.com
11 months agoriscv: Fix linear mapping checks for non-contiguous memory regions
Stuart Menefy [Sat, 22 Jun 2024 11:42:16 +0000 (12:42 +0100)]
riscv: Fix linear mapping checks for non-contiguous memory regions

The RISC-V kernel already has checks to ensure that memory which would
lie outside of the linear mapping is not used. However those checks
use memory_limit, which is used to implement the mem= kernel command
line option (to limit the total amount of memory, not its address
range). When memory is made up of two or more non-contiguous memory
banks this check is incorrect.

Two changes are made here:
 - add a call in setup_bootmem() to memblock_cap_memory_range() which
   will cause any memory which falls outside the linear mapping to be
   removed from the memory regions.
 - remove the check in create_linear_mapping_page_table() which was
   intended to remove memory which is outside the linear mapping based
   on memory_limit, as it is no longer needed. Note a check for
   mapping more memory than memory_limit (to implement mem=) is
   unnecessary because of the existing call to
   memblock_enforce_memory_limit().

This issue was seen when booting on a SV39 platform with two memory
banks:
  0x00,80000000 1GiB
  0x20,00000000 32GiB
This memory range is 158GiB from top to bottom, but the linear mapping
is limited to 128GiB, so the lower block of RAM will be mapped at
PAGE_OFFSET, and the upper block straddles the top of the linear
mapping.
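
The arithmetic above can be checked with a short Python sketch.
cap_memory_range() here is a hypothetical stand-in that only imitates the
effect of memblock_cap_memory_range() on a list of (start, length) pairs,
and it assumes the linear mapping begins at the lowest RAM address.

```python
GiB = 1 << 30

# The two banks from the report: (physical start, length).
banks = [(0x00_8000_0000, 1 * GiB), (0x20_0000_0000, 32 * GiB)]
LINEAR_MAP_SIZE = 128 * GiB


def cap_memory_range(regions, base, size):
    """Keep only the parts of each region that lie inside [base, base + size)."""
    end = base + size
    capped = []
    for start, length in regions:
        lo, hi = max(start, base), min(start + length, end)
        if lo < hi:
            capped.append((lo, hi - lo))
    return capped


bottom = min(start for start, _ in banks)
top = max(start + length for start, length in banks)
span_gib = (top - bottom) // GiB
usable = cap_memory_range(banks, bottom, LINEAR_MAP_SIZE)

if __name__ == "__main__":
    print("span:", span_gib, "GiB")
    for start, length in usable:
        print(hex(start), length // GiB, "GiB")
```

Under these assumptions the lower bank is kept whole and only the first
2GiB of the upper bank fits below the 128GiB limit; the rest must be
removed from the usable regions.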

This causes the following Oops:
[    0.000000] Linux version 6.10.0-rc2-gd3b8dd5b51dd-dirty (stuart.menefy@codasip.com) (riscv64-codasip-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41.0.20231213) #20 SMP Sat Jun 22 11:34:22 BST 2024
[    0.000000] memblock_add: [0x0000000080000000-0x00000000bfffffff] early_init_dt_add_memory_arch+0x4a/0x52
[    0.000000] memblock_add: [0x0000002000000000-0x00000027ffffffff] early_init_dt_add_memory_arch+0x4a/0x52
...
[    0.000000] memblock_alloc_try_nid: 23724 bytes align=0x8 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] memblock_reserve: [0x00000027ffff5350-0x00000027ffffaffb] memblock_alloc_range_nid+0xb8/0x132
[    0.000000] Unable to handle kernel paging request at virtual address fffffffe7fff5350
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc2-gd3b8dd5b51dd-dirty #20
[    0.000000] Hardware name: codasip,a70x (DT)
[    0.000000] epc : __memset+0x8c/0x104
[    0.000000]  ra : memblock_alloc_try_nid+0x74/0x84
[    0.000000] epc : ffffffff805e88c8 ra : ffffffff806148f6 sp : ffffffff80e03d50
[    0.000000]  gp : ffffffff80ec4158 tp : ffffffff80e0bec0 t0 : fffffffe7fff52f8
[    0.000000]  t1 : 00000027ffffb000 t2 : 5f6b636f6c626d65 s0 : ffffffff80e03d90
[    0.000000]  s1 : 0000000000005cac a0 : fffffffe7fff5350 a1 : 0000000000000000
[    0.000000]  a2 : 0000000000005cac a3 : fffffffe7fffaff8 a4 : 000000000000002c
[    0.000000]  a5 : ffffffff805e88c8 a6 : 0000000000005cac a7 : 0000000000000030
[    0.000000]  s2 : fffffffe7fff5350 s3 : ffffffffffffffff s4 : 0000000000000000
[    0.000000]  s5 : ffffffff8062347e s6 : 0000000000000000 s7 : 0000000000000001
[    0.000000]  s8 : 0000000000002000 s9 : 00000000800226d0 s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : ffffffff8080a928 t4 : ffffffff8080a928
[    0.000000]  t5 : ffffffff8080a928 t6 : ffffffff8080a940
[    0.000000] status: 0000000200000100 badaddr: fffffffe7fff5350 cause: 000000000000000f
[    0.000000] [<ffffffff805e88c8>] __memset+0x8c/0x104
[    0.000000] [<ffffffff8062349c>] early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] [<ffffffff8043e892>] __unflatten_device_tree+0x52/0x114
[    0.000000] [<ffffffff8062441e>] unflatten_device_tree+0x9e/0xb8
[    0.000000] [<ffffffff806046fe>] setup_arch+0xd4/0x5bc
[    0.000000] [<ffffffff806007aa>] start_kernel+0x76/0x81a
[    0.000000] Code: b823 02b2 bc23 02b2 b023 04b2 b423 04b2 b823 04b2 (bc23) 04b2
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

The problem is that memblock (unaware that some physical memory cannot
be used) has allocated memory from the top of memory, which is outside
the linear mapping region.

Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com>
Fixes: c99127c45248 ("riscv: Make sure the linear mapping does not use the kernel mapping")
Reviewed-by: David McKay <david.mckay@codasip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240622114217.2158495-1-stuart.menefy@codasip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
11 months ago Merge tag 'pci-v6.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Thu, 1 Aug 2024 18:30:15 +0000 (11:30 -0700)]
Merge tag 'pci-v6.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix a pci_intx() regression that caused driver reload to fail with
   "Resources present before probing" (Philipp Stanner)

 - Fix a pciehp regression that clobbered the upper bits of RAID status
   LEDs on NVMe devices behind an Intel VMD (Blazej Kucman)

* tag 'pci-v6.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: pciehp: Retain Power Indicator bits for userspace indicators
  PCI: Fix devres regression in pci_intx()

11 months ago KVM: x86/mmu: fix determination of max NPT mapping level for private pages
Ackerley Tng [Thu, 1 Aug 2024 17:39:55 +0000 (17:39 +0000)]
KVM: x86/mmu: fix determination of max NPT mapping level for private pages

The `if (req_max_level)` test was meant to ignore req_max_level if
PG_LEVEL_NONE was returned. Hence, this function should return
max_level instead of the ignored req_max_level.
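
A minimal Python sketch of the logic, with both variants side by side.
The function names are illustrative and the hook's return value is passed
in directly; the PG_LEVEL_* values follow the kernel's enum ordering.

```python
PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G = 0, 1, 2, 3


def max_private_level_buggy(max_level: int, req_max_level: int) -> int:
    """Pre-fix behaviour: returns the hook's value even when it was ignored."""
    if req_max_level:
        max_level = min(max_level, req_max_level)
    return req_max_level


def max_private_level_fixed(max_level: int, req_max_level: int) -> int:
    """Post-fix behaviour: returns the (possibly clamped) max_level."""
    if req_max_level:
        max_level = min(max_level, req_max_level)
    return max_level


if __name__ == "__main__":
    # The hook returns PG_LEVEL_NONE, meaning "no extra restriction".
    print(max_private_level_buggy(PG_LEVEL_2M, PG_LEVEL_NONE))
    print(max_private_level_fixed(PG_LEVEL_2M, PG_LEVEL_NONE))
```

With PG_LEVEL_NONE from the hook, the buggy variant hands back
PG_LEVEL_NONE instead of the caller's max_level, and it can also return a
level larger than max_level when the hook allows more than the caller did.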

This is only a latent issue for now, since guest_memfd does not
support large pages.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Message-ID: <20240801173955.1975034-1-ackerleytng@google.com>
Fixes: f32fb32820b1 ("KVM: x86: Add hook for determining max NPT mapping level")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months ago PCI: pciehp: Retain Power Indicator bits for userspace indicators
Blazej Kucman [Mon, 22 Jul 2024 14:14:40 +0000 (16:14 +0200)]
PCI: pciehp: Retain Power Indicator bits for userspace indicators

The sysfs "attention" file normally controls the Slot Control Attention
Indicator with 0 (off), 1 (on), 2 (blink) settings.

576243b3f9ea ("PCI: pciehp: Allow exclusive userspace control of
indicators") added pciehp_set_raw_indicator_status() to allow userspace to
directly control all four bits in both the Attention Indicator and the
Power Indicator fields via the "attention" file.

This is used on Intel VMD bridges so utilities like "ledmon" can use sysfs
"attention" to control up to 16 indicators for NVMe device RAID status.

abaaac4845a0 ("PCI: hotplug: Use FIELD_GET/PREP()") broke this by masking
the sysfs data with PCI_EXP_SLTCTL_AIC, which discards the upper two bits
intended for the Power Indicator Control field (PCI_EXP_SLTCTL_PIC).

For NVMe devices behind an Intel VMD, ledmon settings that use the
PCI_EXP_SLTCTL_PIC bits, i.e., ATTENTION_REBUILD (0x5), ATTENTION_LOCATE
(0x7), ATTENTION_FAILURE (0xD), ATTENTION_OFF (0xF), no longer worked
correctly.

Mask with PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC to retain both the
Attention Indicator and the Power Indicator bits.
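
The effect of the two masks on the ledmon values listed above can be
worked through in Python. field_prep() below is a simplified imitation of
the kernel's FIELD_PREP() macro, and the dictionary is only a convenience
for the example; the mask values are those of the Slot Control register
fields.

```python
PCI_EXP_SLTCTL_AIC = 0x00C0  # Attention Indicator Control
PCI_EXP_SLTCTL_PIC = 0x0300  # Power Indicator Control

LEDMON_STATES = {
    "ATTENTION_REBUILD": 0x5,
    "ATTENTION_LOCATE": 0x7,
    "ATTENTION_FAILURE": 0xD,
    "ATTENTION_OFF": 0xF,
}


def field_prep(mask: int, value: int) -> int:
    """Shift value to the mask's lowest set bit and drop bits outside the mask."""
    shift = (mask & -mask).bit_length() - 1
    return (value << shift) & mask


if __name__ == "__main__":
    both = PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC
    for name, raw in LEDMON_STATES.items():
        broken = field_prep(PCI_EXP_SLTCTL_AIC, raw)
        fixed = field_prep(both, raw)
        print(f"{name}: raw={raw:#x} AIC-only={broken:#06x} AIC|PIC={fixed:#06x}")
```

With the AIC-only mask the upper two bits of each value are lost, so for
example 0x5 and 0xD both collapse to the same register bits.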

Fixes: abaaac4845a0 ("PCI: hotplug: Use FIELD_GET/PREP()")
Link: https://lore.kernel.org/r/20240722141440.7210-1-blazej.kucman@intel.com
Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.7+
11 months ago PCI: Fix devres regression in pci_intx()
Philipp Stanner [Thu, 25 Jul 2024 12:07:30 +0000 (14:07 +0200)]
PCI: Fix devres regression in pci_intx()

pci_intx() becomes managed if pcim_enable_device() has been called in
advance. Commit 25216afc9db5 ("PCI: Add managed pcim_intx()") changed this
behavior so that pci_intx() always leads to creation of a separate device
resource for itself, whereas earlier, a shared resource was used for all
PCI devres operations.

Unfortunately, pci_intx() seems to be used in some drivers' remove() paths;
in the managed case this causes a device resource to be created on driver
detach, which causes .probe() to fail if the driver is reloaded:

  pci 0000:00:1f.2: Resources present before probing

Fix the regression by only redirecting pci_intx() to its managed twin
pcim_intx() if the pci_command changes.
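
A toy Python model of the two behaviours, under stated assumptions:
ToyPciDev, its devres list and the two pci_intx_*() functions are
illustrative stand-ins, not the kernel implementation. The only value
taken from the PCI headers is the INTx disable bit of the command
register.

```python
PCI_COMMAND_INTX_DISABLE = 0x400


class ToyPciDev:
    """Minimal stand-in for a PCI device with a command register and devres."""

    def __init__(self, managed: bool, command: int = 0):
        self.managed = managed
        self.command = command
        self.devres = []  # device resources created so far

    def pcim_intx(self, enable: bool) -> None:
        # The managed variant allocates a resource to restore state later.
        self.devres.append("intx")
        self.command = wanted_command(self.command, enable)


def wanted_command(command: int, enable: bool) -> int:
    if enable:
        return command & ~PCI_COMMAND_INTX_DISABLE
    return command | PCI_COMMAND_INTX_DISABLE


def pci_intx_regressed(dev: ToyPciDev, enable: bool) -> None:
    if dev.managed:
        dev.pcim_intx(enable)  # always creates a resource
        return
    dev.command = wanted_command(dev.command, enable)


def pci_intx_fixed(dev: ToyPciDev, enable: bool) -> None:
    new = wanted_command(dev.command, enable)
    if new == dev.command:
        return  # nothing changes, so nothing is allocated
    if dev.managed:
        dev.pcim_intx(enable)
        return
    dev.command = new
```

In this model a remove() path that asks for the state the device is
already in leaves a resource behind with the regressed variant and
nothing with the fixed one.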

Link: https://lore.kernel.org/r/20240725120729.59788-2-pstanner@redhat.com
Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
Reported-by: Damien Le Moal <dlemoal@kernel.org>
Closes: https://lore.kernel.org/all/b8f4ba97-84fc-4b7e-ba1a-99de2d9f0118@kernel.org/
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: add error message to commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
11 months ago Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 1 Aug 2024 16:42:09 +0000 (09:42 -0700)]
Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless, bluetooth, BPF and netfilter.

  Current release - regressions:

   - core: drop bad gso csum_start and offset in virtio_net_hdr

   - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove

   - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()

   - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and
     aqr115c

  Current release - new code bugs:

   - smc: prevent UAF in inet_create()

   - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend

   - eth: bnxt: reject unsupported hash functions

  Previous releases - regressions:

   - sched: act_ct: take care of padding in struct zones_ht_key

   - netfilter: fix null-ptr-deref in iptable_nat_table_init().

   - tcp: adjust clamping window for applications specifying SO_RCVBUF

  Previous releases - always broken:

   - ethtool: rss: small fixes to spec and GET

   - mptcp:
      - fix signal endpoint re-add
      - pm: fix backup support in signal endpoints

   - wifi: ath12k: fix soft lockup on suspend

   - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()

   - eth: ice: fix AF_XDP ZC timeout and concurrency issues

   - eth: mlx5:
      - fix missing lock on sync reset reload
      - fix error handling in irq_pool_request_irq"

* tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
  mptcp: fix duplicate data handling
  mptcp: fix bad RCVPRUNED mib accounting
  ipv6: fix ndisc_is_useropt() handling for PIO
  igc: Fix double reset adapter triggered from a single taprio cmd
  net: MAINTAINERS: Demote Qualcomm IPA to "maintained"
  net: wan: fsl_qmc_hdlc: Discard received CRC
  net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex
  net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys
  net/mlx5e: Fix CT entry update leaks of modify header context
  net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
  net/mlx5: Fix missing lock on sync reset reload
  net/mlx5: Lag, don't use the hardcoded value of the first port
  net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
  net/mlx5: Fix error handling in irq_pool_request_irq
  net/mlx5: Always drain health in shutdown callback
  net: Add skbuff.h to MAINTAINERS
  r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
  netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
  netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
  net: drop bad gso csum_start and offset in virtio_net_hdr
  ...

11 months ago rust: SHADOW_CALL_STACK is incompatible with Rust
Alice Ryhl [Mon, 29 Jul 2024 14:22:49 +0000 (14:22 +0000)]
rust: SHADOW_CALL_STACK is incompatible with Rust

When using the shadow call stack sanitizer, all code must be compiled
with the -ffixed-x18 flag, but this flag is not currently being passed
to Rust. This results in crashes that are extremely difficult to debug.

To ensure that nobody else has to go through the same debugging session
that I had to, prevent configurations that enable both SHADOW_CALL_STACK
and RUST.

It is rather common for people to backport 724a75ac9542 ("arm64: rust:
Enable Rust support for AArch64"), so I recommend applying this fix all
the way back to 6.1.

Cc: stable@vger.kernel.org # 6.1 and later
Fixes: 724a75ac9542 ("arm64: rust: Enable Rust support for AArch64")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20240729-shadow-call-stack-v4-1-2a664b082ea4@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 months ago perf test: Update sample filtering test
Namhyung Kim [Wed, 3 Jul 2024 22:30:35 +0000 (15:30 -0700)]
perf test: Update sample filtering test

Now it can run the BPF filtering test as a normal user if the BPF
objects are pinned by 'sudo perf record --setup-filter pin'.  Let's
update the test case to verify the behavior.  It'll skip the test if the
filter check fails for a normal user, but it shows a message on how to
set up the filters.

First, run the test as a normal user and it fails.

  $ perf test -vv filtering
   95: perf record sample filtering (by BPF) tests:
  --- start ---
  test child forked, pid 425677
  Checking BPF-filter privilege
  try 'sudo perf record --setup-filter pin' first.       <<<--- here
  bpf-filter test [Skipped permission]
  ---- end(-2) ----
   95: perf record sample filtering (by BPF) tests                     : Skip

According to the message, run the perf record command to pin the BPF
objects.

  $ sudo perf record --setup-filter pin

And re-run the test as a normal user.

  $ perf test -vv filtering
   95: perf record sample filtering (by BPF) tests:
  --- start ---
  test child forked, pid 424486
  Checking BPF-filter privilege
  Basic bpf-filter test
  Basic bpf-filter test [Success]
  Failing bpf-filter test
  Error: task-clock event does not have PERF_SAMPLE_CPU
  Failing bpf-filter test [Success]
  Group bpf-filter test
  Error: task-clock event does not have PERF_SAMPLE_CPU
  Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE
  Group bpf-filter test [Success]
  ---- end(0) ----
   95: perf record sample filtering (by BPF) tests                     : Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf record: Add --setup-filter option
Namhyung Kim [Wed, 3 Jul 2024 22:30:34 +0000 (15:30 -0700)]
perf record: Add --setup-filter option

To allow BPF filters for unprivileged users, the BPF objects need to be
pinned to BPF-fs first.  Let's add a new option to pin and unpin the
objects easily.  I'm not sure 'perf record' is the right place to do this
but I don't have a better idea right now.

  $ sudo perf record --setup-filter pin

The above command would pin the BPF program and maps for the filter when the
system has BPF-fs (usually at /sys/fs/bpf/).  To unpin the objects,
users can run the following command (as root).

  $ sudo perf record --setup-filter unpin

Committer testing:

  root@number:~# perf record --setup-filter pin
  root@number:~# ls -la /sys/fs/bpf/perf_filter/
  total 0
  drwxr-xr-x. 2 root root 0 Jul 31 10:43 .
  drwxr-xr-t. 3 root root 0 Jul 31 10:43 ..
  -rw-rw-rw-. 1 root root 0 Jul 31 10:43 dropped
  -rw-rw-rw-. 1 root root 0 Jul 31 10:43 filters
  -rwxrwxrwx. 1 root root 0 Jul 31 10:43 perf_sample_filter
  -rw-rw-rw-. 1 root root 0 Jul 31 10:43 pid_hash
  -rw-------. 1 root root 0 Jul 31 10:43 sample_f_rodata
  root@number:~# ls -la /sys/fs/bpf/perf_filter/perf_sample_filter
  -rwxrwxrwx. 1 root root 0 Jul 31 10:43 /sys/fs/bpf/perf_filter/perf_sample_filter
  root@number:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf record: Fix a potential error handling issue
Namhyung Kim [Wed, 3 Jul 2024 22:30:33 +0000 (15:30 -0700)]
perf record: Fix a potential error handling issue

The evlist is allocated at the beginning of cmd_record().  Also freeing
thread masks should be paired with record__init_thread_masks() which is
called right before __cmd_record().

Let's change the order of these functions to release the resources
correctly in case of errors.  This is maybe fine as the process exits,
but it might be a problem if it manages some system-wide resources that
live longer than the process.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months ago perf bpf-filter: Support separate lost counts for each filter
Namhyung Kim [Wed, 3 Jul 2024 22:30:32 +0000 (15:30 -0700)]
perf bpf-filter: Support separate lost counts for each filter

As the BPF filter is shared between other processes, it should have its
own counter for each invocation.  Add a new array map (lost_count) to
save the count using the same index as the filter.  It should clear the
count before running the filter.
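
The indexing scheme can be pictured with a small Python model. This is
only a sketch of the idea, not the BPF program: SharedFilterTable and its
methods are hypothetical, plain lists stand in for the array maps, and a
"lost" sample is taken here to mean one the filter rejected.

```python
class SharedFilterTable:
    """Filters and their counters live in parallel arrays with the same index."""

    def __init__(self, slots: int):
        self.filters = [None] * slots
        self.lost_count = [0] * slots

    def install(self, index: int, predicate) -> None:
        self.filters[index] = predicate
        # Clear the slot's counter before the filter starts running, so a
        # previous user of the same index does not leak into this run.
        self.lost_count[index] = 0

    def on_sample(self, index: int, sample: dict) -> bool:
        """Return True if the sample passes; otherwise count it for this slot."""
        if self.filters[index](sample):
            return True
        self.lost_count[index] += 1
        return False


if __name__ == "__main__":
    table = SharedFilterTable(slots=4)
    table.install(0, lambda s: s["cpu"] == 0)
    table.install(1, lambda s: s["period"] > 1000)
    for s in ({"cpu": 0, "period": 10}, {"cpu": 1, "period": 5000}):
        table.on_sample(0, s)
        table.on_sample(1, s)
    print(table.lost_count)
```

Because each invocation owns one index, two processes sharing the same
pinned filter program keep independent counts.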

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>