]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
4 months agoperf python: Add evlist.config to set up record options
Ian Rogers [Fri, 28 Feb 2025 22:23:07 +0000 (14:23 -0800)]
perf python: Add evlist.config to set up record options

Add access to evlist__config that is used to configure an evlist with
record options.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Add evlist all_cpus accessor
Ian Rogers [Fri, 28 Feb 2025 22:23:06 +0000 (14:23 -0800)]
perf python: Add evlist all_cpus accessor

Add a means to get the reference counted all_cpus CPU map from an
evlist in its python form.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Avoid duplicated code in get_tracepoint_field
Ian Rogers [Fri, 28 Feb 2025 22:23:05 +0000 (14:23 -0800)]
perf python: Avoid duplicated code in get_tracepoint_field

The code replicates computations done in evsel__tp_format, reuse
evsel__tp_format to simplify the python C code.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Update ungrouped evsel leader in clone
Ian Rogers [Fri, 28 Feb 2025 22:23:04 +0000 (14:23 -0800)]
perf python: Update ungrouped evsel leader in clone

evsels are cloned in the python code as they form part of the Python
object pyrf_evsel. The cloning doesn't update the evsel's leader, do
this for the case of an evsel being ungrouped.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Add optional cpus and threads arguments to parse_events
Ian Rogers [Fri, 28 Feb 2025 22:23:03 +0000 (14:23 -0800)]
perf python: Add optional cpus and threads arguments to parse_events

Used for the evlist initialization.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Add member access to a number of evsel variables
Ian Rogers [Fri, 28 Feb 2025 22:23:02 +0000 (14:23 -0800)]
perf python: Add member access to a number of evsel variables

Most variables are part of the perf_event_attr, so that they may be
queried and modified.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf python: Add evlist enable and disable methods
Ian Rogers [Fri, 28 Feb 2025 22:23:01 +0000 (14:23 -0800)]
perf python: Add evlist enable and disable methods

By default the evsels from parse_events will be disabled. Add access
to the evlist functions so they can be enabled/disabled.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf evsel: tp_format accessing improvements
Ian Rogers [Fri, 28 Feb 2025 22:23:00 +0000 (14:23 -0800)]
perf evsel: tp_format accessing improvements

Ensure evsel__clone copies the tp_sys and tp_name variables.
In evsel__tp_format, if tp_sys isn't set, use the config value to find
the tp_format. This succeeds in python code where pyrf__tracepoint has
already found the format.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-4-irogers@google.com
Fixes: 6c8310e8380d472c ("perf evsel: Allow evsel__newtp without libtraceevent")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf evlist: Add success path to evlist__create_syswide_maps
Ian Rogers [Fri, 28 Feb 2025 22:22:59 +0000 (14:22 -0800)]
perf evlist: Add success path to evlist__create_syswide_maps

Over various refactorings evlist__create_syswide_maps has been made to
only ever return with -ENOMEM. Fix this so that when
perf_evlist__set_maps is successfully called, 0 is returned.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-3-irogers@google.com
Fixes: 8c0498b6891d7ca5 ("perf evlist: Fix create_syswide_maps() not propagating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf debug: Avoid stack overflow in recursive error message
Ian Rogers [Fri, 28 Feb 2025 22:22:58 +0000 (14:22 -0800)]
perf debug: Avoid stack overflow in recursive error message

In debug_file, pr_warning_once is called on error. As that function
calls debug_file the function will yield a stack overflow. Switch the
location of the call so the recursion is avoided.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-2-irogers@google.com
Fixes: ec49230cf6dda704 ("perf debug: Expose debug file")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf symbol: Support .gnu_debugdata for symbols
Stephen Brennan [Fri, 7 Mar 2025 23:22:03 +0000 (15:22 -0800)]
perf symbol: Support .gnu_debugdata for symbols

Fedora introduced a "MiniDebuginfo" feature, in which an LZMA-compressed
ELF file is placed inside a section named ".gnu_debugdata". This file
contains nothing but a symbol table, which can be used to supplement the
.dynsym section which only contains required symbols for runtime.

It is supported by GDB for stack traces, but it should be useful for
tracing as well. Implement support for loading symbols from
.gnu_debugdata.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250307232206.2102440-4-stephen.s.brennan@oracle.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tools: Add LZMA decompression from FILE
Stephen Brennan [Fri, 7 Mar 2025 23:22:02 +0000 (15:22 -0800)]
perf tools: Add LZMA decompression from FILE

Internally lzma_decompress_to_file() creates a FILE from the filename.
Add an API that takes an existing FILE directly. This allows
decompressing already-open files and even buffers opened by fmemopen().
It is necessary for supporting .gnu_debugdata in the next patch.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250307232206.2102440-3-stephen.s.brennan@oracle.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tools: Add dummy functions for !HAVE_LZMA_SUPPORT
Stephen Brennan [Fri, 7 Mar 2025 23:22:01 +0000 (15:22 -0800)]
perf tools: Add dummy functions for !HAVE_LZMA_SUPPORT

This allows us to use them without needing to ifdef the calling code.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250307232206.2102440-2-stephen.s.brennan@oracle.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf mem: Don't leak mem event names
Ian Rogers [Sat, 8 Mar 2025 01:28:53 +0000 (17:28 -0800)]
perf mem: Don't leak mem event names

When preparing the mem events for the argv copies are intentionally
made. These copies are leaked and cause runs of perf using address
sanitizer to fail. Rather than leak the memory allocate a chunk of
memory for the mem event names upfront and build the strings in this -
the storage is sized larger than the previous buffer size. The caller
is then responsible for clearing up this memory. As part of this
change, remove the mem_loads_name and mem_stores_name global buffers
then change the perf_pmu__mem_events_name to write to an out argument
buffer.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250308012853.1384762-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Add SiFive P650 events
Eric Lin [Thu, 13 Feb 2025 01:21:40 +0000 (17:21 -0800)]
perf vendor events riscv: Add SiFive P650 events

The SiFive Performance P650 core (including the vector-enabled P670 and
area-optimized P450/P470 variants) updates the P550 microarchitecture.
It brings in the debug, trace, and counter events from newer Bullet
cores, and adds new events for iTLB and dTLB multi-hits.

All other PMU events are unchanged from the P550 core.

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Co-developed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-8-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Add SiFive P550 events
Eric Lin [Thu, 13 Feb 2025 01:21:39 +0000 (17:21 -0800)]
perf vendor events riscv: Add SiFive P550 events

The SiFive Performance P550 core features an out-of-order
microarchitecture which exposes the same PMU events as Bullet,
plus events for UTLB hits and PTE cache misses/hits.

Add support for specifying these events using symbolic names.

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Co-developed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-7-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Add SiFive Bullet version 0x0d events
Eric Lin [Thu, 13 Feb 2025 01:21:38 +0000 (17:21 -0800)]
perf vendor events riscv: Add SiFive Bullet version 0x0d events

SiFive Bullet microarchitecture cores with mimpid values starting with
0x0d or greater add new PMU events to count TLB miss stall cycles.

All other PMU events are unchanged from earlier Bullet cores.

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-6-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Add SiFive Bullet version 0x07 events
Eric Lin [Thu, 13 Feb 2025 01:21:37 +0000 (17:21 -0800)]
perf vendor events riscv: Add SiFive Bullet version 0x07 events

SiFive Bullet microarchitecture cores with mimpid values starting with
0x07 or greater add new PMU events to support debug, trace, and counter
sampling and filtering (Sscofpmf).

All other PMU events are unchanged from earlier Bullet cores.

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Co-developed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-5-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Update SiFive Bullet events
Eric Lin [Thu, 13 Feb 2025 01:21:36 +0000 (17:21 -0800)]
perf vendor events riscv: Update SiFive Bullet events

Regenerate the event lists from the original hardware description. This
makes them consistent with the event lists for newer versions of the
hardware, allowing most files to be reused across hardware versions.

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Co-developed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-4-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Remove leading zeroes
Samuel Holland [Thu, 13 Feb 2025 01:21:35 +0000 (17:21 -0800)]
perf vendor events riscv: Remove leading zeroes

The EventCode field (as stored in the mhpmeventN CSRs) is actually 56
bits wide, but there is no need to keep leading zeroes in the JSON
files. Remove them to simplify review of the following change, which
regenerates the files in a way that does not include leading zeroes.

This change was performed automatically with `sed -i "s/0x0*/0x/"`.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-3-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events riscv: Rename U74 to Bullet
Samuel Holland [Thu, 13 Feb 2025 01:21:34 +0000 (17:21 -0800)]
perf vendor events riscv: Rename U74 to Bullet

This set of PMU event descriptions applies not only to the SiFive U74
core configuration, but also to other SiFive cores that implement the
Bullet microarchitecture (such as U64, P270, and X280). Rename the
directory to be more generic.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250213220341.3215660-2-samuel.holland@sifive.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf util: Remove unused perf_config__refresh
Dr. David Alan Gilbert [Wed, 5 Mar 2025 02:31:20 +0000 (02:31 +0000)]
perf util: Remove unused perf_config__refresh

perf_config__refresh() was added in 2016 by
commit 8a0a9c7e9146 ("perf config: Introduce new init() and exit()")
but has remained unused.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305023120.155420-7-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf util: Remove unused perf_pmus__default_pmu_name
Dr. David Alan Gilbert [Wed, 5 Mar 2025 02:31:19 +0000 (02:31 +0000)]
perf util: Remove unused perf_pmus__default_pmu_name

perf_pmus__default_pmu_name() last use was removed by 2023's
commit e3edd6cf6399 ("perf pmu-events: Reduce processed events by passing
PMU")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305023120.155420-6-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf util: Remove unused perf_data__update_dir
Dr. David Alan Gilbert [Wed, 5 Mar 2025 02:31:18 +0000 (02:31 +0000)]
perf util: Remove unused perf_data__update_dir

perf_data__update_dir() was added in 2019's
commit e8be135751f2 ("perf data: Add perf_data__update_dir() function")
but has never been used.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305023120.155420-5-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf util: Remove unused pstack__pop
Dr. David Alan Gilbert [Wed, 5 Mar 2025 02:31:17 +0000 (02:31 +0000)]
perf util: Remove unused pstack__pop

The last use of pstack__pop() was removed in 2015 by
commit 6422184b087f ("perf hists browser: Simplify zooming code using
pstack_peek()")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305023120.155420-4-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf util: Remove unused perf_color_default_config
Dr. David Alan Gilbert [Wed, 5 Mar 2025 02:31:16 +0000 (02:31 +0000)]
perf util: Remove unused perf_color_default_config

perf_color_default_config() was added in 2009 by
commit 8fc0321f1ad0 ("perf_counter tools: Add color terminal output
support")
but has remained unused.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305023120.155420-3-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tests: Fix data symbol test with LTO builds
Ian Rogers [Wed, 26 Feb 2025 23:01:09 +0000 (15:01 -0800)]
perf tests: Fix data symbol test with LTO builds

With LTO builds, although regular builds could also see this as
all the code is in one file, the datasym workload can realize the
buf1.reserved data is never accessed. The compiler moves the
variable to bss and only keeps the data1 and data2 parts as
separate variables. This causes the symbol check to fail in the
test. Make the variable volatile to disable the more aggressive
optimization. Rename the variable to make which buf1 in perf is
being referred to.

Before:

  $ perf test -vv "data symbol"
  126: Test data symbol:
  --- start ---
  test child forked, pid 299808
  perf does not have symbol 'buf1'
  perf is missing symbols - skipping test
  ---- end(-2) ----
  126: Test data symbol                                                : Skip
  $ nm perf|grep buf1
  0000000000a5fa40 b buf1.0
  0000000000a5fa48 b buf1.1

After:

  $ nm perf|grep buf1
  0000000000a53a00 d buf1
  $ perf test -vv "data symbol"126: Test data symbol:
  --- start ---
  test child forked, pid 302166
   a53a00-a53a39 l buf1
  perf does have symbol 'buf1'
  Recording workload...
  Waiting for "perf record has started" message
  OK
  Cleaning up files...
  ---- end(0) ----
  126: Test data symbol                                                : Ok

Fixes: 3dfc01fe9d12 ("perf test: Add 'datasym' test workload")
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250226230109.314580-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Fix memory leaks in the hierarchy mode
Namhyung Kim [Fri, 7 Mar 2025 06:12:50 +0000 (22:12 -0800)]
perf report: Fix memory leaks in the hierarchy mode

Ian told me that there are many memory leaks in the hierarchy mode.  I
can easily reproduce it with the follwing command.

  $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak

  $ perf record --latency -g -- ./perf test -w thloop

  $ perf report -H --stdio
  ...
  Indirect leak of 168 byte(s) in 21 object(s) allocated from:
      #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
      #1 0x55ed3602346e in map__get util/map.h:189
      #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476
      #3 0x55ed36025208 in hist_entry__new util/hist.c:588
      #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587
      #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638
      #6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685
      #7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776
      #8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735
      #9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119
      #10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867
      #11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351
      #12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404
      #13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448
      #14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556
      #15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
  ...

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  93

I found that hist_entry__delete() missed to release child entries in the
hierarchy tree (hroot_{in,out}).  It needs to iterate the child entries
and call hist_entry__delete() recursively.

After this change:

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  0

Reported-by: Ian Rogers <irogers@google.com>
Tested-by Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250307061250.320849-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Use map_symbol__copy() when copying callchains
Namhyung Kim [Fri, 7 Mar 2025 06:12:49 +0000 (22:12 -0800)]
perf report: Use map_symbol__copy() when copying callchains

It seems there are places to miss updating refcount of maps.
Let's use map_symbol__copy() helper to properly copy them with
refcounts updated.

Link: https://lore.kernel.org/r/20250307061250.320849-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf annotate: Return errors from disasm_line__parse_powerpc()
Athira Rajeev [Tue, 4 Mar 2025 15:41:14 +0000 (21:11 +0530)]
perf annotate: Return errors from disasm_line__parse_powerpc()

In disasm_line__parse_powerpc() , return code from function
disasm_line__parse() is ignored. This will result in bad results
if the disasm_line__parse() fails to disasm the line. Use
the return code to fix this.

Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20250304154114.62093-2-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf annotate: Add annotation_options.disassembler_used
Athira Rajeev [Tue, 4 Mar 2025 15:41:13 +0000 (21:11 +0530)]
perf annotate: Add annotation_options.disassembler_used

When doing "perf annotate", perf tool provides option to
use specific disassembler like llvm/objdump/capstone. The
order picked is to use llvm first and if that fails fallback
to objdump ie to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE
and PERF_DISASM_OBJDUMP

In powerpc, when using "data type" sort keys, first preferred
approach is to read the raw instruction from the DSO. In objdump
is specified in "--objdump" option, it picks the symbol disassemble
using objdump. Currently disasm_line__parse_powerpc() function
uses length of the "line" to determine if objdump is used.
But there are few cases, where if objdump doesn't recognise the
instruction, the disassembled string will be empty.

Example:

     134cdc: c4 05 82 41  beq     1352a0 <getcwd+0x6e0>
     134ce0: ac 00 8e 40  bne     cr3,134d8c <getcwd+0x1cc>
     134ce4: 0f 00 10 04  pld     r9,1028308
====>134ce8: d4 b0 20 e5
     134cec: 16 00 40 39  li      r10,22
     134cf0: 48 01 21 ea  ld      r17,328(r1)

So depending on length of line will give bad results.

Add a new filed to annotation options structure,
"struct annotation_options" to save the disassembler used.
Use this info to determine if disassembly is done while
parsing the disasm line.

Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20250304154114.62093-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Do not process non-JIT BPF ksymbol events
Namhyung Kim [Wed, 5 Mar 2025 23:28:38 +0000 (15:28 -0800)]
perf report: Do not process non-JIT BPF ksymbol events

The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so
it'd be 0 when it's not JITed.  The ksymbol is needed to symbolize the
code when it gets samples in the region but non-JITed code cannot get
samples.  Thus it'd be ok to ignore them.

Actually it caused a performance issue in the perf tools on old ARM
kernels where it can refuse to JIT some BPF codes.  It ended up
splitting the existing kernel map (kallsyms).  And later lookup for a
kernel symbol would create a new kernel map from kallsyms and then
split it again and again. :(

Probably there's a bug in the kernel map/symbol handling in perf tools.
But I think we need to fix this anyway.

Reported-by: Kevin Nomura <nomurak@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Fix leak in "Synthesize attr update" test
Ian Rogers [Wed, 5 Mar 2025 19:19:31 +0000 (11:19 -0800)]
perf test: Fix leak in "Synthesize attr update" test

The own_cpus map variable may be non-NULL and hold a reference, in
particular on hybrid machines. Do a put before overwriting the
variable to avoid a memory leak.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250305191931.604764-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps
Namhyung Kim [Fri, 28 Feb 2025 21:17:34 +0000 (18:17 -0300)]
perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps

This was detected at the end of a 'perf record' session when build-id
collection was enabled and thus the BPF programs put in place while the
session was running, some even put in place by perf itself were
processed and inserted, with some overlaps related to BPF trampolines
and programs took place.

Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes"
the problem, in the sense that overlaps will be dealt with and then the
consistency will be kept, but it would be interesting to fully
understand why such overlaps take place and how to deal with them when
doing symbol resolution.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com
Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf maps: Add missing map__set_kmap_maps() when replacing a kernel map
Arnaldo Carvalho de Melo [Fri, 28 Feb 2025 21:17:33 +0000 (18:17 -0300)]
perf maps: Add missing map__set_kmap_maps() when replacing a kernel map

Since in this case __maps__insert_sorted() is not called and thus
doesn't have the opportunity to do the needed map__set_kmap_maps() calls on
the new map.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z7-May5w9VQd5QD0@x1
Link: https://lore.kernel.org/r/20250228211734.33781-6-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf maps: Fixup maps_by_name when modifying maps_by_address
Namhyung Kim [Fri, 28 Feb 2025 21:17:32 +0000 (18:17 -0300)]
perf maps: Fixup maps_by_name when modifying maps_by_address

We can't just replacing the map in the maps_by_address and not touching
on the maps_by_name, that would leave the refcount as 1 and thus trip
another consistency check, this one:

  perf: util/maps.c:110: check_invariants:
   Assertion `refcount_read(map__refcnt(map)) > 1' failed.

  106         /*
  107          * Maps by name maps should be in maps_by_address, so
  108          * the reference count should be higher.
  109          */
  110         assert(refcount_read(map__refcnt(map)) > 1);

Committer notice:

Initialize the newly added 'ni' variable, that really can't be
accessed unitialized trips some gcc versions, like:

  12    20.00 archlinux:base                : FAIL gcc version 13.2.1 20230801 (GCC)
    util/maps.c: In function ‘__maps__fixup_overlap_and_insert’:
    util/maps.c:896:54: error: ‘ni’ may be used uninitialized [-Werror=maybe-uninitialized]
      896 |                                 map__put(maps_by_name[ni]);
          |                                                      ^
    util/maps.c:816:25: note: ‘ni’ was declared here
      816 |         unsigned int i, ni;
          |                         ^~
    cc1: all warnings being treated as errors
    make[3]: *** [/git/perf-6.14.0-rc1/tools/build/Makefile.build:138: util] Error 2

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z79std66tPq-nqsD@google.com
Link: https://lore.kernel.org/r/20250228211734.33781-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf machine: Fixup kernel maps ends after adding extra maps
Namhyung Kim [Fri, 28 Feb 2025 21:17:31 +0000 (18:17 -0300)]
perf machine: Fixup kernel maps ends after adding extra maps

I just noticed it would add extra kernel maps after modules.  I think it
should fixup end address of the kernel maps after adding all maps first.

Fixes: 876e80cf83d10585 ("perf tools: Fixup end address of modules")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1
Link: https://lore.kernel.org/lkml/Z712hzvv22Ni63f1@google.com
Link: https://lore.kernel.org/r/20250228211734.33781-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf maps: Set the kmaps for newly created/added kernel maps
Arnaldo Carvalho de Melo [Fri, 28 Feb 2025 21:17:30 +0000 (18:17 -0300)]
perf maps: Set the kmaps for newly created/added kernel maps

When using __maps__insert_sorted() the map kmaps field needs to be
initialized, as we need kernel maps to work with map__kmap().

Fix it by using the newly introduced map__set_kmap() method.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1
Link: https://lore.kernel.org/r/20250228211734.33781-3-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf maps: Introduce map__set_kmap_maps() for kernel maps
Arnaldo Carvalho de Melo [Fri, 28 Feb 2025 21:17:29 +0000 (18:17 -0300)]
perf maps: Introduce map__set_kmap_maps() for kernel maps

We need to set it in other places than __maps__insert(), so that we can
have access to the 'struct maps' from a kernel 'struct map'.

When building perf with 'DEBUG=1' we can notice it failing a consistency
check done in the check_invariants() function:

  root@number:~# perf record -- perf test -w offcpu
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.040 MB perf.data (23 samples) ]
  perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
  Aborted (core dumped)
  root@number:~#

The investigation on that was happening bisected to 876e80cf83d10585
("perf tools: Fixup end address of modules"), and the following patches
will plug the problems found, this patch is just legwork on that
direction.

Use the map__set_kmap_maps() name as per a review comment from Ian
Rogers, later there are further suggestions from him on getting rid of
the kmaps variable, see the thread referenced in the Link below.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1
Link: https://lore.kernel.org/r/20250228211734.33781-2-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Fix output type for dynamically allocated core PMU's
Thomas Falcon [Wed, 5 Mar 2025 16:39:35 +0000 (10:39 -0600)]
perf script: Fix output type for dynamically allocated core PMU's

This patch was originally posted here:

https://lore.kernel.org/all/20241213215421.661139-1-thomas.falcon@intel.com/

I have rebased on top of Arnaldo's patch here:

https://lore.kernel.org/all/Z2XCi3PgstSrV0SE@x1/

The original commit message:
"
perf script output may show different fields on different core PMU's
that exist on heterogeneous platforms. For example,

perf record -e "{cpu_core/mem-loads-aux/,cpu_core/event=0xcd,\
umask=0x01,ldlat=3,name=MEM_UOPS_RETIRED.LOAD_LATENCY/}:upp"\
-c10000 -W -d -a -- sleep 1

perf script:

chromium-browse   46572 [002] 544966.882384:      10000  cpu_core/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7ffdf1391b0c     10268100142 \
 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK    N/A    5   7    0   7fad7c47425d [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3)

perf record -e cpu_atom/event=0xd0,umask=0x05,ldlat=3,\
name=MEM_UOPS_RETIRED.LOAD_LATENCY/upp -c10000 -W -d -a -- sleep 1

perf script:

gnome-control-c  534224 [023] 544951.816227:      10000 cpu_atom/MEM_UOPS_RETIRED.LOAD_LATENCY/:   7f0aaaa0aae0  [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3)

Some fields, such as data_src, are not included by default.

The cause is that while one PMU may be assigned a type such as
PERF_TYPE_RAW, other core PMU's are dynamically allocated at boot time.
If this value does not match an existing PERF_TYPE_X value,
output_type(perf_event_attr.type) will return OUTPUT_TYPE_OTHER.

Instead search for a core PMU with a matching perf_event_attr type
and, if one is found, return PERF_TYPE_RAW to match output of other
core PMU's.
"

Suggested-by: Kan Liang <kan.liang@intel.com>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305163935.1605312-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf bench: Fix perf bench syscall loop count
Thomas Richter [Tue, 4 Mar 2025 09:23:49 +0000 (10:23 +0100)]
perf bench: Fix perf bench syscall loop count

Command 'perf bench syscall fork -l 100000' offers option -l to run for
a specified number of iterations. However this option is not always
observed. The number is silently limited to 10000 iterations as can be
seen:

Output before:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 10,000 fork() calls
     Total time: 23.388 [sec]

    2338.809800 usecs/op
            427 ops/sec
 #

When explicitly specified with option -l or --loops, also observe
higher number of iterations:

Output after:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 100,000 fork() calls
     Total time: 716.982 [sec]

    7169.829510 usecs/op
            139 ops/sec
 #

This patch fixes the issue for basic execve fork and getpgid.

Fixes: ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Simplify data symbol test
Namhyung Kim [Tue, 4 Mar 2025 02:28:37 +0000 (18:28 -0800)]
perf test: Simplify data symbol test

Now the workload will end after 1 second.  Just run it with perf instead
of waiting for the background process.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Add timeout to datasym workload
Namhyung Kim [Tue, 4 Mar 2025 02:28:36 +0000 (18:28 -0800)]
perf test: Add timeout to datasym workload

Unlike others it has an infinite loop that make it annoying to call.
Make it finish after 1 second and handle command-line argument to change
the setting.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Add trace record and replay test
Namhyung Kim [Tue, 4 Mar 2025 02:28:35 +0000 (18:28 -0800)]
perf test: Add trace record and replay test

It just check trace record and replay could display correct output.
It uses 'sleep' process and sees there's a clock_nanosleep syscall.

  $ sudo perf test -vv replay
  108: perf trace record and replay:
  --- start ---
  test child forked, pid 1563219
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.077 MB /tmp/temporary_file.w1ApA (242 samples) ]
       0.686 (1000.068 ms): sleep/1563226 clock_nanosleep(rqtp: 0x7ffc20ffee10, rmtp: 0x7ffc20ffee50)           = 0
  ---- end(0) ----
  108: perf trace record and replay                                    : Ok

Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Skip perf trace tests when running as non-root
Namhyung Kim [Tue, 4 Mar 2025 02:28:34 +0000 (18:28 -0800)]
perf test: Skip perf trace tests when running as non-root

perf trace requires root because it needs to use tracepoints and BPF.
Skip those test when it's not run as root.

Before:
  $ perf test trace
   15: Parse sched tracepoints fields                                  : Skip (permissions)
   80: perf ftrace tests                                               : Skip
  105: perf trace enum augmentation tests                              : FAILED!
  106: perf trace BTF general tests                                    : FAILED!
  107: perf trace exit race                                            : FAILED!
  118: probe libc's inet_pton & backtrace it with ping                 : Skip
  125: Check Arm CoreSight trace data recording and synthesized samples: Skip
  127: Check Arm SPE trace data recording and synthesized samples      : Skip
  132: Check open filename arg using perf trace + vfs_getname          : FAILED!

After:
  $ perf test trace
   15: Parse sched tracepoints fields                                  : Skip (permissions)
   80: perf ftrace tests                                               : Skip
  105: perf trace enum augmentation tests                              : Skip
  106: perf trace BTF general tests                                    : Skip
  107: perf trace exit race                                            : Skip
  118: probe libc's inet_pton & backtrace it with ping                 : Skip
  125: Check Arm CoreSight trace data recording and synthesized samples: Skip
  127: Check Arm SPE trace data recording and synthesized samples      : Skip
  132: Check open filename arg using perf trace + vfs_getname          : Skip

Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Skip perf probe tests when running as non-root
Namhyung Kim [Tue, 4 Mar 2025 02:28:33 +0000 (18:28 -0800)]
perf test: Skip perf probe tests when running as non-root

perf trace requires root because it needs to use [ku]probes.
Skip those test when it's not run as root.

Before:
  $ perf test probe
   47: Probe SDT events                                                : Ok
  104: test perf probe of function from different CU                   : FAILED!
  115: perftool-testsuite_probe                                        : FAILED!
  117: Add vfs_getname probe to get syscall args filenames             : FAILED!
  118: probe libc's inet_pton & backtrace it with ping                 : FAILED!
  119: Use vfs_getname probe to get syscall args filenames             : FAILED!

After:
  $ perf test probe
   47: Probe SDT events                                                : Ok
  104: test perf probe of function from different CU                   : Skip
  115: perftool-testsuite_probe                                        : Skip
  117: Add vfs_getname probe to get syscall args filenames             : Skip
  118: probe libc's inet_pton & backtrace it with ping                 : Skip
  119: Use vfs_getname probe to get syscall args filenames             : Skip

Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250304022837.1877845-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Add --metric-only to perf stat output tests
Namhyung Kim [Tue, 4 Mar 2025 02:28:32 +0000 (18:28 -0800)]
perf test: Add --metric-only to perf stat output tests

Add a test case for --metric-only for std, csv, json output mode using
shadow IPC metric from instructions and cycles events.  It should
produce 'insn per cycle' metric.

But currently JSON output has (none) 'GHz' as well.  It looks like a bug
but I don't have enough time to debug it for now so I made it pass. :(

  $ perf stat --metric-only -e instructions,cycles true

   Performance counter stats for 'true':

                    0.56

         0.002127319 seconds time elapsed

         0.002077000 seconds user
         0.000000000 seconds sys

  $ perf stat -x, --metric-only -e instructions,cycles true

  0.55,,

  $ perf stat -j --metric-only -e instructions,cycles true
  {"insn per cycle" : "0.53", "GHz" : "none"}

  $ perf test output -v
    5: Test data source output                                         : Ok
   31: Sort output of hist entries                                     : Ok
   88: perf stat CSV output linter                                     : Ok
   90: perf stat JSON output linter                                    : Ok
   92: perf stat STD output linter                                     : Ok

Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-2-namhyung@kernel.org
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Support previous branch target (PBT) address
Leo Yan [Tue, 4 Mar 2025 11:12:40 +0000 (11:12 +0000)]
perf arm-spe: Support previous branch target (PBT) address

When FEAT_SPE_PBT is implemented, the previous branch target address
(named as PBT) before the sampled operation, will be recorded.

This commit first introduces a 'prev_br_tgt' field in the record for
saving the PBT address in the decoder.

If the current operation is a branch instruction, by combining with PBT,
it can create a chain with two consecutive branches.  As the branch
stack stores branches in descending order, meaning a newer branch is
stored in a lower entry in the stack.  Arm SPE stores the latest branch
in the first entry of branch stack, and the previous branch coming from
PBT is stored into the second entry.

Otherwise, if current operation is not a branch, the last branch will be
saved for PBT only.  PBT lacks associated information such as branch
source address, branch type, and events.  The branch entry fills zeros
for the corresponding fields and only set its target address.

After:

  perf script -f --itrace=bl -F flags,addr,brstack
  jcc                   ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/-  0x0/0xffff8000801878f8/-/-/-/0//-
  jcc                   ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/-  0x0/0xffff8000802d12ec/-/-/-/0//-
  jcc                   ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/-  0x0/0xffff8000813fe200/-/-/-/0//-
  jcc                   ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/-  0x0/0xffff8000813fe200/-/-/-/0//-
  jmp                   ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//-  0x0/0xffff800081419104/-/-/-/0//-
  return                ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/-  0x0/0xffff80008141ba60/-/-/-/0//-
  jcc                   ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/-  0x0/0xffff8000803d54e0/-/-/-/0//-
  jmp                   ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//-  0x0/0xffff8000803d46c8/-/-/-/0//-
  jmp                   ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//-  0x0/0xffff80008040f6e8/-/-/-/0//-
  jcc                   ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/-  0x0/0xffff8000807216ac/-/-/-/0//-

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Add branch stack
Leo Yan [Tue, 4 Mar 2025 11:12:39 +0000 (11:12 +0000)]
perf arm-spe: Add branch stack

Although Arm SPE cannot generate continuous branch records, this commit
creates a branch stack with only one branch entry.  A single branch info
can be used for performance optimization.

A branch stack structure is dynamically allocated in the decode queue.
The branch stack and stack flags are synthesized based on branch types
and associated events.

After:

  # perf script --itrace=bl1 -F flags,addr,brstack

  jcc                   ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/-
  jcc/miss,not_taken/   ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/-
  jmp                   ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//-
  jcc/not_taken/        ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/-
  jcc/not_taken/            ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/-
  jcc/not_taken/            ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/-
  jcc                       ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/-
  jcc                       ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/-
  jcc/not_taken/        ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/-
  jcc                   ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/-
  jcc                   ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/-
  jcc/not_taken/        ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/-
  jcc                   ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/-
  jcc                   ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/-
  return/miss/              ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/-
  jcc/not_taken/        ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/-
  jmp                       ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//-
  jcc/not_taken/        ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/-
  return                ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/-

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Set sample flags with supplement info
Leo Yan [Tue, 4 Mar 2025 11:12:38 +0000 (11:12 +0000)]
perf arm-spe: Set sample flags with supplement info

Based on the supplement information in the record, this commit sets the
sample flags for conditional branch, function call, return.  It also
sets events in flags, such as mispredict, not taken, and in transaction.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Fill branch operations and events to record
Leo Yan [Tue, 4 Mar 2025 11:12:37 +0000 (11:12 +0000)]
perf arm-spe: Fill branch operations and events to record

The new added branch operations and events are filled into record, the
information will be consumed when synthesizing samples.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-10-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Decode transactional event
Leo Yan [Tue, 4 Mar 2025 11:12:36 +0000 (11:12 +0000)]
perf arm-spe: Decode transactional event

The bit[16] in an event payload indicates an operation is in
transactional state.  Decode the bit.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-9-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Extend branch operations
Leo Yan [Tue, 4 Mar 2025 11:12:35 +0000 (11:12 +0000)]
perf arm-spe: Extend branch operations

In Arm ARM (ARM DDI 0487, L.a), the section "D18.2.7 Operation Type
packet", the branch subclass is extended for Call Return (CR), Guarded
control stack data access (GCS).

This commit adds support CR and GCS operations.  The IND (indirect)
operation is defined only in bit [1], its macro is updated accordingly.

Move the COND (Conditional) macro into the same group with other
operations for better maintenance.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-8-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Fix load-store operation checking
Leo Yan [Tue, 4 Mar 2025 11:12:34 +0000 (11:12 +0000)]
perf arm-spe: Fix load-store operation checking

The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation
type, they are overlapping with other second level's operation types
belonging to SVE and branch operations.  As a result, a non load-store
operation can be parsed for data source and memory sample.

To fix the issue, this commit introduces a is_ldst_op() macro for
checking LDST operation, and apply the checking when synthesize data
source and memory samples.

Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Add not taken event for branch stack
Leo Yan [Tue, 4 Mar 2025 11:12:33 +0000 (11:12 +0000)]
perf script: Add not taken event for branch stack

The branch stack has an existed field for printing mispredict, extend
the field for printing events and add support not-taken event.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Add not taken event for branches
Leo Yan [Tue, 4 Mar 2025 11:12:32 +0000 (11:12 +0000)]
perf script: Add not taken event for branches

Some hardware (e.g., Arm SPE) can trace the not taken event for
branches.  Add a flag for this event and support printing it.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Separate events from branch types
Leo Yan [Tue, 4 Mar 2025 11:12:31 +0000 (11:12 +0000)]
perf script: Separate events from branch types

Branch types and events are two different things.  A branch type can be
a conditional branch, an indirect branch, a procedure call, a return, or
an exception taken, etc.  The extra event information is provided for
what happens during a branch, e.g. if a branch is mispredicted or not
taken (specific to conditional branches).

To deliver information about branches, this commit separates events from
branch types.  It parses branch types first, then appends event strings
embraced by the '/' character.  If multiple events occur, the events is
separated with a comma (,).

Also add a minor improvement by adding char 'm' in char array for branch
mispredict event.

Below are extracted sample flags.

Before:
        branch:   br miss
  instructions:   br miss

After:
        branch:   jmp/miss/
  instructions:   jmp/miss/

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Refactor sample_flags_to_name() function
Leo Yan [Tue, 4 Mar 2025 11:12:30 +0000 (11:12 +0000)]
perf script: Refactor sample_flags_to_name() function

When generating a string for sample flags, the sample_flags_to_name()
function lacks the ability to parse the trace start bit or trace end bit.
Therefore, the function is invoked multiple times after clearing its
unsupported bits.

This commit improves the sample_flags_to_name() function to parse sample
flags in one go for three kinds of information:
- The prefix info for trace start, trace end, etc.
- Branch types.
- Extra info for transaction and interrupt related info.

As a result, the code is simplified to call the sample_flags_to_name()
only once.  No expectation for any changes in the perf script output.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf script: Make printing flags reliable
Leo Yan [Tue, 4 Mar 2025 11:12:29 +0000 (11:12 +0000)]
perf script: Make printing flags reliable

Add a check for the generated string of flags.  Print out the raw number
if the string generation fails.

Use the SAMPLE_FLAGS_STR_ALIGNED_SIZE macro to replace the value '21'.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf stat: Fix non-uniquified hybrid legacy events
James Clark [Wed, 26 Feb 2025 14:55:25 +0000 (14:55 +0000)]
perf stat: Fix non-uniquified hybrid legacy events

Legacy hybrid events have attr.type == PERF_TYPE_HARDWARE, so they look
like plain legacy events if we only look at attr.type. But legacy events
should still be uniquified if they were opened on a non-legacy PMU. Fix
it by checking if the evsel is hybrid and forcing needs_uniquify
before looking at the attr.type.

This restores PMU names on hybrid systems and also changes "perf stat
metrics (shadow stat) test" from a FAIL back to a SKIP (on hybrid). The
test was gated on "cycles" appearing alone which doesn't happen on
here.

Before:
  $ perf stat -- true
  ...
     <not counted>      instructions:u                           (0.00%)
           162,536      instructions:u            # 0.58  insn per cycle
  ...

After:
  $ perf stat -- true
  ...
      <not counted>      cpu_atom/instructions/u                  (0.00%)
            162,541      cpu_core/instructions/u   # 0.62  insn per cycle
  ...

Fixes: 357b965deba9 ("perf stat: Changes to event name uniquification")
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250226145526.632380-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tools: Skip BPF sideband event for userspace profiling
Namhyung Kim [Wed, 26 Feb 2025 20:30:39 +0000 (12:30 -0800)]
perf tools: Skip BPF sideband event for userspace profiling

The BPF sideband information is tracked using a separate thread and
evlist.  But it's only useful for profiling kernel and we can skip it
when users profile their application only.

It seems it already fails to open the sideband event in that case.
Let's remove the noise in the verbose output anyway.

Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250226203039.1099131-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf test: Fix spelling mistake "sythesizing" -> "synthesizing"
Colin Ian King [Fri, 28 Feb 2025 09:09:41 +0000 (09:09 +0000)]
perf test: Fix spelling mistake "sythesizing" -> "synthesizing"

There are spelling mistakes in TEST_ASSERT_VAL messages. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20250228090941.680226-1-colin.i.king@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf build: Fix in-tree build due to symbolic link
Luca Ceresoli [Fri, 24 Jan 2025 13:06:08 +0000 (14:06 +0100)]
perf build: Fix in-tree build due to symbolic link

Building perf in-tree is broken after commit 890a1961c812 ("perf tools:
Create source symlink in perf object dir") which added a 'source' symlink
in the output dir pointing to the source dir.

With in-tree builds, the added 'SOURCE = ...' line is executed multiple
times (I observed 2 during the build plus 2 during installation). This is a
minor inefficiency, in theory not harmful because symlink creation is
assumed to be idempotent. But it is not.

Considering with in-tree builds:

  srctree=/absolute/path/to/linux
   OUTPUT=/absolute/path/to/linux/tools/perf

here's what happens:

 1. ln -sf $(srctree)/tools/perf $(OUTPUT)/source
    -> creates /absolute/path/to/linux/tools/perf/source
       link to /absolute/path/to/linux/tools/perf
    => OK, that's what was intended
 2. ln -sf $(srctree)/tools/perf $(OUTPUT)/source   # same command as 1
    -> creates /absolute/path/to/linux/tools/perf/perf
       link to /absolute/path/to/linux/tools/perf
    => Not what was intended, not idempotent
 3. Now the build _should_ create the 'perf' executable, but it fails

The reason is the tricky 'ln' command line. At the first invocation 'ln'
uses the 1st form:

       ln [OPTION]... [-T] TARGET LINK_NAME

and creates a link to TARGET *called LINK_NAME*.

At the second invocation $(OUTPUT)/source exists, so 'ln' uses the 3rd
form:

       ln [OPTION]... TARGET... DIRECTORY

and creates a link to TARGET *called TARGET* inside DIRECTORY.

Fix by adding -n/--no-dereference to "treat LINK_NAME as a normal file
if it is a symbolic link to a directory", as the manpage says.

Closes: https://lore.kernel.org/all/20241125182506.38af9907@booty/
Fixes: 890a1961c812 ("perf tools: Create source symlink in perf object dir")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250124-perf-fix-intree-build-v1-1-485dd7a855e4@bootlin.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agotools/x86: Fix linux/unaligned.h include path in lib/insn.c
Ian Rogers [Tue, 25 Feb 2025 19:36:00 +0000 (11:36 -0800)]
tools/x86: Fix linux/unaligned.h include path in lib/insn.c

tools/arch/x86/include/linux doesn't exist but building is working by
virtue of a -I. Building using bazel this fails. Use angle brackets to
include unaligned.h so there isn't an invalid relative include.

Fixes: 5f60d5f6bbc1 ("move asm/unaligned.h to linux/unaligned.h")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20250225193600.90037-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf arm-spe: Report error if set frequency
Leo Yan [Thu, 27 Feb 2025 08:55:44 +0000 (08:55 +0000)]
perf arm-spe: Report error if set frequency

When users set the parameter '-F' to specify frequency for Arm SPE, the
tool reports error:

  perf record -F 1000 -e arm_spe_0// -- sleep 1
  Error:
  Invalid event (arm_spe_0//) in per-thread mode, enable system wide with '-a'.

The output logs are confused and it does not give the correct reminding.
Arm SPE does not support frequency setting given it adopts a statistical
based approach.

Alternatively, Arm SPE supports setting period.  This commit adds a
for frequency setting.  It reports error and reminds users to set period
instead.

After:

  perf record -F 1000 -e arm_spe_0// -- sleep 1
  Arm SPE: Frequency is not supported. Set period with -c option or PMU parameter (-e arm_spe_0/period=NUM/).

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250227085544.2154136-1-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf lock: Report owner stack in usermode
Chun-Tse Shao [Thu, 27 Feb 2025 00:28:56 +0000 (16:28 -0800)]
perf lock: Report owner stack in usermode

This patch parses `owner_lock_stat` into a RB tree, enabling ordered
reporting of owner lock statistics with stack traces. It also updates
the documentation for the `-o` option in contention mode, decouples `-o`
from `-t`, and issues a warning to inform users about the new behavior
of `-ov`.

Example output:
  $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe
  ...
   contended   total wait     max wait     avg wait         type   caller

         171      1.55 ms     20.26 us      9.06 us        mutex   pipe_read+0x57
                          0xffffffffac6318e7  pipe_read+0x57
                          0xffffffffac623862  vfs_read+0x332
                          0xffffffffac62434b  ksys_read+0xbb
                          0xfffffffface604b2  do_syscall_64+0x82
                          0xffffffffad00012f  entry_SYSCALL_64_after_hwframe+0x76
          36    193.71 us     15.27 us      5.38 us        mutex   pipe_write+0x50
                          0xffffffffac631ee0  pipe_write+0x50
                          0xffffffffac6241db  vfs_write+0x3bb
                          0xffffffffac6244ab  ksys_write+0xbb
                          0xfffffffface604b2  do_syscall_64+0x82
                          0xffffffffad00012f  entry_SYSCALL_64_after_hwframe+0x76
           4     51.22 us     16.47 us     12.80 us        mutex   do_epoll_wait+0x24d
                          0xffffffffac691f0d  do_epoll_wait+0x24d
                          0xffffffffac69249b  do_epoll_pwait.part.0+0xb
                          0xffffffffac693ba5  __x64_sys_epoll_pwait+0x95
                          0xfffffffface604b2  do_syscall_64+0x82
                          0xffffffffad00012f  entry_SYSCALL_64_after_hwframe+0x76

  === owner stack trace ===

           3     31.24 us     15.27 us     10.41 us        mutex   pipe_read+0x348
                          0xffffffffac631bd8  pipe_read+0x348
                          0xffffffffac623862  vfs_read+0x332
                          0xffffffffac62434b  ksys_read+0xbb
                          0xfffffffface604b2  do_syscall_64+0x82
                          0xffffffffad00012f  entry_SYSCALL_64_after_hwframe+0x76
  ...

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf lock: Make rb_tree helper functions generic
Chun-Tse Shao [Thu, 27 Feb 2025 00:28:55 +0000 (16:28 -0800)]
perf lock: Make rb_tree helper functions generic

The rb_tree helper functions can be reused for parsing `owner_lock_stat`
into rb tree for sorting.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-4-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf lock: Retrieve owner callstack in bpf program
Chun-Tse Shao [Thu, 27 Feb 2025 00:28:54 +0000 (16:28 -0800)]
perf lock: Retrieve owner callstack in bpf program

This implements per-callstack aggregation of lock owners in addition to
per-thread.  The owner callstack is captured using `bpf_get_task_stack()`
at `contention_begin()` and it also adds a custom stackid function for the
owner stacks to be compared easily.

The owner info is kept in a hash map using lock addr as a key to handle
multiple waiters for the same lock.  At `contention_end()`, it updates the
owner lock stat based on the info that was saved at `contention_begin()`.
If there are more waiters, it'd update the owner pid to itself as
`contention_end()` means it gets the lock now.  But it also needs to check
the return value of the lock function in case task was killed by a signal
or something.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf lock: Add bpf maps for owner stack tracing
Chun-Tse Shao [Thu, 27 Feb 2025 00:28:53 +0000 (16:28 -0800)]
perf lock: Add bpf maps for owner stack tracing

Add a struct and few bpf maps in order to tracing owner stack.
`struct owner_tracing_data`: Contains owner's pid, stack id, timestamp for
  when the owner acquires lock, and the count of lock waiters.
`stack_buf`: Percpu buffer for retrieving owner stacktrace.
`owner_stacks`: For tracing owner stacktrace to customized owner stack id.
`owner_data`: For tracing lock_address to `struct owner_tracing_data` in
  bpf program.
`owner_stat`: For reporting owner stacktrace in usermode.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf cpumap: Reduce cpu size from int to int16_t
Ian Rogers [Mon, 10 Feb 2025 19:12:31 +0000 (11:12 -0800)]
perf cpumap: Reduce cpu size from int to int16_t

Fewer than 32k logical CPUs are currently supported by perf. A cpumap
is indexed by an integer (see perf_cpu_map__cpu) yielding a perf_cpu
that wraps a 4-byte int for the logical CPU - the wrapping is done
deliberately to avoid confusing a logical CPU with an index into a
cpumap. Using a 4-byte int within the perf_cpu is larger than required
so this patch reduces it to the 2-byte int16_t. For a cpumap
containing 16 entries this will reduce the array size from 64 to 32
bytes. For very large servers with lots of logical CPUs the size
savings will be greater.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250210191231.156294-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf trace: Add missing perf_tool__init()
Athira Rajeev [Tue, 25 Feb 2025 11:31:57 +0000 (17:01 +0530)]
perf trace: Add missing perf_tool__init()

Perf trace on perf.data fails as below:

./perf trace record -- sleep 1
./perf trace -i perf.data
perf: Segmentation fault
Segmentation fault (core dumped)

Backtrace pointed to :
?? ()
perf_session.process_user_event ()
reader.read_event ()
perf_session.process_events ()
cmd_trace ()
run_builtin ()
handle_internal_command ()
main ()

Further debug pointed that, segmentation fault happens when
trying to access id_index. Code snippet:

case PERF_RECORD_ID_INDEX:
err = tool->id_index(session, event);

Since 'commit 15d4a6f41d72 ("perf tool: Remove
perf_tool__fill_defaults()")', perf_tool__fill_defaults is
removed. All tools are initialized using perf_tool__init()
prior to use. But in builtin-trace, perf_tool__init is not
used and hence the defaults are not initialized. Use
perf_tool__init() in perf trace to handle the initialization.

Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250225113157.28836-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf list: Document -v option deduplication feature
James Clark [Wed, 26 Feb 2025 10:41:02 +0000 (10:41 +0000)]
perf list: Document -v option deduplication feature

-v disables deduplication of similarly suffixed PMUs so add it to the
help and doc strings.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf pmu: Don't double count common sysfs and json events
James Clark [Wed, 26 Feb 2025 10:41:01 +0000 (10:41 +0000)]
perf pmu: Don't double count common sysfs and json events

After pmu_add_cpu_aliases() is called, perf_pmu__num_events() returns an
incorrect value that double counts common events and doesn't match the
actual count of events in the alias list. This is because after
'cpu_aliases_added == true', the number of events returned is
'sysfs_aliases + cpu_json_aliases'. But when adding 'case
EVENT_SRC_SYSFS' events, 'sysfs_aliases' and 'cpu_json_aliases' are both
incremented together, failing to account that these ones overlap and
only add a single item to the list. Fix it by adding another counter for
overlapping events which doesn't influence 'cpu_json_aliases'.

There doesn't seem to be a current issue because it's used in perf list
before pmu_add_cpu_aliases() so the correct value is returned. Other
uses in tests may also miss it for other reasons like only looking at
uncore events. However it's marked as a fixes commit in case any new fix
with new uses of perf_pmu__num_events() is backported.

Fixes: d9c5f5f94c2d ("perf pmu: Count sys and cpuid JSON events separately")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf pmu: Dynamically allocate tool PMU
James Clark [Wed, 26 Feb 2025 10:41:00 +0000 (10:41 +0000)]
perf pmu: Dynamically allocate tool PMU

perf_pmus__destroy() treats all PMUs as allocated and free's them so we
can't have any static PMUs that are added to the PMU lists. Fix it by
allocating the tool PMU in the same way as the others. Current users of
the tool PMU already use find_pmu() and not perf_pmus__tool_pmu(), so
rename the function to add 'new' to avoid it being misused in the
future.

perf_pmus__fake_pmu() can remain as static as it's not added to the
PMU lists.

Fixes the following error:

  $ perf bench internals pmu-scan

  # Running 'internals/pmu-scan' benchmark:
  Computing performance of sysfs PMU event scan for 100 times
  munmap_chunk(): invalid pointer
  Aborted (core dumped)

Fixes: 240505b2d0ad ("perf tool_pmu: Factor tool events into their own PMU")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf probe: Pick the correct dwarf die while adding probe points
Athira Rajeev [Tue, 25 Feb 2025 12:30:42 +0000 (18:00 +0530)]
perf probe: Pick the correct dwarf die while adding probe points

Perf probe on vfs_fstatat fails as below on a powerpc system

  $ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params'
  Segmentation fault (core dumped)

This is observed while running perftool-testsuite_probe testcase.

While running with verbose, its observed that segfault happens
at:

   synthesize_probe_trace_arg ()
   synthesize_probe_trace_command ()
   probe_file.add_event ()
   apply_perf_probe_events ()
   __cmd_probe ()
   cmd_probe ()
   run_builtin ()
   handle_internal_command ()
   main ()

Code in synthesize_probe_trace_arg() access a null value and results in
segfault. Data structure which is null:
struct probe_trace_arg arg->value

We are hitting a case where arg->value is null in probe point:
"vfs_fstatat $params". This is happening since 'commit e896474fe485
("getname_maybe_null() - the third variant of pathname copy-in")'
Before the commit, probe point for vfs_fstatat was getting added only
for one location:

  Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32

With this change, vfs_fstatat code is inlined for other locations in the
code:
  Probe point found: __do_sys_lstat64+48
  Probe point found: __do_sys_stat64+48
  Probe point found: __do_sys_newlstat+48
  Probe point found: __do_sys_newstat+48
  Probe point found: vfs_fstatat+0

When trying to find matching dwarf information entry (DIE)
from the debuginfo, the code incorrectly picks DIE which is
not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux
debuginfo file.

The main abstract die is:
 <1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram)
    <4214885>   DW_AT_external    : 1
    <4214885>   DW_AT_name        : (indirect string, offset: 0x17b9f3): vfs_fstatat

With formal parameters:
 <2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter)
    <4214897>   DW_AT_name        : dfd
 <2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter)
    <42148a4>   DW_AT_name        : (indirect string, offset: 0x8fda9): filename
 <2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter)
    <42148b1>   DW_AT_name        : (indirect string, offset: 0x16bd9c): stat
 <2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter)
    <42148be>   DW_AT_name        : (indirect string, offset: 0x39832b): flags

While collecting variables/parameters for a probe point, the function
copy_variables_cb() also looks at dwarf debug entries based on the
instruction address. Snippet

        if (dwarf_haspc(die_mem, vf->pf->addr))
                return DIE_FIND_CB_CONTINUE;
        else
                return DIE_FIND_CB_SIBLING;

But incase of inlined function instance for vfs_fstatat, there are two
entries which has the instruction address entry point as same.

Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to
0x4214883 (reference above for main abstract die)

 <3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine)
    <42131fb>   DW_AT_abstract_origin: <0x4214883>
    <42131ff>   DW_AT_entry_pc    : 0xc00000000062b1e0

Instance 2: which is not for vfs_fstatat but for getname

 <5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine)
    <4213271>   DW_AT_abstract_origin: <0x4215b6b>
    <4213275>   DW_AT_entry_pc    : 0xc00000000062b1e0

But the copy_variables_cb() continues to add parameters from second
instance also based on the dwarf_haspc() check. This results in
formal parameters for getname also appended to params. But while
filling in the args->value for these parameters, since these args
are not part of dwarf with offset "42131fa". Hence value will be
null. This incorrect args results in segfault when value field is
accessed.

Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of
"struct probe_finder". In copy_variables_cb(), include check to make
sure the DW_AT_abstract_origin points to the correct entry if the
dwarf_haspc() matches the instruction address.

Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf ftrace latency: allow to hide empty buckets
Gabriele Monaco [Fri, 7 Feb 2025 08:04:45 +0000 (09:04 +0100)]
perf ftrace latency: allow to hide empty buckets

Especially while using several buckets, it isn't uncommon to have some
of them empty and reading the histogram may be a bit more complex:

  # perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200
  #   DURATION     |  COUNT | GRAPH                                  |
       0 -    5 us |  14816 | ###################################### |
       5 -   10 us |   1228 | ###                                    |
      10 -   15 us |    438 | #                                      |
      15 -   20 us |    106 |                                        |
      20 -   25 us |     21 |                                        |
      25 -   30 us |     11 |                                        |
      30 -   35 us |      1 |                                        |
      35 -   40 us |      2 |                                        |
      40 -   45 us |      4 |                                        |
      45 -   50 us |      0 |                                        |
      50 -   55 us |      1 |                                        |
      55 -   60 us |      0 |                                        |
      60 -   65 us |      1 |                                        |
      65 -   70 us |      1 |                                        |
      70 -   75 us |      1 |                                        |
      75 -   80 us |      2 |                                        |
      80 -   85 us |      0 |                                        |
      85 -   90 us |      1 |                                        |
      90 -   95 us |      0 |                                        |
      95 -  100 us |      1 |                                        |
     100 -  105 us |      0 |                                        |
     105 -  110 us |      0 |                                        |
     110 -  115 us |      0 |                                        |
     115 -  120 us |      0 |                                        |
     120 -  125 us |      1 |                                        |
     125 -  130 us |      0 |                                        |
     130 -  135 us |      0 |                                        |
     135 -  140 us |      1 |                                        |
     140 -  145 us |      0 |                                        |
     145 -  150 us |      0 |                                        |
     150 -  155 us |      0 |                                        |
     155 -  160 us |      0 |                                        |
     160 -  165 us |      0 |                                        |
     165 -  170 us |      0 |                                        |
     170 -  175 us |      0 |                                        |
     175 -  180 us |      0 |                                        |
     180 -  185 us |      0 |                                        |
     185 -  190 us |      0 |                                        |
     190 -  195 us |      0 |                                        |
     195 -  200 us |      0 |                                        |
     200 -  ... us |      2 |                                        |

Allow the optional flag --hide-empty to remove buckets with no element
and produce a more compact graph. This feature could be misleading since
there is no clear indication for missing buckets, for this reason it's
disabled by default.

  # perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency --hide-empty 200
  #   DURATION     |  COUNT | GRAPH                                  |
       0 -    5 us |  14816 | ###################################### |
       5 -   10 us |   1228 | ###                                    |
      10 -   15 us |    438 | #                                      |
      15 -   20 us |    106 |                                        |
      20 -   25 us |     21 |                                        |
      25 -   30 us |     11 |                                        |
      30 -   35 us |      1 |                                        |
      35 -   40 us |      2 |                                        |
      40 -   45 us |      4 |                                        |
      50 -   55 us |      1 |                                        |
      60 -   65 us |      1 |                                        |
      65 -   70 us |      1 |                                        |
      70 -   75 us |      1 |                                        |
      75 -   80 us |      2 |                                        |
      85 -   90 us |      1 |                                        |
      95 -  100 us |      1 |                                        |
     120 -  125 us |      1 |                                        |
     135 -  140 us |      1 |                                        |
     200 -  ... us |      2 |                                        |

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-2-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf ftrace latency: variable histogram buckets
Gabriele Monaco [Fri, 7 Feb 2025 08:04:44 +0000 (09:04 +0100)]
perf ftrace latency: variable histogram buckets

The max-latency value can make the histogram smaller, but not larger, we
have a maximum of 22 buckets and specifying a max-latency that would
require more buckets has no effect.

Dynamically allocate the buckets and compute the bucket number from the
max latency as (max-min) / range + 2

If the maximum is not specified, we still set the bucket number to 22
and compute the maximum accordingly.

Fail if the maximum is smaller than min+range, this way we make sure we
always have 3 buckets: those below min, those above max and one in the
middle.

Since max-latency is not available in log2 mode, always use 22 buckets.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf annotate-data: Handle direct use of stack pointer without fbreg
Namhyung Kim [Sun, 26 Jan 2025 21:02:42 +0000 (13:02 -0800)]
perf annotate-data: Handle direct use of stack pointer without fbreg

Sometimes compiler generates code to use the stack pointer register
without frame pointer.  As we know RSP is the stack register on x86,
let's treat it as same as fbreg.  But the offset would be opposite
direction so update the debug message accordingly.

Reported-by: Blake Jones <blakejones@google.com>
Link: https://lore.kernel.org/r/20250126210242.1181225-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Fix sample number stats for branch entry mode
Thomas Falcon [Thu, 20 Feb 2025 04:59:42 +0000 (22:59 -0600)]
perf report: Fix sample number stats for branch entry mode

Currently, stats->nr_samples is incremented per entry in the branch stack
instead of per sample taken. As a result, statistics of samples taken
during perf record in --branch-filter or --branch-any mode does not
seem correct. Instead call hists__inc_nr_samples() for each sample taken
instead of for each entry in the branch stack.

Before:

$ ./perf record -e cycles:u -b -c 10000000000 ./tchain_edit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (2 samples) ]
$ perf report -D | tail -n 16
Aggregated stats:
               TOTAL events:         16
                COMM events:          2  (12.5%)
                EXIT events:          1  ( 6.2%)
              SAMPLE events:          2  (12.5%)
               MMAP2 events:          2  (12.5%)
             KSYMBOL events:          1  ( 6.2%)
      FINISHED_ROUND events:          1  ( 6.2%)
            ID_INDEX events:          1  ( 6.2%)
          THREAD_MAP events:          1  ( 6.2%)
             CPU_MAP events:          1  ( 6.2%)
        EVENT_UPDATE events:          2  (12.5%)
           TIME_CONV events:          1  ( 6.2%)
       FINISHED_INIT events:          1  ( 6.2%)
cpu_core/cycles/u stats:
              SAMPLE events:         64

After:

$ ./perf report -D | tail -n 16
Aggregated stats:
               TOTAL events:         16
                COMM events:          2  (12.5%)
                EXIT events:          1  ( 6.2%)
              SAMPLE events:          2  (12.5%)
               MMAP2 events:          2  (12.5%)
             KSYMBOL events:          1  ( 6.2%)
      FINISHED_ROUND events:          1  ( 6.2%)
            ID_INDEX events:          1  ( 6.2%)
          THREAD_MAP events:          1  ( 6.2%)
             CPU_MAP events:          1  ( 6.2%)
        EVENT_UPDATE events:          2  (12.5%)
           TIME_CONV events:          1  ( 6.2%)
       FINISHED_INIT events:          1  ( 6.2%)
cpu_core/cycles/u stats:
              SAMPLE events:          2

Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250220045942.114965-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf machine: Reuse module path buffer
Ian Rogers [Sat, 22 Feb 2025 06:10:13 +0000 (22:10 -0800)]
perf machine: Reuse module path buffer

Rather than copying the path and appending the directory entry in a
fresh path buffer, append to the path at the end of where it is for
the recursion level. This saves a PATH_MAX buffer per recursion level
and some unnecessary copying.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf hwmon_pmu: Switch event discovery to io_dir__readdir
Ian Rogers [Sat, 22 Feb 2025 06:10:12 +0000 (22:10 -0800)]
perf hwmon_pmu: Switch event discovery to io_dir__readdir

Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf parse-events: Switch tracepoints to io_dir__readdir
Ian Rogers [Sat, 22 Feb 2025 06:10:11 +0000 (22:10 -0800)]
perf parse-events: Switch tracepoints to io_dir__readdir

Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf events: Remove scandir in thread synthesis
Ian Rogers [Sat, 22 Feb 2025 06:10:10 +0000 (22:10 -0800)]
perf events: Remove scandir in thread synthesis

This avoids scanddir reading the directory into memory that's
allocated and instead allocates on the stack.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250222061015.303622-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf header: Switch mem topology to io_dir__readdir
Ian Rogers [Sat, 22 Feb 2025 06:10:09 +0000 (22:10 -0800)]
perf header: Switch mem topology to io_dir__readdir

Switch memory_node__read and build_mem_topology from opendir/readdir
to io_dir__readdir, with smaller stack allocations. Reduces peak
memory consumption of perf record by 10kb.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf pmu: Switch to io_dir__readdir
Ian Rogers [Sat, 22 Feb 2025 06:10:08 +0000 (22:10 -0800)]
perf pmu: Switch to io_dir__readdir

Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf maps: Switch modules tree walk to io_dir__readdir
Ian Rogers [Sat, 22 Feb 2025 06:10:07 +0000 (22:10 -0800)]
perf maps: Switch modules tree walk to io_dir__readdir

Compared to glibc's opendir/readdir this lowers the max RSS of perf
record by 1.8MB on a Debian machine.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agotools lib api: Add io_dir an allocation free readdir alternative
Ian Rogers [Sat, 22 Feb 2025 06:10:06 +0000 (22:10 -0800)]
tools lib api: Add io_dir an allocation free readdir alternative

glibc's opendir allocates a minimum of 32kb, when called recursively
for a directory tree the memory consumption can add up - nearly 300kb
during perf start-up when processing modules. Add a stack allocated
variant of readdir sized a little more than 1kb.

As getdents64 may be missing from libc, add support using syscall. As
the system call number maybe missing, add #defines for those.

Note, an earlier version of this patch had a feature test for
getdents64 but there were problems on certains distros where
getdents64 would be #define renamed to getdents breaking the code. The
syscall use was made uncondtional to work around this. There is
context in:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf parse-events: Tidy name token matching
Ian Rogers [Thu, 9 Jan 2025 17:54:01 +0000 (09:54 -0800)]
perf parse-events: Tidy name token matching

Prior to commit 70c90e4a6b2f ("perf parse-events: Avoid scanning PMUs
before parsing") names (generally event names) excluded hyphen (minus)
symbols as the formation of legacy names with hyphens was handled in
the yacc code. That commit allowed hyphens supposedly making
name_minus unnecessary. However, changing name_minus to name has
issues in the term config tokens as then name ends up having priority
over numbers and name allows matching numbers since commit
5ceb57990bf4 ("perf parse: Allow tracepoint names to start with digits
"). It is also permissable for a name to match with a colon (':') in
it when its in a config term list. To address this rename name_minus
to term_name, make the pattern match name's except for the colon, add
number matching into the config term region with a higher priority
than name matching. This addresses an inconsistency and allows greater
matching for names inside of term lists, for example, they may start
with a number.

Rename name_tag to quoted_name and update comments and helper
functions to avoid str detecting quoted strings which was already done
by the lexer.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250109175401.161340-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tools: Improve startup time by reducing unnecessary stat() calls
Krzysztof Łopatowski [Thu, 6 Feb 2025 11:33:15 +0000 (12:33 +0100)]
perf tools: Improve startup time by reducing unnecessary stat() calls

When testing perf trace on NixOS, I noticed significant startup delays:
- `ls`: ~2ms
- `strace ls`: ~10ms
- `perf trace ls`: ~550ms

Profiling showed that 51% of the time is spent reading files,
26% in loading BPF programs, and 11% in `newfstatat`.

This patch optimizes module path exploration by avoiding `stat()` calls
unless necessary. For filesystems that do not implement `d_type`
(DT_UNKNOWN), it falls back to the old behavior.
See `readdir(3)` for details.

This reduces `perf trace ls` time to ~500ms.

A more thorough startup optimization based on command parameters would
be ideal, but that is a larger effort.

Signed-off-by: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Fix input reload/switch with symbol sort key
Dmitry Vyukov [Wed, 8 Jan 2025 06:36:23 +0000 (07:36 +0100)]
perf report: Fix input reload/switch with symbol sort key

Currently the code checks that there is no "ipc" in the sort order
and add an ipc string. This will always error out on the second pass
after input reload/switch, since the sort order already contains "ipc".
Do the ipc check/fixup only on the first pass.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20250108063628.215577-1-dvyukov@google.com
Fixes: ec6ae74fe8f0 ("perf report: Display average IPC and IPC coverage per symbol")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Support switching data w/ and w/o callchains
Namhyung Kim [Tue, 11 Feb 2025 06:07:45 +0000 (22:07 -0800)]
perf report: Support switching data w/ and w/o callchains

The symbol_conf.use_callchain should be reset when switching to new data
file, otherwise report__setup_sample_type() will show an error message
that it enabled callchains but no callchain data.  The function also
will turn on the callchains if the data has PERF_SAMPLE_CALLCHAIN so I
think it's ok to reset symbol_conf.use_callchain here.

Link: https://lore.kernel.org/r/20250211060745.294289-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf report: Switch data file correctly in TUI
Namhyung Kim [Tue, 11 Feb 2025 06:07:44 +0000 (22:07 -0800)]
perf report: Switch data file correctly in TUI

The 's' key is to switch to a new data file and load the data in the
same window.  The switch_data_file() will show a popup menu to select
which data file user wants and update the 'input_name' global variable.

But in the cmd_report(), it didn't update the data.path using the new
'input_name' and keep usng the old file.  This is fairly an old bug and
I assume people don't use this feature much. :)

Link: https://lore.kernel.org/r/20250211060745.294289-1-namhyung@kernel.org
Closes: https://lore.kernel.org/linux-perf-users/89e678bc-f0af-4929-a8a6-a2666f1294a4@linaro.org
Fixes: f5fc14124c5cefdd ("perf tools: Add data object to handle perf data file")
Reported-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf tools: Fix up some comments and code to properly use the event_source bus
Greg Kroah-Hartman [Wed, 19 Feb 2025 13:40:56 +0000 (14:40 +0100)]
perf tools: Fix up some comments and code to properly use the event_source bus

In sysfs, the perf events are all located in
/sys/bus/event_source/devices/ but some places ended up hard-coding the
location to be at the root of /sys/devices/ which could be very risky as
you do not exactly know what type of device you are accessing in sysfs
at that location.

So fix this all up by properly pointing everything at the bus device
list instead of the root of the sysfs devices/ tree.

Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/2025021955-implant-excavator-179d@gregkh
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf list: Also append PMU name in verbose mode
James Clark [Wed, 19 Feb 2025 15:16:21 +0000 (15:16 +0000)]
perf list: Also append PMU name in verbose mode

When listing in verbose mode, the long description is used but the PMU
name isn't appended. There doesn't seem to be a reason to exclude it
when asking for more information, so use the same print block for both
long and short descriptions.

Before:
  $ perf list -v
  ...
  inst_retired
       [Instruction architecturally executed]

After:
  $ perf list -v
  ...
   inst_retired
       [Instruction architecturally executed. Unit: armv8_cortex_a57]

Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250219151622.1097289-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 months agoperf vendor events arm64: Fix incorrect CPU_CYCLE in metrics expr
Yangyu Chen [Thu, 13 Feb 2025 08:44:09 +0000 (16:44 +0800)]
perf vendor events arm64: Fix incorrect CPU_CYCLE in metrics expr

Some existing metrics for Neoverse N3 and V3 expressions use CPU_CYCLE
to represent the number of cycles, but this is incorrect. The correct
event to use is CPU_CYCLES.

I encountered this issue while working on a patch to add pmu events for
Cortex A720 and A520 by reusing the existing patch for Neoverse N3 and
V3 by James Clark [1] and my check script [2] reported this issue.

[1] https://lore.kernel.org/lkml/20250122163504.2061472-1-james.clark@linaro.org/
[2] https://github.com/cyyself/arm-pmu-check

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/tencent_D4ED18476ADCE818E31084C60E3E72C14907@qq.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
5 months agoperf script: Fix hangup in offline flamegraph report
Namhyung Kim [Wed, 19 Feb 2025 00:05:58 +0000 (16:05 -0800)]
perf script: Fix hangup in offline flamegraph report

A recent change in the flamegraph script fixed an issue with live mode
but it created another for offline mode.  It needs to pass "-" to -i
option to read from stdin in the live mode.  Actually there's a logic
to pass the option in the perf script code, but the script was written
with "-- $@" which prevented the option to go to the perf script.  So
the previous commit added the hard-coded "-i -" to the report command.

But it's a problem for the offline mode which expects input from a file
and now it's stuck on reading from stdin.  Let's remove the "-i - --"
part and let it pass the options properly to perf script.

Closes: https://lore.kernel.org/linux-perf-users/c41e4b04-e1fd-45ab-80b0-ec2ac6e94310@linux.ibm.com
Fixes: 23e0a63c6dd3f69c ("perf script: force stdin for flamegraph in live mode")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Anubhav Shelat <ashelat@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
5 months agoperf hist: Shrink struct hist_entry size
Dmitry Vyukov [Thu, 13 Feb 2025 09:08:22 +0000 (10:08 +0100)]
perf hist: Shrink struct hist_entry size

Reorder the struct fields by size to reduce paddings and reduce
struct simd_flags size from 8 to 1 byte.

This reduces struct hist_entry size by 8 bytes (592->584),
and leaves a single more usable 6 byte padding hole.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/7c1cb1c8f9901e945162701ba7269d0f9c70be89.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
5 months agoperf test: Add tests for latency and parallelism profiling
Dmitry Vyukov [Thu, 13 Feb 2025 09:08:21 +0000 (10:08 +0100)]
perf test: Add tests for latency and parallelism profiling

Ensure basic operation of latency/parallelism profiling and that
main latency/parallelism record/report invocations don't fail/crash.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/c129c8f02f328f68e1e9ef2cdc582f8a9786a97d.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
5 months agoperf report: Add latency and parallelism profiling documentation
Dmitry Vyukov [Thu, 13 Feb 2025 09:08:20 +0000 (10:08 +0100)]
perf report: Add latency and parallelism profiling documentation

Describe latency and parallelism profiling, related flags, and differences
with the currently only supported CPU-consumption-centric profiling.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/a13f270ed33cedb03ce9ebf9ddbd064854ca0f19.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
5 months agoperf report: Add --latency flag
Dmitry Vyukov [Thu, 13 Feb 2025 09:08:19 +0000 (10:08 +0100)]
perf report: Add --latency flag

Add record/report --latency flag that allows to capture and show
latency-centric profiles rather than the default CPU-consumption-centric
profiles. For latency profiles record captures context switch events,
and report shows Latency as the first column.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>