Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When 'perf sched' enables the call-graph recording, sample_type of dummy
event does not have PERF_SAMPLE_CALLCHAIN, timehist_check_attr() checks
that the evsel does not have a callchain, and set show_callchain to 0.
Currently 'perf sched timehist' only saves callchain when processing the
'sched:sched_switch event', timehist_check_attr() only needs to determine
whether the event has PERF_SAMPLE_CALLCHAIN.
Before:
# perf sched record -g true
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ]
# perf sched timehist
Samples do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
147851.826019 [0000] perf[285035] 0.000 0.000 0.000
147851.826029 [0000] migration/0[15] 0.000 0.003 0.009
147851.826063 [0001] perf[285035] 0.000 0.000 0.000
147851.826069 [0001] migration/1[21] 0.000 0.003 0.006
<SNIP>
Fixes: 9c95e4ef06572349 ("perf evlist: Add evlist__findnew_tracking_event() helper") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 22 Mar 2024 22:43:13 +0000 (15:43 -0700)]
perf annotate: Honor output options with --data-type
For data type profiling output, it should be in sync with normal output
so make it display percentage for each field. Also use coloring scheme
for users to identify fields with big overhead easily.
Users can use --show-total-period or --show-nr-samples to change the
output style like in the normal perf annotate output.
Before:
$ perf annotate --data-type
Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
============================================================================
samples offset size field
34 0 9792 struct task_struct {
2 0 24 struct thread_info thread_info {
0 0 8 long unsigned int flags;
1 8 8 long unsigned int syscall_work;
0 16 4 u32 status;
1 20 4 u32 cpu;
};
After:
$ perf annotate --data-type
Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
============================================================================
Percent offset size field
100.00 0 9792 struct task_struct {
3.55 0 24 struct thread_info thread_info {
0.00 0 8 long unsigned int flags;
1.63 8 8 long unsigned int syscall_work;
0.00 16 4 u32 status;
1.91 20 4 u32 cpu;
};
Committer testing:
First collect a suitable perf.data file for use with 'perf annotate --data-type':
root@number:~# perf mem record -a sleep 1s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ]
root@number:~#
Then, before:
root@number:~# perf annotate --data-type
Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
============================================================================
samples offset size field
6 0 40 union {
6 0 40 struct __pthread_mutex_s __data {
2 0 4 int __lock;
0 4 4 unsigned int __count;
0 8 4 int __owner;
1 12 4 unsigned int __nusers;
2 16 4 int __kind;
1 20 2 short int __spins;
0 22 2 short int __elision;
0 24 16 __pthread_list_t __list {
0 24 8 struct __pthread_internal_list* __prev;
0 32 8 struct __pthread_internal_list* __next;
};
};
0 0 0 char* __size;
2 0 8 long int __align;
};
<SNIP>
And after:
Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
============================================================================
Percent offset size field
100.00 0 40 union {
100.00 0 40 struct __pthread_mutex_s __data {
31.27 0 4 int __lock;
0.00 4 4 unsigned int __count;
0.00 8 4 int __owner;
7.67 12 4 unsigned int __nusers;
53.10 16 4 int __kind;
7.96 20 2 short int __spins;
0.00 22 2 short int __elision;
0.00 24 16 __pthread_list_t __list {
0.00 24 8 struct __pthread_internal_list* __prev;
0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0 0 char* __size;
31.27 0 8 long int __align;
};
<SNIP>
The lines with percentages >= 7.67 have its percentages red colored.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 22 Mar 2024 22:43:12 +0000 (15:43 -0700)]
perf annotate: Get rid of duplicate --group option item
The options array in cmd_annotate() has duplicate --group options. It
only needs one and let's get rid of the other.
$ perf annotate -h 2>&1 | grep group
--group Show event group information together
--group Show event group information together
Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 22 Mar 2024 14:34:26 +0000 (11:34 -0300)]
perf tools: Add Kan Liang to MAINTAINERS as a reviewer
Kan has been reviewing patches regularly, add him as a perf tools
reviewer so that people CC him on new patches.
Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: "Liang, Kan" <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf
It is only used to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/vhost.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.
Ian Rogers [Thu, 21 Mar 2024 16:02:48 +0000 (09:02 -0700)]
perf dso: Reorder members to save space in 'struct dso'
Save 40 bytes and move from 8 to 7 cache lines. Make member dwfl
dependent on being a powerpc build. Squeeze bits of int/enum types
when appropriate. Remove holes/padding by reordering variables.
/* size: 424, cachelines: 7, members: 48 */
/* sum members: 415 */
/* sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits */
/* padding: 5 */
/* paddings: 1, sum paddings: 4 */
/* forced alignments: 1 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240321160300.1635121-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anne Macedo [Tue, 19 Mar 2024 14:36:26 +0000 (14:36 +0000)]
perf lock contention: Trim backtrace by skipping traceiter functions
The 'perf lock contention' program currently shows the caller of the locks
as __traceiter_contention_begin+0x??. This caller can be ignored, as it is
from the traceiter itself. Instead, it should show the real callers for
the locks.
When fiddling with the --stack-skip parameter, the actual callers for
the locks start to show up. However, just ignore the
__traceiter_contention_begin and the __traceiter_contention_end symbols
so the actual callers will show up.
Before this patch is applied:
sudo perf lock con -a -b -- sleep 3
contended total wait max wait avg wait type caller
8 2.33 s 2.28 s 291.18 ms rwlock:W __traceiter_contention_begin+0x44
4 2.33 s 2.28 s 582.35 ms rwlock:W __traceiter_contention_begin+0x44
7 140.30 ms 46.77 ms 20.04 ms rwlock:W __traceiter_contention_begin+0x44
2 63.35 ms 33.76 ms 31.68 ms mutex trace_contention_begin+0x84
2 46.74 ms 46.73 ms 23.37 ms rwlock:W __traceiter_contention_begin+0x44
1 13.54 us 13.54 us 13.54 us mutex trace_contention_begin+0x84
1 3.67 us 3.67 us 3.67 us rwsem:R __traceiter_contention_begin+0x44
Before this patch is applied - using --stack-skip 5
sudo perf lock con --stack-skip 5 -a -b -- sleep 3
contended total wait max wait avg wait type caller
2 2.24 s 2.24 s 1.12 s rwlock:W do_epoll_wait+0x5a0
4 1.65 s 824.21 ms 412.08 ms rwlock:W do_exit+0x338
2 824.35 ms 824.29 ms 412.17 ms spinlock get_signal+0x108
2 824.14 ms 824.14 ms 412.07 ms rwlock:W release_task+0x68
1 25.22 ms 25.22 ms 25.22 ms mutex cgroup_kn_lock_live+0x58
1 24.71 us 24.71 us 24.71 us spinlock do_exit+0x44
1 22.04 us 22.04 us 22.04 us rwsem:R lock_mm_and_find_vma+0xb0
After this patch is applied:
sudo ./perf lock con -a -b -- sleep 3
contended total wait max wait avg wait type caller
4 4.13 s 2.07 s 1.03 s rwlock:W release_task+0x68
2 2.07 s 2.07 s 1.03 s rwlock:R mm_update_next_owner+0x50
2 2.07 s 2.07 s 1.03 s rwlock:W do_exit+0x338
1 41.56 ms 41.56 ms 41.56 ms mutex cgroup_kn_lock_live+0x58
2 36.12 us 18.83 us 18.06 us rwlock:W do_exit+0x338
Signed-off-by: Anne Macedo <retpolanne@posteo.net> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240319143629.3422590-1-retpolanne@posteo.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Updates various descriptions and removes the event
UNC_IIO_NUM_REQ_FROM_CPU.IRP.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This change didn't increase the version number from v58.
Updates various descriptions.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Various description updates. Adds the event
OFFCORE_RESPONSE.ALL_READS.L3_HIT.HIT_OTHER_CORE_FWD.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Various description updates. Adds topdown events
TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P,
TOPDOWN_FE_BOUND.ALL_P and TOPDOWN_RETIRING.ALL_P.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Various description updates. Adds uncore events
UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL,
UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE,
UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE,
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL,
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_LOCAL,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_REMOTE,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_LOCAL,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_REMOTE,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_LOCAL,
UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_REMOTE and removes core events
AMX_OPS_RETIRED.BF16 and AMX_OPS_RETIRED.INT8.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Various description updates. Adds topdown, offcore and uncore events
OCR.DEMAND_DATA_RD.L3_HIT, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD,
OCR.DEMAND_RFO.L3_HIT, OCR.DEMAND_DATA_RD.L3_MISS,
OCR.DEMAND_RFO.L3_MISS, OCR.DEMAND_DATA_RD.ANY_RESPONSE,
OCR.DEMAND_DATA_RD.DRAM, OCR.DEMAND_RFO.ANY_RESPONSE,
OCR.DEMAND_RFO.DRAM, TOPDOWN_BAD_SPECULATION.ALL_P,
TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P,
TOPDOWN_RETIRING.ALL_P, UNC_ARB_DAT_OCCUPANCY.RD and
UNC_HAC_ARB_COH_TRK_REQUESTS.ALL.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Various encoding and description updates. Adds the events
CPU_CLK_UNHALTED.CORE, CPU_CLK_UNHALTED.CORE_P,
CPU_CLK_UNHALTED.REF_TSC_P, CPU_CLK_UNHALTED.THREAD,
MISC_RETIRED.LBR_INSERTS, TOPDOWN_BAD_SPECULATION.ALL_P,
TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P,
TOPDOWN_RETIRING.ALL_P.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes spelling and descriptions. Adds the uncore events
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL and
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, while removing
UNC_IIO_NUM_REQ_FROM_CPU.IRP.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes spelling and descriptions. Adds topdown events and uncore cache
UNC_CHA_TOR_OCCUPANCY.IA_HIT_DRD_OPT,
UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT,
UNC_CHA_TOR_OCCUPANCY.IA_DRD_OPT.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes spelling and descriptions. Adds cache miss latency events
UNC_CHA_TOR_(INSERTS|OCCUPANCY).IO_(PCIRDCUR|ITOM|ITOMCACHENEAR)_(LOCAL|REMOTE).
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 21 Mar 2024 14:13:30 +0000 (11:13 -0300)]
perf probe: Add missing libgen.h header needed for using basename()
This prototype is obtained indirectly, by luck, from some other header
in probe-event.c in most systems, but recently exploded on alpine:edge:
8 13.39 alpine:edge : FAIL gcc version 13.2.1 20240309 (Alpine 13.2.1_git20240309)
util/probe-event.c: In function 'convert_exec_to_group':
util/probe-event.c:225:16: error: implicit declaration of function 'basename' [-Werror=implicit-function-declaration]
225 | ptr1 = basename(exec_copy);
| ^~~~~~~~
util/probe-event.c:225:14: error: assignment to 'char *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion]
225 | ptr1 = basename(exec_copy);
| ^
cc1: all warnings being treated as errors
make[3]: *** [/git/perf-6.8.0/tools/build/Makefile.build:158: util] Error 2
Fix it by adding the libgen.h header where basename() is prototyped.
Fixes: fb7345bbf7fad9bf ("perf probe: Support basic dwarf-based operations on uprobe events") Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There were needless two entries, one for 'newfstatat' and another for
'fstatat', keep just one and pretty print its 'flags' argument using the
fs_at_flags scnprintf that is also used by other FS syscalls such as
'stat', now:
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 20 Mar 2024 19:15:20 +0000 (16:15 -0300)]
perf trace: Beautify the 'flags' arg of unlinkat
Reusing the fs_at_flags array done for the 'stat' syscall.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fsaccessat and fsaccessat2 now have beautifiers for its arguments.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 19 Mar 2024 17:49:32 +0000 (14:49 -0300)]
perf beauty: Introduce scrape script for the 'statx' syscall 'mask' argument
It was using the first variation on producing a string representation
for a binary flag, one that used the system's stat.h and preprocessor
tricks that had to be updated everytime a new flag was introduced.
Use the more recent scrape script + strarray +
strarray__scnprintf_flags() combo.
Now we need a copy of uapi/linux/stat.h from tools/include/ in the
scrape only directory tools/perf/trace/beauty/include.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 19 Mar 2024 17:49:32 +0000 (14:49 -0300)]
perf beauty: Introduce scrape script for various fs syscalls 'flags' arguments
It was using the first variation on producing a string representation
for a binary flag, one that used the system's fcntl.h and preprocessor
tricks that had to be updated everytime a new flag was introduced.
Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.
Now we need a copy of uapi/linux/fcntl.h from tools/include/ in the
scrape only directory tools/perf/trace/beauty/include and will use that
fs_at_flags array for other fs syscalls.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 17:47:11 +0000 (09:47 -0800)]
perf tests: Run tests in parallel by default
Switch from running tests sequentially to running in parallel by
default. Change the opt-in '-p' or '--parallel' flag to '-S' or
'--sequential'.
On an 8 core tigerlake an address sanitizer run time changes from:
326.54user 622.73system 6:59.91elapsed 226%CPU
to:
973.02user 583.98system 3:01.17elapsed 859%CPU
So over twice as fast, saving 4 minutes.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301174711.2646944-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 20:13:06 +0000 (12:13 -0800)]
perf help: Lower levenshtein penality for deleting character
The levenshtein penalty for deleting a character was far higher than
subsituting or inserting a character. Lower the penalty to match that
of inserting a character.
Before:
$ perf recccord
perf: 'recccord' is not a perf-command. See 'perf --help'.
$
After:
$ perf recccord
perf: 'recccord' is not a perf-command. See 'perf --help'.
Did you mean this?
record
$
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 20:13:05 +0000 (12:13 -0800)]
perf tools: Suggest inbuilt commands for unknown command
The existing unknown command code looks for perf scripts like
perf-archive.sh and perf-iostat.sh, however, inbuilt commands aren't
suggested. Add the inbuilt commands so they may be suggested too.
Before:
$ perf reccord
perf: 'reccord' is not a perf-command. See 'perf --help'.
$
After:
$ perf reccord
perf: 'reccord' is not a perf-command. See 'perf --help'.
Did you mean this?
record
$
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-1-irogers@google.com
[ Added some fixes from Ian to problems I noticed while testing ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 07:46:39 +0000 (23:46 -0800)]
perf test: Read child test 10 times a second rather than 1
Make the perf test output smoother by timing out the poll of the child
process after 100ms rather than 1s.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 07:46:38 +0000 (23:46 -0800)]
perf test: Use a single fd for the child process out/err
Switch from dumping err then out, to a single file descriptor for both
of them. This allows the err and output to be correctly interleaved in
verbose output.
Fixes: b482f5f8e0168f1e ("perf tests: Add option to run tests in parallel") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 07:46:37 +0000 (23:46 -0800)]
perf test: Stat output per thread of just the parent process
Per-thread mode requires either system-wide (-a), a pid (-p) or a tid
(-t).
The stat output tests were using system-wide mode but this is racy when
threads are starting and exiting - something that happens a lot when
running the tests in parallel (perf test -p).
Avoid the race conditions by using pid mode with the pid of the parent
process.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 1 Mar 2024 07:46:36 +0000 (23:46 -0800)]
perf record: Delete session after stopping sideband thread
The session has a header in it which contains a perf env with
bpf_progs. The bpf_progs are accessed by the sideband thread and so
the sideband thread must be stopped before the session is deleted, to
avoid a use after free. This error was detected by AddressSanitizer
in the following:
==2054673==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000161e00 at pc 0x55769289de54 bp 0x7f9df36d4ab0 sp 0x7f9df36d4aa8
READ of size 8 at 0x61d000161e00 thread T1
#0 0x55769289de53 in __perf_env__insert_bpf_prog_info util/env.c:42
#1 0x55769289dbb1 in perf_env__insert_bpf_prog_info util/env.c:29
#2 0x557692bbae29 in perf_env__add_bpf_info util/bpf-event.c:483
#3 0x557692bbb01a in bpf_event__sb_cb util/bpf-event.c:512
#4 0x5576928b75f4 in perf_evlist__poll_thread util/sideband_evlist.c:68
#5 0x7f9df96a63eb in start_thread nptl/pthread_create.c:444
#6 0x7f9df9726a4b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
0x61d000161e00 is located 384 bytes inside of 2136-byte region [0x61d000161c80,0x61d0001624d8)
freed by thread T0 here:
#0 0x7f9dfa6d7288 in __interceptor_free libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x557692978d50 in perf_session__delete util/session.c:319
#2 0x557692673959 in __cmd_record tools/perf/builtin-record.c:2884
#3 0x55769267a9f0 in cmd_record tools/perf/builtin-record.c:4259
#4 0x55769286710c in run_builtin tools/perf/perf.c:349
#5 0x557692867678 in handle_internal_command tools/perf/perf.c:402
#6 0x557692867a40 in run_argv tools/perf/perf.c:446
#7 0x557692867fae in main tools/perf/perf.c:562
#8 0x7f9df96456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Fixes: 657ee5531903339b ("perf evlist: Introduce side band thread") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 8 Mar 2024 00:19:15 +0000 (16:19 -0800)]
perf tools: Add/use PMU reverse lookup from config to name
Add perf_pmu__name_from_config that does a reverse lookup from a
config number to an alias name. The lookup is expensive as the config
is computed for every alias by filling in a perf_event_attr, but this
is only done when verbose output is enabled. The lookup also only
considers config, and not config1, config2 or config3.
Fix the python binding build by adding dummies for not strictly
needed perf_pmu__name_from_config() and perf_pmus__find_by_type().
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It feels more consistent to do this, rather than only prefer a PMU
name when a hard coded name isn't available.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
55 58.44 ubuntu:24.04 : FAIL gcc version 13.2.0 (Ubuntu 13.2.0-17ubuntu2)
util/pmu.c:1638:70: error: '_Static_assert' with no message is a C2x extension [-Werror,-Wc2x-extensions]
1638 | _Static_assert(ARRAY_SIZE(terms) == __PARSE_EVENTS__TERM_TYPE_NR - 6);
| ^
| , ""
1 error generated.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 8 Mar 2024 00:19:12 +0000 (16:19 -0800)]
perf list: Allow wordwrap to wrap on commas
A raw event encoding may be a block with terms separated by commas. If
wrapping such a string it would be useful to break at the commas, so
add this ability to wordwrap.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 8 Mar 2024 00:19:11 +0000 (16:19 -0800)]
perf pmu: Drop "default_core" from alias names
"default_core" is used by jevents.py for json events' PMU name when
none is specified. On x86 the "default_core" is typically the PMU
"cpu". When creating an alias see if the event's PMU name is
"default_core" in which case don't record it. This means in places
like "perf list" the PMU's name will be used in its place.
Before:
$ perf list --details
...
cache:
l1d.replacement
[Counts the number of cache lines replaced in L1 data cache]
default_core/event=0x51,period=0x186a3,umask=0x1/
...
After:
$ perf list --details
...
cache:
l1d.replacement
[Counts the number of cache lines replaced in L1 data cache. Unit: cpu]
cpu/event=0x51,period=0x186a3,umask=0x1/
...
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 8 Mar 2024 00:19:10 +0000 (16:19 -0800)]
perf list: Add tracepoint encoding to detailed output
The tracepoint id holds the config value and is probed in determining
what an event is. Add reading of the id so that we can display the
event encoding as:
$ perf list --details
...
alarmtimer:alarmtimer_cancel [Tracepoint event]
tracepoint/config=0x18c/
...
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 19 Mar 2024 17:49:32 +0000 (14:49 -0300)]
perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument
It was using the first variation on producing a string representation
for a binary flag, one that used the copy of uapi/linux/sched.h with
preprocessor tricks that had to be updated everytime a new flag was
introduced.
Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo.
Now we can move uapi/linux/sched.h from tools/include/, that is used for
building perf to the scrape only directory tools/perf/trace/beauty/include.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:15 +0000 (22:51 -0700)]
perf annotate-data: Do not retry for invalid types
In some cases, it was able to find a type or location info (for per-cpu
variable) but cannot match because of invalid offset or missing global
information. In those cases, it's meaningless to go to the outer scope
and retry because there will be no additional information.
Let's change the return type of find_matching_type() and bail out if it
returns -1 for the cases.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-24-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:13 +0000 (22:51 -0700)]
perf annotate-data: Add stack canary type
When the stack protector is enabled, compiler would generate code to
check stack overflow with a special value called 'stack carary' at
runtime. On x86_64, GCC hard-codes the stack canary as %gs:40.
While there's a definition of fixed_percpu_data in asm/processor.h,
it seems that the header is not included everywhere and many places
it cannot find the type info. As it's in the well-known location (at
%gs:40), let's add a pseudo stack canary type to handle it specially.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-22-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:12 +0000 (22:51 -0700)]
perf annotate-data: Handle ADD instructions
There are different patterns for percpu variable access using a constant
value added to the base.
 2aeb:  mov   -0x7da0f7e0(,%rax,8),%r14 # r14 = __per_cpu_offset[cpu]
 2af3:  mov   $0x34740,%rax # rax = address of runqueues
* 2afa:  add   %rax,%r14 # r14 = &per_cpu(runqueues, cpu)
 2bfd:  cmpl  $0x0,0x10(%r14) # cpu_rq(cpu)->has_blocked_load
 2b03:  je   0x2b36
At the first instruction, r14 has the __per_cpu_offset. And then rax
has an immediate value and then added to r14 to calculate the address of
a per-cpu variable. So it needs to track the immediate values and ADD
instructions.
Similar but a little different case is to use "this_cpu_off" instead of
"__per_cpu_offset" for the current CPU. This time the variable address
comes with PC-rel addressing.
Namhyung Kim [Tue, 19 Mar 2024 05:51:11 +0000 (22:51 -0700)]
perf annotate-data: Support general per-cpu access
This is to support per-cpu variable access often without a matching
DWARF entry. For some reason, I cannot find debug info of per-cpu
variables sometimes. They have more complex pattern to calculate the
address of per-cpu variables like below.
Let's assume the rax register has a number for a CPU at 2b7d. The next
instruction is to get the per-cpu offset' for that cpu. The offset
-0x7da0f7e0 is 0xffffffff825f0820 in u64 which is the address of the
'__per_cpu_offset' array in my system. So it'd get the actual offset
of that CPU's per-cpu region and save it to the rcx register.
Then, at 2b8c, accesses using rcx can be handled same as the global
variable access. To handle this case, it should check if the offset
of the instruction matches to the address of '__per_cpu_offset'.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-20-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:10 +0000 (22:51 -0700)]
perf annotate-data: Track instructions with a this-cpu variable
Like global variables, this per-cpu variables should be tracked
correctly. Factor our get_global_var_type() to handle both global
and per-cpu (for this cpu) variables in the same manner.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-19-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
And the current macro generates the instructions like below:
mov %gs:0x32940, %rcx
So the %gs segment register points to the beginning of the per-cpu
region of this cpu and it points the variable with a constant.
Let's update the instruction location info to have a segment register
and handle %gs in kernel to look up a global variable. Pretend it as
a global variable by changing the register number to DWARF_REG_PC.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:08 +0000 (22:51 -0700)]
perf annotate: Parse x86 segment register location
Add a segment field in the struct annotated_insn_loc and save it for the
segment based addressing like %gs:0x28. For simplicity it now handles
%gs register only.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If it failed to find a variable for the location directly, it might be
due to a missing variable in the source code. For example, accessing
pointer variables in a chain can result in the case like below:
struct foo *foo = ...;
int i = foo->bar->baz;
The DWARF debug information is created for each variable so it'd have
one for 'foo'. But there's no variable for 'foo->bar' and then it
cannot know the type of 'bar' and 'baz'.
The above source code can be compiled to the follow x86 instructions:
Let's say 'foo' is located in the %rax and it has a pointer to struct
foo. But perf sample is captured in the second instruction and there
is no variable or type info for the %rcx.
It'd be great if compiler could generate debug info for %rcx, but we
should handle it on our side. So this patch implements the logic to
iterate instructions and update the type table for each location.
As it already collected a list of scopes including the target
instruction, we can use it to construct the type table smartly.
Image the target instruction has 5 scopes, each scope will have its own
variables and parameters. Then it can start with the innermost scope
(4). So it'd search the shortest path from the start of scope[4] to
the target address and build a list of basic blocks. Then it iterates
the basic blocks with the variables in the scope and update the table.
If it finds a type at the target instruction, then returns it.
Otherwise, it moves to the upper scope[3]. Now it'd search the shortest
path from the start of scope[3] to the start of scope[4]. Then connect
it to the existing basic block list. Then it'd iterate the blocks with
variables for both scopes. It can repeat this until it finds a type at
the target instruction or reaches to the top scope[0].
As the basic blocks contain the shortest path, it won't worry about
branches and can update the table simply.
The final check will be done by find_matching_type() in the next patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:05 +0000 (22:51 -0700)]
perf annotate-data: Handle call instructions
When updating instruction states, the call instruction should play a
role since it changes the register states. For simplicity, mark some
registers as caller-saved registers (should be arch-dependent), and
invalidate them all after a function call.
If the function returns something, the designated register (ret_reg)
will have the type info.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-14-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:04 +0000 (22:51 -0700)]
perf annotate-data: Handle global variable access
When updating the instruction states, it also needs to handle global
variable accesses. Same as it does for PC-relative addressing, it can
look up the type by address (if it's defined in the same file), or by
name after finding the symbol by address (for declarations).
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:02 +0000 (22:51 -0700)]
perf annotate-data: Add update_insn_state()
The update_insn_state() function is to update the type state table after
processing each instruction. For now, it handles MOV (on x86) insn
to transfer type info from the source location to the target.
The location can be a register or a stack slot. Check carefully when
memory reference happens and fetch the type correctly. It basically
ignores write to a memory since it doesn't change the type info. One
exception is writes to (new) stack slots for register spilling.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:01 +0000 (22:51 -0700)]
perf annotate-data: Maintain variable type info
As it collected basic block and variable information in each scope, it
now can build a state table to find matching variable at the location.
The struct type_state is to keep the type info saved in each register
and stack slot. The update_var_state() updates the table when it finds
variables in the current address. It expects die_collect_vars() filled
a list of variables with type info and starting address.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:51:00 +0000 (22:51 -0700)]
perf annotate-data: Add debug messages
Add a new debug option "type-profile" to enable the detailed info during
the type analysis especially for instruction tracking. You can use this
before the command name like 'report' or 'annotate'.
The find_data_type() needs many information to describe the location of
the data. Add the new 'struct data_loc_info' to pass those information at
once.
No functional changes intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:50:56 +0000 (22:50 -0700)]
perf dwarf-aux: Add die_find_func_rettype()
The die_find_func_rettype() is to find a debug entry for the given
function name and sets the type information of the return value. By
convention, it'd return the pointer to the type die (should be the
same as the given mem_die argument) if found, or NULL otherwise.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:50:55 +0000 (22:50 -0700)]
perf dwarf-aux: Handle type transfer for memory access
We want to track type states as instructions are executed. Each
instruction can access compound types like struct or union and load/
store its members to a different location.
The die_deref_ptr_type() is to find a type of memory access with a
pointer variable. If it points to a compound type like struct, the
target memory is a member in the struct. The access will happen with an
offset indicating which member it refers. Let's follow the DWARF info
to figure out the type of the pointer target.
For example, say we have the following code.
struct foo {
int a;
int b;
};
struct foo *p = malloc(sizeof(*p));
p->b = 0;
The last pointer access should produce x86 asm like below:
mov 0x0, 4(%rbx)
And we know %rbx register has a pointer to struct foo. Then offset 4
should return the debug info of member 'b'.
Also variables of compound types can be accessed directly without a
pointer. The die_get_member_type() is to handle a such case.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-4-namhyung@kernel.org
[ Check if die_get_real_type() returned NULL ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 19 Mar 2024 05:50:54 +0000 (22:50 -0700)]
perf dwarf-aux: Add die_collect_vars()
The die_collect_vars() is to find all variable information in the scope
including function parameters. The struct die_var_type is to save the
type of the variable with the location (reg and offset) as well as where
it's defined in the code (addr).
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 2 Feb 2024 23:40:54 +0000 (15:40 -0800)]
perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty
Most uses of what was perf_cpu_map__empty but is now
perf_cpu_map__has_any_cpu_or_is_empty want to do something with the
CPU map if it contains CPUs. Replace uses of
perf_cpu_map__has_any_cpu_or_is_empty with other helpers so that CPUs
within the map can be handled.
Ian Rogers [Fri, 2 Feb 2024 23:40:53 +0000 (15:40 -0800)]
perf intel-pt/intel-bts: Switch perf_cpu_map__has_any_cpu_or_is_empty use
Switch perf_cpu_map__has_any_cpu_or_is_empty() to
perf_cpu_map__is_any_cpu_or_is_empty() as a CPU map may contain CPUs as
well as the dummy event and perf_cpu_map__is_any_cpu_or_is_empty() is a
more correct alternative.
Ian Rogers [Fri, 2 Feb 2024 23:40:52 +0000 (15:40 -0800)]
perf arm-spe/cs-etm: Directly iterate CPU maps
Rather than iterate all CPUs and see if they are in CPU maps, directly
iterate the CPU map. Similarly make use of the intersect function
taking care for when "any" CPU is specified. Switch
perf_cpu_map__has_any_cpu_or_is_empty() to more appropriate
alternatives.
Ian Rogers [Fri, 2 Feb 2024 23:40:51 +0000 (15:40 -0800)]
libperf cpumap: Ensure empty cpumap is NULL from alloc
Potential corner cases could cause a cpumap to be allocated with size
0, but an empty cpumap should be represented as NULL. Add a path in
perf_cpu_map__alloc() to ensure this.
Ethan Adams [Thu, 14 Mar 2024 22:20:12 +0000 (15:20 -0700)]
perf build: Fix out of tree build related to installation of sysreg-defs
It seems that a previous modification to sysreg-defs, which corrected
emitting the header to the specified output directory, exposed missing
subdir, prefix variables.
This breaks out of tree builds of perf as the file is now built into the
output directory, but still tries to descend into output directory as a
subdir.
Fixes: a29ee6aea7030786 ("perf build: Ensure sysreg-defs Makefile respects output dir") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Ethan Adams <j.ethan.adams@gmail.com> Tested-by: Tycho Andersen <tycho@tycho.pizza> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240314222012.47193-1-j.ethan.adams@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 15 Mar 2024 07:13:34 +0000 (09:13 +0200)]
perf auxtrace: Fix multiple use of --itrace option
If the --itrace option is used more than once, the options are
combined, but "i" and "y" (sub-)options can be corrupted because
itrace_do_parse_synth_opts() incorrectly overwrites the period type and
period with default values.
For example, with:
--itrace=i0ns --itrace=e
The processing of "--itrace=e", resets the "i" period from 0 nanoseconds
to the default 100 microseconds.
Fix by performing the default setting of period type and period only if
"i" or "y" are present in the currently processed --itrace value.
Fixes: f6986c95af84ff2a ("perf session: Add instruction tracing options") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Tue, 12 Mar 2024 13:25:07 +0000 (13:25 +0000)]
perf docs arm_spe: Clarify more SPE requirements related to KPTI
The question of exactly when KPTI needs to be disabled comes up a lot
because it doesn't always need to be done. Add the relevant kernel
function and some examples that describe the behavior.
Also describe the interrupt requirement and that no error message will
be printed if this isn't met.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240312132508.423320-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 15 Mar 2024 19:18:14 +0000 (16:18 -0300)]
tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays most the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.
For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN
add them conditionally.
Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 15 Mar 2024 19:18:14 +0000 (16:18 -0300)]
tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays all the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.
Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 15 Mar 2024 19:06:01 +0000 (16:06 -0300)]
perf beauty: Use the system linux/fcntl.h instead of a copy from the kernel
Builds ok all the way back to these older distros:
1 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20) , clang version 16.0.6 (Red Hat 16.0.6-2.module_el8.9.0+3621+df7f7146) flex 2.6.1
3 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1 flex 2.6.4
15 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0 , Debian clang version 11.0.1-2~deb10u1 flex 2.6.4
32 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 15.0.7 flex 2.6.4
23 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-3) , clang version 13.0.1 (Fedora 13.0.1-1.fc35) flex 2.6.4
38 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 flex 2.6.4
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.
Arnaldo Carvalho de Melo [Thu, 14 Mar 2024 20:39:29 +0000 (17:39 -0300)]
perf beauty: Stop using the copy of uapi/linux/prctl.h
Use the system one, nothing used in that file isn't available in the
supported, active distros.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move arch/x86/include/asm/irq_vectors.h copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move uapi/sound/asound.h copy out of the directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move uapi/linux/mount.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/mount.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.
Arnaldo Carvalho de Melo [Tue, 12 Mar 2024 16:53:39 +0000 (13:53 -0300)]
perf beauty: Don't include uapi/linux/mount.h, use sys/mount.h instead
The tools/include/uapi/linux/mount.h file is mostly used for scrapping
defines into id->string tables, this is the only place were it is being
directly used, stop doing so.
Define MOUNT_ATTR_RELATIME and MOUNT_ATTR__ATIME if not available in the
system's headers.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:07:33 +0000 (17:07 -0300)]
perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
The only case where it was being used to build was in
tools/perf/trace/beauty/sync_file_range.c, because some older systems
doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the
system's linux/fs.h header instead, defining it if not available.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/fs.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.
Arnaldo Carvalho de Melo [Mon, 11 Mar 2024 20:49:01 +0000 (17:49 -0300)]
perf beauty: Fix dependency of tables using uapi/linux/mount.h
Several such tables were depending on uapi/linux/fs.h, cut and paste
error when they were introduced, fix it.
Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Ze9vjxv42PN_QGZv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This will not get reused by any other syscall as nanosleep is the only
one to have as its first argument a 'struct timespec" pointer argument
passed from userspace to the kernel:
BTF based pretty printing will simplify all this, but then lets just get
the low hanging fruits first.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zbq72dJRpOlfRWnf@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Wed, 20 Mar 2024 23:42:47 +0000 (16:42 -0700)]
Merge tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
- add support for durable file handles (an important data integrity
feature)
- fixes for potential out of bounds issues
- fix possible null dereference in close
- getattr fixes
- trivial typo fix and minor cleanup
* tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: remove module version
ksmbd: fix potencial out-of-bounds when buffer offset is invalid
ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
ksmbd: Fix spelling mistake "connction" -> "connection"
ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close
ksmbd: add support for durable handles v1/v2
ksmbd: mark SMB2_SESSION_EXPIRED to session when destroying previous session
ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
ksmbd: replace generic_fillattr with vfs_getattr
Linus Torvalds [Wed, 20 Mar 2024 23:37:07 +0000 (16:37 -0700)]
Merge tag 'trace-tools-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace tool updates from Steven Rostedt:
"Tracing:
- Update makefiles for latency-collector and RTLA, using tools/build/
makefiles like perf does, inheriting its benefits. For example,
having a proper way to handle library dependencies.
- The timerlat tracer has an interface for any tool to use. rtla
timerlat tool uses this interface dispatching its own threads as
workload. But, rtla timerlat could also be used for any other
process. So, add 'rtla timerlat -U' option, allowing the timerlat
tool to measure the latency of any task using the timerlat tracer
interface.
Verification:
- Update makefiles for verification/rv, using tools/build/ makefiles
like perf does, inheriting its benefits. For example, having a
proper way to handle dependencies"
* tag 'trace-tools-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rtla: Add -U/--user-load option to timerlat
tools/verification: Use tools/build makefiles on rv
tools/rtla: Use tools/build makefiles to build rtla
tools/tracing: Use tools/build makefiles on latency-collector
Daniel Bristot de Oliveira [Tue, 6 Feb 2024 14:32:06 +0000 (15:32 +0100)]
tools/rtla: Add -U/--user-load option to timerlat
The timerlat tracer provides an interface for any application to wait
for the timerlat's periodic wakeup. Currently, rtla timerlat uses it
to dispatch its user-space workload (-u option).
But as the tracer interface is generic, rtla timerlat can also be used
to monitor any workload that uses it. For example, a user might
place their own workload to wait on the tracer interface, and
monitor the results with rtla timerlat.
Add the -U option to rtla timerlat top and hist. With this option, rtla
timerlat will not dispatch its workload but only setting up the
system, waiting for a user to dispatch its workload.
The sample code in this patch is an example of python application
that loops in the timerlat tracer fd.
To use it, dispatch:
# rtla timerlat -U
In a terminal, then run the python program on another terminal,
specifying the CPU to run it. For example, setting on CPU 1:
#./timerlat_load.py 1
Then rtla timerlat will start printing the statistics of the
./timerlat_load.py app.
An interesting point is that the "Ret user Timer Latency" value
is the overall response time of the load. The sample load does
a memory copy to exemplify that.
The stop tracing options on rtla timerlat works in this setup
as well, including auto analysis.
Daniel Bristot de Oliveira [Fri, 15 Mar 2024 16:44:04 +0000 (17:44 +0100)]
tools/rtla: Use tools/build makefiles to build rtla
Use tools/build/ makefiles to build rtla, inheriting the benefits of
it. For example, having a proper way to handle dependencies.
rtla is built using perf infra-structure when building inside the
kernel tree.
At this point, rtla diverges from perf in two points: Documentation
and tarball generation/build.
At the documentation level, rtla is one step ahead, placing the
documentation at Documentation/tools/rtla/, using the same build
tools as kernel documentation. The idea is to move perf
documentation to the same scheme and then share the same makefiles.
rtla has a tarball target that the (old) RHEL8 uses. The tarball was
kept using a simple standalone makefile for compatibility. The
standalone makefile shares most of the code, e.g., flags, with
regular buildings.
The tarball method was set as deprecated. If necessary, we can make
a rtla tarball like perf, which includes the entire tools/build.
But this would also require changes in the user side (the directory
structure changes, and probably the deps to build the package).
Inspired on perf and objtool.
Link: https://lkml.kernel.org/r/57563abf2715d22515c0c54a87cff3849eca5d52.1710519524.git.bristot@kernel.org Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Daniel Bristot de Oliveira [Fri, 15 Mar 2024 16:44:03 +0000 (17:44 +0100)]
tools/tracing: Use tools/build makefiles on latency-collector
Use tools/build/ makefiles to build latency-collector, inheriting
the benefits of it. For example: Before this patch, a missing
tracefs/traceevents headers will result in fail like this:
~/linux/tools/tracing/latency $ make
cc -Wall -Wextra -g -O2 -o latency-collector latency-collector.c -lpthread
latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory
26 | #include <tracefs.h>
| ^~~~~~~~~~~
compilation terminated.
make: *** [Makefile:14: latency-collector] Error 1
Which is not that helpful. After this change it reports:
~/linux/tools/tracing/latency# make
Auto-detecting system features:
... libtraceevent: [ OFF ]
... libtracefs: [ OFF ]
libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
Makefile.config:29: *** Please, check the errors above.. Stop.
This type of output is common across other tools in tools/ like perf
and objtool.
Link: https://lkml.kernel.org/r/872420b0880b11304e4ba144a0086c6478c5b469.1710519524.git.bristot@kernel.org Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Linus Torvalds [Wed, 20 Mar 2024 00:27:25 +0000 (17:27 -0700)]
Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Assorted bugfixes.
Most are fixes for simple assertion pops; the most significant fix is
for a deadlock in recovery when we have to rewrite large numbers of
btree nodes to fix errors. This was incorrectly running out of the
same workqueue as the core interior btree update path - we now give it
its own single threaded workqueue.
This was visible to users as "bch2_btree_update_start(): error:
BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging"
* tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix lost wakeup on journal shutdown
bcachefs; Fix deadlock in bch2_btree_update_start()
bcachefs: ratelimit errors from async_btree_node_rewrite
bcachefs: Run check_topology() first
bcachefs: Improve bch2_fatal_error()
bcachefs: Fix lost transaction restart error
bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
bcachefs: fix for building in userspace
bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery
bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()
bcachefs: Improve sysfs internal/btree_updates
bcachefs: Split out btree_node_rewrite_worker
bcachefs: Fix locking in bch2_alloc_write_key()
bcachefs: Avoid extent entry type assertions in .invalid()
bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested
bcachefs: Fix check_key_has_snapshot() call
bcachefs: Change "accounting overran journal reservation" to a warning
Linus Torvalds [Tue, 19 Mar 2024 18:57:26 +0000 (11:57 -0700)]
Merge tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more ARM SoC updates from Arnd Bergmann:
"These are changes that for some reason ended up not making it into the
first four branches but that should still make it into 6.9:
- A rework of the omap clock support that touches both drivers and
device tree files
- The reset controller branch changes that had a dependency on late
bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the
drivers branch
- The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree
changes that got delayed and needed some extra time in linux-next
for wider testing"
* tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
soc: fsl: dpio: fix kcalloc() argument order
bus: ts-nbus: Improve error reporting
bus: ts-nbus: Convert to atomic pwm API
riscv: dts: starfive: jh7110: Add camera subsystem nodes
ARM: bcm: stop selecing CONFIG_TICK_ONESHOT
ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift
ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift
clk: ti: Improve clksel clock bit parsing for reg property
clk: ti: Handle possible address in the node name
dt-bindings: pwm: opencores: Add compatible for StarFive JH8100
dt-bindings: riscv: cpus: reg matches hart ID
reset: Instantiate reset GPIO controller for shared reset-gpios
reset: gpio: Add GPIO-based reset controller
cpufreq: do not open-code of_phandle_args_equal()
of: Add of_phandle_args_equal() helper
reset: simple: add support for Sophgo SG2042
dt-bindings: reset: sophgo: support SG2042
riscv: dts: microchip: add specific compatible for mpfs pdma
riscv: dts: microchip: add missing CAN bus clocks
ARM: brcmstb: Add debug UART entry for 74165
...