Ian Rogers [Wed, 14 Feb 2024 01:18:12 +0000 (17:18 -0800)]
perf vendor events intel: Update ivytown TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-24-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:11 +0000 (17:18 -0800)]
perf vendor events intel: Update ivybridge TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-23-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:10 +0000 (17:18 -0800)]
perf vendor events intel: Update icelakex TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-22-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:09 +0000 (17:18 -0800)]
perf vendor events intel: Update icelake TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-21-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:08 +0000 (17:18 -0800)]
perf vendor events intel: Update haswellx TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-20-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:07 +0000 (17:18 -0800)]
perf vendor events intel: Update haswell TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-19-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:06 +0000 (17:18 -0800)]
perf vendor events intel: Update cascadelakex TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-18-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:05 +0000 (17:18 -0800)]
perf vendor events intel: Update broadwellx TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-17-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:04 +0000 (17:18 -0800)]
perf vendor events intel: Update broadwellde TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-16-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:03 +0000 (17:18 -0800)]
perf vendor events intel: Update broadwell TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-15-irogers@google.com
Ian Rogers [Wed, 14 Feb 2024 01:18:02 +0000 (17:18 -0800)]
perf vendor events intel: Update alderlake TMA metrics to 4.7
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- tma_c01_wait and tma_c02_wait metrics measure power-performance
states.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-14-irogers@google.com
Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS,
deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and
UNC_ARB_IFA_OCCUPANCY.ALL.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-13-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-12-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-11-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-10-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-9-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-8-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: linux-perf-users@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-7-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-6-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-5-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-4-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-3-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-2-irogers@google.com
pahole manages to find it from /sys/kernel/btf/vmlinux, that is
generated from the kernel types.
With this linux/bpf.h doesn't need to be included, as its already in the
minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we
need comes when generating a vmlinux.h file from BTF info, i.e. when
using GEN_VMLINUX_H=1, as noticed by Namyung in a build break before
removing linux/bpf.h.
Veronika Molnarova [Thu, 15 Feb 2024 11:02:29 +0000 (12:02 +0100)]
perf testsuite: Add common output checking helpers
As a form of validation, it is a common practice to check the outputs
of commands whether they contain expected patterns or match a certain
regex.
Add helpers for verifying that all regexes are found in the output, that
all lines match any pattern from a set and that a certain expression is
not present in the output.
In verbose mode these helpers log mismatches for easier failure
investigation.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:28 +0000 (12:02 +0100)]
perf testsuite: Add test case for perf probe
Add new perf probe test case that acts as an entry element in perf test
list. Runs multiple subtests from directory "base_probe", which will be
added in incomming patches and can be expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:26 +0000 (12:02 +0100)]
perf testsuite: Add common setting for shell tests
Add settings defining sample commands later shared by shell tests. This
adds the possibility to globally adjust the default values for the whole
testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-3-mpetlan@redhat.com
Adrian Hunter [Wed, 31 Jan 2024 19:24:16 +0000 (21:24 +0200)]
perf test: Enable Symbols test to work with a current module dso
The test needs a struct machine and creates one for the current host,
but a side-effect is that struct machine has set up kernel maps
including module maps.
If the 'Symbols' test --dso option specifies a current kernel module,
it will already be present as a kernel dso, and a map with kmaps needs
to be used otherwise there will be a segfault - see below.
For that case, find the existing map and use that. In that case also,
the dso is split by section into multiple dsos, so test those dsos
also. That in turn, shows up that those dsos have not had overlapping
symbols removed, so the test fails.
Leo Yan [Wed, 14 Feb 2024 11:39:47 +0000 (19:39 +0800)]
perf build: Cleanup perf register configuration
The target is to allow the tool to always enable the perf register
feature for native parsing and cross parsing, and current code doesn't
depend on the macro 'HAVE_PERF_REGS_SUPPORT'.
This patch remove the variable 'NO_PERF_REGS' and the defined macro
'HAVE_PERF_REGS_SUPPORT' from the Makefile.
Signed-off-by: Leo Yan <leo.yan@linux.dev> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214113947.240957-5-leo.yan@linux.dev
Leo Yan [Wed, 14 Feb 2024 11:39:46 +0000 (19:39 +0800)]
perf parse-regs: Introduce a weak function arch__sample_reg_masks()
Every architecture can provide a register list for sampling. If an
architecture doesn't support register sampling, it won't define the data
structure 'sample_reg_masks'. Consequently, any code using this
structure must be protected by the macro 'HAVE_PERF_REGS_SUPPORT'.
This patch defines a weak function, arch__sample_reg_masks(), which will
be replaced by an architecture-defined function for returning the
architecture's register list. With this refactoring, the function always
exists, the condition checking for 'HAVE_PERF_REGS_SUPPORT' is not
needed anymore, so remove it.
Signed-off-by: Leo Yan <leo.yan@linux.dev> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214113947.240957-4-leo.yan@linux.dev
Currently, the macro HAVE_PERF_REGS_SUPPORT is used as a switch to turn
on or turn off the code of perf registers. If any architecture cannot
support perf register, it disables the perf register parsing, for both
the native parsing and cross parsing for other architectures.
To support both the native parsing and cross parsing, the tool should
always build the perf regs functions. Thus, this patch removes
HAVE_PERF_REGS_SUPPORT from the perf regs files.
Signed-off-by: Leo Yan <leo.yan@linux.dev> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214113947.240957-3-leo.yan@linux.dev
Leo Yan [Wed, 14 Feb 2024 11:39:44 +0000 (19:39 +0800)]
perf build: Remove unused CONFIG_PERF_REGS
CONFIG_PERF_REGS is not used, remove it.
Signed-off-by: Leo Yan <leo.yan@linux.dev> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214113947.240957-2-leo.yan@linux.dev
Ian Rogers [Fri, 9 Feb 2024 20:49:47 +0000 (12:49 -0800)]
perf metric: Don't remove scale from counts
Counts were switched from the scaled saved value form to the
aggregated count to avoid double accounting. When this happened the
removing of scaling for a count should have been removed, however, it
wasn't and this wasn't observed as it normally doesn't matter because
a counter's scale is 1. A problem was observed with RAPL events that
are scaled.
Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-5-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:46 +0000 (12:49 -0800)]
perf stat: Avoid metric-only segv
Cycles is recognized as part of a hard coded metric in stat-shadow.c,
it may call print_metric_only with a NULL fmt string leading to a
segfault. Handle the NULL fmt explicitly.
Fixes: 088519f318be ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-4-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:45 +0000 (12:49 -0800)]
perf expr: Fix "has_event" function for metric style events
Events in metrics cannot use '/' as a separator, it would be
recognized as a divide, so they use '@'. The '@' is recognized in the
metricgroups code and changed to '/', do the same in the has_event
function so that the parsing is only tried without the @s.
Fixes: 4a4a9bf9075f ("perf expr: Add has_event function") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-3-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:44 +0000 (12:49 -0800)]
perf expr: Allow NaN to be a valid number
Currently only floating point numbers can be parsed, add a special
case for NaN.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-2-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:46 +0000 (19:17 -0800)]
perf maps: Locking tidy up of nr_maps
After this change maps__nr_maps is only used by tests, existing users
are migrated to maps__empty. Compute maps__empty under the read lock.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-7-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:45 +0000 (19:17 -0800)]
perf maps: Hide maps internals
Move the struct into the C file. Add maps__equal to work around
exposing the struct for reference count checking. Add accessors for
the unwind_libunwind_ops. Move maps_list_node to its only use in
symbol.c.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:44 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find_next_entry
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-5-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:43 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find_by_name
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this. Also fix some reference counted pointer comparisons.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-4-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:42 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-3-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:41 +0000 (19:17 -0800)]
perf maps: Switch from rbtree to lazily sorted array for addresses
Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much as the array sorting nodes by name. 1 pointer is
needed per node, but to avoid excessive resizing the backing array may
be twice the number of used elements. Meaning the memory overhead is
roughly half that of the rbtree. For a perf record with
"--no-bpf-event -g -a" of true, the memory overhead of perf inject is
reduce fom 3.3MB to 3MB, so 10% or 300KB is saved.
Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property. O(log n) rb-tree
complexity is switched to O(1).
Remove slides the array, so O(log n) rb-tree complexity is degraded to
O(n).
A find may need to sort the array using qsort which is O(n*log n), but
in general the maps should be sorted and so average performance should
be O(log n) as with the rbtree.
An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.
Overall it is expected the performance after the change should be
comparable to before, but with half of the memory consumed.
To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps_merge_in splits the given mapping
inserting remaining gaps. maps__fixup_overlap_and_insert splits the
existing mappings, then adds the incoming mapping. By adding the new
mapping first, then re-inserting the existing mappings the splitting
behavior matches.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
Ian Rogers [Thu, 1 Feb 2024 00:15:02 +0000 (16:15 -0800)]
perf srcline: Add missed addr2line closes
The child_process for addr2line sets in and out to -1 so that pipes
get created. It is the caller's responsibility to close the pipes,
finish_command doesn't do it. Add the missed closes.
Fixes: b3801e791231 ("perf srcline: Simplify addr2line subprocess") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240201001504.1348511-8-irogers@google.com
Yicong Yang [Thu, 8 Feb 2024 02:40:26 +0000 (10:40 +0800)]
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Currently parsing and building cluster
topology have been supported since [1].
perf stat has already supported aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
On a legacy system without cluster or cluster support, the output will
be look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not
the same with the cluster although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: james.clark@arm.com Cc: 21cnbao@gmail.com Cc: prime.zeng@hisilicon.com Cc: Jonathan.Cameron@huawei.com Cc: fanghao11@huawei.com Cc: linuxarm@huawei.com Cc: tim.c.chen@intel.com Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
Namhyung Kim [Thu, 8 Feb 2024 18:10:25 +0000 (10:10 -0800)]
perf tools: Remove misleading comments on map functions
When it converts sample IP to or from objdump-capable one, there's a
comment saying that kernel modules have DSO_SPACE__USER. But commit 02213cec64bb ("perf maps: Mark module DSOs with kernel type") changed
it and makes the comment confusing. Let's get rid of it.
Yang Jihong [Tue, 6 Feb 2024 08:32:27 +0000 (08:32 +0000)]
perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}()
The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.209 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]
# perf sched lat
-------------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
-------------------------------------------------------------------------------------------------------------------------------------------
sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
<SNIP>
ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
-----------------------------------------------------------------------------------------------------------------
TOTAL: | 3180.750 ms | 41368 |
---------------------------------------------------
Yang Jihong [Tue, 6 Feb 2024 08:32:26 +0000 (08:32 +0000)]
perf sched: Move curr_thread initialization to perf_sched__map()
The curr_thread is used only for the 'perf sched map'. Put initialization
in perf_sched__map() to reduce unnecessary actions in other commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.197 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 15.526 MB perf.data (140095 samples) ]
Yang Jihong [Tue, 6 Feb 2024 08:32:24 +0000 (08:32 +0000)]
perf sched: Move start_work_mutex and work_done_wait_mutex initialization to perf_sched__replay()
The start_work_mutex and work_done_wait_mutex are used only for the
'perf sched replay'. Put their initialization in perf_sched__replay () to
reduce unnecessary actions in other commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.197 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ]
Yicong Yang [Wed, 7 Feb 2024 09:12:22 +0000 (17:12 +0800)]
perf test: Skip metric w/o event name on arm64 in stat STD output linter
stat+std_output.sh test fails on my arm64 machine:
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args Unknown event name in TopDownL1 # 0.18 retiring
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread Unknown event name in tmux: server-1114960 # 0.41 frontend_bound
When no args specified `perf stat` will add TopdownL1 metric group
and the output will be like:
[root@localhost shell]# perf stat -- stress-ng --vm 1 --timeout 1
stress-ng: info: [3351733] setting to a 1 second run per stressor
stress-ng: info: [3351733] dispatching hogs: 1 vm
stress-ng: info: [3351733] successful run completed in 1.02s
Performance counter stats for 'stress-ng --vm 1 --timeout 1':
Metrics in group TopDownL1 don't have event name on arm64 but are not
listed in the $skip_metric list which they should be listed. Add them
to the skip list as what does for x86 platforms in [1].
[1] commit 4d60e83dfcee ("perf test: Skip metrics w/o event name in stat STD output linter")
Currently perf does not record module section addresses except for
the .text section. In general that means perf cannot get module section
mappings correct (except for .text) when loading symbols from a kernel
module file. (Note using --kcore does not have this issue)
Improve that situation slightly by identifying executable sections that
use the same mapping as the .text section. That happens when an
executable section comes directly after the .text section, both in memory
and on file, something that can be determined by following the same layout
rules used by the kernel, refer kernel layout_sections(). Note whether
that happens is somewhat arbitrary, so this is not a final solution.
Namhyung Kim [Fri, 12 Jan 2024 23:13:40 +0000 (15:13 -0800)]
perf record: Display data size on pipe mode
Currently pipe mode doesn't set the file size and it results in a
misleading message of 0 data size at the end. Although it might miss
some accounting for pipe header or more, just displaying the data size
would reduce the possible confusion.
Before:
$ perf record -o- perf test -w noploop | perf report -i- -q --percent-limit=1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ] <====== (here)
99.58% perf perf [.] noploop
After:
$ perf record -o- perf test -w noploop | perf report -i- -q --percent-limit=1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.229 MB - ]
99.46% perf perf [.] noploop
Kan Liang [Mon, 5 Feb 2024 14:58:19 +0000 (06:58 -0800)]
perf script: Print source line for each jump in brstackinsn
With the srcline option, the perf script only prints a source line at
the beginning of a sample with call/ret from functions, but not for
each jump in brstackinsn. It's useful to print a source line for each
jump in brstackinsn when the end user analyze the full assembler
sequences of branch sequences for the sample.
The srccode option can also be used to locate the source code line.
However, it's printed almost for every line and makes the output less
readable.
Ian Rogers [Tue, 6 Feb 2024 23:59:02 +0000 (15:59 -0800)]
perf kvm powerpc: Fix build
Updates to struct parse_events_error needed to be carried through to
PowerPC specific event parsing.
Fixes: fd7b8e8fb20f ("perf parse-events: Print all errors") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206235902.2917395-1-irogers@google.com
Ben Gainey [Tue, 23 Jan 2024 10:31:37 +0000 (10:31 +0000)]
tools: perf: Expose sample ID / stream ID to python scripts
perf script exposes the evsel_name to python scripts as part of the data
passed to the sample or tracepoint handler function, and it passes the id and
stream_id to the throttled/unthrottled handler functions. This makes matching
throttle events and samples difficult.
To make this possible, this change exposes the sample id and stream_id values
to the script.
Signed-off-by: Ben Gainey <ben.gainey@arm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: will@kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240123103137.1890779-2-ben.gainey@arm.com
Arnaldo Carvalho de Melo [Fri, 2 Feb 2024 14:32:20 +0000 (11:32 -0300)]
perf bpf: Clean up the generated/copied vmlinux.h
When building perf with BPF skels we either copy the minimalistic
tools/perf/util/bpf_skel/vmlinux/vmlinux.h or use bpftool to generate a
vmlinux from BTF, storing the result in $(SKEL_OUT)/vmlinux.h.
We need to remove that when doing a 'make -C tools/perf clean', fix it.
Fixes: b7a2d774c9c5a9a3 ("perf build: Add ability to build with a generated vmlinux.h") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Zbz89KK5wHfZ82jv@x1
Ian Rogers [Wed, 31 Jan 2024 20:14:29 +0000 (12:14 -0800)]
perf jevents: Drop or simplify small integer values
Prior to this patch '0' would be dropped as the config values default
to 0. Some json values are hex and the string '0' wouldn't match '0x0'
as zero. Add a more robust is_zero test to drop these event terms.
When encoding numbers as hex, if the number is between 0 and 9
inclusive then don't add a 0x prefix.
Update test expectations for these changes.
On x86 this reduces the event/metric C string by 58,411 bytes.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131201429.792138-1-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:40 +0000 (05:49 -0800)]
perf parse-events: Print all errors
Prior to this patch the first and the last error encountered during
parsing are printed. To see other errors verbose needs
enabling. Unfortunately this can drop useful errors, in particular on
terms. This patch changes the errors so that instead of the first and
last all errors are recorded and printed, the underlying data
structure is changed to a list.
Before:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
Initial error:
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
After:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
event syntax error: 'slots/edge=2/'
\___ value too big for format (edge), maximum is 1
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: tchen168@asu.edu Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131134940.593788-3-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:39 +0000 (05:49 -0800)]
perf parse-events: Improve error location of terms cloned from an event
A PMU event/alias will have a set of format terms that replace it when
an event is parsed. The location of the terms is their position when
parsed for the event/alias either from sysfs or json. This location is
of little use when an event fails to parse as the error will be given
in terms of the location in the string of events parsed not the json
or sysfs string. Fix this by making the cloned terms location that of
the event/alias.
If a cloned term from an event/alias is invalid the bad format is hard
to determine from the error string. Add the name of the bad format
into the error string.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: tchen168@asu.edu Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131134940.593788-2-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:38 +0000 (05:49 -0800)]
perf tsc: Add missing newlines to debug statements
It is assumed that debug statements always print a newline, fix two
missing ones.
Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: tchen168@asu.edu Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131134940.593788-1-irogers@google.com
Weilin Wang [Tue, 30 Jan 2024 18:09:07 +0000 (10:09 -0800)]
perf test: Simplify metric value validation test final report
The original test report was too complicated to read with information
that not really useful. This new update simplify the report which should
largely improve the readibility.
Signed-off-by: Weilin Wang <weilin.wang@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240130180907.639729-1-weilin.wang@intel.com
Ze Gao [Tue, 23 Jan 2024 07:02:11 +0000 (02:02 -0500)]
perf evsel: Rename get_states() to parse_task_states() and make it public
Since get_states() assumes the existence of libtraceevent, so move
to where it should belong, i.e, util/trace-event-parse.c, and also
rename it to parse_task_states().
Leave evsel_getstate() untouched as it fits well in the evsel
category.
Also make some necessary tweaks for python support, and get it
verified with: perf test python.
James Clark [Wed, 24 Jan 2024 09:43:57 +0000 (09:43 +0000)]
perf evlist: Fix evlist__new_default() for > 1 core PMU
The 'Session topology' test currently fails with this message when
evlist__new_default() opens more than one event:
32: Session topology :
--- start ---
templ file: /tmp/perf-test-vv5YzZ
Using CPUID 0x00000000410fd070
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
non matching sample_type
FAILED tests/topology.c:73 can't get session
---- end ----
Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf
expects that any file that has more than one event has the sample ID
flag set. Perf record already sets the flag in a similar way when there
is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't
expect this change to have any other side effects. The other tests that
use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms
since commit 251aa040244a3b17 ("perf parse-events: Wildcard most
"numeric" events") when evlist__new_default() started opening multiple
events for 'cycles'.
Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com>
[ This was failing as well on a Rocket Lake Refresh/14700k Intel hybrid system - Arnaldo ] Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@mail.gmail.com/ Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 30 Jan 2024 13:09:18 +0000 (10:09 -0300)]
tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
To pick the changes from:
1e536e10689700e0 ("x86/cpu: Detect TDX partial write machine check erratum") 765a0542fdc7aad7 ("x86/virt/tdx: Detect TDX during kernel boot") 30fa92832f405d5a ("x86/CPU/AMD: Add ZenX generations flags") 04c3024560d3a14a ("x86/barrier: Do not serialize MSR accesses on AMD")
This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kai Huang <kai.huang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Used in some architectures to create syscall tables.
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZbfMuAlUMRO9Hqa6@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 27 Jan 2024 17:52:23 +0000 (14:52 -0300)]
tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:
a5d3df8ae13fada7 ("KVM: remove deprecated UAPIs") 6d72283526090850 ("KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT") 89ea60c2c7b5838b ("KVM: x86: Add support for "protected VMs" that can utilize private memory") 8dd2eee9d526c30f ("KVM: x86/mmu: Handle page fault for private memory") a7800aa80ea4d535 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") 5a475554db1e476a ("KVM: Introduce per-page memory attributes") 16f95f3b95caded2 ("KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace") bb58b90b1a8f753b ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2") 3f9cd0ca848413fd ("KVM: arm64: Allow userspace to get the writable masks for feature ID registers")
That automatically adds support for some new ioctls and remove a bunch
of deprecated ones.
This ends up making the new binary to forget about the deprecated one,
so when used in an older system it will not be able to resolve those
codes to strings.
error: 'calloc' sizes specified with 'sizeof' in the earlier argument and
not in the later argument [-Werror=calloc-transposed-args]
Committer notes:
I noticed this on fedora 40 and rawhide.
Signed-off-by: Sun Haiyong <sunhaiyong@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240106094129.3337057-1-siyanteng@loongson.cn Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sun Haiyong [Mon, 4 Dec 2023 08:20:55 +0000 (16:20 +0800)]
perf top: Remove needless malloc(0) call that triggers -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
like:
builtin-top.c: In function ‘prompt_integer’:
builtin-top.c:360:21: error: allocation of insufficient size ‘0’ for
type ‘char’ with size ‘1’ [-Werror=alloc-size]
360 | char *buf = malloc(0), *p;
| ^~~~~~
Just set it to NULL, getline() will do the allocation.
Signed-off-by: Sun Haiyong <sunhaiyong@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231204082055.91877-1-siyanteng@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yicong Yang [Mon, 22 Jan 2024 08:04:06 +0000 (16:04 +0800)]
perf build: Make minimal shellcheck version to v0.6.0
The perf build failed due to the shellcheck on my machine (v0.4.6 on Ubuntu
18.04.1 LTS) doesn't support -a/--check-sourced and -S/--severity option.
These two options are introduced in shellcheck v0.4.7 and v0.6.0
respectively. So restrict the minimal version of shellcheck to v0.6.0.
Fixes: b809fc656e763296 ("perf build: Shellcheck support for OUTPUT directory") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Junhao He <hejunhao3@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20240122080406.28678-1-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 26 Jan 2024 14:00:01 +0000 (11:00 -0300)]
tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
Picking the changes from:
8570c27932e132d2 ("drm/syncobj: Add deadline support for syncobj waits") 9724ed6c1b1212d1 ("drm: Introduce DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT") e4d983acffff270c ("drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP") d208d875667e2a29 ("drm: introduce CLOSEFB IOCTL") afa5cf3175a22b71 ("drm/i915/uapi: fix typos/spellos and punctuation")
Addressing these perf build warnings:
Warning: Kernel ABI header differences:
Now 'perf trace' and other code that might use the
tools/perf/trace/beauty autogenerated tables will be able to translate
this new ioctl command into a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2024-01-26 10:54:23.486381862 -0300
+++ after 2024-01-26 10:54:35.767902442 -0300
@@ -109,6 +109,7 @@
[0xCD] = "SYNCOBJ_TIMELINE_SIGNAL",
[0xCE] = "MODE_GETFB2",
[0xCF] = "SYNCOBJ_EVENTFD",
+ [0xD0] = "MODE_CLOSEFB",
[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
$
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Ian Rogers <irogers@google.com> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Clark <robdclark@chromium.org> Cc: Simon Ser <contact@emersion.fr> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/lkml/ZbPIN9Dcc5AM0uxo@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:15 +0000 (20:30 -0800)]
perf test shell daemon: Make signal test less racy
The daemon signal test sends signals and then expects files to be
written. It was observed on an Intel Alderlake that the signals were
sent too quickly leading to the 3 expected files not appearing.
To avoid this send the next signal only after the expected previous file
has appeared. To avoid an infinite loop the number of retries is
limited.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Shirisha G <shirisha@linux.ibm.com> Link: https://lore.kernel.org/r/20240124043015.1388867-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:14 +0000 (20:30 -0800)]
perf test shell script: Fix test for python being disabled
"grep -cv" can exit with an error code that causes the "set -e" to abort
the script. Switch to using the grep exit code in the if condition to
avoid this.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Shirisha G <shirisha@linux.ibm.com> Link: https://lore.kernel.org/r/20240124043015.1388867-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Thu, 25 Jan 2024 10:03:51 +0000 (11:03 +0100)]
perf test: Fix 'perf script' tests on s390
In linux next repo, test case 'perf script tests' fails on s390.
The root case is a command line invocation of 'perf record' with
call-graph information. On s390 only DWARF formatted call-graphs are
supported and only on software events.
Change the command line parameters for s390.
Output before:
# perf test 89
89: perf script tests : FAILED!
#
Output after:
# perf test 89
89: perf script tests : Ok
#
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20240125100351.936262-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 25 Jan 2024 14:23:56 +0000 (11:23 -0300)]
tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:
8a924db2d7b5eb69 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
That don't add anything that is handled by existing hard coded tables or
table generation scripts.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stefan Berger <stefanb@linux.ibm.com> Link: https://lore.kernel.org/lkml/ZbJv9fGF_k2xXEdr@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
Using CPUID GenuineIntel-6-8E-A
0x87
New filter for msr:read_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
0x87
New filter for msr:write_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kai Huang <kai.huang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZbJt27rjkQVU1YoP@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That add STATX_MNT_ID_UNIQUE that was manually added to
tools/perf/trace/beauty/statx.c, at some point this should move to the
shell based automated way.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZbJq08s19890WDo-@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 23:42:00 +0000 (15:42 -0800)]
perf pmu: Treat the msr pmu as software
The msr PMU is a software one, meaning msr events may be grouped
with events in a hardware context. As the msr PMU isn't marked as a
software PMU by perf_pmu__is_software, groups with the msr PMU in
are broken and the msr events placed in a different group. This
may lead to multiplexing errors where a hardware event isn't
counted while the msr event, such as tsc, is. Fix all of this by
marking the msr PMU as software, which agrees with the driver.
Before:
```
$ perf stat -e '{slots,tsc}' -a true
WARNING: events were regrouped to match PMUs
Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240124234200.1510417-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Linus Torvalds [Thu, 25 Jan 2024 18:58:35 +0000 (10:58 -0800)]
Merge tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, netfilter and WiFi.
Jakub is doing a lot of work to include the self-tests in our CI, as a
result a significant amount of self-tests related fixes is flowing in
(and will likely continue in the next few weeks).
Current release - regressions:
- bpf: fix a kernel crash for the riscv 64 JIT
- bnxt_en: fix memory leak in bnxt_hwrm_get_rings()
- revert "net: macsec: use skb_ensure_writable_head_tail to expand
the skb"
Previous releases - regressions:
- core: fix removing a namespace with conflicting altnames
- tcp:
- make sure init the accept_queue's spinlocks once
- fix autocork on CPUs with weak memory model
- udp: fix busy polling
- mlx5e:
- fix out-of-bound read in port timestamping
- fix peer flow lists corruption
- iwlwifi: fix a memory corruption
Previous releases - always broken:
- netfilter:
- nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
basechain
- nft_limit: reject configurations that cause integer overflow
- bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
NULL pointer dereference upon shrinking
- llc: make llc_ui_sendmsg() more robust against bonding changes
- smc: fix illegal rmb_desc access in SMC-D connection dump
- dpll: fix pin dump crash for rebound module
- bnxt_en: fix possible crash after creating sw mqprio TCs
- hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB
Misc:
- several self-tests fixes for better integration with the netdev CI
- added several missing modules descriptions"
* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
tsnep: Remove FCS for XDP data path
net: fec: fix the unhandled context fault from smmu
selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
fjes: fix memleaks in fjes_hw_setup
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
i40e: set xdp_rxq_info::frag_size
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
ice: remove redundant xdp_rxq_info registration
i40e: handle multi-buffer packets that are shrunk by xdp prog
ice: work on pre-XDP prog frag count
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
xsk: recycle buffer in case Rx queue was full
net: fill in MODULE_DESCRIPTION()s for rvu_mbox
net: fill in MODULE_DESCRIPTION()s for litex
net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
net: fill in MODULE_DESCRIPTION()s for fec
...
Linus Torvalds [Thu, 25 Jan 2024 18:52:30 +0000 (10:52 -0800)]
Merge tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fix from Amir Goldstein:
"Change the on-disk format for the new "xwhiteouts" feature introduced
in v6.7
The change reduces unneeded overhead of an extra getxattr per readdir.
The only user of the "xwhiteout" feature is the external composefs
tool, which has been updated to support the new on-disk format.
This change is also designated for 6.7.y"
* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: mark xwhiteouts directory with overlay.opaque='x'
Linus Torvalds [Thu, 25 Jan 2024 18:41:29 +0000 (10:41 -0800)]
Merge tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs fixes from Christian Brauner:
"This contains various fixes for the netfs work merged earlier this
cycle:
afs:
- Fix locking imbalance in afs_proc_addr_prefs_show()
- Remove afs_dynroot_d_revalidate() which is redundant
- Fix error handling during lookup
- Hide sillyrenames from userspace. This fixes a race between
silly-rename files being created/removed and userspace iterating
over directory entries
- Don't use unnecessary folio_*() functions
cifs:
- Don't use unnecessary folio_*() functions
cachefiles:
- erofs: Fix Null dereference when cachefiles are not doing
ondemand-mode
- Update mailing list
netfs library:
- Add Jeff Layton as reviewer
- Update mailing list
- Fix a error checking in netfs_perform_write()
- fscache: Check error before dereferencing
- Don't use unnecessary folio_*() functions"
* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Fix missing/incorrect unlocking of RCU read lock
afs: Remove afs_dynroot_d_revalidate() as it is redundant
afs: Fix error handling with lookup via FS.InlineBulkStatus
afs: Hide silly-rename files from userspace
cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
netfs, fscache: Prevent Oops in fscache_put_cache()
cifs: Don't use certain unnecessary folio_*() functions
afs: Don't use certain unnecessary folio_*() functions
netfs: Don't use certain unnecessary folio_*() functions
netfs: Add Jeff Layton as reviewer
netfs, cachefiles: Change mailing list
Linus Torvalds [Thu, 25 Jan 2024 18:26:52 +0000 (10:26 -0800)]
Merge tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix in-kernel RPC UDP transport
- Fix NFSv4.0 RELEASE_LOCKOWNER
* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix RELEASE_LOCKOWNER
SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()