Update to v1.14, the metrics are based on TMA 4.4 full.
Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
to download and generate the latest events and metrics. Manually copy
the icelake files into perf and update mapfile.csv.
Tested on a non-icelake with 'perf test':
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: http://lore.kernel.org/lkml/20220727220832.2865794-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions for loads.",
+ "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
"SampleAfterValue": "1000003",
"UMask": "0x81"
},
{
- "BriefDescription": "All retired store instructions.",
+ "BriefDescription": "Retired store instructions.",
- "PublicDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
+ "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)."
},
{
- "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+ "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
"PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
},
+ {
+ "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
- "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+ "MetricName": "Load_Miss_Real_Latency"
},
{
"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
- "BriefDescription": "Fill Buffer (FB) true hits per kilo instructions for retired demand loads",
+ "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
+ "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
},
{
"BriefDescription": "Average Frequency Utilization relative nominal frequency",