James Clark [Wed, 13 Aug 2025 13:38:51 +0000 (14:38 +0100)]
perf test: Extend branch stack sampling test for Arm64 BRBE
BRBE emits IRQ and ERET branches for branching and returning from
trapped instructions. Add a test that loops on a trapped instruction
(MRS - Read special register) for this.
Extend the expected 'any_call' branches to include FAULT_DATA and
FAULT_INST as these are emitted by BRBE.
Reviewed-by: Ian Rogers <irogers@google.com> Co-developed-by: German Gomez <german.gomez@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adam Young <admiyo@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Wed, 13 Aug 2025 13:38:50 +0000 (14:38 +0100)]
perf test: Add syscall and address tests to brstack test
Test that SYSCALL type branches are emitted from the expected 'getppid'
symbol. Test that when only 'k' is used, sources addresses are all in
the kernel. Test that no kernel addresses leak by checking for them in
the 'u' test.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adam Young <admiyo@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Wed, 13 Aug 2025 13:38:49 +0000 (14:38 +0100)]
perf test: Refactor brstack test
check_branches() will be used by other tests in a later commit so make
it a function. And the any_call filters are duplicated and will also
be extended in a later commit, so move them to a variable.
No functional changes intended.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adam Young <admiyo@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 1 Oct 2025 18:12:28 +0000 (11:12 -0700)]
perf bpf_counter: Fix handling of cpumap fixing hybrid
Don't open evsels on all CPUs, open them just on the CPUs they
support. This avoids opening say an e-core event on a p-core and
getting a failure - achieve this by getting rid of the "all_cpu_map".
In install_pe functions don't use the cpu_map_idx as a CPU number,
translate the cpu_map_idx, which is a dense index into the cpu_map
skipping holes at the beginning, to a proper CPU number.
Before:
```
$ perf stat --bpf-counters -a -e cycles,instructions -- sleep 1
Ian Rogers [Wed, 1 Oct 2025 18:12:27 +0000 (11:12 -0700)]
perf bpf_counter: Move header declarations into C code
Reduce the API surface that is in bpf_counter.h, this helps compiler
analysis like unused static function, makes it easier to set a
breakpoint and just makes it easier to see the code is self contained.
When code is shared between BPF C code, put it inside HAVE_BPF_SKEL.
Move transitively found #includes into appropriate C files.
No functional change.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf annotate: Use architecture-agnostic register limit
Remove the arch-specific guard around TYPE_STATE_MAX_REGS and define it
as 32 for all architectures.
The architecture that perf is built on may not match the architecture
that produced the perf.data file, so relying on __powerpc__ or similar
is fragile.
Using 32 as a fixed upper bound is safe since it is greater than the
previous maximum of 16.
Add a comment to clarify that TYPE_STATE_MAX_REGS is an arch-independent
maximum rather than a build-time choice.
Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Thu, 25 Sep 2025 16:24:28 +0000 (09:24 -0700)]
MAINTAINERS: Remove myself from perf_events subsystem
I'm stepping down as the Reviewer of perf_events subsystem.
It has been an honor and a pleasure to work with everyone to improve the
perf_events subsystem. However, due to personal reasons, I have to leave
Intel. I believe it would be difficult for me to continue in this role
any further.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The process_event() function in "builtin-script.c" invokes
perf_sample__fprintf_synth() for displaying PERF_TYPE_SYNTH
type events.
if (attr->type == PERF_TYPE_SYNTH && PRINT_FIELD(SYNTH))
perf_sample__fprintf_synth(sample, evsel, fp);
perf_sample__fprintf_synth() process the sample depending on the value
in evsel->core.attr.config. Introduce perf_sample__fprintf_synth_vpadtl()
and invoke this for PERF_SYNTH_POWERPC_VPA_DTL
Sample output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
perf powerpc: Process the DTL entries in queue and deliver samples
Create samples from DTL entries for displaying in 'perf report'
and 'perf script'.
When the different PERF_RECORD_XX records are processed from perf
session, powerpc_vpadtl_process_event() will be invoked.
For each of the PERF_RECORD_XX record, compare the timestamp of perf
record with timestamp of top element in the auxtrace heap.
Process the auxtrace queue if the timestamp of element from heap is
lower than timestamp from entry in perf record.
Sometimes it could happen that one buffer is only partially processed.
if the timestamp of occurrence of another event is more than currently
processed element in the queue, it will move on to next perf record.
So keep track of position of buffer to continue processing next time.
Update the timestamp of the auxtrace heap with the timestamp of last
processed entry from the auxtrace buffer.
Generate perf sample for each entry in the dispatch trace log.
Fill in the sample details:
- sample ip is picked from srr0 field of dtl_entry
- sample cpu is picked from processor_id of dtl_entry
- sample id is from sample_id of powerpc_vpadtl
- cpumode is set to PERF_RECORD_MISC_KERNEL
- Additionally save the details in raw_data of sample.
This is to print the relevant fields in perf_sample__fprintf_synth()
when called from builtin-script
The sample is processed by calling perf_session__deliver_synth_event()
so that it gets included in perf report.
Sample Output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
perf powerpc: Allocate and setup aux buffer queue to help co-relate with other events across CPU's
When the Dispatch Trace Log data is collected along with other events
like sched tracepoint events, it needs to be correlated and present
interleaved along with these events.
Perf events can be collected parallely across the CPUs. Hence it needs
to be ensured events/dtl entries are processed in timestamp order.
An auxtrace_queue is created for each CPU.
Data within each queue is in increasing order of timestamp. Each
auxtrace queue has a array/list of auxtrace buffers.
When processing the auxtrace buffer, the data is mmapp'ed.
All auxtrace queues is maintained in auxtrace heap.
Each queue has a queue number and a timestamp.
The queues are sorted/added to head based on the time stamp.
So always the lowest timestamp (entries to be processed first) is on top
of the heap.
The auxtrace queue needs to be allocated and heap needs to be populated
in the sorted order of timestamp.
The queue needs to be filled with data only once via
powerpc_vpadtl__update_queues() function.
powerpc_vpadtl__setup_queues() iterates through all the entries to
allocate and setup the auxtrace queue.
To add to auxtrace heap, it is required to fetch the timebase of first
entry for each of the queue.
The first entry in the queue for VPA DTL PMU has the boot timebase,
frequency details which are needed to get timestamp which is required to
correlate with other events.
The very next entry is the actual trace data that provides timestamp for
occurrence of DTL event.
Formula used to get the timestamp from dtl entry is:
powerpc_vpadtl_decode() adds the boot time and frequency as part of
powerpc_vpadtl_queue structure so that it can be reused.
Each of the dtl_entry is of 48 bytes size. Sometimes it could happen
that one buffer is only partially processed (if the timestamp of
occurrence of another event is more than currently processed element in
queue, it will move on to next event).
In order to keep track of position of buffer, additional fields is added
to powerpc_vpadtl_queue structure.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf powerpc: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to present DTL samples
Dispatch Trace Log details are captured as-is in PERF_RECORD_AUXTRACE
records.
To present dtl entries as samples, create an event with name as
"vpa-dtl" and type PERF_TYPE_SYNTH.
Add perf_synth_id, "PERF_SYNTH_POWERPC_VPA_DTL" as config value for the
event.
Create a sample id to be a fixed offset from evsel id.
To present the relevant fields from the "struct dtl_entry", prepare the
entries as events of type PERF_TYPE_SYNTH.
By defining as PERF_TYPE_SYNTH type, samples can be printed as part of
perf_sample__fprintf_synth in builtin-script.c
From powerpc_vpadtl_process_auxtrace_info(), invoke
auxtrace_queues__process_index() function which will queue the auxtrace
buffers by invoke auxtrace_queues__add_event().
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf powerpc: Process auxtrace events and display in 'perf report -D'
Add VPA DTL PMU auxtrace process function for "perf report -D".
The auxtrace event processing functions are defined in file
"util/powerpc-vpadtl.c".
Data structures used includes "struct powerpc_vpadtl_queue", "struct
powerpc_vpadtl" to store the auxtrace buffers in queue. Different
PERF_RECORD_XXX are generated during recording.
PERF_RECORD_AUXTRACE_INFO is processed first since it is of type
perf_user_event_type and perf session event delivers
perf_session__process_user_event() first.
Define function powerpc_vpadtl_process_auxtrace_info() to handle the
processing of PERF_RECORD_AUXTRACE_INFO records.
In this function, initialize the aux buffer queues using
auxtrace_queues__init().
Setup the required infrastructure for aux data processing.
The data is collected per CPU and auxtrace_queue is created for each
CPU.
Define powerpc_vpadtl_process_event() function to process
PERF_RECORD_AUXTRACE records.
In this, add the event to queue using auxtrace_queues__add_event() and
process the buffer in powerpc_vpadtl_dump_event().
The first entry in the buffer with timebase as zero has boot timebase
and frequency.
Remaining data is of format for "struct powerpc_vpadtl_entry".
Define the translation for dispatch_reasons and preempt_reasons, report
this when dump trace is invoked via powerpc_vpadtl_dump()
Sample output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
Ian Rogers [Sun, 14 Sep 2025 18:33:53 +0000 (11:33 -0700)]
perf sched: Avoid union type punning undefined behavior
A union is used to set the priv value in thread (a void*) to a boolean
value through type punning. Undefined behavior sanitizer fails on this.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test cases from perftool_testsuite are affected by the current
directory where the test are run. For this reason, the test
driver has to change the directory to the base_dir for references to
work correctly.
Utilize absolute paths when sourcing and referencing other scripts so
that the current working directory doesn't impact the test cases.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Jakub Brnak <jbrnak@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Markus Heidelberg [Thu, 25 Sep 2025 11:26:13 +0000 (13:26 +0200)]
perf tools: Fix duplicated words in documentation and comments
- "the the"
- "in in"
- "a a"
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bpf: Check libbpf version to use btf_dump_type_data_opts.emit_strings
When building perf with LIBBPF_DYNAMIC=1 on a fedora system with
libbpf-devel 1.5 I it was breaking with:
util/bpf-event.c: In function ‘format_btf_variable’:
util/bpf-event.c:291:18: error: ‘const struct btf_dump_type_data_opts’ has no member named ‘emit_strings’
291 | .emit_strings = 1,
| ^~~~~~~~~~~~
util/bpf-event.c:291:33: error: initialized field overwritten [-Werror=override-init]
291 | .emit_strings = 1,
| ^
util/bpf-event.c:291:33: note: (near initialization for ‘opts.skip_names’)
Check the version before using that feature.
Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bpf: Move the LIBBPF_CURRENT_VERSION_GEQ macro to bpf-utils.h
We need it to fix some other libbpf version dependent issues when
building with LIBBPF_DYNAMIC=1.
Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 30 Aug 2025 05:35:49 +0000 (22:35 -0700)]
perf bpf-filter: Fix opts declaration on older libbpfs
Building perf with LIBBPF_DYNAMIC (ie not the default static linking of
libbpf with perf) is breaking as the libbpf isn't version 1.7 or newer,
where dont_enable is added to bpf_perf_event_opts.
To avoid this breakage add a compile time version check and don't
declare the variable when not present.
Fixes: 5e2ac8e8571df54d ("perf bpf-filter: Enable events manually") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: bpf@vger.kernel.org Cc: Hao Ge <gehao@kylinos.cn> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 18 Sep 2025 17:24:16 +0000 (10:24 -0700)]
perf build-id: Ensure snprintf string is empty when size is 0
The string result of build_id__snprintf() is unconditionally used in
places like dsos__fprintf_buildid_cb(). If the build id has size 0 then
this creates a use of uninitialized memory. Add null termination for the
size 0 case.
A similar fix was written by Jiri Olsa in commit 6311951d4f8f28c4 ("perf
tools: Initialize output buffer in build_id__sprintf") but lost in the
transition to snprintf.
Fixes: fccaaf6fbbc59910 ("perf build-id: Change sprintf functions to snprintf") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 18 Sep 2025 17:24:15 +0000 (10:24 -0700)]
perf evsel: Ensure the fallback message is always written to
The fallback message is unconditionally printed in places like
record__open().
If no fallback is attempted this can lead to printing uninitialized
data, crashes, etc.
Fixes: c0a54341c0e89333 ("perf evsel: Introduce event fallback method") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 18 Sep 2025 22:22:02 +0000 (15:22 -0700)]
perf test: Avoid uncore_imc/clockticks in uniquification test
The detection of uncore_imc may happen for free running PMUs and the
clockticks event may be present on uncore_clock. Rewrite the test to
detect duplicated/deduplicated events from perf list, not hardcoded to
uncore_imc.
If perf stat fails then assume it is permissions and skip the test.
Committer testing:
Before:
root@x1:~# perf test -vv uniquifyi
96: perf stat events uniquifying:
--- start ---
test child forked, pid 220851
stat event uniquifying test
grep: Unmatched [, [^, [:, [., or [=
Event is not uniquified [Failed]
perf stat -e clockticks -A -o /tmp/__perf_test.stat_output.X7ChD -- true
# started on Fri Sep 19 16:48:38 2025
root@x1:~# perf test -vv uniquifyi
96: perf stat events uniquifying:
--- start ---
test child forked, pid 222366
Uniquification of PMU sysfs events test
Testing event uncore_imc_free_running/data_read/ is uniquified to uncore_imc_free_running_0/data_read/
Testing event uncore_imc_free_running/data_total/ is uniquified to uncore_imc_free_running_0/data_total/
Testing event uncore_imc_free_running/data_write/ is uniquified to uncore_imc_free_running_0/data_write/
Testing event uncore_imc_free_running/data_read/ is uniquified to uncore_imc_free_running_1/data_read/
Testing event uncore_imc_free_running/data_total/ is uniquified to uncore_imc_free_running_1/data_total/
Testing event uncore_imc_free_running/data_write/ is uniquified to uncore_imc_free_running_1/data_write/
---- end(0) ----
96: perf stat events uniquifying : Ok
root@x1:~#
Fixes: 070b315333ee942f ("perf test: Restrict uniquifying test to machines with 'uncore_imc'") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 18 Sep 2025 22:22:00 +0000 (15:22 -0700)]
perf test: Don't leak workload gopipe in PERF_RECORD_*
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.
Before:
```
$ perf test -vv 7
7: PERF_RECORD_* events & perf_sample fields:
--- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
sample_type IP|TID|TIME|CPU
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1189569 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
#0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
#1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
#2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
#3 0x7f29ce849cc2 in raise raise.c:27
#4 0x7f29ce8324ac in abort abort.c:81
#5 0x565358f662d4 in check_leaks builtin-test.c:226
#6 0x565358f6682e in run_test_child builtin-test.c:344
#7 0x565358ef7121 in start_command run-command.c:128
#8 0x565358f67273 in start_test builtin-test.c:545
#9 0x565358f6771d in __cmd_test builtin-test.c:647
#10 0x565358f682bd in cmd_test builtin-test.c:849
#11 0x565358ee5ded in run_builtin perf.c:349
#12 0x565358ee6085 in handle_internal_command perf.c:401
#13 0x565358ee61de in run_argv perf.c:448
#14 0x565358ee6527 in main perf.c:555
#15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
#16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#17 0x565358e391c1 in _start perf[851c1]
7: PERF_RECORD_* events & perf_sample fields : FAILED!
```
Fixes: 16d00fee703866c6 ("perf tests: Move test__PERF_RECORD into separate object") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 8 Sep 2025 18:19:18 +0000 (11:19 -0700)]
tools build: Make libperl opt-in rather than opt-out, deprecate
If libperl is installed then the perf tool build will build against
it. There appears to be limited interest in the scripting support for
perl so let's make it opt-in and deprecate it.
With this patch applied you need to add LIBPERL=1 to get libperl
support in perf - there is no warning if libperl is missing, but
building will fail if libperl is missing and the build has LIBPERL=1.
The perf version output is changed to:
```
$ perf version --build-options
perf version 6.17.rc3.g8eca69269947
aio: [ on ] # HAVE_AIO_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT
dwarf: [ on ] # HAVE_LIBDW_SUPPORT
dwarf_getlocations: [ on ] # HAVE_LIBDW_SUPPORT
dwarf-unwind: [ on ] # HAVE_DWARF_UNWIND_SUPPORT
auxtrace: [ on ] # HAVE_AUXTRACE_SUPPORT
libbfd: [ OFF ] # HAVE_LIBBFD_SUPPORT ( tip: Deprecated, license incompatibility, use BUILD_NONDISTRO=1 and install binutils-dev[el] )
libbpf-strings: [ on ] # HAVE_LIBBPF_STRINGS_SUPPORT
libcapstone: [ on ] # HAVE_LIBCAPSTONE_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_LIBDW_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
libopencsd: [ OFF ] # HAVE_CSTRACE_SUPPORT
libperl: [ OFF ] # HAVE_LIBPERL_SUPPORT ( tip:
Deprecated, use LIBPERL=1 and install libperl-dev to build with it )
libpfm4: [ on ] # HAVE_LIBPFM
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
libunwind: [ OFF ] # HAVE_LIBUNWIND_SUPPORT ( tip:
Deprecated, use LIBUNWIND=1 and install libunwind-dev[el] to build
with it )
lzma: [ on ] # HAVE_LZMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
zstd: [ on ] # HAVE_ZSTD_SUPPORT
```
i.e. there is a tip saying about deprecation and how to get support
back.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Yuzhuo Jing <yuzhuo@google.com> Link: https://lore.kernel.org/lkml/aMrk03gigBlGcYLK@x1/ Link: https://lore.kernel.org/lkml/CAP-5=fVX+bLBRJCiziDi_hBySgv2NFtDoghtpheSSxVAvvETGw@mail.gmail.com
[ Keep the pre-existing perl-ExtUtils-Embed hint for Fedora/RHEL systems ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 8 Aug 2025 13:24:40 +0000 (14:24 +0100)]
perf session: Fix handling when buffer exceeds 2 GiB
If a user specifies an AUX buffer larger than 2 GiB, the returned size
may exceed 0x80000000. Since the err variable is defined as a signed
32-bit integer, such a value overflows and becomes negative.
As a result, the perf record command reports an error:
0x146e8 [0x30]: failed to process type: 71 [Unknown error 183711232]
Change the type of the err variable to a signed 64-bit integer to
accommodate large buffer sizes correctly.
Fixes: d5652d865ea734a1 ("perf session: Add ability to skip 4GiB or more") Reported-by: Tamas Zsoldos <tamas.zsoldos@arm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250808-perf_fix_big_buffer_size-v1-1-45f45444a9a4@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Split mem benchmark options into common and memset/memcpy specific.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default value is identical to the total size of the region, which
preserves current behaviour.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Both the reservation and node from which hugepages are allocated
from are expected to be addressed by the user.
An example of page-size selection:
$ perf bench mem memset -s 4gb -p 2mb
# Running 'mem/memset' benchmark:
# function 'default' (Default memset() provided by glibc)
# Copying 4gb bytes ...
14.919194 GB/sec
# function 'x86-64-unrolled' (unrolled memset() in arch/x86/lib/memset_64.S)
# Copying 4gb bytes ...
11.514503 GB/sec
# function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
# Copying 4gb bytes ...
12.600568 GB/sec
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using mmap() ensures that the buffer is always aligned at a fixed
boundary. Switch to that to remove one source of variability.
Since we always want to read/write from the allocated buffers map
with pagetables pre-populated.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bench mem: Move mem op parameters into a structure
Move benchmark function parameters in struct bench_params.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250917152418.4077386-4-ankur.a.arora@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bench mem: Defer type munging of size to float
Do type conversion to double at the point of use.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250917152418.4077386-3-ankur.a.arora@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bench mem: Remove repetition around time measurement
We have two copies of each mem benchmark: one using cycles to
measure time, the second for gettimeofday().
Unify.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test attr: Add missing int_mist.uop_dropping event to test-stat files
Setup 'struct perf_event_attr' test was failing on EMR cpu because 'perf
stat' was providing an event that was not included in the test. Type 4
Config 4269 or 10ad, int_misc.uop_dropping.
Add event type=4 config=4269 to test-stat-default and
test-stat-detailed-* files with optional=1 so EMR (Emerald Rapids)
machines can pass the test.
Fixes: d9a6bb9e359e6f81 ("perf vendor events: Update emeraldrapids events/metrics") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Trevor Allison <tallison@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'sound-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. The volume became higher than wished, but
nothing really stands out -- all small, nice and smooth.
A slightly large change is found in qcom USB-audio offload stuff, but
this is a regression fix specific to this device, hence it should be
safe to apply at this late stage.
- Various small fixes for ASoC Cirrus, Realtek, lpass, Intel and
Qualcomm drivers
- ASoC SoundWire fixes
- A few TAS2781 HD-audio side-codec driver fixes
- A fix for Qualcomm USB-audio offload breakage
- Usual a few HD-audio quirks"
* tag 'sound-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
ALSA: hda/realtek: Fix mute led for HP Laptop 15-dw4xx
ALSA: hda: intel-dsp-config: Prevent SEGFAULT if ACPI_HANDLE() is NULL
ALSA: usb: qcom: Fix false-positive address space check
ASoC: rt5682s: Adjust SAR ADC button mode to fix noise issue
ASoC: Intel: PTL: Add entry for HDMI-In capture support to non-I2S codec boards.
ASoC: amd: acp: Fix incorrect retrival of acp_chip_info
ASoC: Intel: sof_sdw: use PRODUCT_FAMILY for Fatcat series
ASoC: qcom: sc8280xp: Fix sound card driver name match data for QCS8275
ALSA: hda/realtek: Fix volume control on Lenovo Thinkbook 13x Gen 4
ALSA: hda/realtek: Support Lenovo Thinkbook 13x Gen 5
ALSA: hda: cs35l41: Support Lenovo Thinkbook 13x Gen 5
ALSA: hda/realtek: Add ALC295 Dell TAS2781 I2C fixup
ALSA: hda/tas2781: Fix a potential race condition that causes a NULL pointer in case no efi.get_variable exsits
ASoC: qcom: sc8280xp: Enable DAI format configuration for MI2S interfaces
ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S
ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface
ASoC: Intel: catpt: Expose correct bit depth to userspace
ALSA: hda/tas2781: Fix the order of TAS2781 calibrated-data
ASoC: codecs: lpass-wsa-macro: Fix speaker quality distortion
ASoC: codecs: lpass-rx-macro: Fix playback quality distortion
...
Ian Rogers [Thu, 21 Aug 2025 22:18:31 +0000 (15:18 -0700)]
perf test shell lbr: Avoid failures with perf event paranoia
When not running as root and with higher perf event paranoia values
the perf record LBR tests could fail rather than skipping the
problematic tests.
Add the sensitivity to the test and confirm it passes with paranoia
values from -1 to 2.
Committer testing:
Testing with '$ perf test -vv lbr', i.e. as non root, and then comparing
the output shows the mentioned errors before this patch:
acme@x1:~$ grep -m1 "model name" /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
acme@x1:~$
Before:
132: perf record LBR tests : Skip
After:
132: perf record LBR tests : Ok
Fixes: 32559b99e0f59070 ("perf test: Add set of perf record LBR tests") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 8 Aug 2025 22:26:50 +0000 (15:26 -0700)]
perf tools: Remove a pointless check
Static analyser cppcheck says:
linux-6.16/tools/perf/util/tool_pmu.c:242:15: warning:
Opposite inner 'if' condition leads to a dead code block. [oppositeInnerCondition]
Source code is:
for (thread = 0; thread < nthreads; thread++) {
if (thread >= nthreads)
break;
Reported-by: David Binderman <dcb314@hotmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Athira Rajeev [Thu, 14 Aug 2025 11:49:08 +0000 (17:19 +0530)]
perf tests record: Update testcase to fix usage of affinity for machines with #CPUs > 1K
The perf record testcase fails on systems with more than 1K CPUs.
Testcase: perf test -vv "PERF_RECORD_* events & perf_sample fields"
PERF_RECORD_* events & perf_sample fields :
--- start ---
test child forked, pid 272482
sched_getaffinity: Invalid argument
sched__get_first_possible_cpu: Invalid argument
test child finished with -1
---- end ----
PERF_RECORD_* events & perf_sample fields: FAILED!
sched__get_first_possible_cpu uses "sched_getaffinity" to get the
cpumask and this call is returning EINVAL (Invalid argument).
This happens because the default mask size in glibc is 1024.
To overcome this 1024 CPUs mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros ie, use CPU_ALLOC to allocate
cpumask, CPU_ALLOC_SIZE for size.
Same fix needed for mask which is used to setaffinity so that mask size
is large enough to represent number of possible CPU's in the system.
Reported-by: Tejas Manhas <tejas05@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Tejas Manhas <tejas05@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zecheng Li [Mon, 25 Aug 2025 19:54:05 +0000 (19:54 +0000)]
perf dwarf-aux: Better variable collection for insn tracking
Utilizes the previous is_breg_access_indirect function to determine if
the register + offset stores the variable itself or the struct it points
to, save the information in die_var_type.is_reg_var_addr.
Since we are storing the real types in the stack state, we need to do a
type dereference when is_reg_var_addr is set to false for stack/frame
registers.
For other gp registers, skip the variable when the register is a pointer
to the type. If we want to accept these variables, we might also utilize
is_reg_var_addr in a different way, we need to mark that register as a
pointer to the type.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zecheng Li <zecheng@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xu Liu <xliuprof@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zecheng Li [Mon, 25 Aug 2025 19:54:04 +0000 (19:54 +0000)]
perf dwarf-aux: More accurate variable type match for breg
Introduces the function is_breg_access_indirect to determine whether a
memory access involving a DW_OP_breg* operation refers to the variable's
value directly or requires dereferencing the variable's type as a
pointer based on the DWARF expression.
Previously, all breg based accesses were assumed to directly access the
variable's value (is_pointer = false).
The is_breg_access_indirect function handles three cases:
1. Base register + offset only: (e.g., DW_OP_breg7 RSP+88) The
calculated address is the location of the variable. The access is
direct, so no type dereference is needed. Returns false.
2. Base register + offset, followed by other operations ending in
DW_OP_stack_value, including DW_OP_deref: (e.g., DW_OP_breg*,
DW_OP_deref, DW_OP_stack_value) The DWARF expression computes the
variable's value, but that value requires a dereference. The memory
access is fetching that value, so no type dereference is needed.
Returns false.
3. Base register + offset, followed only by DW_OP_stack_value: (e.g.,
DW_OP_breg13 R13+256, DW_OP_stack_value) This indicates the value at
the base + offset is the variable's value. Since this value is being
used as an address in the memory access, the variable's type is
treated as a pointer and requires a type dereference. Returns true.
The is_pointer argument passed to match_var_offset is now set by
is_breg_access_indirect for breg accesses.
There are more complex expressions that includes multiple operations and
may require additional handling, such as DW_OP_deref without a
DW_OP_stack_value, or including multiple base registers. They are less
common in the Linux kernel dwarf and are skipped in check_allowed_ops.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zecheng Li <zecheng@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xu Liu <xliuprof@google.com> Link: https://lore.kernel.org/r/CAJUgMyK2wTiEZB__dtgCELmaNGFWhG1j0g9rv_C=cLD6Zq4_5w@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gautam Menghani [Mon, 4 Aug 2025 11:06:01 +0000 (16:36 +0530)]
perf auxtrace: Avoid redundant NULL check in auxtrace_mmap_params__set_idx()
Since commit eead8a011477 ("libperf threadmap: Don't segv for index 0 for the
NULL 'struct perf_thread_map' pointer"), perf_thread_map__pid() can
check for a NULL map and return -1 if idx is 0. Cleanup
auxtrace_mmap_params__set_idx() and remove the redundant NULL check.
No functional change intended.
Reviewed-by: Aditya Bodkhe <adityab1@linux.ibm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ilkka Koskinen [Wed, 10 Sep 2025 19:52:13 +0000 (12:52 -0700)]
perf vendor events arm64 AmpereOne: Fix typos in metrics' descriptions
While fixing a typo in "l1d_cache_access_prefetches" in AmpereOneX,
a few other typos were found in metrics' descriptions too. While AmpereOne
doesn't have the metric, it did have the typos in the descriptions.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250910195214.50814-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ilkka Koskinen [Wed, 10 Sep 2025 19:52:12 +0000 (12:52 -0700)]
perf vendor events arm64 AmpereOneX: Fix typo - should be l1d_cache_access_prefetches
Add missing 'h' to l1d_cache_access_prefetces
Also fix a couple of typos and use consistent term in brief descriptions
Fixes: 16438b652b464ef7 ("perf vendor events arm64 AmpereOneX: Add core PMU events and metrics") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Thu, 21 Aug 2025 00:32:20 +0000 (17:32 -0700)]
perf trace: Add --max-summary option
The --max-summary option is to limit the number of output lines for
syscall summary stats. The max applies to each entries like thread and
cgroups. For total summary, it will just print up to the given number.
Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:19 +0000 (16:42 +0100)]
perf arm_spe: Allow parsing both data source and events
Current code skips to parse events after generating data source. The
reason is the data source packets have cache and snooping related info,
the afterwards event packets might contain duplicate info.
This commit changes to continue parsing the events after data source
analysis. If data source does not give out memory level and snooping
types, then the event info is used to synthesize the related fields.
As a result, both the peer snoop option ('-d peer') and hitm options
('-d tot/lcl/rmt') are supported by Arm SPE in 'perf c2c'.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:18 +0000 (16:42 +0100)]
perf arm_spe: Set HITM flag
Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
modified" and "Data snooped".
Set the snoop mode as:
- If both the "Cache data modified" event and the "Data snooped" event
are set, which indicates a load operation that snooped from a outside
cache and hit a modified copy, set the HITM flag to inspect false
sharing.
- If the snooped event bit is not set, and the snooped event has been
supported by the hardware, set as NONE mode (no snoop operation).
- If the snooped event bit is not set, and the event is not supported or
absent the events info in the meta data, set as NA mode (not
available).
Don't set any mode for only "Cache data modified" event, as it hits a
local modified copy.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
function. As a result, the function is more general and will be invoked
by a sequential change.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:16 +0000 (16:42 +0100)]
perf arm_spe: Fill memory levels for FEAT_SPEv1p4
Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
cache and recently fetched events. This patch fills in the memory levels
for these new events.
The recently fetched events are matched to line-fill buffer (LFB). In
general, the latency for accessing LFB is higher than accessing L1 cache
but lower than accessing L2 cache. Thus, it locates in the memory
hierarchy information between L1 cache and L2 cache.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:15 +0000 (16:42 +0100)]
perf arm_spe: Separate setting of memory levels for loads and stores
For a load hit, the lowest-level cache reflects the latency of fetching
a data. Otherwise, the highest-level cache involved in refilling
indicates the overhead caused by a load.
Store operations remain unchanged to keep the descending order when
iterating through cache levels.
Split into two functions: one is for setting memory levels for loads and
another for stores.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:14 +0000 (16:42 +0100)]
perf arm_spe: Refine memory level filling
This commit introduces macros for detecting cache level and cache miss.
Populates the 'mem_lvl_num' field which is a later added attribute for
representing memory level. Set NA ("not available") to memory levels if
memory hierarchy info is absent.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:13 +0000 (16:42 +0100)]
perf arm_spe: Add "event_filter" entry in meta data
Add a new "event_filter" entry in the meta data and dump it in raw data
mode.
After:
# perf script -D
...
0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :11
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :4
MIDR :0x410fd0f0
PMU Type :11
Min Interval :256
Event Filter :0x3fe08fe
...
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:12 +0000 (16:42 +0100)]
perf arm_spe: Decode event types for new features
Decode new event types introduced by FEAT_SPEv1p4, FEAT_SPE_SME and
FEAT_SPE_SME.
The printed event names don't strictly follow the naming in the Arm ARM.
For example, the "Cache data modified" event is shown as "HITM", and the
"Data snooped" event is printed as "SNOOPED". Shorter names are easier
to read while preserving core meanings.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:11 +0000 (16:42 +0100)]
perf arm_spe: Directly propagate raw event
Two sets of event bits are defined: one for generating samples and
another are raw event bits used in the backend decoder. Reduce the
redundancy by using the raw event bits directly in the frontend code.
To avoid overflow issues, change the type of the event variable from
enum to u64.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Fri, 12 Sep 2025 15:42:10 +0000 (16:42 +0100)]
perf arm_spe: Use full type for data_src
data_src has an actual type rather than just being a u64. To help
readers, delay decomposing it to a u64 until it's finally assigned to
the sample.
Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:09 +0000 (16:42 +0100)]
perf arm_spe: Correct memory level for remote access
For remote accesses, the data source packet does not contain information
about the memory level. To avoid misinformation, set the memory level to
NA (Not Available).
Fixes: 4e6430cbb1a9f1dc ("perf arm-spe: Use SPE data source for neoverse cores") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 12 Sep 2025 15:42:08 +0000 (16:42 +0100)]
perf arm_spe: Correct setting remote access
Set the mem_remote field for a remote access to appropriately represent
the event.
Fixes: a89dbc9b988f3ba8 ("perf arm-spe: Set sample's data source field") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes for drm, it's a bit busier than I'd like on the xe side
this week, but otherwise amdgpu and some smaller fixes for i915/bridge
and a revert on docs.
docs:
- fix docs build regression
i915:
- Honor VESA eDP backlight luminance control capability
bridge:
- anx7625: Fix NULL pointer dereference with early IRQ
- cdns-mhdp8546: Fix missing mutex unlock on error path
xe:
- Release kobject for the failure path
- SRIOV PF: Drop rounddown_pow_of_two fair
- Remove type casting on hwmon
- Defer free of NVM auxiliary container to device release
- Fix a NULL vs IS_ERR
- Add cleanup action in xe_device_sysfs_init
- Fix error handling if PXP fails to start
- Set GuC RCS/CCS yield policy
amdgpu:
- GC 11.0.1/4 cleaner shader support
- DC irq fix
- OD fix
amdkfd:
- S0ix fix"
* tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu: suspend KFD and KGD user queues for S0ix
drm/amdkfd: add proper handling for S0ix
drm/xe/guc: Set RCS/CCS yield policy
drm/xe: Fix error handling if PXP fails to start
drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init
drm/amd: Only restore cached manual clock settings in restore if OD enabled
drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
drm/i915/backlight: Honor VESA eDP backlight luminance control capability
drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
drm/xe: defer free of NVM auxiliary container to device release callback
drm/xe/hwmon: Remove type casting
drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
drm/xe/tile: Release kobject for the failure path
Revert "drm: Add directive to format code in comment"
Dave Airlie [Fri, 19 Sep 2025 01:19:36 +0000 (11:19 +1000)]
Merge tag 'drm-xe-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Release kobject for the failure path (Shuicheng)
- SRIOV PF: Drop rounddown_pow_of_two fair (Michal)
- Remove type casting on hwmon (Mallesh)
- Defer free of NVM auxiliary container to device release (Nitin)
- Fix a NULL vs IS_ERR (Dan)
- Add cleanup action in xe_device_sysfs_init (Zongyao)
- Fix error handling if PXP fails to start (Daniele)
- Set GuC RCS/CCS yield policy (Daniele)
Merge tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull runtime verifier fixes from Steven Rostedt:
- Fix build in some RISC-V flavours
Some system calls only are available for the 64bit RISC-V machines.
#ifdef out the cases of clock_nanosleep and futex in the sleep
monitor if they are not supported by the architecture.
- Fix wrong cast, obsolete after refactoring
Use container_of() to get to the rv_monitor structure from the
enable_monitors_next() 'p' pointer. The assignment worked only
because the list field used happened to be the first field of the
structure.
- Remove redundant include files
Some include files were listed twice. Remove the extra ones and sort
the includes.
- Fix missing unlock on failure
There was an error path that exited the rv_register_monitor()
function without releasing a lock. Change that to goto the lock
release.
- Add Gabriele Monaco to be Runtime Verifier maintainer
Gabriele is doing most of the work on RV as well as collecting
patches. Add him to the maintainers file for Runtime Verification.
* tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Add Gabriele Monaco as maintainer for Runtime Verification
rv: Fix missing mutex unlock in rv_register_monitor()
include/linux/rv.h: remove redundant include file
rv: Fix wrong type cast in enabled_monitors_next()
rv: Support systems with time64-only syscalls
Alex Deucher [Wed, 17 Sep 2025 16:42:11 +0000 (12:42 -0400)]
drm/amdgpu: suspend KFD and KGD user queues for S0ix
We need to make sure the user queues are preempted so
GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8b367e6fa1716cab7cc232b9e3dff29187fc99d) Cc: stable@vger.kernel.org
Alex Deucher [Wed, 17 Sep 2025 16:42:09 +0000 (12:42 -0400)]
drm/amdkfd: add proper handling for S0ix
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1) Cc: stable@vger.kernel.org
- net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU
Previous releases - regressions:
- bonding: set random address only when slaves already exist
- rxrpc: fix untrusted unsigned subtract
- eth:
- ice: fix Rx page leak on multi-buffer frames
- mlx5: don't return mlx5_link_info table when speed is unknown
Previous releases - always broken:
- tls: make sure to abort the stream if headers are bogus
- tcp: fix null-deref when using TCP-AO with TCP_REPAIR
- dpll: fix skipping last entry in clock quality level reporting
- eth: qed: don't collect too many protection override GRC elements,
fix memory corruption"
* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
cnic: Fix use-after-free bugs in cnic_delete_task
devlink rate: Remove unnecessary 'static' from a couple places
MAINTAINERS: update sundance entry
net: liquidio: fix overflow in octeon_init_instr_queue()
net: clear sk->sk_ino in sk_set_socket(sk, NULL)
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
selftests: tls: test skb copy under mem pressure and OOB
tls: make sure to abort the stream if headers are bogus
selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
octeon_ep: fix VF MAC address lifecycle handling
selftests: bonding: add vlan over bond testing
bonding: don't set oif to bond dev when getting NS target destination
net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
net/mlx5e: Add a miss level for ipsec crypto offload
net/mlx5e: Harden uplink netdev access against device unbind
MAINTAINERS: make the DPLL entry cover drivers
doc/netlink: Fix typos in operation attributes
igc: don't fail igc_probe() on LED setup error
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"These are mostly Oliver's Arm changes: lock ordering fixes for the
vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
VMs.
Arm:
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly
programming at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding
scenarios where an xarray's spinlock was nested with a *raw*
spinlock
- Permit stage-2 read permission aborts which are possible in the
case of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
- Reverted a couple of fixes for RCU stalls when destroying a stage-2
page table.
There appears to be some nasty refcounting / UAF issues lurking in
those patches and the band-aid we tried to apply didn't hold.
s390:
- mm fixes, including userfaultfd bug fix
x86:
- Sync the vTPR from the local APIC to the VMCB even when AVIC is
active.
This fixes a bug where host updates to the vTPR, e.g. via
KVM_SET_LAPIC or emulation of a guest access, are lost and result
in interrupt delivery issues in the guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
KVM: arm64: vgic: fix incorrect spinlock API usage
KVM: arm64: Remove stage 2 read fault check
KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
KVM: arm64: Spin off release helper from vgic_put_irq()
KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
KVM: arm64: vgic: Drop stale comment on IRQ active state
KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
KVM: arm64: Initialize PMSCR_EL1 when in VHE
KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
KVM: s390: Fix incorrect usage of mmu_notifier_register()
KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
KVM: arm64: Mark freed S2 MMUs as invalid
Merge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and new HW support:
- amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
- amd/pmf: Support new ACPI ID AMDI0108
- asus-wmi: Re-add extra keys to ignore_key_wlan quirk
- oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"
* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
platform/x86/amd/pmf: Support new ACPI ID AMDI0108
platform/x86: oxpec: Add support for AOKZOE A1X
platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
Merge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Johannes Berg:
"A few fixes for UML, which I'd meant to send earlier but then forgot.
All of them are pretty long-standing issues that are either not really
happening (the UAF), in rarely used code (the FD buffer issue), or an
issue only for some host configurations (the executable stack):
- mark stack not executable to work on more modern systems with
selinux
* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Fix FD copy size in os_rcv_fd_msg()
um: virtio_uml: Fix use-after-free after put_device in probe
um: Don't mark stack executable
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
otx2_remove() |
otx2_ptp_destroy() | otx2_sync_tstamp()
cancel_delayed_work() |
kfree(ptp) |
| ptp = container_of(...); //UAF
| ptp-> //UAF
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.
Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
cnic_netdev_event() |
cnic_stop_hw() | cnic_delete_task()
cnic_cm_stop_bnx2x_hw() | ...
cancel_delayed_work() | /* the queue_delayed_work()
flush_workqueue() | executes after flush_workqueue()*/
| queue_delayed_work()
cnic_free_dev(dev)//free | cnic_delete_task() //new instance
| dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.
devlink rate: Remove unnecessary 'static' from a couple places
devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.
This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.
Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.
net: liquidio: fix overflow in octeon_init_instr_queue()
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.
Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).
Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/20250910170011.70528106@kernel.org Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 17 Sep 2025 00:28:13 +0000 (17:28 -0700)]
tls: make sure to abort the stream if headers are bogus
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.
Reported-by: Lee Jones <lee@kernel.org> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"15 hotfixes. 11 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 13 of these
fixes are for MM.
The usual shower of singletons, plus
- fixes from Hugh to address various misbehaviors in get_user_pages()
- patches from SeongJae to address a quite severe issue in DAMON
- another series also from SeongJae which completes some fixes for a
DAMON startup issue"
* tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
zram: fix slot write race condition
nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
samples/damon/mtier: avoid starting DAMON before initialization
samples/damon/prcl: avoid starting DAMON before initialization
samples/damon/wsse: avoid starting DAMON before initialization
MAINTAINERS: add Lance Yang as a THP reviewer
MAINTAINERS: add Jann Horn as rmap reviewer
mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control
mm/damon/core: introduce damon_call_control->dealloc_on_cancel
mm: folio_may_be_lru_cached() unless folio_test_large()
mm: revert "mm: vmscan.c: fix OOM on swap stress test"
mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
mm/gup: check ref_count instead of lru before migration
Merge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Two fixes for remaining_data_length and offset checks in receive path
- Don't go over max SGEs which caused smbdirect send to fail (and
trigger disconnect)
* tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17bd ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 88434448438e4302e272b2a2b810b42e05ea024b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo added #include "xe_guc_submit.h" while backporting]