Ian Rogers [Thu, 26 Sep 2024 14:48:36 +0000 (15:48 +0100)]
perf evsel: Remove pmu_name
"evsel->pmu_name" is only ever assigned a strdup of "pmu->name", a
strdup of "evsel->pmu_name" or NULL. As such, prefer to use
"pmu->name" directly and even to directly compare PMUs than PMU
names. For safety, add some additional NULL tests.
Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix arm-spe.c usage of pmu_name and empty PMU name ] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-6-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 26 Sep 2024 14:48:35 +0000 (15:48 +0100)]
perf evsel x86: Make evsel__has_perf_metrics work for legacy events
Use PMU interface to better detect core PMU for legacy events. Look
for slots event on core PMU if it is appropriate for the event.
Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-5-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 26 Sep 2024 14:48:34 +0000 (15:48 +0100)]
perf stat: Remove evlist__add_default_attrs use strings
add_default_atttributes would add evsels by having pre-created
perf_event_attr, however, this needed fixing for hybrid as the
extended PMU type was necessary for each core PMU. The logic for this
was in an arch specific x86 function and wasn't present for ARM,
meaning that default events weren't being opened on all PMUs on
ARM. Change the creation of the default events to use parse_events and
strings as that will open the events on all PMUs.
Rather than try to detect events on PMUs before parsing, parse the
event but skip its output in stat-display.
The previous order of hardware events was: cycles,
stalled-cycles-frontend, stalled-cycles-backend, instructions. As
instructions is a more fundamental concept the order is changed to:
instructions, cycles, stalled-cycles-frontend, stalled-cycles-backend.
Closes: https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/ Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com>
[Don't display unsupported default events except 'cycles'] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 26 Sep 2024 14:48:33 +0000 (15:48 +0100)]
perf stat: Uniquify event name improvements
Without aggregation on Intel:
```
$ perf stat -e instructions,cycles ...
```
Will use "cycles" for the name of the legacy cycles event but as
"instructions" has a sysfs name it will and a "[cpu]" PMU suffix. This
often breaks things as the space between the event and the PMU name
look like an extra column. The existing uniquify logic was also
uniquifying in cases when all events are core and not with uncore
events, it was not correctly handling modifiers, etc.
Change the logic so that an initial pass that can disable
uniquification is run. For individual counters, disable uniquification
in more cases such as for consistency with legacy events or for
libpfm4 events. Don't use the "[pmu]" style suffix in uniquification,
always use "pmu/.../". Change how modifiers/terms are handled in the
uniquification so that they look like parse-able events.
This fixes "102: perf stat metrics (shadow stat) test:" that has been
failing due to "instructions [cpu]" breaking its column/awk logic when
values aren't aggregated. This started happening when instructions
could match a sysfs rather than a legacy event, so the fixes tag
reflects this.
Fixes: 617824a7f0f7 ("perf parse-events: Prefer sysfs/JSON hardware events over legacy") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix Intel TPEBS counting mode test ] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 26 Sep 2024 14:48:32 +0000 (15:48 +0100)]
perf evsel: Add alternate_hw_config and use in evsel__match
There are cases where we want to match events like instructions and
cycles with legacy hardware values, in particular in stat-shadow's
hard coded metrics. An evsel's name isn't a good point of reference as
it gets altered, strstr would be too imprecise and re-parsing the
event from its name is silly. Instead, hold the legacy hardware event
name, determined during parsing, in the evsel for this matching case.
Inline evsel__match2 that is only used in builtin-diff.
Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Namhyung Kim [Thu, 12 Sep 2024 22:42:08 +0000 (15:42 -0700)]
perf symbol: Do not fixup end address of labels
When it loads symbols from an ELF file, it loads label symbols which is
0 size. Sometimes it has the same address with other symbols and might
shadow the original symbols because it fixes up the size of the symbol.
For example, in my system __do_softirq is shadowed and only accepts the
__softirqentry_text_start instead. But it should accept __do_softirq.
We can ignore NOTYPE symbols in the symbols__fixup_end() so that it can
pick the __do_softirq() in choose_best_symbol(). This should be fine
since most symbols have either STT_FUNC or STT_OBJECT.
Levi Yun [Wed, 25 Sep 2024 13:20:22 +0000 (14:20 +0100)]
perf stat: Stop repeating when ref_perf_stat() returns -1
Exit when run_perf_stat() returns an error to avoid continuously
repeating the same error message. It's not expected that COUNTER_FATAL
or internal errors are recoverable so there's no point in retrying.
This fixes the following flood of error messages for permission issues,
for example when perf_event_paranoid==3:
perf stat -r 1044 -- false
Error:
Access to performance monitoring and observability operations is limited.
...
Error:
Access to performance monitoring and observability operations is limited.
...
(repeating for 1044 times).
Levi Yun [Wed, 25 Sep 2024 13:20:21 +0000 (14:20 +0100)]
perf stat: Close cork_fd when create_perf_stat_counter() failed
When create_perf_stat_counter() failed, it doesn't close workload.cork_fd
open in evlist__prepare_workload(). This could make too many open file
error while __run_perf_stat() repeats.
Introduce evlist__cancel_workload to close workload.cork_fd and
wait workload.child_pid until exit to clear child process
when create_perf_stat_counter() is failed.
Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: nd@arm.com Cc: howardchu95@gmail.com Link: https://lore.kernel.org/r/20240925132022.2650180-2-yeoreum.yun@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf evsel: display dmesg command of showing a hardcoded path
In non-FHS compliant distros like NixOS, nothing resides in `/bin`
and `/usr/bin`. Instead dynamically symlinked into
`/run/current-system/sw/bin/`, the executable resides in `/nix/store`.
With this patch,`/bin` prefix from the dmesg command in the error
message is stripped.
James Clark [Mon, 16 Sep 2024 13:57:38 +0000 (14:57 +0100)]
perf test: cs-etm: Test Coresight disassembly script
Run a few samples through the disassembly script and check to see that
at least one branch instruction is printed.
Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-8-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:37 +0000 (14:57 +0100)]
perf scripts python cs-etm: Add start and stop arguments
Make it possible to only disassemble a range of timestamps or sample
indexes. This will be used by the test to limit the runtime, but it's
also useful for users.
Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-7-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:36 +0000 (14:57 +0100)]
perf scripts python cs-etm: Improve arguments
Make vmlinux detection automatic and use Perf's default objdump
when -d is specified. This will make it easier for a test to use the
script without having to provide arguments. And similarly for users.
Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-6-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:35 +0000 (14:57 +0100)]
perf scripts python cs-etm: Update to use argparse
optparse is deprecated and less flexible than argparse so update it.
Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-5-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:34 +0000 (14:57 +0100)]
perf scripting python: Add function to get a config value
This can be used to get config values like which objdump Perf uses for
disassembly.
Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:33 +0000 (14:57 +0100)]
perf cs-etm: Use new OpenCSD consistency checks
Previously when the incorrect binary was used for decode, Perf would
silently continue to generate incorrect samples. With OpenCSD 1.5.4 we
can enable consistency checks that do a best effort to detect a mismatch
in the image. When one is detected a warning is printed and sample
generation stops until the trace resynchronizes with a good part of the
image.
Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/ Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
James Clark [Mon, 16 Sep 2024 13:57:32 +0000 (14:57 +0100)]
perf cs-etm: Don't flush when packet_queue fills up
cs_etm__flush(), like cs_etm__sample() is an operation that generates a
sample and then swaps the current with the previous packet. Calling
flush after processing the queues results in two swaps which corrupts
the next sample. Therefore it wasn't appropriate to call flush here so
remove it.
Flushing is still done on a discontinuity to explicitly clear the last
branch buffer, but when the packet_queue fills up before reaching a
timestamp, that's not a discontinuity and the call to
cs_etm__process_traceid_queue() already generated samples and drained
the buffers correctly.
This is visible by looking for a branch that has the same target as the
previous branch and the following source is before the address of the
last target, which is impossible as execution would have had to have
gone backwards:
Make sure that a final branch stack is output at the end of the trace
by calling cs_etm__end_block(). This is already done for both the
timeless decode paths.
Fixes: 21fe8dc1191a ("perf cs-etm: Add support for CPU-wide trace scenarios") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/ Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 2 May 2024 22:31:15 +0000 (15:31 -0700)]
perf test: Be more tolerant of metricgroup failures
Previously "set -e" meant any non-zero exit code from perf stat would
cause a test failure. As a non-zero exit happens when there aren't
sufficient permissions, check for this case and make the exit code
2/skip for it.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240502223115.2357499-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll probably need to come up with some way for using the BTF info to
synthesize a test that then gets used and captures the output of the
'perf trace' output to check if the arguments are the ones synthesized,
randomically, for now, lets make do manually:
Interesting, glibc seems to be using rseq here, as in addition to the
totally fake one this test case uses, we have this one, around these
other syscalls:
And also the focus for the v6.13 should be to have a better, strace
like BTF pretty printer as one of the outputs we can get from the libbpf
BTF dumper.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# Branch counter abbr list:
# cpu_core/branch-instructions/pp = A
# cpu_core/branches/ = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter
# ............... .............. ........... .......... ..............
44.54% 727.1K 0.00% 1 |+ |+ |
36.31% 592.7K 0.00% 2 |+ |+ |
17.83% 291.1K 0.00% 1 |+ |+ |
The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.
For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.
Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.
Committer notes:
While testing I couldn't s ee those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:
<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.
The "The branch counter is not available" message is expected.
Here is the 'perf evlist' result from my Lunar Lake machine,
Kan Liang [Sun, 8 Sep 2024 20:28:47 +0000 (13:28 -0700)]
perf evlist: Print hint for group
An event group is a critical relationship. There is a -g option that can
display the relationship. But it's hard for a user to know when should
this option be applied.
If there is an event group in the perf record, print a hint to suggest
the user apply the -g to display the group information.
With the patch,
$ perf record -e "{cycles,instructions},instructions" sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ]
$
$ perf evlist
cycles
instructions
instructions
# Tip: use 'perf evlist -g' to show group information
root@number:~# perf evlist -g
{cpu_core/branch-instructions/pp,cpu_core/branches/}
dummy:u
root@number:~# perf evlist
cpu_core/branch-instructions/pp
cpu_core/branches/
dummy:u
# Tip: use 'perf evlist -g' to show group information
root@number:~#
Then for something _without_ a group, no hint:
root@number:~# perf record ls
<SNIP>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
root@number:~# perf evlist
cpu_atom/cycles/P
cpu_core/cycles/P
dummy:u
root@number:~#
No suggestion, good.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/ Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sam James [Sun, 8 Sep 2024 18:46:41 +0000 (19:46 +0100)]
tools: Drop nonsensical -O6
-O6 is very much not-a-thing. Really, this should've been dropped
entirely in 49b3cd306e60b9d8 ("tools: Set the maximum optimization level
according to the compiler being used") instead of just passing it for
not-Clang.
Just collapse it down to -O3, instead of "-O6 unless Clang, in which case
-O3".
GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast,
which is a good thing, given -Ofast has specific (non-)requirements for
code built using it. So, this does nothing except look a bit daft.
Remove the silliness and also save a few lines in the Makefiles accordingly.
Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com> Signed-off-by: Sam James <sam@gentoo.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 7 Sep 2024 05:08:19 +0000 (22:08 -0700)]
perf pmu: To info add event_type_desc
All PMU events are assumed to be "Kernel PMU event", however, this
isn't true for fake PMUs and won't be true with the addition of more
software PMUs. Make the PMU's type description name configurable -
largely for printing callbacks.
Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 7 Sep 2024 05:08:18 +0000 (22:08 -0700)]
perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.
Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 7 Sep 2024 05:08:17 +0000 (22:08 -0700)]
perf pmus: Fake PMU clean up
Rather than passing a fake PMU around, just pass that the fake PMU
should be used - true when doing testing. Move the fake PMU into
pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so
there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made
up type number.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 7 Sep 2024 05:08:16 +0000 (22:08 -0700)]
perf list: Avoid potential out of bounds memory read
If a desc string is 0 length then -1 will be out of bounds, add a
check.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andrew Kreimer [Sat, 7 Sep 2024 13:10:01 +0000 (16:10 +0300)]
perf help: Fix a typo ("bellow")
Fix a typo in comments.
Reported-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Changbin Du [Wed, 11 Sep 2024 10:01:26 +0000 (18:01 +0800)]
perf ftrace: Detect whether ftrace is enabled on system
To make error messages more accurate, this change detects whether ftrace is
enabled on system by checking trace file "set_ftrace_pid".
Before:
# perf ftrace
failed to reset ftrace
#
After:
# perf ftrace
ftrace is not supported on this system
#
Committer testing:
Doing it in an unprivileged toolbox container on Fedora 40:
Before:
acme@number:~/git/perf-tools-next$ toolbox enter perf
⬢[acme@toolbox perf-tools-next]$ sudo su -
⬢[root@toolbox ~]# ~acme/bin/perf ftrace
failed to reset ftrace
⬢[root@toolbox ~]#
After this patch:
⬢[root@toolbox ~]# ~acme/bin/perf ftrace
ftrace is not supported on this system
⬢[root@toolbox ~]#
Maybe we could check if we are in such as situation, inside an
unprivileged container, and provide a HINT line?
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Changbin Du <changbin.du@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex
Thomas reported the vfs_getname perf tests failing on s/390, it seems it
was just to some extraneous '=' somehow getting into the regexp, remove
it, now:
root@x1:~# perf test getname
91: Add vfs_getname probe to get syscall args filenames : Ok
93: Use vfs_getname probe to get syscall args filenames : FAILED!
126: Check open filename arg using perf trace + vfs_getname : Ok
root@x1:~#
Second one remains a mistery, have to take some time to nail it down.
Reported-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com>, Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So lets make that the required version, if you happen to have a slightly
older version where this work, please report so that we can adjust the
minimum required version.
Reported-by: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace
We need to decide where to copy syscall arg contents, if at the
syscalls:sys_entry hook, meaning is something that is coming from
user to kernel space, or if it is a response, i.e. if it is something
the _kernel_ is filling in and thus going to userspace.
Since we have 'const' used in those syscalls, and unsure about this
being consistent, doing:
Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yang Li [Tue, 10 Sep 2024 00:55:22 +0000 (08:55 +0800)]
perf parse-events: Remove duplicated include in parse-events.c
The header files parse-events.h is included twice in parse-events.c,
so one inclusion of each can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822 Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 9 Sep 2024 20:37:40 +0000 (13:37 -0700)]
perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for
each call chain location, the map for the sample IP is wanted so that
build IDs and the like can be injected. Make gathering the symbol in the
callchain_cursor optional.
For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to
29.6MB by avoiding loading symbols.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 9 Sep 2024 20:37:39 +0000 (13:37 -0700)]
perf inject: Lazy build-id mmap2 event insertion
Add -B option that lazily inserts mmap2 events thereby dropping all
mmap events without samples. This is similar to the behavior of -b
where only build_id events are inserted when a dso is accessed in a
sample.
File size savings can be significant in system-wide mode, consider:
$ perf record -g -a -o perf.data sleep 1
$ perf inject -B -i perf.data -o perf.new.data
$ ls -al perf.data perf.new.data 5147049 perf.data 2248493 perf.new.data
Give test coverage of the new option in pipe test.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 9 Sep 2024 20:37:38 +0000 (13:37 -0700)]
perf inject: Add new mmap2-buildid-all option
Add an option that allows all mmap or mmap2 events to be rewritten as
mmap2 events with build IDs.
This is similar to the existing -b/--build-ids and --buildid-all options
except instead of adding a build_id event an existing mmap/mmap2 event
is used as a template and a new mmap2 event synthesized from it.
As mmap2 events are typical this avoids the insertion of build_id
events.
Add test coverage to the pipe test.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 9 Sep 2024 20:37:37 +0000 (13:37 -0700)]
perf inject: Fix build ID injection
Build ID injection wasn't inserting a sample ID and aligning events to
64 bytes rather than 8. No sample ID means events are unordered and two
different build_id events for the same path, as happens when a file is
replaced, can't be differentiated.
Add in sample ID insertion for the build_id events alongside some
refactoring. The refactoring better aligns the function arguments for
different use cases, such as synthesizing build_id events without
needing to have a dso. The misc bits are explicitly passed as with
callchains the maps/dsos may span user and kernel land, so using
sample->cpumode isn't good enough.
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 9 Sep 2024 21:42:51 +0000 (14:42 -0700)]
perf annotate-data: Add pr_debug_scope()
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.
$ perf --debug type-profile annotate --data-type
...
-----------------------------------------------------------
find data type for 0(reg0, reg12) at set_task_cpu+0xdd
CU for kernel/sched/core.c (die:0x1268dae)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:12b6d28) [inlined] set_task_rq <<<--- (here)
bb: [9f - dd]
var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 9 Sep 2024 21:42:50 +0000 (14:42 -0700)]
perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction
which has no memory access. But it actually saves a return address
into stack. It should be considered as a stack operation like RET
instruction.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Tue, 10 Sep 2024 14:04:01 +0000 (15:04 +0100)]
perf build: Remove unused feature test target
llvm-version was removed in commit 56b11a2126bf ("perf bpf: Remove
support for embedding clang for compiling BPF events (-e foo.c)") but
some parts were left in the Makefile so finish removing them.
Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Manu Bretelle <chantr4@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20240910140405.568791-2-james.clark@linaro.org
[ Removed one leftover, 'llvm-version' from FEATURE_TESTS_EXTRA ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Tue, 10 Sep 2024 14:04:00 +0000 (15:04 +0100)]
perf build: Autodetect minimum required llvm-dev version
The new LLVM addr2line feature requires a minimum version of 13 to
compile. Add a feature check for the version so that NO_LLVM=1 doesn't
need to be explicitly added. Leave the existing llvm feature check
intact because it's used by tools other than Perf.
This fixes the following compilation error when the llvm-dev version
doesn't match:
util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso*, const char*, u64)':
util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress'
178 | addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0);
Fixes: c3f8644c21df9b7d ("perf report: Support LLVM for addr2line()") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Manu Bretelle <chantr4@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Better than before, still needs improvements in the configurability of
the libbpf BTF dumper to get it to the strace output standard.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuBQI-f8CGpuhIdH@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Support collecting 'union's with the BPF augmenter
And reuse the BTF based struct pretty printer, with that we can offer
initial support for the 'bpf' syscall's second argument, a 'union
bpf_attr' pointer.
But this is not that satisfactory as the libbpf btf dumper will pretty
print _all_ the union, we need to have a way to say that the first arg
selects the type for the union member to be pretty printed, something
like what pahole does translating the PERF_RECORD_ selector into a name,
and using that name to find a matching struct.
In the case of 'union bpf_attr' it would map PROG_LOAD to one of the
union members, but unfortunately there is no such mapping:
So this is one case where BTF gets us only that far, not getting all
the way to automate the pretty printing of unions designed like 'union
bpf_attr', we will need a custom pretty printer for this union, as using
the libbpf union BTF dumper is way too verbose:
For simpler unions this may be better than not seeing any payload, so
keep it there.
Acked-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuBLat8cbadILNLA@x1
[ Removed needless parenteses in the if block leading to the trace__btf_scnprintf() call, as per Howard's review comments ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Howard Chu [Sat, 24 Aug 2024 16:33:21 +0000 (00:33 +0800)]
perf trace: Add --force-btf for debugging
If --force-btf is enabled, prefer btf_dump general pretty printer to
perf trace's customized pretty printers.
Mostly for debug purposes.
Committer testing:
diff before/after shows we need several improvements to be able to
compare the changes, first we need to cut off/disable mutable data such
as pids and timestamps, then what is left are the buffer addresses
passed from userspace, returned from kernel space, maybe we can ask
'perf trace' to go on making those reproducible.
That would entail a Pointer Address Translation (PAT) like for
networking, that would, for simple, reproducible if not for these
details, workloads, that we would then use in our regression tests.
Enough digression, this is one such diff:
openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3
-fstat(fd: 3, statbuf: 0x7fff01f212a0) = 0
-read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 2998
-read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 0
+fstat(fd: 3, statbuf: 0x7ffc163cf0e0) = 0
+read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 2998
+read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 0
close(fd: 3) = 0
openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
@@ -45,7 +45,7 @@
openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
-{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0
+(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) =
The problem more close to our hands is to make the libbpf BTF pretty
printer to have a mode that closely resembles what we're trying to
resemble: strace output.
Being able to run something with 'perf trace' and with 'strace' and get
the exact same output should be of interest of anybody wanting to have
strace and 'perf trace' regression tested against each other.
That last part is 'perf trace' shot at being something so useful as
strace... ;-)
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Howard Chu [Sat, 24 Aug 2024 16:33:20 +0000 (00:33 +0800)]
perf trace: Collect augmented data using BPF
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.
Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:
string: 1 -> size of string (till null)
struct: size of struct -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)
After reading from user space, we output the augmented data using
bpf_perf_event_output().
If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().
I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.
Howard Chu [Sat, 24 Aug 2024 16:33:19 +0000 (00:33 +0800)]
perf trace: Pretty print buffer data
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.
Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).
For character that has a bigger value than 127, we print the digits
instead of the character itself as well.
Committer notes:
Simplified the buffer scnprintf to avoid using multiple buffers as
discussed in the patch review thread.
We can't really all 'buf' args to SCA_BUF as we're collecting so far
just on the sys_enter path, so we would be printing the previous 'read'
arg buffer contents, not what the kernel puts there.
Howard Chu [Sat, 24 Aug 2024 16:33:18 +0000 (00:33 +0800)]
perf trace: Pretty print struct data
Change the arg->augmented.args to arg->augmented.args->value to skip the
header for customized pretty printers, since we collect data in BPF
using the general augment_sys_enter(), which always adds the header.
Use btf_dump API to pretty print augmented struct pointer.
Prefer existed pretty-printer than btf general pretty-printer.
set compact = true and skip_names = true, so that no newline character
and argument name are printed.
Committer notes:
Simplified the btf_dump_snprintf callback to avoid using multiple
buffers, as discussed in the thread accessible via the Link tag below.
I.e. show the type and struct field names according to that tunable, we
probably need another tunable just for this, but for now if the user
wants to see syscall names in addition to its value, it makes sense to
see the struct field names according to that tunable.
Committer testing:
The following have explicitely set beautifiers (SCA_FILENAME,
SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we
have been wiring up the "renameat2" ("renameat" until recently), so it
doesn't use the introduced generic fallback (btf_struct_scnprintf(), see
the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature
rich beautifiers, that are not using BTF):
Now if we use it even for the ones we have a specific beautifier in
tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all
structs, by adding the following patch:
#define AF_UNIX 1 /* Unix domain sockets */
#define AF_LOCAL 1 /* POSIX name for AF_UNIX */
#define AF_INET 2 /* Internet IP Protocol */
<SNIP>
#define AF_INET6 10 /* IP version 6 */
And 'D' == 68, so the preexisting sockaddr BPF collector is working with
the new generic BTF pretty printer (btf_struct_scnprintf()), its just
that it doesn't know about 'struct sockaddr' besides what is in BTF,
i.e. its an array of bytes, not an IPv4 address that needs extra
massaging.
We need to work with the libbpf btf dump api to get one output that
matches the 'perf trace'/strace expectations/format, but having this in
this current form is already an improvement to 'perf trace', so lets
improve from what we have.
Howard Chu [Sat, 24 Aug 2024 16:33:16 +0000 (00:33 +0800)]
perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;
if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;
if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF */
Committer notes:
Moved syscall_arg_fmt__cache_btf_struct() from a ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.
Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:
builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
3725 | int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
|
We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.
Committer testing:
Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:
perf trace: Introduce SCA_TIMESPEC_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Introduce SCA_SOCKADDR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Introduce SCA_PERF_ATTR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Mark which syscall arguments go from user space to kernel space
We need to know where to collect it in the BPF augmenters, if in the
sys_enter hook or in the sys_exit hook.
Start with the SCA_FILENAME one, that is just from user to kernel space.
The alternative, better, but takes a bit more time than I have now, is
to use the __user information that is already in the syscall args and
encoded in BTF via a tag, do it later.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Use a common encoding for augmented arguments, with size + error + payload
We were using a more compact format, without explicitely encoding the
size and possible error in the payload for an argument.
To do it generically, at least as Howard Chu did in his GSoC activities,
it is more convenient to use the same model that was being used for
string arguments, passing { size, error, payload }.
So use that for the non string syscall args we have so far:
struct timespec
struct perf_event_attr
struct sockaddr (this one has even a variable size)
With this in place we have the userspace pretty printers:
Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
sending its generic payload and thus we'll use them instead of a generic
libbpf btf_dump interface that doesn't know about about the sockaddr
mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it
as a (useful) fallback that prints just basic types until we put in
place a more sophisticated pretty printer infrastructure that associates
synthesized enums to struct fields using the header scrapers we have in
tools/perf/trace/beauty/, some of them in this list:
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace augmented_syscalls.bpf: Move the renameat aumenter to renameat2, temporarily
While trying to shape Howard Chu's generic BPF augmenter transition into
the codebase I got stuck with the renameat2 syscall.
Until I noticed that the attempt at reusing augmenters were making it
use the 'openat' syscall augmenter, that collect just one string syscall
arg, for the 'renameat2' syscall, that takes two strings.
So, for the moment, just to help in this transition period, since
'renameat2' is what is used these days in the 'mv' utility, just make
the BPF collector be associated with the more widely used syscall,
hopefully the transition to Howard's generic BPF augmenter will cure
this, so get this out of the way for now!
So now we still have that odd "reuse", but for something we're not
testing so won't get in the way anymore:
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Thu, 5 Sep 2024 17:07:36 +0000 (10:07 -0700)]
perf mem: Fix missed p-core mem events on ADL and RPL
The p-core mem events are missed when launching 'perf mem record' on ADL
and RPL.
root@number:~# perf mem record sleep 1
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB perf.data ]
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
A variable 'record' in the 'struct perf_mem_event' is to indicate
whether a mem event in a mem_events[] should be recorded. The current
code only configure the variable for the first eligible PMU.
It's good enough for a non-hybrid machine or a hybrid machine which has
the same mem_events[].
However, if a different mem_events[] is used for different PMUs on a
hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never
get a chance to be set.
The mem_events[] of the second PMU are always ignored.
'perf mem' doesn't support the per-PMU configuration now. A per-PMU
mem_events[] 'record' variable doesn't make sense. Make it global.
That could also avoid searching for the per-PMU mem_events[] via
perf_pmu__mem_events_ptr every time.
Kan Liang [Thu, 5 Sep 2024 17:07:35 +0000 (10:07 -0700)]
perf mem: Check mem_events for all eligible PMUs
The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.
However, it may bring issues if a hybrid machine has a different
mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.
The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.
Fixes: abbdd79b786e036e ("perf mem: Clean up perf_mem_events__name()") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Running a script that processes PEBS records gives buffer overflows
in valgrind.
The problem is that the allocation of the register string doesn't
include the terminating 0 byte. Fix this.
I also replaced the very magic "28" with a more reasonable larger buffer
that should fit all registers. There's no need to conserve memory here.
==2106591== Memcheck, a memory error detector
==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov
==2106591==
==2106591== Invalid write of size 1
==2106591== at 0x713354: regs_map (trace-event-python.c:748)
==2106591== by 0x7134EB: set_regs_in_dict (trace-event-python.c:784)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid read of size 1
==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502)
==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
==2106591== by 0x7134F7: set_regs_in_dict (trace-event-python.c:786)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid write of size 1
==2106591== at 0x713354: regs_map (trace-event-python.c:748)
==2106591== by 0x713539: set_regs_in_dict (trace-event-python.c:789)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid read of size 1
==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502)
==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
==2106591== by 0x713545: set_regs_in_dict (trace-event-python.c:791)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
73056 total, 29 ignored
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes from perf-tools/perf-tools, some of which were also in
perf-tools-next but were then indentified as being more appropriate to
go sooner, to fix regressions in v6.11.
Resolve a simple merge conflict in tools/perf/tests/pmu.c where a more
future proof approach to initialize all fields of a struct was used in
perf-tools-next, the one that is going into v6.11 is enough for the
segfault it addressed (using an uninitialized test_pmu.alias field).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix crash when btf_parse_base() returns an error (Martin Lau)
- Fix out of bounds access in btf_name_valid_section() (Jeongjun Park)
* tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add a selftest to check for incorrect names
bpf: add check for invalid name in btf_name_valid_section()
bpf: Fix a crash when btf_parse_base() returns an error pointer
Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes (including some of the widespread
work on fixing missing ID tables for module autoloading and the revert
of some problematic PM work in spi-rockchip), some improvements to the
MAINTAINERS information for the NXP drivers and the addition of a new
device ID to spidev"
* tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
MAINTAINERS: SPI: Add freescale lpspi maintainer information
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
spi: spidev: Add missing spi_device_id for jg10309-01
spi: bcm63xx: Enable module autoloading
spi: intel: Add check devm_kasprintf() returned value
spi: spidev: Add an entry for elgin,jg10309-01
spi: rockchip: Resolve unbalanced runtime PM / system PM handling
Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Doug Anderson for a missing stub, required to fix the build
for some newly added users of devm_regulator_bulk_get_const() in
!REGULATOR configurations"
* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix builds for nightly compiler users now that 'new_uninit' was
split into new features by using an alternative approach for the
code that used what is now called the 'box_uninit_write' feature
- Allow the 'stable_features' lint to preempt upcoming warnings about
them, since soon there will be unstable features that will become
stable in nightly compilers
- Export bss symbols too
'kernel' crate:
- 'block' module: fix wrong usage of lockdep API
'macros' crate:
- Provide correct provenance when constructing 'THIS_MODULE'
Documentation:
- Remove unintended indentation (blockquotes) in generated output
- Fix a couple typos
MAINTAINERS:
- Remove Wedson as Rust maintainer
- Update Andreas' email"
* tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
MAINTAINERS: update Andreas Hindborg's email address
MAINTAINERS: Remove Wedson as Rust maintainer
rust: macros: provide correct provenance when constructing THIS_MODULE
rust: allow `stable_features` lint
docs: rust: remove unintended blockquote in Quick Start
rust: alloc: eschew `Box<MaybeUninit<T>>::write`
rust: kernel: fix typos in code comments
docs: rust: remove unintended blockquote in Coding Guidelines
rust: block: fix wrong usage of lockdep API
rust: kbuild: fix export of bss symbols
Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix adding a new fgraph callback after function graph tracing has
already started.
If the new caller does not initialize its hash before registering the
fgraph_ops, it can cause a NULL pointer dereference. Fix this by
adding a new parameter to ftrace_graph_enable_direct() passing in the
newly added gops directly and not rely on using the fgraph_array[],
as entries in the fgraph_array[] must be initialized.
Assign the new gops to the fgraph_array[] after it goes through
ftrace_startup_subops() as that will properly initialize the
gops->ops and initialize its hashes.
- Fix a memory leak in fgraph storage memory test.
If the "multiple fgraph storage on a function" boot up selftest fails
in the registering of the function graph tracer, it will not free the
memory it allocated for the filter. Break the loop up into two where
it allocates the filters first and then registers the functions where
any errors will do the appropriate clean ups.
- Only clear the timerlat timers if it has an associated kthread.
In the rtla tool that uses timerlat, if it was killed just as it was
shutting down, the signals can free the kthread and the timer. But
the closing of the timerlat files could cause the hrtimer_cancel() to
be called on the already freed timer. As the kthread variable is is
set to NULL when the kthreads are stopped and the timers are freed it
can be used to know not to call hrtimer_cancel() on the timer if the
kthread variable is NULL.
- Use a cpumask to keep track of osnoise/timerlat kthreads
The timerlat tracer can use user space threads for its analysis. With
the killing of the rtla tool, the kernel can get confused between if
it is using a user space thread to analyze or one of its own kernel
threads. When this confusion happens, kthread_stop() can be called on
a user space thread and bad things happen. As the kernel threads are
per-cpu, a bitmask can be used to know when a kernel thread is used
or when a user space thread is used.
- Add missing interface_lock to osnoise/timerlat stop_kthread()
The stop_kthread() function in osnoise/timerlat clears the osnoise
kthread variable, and if it was a user space thread does a put_task
on it. But this can race with the closing of the timerlat files that
also does a put_task on the kthread, and if the race happens the task
will have put_task called on it twice and oops.
- Add cond_resched() to the tracing_iter_reset() loop.
The latency tracers keep writing to the ring buffer without resetting
when it issues a new "start" event (like interrupts being disabled).
When reading the buffer with an iterator, the tracing_iter_reset()
sets its pointer to that start event by walking through all the
events in the buffer until it gets to the time stamp of the start
event. In the case of a very large buffer, the loop that looks for
the start event has been reported taking a very long time with a non
preempt kernel that it can trigger a soft lock up warning. Add a
cond_resched() into that loop to make sure that doesn't happen.
- Use list_del_rcu() for eventfs ei->list variable
It was reported that running loops of creating and deleting kprobe
events could cause a crash due to the eventfs list iteration hitting
a LIST_POISON variable. This is because the list is protected by SRCU
but when an item is deleted from the list, it was using list_del()
which poisons the "next" pointer. This is what list_del_rcu() was to
prevent.
* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
tracing/timerlat: Only clear timer if a kthread exists
tracing/osnoise: Use a cpumask to know what threads are kthreads
eventfs: Use list_del_rcu() for SRCU protected list variable
tracing: Avoid possible softlockup in tracing_iter_reset()
tracing: Fix memory leak in fgraph storage selftest
tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
Memory state around the buggy address: ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
--subscribe "monitor" --sleep 10
fails with:
File "/repo/./tools/net/ynl/cli.py", line 109, in main
ynl.check_ntf()
File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19
Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).
Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.
Merge tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/pmf: ASUS GA403 quirk matching tweak
- dell-smbios: Fix to the init function rollback path
* tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd: pmf: Make ASUS GA403 quirk generic
platform/x86: dell-smbios: Fix error path in dell_smbios_init()
Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix fromShuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
Steven Rostedt [Thu, 5 Sep 2024 15:33:59 +0000 (11:33 -0400)]
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
The timerlat interface will get and put the task that is part of the
"kthread" field of the osn_var to keep it around until all references are
released. But here's a race in the "stop_kthread()" code that will call
put_task_struct() on the kthread if it is not a kernel thread. This can
race with the releasing of the references to that task struct and the
put_task_struct() can be called twice when it should have been called just
once.
Take the interface_lock() in stop_kthread() to synchronize this change.
But to do so, the function stop_per_cpu_kthreads() needs to change the
loop from for_each_online_cpu() to for_each_possible_cpu() and remove the
cpu_read_lock(), as the interface_lock can not be taken while the cpu
locks are held. The only side effect of this change is that it may do some
extra work, as the per_cpu variables of the offline CPUs would not be set
anyway, and would simply be skipped in the loop.
Remove unneeded "return;" in stop_kthread().
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Link: https://lore.kernel.org/20240905113359.2b934242@gandalf.local.home Fixes: e88ed227f639e ("tracing/timerlat: Add user-space interface") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 5 Sep 2024 12:53:30 +0000 (08:53 -0400)]
tracing/timerlat: Only clear timer if a kthread exists
The timerlat tracer can use user space threads to check for osnoise and
timer latency. If the program using this is killed via a SIGTERM, the
threads are shutdown one at a time and another tracing instance can start
up resetting the threads before they are fully closed. That causes the
hrtimer assigned to the kthread to be shutdown and freed twice when the
dying thread finally closes the file descriptors, causing a use-after-free
bug.
Only cancel the hrtimer if the associated thread is still around. Also add
the interface_lock around the resetting of the tlat_var->kthread.
Note, this is just a quick fix that can be backported to stable. A real
fix is to have a better synchronization between the shutdown of old
threads and the starting of new ones.
Steven Rostedt [Wed, 4 Sep 2024 14:34:28 +0000 (10:34 -0400)]
tracing/osnoise: Use a cpumask to know what threads are kthreads
The start_kthread() and stop_thread() code was not always called with the
interface_lock held. This means that the kthread variable could be
unexpectedly changed causing the kthread_stop() to be called on it when it
should not have been, leading to:
while true; do
rtla timerlat top -u -q & PID=$!;
sleep 5;
kill -INT $PID;
sleep 0.001;
kill -TERM $PID;
wait $PID;
done
This is because it would mistakenly call kthread_stop() on a user space
thread making it "exit" before it actually exits.
Since kthreads are created based on global behavior, use a cpumask to know
when kthreads are running and that they need to be shutdown before
proceeding to do new work.
Steven Rostedt [Wed, 4 Sep 2024 17:16:05 +0000 (13:16 -0400)]
eventfs: Use list_del_rcu() for SRCU protected list variable
Chi Zhiling reported:
We found a null pointer accessing in tracefs[1], the reason is that the
variable 'ei_child' is set to LIST_POISON1, that means the list was
removed in eventfs_remove_rec. so when access the ei_child->is_freed, the
panic triggered.
by the way, the following script can reproduce this panic
loop1 (){
while true
do
echo "p:kp submit_bio" > /sys/kernel/debug/tracing/kprobe_events
echo "" > /sys/kernel/debug/tracing/kprobe_events
done
}
loop2 (){
while true
do
tree /sys/kernel/debug/tracing/events/kprobes/
done
}
loop1 &
loop2
The issue is that list_del() is used on an SRCU protected list variable
before the synchronization occurs. This can poison the list pointers while
there is a reader iterating the list.
This is simply fixed by using list_del_rcu() that is specifically made for
this purpose.
Zheng Yejian [Tue, 27 Aug 2024 12:46:54 +0000 (20:46 +0800)]
tracing: Avoid possible softlockup in tracing_iter_reset()
In __tracing_open(), when max latency tracers took place on the cpu,
the time start of its buffer would be updated, then event entries with
timestamps being earlier than start of the buffer would be skipped
(see tracing_iter_reset()).
Softlockup will occur if the kernel is non-preemptible and too many
entries were skipped in the loop that reset every cpu buffer, so add
cond_resched() to avoid it.
Cc: stable@vger.kernel.org Fixes: 2f26ebd549b9a ("tracing: use timestamp to determine start of latency traces") Link: https://lore.kernel.org/20240827124654.3817443-1-zhengyejian@huaweicloud.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stefan Wahren [Thu, 5 Sep 2024 11:15:37 +0000 (13:15 +0200)]
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
The commit 783bf5d09f86 ("spi: spi-fsl-lpspi: limit PRESCALE bit in
TCR register") doesn't implement the prescaler maximum as intended.
The maximum allowed value for i.MX93 should be 1 and for i.MX7ULP
it should be 7. So this needs also a adjustment of the comparison
in the scldiv calculation.
sched: sch_cake: fix bulk flow accounting logic for host fairness
In sch_cake, we keep track of the count of active bulk flows per host,
when running in dst/src host fairness mode, which is used as the
round-robin weight when iterating through flows. The count of active
bulk flows is updated whenever a flow changes state.
This has a peculiar interaction with the hash collision handling: when a
hash collision occurs (after the set-associative hashing), the state of
the hash bucket is simply updated to match the new packet that collided,
and if host fairness is enabled, that also means assigning new per-host
state to the flow. For this reason, the bulk flow counters of the
host(s) assigned to the flow are decremented, before new state is
assigned (and the counters, which may not belong to the same host
anymore, are incremented again).
Back when this code was introduced, the host fairness mode was always
enabled, so the decrement was unconditional. When the configuration
flags were introduced the *increment* was made conditional, but
the *decrement* was not. Which of course can lead to a spurious
decrement (and associated wrap-around to U16_MAX).
AFAICT, when host fairness is disabled, the decrement and wrap-around
happens as soon as a hash collision occurs (which is not that common in
itself, due to the set-associative hashing). However, in most cases this
is harmless, as the value is only used when host fairness mode is
enabled. So in order to trigger an array overflow, sch_cake has to first
be configured with host fairness disabled, and while running in this
mode, a hash collision has to occur to cause the overflow. Then, the
qdisc has to be reconfigured to enable host fairness, which leads to the
array out-of-bounds because the wrapped-around value is retained and
used as an array index. It seems that syzbot managed to trigger this,
which is quite impressive in its own right.
This patch fixes the issue by introducing the same conditional check on
decrement as is used on increment.
The original bug predates the upstreaming of cake, but the commit listed
in the Fixes tag touched that code, meaning that this patch won't apply
before that.
Fixes: 712639929912 ("sch_cake: Make the dual modes fairer") Reported-by: syzbot+7fe7b81d602cc1e6b94d@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20240903160846.20909-1-toke@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 30 Aug 2024 17:14:42 +0000 (10:14 -0700)]
docs: netdev: document guidance on cleanup.h
Document what was discussed multiple times on list and various
virtual / in-person conversations. guard() being okay in functions
<= 20 LoC is a bit of my own invention. If the function is trivial
it should be fine, but feel free to disagree :)
We'll obviously revisit this guidance as time passes and we and other
subsystems get more experience.
Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240830171443.3532077-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 5 Sep 2024 00:37:37 +0000 (17:37 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
ice: fix synchronization between .ndo_bpf() and reset
Larysa Zaremba says:
PF reset can be triggered asynchronously, by tx_timeout or by a user. With some
unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and
modify XDP rings at the same time, causing system crash.
The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
ice: check for XDP rings instead of bpf program when unconfiguring
ice: protect XDP configuration with a mutex
ice: move netif_queue_set_napi to rtnl-protected sections
====================
Jakub Kicinski [Thu, 5 Sep 2024 00:14:11 +0000 (17:14 -0700)]
Merge tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.11
Hopefully final fixes for v6.11 and this time only fixes to ath11k
driver. We need to revert hibernation support due to reported
regressions and we have a fix for kernel crash introduced in
v6.11-rc1.
* tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
MAINTAINERS: wifi: cw1200: add net-cw1200.h
Revert "wifi: ath11k: support hibernation"
Revert "wifi: ath11k: restore country code during resume"
wifi: ath11k: fix NULL pointer dereference in ath11k_mac_get_eirp_power()
====================
Sean Anderson [Tue, 3 Sep 2024 17:51:41 +0000 (13:51 -0400)]
net: xilinx: axienet: Fix race in axienet_stop
axienet_dma_err_handler can race with axienet_stop in the following
manner:
CPU 1 CPU 2
====================== ==================
axienet_stop()
napi_disable()
axienet_dma_stop()
axienet_dma_err_handler()
napi_disable()
axienet_dma_stop()
axienet_dma_start()
napi_enable()
cancel_work_sync()
free_irq()
Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
Jonas Gorski [Tue, 3 Sep 2024 08:19:57 +0000 (10:19 +0200)]
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().
If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.
If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.
Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hayes Wang [Tue, 3 Sep 2024 06:33:33 +0000 (14:33 +0800)]
r8152: fix the firmware doesn't work
generic_ocp_write() asks the parameter "size" must be 4 bytes align.
Therefore, write the bp would fail, if the mac->bp_num is odd. Align the
size to 4 for fixing it. The way may write an extra bp, but the
rtl8152_is_fw_mac_ok() makes sure the value must be 0 for the bp whose
index is more than mac->bp_num. That is, there is no influence for the
firmware.
Besides, I check the return value of generic_ocp_write() to make sure
everything is correct.
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host. [0]
The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol
in struct fou.
When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data.
Then, the tunnel socket is destroyed after a single RCU grace period.
So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.
Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.
Merge tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Fix a typo in the rebalance accounting changes
- BCH_SB_MEMBER_INVALID: small on disk format feature which will be
needed for full erasure coding support; this is only the minimum so
that 6.11 can handle future versions without barfing.
* tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs:
bcachefs: BCH_SB_MEMBER_INVALID
bcachefs: fix rebalance accounting
====================
bpf: fix incorrect name check pass logic in btf_name_valid_section
This patch was written to fix an issue where btf_name_valid_section() would
not properly check names with certain conditions and would throw an OOB vuln.
And selftest was added to verify this patch.
====================
Fix two inconsistencies in feature names as discussed in [1]:
1. Rename "dwarf-unwind-support" to "dwarf-unwind"
2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't
look related, change the feature name to 'auxtrace' to match the
macro name, as 'get_cpuid' string is not used anywhere to check the
feature presence
perf tests probe_vfs_getname.sh: Update to use 'perf check feature'
In probe_vfs_getname.sh, current we use "perf record --dry-run"
to check for libtraceevent and skip the test if perf is not
build with libtraceevent. Change the check to use "perf check feature"
option
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf version: Update --build-options to use 'supported_features' array
Now that the feature list has been duplicated in a global
'supported_features' array, use that array instead of manually checking
status of built-in features.
This helps in being consistent with commands such as 'perf check feature',
so commands can use the same array, and any new feature can be added at
one place, in the 'supported_features' array
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-4-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A number of small fixes for the late cycle:
- Two more build fixes on 32-bit archs
- Fixed a segfault during perf test
- Fixed spinlock/rwlock accounting bug in perf lock contention"
* tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf daemon: Fix the build on more 32-bit architectures
perf python: include "util/sample.h"
perf lock contention: Fix spinlock and rwlock accounting
perf test pmu: Set uninitialized PMU alias to null