Will Deacon [Tue, 6 Dec 2022 11:25:43 +0000 (11:25 +0000)]
Merge branch 'for-next/selftests' into for-next/core
* for-next/selftests:
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
kselftest/arm64: Set test names prior to starting children
kselftest/arm64: Use preferred form for predicate load/stores
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: Print ASCII version of unknown signal frame magic values
kselftest/arm64: Remove validation of extra_context from TODO
kselftest/arm64: Provide progress messages when signalling children
kselftest/arm64: Check that all children are producing output in fp-stress
Will Deacon [Tue, 6 Dec 2022 11:20:21 +0000 (11:20 +0000)]
Merge branch 'for-next/kprobes' into for-next/core
* for-next/kprobes:
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
Will Deacon [Tue, 6 Dec 2022 11:16:20 +0000 (11:16 +0000)]
Merge branch 'for-next/kdump' into for-next/core
* for-next/kdump:
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Will Deacon [Tue, 6 Dec 2022 10:44:07 +0000 (10:44 +0000)]
Merge branch 'for-next/cpufeature' into for-next/core
* for-next/cpufeature:
kselftest/arm64: Add SVE 2.1 to hwcap test
arm64/hwcap: Add support for SVE 2.1
kselftest/arm64: Add FEAT_RPRFM to the hwcap test
arm64/hwcap: Add support for FEAT_RPRFM
kselftest/arm64: Add FEAT_CSSC to the hwcap selftest
arm64/hwcap: Add support for FEAT_CSSC
arm64: Enable data independent timing (DIT) in the kernel
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
Return DBG_HOOK_ERROR if kprobes can not handle a BRK because it
fails to find a kprobe corresponding to the address.
Since arm64 kprobes uses stop_machine based text patching for removing
BRK, it ensures all running kprobe_break_handler() is done at that point.
And after removing the BRK, it removes the kprobe from its hash list.
Thus, if the kprobe_break_handler() fails to find kprobe from hash list,
there is a bug.
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
Since arm64's do_page_fault() can handle the page fault correctly
than kprobe_fault_handler() according to the context, let it handle
the page fault instead of simply call fixup_exception() in the
kprobe_fault_handler().
arm64: Prohibit instrumentation on arch_stack_walk()
Mark arch_stack_walk() as noinstr instead of notrace and inline functions
called from arch_stack_walk() as __always_inline so that user does not
put any instrumentations on it, because this function can be used from
return_address() which is used by lockdep.
Without this, if the kernel built with CONFIG_LOCKDEP=y, just probing
arch_stack_walk() via <tracefs>/kprobe_events will crash the kernel on
arm64.
junhua huang [Fri, 2 Dec 2022 07:11:10 +0000 (15:11 +0800)]
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
We use uprobe in aarch64_be, which we found the tracee task would exit
due to SIGILL when we enable the uprobe trace.
We can see the replace inst from uprobe is not correct in aarch big-endian.
As in Armv8-A, instruction fetches are always treated as little-endian,
we should treat the UPROBE_SWBP_INSN as little-endian。
The test case is as following。
bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null &
bash-4.4# cd /sys/kernel/debug/tracing/
bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events
bash-4.4# echo 1 > events/uprobes/enable
bash-4.4#
bash-4.4# ps
PID TTY TIME CMD
140 ? 00:00:01 bash
237 ? 00:00:00 ps
[1]+ Illegal instruction ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null
Anshuman Khandual [Fri, 2 Dec 2022 01:56:11 +0000 (07:26 +0530)]
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
__hw_perf_event_init() already calls armpmu->map_event() callback, and also
returns its error code including -ENOENT, along with a debug callout. Hence
an additional armpmu->map_event() check for -ENOENT is redundant.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20221202015611.338499-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Tue, 29 Nov 2022 21:59:25 +0000 (21:59 +0000)]
kselftest/arm64: Allow epoll_wait() to return more than one result
When everything is starting up we are likely to have a lot of child
processes producing output at once. This means that we can reduce
overhead a bit by allowing epoll_wait() to return more than one
descriptor at once, it cuts down on the number of system calls we need
to do which on virtual platforms where the syscall overhead is a bit
more noticable and we're likely to have a lot more children active can
make a small but noticable difference.
On physical platforms the relatively small number of processes being run
and vastly improved speeds push the effects of this change into the
noise.
Mark Brown [Tue, 29 Nov 2022 21:59:23 +0000 (21:59 +0000)]
kselftest/arm64: Hold fp-stress children until they're all spawned
At present fp-stress has a bit of a thundering herd problem since the
children it spawns start running immediately, meaning that they can start
starving the parent process of CPU before it has even started all the
children. This is much more severe on virtual platforms since they tend to
support far more SVE and SME vector lengths, be slower in general and for
some have issues with performance when simulating multiple CPUs.
We can mitigate this problem by having all the child processes block before
starting the test program, meaning that we at least have all the child
processes started before we start heavily using CPU. We still have the same
load issues while waiting for the actual stress test programs to start up
and produce output but they're at least all ready to go before that kicks
in, resulting in substantial reductions in overall runtime on some of the
severely affected systems. One test was showing about 20% improvement.
Will Deacon [Wed, 16 Nov 2022 17:03:24 +0000 (17:03 +0000)]
firmware: arm_ffa: Move constants to header file
FF-A function IDs and error codes will be needed in the hypervisor too,
so move to them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.
Co-developed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20221116170335.2341003-2-qperret@google.com Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Thu, 17 Nov 2022 08:41:36 +0000 (16:41 +0800)]
drivers/perf: hisi: Add TLP filter support
The PMU support to filter the TLP when counting the bandwidth with below
options:
- only count the TLP headers
- only count the TLP payloads
- count both TLP headers and payloads
In the current driver it's default to count the TLP payloads only, which
will have an implicity side effects that on the traffic only have header
only TLPs, we'll get no data.
Make this user configuration through "len_mode" parameter and make it
default to count both TLP headers and payloads when user not specified.
Also update the documentation for it.
Bagas Sanjaya [Thu, 17 Nov 2022 08:41:35 +0000 (16:41 +0800)]
Documentation: perf: Indent filter options list of hisi-pcie-pmu
The "Filter options" list have a rather ugly indentation. Also, the first
paragraph after list name is rendered without separator (as continuation
from the name).
Align the list by indenting the list items and add a blank line
separator for each list name.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20221117084136.53572-4-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Mon, 28 Nov 2022 02:54:49 +0000 (08:24 +0530)]
arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
__armv8pmu_probe_pmu() returns if detected PMU is either not implemented or
implementation defined. Extracted ID_AA64DFR0_EL1_PMUVer value, when PMU is
not implemented is '0' which can be replaced with ID_AA64DFR0_EL1_PMUVer_NI
defined as '0b0000'.
Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20221128025449.39085-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Thu, 24 Nov 2022 12:07:22 +0000 (12:07 +0000)]
kselftest/arm64: Set test names prior to starting children
Since we now flush output immediately on starting children we should ensure
that the child name is set beforehand so that any output that does get
flushed from the newly created child has the name of the child attached.
Ard Biesheuvel [Tue, 22 Nov 2022 17:02:49 +0000 (18:02 +0100)]
arm64: booting: Require placement within 48-bit addressable memory
Some configurations (i.e., 64k + LVA/LPA) can tolerate a physical
placement of the kernel image outside of the 48-bit addressable region,
but given that the loader has no way of knowing whether or not the image
in question supports LVA/LPA, it currently has no choice but to place it
below the 48-bit mark.
Once we add support for LPA2, which allows 52-bit physical and virtual
addressing when using 4k or 16k pages, but in way that relies on
increasing the number of paging levels, there will be more variety in
the configurations that may or may not support this.
So redefine bit #3 in the Image header as 'must be placed within 48-bit
addressable memory', as this is the current de facto meaning.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20221122170249.2453853-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
... due to some mismatched ifdeffery, some of which checks
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and some of which checks
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, leading to some missing definitions expected
by the core code when CONFIG_DYNAMIC_FTRACE=n and consequently
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n.
There's really not much point in supporting CONFIG_DYNAMIC_FTRACE=n (AKA
static ftrace). All supported toolchains allow us to implement
DYNAMIC_FTRACE, distributions all prefer DYNAMIC_FTRACE, and both
powerpc and s390 removed support for static ftrace in commits:
0c0c52306f4792a4 ("powerpc: Only support DYNAMIC_FTRACE not static") 5d6a0163494c78ad ("s390/ftrace: enforce DYNAMIC_FTRACE if FUNCTION_TRACER is selected")
... and according to Steven, static ftrace is only supported on x86 to
allow testing that the core code still functions in this configuration.
Given that, let's simplify matters by removing arm64's support for
static ftrace. This avoids the problem originally reported, and leaves
us with less code to maintain.
Jiucheng Xu [Tue, 22 Nov 2022 08:40:28 +0000 (16:40 +0800)]
perf/amlogic: Fix build error for x86_64 allmodconfig
The driver misses including <linux/io.h>, which causes a compilation
error with x86_64 'allmodconfig':
drivers/perf/amlogic/meson_g12_ddr_pmu.c: In function 'dmc_g12_get_freq_quick':
drivers/perf/amlogic/meson_g12_ddr_pmu.c:135:15: error: implicit declaration of function 'readl' [-Werror=implicit-function-declaration]
135 | val = readl(info->pll_reg);
| ^~~~~
drivers/perf/amlogic/meson_g12_ddr_pmu.c: In function 'dmc_g12_counter_enable':
drivers/perf/amlogic/meson_g12_ddr_pmu.c:204:9: error: implicit declaration of function 'writel' [-Werror=implicit-function-declaration]
204 | writel(clock_count, info->ddr_reg[0] + DMC_MON_G12_TIMER);
| ^~~~~~
Add the missing header to fix the build.
Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20221122084028.572494-1-jiucheng.xu@amlogic.com Signed-off-by: Will Deacon <will@kernel.org>
Jiucheng Xu [Mon, 21 Nov 2022 02:15:58 +0000 (10:15 +0800)]
perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
Add support for Amlogic Meson G12 Series SOC - DDR bandwidth PMU driver
framework and interfaces. The PMU can not only monitor the total DDR
bandwidth, but also individual IP module bandwidth.
Shaokun Zhang [Fri, 18 Nov 2022 06:54:00 +0000 (14:54 +0800)]
MAINTAINERS: Update HiSilicon PMU maintainers
Now Qi Liu has left HiSilicon and will no longer access to the
necessary hardware and document, remove the mail and thanks for
her's work.
While add the new maintainer Jonathan Cameron, He is skilled with
kernel and enough knowledge of the driver.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Qi Liu <liuqi6124@gmail.com> Link: https://lore.kernel.org/r/20221118065400.48836-1-zhangshaokun@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Wed, 16 Nov 2022 14:09:15 +0000 (19:39 +0530)]
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
If a Cortex-A715 cpu sees a page mapping permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the
next instruction abort caused by permission fault.
Only user-space does executable to non-executable permission transition via
mprotect() system call which calls ptep_modify_prot_start() and ptep_modify
_prot_commit() helpers, while changing the page mapping. The platform code
can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem via doing a break-before-make TLB invalidation, for
all executable user space mappings, that go through mprotect() system call.
This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via
defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving
an opportunity to intercept user space exec mappings, and do the necessary
TLB invalidation. Similar interceptions are also implemented for HugeTLB.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Thu, 17 Nov 2022 11:41:30 +0000 (11:41 +0000)]
kselftest/arm64: Use preferred form for predicate load/stores
The preferred form of the str/ldr for predicate registers with an immediate
of zero is to omit the zero, and the clang built in assembler rejects the
zero immediate. Drop the immediate.
Zhen Lei [Wed, 16 Nov 2022 12:10:44 +0000 (20:10 +0800)]
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
For crashkernel=X without '@offset', select a region within DMA zones
first, and fall back to reserve region above DMA zones. This allows
users to use the same configuration on multiple platforms.
Zhen Lei [Wed, 16 Nov 2022 12:10:43 +0000 (20:10 +0800)]
arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Try to allocate at least 128 MiB low memory automatically for the case
that crashkernel=,high is explicitly specified, while crashkenrel=,low
is omitted. This allows users to focus more on the high memory
requirements of their business rather than the low memory requirements
of the crash kernel booting.
Mark Rutland [Thu, 3 Nov 2022 17:05:20 +0000 (17:05 +0000)]
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Florent Revest <revest@chromium.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.
This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.
Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Thu, 3 Nov 2022 17:05:17 +0000 (17:05 +0000)]
ftrace: pass fregs to arch_ftrace_set_direct_caller()
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly
necessary in the core ftrace code.
This patch changes the prototype of arch_ftrace_set_direct_caller() to
take ftrace_regs rather than pt_regs, and moves the extraction of the
pt_regs into arch_ftrace_set_direct_caller().
On x86, arch_ftrace_set_direct_caller() can be used even when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines
struct ftrace_regs. Due to this, it's necessary to define
arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete
type. I've also moved the body of arch_ftrace_set_direct_caller() after
the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct
ftrace_regs.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Wed, 16 Nov 2022 20:39:52 +0000 (14:39 -0600)]
perf: arm_cspmu: Fix module cyclic dependency
Build on arm64 allmodconfig failed with:
| depmod: ERROR: Cycle detected: arm_cspmu -> nvidia_cspmu -> arm_cspmu
| depmod: ERROR: Found 2 modules in dependency cycles!
The arm_cspmu.c provides standard functions to operate the PMU and the
vendor code provides vendor specific attributes. Both need to be built as
single kernel module.
Update the makefile to compile sources under arm_cspmu into one module.
Besar Wicaksono [Fri, 11 Nov 2022 22:23:28 +0000 (16:23 -0600)]
perf: arm_cspmu: Add support for ARM CoreSight PMU driver
Add support for ARM CoreSight PMU driver framework and interfaces.
The driver provides generic implementation to operate uncore PMU based
on ARM CoreSight PMU architecture. The driver also provides interface
to get vendor/implementation specific information, for example event
attributes and formating.
The specification used in this implementation can be found below:
* ACPI Arm Performance Monitoring Unit table:
https://developer.arm.com/documentation/den0117/latest
* ARM Coresight PMU architecture:
https://developer.arm.com/documentation/ihi0091/latest
Shang XiaoJing [Tue, 15 Nov 2022 11:55:40 +0000 (19:55 +0800)]
perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()
arm_smmu_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() failed. Remove
the callback by cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Shang XiaoJing [Tue, 15 Nov 2022 11:55:39 +0000 (19:55 +0800)]
perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
dmc620_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() failed. Remove
the callback by cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Fixes: 53c218da220c ("driver/perf: Add PMU driver for the ARM DMC-620 memory controller") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20221115115540.6245-2-shangxiaojing@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
Yuan Can [Tue, 15 Nov 2022 07:02:07 +0000 (07:02 +0000)]
drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()
tad_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() failed. Remove the callback by
cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Yuan Can [Tue, 15 Nov 2022 07:02:06 +0000 (07:02 +0000)]
perf: arm_dsu: Fix hotplug callback leak in dsu_pmu_init()
dsu_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() failed. Remove the callback by
cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Fixes: 7520fa99246d ("perf: ARM DynamIQ Shared Unit PMU support") Signed-off-by: Yuan Can <yuancan@huawei.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20221115070207.32634-2-yuancan@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 10:44:11 +0000 (10:44 +0000)]
arm64: mm: kfence: only handle translation faults
Alexander noted that KFENCE only expects to handle faults from invalid page
table entries (i.e. translation faults), but arm64's fault handling logic will
call kfence_handle_page_fault() for other types of faults, including alignment
faults caused by unaligned atomics. This has the unfortunate property of
causing those other faults to be reported as "KFENCE: use-after-free",
which is misleading and hinders debugging.
Fix this by only forwarding unhandled translation faults to the KFENCE
code, similar to what x86 does already.
Alexander has verified that this passes all the tests in the KFENCE test
suite and avoids bogus reports on misaligned atomics.
Mark Rutland [Mon, 14 Nov 2022 13:59:28 +0000 (13:59 +0000)]
arm64: insn: always inline hint generation
All users of aarch64_insn_gen_hint() (e.g. aarch64_insn_gen_nop()) pass
a constant argument and generate a constant value. Some of those users
are noinstr code (e.g. for alternatives patching).
For noinstr code it is necessary to either inline these functions or to
ensure the out-of-line versions are noinstr.
Since in all cases these are generating a constant, make them
__always_inline.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:27 +0000 (13:59 +0000)]
arm64: insn: simplify insn group identification
The only code which needs to check for an entire instruction group is
the aarch64_insn_is_steppable() helper function used by kprobes, which
must not be instrumented, and only needs to check for the "Branch,
exception generation and system instructions" class.
Currently we have an out-of-line helper in insn.c which must be marked
as __kprobes, which indexes a table with some bits extracted from the
instruction. In aarch64_insn_is_steppable() we then need to compare the
result with an expected enum value.
It would be simpler to have a predicate for this, as with the other
aarch64_insn_is_*() helpers, which would be always inlined to prevent
inadvertent instrumentation, and would permit better code generation.
This patch adds a predicate function for this instruction group using
the existing __AARCH64_INSN_FUNCS() helpers, and removes the existing
out-of-line helper. As the only class we currently care about is the
branch+exception+sys class, I have only added helpers for this, and left
the other classes unimplemented for now.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:26 +0000 (13:59 +0000)]
arm64: insn: always inline predicates
We have a number of aarch64_insn_*() predicates which are used in code
which is not instrumentation safe (e.g. alternatives patching, kprobes).
Some of those are marked with __kprobes, but most are not, and are
implemented out-of-line in insn.c.
This patch moves the predicates to insn.h and marks them with
__always_inline. This is ensures that they will respect the
instrumentation requirements of their caller which they will be inlined
into.
At the same time, I've formatted each of the functions consistently as a
list, to make them easier to read and update in future.
Other than preventing unwanted instrumentation, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Jeremy Linton [Wed, 9 Nov 2022 17:47:20 +0000 (11:47 -0600)]
ACPI: Enable FPDT on arm64
FPDT provides some boot timing records useful for analyzing
parts of the UEFI boot stack. Given the existing code works
on arm64, and allows reading the values without utilizing
/dev/mem it seems like a good idea to turn it on.
wangkailong@jari.cn [Sun, 13 Nov 2022 09:41:10 +0000 (17:41 +0800)]
kselftest/arm64: fix array_size.cocci warning
Fix following coccicheck warning:
tools/testing/selftests/arm64/mte/check_mmap_options.c:64:24-25:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:66:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:135:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:96:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:190:24-25:
WARNING: Use ARRAY_SIZE
Anshuman Khandual [Mon, 7 Nov 2022 14:17:53 +0000 (19:47 +0530)]
arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
pte_to_phys() assembly definition does multiple bits field transformations
to derive physical address, embedded inside a page table entry. Unlike its
C counter part i.e __pte_to_phys(), pte_to_phys() is not very apparent. It
simplifies these operations via a new macro PTE_ADDR_HIGH_SHIFT indicating
how far the pte encoded higher address bits need to be left shifted. While
here, this also updates __pte_to_phys() and __phys_to_pte_val().
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20221107141753.2938621-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:08 +0000 (17:59 +0200)]
arm64: implement dynamic shadow call stack for Clang
Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.
This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20221027155908.1940624-4-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:07 +0000 (17:59 +0200)]
scs: add support for dynamic shadow call stacks
In order to allow arches to use code patching to conditionally emit the
shadow stack pushes and pops, rather than always taking the performance
hit even on CPUs that implement alternatives such as stack pointer
authentication on arm64, add a Kconfig symbol that can be set by the
arch to omit the SCS codegen itself, without otherwise affecting how
support code for SCS and compiler options (for register reservation, for
instance) are emitted.
Also, add a static key and some plumbing to omit the allocation of
shadow call stack for dynamic SCS configurations if SCS is disabled at
runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20221027155908.1940624-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:06 +0000 (17:59 +0200)]
arm64: unwind: add asynchronous unwind tables to kernel and modules
Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.
This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:19 +0000 (16:25 +0100)]
arm64/hwcap: Add support for SVE 2.1
FEAT_SVE2p1 introduces a number of new SVE instructions. Since there is no
new architectural state added kernel support is simply a new hwcap which
lets userspace know that the feature is supported.
Mark Brown [Mon, 17 Oct 2022 15:25:17 +0000 (16:25 +0100)]
arm64/hwcap: Add support for FEAT_RPRFM
FEAT_RPRFM adds a new range prefetch hint within the existing PRFM space
for range prefetch hinting. Add a new hwcap to allow userspace to discover
support for the new instruction.
Mark Brown [Mon, 17 Oct 2022 15:25:15 +0000 (16:25 +0100)]
arm64/hwcap: Add support for FEAT_CSSC
FEAT_CSSC adds a number of new instructions usable to optimise common short
sequences of instructions, add a hwcap indicating that the feature is
available and can be used by userspace.
Mark Brown [Mon, 7 Nov 2022 17:07:47 +0000 (17:07 +0000)]
arm64/fpsimd: Make kernel_neon_ API _GPL
Currently for reasons lost in the mists of time the kernel_neon_ APIs are
EXPORT_SYMBOL() but the general policy for floating point usage is that it
should be GPL only given the non-standard runtime environment that holds
while it is in use and PCS impacts when code is compiled for FP usage.
Given the limited existing deployment of non-GPL modules for arm64 and the
fact that other architectures like x86 already make their equivalent
functions GPL only this is not expected to be disruptive to existing users.
Kang Minchul [Sat, 5 Nov 2022 07:31:43 +0000 (16:31 +0900)]
kselftest/arm64: fix array_size.cocci warning
Use ARRAY_SIZE to fix the following coccicheck warnings:
tools/testing/selftests/arm64/mte/check_buffer_fill.c:341:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:35:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:168:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:72:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:369:25-26:
WARNING: Use ARRAY_SIZE
Mark Brown [Wed, 2 Nov 2022 14:05:43 +0000 (14:05 +0000)]
kselftest/arm64: Print ASCII version of unknown signal frame magic values
The signal magic values are supposed to be allocated as somewhat meaningful
ASCII so if we encounter a bad magic value print the any alphanumeric
characters we find in it as well as the hex value to aid debuggability.
Mark Brown [Thu, 27 Oct 2022 11:03:24 +0000 (12:03 +0100)]
kselftest/arm64: Remove validation of extra_context from TODO
When fixing up support for extra_context in the signal handling tests I
didn't notice that there is a TODO file in the directory which lists this
as a thing to be done. Since it's been done remove it from the list.
Mark Brown [Mon, 17 Oct 2022 14:45:53 +0000 (15:45 +0100)]
kselftest/arm64: Provide progress messages when signalling children
Especially when the test is configured to run for a longer time it can be
reassuring to users to see that the supervising program is running OK so
provide a message every second when the output timer expires.
Mark Brown [Mon, 17 Oct 2022 14:45:52 +0000 (15:45 +0100)]
kselftest/arm64: Check that all children are producing output in fp-stress
Currently we don't have an explicit check that when it's been a second
since we have seen output produced from the test programs starting up that
means all of them are running and we should start both sending signals and
timing out. This is not reliable, especially on very heavily loaded systems
where the test programs might take longer than a second to run.
We do skip sending signals to children that have not produced output yet
so we won't cause them to exit unexpectedly by sending a signal but this
can create confusion when interpreting output, for example appearing to
show the tests running for less time than expected or appearing to show
missed signal deliveries. Avoid issues by explicitly checking that we have
seen output from all the child processes before we start sending signals
or counting test run time.
This is especially likely on virtual platforms with large numbers of vector
lengths supported since the platforms are slow and there will be a lot of
tasks per CPU.
Ard Biesheuvel [Mon, 7 Nov 2022 17:24:00 +0000 (18:24 +0100)]
arm64: Enable data independent timing (DIT) in the kernel
The ARM architecture revision v8.4 introduces a data independent timing
control (DIT) which can be set at any exception level, and instructs the
CPU to avoid optimizations that may result in a correlation between the
execution time of certain instructions and the value of the data they
operate on.
The DIT bit is part of PSTATE, and is therefore context switched as
usual, given that it becomes part of the saved program state (SPSR) when
taking an exception. We have also defined a hwcap for DIT, and so user
space can discover already whether or nor DIT is available. This means
that, as far as user space is concerned, DIT is wired up and fully
functional.
In the kernel, however, we never bothered with DIT: we disable at it
boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
might run with DIT enabled if user space happened to set it.
Currently, we have no idea whether or not running privileged code with
DIT disabled on a CPU that implements support for it may result in a
side channel that exposes privileged data to unprivileged user space
processes, so let's be cautious and just enable DIT while running in the
kernel if supported by all CPUs.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Adam Langley <agl@google.com> Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20221107172400.1851434-1-ardb@kernel.org
[will: Removed cpu_has_dit() as per Mark's suggestion on the list] Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Fri, 30 Sep 2022 11:18:44 +0000 (12:18 +0100)]
arm_pmu: rework ACPI probing
The current ACPI PMU probing logic tries to associate PMUs with CPUs
when the CPU is first brought online, in order to handle late hotplug,
though PMUs are only registered during early boot, and so for late
hotplugged CPUs this can only associate the CPU with an existing PMU.
We tried to be clever and the have the arm_pmu_acpi_cpu_starting()
callback allocate a struct arm_pmu when no matching instance is found,
in order to avoid duplication of logic. However, as above this doesn't
do anything useful for late hotplugged CPUs, and this requires us to
allocate memory in an atomic context, which is especially problematic
for PREEMPT_RT, as reported by Valentin and Pierre.
This patch reworks the probing to detect PMUs for all online CPUs in the
arm_pmu_acpi_probe() function, which is more aligned with how DT probing
works. The arm_pmu_acpi_cpu_starting() callback only tries to associate
CPUs with an existing arm_pmu instance, avoiding the problem of
allocating in atomic context.
Note that as we didn't previously register PMUs for late-hotplugged
CPUs, this change doesn't result in a loss of existing functionality,
though we will now warn when we cannot associate a CPU with a PMU.
This change allows us to pull the hotplug callback registration into the
arm_pmu_acpi_probe() function, as we no longer need the callbacks to be
invoked shortly after probing the boot CPUs, and can register it without
invoking the calls.
For the moment the arm_pmu_acpi_init() initcall remains to register the
SPE PMU, though in future this should probably be moved elsewhere (e.g.
the arm64 ACPI init code), since this doesn't need to be tied to the
regular CPU PMU code.
Mark Rutland [Fri, 30 Sep 2022 11:18:43 +0000 (12:18 +0100)]
arm_pmu: factor out PMU matching
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to match a CPU with a known cpuid in two separate paths.
Factor out the matching logic into a helper function so that it can be
reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Pierre Gondois <pierre.gondois@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20220930111844.1522365-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Fri, 30 Sep 2022 11:18:42 +0000 (12:18 +0100)]
arm_pmu: acpi: factor out PMU<->CPU association
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to associate a CPU with a PMU in two separate paths.
Factor out the association logic into a helper function so that it can
be reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Pierre Gondois <pierre.gondois@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20220930111844.1522365-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Masahiro Yamada [Wed, 12 Oct 2022 23:35:00 +0000 (08:35 +0900)]
arm64: remove special treatment for the link order of head.o
In the previous discussion (see the Link tag), Ard pointed out that
arm/arm64/kernel/head.o does not need any special treatment - the only
piece that must appear right at the start of the binary image is the
image header which is emitted into .head.text.
The linker script does the right thing to do. The build system does
not need to manipulate the link order of head.o.
Inspired by x86 commit 864b435514b2("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), constify alternative_has_feature_*
argument to satisfy asm constraints. And Steven in [1] also pointed
out that "The "i" constraint needs to be a constant."
Tested with building a simple external kernel module with "O0".
Before the patch, got similar gcc warnings and errors as below:
In file included from <command-line>:
In function ‘alternative_has_feature_likely’,
inlined from ‘system_capabilities_finalized’ at
arch/arm64/include/asm/cpufeature.h:440:9,
inlined from ‘arm64_preempt_schedule_irq’ at
arch/arm64/kernel/entry-common.c:264:6:
include/linux/compiler_types.h:285:33: warning:
‘asm’ operand 0 probably does not match constraints
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
arch/arm64/include/asm/alternative-macros.h:232:9:
note: in expansion of macro ‘asm_volatile_goto’
232 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:285:33: error:
impossible constraint in ‘asm’
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
arch/arm64/include/asm/alternative-macros.h:232:9:
note: in expansion of macro ‘asm_volatile_goto’
232 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
After the patch, the simple external test kernel module is built fine
with "-O0".
Jisheng Zhang [Thu, 6 Oct 2022 07:55:41 +0000 (15:55 +0800)]
arm64: jump_label: mark arguments as const to satisfy asm constraints
Inspired by x86 commit 864b435514b2("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), mark arch_static_branch()'s and
arch_static_branch_jump()'s arguments as const to satisfy asm
constraints. And Steven in [1] also pointed out that "The "i"
constraint needs to be a constant."
Tested with building a simple external kernel module with "O0".
Robin Murphy [Wed, 28 Sep 2022 19:21:26 +0000 (20:21 +0100)]
ACPI/IORT: Update SMMUv3 DeviceID support
IORT E.e now allows SMMUv3 nodes to describe the DeviceID for MSIs
independently of wired GSIVs, where the previous oddly-restrictive
definition meant that an SMMU without PRI support had to provide a
DeviceID even if it didn't support MSIs either. Support this, with
the usual temporary flag definition while the real one is making
its way through ACPICA.
Besar Wicaksono [Thu, 29 Sep 2022 00:28:34 +0000 (19:28 -0500)]
ACPI: ARM Performance Monitoring Unit Table (APMT) initial support
ARM Performance Monitoring Unit Table describes the properties of PMU
support in ARM-based system. The APMT table contains a list of nodes,
each represents a PMU in the system that conforms to ARM CoreSight PMU
architecture. The properties of each node include information required
to access the PMU (e.g. MMIO base address, interrupt number) and also
identification. For more detailed information, please refer to the
specification below:
* APMT: https://developer.arm.com/documentation/den0117/latest
* ARM Coresight PMU:
https://developer.arm.com/documentation/ihi0091/latest
The initial support adds the detection of APMT table and generic
infrastructure to create platform devices for ARM CoreSight PMUs.
Similar to IORT the root pointer of APMT is preserved during runtime
and each PMU platform device is given a pointer to the corresponding
APMT node.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20220929002834.32664-1-bwicaksono@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Sun, 6 Nov 2022 21:09:52 +0000 (13:09 -0800)]
Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.
This is mainly fallout from the original implementation of dynamic CXL
region creation (instantiate new physical memory pools) that arrived
in v6.0-rc1.
Given the theme of "failures in the presence of pass-through decoders"
this also includes new regression test infrastructure for that case.
Summary:
- Fix region creation crash with pass-through decoders
- Fix region creation crash when no decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream
of a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases"
* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Recycle region ids
cxl/region: Fix 'distance' calculation with passthrough ports
tools/testing/cxl: Add a single-port host-bridge regression config
tools/testing/cxl: Fix some error exits
cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
cxl/region: Fix cxl_region leak, cleanup targets at region delete
cxl/region: Fix region HPA ordering validation
cxl/pmem: Use size_add() against integer overflow
cxl/region: Fix decoder allocation crash
ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
cxl/region: Fix null pointer dereference due to pass through decoder commit
cxl/mbox: Add a check on input payload size
Linus Torvalds [Sun, 6 Nov 2022 20:59:12 +0000 (12:59 -0800)]
Merge tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two regressions:
- Commit 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into
macro") resulted in regulator undercount when disabling regulators.
Revert it.
- The thermal subsystem rework caused the scmi driver to no longer
register with the thermal subsystem because index values no longer
match. To fix the problem, the scmi driver now directly registers
with the thermal subsystem, no longer through the hwmon core"
* tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
Revert "hwmon: (pmbus) Add regulator supply into macro"
hwmon: (scmi) Register explicitly with Thermal Framework
Linus Torvalds [Sun, 6 Nov 2022 20:41:32 +0000 (12:41 -0800)]
Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Cooper Lake's stepping to the PEBS guest/host events isolation
fixed microcode revisions checking quirk
- Update Icelake and Sapphire Rapids events constraints
- Use the standard energy unit for Sapphire Rapids in RAPL
- Fix the hw_breakpoint test to fail more graciously on !SMP configs
* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
perf/x86/intel: Fix pebs event constraints for SPR
perf/x86/intel: Fix pebs event constraints for ICL
perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
perf/hw_breakpoint: test: Skip the test if dependencies unmet
Linus Torvalds [Sun, 6 Nov 2022 20:36:47 +0000 (12:36 -0800)]
Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add new Intel CPU models
- Enforce that TDX guests are successfully loaded only on TDX hardware
where virtualization exception (#VE) delivery on kernel memory is
disabled because handling those in all possible cases is "essentially
impossible"
- Add the proper include to the syscall wrappers so that BTF can see
the real pt_regs definition and not only the forward declaration
* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add several Intel server CPU model numbers
x86/tdx: Panic on bad configs that #VE on "private" memory access
x86/tdx: Prepare for using "INFO" call for a second purpose
x86/syscall: Include asm/ptrace.h in syscall_wrapper header
Linus Torvalds [Sun, 6 Nov 2022 20:23:10 +0000 (12:23 -0800)]
Merge tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use POSIX-compatible grep options
- Document git-related tips for reproducible builds
- Fix a typo in the modpost rule
- Suppress SIGPIPE error message from gcc-ar and llvm-ar
- Fix segmentation fault in the menuconfig search
* tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix segmentation fault in menuconfig search
kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
kbuild: fix typo in modpost
Documentation: kbuild: Add description of git for reproducible builds
kbuild: use POSIX-compatible grep option
Linus Torvalds [Sun, 6 Nov 2022 18:46:59 +0000 (10:46 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the pKVM stage-1 walker erronously using the stage-2 accessor
- Correctly convert vcpu->kvm to a hyp pointer when generating an
exception in a nVHE+MTE configuration
- Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
- Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
- Document the boot requirements for FGT when entering the kernel at
EL1
x86:
- Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
- Make argument order consistent for kvcalloc()
- Userspace API fixes for DEBUGCTL and LBRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix a typo about the usage of kvcalloc()
KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
arm64: booting: Document our requirements for fine grained traps with SME
KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
KVM: arm64: Fix bad dereference on MTE-enabled systems
KVM: arm64: Use correct accessor to parse stage-1 PTEs
Linus Torvalds [Sun, 6 Nov 2022 18:42:29 +0000 (10:42 -0800)]
Merge tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"One fix for silencing a smatch warning, and a small cleanup patch"
* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: simplify sysenter and syscall setup
x86/xen: silence smatch warning in pmu_msr_chk_emulated()
Linus Torvalds [Sun, 6 Nov 2022 18:30:29 +0000 (10:30 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of bugs, including some regressions, the most serious of
which was one which would cause online resizes to fail with file
systems with metadata checksums enabled.
Also fix a warning caused by the newly added fortify string checker,
plus some bugs that were found using fuzzed file systems"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
ext4: fix wrong return err in ext4_load_and_init_journal()
ext4: fix warning in 'ext4_da_release_space'
ext4: fix BUG_ON() when directory entry has invalid rec_len
ext4: update the backup superblock's at the end of the online resize