www.infradead.org Git - users/dwmw2/linux.git/log

KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN

When KVM is in VHE mode, the host kernel tries to save and restore the
configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1)
across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the
configuration may be clobbered by hyp when running a vCPU. This logic
has historically been broken, and is currently redundant.

This logic was originally introduced in commit:

861262ab86270206 ("KVM: arm64: Handle SME host state when running guests")

At the time, the VHE hyp code would reset CPTR_EL2.SMEN to 0b00 when
returning to the host, trapping host access to SME state. Unfortunately,
this was unsafe as the host could take a softirq before calling
kvm_arch_vcpu_put_fp(), and if a softirq handler were to use kernel mode
NEON the resulting attempt to save the live FPSIMD/SVE/SME state would
result in a fatal trap.

That issue was limited to VHE mode. For nVHE/hVHE modes, KVM always
saved/restored the host kernel's CPACR_EL1 value, and configured
CPTR_EL2.TSM to 0b0, ensuring that host usage of SME would not be
trapped.

The issue above was incidentally fixed by commit:

375110ab51dec5dc ("KVM: arm64: Fix resetting SME trap values on reset for (h)VHE")

That commit changed the VHE hyp code to configure CPTR_EL2.SMEN to 0b01
when returning to the host, permitting host kernel usage of SME,
avoiding the issue described above. At the time, this was not identified
as a fix for commit 861262ab86270206.

Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME
state, there's no need to save/restore the state of the EL0 SME trap.
The kernel can safely save/restore state without trapping, as described
above, and will restore userspace state (including trap controls) before
returning to userspace.

Remove the redundant logic.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-5-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN

When KVM is in VHE mode, the host kernel tries to save and restore the
configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1)
across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the
configuration may be clobbered by hyp when running a vCPU. This logic is
currently redundant.

The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when
returning to the host, permitting host kernel usage of SVE.

Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME
state, there's no need to save/restore the state of the EL0 SVE trap.
The kernel can safely save/restore state without trapping, as described
above, and will restore userspace state (including trap controls) before
returning to userspace.

Remove the redundant logic.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove host FPSIMD saving for non-protected KVM

Now that the host eagerly saves its own FPSIMD/SVE/SME state,
non-protected KVM never needs to save the host FPSIMD/SVE/SME state,
and the code to do this is never used. Protected KVM still needs to
save/restore the host FPSIMD/SVE state to avoid leaking guest state to
the host (and to avoid revealing to the host whether the guest used
FPSIMD/SVE/SME), and that code needs to be retained.

Remove the unused code and data structures.

To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the
VHE hyp code, the nVHE/hVHE version is moved into the shared switch
header, where it is only invoked when KVM is in protected mode.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state

There are several problems with the way hyp code lazily saves the host's
FPSIMD/SVE state, including:

* Host SVE being discarded unexpectedly due to inconsistent
  configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
  result in QEMU crashes where SVE is used by memmove(), as reported by
  Eric Auger:

  https://issues.redhat.com/browse/RHEL-68997

* Host SVE state is discarded *after* modification by ptrace, which was an
  unintentional ptrace ABI change introduced with lazy discarding of SVE state.

* The host FPMR value can be discarded when running a non-protected VM,
  where FPMR support is not exposed to a VM, and that VM uses
  FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
  before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
  value in memory.

Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME
state when loading a vCPU such that KVM does not need to save any of the
host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is
removed and the necessary call to fpsimd_save_and_flush_cpu_state() is
placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr'
should not be used, they are set to NULL; all uses of these will be
removed in subsequent patches.

Historical problems go back at least as far as v5.17, e.g. erroneous
assumptions about TIF_SVE being clear in commit:

  8383741ab2e773a9 ("KVM: arm64: Get rid of host SVE tracking/saving")

... and so this eager save+flush probably needs to be backported to ALL
stable trees.

Fixes: 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")
Fixes: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Fixes: ef3be86021c3bdf3 ("KVM: arm64: Add save/restore support for FPMR")
Reported-by: Eric Auger <eauger@redhat.com>
Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-2-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix __pkvm_host_mkyoung_guest() return value

Don't use an uninitialised stack variable, and just return 0
on the non-error path.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502100911.8c9DbtKD-lkp@intel.com/
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Simplify np-guest hypercalls

When the handling of a guest stage-2 permission fault races with an MMU
notifier, the faulting page might be gone from the guest's stage-2 by
the point we attempt to call (p)kvm_pgtable_stage2_relax_perms(). In the
normal KVM case, this leads to returning -EAGAIN which user_mem_abort()
handles correctly by simply re-entering the guest. However, the pKVM
hypercall implementation has additional logic to check the page state
using __check_host_shared_guest() which gets confused with absence of a
page mapped at the requested IPA and returns -ENOENT, hence breaking
user_mem_abort() and hilarity ensues.

Luckily, several of the hypercalls for managing the stage-2 page-table
of NP guests have no effect on the pKVM ownership tracking (wrprotect,
test_clear_young, mkyoung, and crucially relax_perms), so the extra
state checking logic is in fact not strictly necessary. So, to fix the
discrepancy between standard KVM and pKVM, let's just drop the
superfluous __check_host_shared_guest() logic from those hypercalls and
make the extra state checking a debug assertion dependent on
CONFIG_NVHE_EL2_DEBUG as we already do for other transitions.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250207145438.1333475-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Improve error handling from check_host_shared_guest()

The check_host_shared_guest() path expects to find a last-level valid
PTE in the guest's stage-2 page-table. However, it checks the PTE's
level before its validity, which makes it hard for callers to figure out
what went wrong.

To make error handling simpler, check the PTE's validity first.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250207145438.1333475-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: timer: Don't adjust the EL2 virtual timer offset

The way we deal with the EL2 virtual timer is a bit odd.

We try to cope with E2H being flipped, and adjust which offset
applies to that timer depending on the current E2H value. But that's
a complexity we shouldn't have to worry about.

What we have to deal with is either E2H being RES1, in which case
there is no offset, or E2H being RES0, and the virtual timer simply
does not exist.

Drop the adjusting of the timer offset, which makes things a bit
simpler. At the same time, make sure that accessing the HV timer
when E2H is RES0 results in an UNDEF in the guest.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250204110050.150560-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV

Both Wei-Lin Chang and Volodymyr Babchuk report that the way we
handle the emulation of EL1 timers with NV is completely wrong,
specially in the case of HCR_EL2.E2H==0.

There are three problems in about as many lines of code:

- With E2H==0, the EL1 timers are overwritten with the EL1 state,
  while they should actually contain the EL2 state (as per the timer
  map)

- With E2H==1, we run the full EL1 timer emulation even when ECV
  is present, hiding a bug in timer_emulate() (see previous patch)

- The comments are actively misleading, and say all the wrong things.

This is only attributable to the code having been initially written
for FEAT_NV, hacked up to handle FEAT_NV2 *in parallel*, and vaguely
hacked again to be FEAT_NV2 only. Oh, and yours truly being a gold
plated idiot.

The fix is obvious: just delete most of the E2H==0 code, have a unified
handling of the timers (because they really are E2H agnostic), and
make sure we don't execute any of that when FEAT_ECV is present.

Fixes: 4bad3068cfa9f ("KVM: arm64: nv: Sync nested timer state with FEAT_NV2")
Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Reported-by: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Link: https://lore.kernel.org/r/fqiqfjzwpgbzdtouu2pwqlu7llhnf5lmy4hzv5vo6ph4v3vyls@jdcfy3fjjc5k
Link: https://lore.kernel.org/r/87frl51tse.fsf@epam.com
Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250204110050.150560-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: timer: Always evaluate the need for a soft timer

When updating the interrupt state for an emulated timer, we return
early and skip the setup of a soft timer that runs in parallel
with the guest.

While this is OK if we have set the interrupt pending, it is pretty
wrong if the guest moved CVAL into the future. In that case,
no timer is armed and the guest can wait for a very long time
(it will take a full put/load cycle for the situation to resolve).

This is specially visible with EDK2 running at EL2, but still
using the EL1 virtual timer, which in that case is fully emulated.
Any key-press takes ages to be captured, as there is no UART
interrupt and EDK2 relies on polling from a timer...

The fix is simply to drop the early return. If the timer interrupt
is pending, we will still return early, and otherwise arm the soft
timer.

Fixes: 4d74ecfa6458b ("KVM: arm64: Don't arm a hrtimer for an already pending timer")
Cc: stable@vger.kernel.org
Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250204110050.150560-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix nested S2 MMU structures reallocation

For each vcpu that userspace creates, we allocate a number of
s2_mmu structures that will eventually contain our shadow S2
page tables.

Since this is a dynamically allocated array, we reallocate
the array and initialise the newly allocated elements. Once
everything is correctly initialised, we adjust pointer and size
in the kvm structure, and move on.

But should that initialisation fail *and* the reallocation triggered
a copy to another location, we end-up returning early, with the
kvm structure still containing the (now stale) old pointer. Weeee!

Cure it by assigning the pointer early, and use this to perform
the initialisation. If everything succeeds, we adjust the size.
Otherwise, we just leave the size as it was, no harm done, and the
new memory is as good as the ol' one (we hope...).

Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fail protected mode init if no vgic hardware is present

Protected mode assumes that at minimum vgic-v3 is present, however KVM
fails to actually enforce this at the time of initialization. As such,
when running protected mode in a half-baked state on GICv2 hardware we
see the hyp go belly up at vcpu_load() when it tries to restore the
vgic-v3 cpuif:

  $ ./arch_timer_edge_cases
  [  130.599140] kvm [4518]: nVHE hyp panic at: [<ffff800081102b58>] __kvm_nvhe___vgic_v3_restore_vmcr_aprs+0x8/0x84!
  [  130.603685] kvm [4518]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
  [  130.611962] kvm [4518]: Hyp Offset: 0xfffeca95ed000000
  [  130.617053] Kernel panic - not syncing: HYP panic:
  [  130.617053] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
  [  130.617053] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
  [  130.617053] VCPU:0000000000000000
  [  130.638013] CPU: 0 UID: 0 PID: 4518 Comm: arch_timer_edge Tainted: G         C         6.13.0-rc3-00009-gf7d03fcbf1f4 #1
  [  130.648790] Tainted: [C]=CRAP
  [  130.651721] Hardware name: Libre Computer AML-S905X-CC (DT)
  [  130.657242] Call trace:
  [  130.659656]  show_stack+0x18/0x24 (C)
  [  130.663279]  dump_stack_lvl+0x38/0x90
  [  130.666900]  dump_stack+0x18/0x24
  [  130.670178]  panic+0x388/0x3e8
  [  130.673196]  nvhe_hyp_panic_handler+0x104/0x208
  [  130.677681]  kvm_arch_vcpu_load+0x290/0x548
  [  130.681821]  vcpu_load+0x50/0x80
  [  130.685013]  kvm_arch_vcpu_ioctl_run+0x30/0x868
  [  130.689498]  kvm_vcpu_ioctl+0x2e0/0x974
  [  130.693293]  __arm64_sys_ioctl+0xb4/0xec
  [  130.697174]  invoke_syscall+0x48/0x110
  [  130.700883]  el0_svc_common.constprop.0+0x40/0xe0
  [  130.705540]  do_el0_svc+0x1c/0x28
  [  130.708818]  el0_svc+0x30/0xd0
  [  130.711837]  el0t_64_sync_handler+0x10c/0x138
  [  130.716149]  el0t_64_sync+0x198/0x19c
  [  130.719774] SMP: stopping secondary CPUs
  [  130.723660] Kernel Offset: disabled
  [  130.727103] CPU features: 0x000,00000800,02800000,0200421b
  [  130.732537] Memory Limit: none
  [  130.735561] ---[ end Kernel panic - not syncing: HYP panic:
  [  130.735561] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
  [  130.735561] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
  [  130.735561] VCPU:0000000000000000 ]---

Fix it by failing KVM initialization if the system doesn't implement
vgic-v3, as protected mode will never do anything useful on such
hardware.

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/kvmarm/5ca7588c-7bf2-4352-8661-e4a56a9cd9aa@sirena.org.uk/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250203231543.233511-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Flush/sync debug state in protected mode

The recent changes to debug state management broke self-hosted debug for
guests when running in protected mode, since both the debug owner and
the debug state itself aren't shared with the hyp's view of the vcpu.

Fix it by flushing/syncing the relevant bits with the hyp vcpu.

Fixes: beb470d96cec ("KVM: arm64: Use debug_owner to track if debug regs need save/restore")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/kvmarm/5f62740f-a065-42d9-9f56-8fb648b9c63f@sirena.org.uk/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250131222922.1548780-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Flush hyp bss section after initialization of variables in bss

To determine CPU features during initialization, the nVHE hypervisor
utilizes sanitized values of the host's CPU features registers. These
values, stored in u64 idaa64*_el1_sys_val variables are updated by the
kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility with
the MMU off, the data cache needs to be flushed after these updates.
However, individually flushing each variable using
kvm_flush_dcache_to_poc() is inefficient.

These cpu feature variables would be part of the bss section of
the hypervisor. Hence, flush the entire bss section of hypervisor
once the initialization is complete.

Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
Suggested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@google.com>
Link: https://lore.kernel.org/r/20250121044016.2219256-1-lokeshvutla@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64/sysreg: Get rid of TRFCR_ELx SysregFields

There is no such thing as TRFCR_ELx in the architecture.
What we have is TRFCR_EL1, for which TRFCR_EL12 is an accessor.

Rename TRFCR_ELx_* to TRFCR_EL1_*, and fix the bit of code using
these names.

Similarly, TRFCR_EL12 is redefined as a mapping to TRFCR_EL1.

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87cygsqgkh.wl-maz@kernel.org
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>

Merge branch kvm-arm64/misc-6.14 into kvmarm-master/next

* kvm-arm64/misc-6.14:
  : .
  : Misc KVM/arm64 changes for 6.14
  :
  : - Don't expose AArch32 EL0 capability when NV is enabled
  :
  : - Update documentation to reflect the full gamut of kvm-arm.mode
  :   behaviours
  :
  : - Use the hypervisor VA bit width when dumping stacktraces
  :
  : - Decouple the hypervisor stack size from PAGE_SIZE, at least
  :   on the surface...
  :
  : - Make use of str_enabled_disabled() when advertising GICv4.1 support
  :
  : - Explicitly handle BRBE traps as UNDEFINED
  : .
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
  arm64: kvm: Introduce nvhe stack size constants
  KVM: arm64: Fix nVHE stacktrace VA bits mask
  Documentation: Update the behaviour of "kvm-arm.mode"
  KVM: arm64: nv: Advertise the lack of AArch32 EL0 support

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-resx-fixes-6.14 into kvmarm-master/next

* kvm-arm64/nv-resx-fixes-6.14:
  : .
  : Fixes for NV sysreg accessors. From the cover letter:
  :
  : "Joey recently reported that some rather basic tests were failing on
  : NV, and managed to track it down to critical register fields (such as
  : HCR_EL2.E2H) not having their expect value.
  :
  : Further investigation has outlined a couple of critical issues:
  :
  : - Evaluating HCR_EL2.E2H must always be done with a sanitising
  :   accessor, no ifs, no buts. Given that KVM assumes a fixed value for
  :   this bit, we cannot leave it to the guest to mess with.
  :
  : - Resetting the sysreg file must result in the RESx bits taking
  :   effect. Otherwise, we may end-up making the wrong decision (see
  :   above), and we definitely expose invalid values to the guest. Note
  :   that because we compute the RESx masks very late in the VM setup, we
  :   need to apply these masks at that particular point as well.
  : [...]"
  : .
  KVM: arm64: nv: Apply RESx settings to sysreg reset values
  KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors

Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/nested.c

Merge branch kvm-arm64/coresight-6.14 into kvmarm-master/next

* kvm-arm64/coresight-6.14:
  : .
  : Trace filtering update from James Clark. From the cover letter:
  :
  : "The guest filtering rules from the Perf session are now honored for both
  : nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
  : start of the Perf session and doing nothing else further, or caching the
  : guest value and writing it at guest switch for nVHE. In pKVM, trace is
  : now be disabled for both protected and unprotected guests."
  : .
  KVM: arm64: Fix selftests after sysreg field name update
  coresight: Pass guest TRFCR value to KVM
  KVM: arm64: Support trace filtering for guests
  KVM: arm64: coresight: Give TRBE enabled state to KVM
  coresight: trbe: Remove redundant disable call
  arm64/sysreg/tools: Move TRFCR definitions to sysreg
  tools: arm64: Update sysreg.h header files

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pkvm-memshare-declutter into kvmarm-master/next

* kvm-arm64/pkvm-memshare-declutter:
  : .
  : pKVM memory transition simplifications, courtesy of Quentin Perret.
  :
  : From the cover letter:
  : "Since its early days, pKVM has formalized memory 'transitions' (shares
  : and donations) using 'struct pkvm_mem_transition' and bunch of helpers
  : to manipulate it. The intention was for all transitions to use this
  : machinery to ensure we're checking things consistently. However, as
  : development progressed, it became clear that the rigidity of this model
  : made it really difficult to use in some use-cases which ended-up
  : side-stepping it entirely. That is the case for the
  : hyp_{un}pin_shared_mem() and host_{un}share_guest() paths upstream which
  : use lower level helpers directly, as well as for several other pKVM
  : features that should land upstream in the future (ex: when a guest
  : relinquishes a page during ballooning, when annotating a page that is
  : being DMA'd to, ...). On top of this, the pkvm_mem_transition machinery
  : requires a lot of boilerplate which makes the code hard to read, but
  : also adds layers of indirection that no compilers seems to see through,
  : hence leading to suboptimal generated code.
  :
  : Given all the above, this series removes the pkvm_mem_transition
  : machinery from mem_protect.c, and converts all its users to use
  : __*_{check,set}_page_state_range() low-level helpers directly."
  : .
  KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
  KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
  KVM: arm64: Drop pkvm_mem_transition for FF-A
  KVM: arm64: Only apply PMCR_EL0.P to the guest range of counters
  KVM: arm64: nv: Reload PMU events upon MDCR_EL2.HPME change
  KVM: arm64: Use KVM_REQ_RELOAD_PMU to handle PMCR_EL0.E change
  KVM: arm64: Add unified helper for reprogramming counters by mask
  KVM: arm64: Always check the state from hyp_ack_unshare()
  KVM: arm64: Fix set_id_regs selftest for ASIDBITS becoming unwritable

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-timers into kvmarm-master/next

* kvm-arm64/nv-timers:
  : .
  : Nested Virt support for the EL2 timers. From the initial cover letter:
  :
  : "Here's another batch of NV-related patches, this time bringing in most
  : of the timer support for EL2 as well as nested guests.
  :
  : The code is pretty convoluted for a bunch of reasons:
  :
  : - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
  :   memory, meaning that a guest could setup a timer and never see it
  :   firing until the next exit
  :
  : - We go try hard to reflect the timer state in memory, but that's not
  :   great.
  :
  : - With FEAT_ECV, we can finally correctly emulate the virtual timer,
  :   but this emulation is pretty costly
  :
  : - As a way to make things suck less, we handle timer reads as early as
  :   possible, and only defer writes to the normal trap handling
  :
  : - Finally, some implementations are badly broken, and require some
  :   hand-holding, irrespective of NV support. So we try and reuse the NV
  :   infrastructure to make them usable. This could be further optimised,
  :   but I'm running out of patience for this sort of HW.
  :
  : [...]"
  : .
  KVM: arm64: nv: Fix doc header layout for timers
  KVM: arm64: nv: Document EL2 timer API
  KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
  KVM: arm64: nv: Sanitise CNTHCTL_EL2
  KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
  KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
  KVM: arm64: Handle counter access early in non-HYP context
  KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
  KVM: arm64: nv: Sync nested timer state with FEAT_NV2
  KVM: arm64: nv: Add handling of EL2-specific timer registers

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Fix doc header layout for timers

Stephen reports that 'make htmldocs' spits out a warning
("Documentation/virt/kvm/devices/vcpu.rst:147: WARNING: Definition
list ends without a blank line; unexpected unindent.").

Fix it by keeping all the timer attributes on a single line.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Apply RESx settings to sysreg reset values

While we have sanitisation in place for the guest sysregs, we lack
that sanitisation out of reset. So some of the fields could be
evaluated and not reflect their RESx status, which sounds like
a very bad idea.

Apply the RESx masks to the the sysreg file in two situations:

- when going via a reset of the sysregs

- after having computed the RESx masks

Having this separate reset phase from the actual reset handling is
a bit grotty, but we need to apply this after the ID registers are
final.

Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors

A lot of the NV code depends on HCR_EL2.{E2H,TGE}, and we assume
in places that at least HCR_EL2.E2H is invariant for a given guest.

However, we make a point in *not* using the sanitising accessor
that would enforce this, and are at the mercy of the guest doing
stupid things. Clearly, that's not good.

Rework the HCR_EL2 accessors to use __vcpu_sys_reg() instead,
guaranteeing that the RESx settings get applied, specially
when HCR_EL2.E2H is evaluated. This results in fewer accessors
overall.

Huge thanks to Joey who spent a long time tracking this bug down.

Reported-by: Joey Gouly <Joey.Gouly@arm.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix selftests after sysreg field name update

Fix KVM selftests that check for EL0's 64bit-ness, and use a now
removed definition. Kindly point them at the new one.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>

coresight: Pass guest TRFCR value to KVM

Currently the userspace and kernel filters for guests are never set, so
no trace will be generated for them. Add support for tracing guests by
passing the desired TRFCR value to KVM so it can be applied to the
guest.

By writing either E1TRE or E0TRE, filtering on either guest kernel or
guest userspace is also supported. And if both E1TRE and E0TRE are
cleared when exclude_guest is set, that option is supported too. This
change also brings exclude_host support which is difficult to add as a
separate commit without excess churn and resulting in no trace at all.

cpu_prohibit_trace() gets moved to TRBE because the ETM driver doesn't
need the read, it already has the base TRFCR value. TRBE only needs
the read to disable it and then restore.

Testing
=======

The addresses were counted with the following:

  $ perf report -D | grep -Eo 'EL2|EL1|EL0' | sort | uniq -c

Guest kernel only:

  $ perf record -e cs_etm//Gk -a -- true
    535 EL1
      1 EL2

Guest user only (only 5 addresses because the guest runs slowly in the
model):

  $ perf record -e cs_etm//Gu -a -- true
    5 EL0

Host kernel only:

  $  perf record -e cs_etm//Hk -a -- true
   3501 EL2

Host userspace only:

  $  perf record -e cs_etm//Hu -a -- true
    408 EL0
      1 EL2

Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20250106142446.628923-8-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Support trace filtering for guests

For nVHE, switch the filter value in and out if the Coresight driver
asks for it. This will support filters for guests when sinks other than
TRBE are used.

For VHE, just write the filter directly to TRFCR_EL1 where trace can be
used even with TRBE sinks.

Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-7-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: coresight: Give TRBE enabled state to KVM

Currently in nVHE, KVM has to check if TRBE is enabled on every guest
switch even if it was never used. Because it's a debug feature and is
more likely to not be used than used, give KVM the TRBE buffer status to
allow a much simpler and faster do-nothing path in the hyp.

Protected mode now disables trace regardless of TRBE (because
trfcr_while_in_guest is always 0), which was not previously done.
However, it continues to flush whenever the buffer is enabled
regardless of the filter status. This avoids the hypothetical case of a
host that had disabled the filter but not flushed which would arise if
only doing the flush when the filter was enabled.

Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-6-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

coresight: trbe: Remove redundant disable call

trbe_drain_and_disable_local() just clears TRBLIMITR and drains.
TRBLIMITR is already cleared on the next line after this call, so
replace it with only drain. This is so we can make a kvm call that has a
preempt enabled warning from set_trbe_disabled() in the next commit,
where trbe_reset_local() is called from a preemptible hotplug path.

Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-5-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64/sysreg/tools: Move TRFCR definitions to sysreg

Convert TRFCR to automatic generation. Add separate definitions for ELx
and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
definition so no code change is required.

Also add TRFCR_EL12 which will start to be used in a later commit.

Unfortunately, to avoid breaking the Perf build with duplicate
definition errors, the tools copy of the sysreg.h header needs to be
updated at the same time rather than the usual second commit. This is
because the generated version of sysreg
(arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared
and tools/ does not have its own copy.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

tools: arm64: Update sysreg.h header files

Created with the following:

cp include/linux/kasan-tags.h tools/include/linux/
cp arch/arm64/include/asm/sysreg.h tools/arch/arm64/include/asm/

Update the tools copy of sysreg.h so that the next commit to add a new
register doesn't have unrelated changes in it. Because the new version
of sysreg.h includes kasan-tags.h, that file also now needs to be copied
into tools.

Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-3-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Drop pkvm_mem_transition for host/hyp donations

Simplify the __pkvm_host_donate_hyp() and pkvm_hyp_donate_host() paths
by not using the pkvm_mem_transition machinery. As the last users of
this, also remove all the now unused code.

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250110121936.1559655-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing

Simplify the __pkvm_host_{un}share_hyp() paths by not using the
pkvm_mem_transition machinery. As there are the last users of the
do_share()/do_unshare(), remove all the now-unused code as well.

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250110121936.1559655-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Drop pkvm_mem_transition for FF-A

Simplify the __pkvm_host_{un}share_ffa() paths by using
{check,set}_page_state_range().

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250110121936.1559655-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch 'kvmarm-fixes-6.13-3'

Merge branch kvm-arm64/pkvm-fixed-features-6.14 into kvmarm-master/next

* kvm-arm64/pkvm-fixed-features-6.14: (24 commits)
  : .
  : Complete rework of the pKVM handling of features, catching up
  : with the rest of the code deals with it these days.
  : Patches courtesy of Fuad Tabba. From the cover letter:
  :
  : "This patch series uses the vm's feature id registers to track the
  : supported features, a framework similar to nested virt to set the
  : trap values, and removes the need to store cptr_el2 per vcpu in
  : favor of setting its value when traps are activated, as VHE mode
  : does."
  :
  : This branch drags the arm64/for-next/cpufeature branch to solve
  : ugly conflicts in -next.
  : .
  KVM: arm64: Fix FEAT_MTE in pKVM
  KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
  KVM: arm64: Convert the SVE guest vcpu flag to a vm flag
  KVM: arm64: Remove PtrAuth guest vcpu flag
  KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE
  KVM: arm64: Refactor kvm_reset_cptr_el2()
  KVM: arm64: Calculate cptr_el2 traps on activating traps
  KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
  KVM: arm64: Remove fixed_config.h header
  KVM: arm64: Rework specifying restricted features for protected VMs
  KVM: arm64: Set protected VM traps based on its view of feature registers
  KVM: arm64: Fix RAS trapping in pKVM for protected VMs
  KVM: arm64: Initialize feature id registers for protected VMs
  KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
  KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
  KVM: arm64: Move checking protected vcpu features to a separate function
  KVM: arm64: Group setting traps for protected VMs by control register
  KVM: arm64: Consolidate allowed and restricted VM feature checks
  arm64/sysreg: Get rid of CPACR_ELx SysregFields
  arm64/sysreg: Convert *_EL12 accessors to Mapping
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/fpsimd.c
# arch/arm64/kvm/hyp/nvhe/pkvm.c

Merge branch kvm-arm64/pkvm-np-guest into kvmarm-master/next

* kvm-arm64/pkvm-np-guest:
  : .
  : pKVM support for non-protected guests using the standard MM
  : infrastructure, courtesy of Quentin Perret. From the cover letter:
  :
  : "This series moves the stage-2 page-table management of non-protected
  : guests to EL2 when pKVM is enabled. This is only intended as an
  : incremental step towards a 'feature-complete' pKVM, there is however a
  : lot more that needs to come on top.
  :
  : With that series applied, pKVM provides near-parity with standard KVM
  : from a functional perspective all while Linux no longer touches the
  : stage-2 page-tables itself at EL1. The majority of mm-related KVM
  : features work out of the box, including MMU notifiers, dirty logging,
  : RO memslots and things of that nature. There are however two gotchas:
  :
  :  - We don't support mapping devices into guests: this requires
  :    additional hypervisor support for tracking the 'state' of devices,
  :    which will come in a later series. No device assignment until then.
  :
  :  - Stage-2 mappings are forced to page-granularity even when backed by a
  :    huge page for the sake of simplicity of this series. I'm only aiming
  :    at functional parity-ish (from userspace's PoV) for now, support for
  :    HP can be added on top later as a perf improvement."
  : .
  KVM: arm64: Plumb the pKVM MMU in KVM
  KVM: arm64: Introduce the EL1 pKVM MMU
  KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  KVM: arm64: Introduce __pkvm_host_unshare_guest()
  KVM: arm64: Introduce __pkvm_host_share_guest()
  KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
  KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
  KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
  KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  KVM: arm64: Make hyp_page::order a u8
  KVM: arm64: Move enum pkvm_page_state to memory.h
  KVM: arm64: Change the layout of enum pkvm_page_state

Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/arm.c

Merge branch kvm-arm64/debug-6.14 into kvmarm-master/next

* kvm-arm64/debug-6.14:
  : .
  : Large rework of the debug code to make it a bit less horrid,
  : courtesy of Oliver Upton. From the original cover letter:
  :
  : "The debug code has become a bit difficult to reason about, especially
  : all the hacks and bandaids for state tracking + trap configuration.
  :
  : This series reworks the entire mess around using a single enumeration to
  : track the state of the debug registers (free, guest-owned, host-owned),
  : using that to drive trap configuration and save/restore.
  :
  : On top of that, this series wires most of the implementation into vCPU
  : load/put rather than the main KVM_RUN loop. This has been a long time
  : coming for VHE, as a lot of the trap configuration and EL1 state gets
  : loaded into hardware at that point anyway.
  :
  : The save/restore of the debug registers is simplified quite a bit as
  : well. KVM will now restore the registers for *any* access rather than
  : just writes, and keep doing so until the next vcpu_put() instead of
  : dropping it on the floor after the next exception."
  : .
  KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
  KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set
  KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore
  KVM: arm64: nv: Honor MDCR_EL2.TDE routing for debug exceptions
  KVM: arm64: Manage software step state at load/put
  KVM: arm64: Don't hijack guest context MDSCR_EL1
  KVM: arm64: Compute MDCR_EL2 at vcpu_load()
  KVM: arm64: Reload vCPU for accesses to OSLAR_EL1
  KVM: arm64: Use debug_owner to track if debug regs need save/restore
  KVM: arm64: Remove vestiges of debug_ptr
  KVM: arm64: Remove debug tracepoints
  KVM: arm64: Select debug state to save/restore based on debug owner
  KVM: arm64: Clean up KVM_SET_GUEST_DEBUG handler
  KVM: arm64: Evaluate debug owner at vcpu_load()
  KVM: arm64: Write MDCR_EL2 directly from kvm_arm_setup_mdcr_el2()
  KVM: arm64: Move host SME/SVE tracking flags to host data
  KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of vCPU
  KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
  KVM: arm64: Drop MDSCR_EL1_DEBUG_MASK

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge remote-tracking branch 'arm64/for-next/cpufeature' into kvm-arm64/pkvm-fixed-features-6.14

Merge arm64/for-next/cpufeature to solve extensive conflicts
caused by the CPACR_ELx->CPACR_EL1 repainting.

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Explicitly handle BRBE traps as UNDEFINED

The Branch Record Buffer Extension (BRBE) adds a number of system
registers and instructions which we don't currently intend to expose to
guests. Our existing logic handles this safely, but this could be
improved with some explicit handling of BRBE.

KVM currently hides BRBE from guests: the cpufeature code's
ftr_id_aa64dfr0[] table doesn't have an entry for the BRBE field, and so
this will be zero in the sanitised value of ID_AA64DFR0 exposed to
guests via read_sanitised_id_aa64dfr0_el1().

KVM currently traps BRBE usage from guests: the default configuration of
the fine-grained trap controls HDFGRTR_EL2.{nBRBDATA,nBRBCTL,nBRBIDR}
and HFGITR_EL2.{nBRBINJ_nBRBIALL} cause these to be trapped to EL2.

Well-behaved guests shouldn't try to use the registers or instructions,
but badly-behaved guests could use these, resulting in unnecessary
warnings from KVM before it injects an UNDEF, e.g.

| kvm [197]: Unsupported guest access at: 401c98
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 0), func_read },
| kvm [197]: Unsupported guest access at: 401d04
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 1), func_read },
| kvm [197]: Unsupported guest access at: 401d70
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 2), Op2( 0), func_read },
| kvm [197]: Unsupported guest access at: 401ddc
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 0), func_read },
| kvm [197]: Unsupported guest access at: 401e48
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 1), func_read },
| kvm [197]: Unsupported guest access at: 401eb4
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 2), func_read },
| kvm [197]: Unsupported guest access at: 401f20
|  { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 2), func_read },
| kvm [197]: Unsupported guest access at: 401f8c
|  { Op0( 1), Op1( 1), CRn( 7), CRm( 2), Op2( 4), func_write },
| kvm [197]: Unsupported guest access at: 401ff8
|  { Op0( 1), Op1( 1), CRn( 7), CRm( 2), Op2( 5), func_write },

As with other features that we know how to handle, these warnings aren't
particularly interesting, and we can simply treat these as UNDEFINED
without any warning. Add the necessary fine-grained undef configuration
to make this happen, as suggested by Marc Zyngier:

  https://lore.kernel.org/linux-arm-kernel/86r0czk6wd.wl-maz@kernel.org/

At the same time, update read_sanitised_id_aa64dfr0_el1() to hide BRBE
from guests, as we do for SPE. This will prevent accidentally exposing
BRBE to guests if/when ftr_id_aa64dfr0[] gains a BRBE entry.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250109223836.419240-1-robh@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()

Remove hard-coded strings by using the str_enabled_disabled() helper
function.

Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250110225310.369980-2-thorsten.blum@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: kvm: Introduce nvhe stack size constants

Refactor nvhe stack code to use NVHE_STACK_SIZE/SHIFT constants,
instead of directly using PAGE_SIZE/SHIFT. This makes the code a bit
easier to read, without introducing any functional changes.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Link: https://lore.kernel.org/r/20241112003336.1375584-1-kaleshsingh@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix nVHE stacktrace VA bits mask

The hypervisor VA space size depends on both the ID map's
(IDMAP_VA_BITS) and the kernel stage-1 (VA_BITS). However, the
hypervisor stacktrace decoding is solely relying on VA_BITS. This is
especially an issue when VA_BITS < IDMAP_VA_BITS (i.e. VA_BITS is
39-bit): the hypervisor may have addresses bigger than the stacktrace is
masking.

Align this mask with hyp_va_bits.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250107112821.416591-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix FEAT_MTE in pKVM

Make sure we do not trap access to Allocation Tags.

Fixes: b56680de9c64 ("KVM: arm64: Initialize trap register values in hyp in pKVM")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250106112421.65355-1-vladimir.murzin@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

Documentation: Update the behaviour of "kvm-arm.mode"

Commit 5053c3f0519c ("KVM: arm64: Use hVHE in pKVM by default on CPUs with
VHE support") modified the behaviour of "kvm-arm.mode=protected" without
the updating the kernel parameters doc.

Update it to match the current implementation.

Also, update required architecture version for nested virtualization as
suggested by Marc.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241025093259.2216093-1-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Document EL2 timer API

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-13-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity

It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really
work, specially with HCR_EL2.E2H=1.

A non-zero offset results in a screaming virtual timer interrupt,
to the tune of a few 100k interrupts per second on a 4 vcpu VM.
This is also evidenced by this CPU's inability to correctly run
any of the timer selftests.

The only case this doesn't break is when this register is set to 0,
which breaks VM migration.

When HCR_EL2.E2H=0, the timer seems to behave normally, and does
not result in an interrupt storm.

As a workaround, use the fact that this CPU implements FEAT_ECV,
and trap all accesses to the virtual timer and counter, keeping
CNTVOFF_EL2 set to zero, and emulate accesses to CVAL/TVAL/CTL
and the counter itself, fixing up the timer to account for the
missing offset.

And if you think this is disgusting, you'd probably be right.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Sanitise CNTHCTL_EL2

Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle
more than we advertise to the guest.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits

Allow a guest hypervisor to trap accesses to CNT{P,V}CT_EL02 by
propagating these trap bits to the host trap configuration.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}

For completeness, fun, and cerebral meltdown, add the virtualisation
related traps to the counter and timers.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Handle counter access early in non-HYP context

We already deal with CNTPCT_EL0 accesses in non-HYP context.
Let's add CNTVCT_EL0 as a good measure.

This is also an opportunity to simplify things and make it
plain that this code is only for non-HYP context handling.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context

Similarly to handling the physical timer accesses early when FEAT_ECV
causes a trap, we try to handle the physical counter without returning
to the general sysreg handling.

More surprisingly, we introduce something similar for the virtual
counter. Although this isn't necessary yet, it will prove useful on
systems that have a broken CNTVOFF_EL2 implementation. Yes, they exist.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use

Although FEAT_ECV allows us to correctly emulate the timers, it also
reduces performances pretty badly.

Mitigate this by emulating the CTL/CVAL register reads in the
inner run loop, without returning to the general kernel.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers

Although FEAT_NV2 makes most things fast, it also makes it impossible
to correctly emulate the timers, as the sysreg accesses are redirected
to memory.

FEAT_ECV addresses this by giving a hypervisor the ability to trap
the EL02 sysregs as well as the virtual timer.

Add the required trap setting to make use of the feature, allowing
us to elide the ugly resync in the middle of the run loop.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state

With FEAT_NV2, the EL0 timer state is entirely stored in memory,
meaning that the hypervisor can only provide a very poor emulation.

The only thing we can really do is to publish the interrupt state
in the guest view of CNT{P,V}_CTL_EL0, and defer everything else
to the next exit.

Only FEAT_ECV will allow us to fix it, at the cost of extra trapping.

Suggested-by: Chase Conklin <chase.conklin@arm.com>
Suggested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Sync nested timer state with FEAT_NV2

Emulating the timers with FEAT_NV2 is a bit odd, as the timers
can be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Add handling of EL2-specific timer registers

Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context. This includes
handling the virtual timer accesses when CNTHCTL_EL2.EL1TVT
or CNTHCTL_EL2.EL1TVCT are set.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Advertise the lack of AArch32 EL0 support

Although we never supported 32bit anywhere in NV, we fail to
advertise so for EL0, probably owing to the relative lack of
hardware supporting both NV2 and 32bit EL0.

Add some sanitising to ID_AA64PFR0_EL1.EL0, and reaffirm that
"in 64bit-only we trust".

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm

Now that we have introduced kvm_vcpu_has_feature(), use it in the
remaining code that checks for features in struct kvm, instead of
using the __vcpu_has_feature() helper.

No functional change intended.

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Convert the SVE guest vcpu flag to a vm flag

The vcpu flag GUEST_HAS_SVE is per-vcpu, but it is based on what
is now a per-vm feature. Make the flag per-vm.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove PtrAuth guest vcpu flag

The vcpu flag GUEST_HAS_PTRAUTH is always associated with the
vcpu PtrAuth features, which are defined per vm rather than per
vcpu.

Remove the flag, and replace it with checks for the features
instead.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE

Since the introduction of SME, bit 12 in CPTR_EL2 (nVHE) is TSM
for trapping SME, instead of RES1, as per ARM ARM DDI 0487K.a,
section D23.2.34.

Fix the value of CPTR_NVHE_EL2_RES1 to reflect that, and adjust
the code that relies on it accordingly.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-15-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Refactor kvm_reset_cptr_el2()

Fold kvm_get_reset_cptr_el2() into kvm_reset_cptr_el2(), since it
is its only caller. Add a comment to clarify that this function
is meant for the host value of cptr_el2.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-14-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Calculate cptr_el2 traps on activating traps

Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.

Reported-by: James Clark <james.clark@linaro.org>
Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove redundant setting of HCR_EL2 trap bit

In hVHE mode, HCR_E2H should be set for both protected and
non-protected VMs. Since commit b56680de9c64 ("KVM: arm64:
Initialize trap register values in hyp in pKVM"), this has been
fixed, and the setting of the flag here is redundant.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-12-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove fixed_config.h header

The few remaining items needed in fixed_config.h are better
suited for pkvm.h. Move them there and delete it.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-11-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Rework specifying restricted features for protected VMs

The existing code didn't properly distinguish between signed and
unsigned features, and was difficult to read and to maintain.
Rework it using the same method used in other parts of KVM when
handling vcpu features.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Set protected VM traps based on its view of feature registers

Now that the VM's feature id registers are initialized with the
values of the supported features, use those values to determine
which traps to set using kvm_has_feature().

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix RAS trapping in pKVM for protected VMs

Trap RAS in pKVM if not supported at all for protected VMs. The
RAS version doesn't matter in this case.

Fixes: 2a0c343386ae ("KVM: arm64: Initialize trap registers for protected VMs")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Initialize feature id registers for protected VMs

The hypervisor maintains the state of protected VMs. Initialize
the values for feature ID registers for protected VMs, to be used
when setting traps and when advertising features to protected
VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Use KVM extension checks for allowed protected VM capabilities

Use KVM extension checks as the source for determining which
capabilities are allowed for protected VMs. KVM extension checks
is the natural place for this, since it is also the interface
exposed to users.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM

The hypervisor is responsible for the power state of protected
VMs in pKVM. Therefore, remove KVM_ARM_VCPU_POWER_OFF from the
list of allowed features for protected VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move checking protected vcpu features to a separate function

At the moment, checks for supported vcpu features for protected
VMs are build-time bugs. In the following patch, they will become
runtime checks based on the vcpu's features registers. Therefore,
consolidate them into one function that would return an error if
it encounters an unsupported feature.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Group setting traps for protected VMs by control register

Group setting protected VM traps by control register rather than
feature id register, since some trap values (e.g., PAuth), depend
on more than one feature id register.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Consolidate allowed and restricted VM feature checks

The definitions for features allowed and allowed with
restrictions for protected guests, which are based on feature
registers, were defined and checked for separately, even though
they are handled in the same way. This could result in missing
checks for certain features, e.g., pointer authentication,
causing traps for allowed features.

Consolidate the definitions into one. Use that new definition to
construct the guest view of the feature registers for
consistency.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Plumb the pKVM MMU in KVM

Introduce the KVM_PGT_CALL() helper macro to allow switching from the
traditional pgtable code to the pKVM version easily in mmu.c. The cost
of this 'indirection' is expected to be very minimal due to
is_protected_kvm_enabled() being backed by a static key.

With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-19-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce the EL1 pKVM MMU

Introduce a set of helper functions allowing to manipulate the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.

Each helper has an exact one-to-one correspondance with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease plumbing later on in mmu.c.

These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by the mmu_lock like a traditional stage-2
page-table.

Signed-off-by: Quentin Perret <qperret@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-18-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_tlb_flush_vmid()

Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.

Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-17-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_mkyoung_guest()

Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
non-protected guests. It will be called later from the fault handling
path.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-16-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()

Plumb the kvm_stage2_test_clear_young() callback into pKVM for
non-protected guest. It will be later be called from MMU notifiers.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-15-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_wrprotect_guest()

Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used for e.g. enabling
dirty logging.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-14-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_relax_guest_perms()

Introduce a new hypercall allowing the host to relax the stage-2
permissions of mappings in a non-protected guest page-table. It will be
used later once we start allowing RO memslots and dirty logging.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-13-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_unshare_guest()

In preparation for letting the host unmap pages from non-protected
guests, introduce a new hypercall implementing the host-unshare-guest
transition.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-12-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_share_guest()

In preparation for handling guest stage-2 mappings at EL2, introduce a
new pKVM hypercall allowing to share pages with non-protected guests.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-11-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_vcpu_{load,put}()

Rather than look-up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-10-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers

In preparation for accessing pkvm_hyp_vm structures at EL2 in a context
where we can't always expect a vCPU to be loaded (e.g. MMU notifiers),
introduce get/put helpers to get temporary references to hyp VMs from
any context.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-9-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function

Turn kvm_pgtable_stage2_init() into a static inline function instead of
a macro. This will allow the usage of typeof() on it later on.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-8-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms

kvm_pgtable_stage2_relax_perms currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM. To
allow for the re-use of that function, make the walk flags one of its
parameters.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-7-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung

kvm_pgtable_stage2_mkyoung currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM.
To allow for the re-use of that function, make the walk flags one of
its parameters.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-6-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move host page ownership tracking to the hyp vmemmap

We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages as this forces to break block mappings purely to support
this software tracking. This causes an unnecessarily fragmented stage-2
page-table for the host in particular when it shares pages with Secure,
which can lead to measurable regressions. Moreover, having this state
stored in the page-table forces us to do multiple costly walks on the
page transition path, hence causing overhead.

In order to work around these problems, move the host-side page-tracking
logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Make hyp_page::order a u8

We don't need 16 bits to store the hyp page order, and we'll need some
bits to store page ownership data soon, so let's reduce the order
member.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move enum pkvm_page_state to memory.h

In order to prepare the way for storing page-tracking information in
pKVM's vmemmap, move the enum pkvm_page_state definition to
nvhe/memory.h.

No functional changes intended.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Change the layout of enum pkvm_page_state

The 'concrete' (a.k.a non-meta) page states are currently encoded using
software bits in PTEs. For performance reasons, the abstract
pkvm_page_state enum uses the same bits to encode these states as that
makes conversions from and to PTEs easy.

In order to prepare the ground for moving the 'concrete' state storage
to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
purpose.

No functional changes intended.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads

Only yielding control of the debug registers for writes is a bit silly,
unless of course you're a fan of pointless traps. Give control of the
debug registers to the guest upon the first access, regardless of
direction.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-20-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set

There is a nauseating amount of boilerplate for accessing the
breakpoint and watchpoint registers. Fold everything together into a
single set of accessors and select the right storage based on the sysreg
encoding.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-19-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore

Similar to other per-CPU profiling/debug features we handle, store the
number of breakpoints/watchpoints in kvm_host_data to avoid reading the
ID register 4 times on every guest entry/exit. And if you're in the
nested virt business that's quite a few avoidable exits to the L0
hypervisor.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-18-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: nv: Honor MDCR_EL2.TDE routing for debug exceptions

Inject debug exceptions into vEL2 if MDCR_EL2.TDE is set.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-17-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Manage software step state at load/put

KVM takes over the guest's software step state machine if the VMM is
debugging the guest, but it does the save/restore fiddling for every
guest entry.

Note that the only constraint on host usage of software step is that the
guest's configuration remains visible to userspace via the ONE_REG
ioctls. So, we can cut down on the amount of fiddling by doing this at
load/put instead.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-16-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't hijack guest context MDSCR_EL1

Stealing MDSCR_EL1 in the guest's kvm_cpu_context for external debugging
is rather gross. Just add a field for this instead and let the context
switch code pick the correct one based on the debug owner.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-15-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Compute MDCR_EL2 at vcpu_load()

KVM has picked up several hacks to cope with vcpu->arch.mdcr_el2 needing
to be prepared before vcpu_load(), which is when it gets programmed
into hardware on VHE.

Now that the flows for reprogramming MDCR_EL2 have been simplified, move
that computation to vcpu_load().

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-14-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Reload vCPU for accesses to OSLAR_EL1

KVM takes ownership of the debug regs if the guest enables the OS lock,
as it needs to use MDSCR_EL1 to mask debug exceptions. Just reload the
vCPU if the guest toggles the OS lock, relying on kvm_vcpu_load_debug()
to update the debug owner and get the right trap configuration in place.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-13-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>