Marc Zyngier [Fri, 17 Jan 2025 11:04:53 +0000 (11:04 +0000)]
Merge branch kvm-arm64/nv-timers into kvmarm-master/next
* kvm-arm64/nv-timers:
: .
: Nested Virt support for the EL2 timers. From the initial cover letter:
:
: "Here's another batch of NV-related patches, this time bringing in most
: of the timer support for EL2 as well as nested guests.
:
: The code is pretty convoluted for a bunch of reasons:
:
: - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
: memory, meaning that a guest could setup a timer and never see it
: firing until the next exit
:
: - We go try hard to reflect the timer state in memory, but that's not
: great.
:
: - With FEAT_ECV, we can finally correctly emulate the virtual timer,
: but this emulation is pretty costly
:
: - As a way to make things suck less, we handle timer reads as early as
: possible, and only defer writes to the normal trap handling
:
: - Finally, some implementations are badly broken, and require some
: hand-holding, irrespective of NV support. So we try and reuse the NV
: infrastructure to make them usable. This could be further optimised,
: but I'm running out of patience for this sort of HW.
:
: [...]"
: .
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Document EL2 timer API
KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
KVM: arm64: nv: Sanitise CNTHCTL_EL2
KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
KVM: arm64: Handle counter access early in non-HYP context
KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
KVM: arm64: nv: Sync nested timer state with FEAT_NV2
KVM: arm64: nv: Add handling of EL2-specific timer registers
Marc Zyngier [Thu, 16 Jan 2025 10:27:10 +0000 (10:27 +0000)]
KVM: arm64: nv: Fix doc header layout for timers
Stephen reports that 'make htmldocs' spits out a warning
("Documentation/virt/kvm/devices/vcpu.rst:147: WARNING: Definition
list ends without a blank line; unexpected unindent.").
Fix it by keeping all the timer attributes on a single line.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 12 Jan 2025 10:40:10 +0000 (10:40 +0000)]
Merge branch kvm-arm64/pkvm-fixed-features-6.14 into kvmarm-master/next
* kvm-arm64/pkvm-fixed-features-6.14: (24 commits)
: .
: Complete rework of the pKVM handling of features, catching up
: with the rest of the code deals with it these days.
: Patches courtesy of Fuad Tabba. From the cover letter:
:
: "This patch series uses the vm's feature id registers to track the
: supported features, a framework similar to nested virt to set the
: trap values, and removes the need to store cptr_el2 per vcpu in
: favor of setting its value when traps are activated, as VHE mode
: does."
:
: This branch drags the arm64/for-next/cpufeature branch to solve
: ugly conflicts in -next.
: .
KVM: arm64: Fix FEAT_MTE in pKVM
KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
KVM: arm64: Convert the SVE guest vcpu flag to a vm flag
KVM: arm64: Remove PtrAuth guest vcpu flag
KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE
KVM: arm64: Refactor kvm_reset_cptr_el2()
KVM: arm64: Calculate cptr_el2 traps on activating traps
KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
KVM: arm64: Remove fixed_config.h header
KVM: arm64: Rework specifying restricted features for protected VMs
KVM: arm64: Set protected VM traps based on its view of feature registers
KVM: arm64: Fix RAS trapping in pKVM for protected VMs
KVM: arm64: Initialize feature id registers for protected VMs
KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
KVM: arm64: Move checking protected vcpu features to a separate function
KVM: arm64: Group setting traps for protected VMs by control register
KVM: arm64: Consolidate allowed and restricted VM feature checks
arm64/sysreg: Get rid of CPACR_ELx SysregFields
arm64/sysreg: Convert *_EL12 accessors to Mapping
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/fpsimd.c
# arch/arm64/kvm/hyp/nvhe/pkvm.c
Marc Zyngier [Sun, 12 Jan 2025 10:37:15 +0000 (10:37 +0000)]
Merge branch kvm-arm64/pkvm-np-guest into kvmarm-master/next
* kvm-arm64/pkvm-np-guest:
: .
: pKVM support for non-protected guests using the standard MM
: infrastructure, courtesy of Quentin Perret. From the cover letter:
:
: "This series moves the stage-2 page-table management of non-protected
: guests to EL2 when pKVM is enabled. This is only intended as an
: incremental step towards a 'feature-complete' pKVM, there is however a
: lot more that needs to come on top.
:
: With that series applied, pKVM provides near-parity with standard KVM
: from a functional perspective all while Linux no longer touches the
: stage-2 page-tables itself at EL1. The majority of mm-related KVM
: features work out of the box, including MMU notifiers, dirty logging,
: RO memslots and things of that nature. There are however two gotchas:
:
: - We don't support mapping devices into guests: this requires
: additional hypervisor support for tracking the 'state' of devices,
: which will come in a later series. No device assignment until then.
:
: - Stage-2 mappings are forced to page-granularity even when backed by a
: huge page for the sake of simplicity of this series. I'm only aiming
: at functional parity-ish (from userspace's PoV) for now, support for
: HP can be added on top later as a perf improvement."
: .
KVM: arm64: Plumb the pKVM MMU in KVM
KVM: arm64: Introduce the EL1 pKVM MMU
KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
KVM: arm64: Introduce __pkvm_host_unshare_guest()
KVM: arm64: Introduce __pkvm_host_share_guest()
KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
KVM: arm64: Move host page ownership tracking to the hyp vmemmap
KVM: arm64: Make hyp_page::order a u8
KVM: arm64: Move enum pkvm_page_state to memory.h
KVM: arm64: Change the layout of enum pkvm_page_state
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/arm.c
Marc Zyngier [Sun, 12 Jan 2025 10:36:00 +0000 (10:36 +0000)]
Merge branch kvm-arm64/debug-6.14 into kvmarm-master/next
* kvm-arm64/debug-6.14:
: .
: Large rework of the debug code to make it a bit less horrid,
: courtesy of Oliver Upton. From the original cover letter:
:
: "The debug code has become a bit difficult to reason about, especially
: all the hacks and bandaids for state tracking + trap configuration.
:
: This series reworks the entire mess around using a single enumeration to
: track the state of the debug registers (free, guest-owned, host-owned),
: using that to drive trap configuration and save/restore.
:
: On top of that, this series wires most of the implementation into vCPU
: load/put rather than the main KVM_RUN loop. This has been a long time
: coming for VHE, as a lot of the trap configuration and EL1 state gets
: loaded into hardware at that point anyway.
:
: The save/restore of the debug registers is simplified quite a bit as
: well. KVM will now restore the registers for *any* access rather than
: just writes, and keep doing so until the next vcpu_put() instead of
: dropping it on the floor after the next exception."
: .
KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set
KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore
KVM: arm64: nv: Honor MDCR_EL2.TDE routing for debug exceptions
KVM: arm64: Manage software step state at load/put
KVM: arm64: Don't hijack guest context MDSCR_EL1
KVM: arm64: Compute MDCR_EL2 at vcpu_load()
KVM: arm64: Reload vCPU for accesses to OSLAR_EL1
KVM: arm64: Use debug_owner to track if debug regs need save/restore
KVM: arm64: Remove vestiges of debug_ptr
KVM: arm64: Remove debug tracepoints
KVM: arm64: Select debug state to save/restore based on debug owner
KVM: arm64: Clean up KVM_SET_GUEST_DEBUG handler
KVM: arm64: Evaluate debug owner at vcpu_load()
KVM: arm64: Write MDCR_EL2 directly from kvm_arm_setup_mdcr_el2()
KVM: arm64: Move host SME/SVE tracking flags to host data
KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of vCPU
KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
KVM: arm64: Drop MDSCR_EL1_DEBUG_MASK
Marc Zyngier [Tue, 17 Dec 2024 14:23:19 +0000 (14:23 +0000)]
KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really
work, specially with HCR_EL2.E2H=1.
A non-zero offset results in a screaming virtual timer interrupt,
to the tune of a few 100k interrupts per second on a 4 vcpu VM.
This is also evidenced by this CPU's inability to correctly run
any of the timer selftests.
The only case this doesn't break is when this register is set to 0,
which breaks VM migration.
When HCR_EL2.E2H=0, the timer seems to behave normally, and does
not result in an interrupt storm.
As a workaround, use the fact that this CPU implements FEAT_ECV,
and trap all accesses to the virtual timer and counter, keeping
CNTVOFF_EL2 set to zero, and emulate accesses to CVAL/TVAL/CTL
and the counter itself, fixing up the timer to account for the
missing offset.
And if you think this is disgusting, you'd probably be right.
Marc Zyngier [Tue, 17 Dec 2024 14:23:14 +0000 (14:23 +0000)]
KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
Similarly to handling the physical timer accesses early when FEAT_ECV
causes a trap, we try to handle the physical counter without returning
to the general sysreg handling.
More surprisingly, we introduce something similar for the virtual
counter. Although this isn't necessary yet, it will prove useful on
systems that have a broken CNTVOFF_EL2 implementation. Yes, they exist.
Marc Zyngier [Tue, 17 Dec 2024 14:23:12 +0000 (14:23 +0000)]
KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
Although FEAT_NV2 makes most things fast, it also makes it impossible
to correctly emulate the timers, as the sysreg accesses are redirected
to memory.
FEAT_ECV addresses this by giving a hypervisor the ability to trap
the EL02 sysregs as well as the virtual timer.
Add the required trap setting to make use of the feature, allowing
us to elide the ugly resync in the middle of the run loop.
Marc Zyngier [Tue, 17 Dec 2024 14:23:10 +0000 (14:23 +0000)]
KVM: arm64: nv: Sync nested timer state with FEAT_NV2
Emulating the timers with FEAT_NV2 is a bit odd, as the timers
can be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 17 Dec 2024 14:23:09 +0000 (14:23 +0000)]
KVM: arm64: nv: Add handling of EL2-specific timer registers
Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context. This includes
handling the virtual timer accesses when CNTHCTL_EL2.EL1TVT
or CNTHCTL_EL2.EL1TVCT are set.
Fuad Tabba [Mon, 16 Dec 2024 10:50:57 +0000 (10:50 +0000)]
KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
Now that we have introduced kvm_vcpu_has_feature(), use it in the
remaining code that checks for features in struct kvm, instead of
using the __vcpu_has_feature() helper.
Fuad Tabba [Mon, 16 Dec 2024 10:50:53 +0000 (10:50 +0000)]
KVM: arm64: Refactor kvm_reset_cptr_el2()
Fold kvm_get_reset_cptr_el2() into kvm_reset_cptr_el2(), since it
is its only caller. Add a comment to clarify that this function
is meant for the host value of cptr_el2.
Fuad Tabba [Mon, 16 Dec 2024 10:50:52 +0000 (10:50 +0000)]
KVM: arm64: Calculate cptr_el2 traps on activating traps
Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.
Reported-by: James Clark <james.clark@linaro.org> Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Mon, 16 Dec 2024 10:50:51 +0000 (10:50 +0000)]
KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
In hVHE mode, HCR_E2H should be set for both protected and
non-protected VMs. Since commit b56680de9c64 ("KVM: arm64:
Initialize trap register values in hyp in pKVM"), this has been
fixed, and the setting of the flag here is redundant.
Fuad Tabba [Mon, 16 Dec 2024 10:50:49 +0000 (10:50 +0000)]
KVM: arm64: Rework specifying restricted features for protected VMs
The existing code didn't properly distinguish between signed and
unsigned features, and was difficult to read and to maintain.
Rework it using the same method used in other parts of KVM when
handling vcpu features.
Fuad Tabba [Mon, 16 Dec 2024 10:50:48 +0000 (10:50 +0000)]
KVM: arm64: Set protected VM traps based on its view of feature registers
Now that the VM's feature id registers are initialized with the
values of the supported features, use those values to determine
which traps to set using kvm_has_feature().
Fuad Tabba [Mon, 16 Dec 2024 10:50:46 +0000 (10:50 +0000)]
KVM: arm64: Initialize feature id registers for protected VMs
The hypervisor maintains the state of protected VMs. Initialize
the values for feature ID registers for protected VMs, to be used
when setting traps and when advertising features to protected
VMs.
Fuad Tabba [Mon, 16 Dec 2024 10:50:45 +0000 (10:50 +0000)]
KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
Use KVM extension checks as the source for determining which
capabilities are allowed for protected VMs. KVM extension checks
is the natural place for this, since it is also the interface
exposed to users.
Fuad Tabba [Mon, 16 Dec 2024 10:50:44 +0000 (10:50 +0000)]
KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
The hypervisor is responsible for the power state of protected
VMs in pKVM. Therefore, remove KVM_ARM_VCPU_POWER_OFF from the
list of allowed features for protected VMs.
Fuad Tabba [Mon, 16 Dec 2024 10:50:43 +0000 (10:50 +0000)]
KVM: arm64: Move checking protected vcpu features to a separate function
At the moment, checks for supported vcpu features for protected
VMs are build-time bugs. In the following patch, they will become
runtime checks based on the vcpu's features registers. Therefore,
consolidate them into one function that would return an error if
it encounters an unsupported feature.
Fuad Tabba [Fri, 20 Dec 2024 11:33:05 +0000 (11:33 +0000)]
KVM: arm64: Group setting traps for protected VMs by control register
Group setting protected VM traps by control register rather than
feature id register, since some trap values (e.g., PAuth), depend
on more than one feature id register.
Marc Zyngier [Fri, 20 Dec 2024 11:33:05 +0000 (11:33 +0000)]
KVM: arm64: Consolidate allowed and restricted VM feature checks
The definitions for features allowed and allowed with
restrictions for protected guests, which are based on feature
registers, were defined and checked for separately, even though
they are handled in the same way. This could result in missing
checks for certain features, e.g., pointer authentication,
causing traps for allowed features.
Consolidate the definitions into one. Use that new definition to
construct the guest view of the feature registers for
consistency.
Quentin Perret [Wed, 18 Dec 2024 19:40:59 +0000 (19:40 +0000)]
KVM: arm64: Plumb the pKVM MMU in KVM
Introduce the KVM_PGT_CALL() helper macro to allow switching from the
traditional pgtable code to the pKVM version easily in mmu.c. The cost
of this 'indirection' is expected to be very minimal due to
is_protected_kvm_enabled() being backed by a static key.
With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.
Quentin Perret [Wed, 18 Dec 2024 19:40:58 +0000 (19:40 +0000)]
KVM: arm64: Introduce the EL1 pKVM MMU
Introduce a set of helper functions allowing to manipulate the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.
Each helper has an exact one-to-one correspondance with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease plumbing later on in mmu.c.
These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by the mmu_lock like a traditional stage-2
page-table.
Quentin Perret [Wed, 18 Dec 2024 19:40:57 +0000 (19:40 +0000)]
KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.
Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.
Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used for e.g. enabling
dirty logging.
Introduce a new hypercall allowing the host to relax the stage-2
permissions of mappings in a non-protected guest page-table. It will be
used later once we start allowing RO memslots and dirty logging.
Marc Zyngier [Wed, 18 Dec 2024 19:40:50 +0000 (19:40 +0000)]
KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
Rather than look-up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.
Quentin Perret [Wed, 18 Dec 2024 19:40:49 +0000 (19:40 +0000)]
KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
In preparation for accessing pkvm_hyp_vm structures at EL2 in a context
where we can't always expect a vCPU to be loaded (e.g. MMU notifiers),
introduce get/put helpers to get temporary references to hyp VMs from
any context.
Quentin Perret [Wed, 18 Dec 2024 19:40:47 +0000 (19:40 +0000)]
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
kvm_pgtable_stage2_relax_perms currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM. To
allow for the re-use of that function, make the walk flags one of its
parameters.
Quentin Perret [Wed, 18 Dec 2024 19:40:46 +0000 (19:40 +0000)]
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
kvm_pgtable_stage2_mkyoung currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM.
To allow for the re-use of that function, make the walk flags one of
its parameters.
Quentin Perret [Wed, 18 Dec 2024 19:40:45 +0000 (19:40 +0000)]
KVM: arm64: Move host page ownership tracking to the hyp vmemmap
We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages as this forces to break block mappings purely to support
this software tracking. This causes an unnecessarily fragmented stage-2
page-table for the host in particular when it shares pages with Secure,
which can lead to measurable regressions. Moreover, having this state
stored in the page-table forces us to do multiple costly walks on the
page transition path, hence causing overhead.
In order to work around these problems, move the host-side page-tracking
logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
Quentin Perret [Wed, 18 Dec 2024 19:40:42 +0000 (19:40 +0000)]
KVM: arm64: Change the layout of enum pkvm_page_state
The 'concrete' (a.k.a non-meta) page states are currently encoded using
software bits in PTEs. For performance reasons, the abstract
pkvm_page_state enum uses the same bits to encode these states as that
makes conversions from and to PTEs easy.
In order to prepare the ground for moving the 'concrete' state storage
to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
purpose.
Oliver Upton [Thu, 19 Dec 2024 22:41:16 +0000 (14:41 -0800)]
KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
Only yielding control of the debug registers for writes is a bit silly,
unless of course you're a fan of pointless traps. Give control of the
debug registers to the guest upon the first access, regardless of
direction.
Oliver Upton [Thu, 19 Dec 2024 22:41:15 +0000 (14:41 -0800)]
KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set
There is a nauseating amount of boilerplate for accessing the
breakpoint and watchpoint registers. Fold everything together into a
single set of accessors and select the right storage based on the sysreg
encoding.
Oliver Upton [Thu, 19 Dec 2024 22:41:14 +0000 (14:41 -0800)]
KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore
Similar to other per-CPU profiling/debug features we handle, store the
number of breakpoints/watchpoints in kvm_host_data to avoid reading the
ID register 4 times on every guest entry/exit. And if you're in the
nested virt business that's quite a few avoidable exits to the L0
hypervisor.
Marc Zyngier [Fri, 20 Dec 2024 08:59:48 +0000 (08:59 +0000)]
KVM: arm64: Manage software step state at load/put
KVM takes over the guest's software step state machine if the VMM is
debugging the guest, but it does the save/restore fiddling for every
guest entry.
Note that the only constraint on host usage of software step is that the
guest's configuration remains visible to userspace via the ONE_REG
ioctls. So, we can cut down on the amount of fiddling by doing this at
load/put instead.
Oliver Upton [Thu, 19 Dec 2024 22:41:11 +0000 (14:41 -0800)]
KVM: arm64: Don't hijack guest context MDSCR_EL1
Stealing MDSCR_EL1 in the guest's kvm_cpu_context for external debugging
is rather gross. Just add a field for this instead and let the context
switch code pick the correct one based on the debug owner.
Oliver Upton [Thu, 19 Dec 2024 22:41:10 +0000 (14:41 -0800)]
KVM: arm64: Compute MDCR_EL2 at vcpu_load()
KVM has picked up several hacks to cope with vcpu->arch.mdcr_el2 needing
to be prepared before vcpu_load(), which is when it gets programmed
into hardware on VHE.
Now that the flows for reprogramming MDCR_EL2 have been simplified, move
that computation to vcpu_load().
Oliver Upton [Thu, 19 Dec 2024 22:41:09 +0000 (14:41 -0800)]
KVM: arm64: Reload vCPU for accesses to OSLAR_EL1
KVM takes ownership of the debug regs if the guest enables the OS lock,
as it needs to use MDSCR_EL1 to mask debug exceptions. Just reload the
vCPU if the guest toggles the OS lock, relying on kvm_vcpu_load_debug()
to update the debug owner and get the right trap configuration in place.
Oliver Upton [Thu, 19 Dec 2024 22:41:08 +0000 (14:41 -0800)]
KVM: arm64: Use debug_owner to track if debug regs need save/restore
Use the debug owner to determine if the debug regs are in use instead of
keeping around the DEBUG_DIRTY flag. Debug registers are now
saved/restored after the first trap, regardless of whether it was a read
or a write. This also shifts the point at which KVM becomes lazy to
vcpu_put() rather than the next exception taken from the guest.
Oliver Upton [Thu, 19 Dec 2024 22:41:06 +0000 (14:41 -0800)]
KVM: arm64: Remove debug tracepoints
The debug tracepoints are a useless firehose of information that track
implementation detail rather than well-defined events. These are going
to be rather difficult to uphold now that the implementation is getting
redone, so throw them out instead of bending over backwards.
Oliver Upton [Thu, 19 Dec 2024 22:41:05 +0000 (14:41 -0800)]
KVM: arm64: Select debug state to save/restore based on debug owner
Select the set of debug registers to use based on the owner rather than
relying on debug_ptr. Besides the code cleanup, this allows us to
eliminate a couple instances kern_hyp_va() as well.
Oliver Upton [Thu, 19 Dec 2024 22:41:03 +0000 (14:41 -0800)]
KVM: arm64: Evaluate debug owner at vcpu_load()
In preparation for tossing the debug_ptr mess, introduce an enumeration
to track the ownership of the debug registers while in the guest. Update
the owner at vcpu_load() based on whether the host needs to steal the
guest's debug context or if breakpoints/watchpoints are actively in use.
Oliver Upton [Thu, 19 Dec 2024 22:41:00 +0000 (14:41 -0800)]
KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of vCPU
Add flags to kvm_host_data to track if SPE/TRBE is present +
programmable on a per-CPU basis. Set the flags up at init rather than
vcpu_load() as the programmability of these buffers is unlikely to
change.
Oliver Upton [Thu, 19 Dec 2024 22:40:59 +0000 (14:40 -0800)]
KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
KVM caches MDCR_EL2 on a per-CPU basis in order to preserve the
configuration of MDCR_EL2.HPMN while running a guest. This is a bit
gross, since we're relying on some baked configuration rather than the
hardware definition of implemented counters.
Discover the number of implemented counters by reading PMCR_EL0.N
instead. This works because:
- In VHE the kernel runs at EL2, and N always returns the number of
counters implemented in hardware
- In {n,h}VHE, the EL2 setup code programs MDCR_EL2.HPMN with the EL2
view of PMCR_EL0.N for the host
Lastly, avoid traps under nested virtualization by saving PMCR_EL0.N in
host data.
Marc Zyngier [Thu, 19 Dec 2024 17:33:48 +0000 (17:33 +0000)]
arm64/sysreg: Get rid of the TCR2_EL1x SysregFields
TCR2_EL1x is a pretty bizarre construct, as it is shared between
TCR2_EL1 and TCR2_EL12. But the latter is obviously only an
accessor to the former.
In order to make things more consistent, upgrade TCR2_EL1x to
a full-blown sysreg definition for TCR2_EL1, and describe TCR2_EL12
as a mapping to TCR2_EL1.
This results in a couple of minor changes to the actual code.
Marc Zyngier [Thu, 19 Dec 2024 17:33:47 +0000 (17:33 +0000)]
arm64/sysreg: Allow a 'Mapping' descriptor for system registers
*EL02 and *_EL12 system registers are actually only accessors for
EL0 and EL1 registers accessed from EL2 when HCR_EL2.E2H==1. They
do not have fields of their own.
To that effect, introduce a 'Mapping' entry, describing which
system register an _EL12 register maps to.
Implementation wise, this is handled the same was as Fields,
which ls only a comment.
Linus Torvalds [Sun, 15 Dec 2024 23:33:41 +0000 (15:33 -0800)]
Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Limit EFI zboot to GZIP and ZSTD before it comes in wider use
- Fix inconsistent error when looking up a non-existent file in
efivarfs with a name that does not adhere to the NAME-GUID format
- Drop some unused code
* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/esrt: remove esre_attribute::store()
efivarfs: Fix error on non-existent file
efi/zboot: Limit compression options to GZIP and ZSTD
Linus Torvalds [Sun, 15 Dec 2024 23:29:07 +0000 (15:29 -0800)]
Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
missing a sentinel, and RIIC was missing rounding up"
* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: riic: Always round-up when calculating bus period
i2c: nomadik: Add missing sentinel to match table
i2c: pnx: Fix timeout in wait functions
Linus Torvalds [Sun, 15 Dec 2024 18:01:10 +0000 (10:01 -0800)]
Merge tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Make sure amd64_edac loads successfully on certain Zen4 memory
configurations
* tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Simplify ECC check on unified memory controllers
Linus Torvalds [Sun, 15 Dec 2024 17:58:27 +0000 (09:58 -0800)]
Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Disable the secure programming interface of the GIC500 chip in the
RK3399 SoC to fix interrupt priority assignment and even make a dead
machine boot again when the gic-v3 driver enables pseudo NMIs
- Correct the declaration of a percpu variable to fix several sparse
warnings
* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Work around insecure GIC integrations
irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
Linus Torvalds [Sun, 15 Dec 2024 17:38:03 +0000 (09:38 -0800)]
Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Prevent incorrect dequeueing of the deadline dlserver helper task and
fix its time accounting
- Properly track the CFS runqueue runnable stats
- Check the total number of all queued tasks in a sched fair's runqueue
hierarchy before deciding to stop the tick
- Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
preventing those from being delayed
* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dlserver: Fix dlserver time accounting
sched/dlserver: Fix dlserver double enqueue
sched/eevdf: More PELT vs DELAYED_DEQUEUE
sched/fair: Fix sched_can_stop_tick() for fair tasks
sched/fair: Fix NEXT_BUDDY
Linus Torvalds [Sun, 15 Dec 2024 17:26:13 +0000 (09:26 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
SPE/TRBE initialization
- Align nested page table walker with the intended memory attribute
combining rules of the architecture
- Prevent userspace from constraining the advertised ASID width,
avoiding horrors of guest TLBIs not matching the intended context
in hardware
- Don't leak references on LPIs when insertion into the translation
cache fails
RISC-V:
- Replace csr_write() with csr_set() for HVIEN PMU overflow bit
x86:
- Cache CPUID.0xD XSTATE offsets+sizes during module init
On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
are a lot of leaves under 0xD. Getting rid of the CPUIDs during
nested VM-Enter and VM-Exit is planned for the next release, for
now just cache them: even on Skylake that is 40% faster"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
arm64: Fix usage of new shifted MDCR_EL2 values
Linus Torvalds [Sat, 14 Dec 2024 23:53:02 +0000 (15:53 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Single one-line fix in the ufs driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
Linus Torvalds [Sat, 14 Dec 2024 20:58:14 +0000 (12:58 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix a bug in the BPF verifier to track changes to packet data
property for global functions (Eduard Zingerman)
- Fix a theoretical BPF prog_array use-after-free in RCU handling of
__uprobe_perf_func (Jann Horn)
- Fix BPF tracing to have an explicit list of tracepoints and their
arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
Kartikeya Dwivedi)
- Fix a logic bug in the bpf_remove_insns code where a potential error
would have been wrongly propagated (Anton Protopopov)
- Avoid deadlock scenarios caused by nested kprobe and fentry BPF
programs (Priya Bala Govindasamy)
- Fix a bug in BPF verifier which was missing a size check for
BTF-based context access (Kumar Kartikeya Dwivedi)
- Fix a crash found by syzbot through an invalid BPF prog_array access
in perf_event_detach_bpf_prog (Jiri Olsa)
- Fix several BPF sockmap bugs including a race causing a refcount
imbalance upon element replace (Michal Luczaj)
- Fix a use-after-free from mismatching BPF program/attachment RCU
flavors (Jann Horn)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
selftests/bpf: Add tests for raw_tp NULL args
bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
selftests/bpf: Add test for narrow ctx load for pointer args
bpf: Check size for BTF-based ctx access of pointer members
selftests/bpf: extend changes_pkt_data with cases w/o subprograms
bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
bpf: fix potential error return
selftests/bpf: validate that tail call invalidates packet pointers
bpf: consider that tail calls invalidate packet pointers
selftests/bpf: freplace tests for tracking of changes_packet_data
bpf: check changes_pkt_data property for extension programs
selftests/bpf: test for changing packet data from global functions
bpf: track changes_pkt_data property for global functions
bpf: refactor bpf_helper_changes_pkt_data to use helper number
bpf: add find_containing_subprog() utility function
bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
...
bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
BPF program types like kprobe and fentry can cause deadlocks in certain
situations. If a function takes a lock and one of these bpf programs is
hooked to some point in the function's critical section, and if the
bpf program tries to call the same function and take the same lock it will
lead to deadlock. These situations have been reported in the following
bug reports.
The bugs reported by syzbot are due to tracepoint bpf programs being
called in the critical sections. This patch does not aim to fix deadlocks
caused by tracepoint programs. However, it does prevent deadlocks from
occurring in similar situations due to kprobe and fentry programs.
Linus Torvalds [Sat, 14 Dec 2024 17:31:19 +0000 (09:31 -0800)]
Merge tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are two small serial driver fixes for 6.13-rc3. They are:
- ioport build fallout fix for the 8250 port driver that should
resolve Guenter's runtime problems
- sh-sci driver bugfix for a reported problem
Both of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: Work around warning backtrace in serial8250_set_defaults
serial: sh-sci: Check if TX data was written to device in .tx_empty()
Linus Torvalds [Sat, 14 Dec 2024 17:27:07 +0000 (09:27 -0800)]
Merge tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging gpib driver build and bugfixes for issues
that have been much-reported (should finally fix Guenter's build
issues). There are more of these coming in later -rc releases, but for
now this should fix the majority of the reported problems.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: gpib: Fix i386 build issue
staging: gpib: Fix faulty workaround for assignment in if
staging: gpib: Workaround for ppc build failure
staging: gpib: Make GPIB_NI_PCI_ISA depend on HAS_IOPORT
Linus Torvalds [Sat, 14 Dec 2024 17:15:49 +0000 (09:15 -0800)]
Merge tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"Fix a regression in rsassa-pkcs1 as well as a buffer overrun in
hisilicon/debugfs"
* tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
crypto: rsassa-pkcs1 - Copy source data for SG list
Linus Torvalds [Sat, 14 Dec 2024 16:51:43 +0000 (08:51 -0800)]
Merge tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Set bindgen's Rust target version to prevent issues when
pairing older rustc releases with newer bindgen releases,
such as bindgen >= 0.71.0 and rustc < 1.82 due to
unsafe_extern_blocks.
drm/panic:
- Remove spurious empty line detected by a new Clippy warning"
* tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux:
rust: kbuild: set `bindgen`'s Rust target version
drm/panic: remove spurious empty line to clean warning
Linus Torvalds [Sat, 14 Dec 2024 16:41:40 +0000 (08:41 -0800)]
Merge tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Per-domain device-list locking fixes for the AMD IOMMU driver
- Fix incorrect use of smp_processor_id() in the NVidia-specific part
of the ARM-SMMU-v3 driver
- Intel IOMMU driver fixes:
- Remove cache tags before disabling ATS
- Avoid draining PRQ in sva mm release path
- Fix qi_batch NULL pointer with nested parent domain
* tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/vt-d: Avoid draining PRQ in sva mm release path
iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
iommu/vt-d: Remove cache tags before disabling ATS
iommu/amd: Add lockdep asserts for domain->dev_list
iommu/amd: Put list_add/del(dev_data) back under the domain->lock
iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
Linus Torvalds [Sat, 14 Dec 2024 16:40:05 +0000 (08:40 -0800)]
Merge tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:
- Fix an OF node reference leak in the sata_highbank driver
* tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
Wolfram Sang [Sat, 14 Dec 2024 09:01:46 +0000 (10:01 +0100)]
Merge tag 'i2c-host-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.13-rc3
- Replaced jiffies with msec for timeout calculations.
- Added a sentinel to the 'of_device_id' array in Nomadik.
- Rounded up bus period calculation in RIIC.
Linus Torvalds [Sat, 14 Dec 2024 01:36:02 +0000 (17:36 -0800)]
Merge tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix rmmod leak
- two minor cleanups
- fix for unlink/rename with pending i/o
* tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: destroy cfid_put_wq on module exit
cifs: Use str_yes_no() helper in cifs_ses_add_channel()
cifs: Fix rmdir failure due to ongoing I/O on deleted file
smb3: fix compiler warning in reparse code
Linus Torvalds [Sat, 14 Dec 2024 01:29:19 +0000 (17:29 -0800)]
Merge tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fairly small fixes for v6.13, the most substatial one being
disabling STIG mode for Cadence QSPI controllers on Altera SoCFPGA
platforms since it doesn't work"
* tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence-qspi: Disable STIG mode for Altera SoCFPGA.
spi: rockchip: Fix PM runtime count on no-op cs
spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
Linus Torvalds [Sat, 14 Dec 2024 01:18:28 +0000 (17:18 -0800)]
Merge tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of additional changes, one ensuring we give AXP717 enough
time to stabilise after changing voltages which fixes serious
stability issues on some platforms and another documenting the DT
support required for the Qualcomm WCN6750"
* tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: axp20x: AXP717: set ramp_delay
regulator: dt-bindings: qcom,qca6390-pmu: document wcn6750-pmu
Linus Torvalds [Sat, 14 Dec 2024 00:58:39 +0000 (16:58 -0800)]
Merge tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"This is the weekly fixes pull for drm. Just has i915, xe and amdgpu
changes in it. Nothing too major in here:
i915:
- Don't use indexed register writes needlessly [dsb]
- Stop using non-posted DSB writes for legacy LUT [color]
- Fix NULL pointer dereference in capture_engine
- Fix memory leak by correcting cache object name in error handler
xe:
- Fix a KUNIT test error message (Mirsad Todorovac)
- Fix an invalidation fence PM ref leak (Daniele)
- Fix a register pool UAF (Lucas)
amdgpu:
- ISP hw init fix
- SR-IOV fixes
- Fix contiguous VRAM mapping for UVD on older GPUs
- Fix some regressions due to drm scheduler changes
- Workload profile fixes
- Cleaner shader fix
amdkfd:
- Fix DMA map direction for migration
- Fix a potential null pointer dereference
- Cacheline size fixes
- Runtime PM fix"
* tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/reg_sr: Remove register pool
drm/xe: Call invalidation_fence_fini for PT inval fences in error state
drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
drm/amdkfd: pause autosuspend when creating pdd
drm/amdgpu: fix when the cleaner shader is emitted
drm/amdgpu: Fix ISP HW init issue
drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
drm/amdkfd: hard-code cacheline size for gfx11
drm/amdkfd: Dereference null return value
drm/i915: Fix memory leak by correcting cache object name in error handler
drm/i915: Fix NULL pointer dereference in capture_engine
drm/i915/color: Stop using non-posted DSB writes for legacy LUT
drm/i915/dsb: Don't use indexed register writes needlessly
drm/amdkfd: Correct the migration DMA map direction
drm/amd/pm: Set SMU v13.0.7 default workload type
drm/amd/pm: Initialize power profile mode
amdgpu/uvd: get ring reference from rq scheduler
drm/amdgpu: fix UVD contiguous CS mapping problem
drm/amdgpu: use sjt mec fw on gfx943 for sriov
Revert "drm/amdgpu: Fix ISP hw init issue"