www.infradead.org Git - users/jedix/linux-maple.git/log
2 months ago Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/next
Marc Zyngier [Fri, 23 May 2025 09:58:34 +0000 (10:58 +0100)]
Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/next

* kvm-arm64/at-fixes-6.16:
  : .
  : Set of fixes for Address Translation (AT) instruction emulation,
  : which affect the (not yet upstream) NV support.
  :
  : From the cover letter:
  :
  : "Here's a small series of fixes for KVM's implementation of address
  : translation (aka the AT S1* instructions), addressing a number of
  : issues in increasing levels of severity:
  :
  : - We misreport PAR_EL1.PTW on a number of occasions, including state
  :   that is not possible as per the architecture definition
  :
  : - We don't handle access faults at all, and that doesn't play very
  :   well with the rest of the VNCR stuff
  :
  : - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely
  :   take the host down, no questions asked"
  : .
  KVM: arm64: Don't feed uninitialised data to HCR_EL2
  KVM: arm64: Teach address translation about access faults
  KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/fgt-masks into kvmarm-master/next
Marc Zyngier [Fri, 23 May 2025 09:58:15 +0000 (10:58 +0100)]
Merge branch kvm-arm64/fgt-masks into kvmarm-master/next

* kvm-arm64/fgt-masks: (43 commits)
  : .
  : Large rework of the way KVM deals with trap bits in conjunction with
  : the CPU feature registers. It now draws a direct link between the
  : feature set, the system registers that need to UNDEF to match
  : the configuration and bits that need to behave as RES0 or RES1 in
  : the trap registers that are visible to the guest.
  :
  : Best of all, these definitions are mostly automatically generated
  : from the JSON description published by ARM under a permissive
  : license.
  : .
  KVM: arm64: Handle TSB CSYNC traps
  KVM: arm64: Add FGT descriptors for FEAT_FGT2
  KVM: arm64: Allow sysreg ranges for FGT descriptors
  KVM: arm64: Add context-switch for FEAT_FGT2 registers
  KVM: arm64: Add trap routing for FEAT_FGT2 registers
  KVM: arm64: Add sanitisation for FEAT_FGT2 registers
  KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
  KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
  KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
  KVM: arm64: Allow kvm_has_feat() to take variable arguments
  KVM: arm64: Use FGT feature maps to drive RES0 bits
  KVM: arm64: Validate FGT register descriptions against RES0 masks
  KVM: arm64: Switch to table-driven FGU configuration
  KVM: arm64: Handle PSB CSYNC traps
  KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
  KVM: arm64: Remove hand-crafted masks for FGT registers
  KVM: arm64: Use computed FGT masks to setup FGT registers
  KVM: arm64: Propagate FGT masks to the nVHE hypervisor
  KVM: arm64: Unconditionally configure fine-grain traps
  KVM: arm64: Use computed masks as sanitisers for FGT registers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/mte-frac into kvmarm-master/next
Marc Zyngier [Fri, 23 May 2025 09:57:44 +0000 (10:57 +0100)]
Merge branch kvm-arm64/mte-frac into kvmarm-master/next

* kvm-arm64/mte-frac:
  : .
  : Prevent FEAT_MTE_ASYNC from being accidentally exposed to a guest,
  : courtesy of Ben Horgan. From the cover letter:
  :
  : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
  : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
  : indicates that MTE_ASYNC is supported. On a host with
  : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
  : MTE capability enabled will incorrectly see MTE_ASYNC advertised as
  : supported. This series fixes that."
  : .
  KVM: selftests: Confirm exposing MTE_frac does not break migration
  KVM: arm64: Make MTE_frac masking conditional on MTE capability
  arm64/sysreg: Expose MTE_frac so that it is visible to KVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next
Marc Zyngier [Fri, 23 May 2025 09:57:32 +0000 (10:57 +0100)]
Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next

* kvm-arm64/ubsan-el2:
  : .
  : Add UBSAN support to the EL2 portion of KVM, reusing most of the
  : existing logic provided by CONFIG_UBSAN_TRAP.
  :
  : Patches courtesy of Mostafa Saleh.
  : .
  KVM: arm64: Handle UBSAN faults
  KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
  ubsan: Remove regs from report_ubsan_failure()
  arm64: Introduce esr_is_ubsan_brk()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next
Marc Zyngier [Fri, 23 May 2025 09:56:25 +0000 (10:56 +0100)]
Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next

* kvm-arm64/pkvm-np-thp-6.16: (21 commits)
  : .
  : Large mapping support for non-protected pKVM guests, courtesy of
  : Vincent Donnefort. From the cover letter:
  :
  : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
  : np-guests, that is installing PMD-level mappings in the stage-2,
  : whenever the stage-1 is backed by either Hugetlbfs or THPs."
  : .
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: Extend pKVM selftest for np-guests
  KVM: arm64: Selftest for pKVM transitions
  KVM: arm64: Don't WARN from __pkvm_host_share_guest()
  KVM: arm64: Add .hyp.data section
  KVM: arm64: Unconditionally cross check hyp state
  KVM: arm64: Defer EL2 stage-1 mapping on share
  KVM: arm64: Move hyp state to hyp_vmemmap
  KVM: arm64: Introduce {get,set}_host_state() helpers
  KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
  KVM: arm64: Fix pKVM page-tracking comments
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
Vincent Donnefort [Wed, 21 May 2025 12:48:34 +0000 (13:48 +0100)]
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap

With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest page CMOs are needed at PMD_SIZE granularity. Fixmap only supports
PAGE_SIZE, and iterating over the huge page is time-consuming (mostly due
to the TLBI on hyp_fixmap_unmap), which is a problem for EL2 latency.

Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.

On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.

Because of the horrendous private range allocation that would be
necessary, this is disabled for 64KiB-page systems.

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Stage-2 huge mappings for np-guests
Vincent Donnefort [Wed, 21 May 2025 12:48:33 +0000 (13:48 +0100)]
KVM: arm64: Stage-2 huge mappings for np-guests

Now that np-guest hypercalls with a range are supported, we can let the
hypervisor install block mappings whenever the Stage-1 allows it,
that is when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add a range to pkvm_mappings
Quentin Perret [Wed, 21 May 2025 12:48:32 +0000 (13:48 +0100)]
KVM: arm64: Add a range to pkvm_mappings

In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages member for pkvm_mappings to allow EL1 to track the size of the
stage-2 mapping.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Convert pkvm_mappings to interval tree
Quentin Perret [Wed, 21 May 2025 12:48:31 +0000 (13:48 +0100)]
KVM: arm64: Convert pkvm_mappings to interval tree

In preparation for supporting stage-2 huge mappings for np-guest, let's
convert pgt.pkvm_mappings to an interval tree.

No functional change intended.

Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
Vincent Donnefort [Wed, 21 May 2025 12:48:30 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()

In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is
512 on a 4K-pages system).
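
As an illustration of the constraint described above, here is a minimal C
sketch of the range check. The helper name nr_pages_is_valid() and the
4K-page constants are assumptions made for this example; this is not the
hypervisor's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants for a 4K-page configuration (assumed here). */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PAGE_SIZE	(UINT64_C(1) << PAGE_SHIFT)
#define PMD_SIZE	(UINT64_C(1) << PMD_SHIFT)

/* Hypothetical helper: accept one page or exactly one PMD-sized block. */
static bool nr_pages_is_valid(uint64_t nr_pages)
{
	return nr_pages == 1 || nr_pages == PMD_SIZE / PAGE_SIZE;
}
```

With these constants, PMD_SIZE / PAGE_SIZE evaluates to 512, so only 1 and
512 pass the check.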

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
Vincent Donnefort [Wed, 21 May 2025 12:48:29 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()

In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512
on a 4K-pages system).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add a range to __pkvm_host_unshare_guest()
Vincent Donnefort [Wed, 21 May 2025 12:48:28 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_unshare_guest()

In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add a range to __pkvm_host_share_guest()
Vincent Donnefort [Wed, 21 May 2025 12:48:27 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_share_guest()

In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Introduce for_each_hyp_page
Vincent Donnefort [Wed, 21 May 2025 12:48:26 +0000 (13:48 +0100)]
KVM: arm64: Introduce for_each_hyp_page

Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Handle huge mappings for np-guest CMOs
Vincent Donnefort [Wed, 21 May 2025 12:48:25 +0000 (13:48 +0100)]
KVM: arm64: Handle huge mappings for np-guest CMOs

clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.

With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
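
The per-page loop described above can be sketched in plain C as follows.
The names cmo_range(), page_cmo_fn and count_page() are hypothetical, and
the callback merely stands in for the "map one page in the fixmap, run the
CMO, unmap" sequence; treat it as an illustration of the iteration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	((size_t)4096)	/* assumed 4K pages */

/* Hypothetical per-page operation: map one page, run the CMO, unmap. */
typedef void (*page_cmo_fn)(uintptr_t pa, size_t len, void *ctx);

/* Example callback that only counts how many chunks were visited. */
static void count_page(uintptr_t pa, size_t len, void *ctx)
{
	(void)pa;
	(void)len;
	(*(size_t *)ctx)++;
}

/* Walk [pa, pa + size) one PAGE_SIZE chunk at a time. */
static void cmo_range(uintptr_t pa, size_t size, page_cmo_fn fn, void *ctx)
{
	while (size) {
		size_t len = size < PAGE_SIZE ? size : PAGE_SIZE;

		fn(pa, len, ctx);
		pa += len;
		size -= len;
	}
}
```

For a 2MiB range with 4K pages, the callback runs 512 times, which is the
cost that a later patch in the series avoids with a PMD_SIZE fixmap.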

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16
Marc Zyngier [Wed, 21 May 2025 13:33:43 +0000 (14:33 +0100)]
Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16

* kvm-arm64/pkvm-selftest-6.16:
  : .
  : pKVM selftests covering the memory ownership transitions by
  : Quentin Perret. From the initial cover letter:
  :
  : "We have recently found a bug [1] in the pKVM memory ownership
  : transitions by code inspection, but it could have been caught with a
  : test.
  :
  : Introduce a boot-time selftest exercising all the known pKVM memory
  : transitions and, importantly, checking the rejection of illegal transitions.
  :
  : The new test is hidden behind a new Kconfig option separate from
  : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
  : transition checks ([1] doesn't reproduce with EL2 debug enabled).
  :
  : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
  : .
  KVM: arm64: Extend pKVM selftest for np-guests
  KVM: arm64: Selftest for pKVM transitions
  KVM: arm64: Don't WARN from __pkvm_host_share_guest()
  KVM: arm64: Add .hyp.data section

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16
Marc Zyngier [Wed, 21 May 2025 13:33:39 +0000 (14:33 +0100)]
Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16

* kvm-arm64/pkvm-6.16:
  : .
  : pKVM memory management cleanups, courtesy of Quentin Perret.
  : From the cover letter:
  :
  : "This series moves the hypervisor's ownership state to the hyp_vmemmap,
  : as discussed in [1]. The two main benefits are:
  :
  :  1. much cheaper hyp state lookups, since we can avoid the hyp stage-1
  :     page-table walk;
  :
  :  2. de-correlates the hyp state from the presence of a mapping in the
  :     linear map range of the hypervisor; which enables a bunch of
  :     clean-ups in the existing code and will simplify the introduction of
  :     other features in the future (hyp tracing, ...)"
  : .
  KVM: arm64: Unconditionally cross check hyp state
  KVM: arm64: Defer EL2 stage-1 mapping on share
  KVM: arm64: Move hyp state to hyp_vmemmap
  KVM: arm64: Introduce {get,set}_host_state() helpers
  KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
  KVM: arm64: Fix pKVM page-tracking comments
  KVM: arm64: Track SVE state in the hypervisor vcpu structure

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Handle TSB CSYNC traps
Marc Zyngier [Mon, 27 Jan 2025 11:58:38 +0000 (11:58 +0000)]
KVM: arm64: Handle TSB CSYNC traps

The architecture introduces a trap for TSB CSYNC that fits in
the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar
way.

It's not that we expect this to be useful any time soon anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add FGT descriptors for FEAT_FGT2
Marc Zyngier [Fri, 25 Apr 2025 13:00:01 +0000 (14:00 +0100)]
KVM: arm64: Add FGT descriptors for FEAT_FGT2

Bulk addition of all the FGT2 traps reported with EC == 0x18,
as described in the 2025-03 JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Allow sysreg ranges for FGT descriptors
Marc Zyngier [Fri, 25 Apr 2025 12:53:18 +0000 (13:53 +0100)]
KVM: arm64: Allow sysreg ranges for FGT descriptors

Just like we allow sysreg ranges for Coarse Grained Trap descriptors,
allow them for Fine Grain Traps as well.

This comes with a warning that not all ranges are suitable for this
particular definition of ranges.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add context-switch for FEAT_FGT2 registers
Marc Zyngier [Tue, 22 Apr 2025 20:20:18 +0000 (21:20 +0100)]
KVM: arm64: Add context-switch for FEAT_FGT2 registers

Just like the rest of the FGT registers, perform a switch of the
FGT2 equivalent. This avoids the host configuration leaking into
the guest...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add trap routing for FEAT_FGT2 registers
Marc Zyngier [Fri, 25 Apr 2025 16:42:49 +0000 (17:42 +0100)]
KVM: arm64: Add trap routing for FEAT_FGT2 registers

Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2
register when a sysreg trap indicates they could be responsible
for the exception.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Marc Zyngier [Tue, 22 Apr 2025 20:16:34 +0000 (21:16 +0100)]
KVM: arm64: Add sanitisation for FEAT_FGT2 registers

Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. This is a large update, but a fairly mechanical one.

The config dependencies are extracted from the 2025-03 JSON drop.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
Marc Zyngier [Tue, 22 Apr 2025 18:21:46 +0000 (19:21 +0100)]
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page

The FEAT_FGT2 registers are part of the VNCR page. Describe the
corresponding offsets and add them to the vcpu sysreg enumeration.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Marc Zyngier [Tue, 4 Feb 2025 10:46:41 +0000 (10:46 +0000)]
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits

Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

An additional complexity stems from the status of some bits such
as E2H and RW, which do not have a RESx status, but still take
a fixed value due to implementation choices in KVM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Marc Zyngier [Sun, 9 Feb 2025 14:51:23 +0000 (14:51 +0000)]
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits

Similarly to other registers, describe which HCRX_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Allow kvm_has_feat() to take variable arguments
Marc Zyngier [Sun, 9 Feb 2025 13:38:35 +0000 (13:38 +0000)]
KVM: arm64: Allow kvm_has_feat() to take variable arguments

In order to be able to write more compact (and easier to read) code,
let kvm_has_feat() and co take variable arguments. This enables
constructs such as:

#define FEAT_SME ID_AA64PFR1_EL1, SME, IMP

if (kvm_has_feat(kvm, FEAT_SME))
[...]

which is admittedly more readable.
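
To show why a variadic front end makes the construct above possible, here
is a self-contained sketch of the preprocessor technique. Everything
prefixed with toy_ or TOY_ is hypothetical and only mimics the shape of
kvm_has_feat(); the real macros and register layout live in the kernel
sources.

```c
#include <stdbool.h>

/* Toy VM: one member standing in for an ID register field (hypothetical). */
struct toy_vm {
	unsigned int PFR1_SME;
};

/* Toy field values: NI is "not implemented", IMP is "implemented". */
enum { PFR1_SME_NI = 0, PFR1_SME_IMP = 1 };

/* Worker taking register, field and limit as three separate arguments. */
#define toy_has_feat_impl(vm, reg, fld, limit) \
	((vm)->reg##_##fld >= reg##_##fld##_##limit)

/*
 * Variadic front end: __VA_ARGS__ is macro-expanded before the worker is
 * invoked, so a single macro can carry all three arguments.
 */
#define toy_has_feat(vm, ...)	toy_has_feat_impl(vm, __VA_ARGS__)

/* One name that expands to the (register, field, limit) triplet. */
#define TOY_FEAT_SME	PFR1, SME, IMP

static bool vm_has_sme(const struct toy_vm *vm)
{
	return toy_has_feat(vm, TOY_FEAT_SME);
}
```

Without the variadic layer, TOY_FEAT_SME would be passed as a single
argument to a four-parameter macro, and the preprocessor would reject the
call for having too few arguments.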

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use FGT feature maps to drive RES0 bits
Marc Zyngier [Sun, 9 Feb 2025 14:45:29 +0000 (14:45 +0000)]
KVM: arm64: Use FGT feature maps to drive RES0 bits

Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.

Let's apply this principle to the guest's view of the FGT registers.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: selftests: Confirm exposing MTE_frac does not break migration
Ben Horgan [Mon, 12 May 2025 11:41:12 +0000 (12:41 +0100)]
KVM: selftests: Confirm exposing MTE_frac does not break migration

When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2)
ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
and MTE_frac == 0 indicates it is supported.

As MTE_frac was previously unconditionally read as 0 from the guest
and user-space, check that using SET_ONE_REG to set it to 0 succeeds
but does not change MTE_frac from unsupported (0xF) to supported (0).
This is required as values originating from KVM and provided by user-space
must be accepted to avoid breaking migration.

Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE
for the set_id_regs test. No effect on existing tests is expected.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Make MTE_frac masking conditional on MTE capability
Ben Horgan [Mon, 12 May 2025 11:41:11 +0000 (12:41 +0100)]
KVM: arm64: Make MTE_frac masking conditional on MTE capability

If MTE_frac is masked out unconditionally then the guest will always
see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when
ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for
a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf
(MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as
supported whilst the host does not support it. Hence, expose the sanitised
value of MTE_frac to the guest and user-space.

As MTE_frac was previously hidden, always 0, and KVM must accept values
from KVM provided by user-space, when ID_AA64PFR1_EL1.MTE is 2 allow
user-space to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to
avoid incorrectly claiming hardware support for MTE_ASYNC in the guest.
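
The accept-but-ignore behaviour described above can be modelled with a
small C sketch. The struct toy_pfr1 and toy_set_mte_frac() names are
hypothetical, the fields are kept separate rather than packed into a
register, and rejecting every other mismatching value is an assumption of
this toy model rather than a statement about KVM's actual write policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy view of the two ID_AA64PFR1_EL1 fields discussed above. */
struct toy_pfr1 {
	uint8_t mte;		/* 2: the MTE level discussed in the text */
	uint8_t mte_frac;	/* 0x0: MTE_ASYNC supported, 0xF: unsupported */
};

/*
 * Hypothetical write handler: returns true if the write is accepted.
 * A write of 0 is tolerated for compatibility with values produced by
 * older KVM, but it never flips 0xF (unsupported) to 0 (supported).
 */
static bool toy_set_mte_frac(struct toy_pfr1 *cur, uint8_t val)
{
	if (val == cur->mte_frac)
		return true;	/* nothing to change */

	if (cur->mte == 2 && cur->mte_frac == 0xF && val == 0)
		return true;	/* accepted, but deliberately ignored */

	return false;		/* assumption: reject anything else */
}
```

The important property is that the stored value is left untouched in the
compatibility case, so the guest keeps seeing MTE_ASYNC as unsupported.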

Note that Linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and
wrongly assumes that MTE async faults can be generated even on hardware
that does not support them. This issue is not addressed here.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago arm64/sysreg: Expose MTE_frac so that it is visible to KVM
Ben Horgan [Mon, 12 May 2025 11:41:10 +0000 (12:41 +0100)]
arm64/sysreg: Expose MTE_frac so that it is visible to KVM

KVM exposes the sanitised ID registers to guests. Currently these ignore
the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of
zero.

This is a problem for platforms without the MTE_ASYNC feature where
ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces
MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when
no async fault will ever occur.

Before KVM can fix this, the architecture needs to sanitise the ID
register field for MTE_frac.

Linux itself does not use the MTE_frac field and just assumes MTE async faults
can be generated if MTE is supported.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-2-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Don't feed uninitialised data to HCR_EL2
Marc Zyngier [Tue, 22 Apr 2025 12:26:12 +0000 (13:26 +0100)]
KVM: arm64: Don't feed uninitialised data to HCR_EL2

When the guest executes an AT S1E{0,1} from EL2, and its
HCR_EL2.{E2H,TGE}=={1,1}, then this is a pure S1 translation
that doesn't involve a guest-supplied S2, and the full S1
context is already in place. This allows us to take a shortcut
and avoid save/restoring a bunch of registers.

However, we set HCR_EL2 to a value suitable for the use of AT
in guest context. And we do so by using the value that we saved.
Or not. In the case described above, we restore whatever junk
was on the stack, and carry on with it until the next entry.

Needless to say, this is completely broken.

But this also triggers the realisation that saving HCR_EL2 is
a bit pointless. We are always in host context at the point where
we reach this code, and what we program to enter the guest is a known
value (vcpu->arch.hcr_el2).

Drop the pointless save/restore, and wrap the AT operations with
writes that switch between guest and host values for HCR_EL2.

Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250422122612.2675672-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Teach address translation about access faults
Marc Zyngier [Tue, 22 Apr 2025 12:26:11 +0000 (13:26 +0100)]
KVM: arm64: Teach address translation about access faults

It appears that our S1 PTW is completely oblivious of access faults.
Teach the S1 translation code about it.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250422122612.2675672-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*
Marc Zyngier [Tue, 22 Apr 2025 12:26:10 +0000 (13:26 +0100)]
KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*

When an AT S1E* operation fails, we need to report whether the
translation failed at S2, and whether this was during a S1 PTW.

But these two bits are not independent. PAR_EL1.PTW can only be
set if PAR_EL1.S is also set, and PAR_EL1.S can only be set on
its own when the full S1 PTW has succeeded, but the access
itself is reporting a fault at S2.

As a result, it makes no sense to carry both ptw and s2 as parameters
to fail_s1_walk(), and they should be unified.

This fixes a number of cases where we were reporting PTW=1 *and*
S=0, which makes no sense.
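
The dependency between the two bits can be sketched by replacing the pair
of booleans with a single three-valued parameter. The enum and function
names below are hypothetical (they are not the fail_s1_walk() signature),
and the bit positions used for PAR_EL1.PTW and PAR_EL1.S are assumptions
of this sketch that should be checked against the architecture manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of PAR_EL1.PTW and PAR_EL1.S, assumed for this sketch. */
#define PAR_PTW	(UINT64_C(1) << 8)
#define PAR_S	(UINT64_C(1) << 9)

/* Single parameter replacing separate "ptw" and "s2" booleans. */
enum toy_fail_stage {
	FAIL_S1,		/* fault in stage-1: PTW=0, S=0 */
	FAIL_S2_ACCESS,		/* S1 walk done, access faults at S2: S=1 */
	FAIL_S2_DURING_PTW,	/* S2 fault during the S1 walk: PTW=1, S=1 */
};

/* Only the three architecturally sensible combinations can be produced. */
static uint64_t toy_par_stage_bits(enum toy_fail_stage stage)
{
	switch (stage) {
	case FAIL_S2_DURING_PTW:
		return PAR_PTW | PAR_S;
	case FAIL_S2_ACCESS:
		return PAR_S;
	default:
		return 0;
	}
}

/* PTW=1 with S=0 is the combination that must never be reported. */
static bool toy_par_is_consistent(uint64_t par)
{
	return !(par & PAR_PTW) || (par & PAR_S);
}
```

Because the enum has no value that maps to PTW=1 with S=0, the invalid
state is unrepresentable rather than merely checked for.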

Link: https://lore.kernel.org/r/20250422122612.2675672-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Validate FGT register descriptions against RES0 masks
Marc Zyngier [Sun, 9 Feb 2025 14:19:05 +0000 (14:19 +0000)]
KVM: arm64: Validate FGT register descriptions against RES0 masks

In order to point out to the unsuspecting KVM hacker that they
are missing something somewhere, validate that the known FGT bits
do not intersect with the corresponding RES0 mask, as computed at
boot time.

This check is also performed at boot time, ensuring that there is
no runtime overhead.
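
The check described above amounts to a mask intersection test. A minimal
sketch, with hypothetical helper names and no claim to match the kernel's
implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the bits that are both described as known FGT bits and RES0. */
static uint64_t toy_fgt_conflicts(uint64_t known_bits, uint64_t res0_mask)
{
	return known_bits & res0_mask;
}

/* A description is valid when no known bit falls inside the RES0 mask. */
static bool toy_fgt_desc_is_valid(uint64_t known_bits, uint64_t res0_mask)
{
	return toy_fgt_conflicts(known_bits, res0_mask) == 0;
}
```

Returning the conflicting bits separately makes it possible to report
exactly which bits are inconsistent when the validation fails.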

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Switch to table-driven FGU configuration
Marc Zyngier [Sun, 9 Feb 2025 14:05:01 +0000 (14:05 +0000)]
KVM: arm64: Switch to table-driven FGU configuration

Defining the FGU behaviour is extremely tedious. It relies on matching
each set of bits from FGT registers with an architectural feature, and
adding them to the FGU list if the corresponding feature isn't advertised
to the guest.

It is however relatively easy to dump most of that information from
the architecture JSON description, and use that to control the FGU bits.

Let's introduce a new set of tables describing the mapping between
FGT bits and features. Most of the time, this is only a lookup in
an idreg field, with a few more complex exceptions.

While this is obviously many more lines in a new file, this is
mostly generated, and is pretty easy to maintain.
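
A table-driven computation of this kind can be sketched as follows. The
struct layout, the two toy features and the bit values are all invented
for the example; the real tables are generated from the architecture JSON
and look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy VM with two optional features (hypothetical). */
struct toy_vm {
	bool has_feat_a;
	bool has_feat_b;
};

/* One table entry: a set of trap bits and the feature they depend on. */
struct toy_bits_to_feat {
	uint64_t bits;
	bool (*present)(const struct toy_vm *vm);
};

static bool feat_a(const struct toy_vm *vm) { return vm->has_feat_a; }
static bool feat_b(const struct toy_vm *vm) { return vm->has_feat_b; }

static const struct toy_bits_to_feat toy_map[] = {
	{ UINT64_C(0x0f), feat_a },
	{ UINT64_C(0x30), feat_b },
};

/* Collect the bits whose feature is not advertised to the guest. */
static uint64_t toy_compute_fgu(const struct toy_vm *vm)
{
	uint64_t fgu = 0;

	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++)
		if (!toy_map[i].present(vm))
			fgu |= toy_map[i].bits;

	return fgu;
}
```

Adding support for a new feature then means adding a table entry instead
of editing hand-written mask logic.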

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Handle PSB CSYNC traps
Marc Zyngier [Mon, 27 Jan 2025 11:58:38 +0000 (11:58 +0000)]
KVM: arm64: Handle PSB CSYNC traps

The architecture introduces a trap for PSB CSYNC that fits in
the same EC as LS64. Let's deal with it in a similar way as
LS64.

It's not that we expect this to be useful any time soon anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
Marc Zyngier [Fri, 24 Jan 2025 19:04:26 +0000 (19:04 +0000)]
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask

We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Remove hand-crafted masks for FGT registers
Marc Zyngier [Fri, 24 Jan 2025 17:21:17 +0000 (17:21 +0000)]
KVM: arm64: Remove hand-crafted masks for FGT registers

These masks are now useless, and can be removed.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use computed FGT masks to setup FGT registers
Marc Zyngier [Fri, 24 Jan 2025 17:20:31 +0000 (17:20 +0000)]
KVM: arm64: Use computed FGT masks to setup FGT registers

Flip the hypervisor FGT configuration over to the computed FGT
masks.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Handle UBSAN faults
Mostafa Saleh [Wed, 30 Apr 2025 16:27:11 +0000 (16:27 +0000)]
KVM: arm64: Handle UBSAN faults

Now that UBSAN can be enabled, handle brk64 exits from UBSAN.
Re-use the decoding code from the kernel, and panic with a
UBSAN message.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-5-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
Mostafa Saleh [Wed, 30 Apr 2025 16:27:10 +0000 (16:27 +0000)]
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2

Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables
UBSAN for EL2 code (in protected/nvhe/hvhe modes).
This will re-use the same checks enabled for the kernel for
the hypervisor. The only difference is that for EL2 it always
emits a "brk" instead of implementing hooks as the hypervisor
can't print reports.

The KVM code will re-use the kernel's report_ubsan_failure(), so the
#ifdefs are changed to also build this code for
CONFIG_UBSAN_KVM_EL2.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago ubsan: Remove regs from report_ubsan_failure()
Mostafa Saleh [Wed, 30 Apr 2025 16:27:09 +0000 (16:27 +0000)]
ubsan: Remove regs from report_ubsan_failure()

report_ubsan_failure() doesn't use argument regs, and soon it will
be called from the hypervisor context where regs are not available.
So, remove the unused argument.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago arm64: Introduce esr_is_ubsan_brk()
Mostafa Saleh [Wed, 30 Apr 2025 16:27:08 +0000 (16:27 +0000)]
arm64: Introduce esr_is_ubsan_brk()

Soon, KVM is going to use this logic for hypervisor panics,
so add it in a wrapper that can be used by the hypervisor exit
handler to decode hyp panics.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-2-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Propagate FGT masks to the nVHE hypervisor
Marc Zyngier [Fri, 24 Jan 2025 17:17:42 +0000 (17:17 +0000)]
KVM: arm64: Propagate FGT masks to the nVHE hypervisor

The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Unconditionally configure fine-grain traps
Mark Rutland [Tue, 19 Nov 2024 13:57:22 +0000 (13:57 +0000)]
KVM: arm64: Unconditionally configure fine-grain traps

... otherwise we can inherit the host configuration if this differs from
the KVM configuration.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[maz: simplified a couple of things]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months ago KVM: arm64: Use computed masks as sanitisers for FGT registers
Marc Zyngier [Fri, 24 Jan 2025 16:01:47 +0000 (16:01 +0000)]
KVM: arm64: Use computed masks as sanitisers for FGT registers

Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.
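
As a rough illustration of sanitising a register value with computed
masks, here is a minimal sketch. The struct and function names are
hypothetical, and the RES1 handling is an assumption added for
completeness; this is not KVM's actual code.

```c
#include <stdint.h>

/* Hypothetical pair of masks computed for one FGT register. */
struct fgt_masks {
	uint64_t res0;	/* bits the guest must always see as 0 */
	uint64_t res1;	/* bits the guest must always see as 1 (assumption) */
};

/* Return the value the guest is allowed to observe. */
static inline uint64_t sanitise_fgt(uint64_t val, const struct fgt_masks *m)
{
	val &= ~m->res0;	/* clear everything marked RES0 */
	val |= m->res1;		/* force everything marked RES1 */
	return val;
}
```

Keeping the masks as data rather than open-coded constants means the
same helper could serve every FGT register.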

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Add description of FGT bits leading to EC!=0x18
Marc Zyngier [Fri, 24 Jan 2025 17:09:27 +0000 (17:09 +0000)]
KVM: arm64: Add description of FGT bits leading to EC!=0x18

The current FGT tables are only concerned with the bits generating
ESR_ELx.EC==0x18. However, we want an exhaustive view of what KVM
really knows about.

So let's add another small table that provides that extra information.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Compute FGT masks from KVM's own FGT tables
Marc Zyngier [Fri, 24 Jan 2025 15:51:12 +0000 (15:51 +0000)]
KVM: arm64: Compute FGT masks from KVM's own FGT tables

In the process of decoupling KVM's view of the FGT bits from the
wider architectural state, use KVM's own FGT tables to build
a synthetic view of what is actually known.

This allows for some checking along the way.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Plug FEAT_GCS handling
Marc Zyngier [Fri, 24 Jan 2025 15:44:32 +0000 (15:44 +0000)]
KVM: arm64: Plug FEAT_GCS handling

We don't seem to be handling the GCS-specific exception class.
Handle it by delivering an UNDEF to the guest, and populate the
relevant trap bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Don't treat HCRX_EL2 as a FGT register
Marc Zyngier [Fri, 24 Jan 2025 15:36:17 +0000 (15:36 +0000)]
KVM: arm64: Don't treat HCRX_EL2 as a FGT register

Treating HCRX_EL2 as yet another FGT register seems excessive, and
gets in the way of further improvements. It is actually simpler to
just be explicit about the masking, so just do that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled
Marc Zyngier [Wed, 3 Jul 2024 15:41:47 +0000 (16:41 +0100)]
KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled

We currently unconditionally make ACCDATA_EL1 accesses UNDEF.

As we are about to support it, restrict the UNDEF behaviour to cases
where FEAT_LS64_ACCDATA is not exposed to the guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Handle trapping of FEAT_LS64* instructions
Marc Zyngier [Thu, 4 Jul 2024 17:58:01 +0000 (18:58 +0100)]
KVM: arm64: Handle trapping of FEAT_LS64* instructions

We generally don't expect FEAT_LS64* instructions to trap, unless
they are trapped by a guest hypervisor.

Otherwise, this is just the guest playing tricks on us by using
an instruction that isn't advertised, which we handle with a well
deserved UNDEF.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Simplify handling of negative FGT bits
Marc Zyngier [Sat, 26 Apr 2025 11:05:04 +0000 (12:05 +0100)]
KVM: arm64: Simplify handling of negative FGT bits

check_fgt_bit() and triage_sysreg_trap() implement the same thing
twice for no good reason. We have to look up the FGT register twice,
as we don't communicate it. Similarly, we extract the register value
at the wrong spot.

Reorganise the code in a more logical way so that things are done
at the correct location, removing a lot of duplication.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Tighten handling of unknown FGT groups
Marc Zyngier [Sat, 26 Apr 2025 10:42:15 +0000 (11:42 +0100)]
KVM: arm64: Tighten handling of unknown FGT groups

triage_sysreg_trap() assumes that it knows all the possible values
for FGT groups, which won't be the case as we start adding more
FGT registers (unless we add everything in one go, which is obviously
undesirable).

At the same time, it doesn't offer much in terms of debugging info
when things go wrong.

Turn the "__NR_FGT_GROUP_IDS__" case into a default, covering any
unhandled value, and give the kernel hacker a bit of a clue about
what's wrong (system register and full trap descriptor).
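
The pattern described above, replacing a sentinel case with a default
that reports what was seen, might look like the sketch below. The enum
members, function name and message format are hypothetical and only
stand in for the real ones.

```c
#include <stdio.h>

/* Hypothetical group IDs; the real enum has more members. */
enum fgt_group { NO_FGT_GROUP, GROUP_A, GROUP_B, NR_FGT_GROUP_IDS };

/* Returns 0 if the group was handled, -1 otherwise. */
static int triage_group(enum fgt_group grp, unsigned int sysreg,
			unsigned long long desc)
{
	switch (grp) {
	case NO_FGT_GROUP:
	case GROUP_A:
	case GROUP_B:
		return 0;
	default:
		/* Catches the sentinel and any group added without a handler. */
		fprintf(stderr,
			"unhandled FGT group %d (sysreg %#x, desc %#llx)\n",
			(int)grp, sysreg, desc);
		return -1;
	}
}
```

A default case keeps working as new groups are introduced one at a
time, which is the situation the message anticipates.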

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: Add FEAT_FGT2 capability
Marc Zyngier [Tue, 22 Apr 2025 18:23:41 +0000 (19:23 +0100)]
arm64: Add FEAT_FGT2 capability

As we will eventually have to context-switch the FEAT_FGT2 registers
in KVM (something that has been completely ignored so far), add
a new cap that we will be able to check for.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: Add syndrome information for trapped LD64B/ST64B{,V,V0}
Marc Zyngier [Thu, 4 Jul 2024 17:45:22 +0000 (18:45 +0100)]
arm64: Add syndrome information for trapped LD64B/ST64B{,V,V0}

Provide the architected EC and ISS values for all the FEAT_LS64*
instructions.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: tools: Resync sysreg.h
Marc Zyngier [Sat, 26 Apr 2025 10:17:13 +0000 (11:17 +0100)]
arm64: tools: Resync sysreg.h

Perform a bulk resync of tools/arch/arm64/include/asm/sysreg.h.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: Remove duplicated sysreg encodings
Marc Zyngier [Tue, 11 Mar 2025 09:36:28 +0000 (09:36 +0000)]
arm64: Remove duplicated sysreg encodings

A bunch of sysregs are now generated from the sysreg file, so no
need to carry separate definitions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Add system instructions trapped by HFGIRT2_EL2
Marc Zyngier [Fri, 25 Apr 2025 12:47:18 +0000 (13:47 +0100)]
arm64: sysreg: Add system instructions trapped by HFGIRT2_EL2

Add the new CMOs trapped by HFGITR2_EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2
Marc Zyngier [Fri, 25 Apr 2025 12:44:31 +0000 (13:44 +0100)]
arm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2

Bulk addition of all the system registers trapped by HDFG{R,W}TR2_EL2.

The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Add registers trapped by HFG{R,W}TR2_EL2
Marc Zyngier [Thu, 24 Apr 2025 18:47:09 +0000 (19:47 +0100)]
arm64: sysreg: Add registers trapped by HFG{R,W}TR2_EL2

Bulk addition of all the system registers trapped by HFG{R,W}TR2_EL2.

The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Update CPACR_EL1 description
Marc Zyngier [Tue, 6 May 2025 16:27:25 +0000 (17:27 +0100)]
arm64: sysreg: Update CPACR_EL1 description

Add the couple of fields introduced with FEAT_NV2p1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Update TRBIDR_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:28:05 +0000 (11:28 +0100)]
arm64: sysreg: Update TRBIDR_EL1 description

Add the missing MPAM field.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Update PMSIDR_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:27:44 +0000 (11:27 +0100)]
arm64: sysreg: Update PMSIDR_EL1 description

Add the missing SME, ALTCLK, FPF, EFT, CRR and FDS fields.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Update ID_AA64PFR0_EL1 description
Marc Zyngier [Wed, 23 Apr 2025 10:26:42 +0000 (11:26 +0100)]
arm64: sysreg: Update ID_AA64PFR0_EL1 description

Add the missing RASv2 description.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2
Marc Zyngier [Tue, 11 Mar 2025 14:44:30 +0000 (14:44 +0000)]
arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2

Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.

Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.

Yes, this is painful.

The registers themselves are generated from the JSON file in
an automated way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Add layout for HCR_EL2
Marc Zyngier [Mon, 3 Feb 2025 17:27:23 +0000 (17:27 +0000)]
arm64: sysreg: Add layout for HCR_EL2

Add HCR_EL2 to the sysreg file, more or less directly generated
from the JSON file.

Since the generated names significantly differ from the existing
naming, express the old names in terms of the new one. One day, we'll
fix this mess, but I'm not in any hurry.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Update ID_AA64MMFR4_EL1 description
Marc Zyngier [Mon, 10 Feb 2025 11:35:31 +0000 (11:35 +0000)]
arm64: sysreg: Update ID_AA64MMFR4_EL1 description

Resync ID_AA64MMFR4_EL1 with the architecture description.

This results in:

- the new PoPS field
- the new NV2P1 value for the NV_frac field
- the new RMEGDI field
- the new SRMASK field

These fields have been generated from the reference JSON file.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoarm64: sysreg: Add ID_AA64ISAR1_EL1.LS64 encoding for FEAT_LS64WB
Marc Zyngier [Thu, 31 Oct 2024 08:42:11 +0000 (08:42 +0000)]
arm64: sysreg: Add ID_AA64ISAR1_EL1.LS64 encoding for FEAT_LS64WB

The 2024 extensions are adding yet another variant of LS64
(aptly named FEAT_LS64WB) supporting LS64 accesses to write-back
memory, as well as 32 byte single-copy atomic accesses using pairs
of FP registers.

Add the relevant encoding to ID_AA64ISAR1_EL1.LS64.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Extend pKVM selftest for np-guests
Quentin Perret [Wed, 16 Apr 2025 16:09:00 +0000 (16:09 +0000)]
KVM: arm64: Extend pKVM selftest for np-guests

The pKVM selftest intends to test as many memory 'transitions' as
possible, so extend it to cover sharing pages with non-protected guests,
including in the case of multi-sharing.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Selftest for pKVM transitions
Quentin Perret [Wed, 16 Apr 2025 16:08:59 +0000 (16:08 +0000)]
KVM: arm64: Selftest for pKVM transitions

We have recently found a bug [1] in the pKVM memory ownership
transitions by code inspection, but it could have been caught with a
test.

Introduce a boot-time selftest exercising all the known pKVM memory
transitions and, importantly, checking the rejection of illegal transitions.

The new test is hidden behind a new Kconfig option separate from
CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
transition checks ([1] doesn't reproduce with EL2 debug enabled).

[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Don't WARN from __pkvm_host_share_guest()
Quentin Perret [Wed, 16 Apr 2025 16:08:58 +0000 (16:08 +0000)]
KVM: arm64: Don't WARN from __pkvm_host_share_guest()

We currently WARN() if the host attempts to share a page that is not in
an acceptable state with a guest. This isn't strictly necessary and
makes testing much harder, so drop the WARN and make sure to propagate the
error code instead.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Add .hyp.data section
David Brazdil [Wed, 16 Apr 2025 16:08:57 +0000 (16:08 +0000)]
KVM: arm64: Add .hyp.data section

The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. To avoid having to initialize future
data-structures at run-time, let's add a .data section to the
hypervisor.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoMerge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/next
Marc Zyngier [Tue, 29 Apr 2025 12:38:21 +0000 (13:38 +0100)]
Merge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/next

* kvm-arm64/nv-pmu-fixes:
  : .
  : Fixes for NV PMU emulation. From the cover letter:
  :
  : "Joey reports that some of his PMU tests do not behave quite as
  : expected:
  :
  : - MDCR_EL2.HPMN is set to 0 out of reset
  :
  : - PMCR_EL0.P should reset all the counters when written from EL2
  :
  : Oliver points out that setting PMCR_EL0.N from userspace by writing to
  : the register is silly with NV, and that we need a new PMU attribute
  : instead.
  :
  : On top of that, I figured out that we had a number of little gotchas:
  :
  : - It is possible for a guest to write an HPMN value that is out of
  :   bound, and it seems valuable to limit it
  :
  : - PMCR_EL0.N should be the maximum number of counters when read from
  :   EL2, and MDCR_EL2.HPMN when read from EL0/EL1
  :
  : - Prevent userspace from updating PMCR_EL0.N when EL2 is available"
  : .
  KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N
  KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN
  KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2
  KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs
  KVM: arm64: Contextualise the handling of PMCR_EL0.P writes
  KVM: arm64: Fix MDCR_EL2.HPMN reset value
  KVM: arm64: Repaint pmcr_n into nr_pmu_counters
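
Two of the points from the cover letter (limiting out-of-bound HPMN
writes, and the EL-dependent view of PMCR_EL0.N) can be sketched as
follows. The struct, field and function names are hypothetical, and
clamping is only one possible way of limiting the value; this is not
the code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU PMU state, used only for this illustration. */
struct pmu_state {
	uint8_t nr_counters;	/* counters available to this VM */
	uint8_t hpmn;		/* guest's view of MDCR_EL2.HPMN */
};

/* Limit an HPMN write to the number of available counters. */
static void write_hpmn(struct pmu_state *s, uint8_t val)
{
	s->hpmn = val > s->nr_counters ? s->nr_counters : val;
}

/* PMCR_EL0.N as seen from the exception level doing the read. */
static uint8_t read_pmcr_n(const struct pmu_state *s, bool at_el2)
{
	return at_el2 ? s->nr_counters : s->hpmn;
}
```

In this sketch a guest hypervisor always sees the full counter count,
while EL0/EL1 only see the share left to them by HPMN.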

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Unconditionally cross check hyp state
Quentin Perret [Wed, 16 Apr 2025 15:26:47 +0000 (15:26 +0000)]
KVM: arm64: Unconditionally cross check hyp state

Now that the hypervisor's state is stored in the hyp_vmemmap, we no
longer need an expensive page-table walk to read it. This means we can
now afford to cross check the hyp-state during all memory ownership
transitions where the hyp is involved unconditionally, hence avoiding
problems such as [1].

[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Defer EL2 stage-1 mapping on share
Quentin Perret [Wed, 16 Apr 2025 15:26:46 +0000 (15:26 +0000)]
KVM: arm64: Defer EL2 stage-1 mapping on share

We currently blindly map into EL2 stage-1 *any* page passed to the
__pkvm_host_share_hyp() HVC. This is less than ideal from a security
perspective as it makes exploitation of potential hypervisor gadgets
easier than it should be. But interestingly, pKVM should never need to
access SHARED_BORROWED pages that it hasn't previously pinned, so there
is no need to map the page before that.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Move hyp state to hyp_vmemmap
Quentin Perret [Wed, 16 Apr 2025 15:26:45 +0000 (15:26 +0000)]
KVM: arm64: Move hyp state to hyp_vmemmap

Tracking the hypervisor's ownership state in struct hyp_page has
several benefits, including allowing far more efficient lookups (no
page-table walk needed) and decorrelating the state from the presence
of a mapping. This will later allow pages to be mapped into EL2 stage-1
less proactively, which is generally a good thing for security. And in
the future this will help with tracking the state of pages mapped into the
hypervisor's private range without requiring an alias into the 'linear
map' range.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Introduce {get,set}_host_state() helpers
Quentin Perret [Wed, 16 Apr 2025 15:26:44 +0000 (15:26 +0000)]
KVM: arm64: Introduce {get,set}_host_state() helpers

Instead of directly accessing the host_state member in struct hyp_page,
introduce static inline accessors to do it. The future hyp_state member
will follow the same pattern as it will need some logic in the accessors.
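
The accessor pattern described above could look roughly like this. The
reduced struct layout, the enum and the exact signatures are
assumptions made for illustration, not the definitions from the patch.

```c
/* Hypothetical, reduced set of ownership states. */
enum page_state {
	PAGE_OWNED		= 0,
	PAGE_SHARED_OWNED	= 1,
	PAGE_SHARED_BORROWED	= 2,
};

/* Hypothetical, reduced version of the per-page metadata. */
struct hyp_page {
	unsigned int host_state : 2;	/* callers go through the helpers */
};

static inline enum page_state get_host_state(const struct hyp_page *p)
{
	return (enum page_state)p->host_state;
}

static inline void set_host_state(struct hyp_page *p, enum page_state state)
{
	p->host_state = state;
}
```

Once every user goes through the helpers, extra logic can be added in
one place without touching the callers.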

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
Quentin Perret [Wed, 16 Apr 2025 15:26:43 +0000 (15:26 +0000)]
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE

The page ownership state encoded as 0b11 is currently considered
reserved for future use, and PKVM_NOPAGE uses bit 2. In order to
simplify the relocation of the hyp ownership state into the
vmemmap in later patches, let's use the 'reserved' encoding for
the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed
stable at all, so there is no real reason to have 'reserved' encodings.

No functional changes intended.
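
To make the encoding change concrete, here is a small sketch. Only
PKVM_NOPAGE and the 0b11 value come from the text above; the other
state names, their values and the mask are assumptions for
illustration.

```c
#include <assert.h>

/* Illustrative state values; only PKVM_NOPAGE is named in the message. */
enum pkvm_page_state {
	STATE_OWNED		= 0x0,
	STATE_SHARED_OWNED	= 0x1,
	STATE_SHARED_BORROWED	= 0x2,
	PKVM_NOPAGE		= 0x3,	/* 0b11, previously a separate bit (bit 2) */
};

#define STATE_MASK	0x3u

/* With NOPAGE at 0b11, every state fits in a two-bit field. */
static_assert((PKVM_NOPAGE & ~STATE_MASK) == 0,
	      "PKVM_NOPAGE must fit in 2 bits");

static inline int state_fits(enum pkvm_page_state s)
{
	return ((unsigned int)s & ~STATE_MASK) == 0;
}
```

A state that fits in two bits can be stored in a narrow field, which is
what makes moving it elsewhere later easier.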

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Fix pKVM page-tracking comments
Quentin Perret [Wed, 16 Apr 2025 15:26:42 +0000 (15:26 +0000)]
KVM: arm64: Fix pKVM page-tracking comments

Most of the comments relating to pKVM page-tracking in nvhe/memory.h are
now either slightly outdated or outright wrong. Fix the comments.

Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: Track SVE state in the hypervisor vcpu structure
Fuad Tabba [Wed, 16 Apr 2025 15:26:41 +0000 (15:26 +0000)]
KVM: arm64: Track SVE state in the hypervisor vcpu structure

When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).

Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
3 months agoLinux 6.15-rc4
Linus Torvalds [Sun, 27 Apr 2025 22:19:23 +0000 (15:19 -0700)]
Linux 6.15-rc4

3 months agoMerge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 26 Apr 2025 20:02:36 +0000 (13:02 -0700)]
Merge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

 - When releasing a start-aligned resource, e.g., a bridge window, save
   start/end/flags for the next assignment attempt; fixes a v6.15-rc1
   regression (Ilpo Järvinen)

 - Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl
   selftest v6.15-rc1 regression (Ilpo Järvinen)

 - Add Manivannan Sadhasivam as maintainer of native host bridge and
   endpoint drivers (Manivannan Sadhasivam)

 - In endpoint test driver, defer IRQ allocation from .probe() until
   ioctl() to fix a regression on platforms where the Vendor/Device ID
   match doesn't include driver_data (Niklas Cassel)

* tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)
  MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer
  selftests/pcie_bwctrl: Fix test progs list
  PCI: Restore assigned resources fully after release

3 months agoMerge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 26 Apr 2025 17:43:03 +0000 (10:43 -0700)]
Merge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Revert a v6.15 patch due to a report of SELinux test failures

* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

3 months agoMerge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:45:54 +0000 (09:45 -0700)]
Merge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix 32-bit kernel boot crash if passed physical memory with more than
   32 address bits

 - Fix Xen PV crash

 - Work around build bug in certain limited build environments

 - Fix CTEST instruction decoding in insn_decoder_test

* tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Fix CTEST instruction decoding
  x86/boot: Work around broken busybox 'truncate' tool
  x86/mm: Fix _pgd_alloc() for Xen PV mode
  x86/e820: Discard high memory that can't be addressed by 32-bit systems

3 months agoMerge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Apr 2025 16:23:20 +0000 (09:23 -0700)]
Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix sporadic crashes in dequeue_entities() due to ... bad math.

  [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
    complex math being correct it could have de-escalated a crash into
    a warning, but that's for a different patch ]"

* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

3 months agoMerge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:13:09 +0000 (09:13 -0700)]
Merge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc perf events fixes from Ingo Molnar:

 - Use POLLERR for events in error state, instead of the ambiguous
   POLLHUP error value

 - Fix non-sampling (counting) events on certain x86 platforms

* tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix non-sampling (counting) events on certain x86 platforms
  perf/core: Change to POLLERR for pinned events with error

3 months agoMerge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:08:45 +0000 (09:08 -0700)]
Merge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix crashes in the gic-v2m irqchip driver, caused by an incorrect
  __init annotation"

* tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

3 months agoMerge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:02:41 +0000 (09:02 -0700)]
Merge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Add a missing Kconfig option, fix some bugs in exception handlers,
  memory management and KVM"

* tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
  LoongArch: KVM: Fully clear some CSRs when VM reboot
  LoongArch: KVM: Fix multiple typos of KVM code
  LoongArch: Return NULL from huge_pte_offset() for invalid PMD
  LoongArch: Remove a bogus reference to ZONE_DMA
  LoongArch: Handle fp, lsx, lasx and lbt assembly symbols
  LoongArch: Make do_xyz() exception handlers more robust
  LoongArch: Make regs_irqs_disabled() more clear
  LoongArch: Select ARCH_USE_MEMTEST

3 months agoMerge tag 'for-linus' of https://github.com/openrisc/linux
Linus Torvalds [Sat, 26 Apr 2025 16:01:13 +0000 (09:01 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

 - Support for cacheinfo API to expose OpenRISC cache info via sysfs,
   which also led to some cleanups of the OpenRISC cache flush and
   invalidate APIs

 - Documentation updates for new mailing list and toolchain binaries

* tag 'for-linus' of https://github.com/openrisc/linux:
  Documentation: openrisc: Update toolchain binaries URL
  Documentation: openrisc: Update mailing list
  openrisc: Add cacheinfo support
  openrisc: Introduce new utility functions to flush and invalidate caches
  openrisc: Refactor struct cpuinfo_or1k to reduce duplication

3 months agoRevert "sunrpc: clean cache_detail immediately when flush is written frequently"
Chuck Lever [Thu, 24 Apr 2025 13:27:35 +0000 (09:27 -0400)]
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

Ondrej reports that certain SELinux tests are failing after commit
fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is
written frequently"), merged during the v6.15 merge window.

Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 months agoMerge tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Apr 2025 15:55:24 +0000 (08:55 -0700)]
Merge tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kunit fix from Kees Cook:
 "A single fix for the kunit lib/tests/ relocation:

   - Ensure prime numbers tests are included in KUnit test runs (Mark Brown)"

* tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib: Ensure prime numbers tests are included in KUnit test runs

3 months agoMerge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 26 Apr 2025 15:32:29 +0000 (08:32 -0700)]
Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a
  couple of minor fixes, seems a bit quiet, but probably some lag from
  Easter holidays.

  amdgpu:
   - P2P DMA fixes
   - Display reset fixes
   - DCN 3.5 fixes
   - ACPI EDID fix
   - LTTPR fix
   - mode_valid() fix

  exynos:
   - fix spelling error
   - remove redundant error handling in exynos_drm_vidi.c module
   - marks struct decon_data as const in the exynos7_drm_decon driver
     since it is only read
   - Remove unnecessary checking in exynos_drm_drv.c module

  meson:
   - Fix VCLK calculation

  panel:
   - jd9365a: Fix reset polarity"

* tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel:
  drm/exynos: Fix spelling mistake "enqueu" -> "enqueue"
  drm/exynos: exynos7_drm_decon: Consstify struct decon_data
  drm/exynos: fixed a spelling error
  drm/exynos/vidi: Remove redundant error handling in vidi_get_modes()
  drm/exynos: Remove unnecessary checking
  drm/amd/display: do not copy invalid CRTC timing info
  drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
  drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
  drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
  drm/amdgpu: Allow P2P access through XGMI
  drm/amd/display: Enable urgent latency adjustment on DCN35
  drm/amd/display: Force full update in gpu reset
  drm/amd/display: Fix gpu reset in multidisplay config
  drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
  drm/amdgpu: Use allowed_domains for pinning dmabufs
  drm: panel: jd9365da: fix reset signal polarity in unprepare
  drm/meson: use unsigned long long / Hz for frequency types
  Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"

3 months agosched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
Omar Sandoval [Fri, 25 Apr 2025 08:51:24 +0000 (01:51 -0700)]
sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.

The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:

1. In the if (entity_is_task(se)) else block at the beginning of
   dequeue_entities(), slice is set to
   cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
   it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
   tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
   first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
   the saved slice, which is still U64_MAX.

This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:

6. In update_entity_lag(), se->slice is used to calculate limit, which
   ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
   is negative, vlag > limit, so se->vlag is set to the same huge
   negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
   another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
   decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
    incorrectly returns false because the vruntime is so far from the
    other vruntimes on the queue, causing the
    (vruntime - cfs_rq->min_vruntime) * load calculation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
    pick_eevdf() and crashes.

Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).

Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.
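
Steps 6 and 7 above can be illustrated with a small userspace sketch.
It assumes a clamp written as "val >= hi ? hi : ..." and a simplified
limit computation (the hypothetical lag_limit(), which leaves out the
load-weight scaling the scheduler applies). Neither is the kernel's
actual code.

```c
#include <stdint.h>

/* One common clamp formulation; only meaningful when lo <= hi. */
static int64_t clamp_s64(int64_t val, int64_t lo, int64_t hi)
{
	return val >= hi ? hi : (val <= lo ? lo : val);
}

/*
 * Simplified stand-in for the limit: double the slice in unsigned
 * arithmetic (which wraps), then reinterpret the result as signed.
 * On two's-complement targets a slice of UINT64_MAX yields a
 * negative limit.
 */
static int64_t lag_limit(uint64_t slice)
{
	return (int64_t)(2 * slice);
}

/* Clamp vlag to [-limit, limit], as in step 7. */
static int64_t clamped_vlag(int64_t vlag, uint64_t slice)
{
	int64_t limit = lag_limit(slice);

	return clamp_s64(vlag, -limit, limit);
}
```

With a sane slice the lag stays within [-limit, limit]. With a slice of
UINT64_MAX the bounds are inverted, so any vlag above the negative
limit comes back as the limit itself.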

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
3 months agoirqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
Suzuki K Poulose [Tue, 22 Apr 2025 16:16:16 +0000 (17:16 +0100)]
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

With ACPI in place, gicv2m_get_fwnode() is registered with the PCI
subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime
during a PCI host bridge probe. However, the callback is wrongly marked
as __init, so it is freed while still registered with the PCI subsystem,
which can trigger:

 Unable to handle kernel paging request at virtual address ffff8000816c0400
  gicv2m_get_fwnode+0x0/0x58 (P)
  pci_set_bus_msi_domain+0x74/0x88
  pci_register_host_bridge+0x194/0x548

This is easily reproducible on a Juno board with ACPI boot.

Retain the function for later use.

Fixes: 0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
3 months agoLoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally

kvm_pre_enter_guest() prepares to enter the guest and checks whether
there are pending signals or events. If there are, it does not enter the
guest, so the PMU pass-through preparation for the guest should be
cancelled and the host should own the PMU hardware.

Cc: stable@vger.kernel.org
Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
3 months agoLoongArch: KVM: Fully clear some CSRs when VM reboot
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fully clear some CSRs when VM reboot

Some registers, such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC, are
only partly cleared by _kvm_setcsr(). This follows from the hardware
specification: some bits are read-only in VM mode but can be written in
host mode. So they are partly cleared in VM mode and can be fully
cleared in host mode.

These read-only bits show pending interrupt or exception status. When
the VM is reset, the read-only bits should be cleared; otherwise the
vCPU will receive unknown interrupts during the boot stage.

Here, LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are fully cleared in
the KVM_REG_LOONGARCH_VCPU_RESET ioctl vCPU reset path.
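
The difference between the two clearing modes can be sketched with a toy
model. Everything below is an assumption made for illustration: the
DEMO_RO_MASK layout and the demo_* helpers are hypothetical and do not
reflect the real ESTAT/GINTC bit fields or the KVM code.

```c
#include <stdint.h>

/* Hypothetical layout: the low 13 bits are read-only in VM mode. */
#define DEMO_RO_MASK	UINT64_C(0x1fff)

/* VM-mode write: read-only bits keep their old value. */
static uint64_t demo_csr_write_vm(uint64_t old, uint64_t val)
{
	return (old & DEMO_RO_MASK) | (val & ~DEMO_RO_MASK);
}

/* Host-mode reset: every bit is writable, so the register is zeroed. */
static uint64_t demo_csr_reset_host(uint64_t old)
{
	(void)old;
	return 0;
}
```

In this model, writing 0 in VM mode to a register holding 0xffff leaves
the stale status bits 0x1fff behind, while the host-mode reset clears
them all.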

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
3 months agoLoongArch: KVM: Fix multiple typos of KVM code
Yulong Han [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix multiple typos of KVM code

Fix multiple typos inside arch/loongarch/kvm.

Cc: stable@vger.kernel.org
Reviewed-by: Yuli Wang <wangyuli@uniontech.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Yulong Han <wheatfox17@icloud.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
3 months agoLoongArch: Return NULL from huge_pte_offset() for invalid PMD
Ming Wang [Thu, 24 Apr 2025 12:15:47 +0000 (20:15 +0800)]
LoongArch: Return NULL from huge_pte_offset() for invalid PMD

LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
even if the underlying entry points to invalid_pte_table (indicating no
mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
value (the address of invalid_pte_table) via this pointer.

The generic is_swap_pte() check then incorrectly identifies this address
as a swap entry on LoongArch, because it satisfies the "!pte_present()
&& !pte_none()" conditions. This misinterpretation, combined with a
coincidental match by is_migration_entry() on the address bits, leads to
kernel crashes in pfn_swap_entry_to_page().

Fix this at the architecture level by modifying huge_pte_offset() to
check the PMD entry's content using pmd_none() before returning. If the
entry is invalid (i.e., it points to invalid_pte_table), return NULL
instead of the pointer to the slot.
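
The shape of the check can be sketched in user space. This is a toy
model under stated assumptions, not the arch code: the demo_* names are
hypothetical, a PMD slot is modelled as a plain 64-bit word, and
demo_invalid_pte_table stands in for invalid_pte_table.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for invalid_pte_table: empty PMD slots point here. */
static uint64_t demo_invalid_pte_table[1];

/* Toy pmd_none(): the entry is "none" if it points at the invalid table. */
static int demo_pmd_none(uint64_t entry)
{
	return entry == (uint64_t)(uintptr_t)demo_invalid_pte_table;
}

/* Return the slot only when it holds a real mapping, otherwise NULL. */
static uint64_t *demo_huge_pte_offset(uint64_t *pmd_slot)
{
	if (!pmd_slot || demo_pmd_none(*pmd_slot))
		return NULL;
	return pmd_slot;
}
```

A caller that receives NULL never reads the address of the invalid
table as if it were an entry, so it cannot be misread as a swap or
migration entry further down the path.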

Cc: stable@vger.kernel.org
Acked-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>