David Woodhouse [Wed, 25 Oct 2023 21:53:45 +0000 (22:53 +0100)]
KVM: x86/xen: Inject vCPU upcall vector when local APIC is enabled
Linux guests since commit b1c3497e604d ("x86/xen: Add support for
HVMOP_set_evtchn_upcall_vector") in v6.0 onwards will use the per-vCPU
upcall vector when it's advertised in the Xen CPUID leaves.
This upcall is injected through the guest's local APIC as an MSI, unlike
the older system vector which was merely injected by the hypervisor any
time the CPU was able to receive an interrupt and the upcall_pending
flags is set in its vcpu_info.
Effectively, that makes the per-CPU upcall edge triggered instead of
level triggered, which results in the upcall being lost if the MSI is
delivered when the local APIC is *disabled*.
Xen checks the vcpu_info->evtchn_upcall_pending flag when the local APIC
for a vCPU is software enabled (in fact, on any write to the SPIV
register which doesn't disable the APIC). Do the same in KVM since KVM
doesn't provide a way for userspace to intervene and trap accesses to
the SPIV register of a local APIC emulated by KVM.
Astute reviewers may note that kvm_xen_inject_vcpu_vector() function has
a WARN_ON_ONCE() in the case where kvm_irq_delivery_to_apic_fast() fails
and returns false. In the case where the MSI is not delivered due to the
local APIC being disabled, kvm_irq_delivery_to_apic_fast() still returns
true but the value in *r is zero. So the WARN_ON_ONCE() remains correct,
as that case should still never happen.
Fixes: fde0451be8fb3 ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Cc: stable@vger.kernel.org
David Woodhouse [Fri, 27 Oct 2023 13:33:20 +0000 (14:33 +0100)]
KVM: x86/xen: improve accuracy of Xen timers
A test program such as http://david.woodhou.se/timerlat.c confirms user
reports that timers are increasingly inaccurate as the lifetime of a
guest increases. Reporting the actual delay observed when asking for
100µs of sleep, it starts off OK on a newly-launched guest but gets
worse over time, giving incorrect sleep times:
The biggest problem is that get_kvmclock_ns() returns inaccurate values
when the guest TSC is scaled. The guest sees a TSC value scaled from the
host TSC by a mul/shift conversion (hopefully done in hardware). The
guest then converts that guest TSC value into nanoseconds using the
mul/shift conversion given to it by the KVM pvclock information.
But get_kvmclock_ns() performs only a single conversion directly from
host TSC to nanoseconds, giving a different result. A test program at
http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
over a day.
It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
that. The actual guest hv_clock is per-CPU, and *theoretically* each
vCPU could be running at a *different* frequency. But this patch is
needed anyway because...
The other issue with Xen timers was that the code would snapshot the
host CLOCK_MONOTONIC at some point in time, and then... after a few
interrupts may have occurred, some preemption perhaps... would also read
the guest's kvmclock. Then it would proceed under the false assumption
that those two happened at the *same* time. Any time which *actually*
elapsed between reading the two clocks was introduced as inaccuracies
in the time at which the timer fired.
Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
host TSC just *once*, then use the returned TSC value to calculate the
kvmclock (making sure to do that the way the guest would instead of
making the same mistake get_kvmclock_ns() does).
Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
timers still have to use CLOCK_MONOTONIC. In practice the difference
between the two won't matter over the timescales involved, as the
*absolute* values don't matter; just the delta.
This does mean a new variant of kvm_get_time_and_clockread() is needed;
called kvm_get_monotonic_and_clockread() because that's what it does.
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Fri, 12 Jan 2024 17:54:05 +0000 (17:54 +0000)]
KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
This function can race with kvm_gpc_deactivate(), which does not take
the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
temporarily dropped its write lock on gpc->lock.
Then if hva_to_pfn_retry() determines that the PFN hasn't changed and
that the original pfn and khva can be reused, they get assigned back to
gpc->pfn and gpc->khva even though the khva was already unmapped by
kvm_gpc_deactivate(). This leaves the cache in an apparently valid state
but with ->khva pointing to an address which has been unmapped. Which in
turn leads to oopses in e.g. __kvm_xen_has_interrupt() and
set_shinfo_evtchn_pending() when they dereference said khva.
It may be possible to fix this just by making kvm_gpc_deactivate() take
the ->refresh_lock, but that still leaves ->refresh_lock being basically
redundant with the write lock on ->lock, which frankly makes my skin
itch, with the way that pfn_to_hva_retry() operates on fields in the gpc
without holding a write lock on ->lock.
Instead, fix it by cleaning up the semantics of hva_to_pfn_retry(). It
now no longer does locking gymnastics because it no longer operates on
the gpc object at all. I's called with a uhva and simply returns the
corresponding pfn (pinned), and a mapped khva for it.
Its caller __kvm_gpc_refresh() now sets gpc->uhva and clears gpc->valid
before dropping ->lock, calling hva_to_pfn_retry() and retaking ->lock
for write.
If hva_to_pfn_retry() fails, *or* if the ->uhva or ->active fields in
the gpc changed while the lock was dropped, the new mapping is discarded
and the gpc is not modified. On success with an unchanged gpc, the new
mapping is installed and the current ->pfn and ->uhva are taken into the
local old_pfn and old_khva variables to be unmapped once the locks are
all released.
This simplification means that ->refresh_lock is no longer needed for
correctness, but it does still provide a minor optimisation because it
will prevent two concurrent __kvm_gpc_refresh() calls from mapping a
given PFN, only for one of them to lose the race and discard its
mapping.
The optimisation in hva_to_pfn_retry() where it attempts to use the old
mapping if the pfn doesn't change is dropped, since it makes the pinning
more complex. It's a pointless optimisation anyway, since the odds of
the pfn ending up the same when the uhva has changed (i.e. the odds of
the two userspace addresses both pointing to the same underlying
physical page) are negligible,
The 'hva_changed' local variable in __kvm_gpc_refresh() is also removed,
since it's simpler just to clear gpc->valid if the uhva changed.
Likewise the unmap_old variable is dropped because it's just as easy to
check the old_pfn variable for KVM_PFN_ERR_FAULT.
I remain slightly confused because although this is clearly a race in
the gfn_to_pfn_cache code, I don't quite know how the Xen support code
actually managed to trigger it. We've seen oopses from dereferencing a
valid-looking ->khva in both __kvm_xen_has_interrupt() (the vcpu_info)
and in set_shinfo_evtchn_pending() (the shared_info). But surely the
race shouldn't happen for the vcpu_info gpc because all calls to both
refresh and deactivate hold the vcpu mutex, and it shouldn't happen
for the shared_info gpc because all calls to both will hold the
kvm->arch.xen.xen_lock mutex.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com>
v12:
- New in this version.
Paul Durrant [Tue, 26 Sep 2023 10:52:37 +0000 (10:52 +0000)]
KVM: xen: allow vcpu_info content to be 'safely' copied
If the guest sets an explicit vcpu_info GPA then, for any of the first 32
vCPUs, the content of the default vcpu_info in the shared_info page must be
copied into the new location. Because this copy may race with event
delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
needs to be a way to defer that until the copy is complete.
Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
that is used in atomic context if the vcpu_info PFN cache has been
invalidated so that the update of vcpu_info can be deferred until the
cache can be refreshed (on vCPU thread's the way back into guest context).
Also use this shadow if the vcpu_info cache has been *deactivated*, so that
the VMM can safely copy the vcpu_info content and then re-activate the
cache with the new GPA. To do this, stop considering an inactive vcpu_info
cache as a hard error in kvm_xen_set_evtchn_fast().
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: David Woodhouse <dwmw2@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org
v8:
- Update commit comment.
Paul Durrant [Mon, 4 Dec 2023 10:41:39 +0000 (10:41 +0000)]
KVM: pfncache: check the need for invalidation under read lock first
Taking a write lock on a pfncache will be disruptive if the cache is
heavily used (which only requires a read lock). Hence, in the MMU notifier
callback, take read locks on caches to check for a match; only taking a
write lock to actually perform an invalidation (after a another check).
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v10:
- New in this version.
Paul Durrant [Mon, 4 Dec 2023 10:06:08 +0000 (10:06 +0000)]
KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
As described in [1] compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context.
There is only actually blocking with PREEMPT_RT because the locks will
turned into mutexes. There is no 'raw' version of rwlock_t that can be used
to avoid that, so use read_trylock() and treat failure to lock the same as
an invalid cache.
Paul Durrant [Wed, 15 Nov 2023 21:03:32 +0000 (21:03 +0000)]
KVM: xen: split up kvm_xen_set_evtchn_fast()
The implementation of kvm_xen_set_evtchn_fast() is a rather lengthy piece
of code that performs two operations: updating of the shared_info
evtchn_pending mask, and updating of the vcpu_info evtchn_pending_sel
mask. Introduce a separate function to perform each of those operations and
re-work kvm_xen_set_evtchn_fast() to use them.
No functional change intended.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v11:
- Fixed /64 vs /32 switcheroo and changed type of port_word_bit back to
int.
v10:
- Updated in this version. Dropped David'd R-b since the updates are
non-trivial.
Paul Durrant [Mon, 18 Sep 2023 09:42:14 +0000 (09:42 +0000)]
KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
Now that all relevant kernel changes and selftests are in place, enable the
new capability.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v2:
- New in this version.
Paul Durrant [Fri, 22 Sep 2023 13:47:57 +0000 (13:47 +0000)]
KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present
then re-map vcpu_info using the HVA part way through the tests to make sure
then there is no functional change.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v5:
- New in this version.
Paul Durrant [Mon, 18 Sep 2023 08:44:43 +0000 (08:44 +0000)]
KVM: selftests / xen: map shared_info using HVA rather than GFN
Using the HVA of the shared_info page is more efficient, so if the
capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present use that method
to do the mapping.
NOTE: Have the juggle_shinfo_state() thread map and unmap using both
GFN and HVA, to make sure the older mechanism is not broken.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v3:
- Re-work the juggle_shinfo_state() thread.
Paul Durrant [Thu, 21 Sep 2023 08:50:44 +0000 (08:50 +0000)]
KVM: xen: allow vcpu_info to be mapped by fixed HVA
If the guest does not explicitly set the GPA of vcpu_info structure in
memory then, for guests with 32 vCPUs or fewer, the vcpu_info embedded
in the shared_info page may be used. As described in a previous commit,
the shared_info page is an overlay at a fixed HVA within the VMM, so in
this case it also more optimal to activate the vcpu_info cache with a
fixed HVA to avoid unnecessary invalidation if the guest memory layout
is modified.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: David Woodhouse <dwmw2@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org
v8:
- Re-base.
Paul Durrant [Thu, 7 Sep 2023 17:43:11 +0000 (17:43 +0000)]
KVM: xen: allow shared_info to be mapped by fixed HVA
The shared_info page is not guest memory as such. It is a dedicated page
allocated by the VMM and overlaid onto guest memory in a GFN chosen by the
guest and specified in the XENMEM_add_to_physmap hypercall. The guest may
even request that shared_info be moved from one GFN to another by
re-issuing that hypercall, but the HVA is never going to change.
Because the shared_info page is an overlay the memory slots need to be
updated in response to the hypercall. However, memory slot adjustment is
not atomic and, whilst all vCPUs are paused, there is still the possibility
that events may be delivered (which requires the shared_info page to be
updated) whilst the shared_info GPA is absent. The HVA is never absent
though, so it makes much more sense to use that as the basis for the
kernel's mapping.
Hence add a new KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA attribute type for this
purpose and a KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag to advertize its
availability. Don't actually advertize it yet though. That will be done in
a subsequent patch, which will also add tests for the new attribute type.
Also update the KVM API documentation with the new attribute and also fix
it up to consistently refer to 'shared_info' (with the underscore).
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: David Woodhouse <dwmw2@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org
v8:
- Re-base.
v2:
- Define the new attribute and capability but don't advertize the
capability yet.
- Add API documentation.
Paul Durrant [Wed, 8 Nov 2023 10:06:33 +0000 (10:06 +0000)]
KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
If the shared_info PFN cache has already been initialized then the content
of the shared_info page needs to be re-initialized whenever the guest
mode is (re)set.
Setting the guest mode is either done explicitly by the VMM via the
KVM_XEN_ATTR_TYPE_LONG_MODE attribute, or implicitly when the guest writes
the MSR to set up the hypercall page.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v12:
- Fix missing update of return value if mode is not actually changed.
v11:
- Drop the hunk removing the call to kvm_xen_shared_info_init() when
KVM_XEN_ATTR_TYPE_SHARED_INFO is set; it was a mistake and causes self-
test failures.
Paul Durrant [Wed, 8 Nov 2023 09:44:26 +0000 (09:44 +0000)]
KVM: xen: separate initialization of shared_info cache and content
A subsequent patch will allow shared_info to be initialized using either a
GPA or a user-space (i.e. VMM) HVA. To make that patch cleaner, separate
the initialization of the shared_info content from the activation of the
pfncache.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v11:
- Fix accidental regression from commit 5d6d6a7d7e66a ("KVM: x86: Refine
calculation of guest wall clock to use a single TSC read").
Paul Durrant [Mon, 13 Nov 2023 20:56:20 +0000 (20:56 +0000)]
KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
Some pfncache pages may actually be overlays on guest memory that have a
fixed HVA within the VMM. It's pointless to invalidate such cached
mappings if the overlay is moved so allow a cache to be activated directly
with the HVA to cater for such cases. A subsequent patch will make use
of this facility.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v11:
- Fixed kvm_gpc_check() to ignore memslot generation if the cache is not
activated with a GPA. (This breakage occured during the re-work for v8).
v9:
- Pass both GPA and HVA into __kvm_gpc_refresh() rather than overloading
the address paraneter and using a bool flag to indicated what it is.
v8:
- Re-worked to avoid messing with struct gfn_to_pfn_cache.
Paul Durrant [Fri, 10 Nov 2023 10:32:21 +0000 (10:32 +0000)]
KVM: pfncache: include page offset in uhva and use it consistently
Currently the pfncache page offset is sometimes determined using the gpa
and sometimes the khva, whilst the uhva is always page-aligned. After a
subsequent patch is applied the gpa will not always be valid so adjust
the code to include the page offset in the uhva and use it consistently
as the source of truth.
Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
for clarity.
No functional change intended.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v8:
- New in this version.
Paul Durrant [Fri, 10 Nov 2023 09:29:01 +0000 (09:29 +0000)]
KVM: pfncache: stop open-coding offset_in_page()
Some code in pfncache uses offset_in_page() but in other places it is open-
coded. Use offset_in_page() consistently everywhere.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org>
v8:
- New in this version.
Paul Durrant [Thu, 9 Nov 2023 15:43:41 +0000 (15:43 +0000)]
KVM: pfncache: remove KVM_GUEST_USES_PFN usage
As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
callers of kvm_gpc_init(), which also makes the 'vcpu' argument redundant.
Moreover, all existing callers specify KVM_HOST_USES_PFN so the usage
check in hva_to_pfn_retry() and hence the 'usage' argument to
kvm_gpc_init() are also redundant.
Remove the pfn_cache_usage enumeration and remove the redundant arguments,
fields of struct gfn_to_hva_cache, and all the related code.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v8:
- New in this version.
Paul Durrant [Thu, 7 Sep 2023 16:03:40 +0000 (16:03 +0000)]
KVM: pfncache: add a mark-dirty helper
At the moment pages are marked dirty by open-coded calls to
mark_page_dirty_in_slot(), directly deferefencing the gpa and memslot
from the cache. After a subsequent patch these may not always be set
so add a helper now so that caller will protected from the need to know
about this detail.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: David Woodhouse <dwmw2@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org
v8:
- Make the helper a static inline.
Paul Durrant [Thu, 9 Nov 2023 14:17:02 +0000 (14:17 +0000)]
KVM: xen: mark guest pages dirty with the pfncache lock held
Sampling gpa and memslot from an unlocked pfncache may yield inconsistent
values so, since there is no problem with calling mark_page_dirty_in_slot()
with the pfncache lock held, relocate the calls in
kvm_xen_update_runstate_guest() and kvm_xen_inject_pending_events()
accordingly.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: x86@kernel.org
v8:
- New in this version.
Paul Durrant [Thu, 9 Nov 2023 13:09:22 +0000 (13:09 +0000)]
KVM: pfncache: remove unnecessary exports
There is no need for the existing kvm_gpc_XXX() functions to be exported.
Clean up now before additional functions are added in subsequent patches.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com>
v8:
- New in this version.
Paul Durrant [Thu, 7 Sep 2023 12:53:13 +0000 (12:53 +0000)]
KVM: pfncache: Add a map helper function
There is a pfncache unmap helper but mapping is open-coded. Arguably this
is fine because mapping is done in only one place, hva_to_pfn_retry(), but
adding the helper does make that function more readable.
No functional change intended.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
--- Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com>
v8:
- Re-work commit comment.
- Fix CONFIG_HAS_IOMEM=n build.
Kirill A. Shutemov [Tue, 5 Dec 2023 00:45:01 +0000 (03:45 +0300)]
x86/kvm: Do not try to disable kvmclock if it was not enabled
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:
unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)
kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.
Do not disable kvmclock if it was not enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown") Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 8 Jan 2024 13:10:32 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-mmu-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.8:
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Relax the TDP MMU's lockdep assertions related to holding mmu_lock for read
versus write so that KVM doesn't pass "bool shared" all over the place just
to have precise assertions in paths that don't actually care about whether
the caller is a reader or a writer.
Paolo Bonzini [Mon, 8 Jan 2024 13:10:20 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-xen-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM Xen change for 6.8:
To workaround Xen guests that don't expect Xen PV clocks to be marked as being
based on a stable TSC, add a Xen config knob to allow userspace to opt out of
KVM setting the "TSC stable" bit in Xen PV clocks. Note, the "TSC stable" bit
was added to the PVCLOCK ABI by KVM without an ack from Xen, i.e. KVM isn't
entirely blameless for the buggy guest behavior.
Paolo Bonzini [Mon, 8 Jan 2024 13:10:16 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-svm-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.8:
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- Fix a benign NMI virtualization bug where KVM would unnecessarily intercept
IRET when manually injecting an NMI, e.g. when KVM pends an NMI and injects
a second, "simultaneous" NMI.
Paolo Bonzini [Mon, 8 Jan 2024 13:10:12 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-lam-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM x86 support for virtualizing Linear Address Masking (LAM)
Add KVM support for Linear Address Masking (LAM). LAM tweaks the canonicality
checks for most virtual address usage in 64-bit mode, such that only the most
significant bit of the untranslated address bits must match the polarity of the
last translated address bit. This allows software to use ignored, untranslated
address bits for metadata, e.g. to efficiently tag pointers for address
sanitization.
LAM can be enabled separately for user pointers and supervisor pointers, and
for userspace LAM can be select between 48-bit and 57-bit masking
- 48-bit LAM: metadata bits 62:48, i.e. LAM width of 15.
- 57-bit LAM: metadata bits 62:57, i.e. LAM width of 6.
For user pointers, LAM enabling utilizes two previously-reserved high bits from
CR3 (similar to how PCID_NOFLUSH uses bit 63): LAM_U48 and LAM_U57, bits 62 and
61 respectively. Note, if LAM_57 is set, LAM_U48 is ignored, i.e.:
- CR3.LAM_U48=0 && CR3.LAM_U57=0 == LAM disabled for user pointers
- CR3.LAM_U48=1 && CR3.LAM_U57=0 == LAM-48 enabled for user pointers
- CR3.LAM_U48=x && CR3.LAM_U57=1 == LAM-57 enabled for user pointers
For supervisor pointers, LAM is controlled by a single bit, CR4.LAM_SUP, with
the 48-bit versus 57-bit LAM behavior following the current paging mode, i.e.:
- CR4.LAM_SUP=0 && CR4.LA57=x == LAM disabled for supervisor pointers
- CR4.LAM_SUP=1 && CR4.LA57=0 == LAM-48 enabled for supervisor pointers
- CR4.LAM_SUP=1 && CR4.LA57=1 == LAM-57 enabled for supervisor pointers
The bulk of KVM support for LAM is to emulate LAM's modified canonicality
checks. The approach taken by KVM is to "fill" the metadata bits using the
highest bit of the translated address, e.g. for LAM-48, bit 47 is sign-extended
to bits 62:48. The most significant bit, 63, is *not* modified, i.e. its value
from the raw, untagged virtual address is kept for the canonicality check. This
untagging allows
Aside from emulating LAM's canonical checks behavior, LAM has the usual KVM
touchpoints for selectable features: enumeration (CPUID.7.1:EAX.LAM[bit 26],
enabling via CR3 and CR4 bits, etc.
Paolo Bonzini [Mon, 8 Jan 2024 13:10:08 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-pmu-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM x86 PMU changes for 6.8:
- Fix a variety of bugs where KVM fail to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
that KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
Paolo Bonzini [Mon, 8 Jan 2024 13:10:04 +0000 (08:10 -0500)]
Merge tag 'kvm-x86-misc-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.8:
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
Paolo Bonzini [Mon, 8 Jan 2024 13:09:53 +0000 (08:09 -0500)]
Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 6.8
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Paolo Bonzini [Sat, 6 Jan 2024 07:24:00 +0000 (02:24 -0500)]
KVM: fix direction of dependency on MMU notifiers
KVM_GENERIC_MEMORY_ATTRIBUTES requires the generic MMU notifier code, because
it uses kvm_mmu_invalidate_begin/end. However, it would not work with a bespoke
implementation of MMU notifiers that does not use KVM_GENERIC_MMU_NOTIFIER,
because most likely it would not synchronize correctly on invalidation. So
the right thing to do is to note the problematic configuration if the
architecture does not select itself KVM_GENERIC_MMU_NOTIFIER; not to
enable it blindly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 4 Jan 2024 16:15:07 +0000 (11:15 -0500)]
KVM: introduce CONFIG_KVM_COMMON
CONFIG_HAVE_KVM is currently used by some architectures to either
enabled the KVM config proper, or to enable host-side code that is
not part of the KVM module. However, CONFIG_KVM's "select" statement
in virt/kvm/Kconfig corresponds to a third meaning, namely to
enable common Kconfigs required by all architectures that support
KVM.
These three meanings can be replaced respectively by an
architecture-specific Kconfig, by IS_ENABLED(CONFIG_KVM), or by
a new Kconfig symbol that is in turn selected by the
architecture-specific "config KVM".
Start by introducing such a new Kconfig symbol, CONFIG_KVM_COMMON.
Unlike CONFIG_HAVE_KVM, it is selected by CONFIG_KVM, not by
architecture code, and it brings in all dependencies of common
KVM code. In particular, INTERVAL_TREE was missing in loongarch
and riscv, so that is another thing that is fixed.
Fixes: 8132d887a702 ("KVM: remove CONFIG_HAVE_KVM_EVENTFD", 2023-12-08) Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/44907c6b-c5bd-4e4a-a921-e4d3825539d8@infradead.org/ Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Will Deacon [Thu, 4 Jan 2024 16:42:20 +0000 (16:42 +0000)]
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
In commit f320bc742bc23 ("KVM: arm64: Prepare the creation of s1
mappings at EL2"), pKVM switches from a temporary host-provided
page-table to its own page-table at EL2. Since there is only a single
TTBR for the nVHE hypervisor, this involves disabling and re-enabling
the MMU in __pkvm_init_switch_pgd().
Unfortunately, the memory barriers here are not quite correct.
Specifically:
- A DSB is required to complete the TLB invalidation executed while
the MMU is disabled.
- An ISB is required to make the new TTBR value visible to the
page-table walker before the MMU is enabled in the SCTLR.
An earlier version of the patch actually got this correct:
but thanks to some badly worded review comments from yours truly, these
were dropped for the version that was eventually merged.
Bring back the barriers and fix the potential issue (but note that this
was found by code inspection).
Cc: Quentin Perret <qperret@google.com> Fixes: f320bc742bc23 ("KVM: arm64: Prepare the creation of s1 mappings at EL2") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104164220.7968-1-will@kernel.org
Marc Zyngier [Thu, 4 Jan 2024 19:28:15 +0000 (19:28 +0000)]
Merge branch kvm-arm64/vgic-6.8 into kvmarm-master/next
* kvm-arm64/vgic-6.8:
: .
: Fix for the GICv4.1 vSGI pending state being set/cleared from
: userspace, and some cleanup to the MMIO and userspace accessors
: for the pending state.
:
: Also a fix for a potential UAF in the ITS translation cache.
: .
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDR
KVM: arm64: vgic: Use common accessor for writes to ICPENDR
KVM: arm64: vgic: Use common accessor for writes to ISPENDR
KVM: arm64: vgic-v4: Restore pending state on host userspace write
Oliver Upton [Thu, 4 Jan 2024 18:32:32 +0000 (18:32 +0000)]
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
There is a potential UAF scenario in the case of an LPI translation
cache hit racing with an operation that invalidates the cache, such
as a DISCARD ITS command. The root of the problem is that
vgic_its_check_cache() does not elevate the refcount on the vgic_irq
before dropping the lock that serializes refcount changes.
Have vgic_its_check_cache() raise the refcount on the returned vgic_irq
and add the corresponding decrement after queueing the interrupt.
Paolo Bonzini [Tue, 2 Jan 2024 18:19:40 +0000 (13:19 -0500)]
Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.8 part #1
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Steal time account support along with selftest
Add guest_sbi_probe_extension(), allowing guest code to probe for
SBI extensions. As guest_sbi_probe_extension() needs
SBI_ERR_NOT_SUPPORTED, take the opportunity to bring in all SBI
error codes. We don't bring in all current extension IDs or base
extension function IDs though, even though we need one of each,
because we'd prefer to bring those in as necessary.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:22 +0000 (17:00 +0100)]
RISC-V: KVM: Implement SBI STA extension
Add a select SCHED_INFO to the KVM config in order to get run_delay
info. Then implement SBI STA's set-steal-time-shmem function and
kvm_riscv_vcpu_record_steal_time() to provide the steal-time info
to guests.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:21 +0000 (17:00 +0100)]
RISC-V: KVM: Add support for SBI STA registers
KVM userspace needs to be able to save and restore the steal-time
shared memory address. Provide the address through the get/set-one-reg
interface with two ulong-sized SBI STA extension registers (lo and hi).
64-bit KVM userspace must not set the hi register to anything other
than zero and is allowed to completely neglect saving/restoring it.
Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:20 +0000 (17:00 +0100)]
RISC-V: KVM: Add support for SBI extension registers
Some SBI extensions have state that needs to be saved / restored
when migrating the VM. Provide a get/set-one-reg register type
for SBI extension registers. Each SBI extension that uses this type
will have its own subtype. There are currently no subtypes defined.
The next patch introduces the first one.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:19 +0000 (17:00 +0100)]
RISC-V: KVM: Add SBI STA info to vcpu_arch
KVM's implementation of SBI STA needs to track the address of each
VCPU's steal-time shared memory region as well as the amount of
stolen time. Add a structure to vcpu_arch to contain this state
and make sure that the address is always set to INVALID_GPA on
vcpu reset. And, of course, ensure KVM won't try to update steal-
time when the shared memory address is invalid.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:18 +0000 (17:00 +0100)]
RISC-V: KVM: Add steal-update vcpu request
Add a new vcpu request to inform a vcpu that it should record its
steal-time information. The request is made each time it has been
detected that the vcpu task was not assigned a cpu for some time,
which is easy to do by making the request from vcpu-load. The record
function is just a stub for now and will be filled in with the rest
of the steal-time support functions in following patches.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:17 +0000 (17:00 +0100)]
RISC-V: KVM: Add SBI STA extension skeleton
Add the files and functions needed to support the SBI STA
(steal-time accounting) extension. In the next patches we'll
complete the functions to fully enable SBI STA support.
Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:16 +0000 (17:00 +0100)]
RISC-V: paravirt: Implement steal-time support
When the SBI STA extension exists we can use it to implement
paravirt steal-time support. Fill in the empty pv-time functions
with an SBI STA implementation and add the Kconfig knobs allowing
it to be enabled.
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 20 Dec 2023 16:00:14 +0000 (17:00 +0100)]
RISC-V: paravirt: Add skeleton for pv-time support
Add the files and functions needed to support paravirt time on
RISC-V. Also include the common code needed for the first
application of pv-time, which is steal-time. In the next
patches we'll complete the functions to fully enable steal-time
support.
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Daniel Henrique Barboza [Tue, 5 Dec 2023 17:45:08 +0000 (14:45 -0300)]
RISC-V: KVM: add 'vlenb' Vector CSR
Userspace requires 'vlenb' to be able to encode it in reg ID. Otherwise
it is not possible to retrieve any vector reg since we're returning
EINVAL if reg_size isn't vlenb (see kvm_riscv_vcpu_vreg_addr()).
Daniel Henrique Barboza [Tue, 5 Dec 2023 17:45:07 +0000 (14:45 -0300)]
RISC-V: KVM: set 'vlenb' in kvm_riscv_vcpu_alloc_vector_context()
'vlenb', added to riscv_v_ext_state by commit c35f3aa34509 ("RISC-V:
vector: export VLENB csr in __sc_riscv_v_state"), isn't being
initialized in guest_context. If we export 'vlenb' as a KVM CSR,
something we want to do in the next patch, it'll always return 0.
Andrew Jones [Wed, 13 Dec 2023 17:09:58 +0000 (18:09 +0100)]
RISC-V: KVM: selftests: Treat SBI ext regs like ISA ext regs
SBI extension registers may not be present and indeed when
running on a platform without sscofpmf the PMU SBI extension
is not. Move the SBI extension registers from the base set of
registers to the filter list. Individual configs should test
for any that may or may not be present separately. Since
the PMU extension may disappear and the DBCN extension is only
present in later kernels, separate them from the rest into
their own configs. The rest are lumped together into the same
config.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 13 Dec 2023 17:09:57 +0000 (18:09 +0100)]
KVM: riscv: selftests: Use register subtypes
Always use register subtypes in the get-reg-list test when registers
have them. The only registers neglecting to do so were ISA extension
registers. While we don't really need to use KVM_REG_RISCV_ISA_SINGLE
(since it's zero), the main purpose is to avoid confusion and to
self-document the tests. Also add print support for the multi
registers like SBI extensions have, even though they're only used for
debugging.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 13 Dec 2023 17:09:56 +0000 (18:09 +0100)]
KVM: riscv: selftests: Add RISCV_SBI_EXT_REG
While adding RISCV_SBI_EXT_REG(), acknowledge that some registers
have subtypes and extend __kvm_reg_id() to take a subtype field.
Then, update all macros to set the new field appropriately. The
general CSR macro gets renamed to include "GENERAL", but the other
macros, like the new RISCV_SBI_EXT_REG, just use the SINGLE subtype.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 13 Dec 2023 17:09:55 +0000 (18:09 +0100)]
RISC-V: KVM: Make SBI uapi consistent with ISA uapi
When an SBI extension cannot be enabled, that's a distinct state vs.
enabled and disabled. Modify enum kvm_riscv_sbi_ext_status to
accommodate it, which allows KVM userspace to tell the difference
in state too, as the SBI extension register will disappear when it
cannot be enabled, i.e. accesses to it return ENOENT. get-reg-list is
updated as well to only add SBI extension registers to the list which
may be enabled. Returning ENOENT for SBI extension registers which
cannot be enabled makes them consistent with ISA extension registers.
Any SBI extensions which were enabled by default are still enabled by
default, if they can be enabled at all.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 13 Dec 2023 17:09:54 +0000 (18:09 +0100)]
KVM: riscv: selftests: Drop SBI multi registers
These registers are no longer getting added to get-reg-list.
We keep sbi_ext_multi_id_to_str() for printing, even though
we don't expect it to normally be used, because it may be
useful for debug.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Andrew Jones [Wed, 13 Dec 2023 17:09:53 +0000 (18:09 +0100)]
RISC-V: KVM: Don't add SBI multi regs in get-reg-list
The multi regs are derived from the single registers. Only list the
single registers in get-reg-list. This also makes the SBI extension
register listing consistent with the ISA extension register listing.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
Anup Patel [Tue, 28 Nov 2023 14:53:43 +0000 (20:23 +0530)]
KVM: riscv: selftests: Generate ISA extension reg_list using macros
Various ISA extension reg_list have common pattern so let us generate
these using macros.
We define two macros for the above purpose:
1) KVM_ISA_EXT_SIMPLE_CONFIG - Macro to generate reg_list for
ISA extension without any additional ONE_REG registers
2) KVM_ISA_EXT_SUBLIST_CONFIG - Macro to generate reg_list for
ISA extension with additional ONE_REG registers
This patch also adds the missing config for svnapot.
Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Chao Du [Mon, 11 Dec 2023 09:40:14 +0000 (09:40 +0000)]
RISC-V: KVM: remove a redundant condition in kvm_arch_vcpu_ioctl_run()
The latest ret value is updated by kvm_riscv_vcpu_aia_update(),
the loop will continue if the ret is less than or equal to zero.
So the later condition will never hit. Thus remove it.
Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Clément Léger [Tue, 24 Oct 2023 13:26:55 +0000 (15:26 +0200)]
riscv: kvm: use ".L" local labels in assembly when applicable
For the sake of coherency, use local labels in assembly when
applicable. This also avoid kprobes being confused when applying a
kprobe since the size of function is computed by checking where the
next visible symbol is located. This might end up in computing some
function size to be way shorter than expected and thus failing to apply
kprobes to the specified offset.
Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Clément Léger [Tue, 24 Oct 2023 13:26:54 +0000 (15:26 +0200)]
riscv: kvm: Use SYM_*() assembly macros instead of deprecated ones
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the
new SYM_*() macros [1] for better annotation of symbols. Replace the
deprecated ones with the new ones and fix wrong usage of END()/ENDPROC()
to correctly describe the symbols.
Linus Torvalds [Sat, 23 Dec 2023 20:13:28 +0000 (12:13 -0800)]
Merge tag 'x86-urgent-2023-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix a secondary CPUs enumeration regression caused by creative MADT
APIC table entries on certain systems.
- Fix a race in the NOP-patcher that can spuriously trigger crashes on
bootup.
- Fix a bootup failure regression caused by the parallel bringup code,
caused by firmware inconsistency between the APIC initialization
states of the boot and secondary CPUs, on certain systems.
* tag 'x86-urgent-2023-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Handle bogus MADT APIC tables gracefully
x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
x86/alternatives: Sync core before enabling interrupts
x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
Linus Torvalds [Sat, 23 Dec 2023 19:58:53 +0000 (11:58 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers with the core one adding a batch
indicator (for drivers which use it) to the error handler"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Let the sq_lock protect sq_tail_slot access
scsi: ufs: qcom: Return ufs_qcom_clk_scale_*() errors in ufs_qcom_clk_scale_notify()
scsi: core: Always send batch on reset or error handling command
scsi: bnx2fc: Fix skb double free in bnx2fc_rcv()
Linus Torvalds [Sat, 23 Dec 2023 19:48:05 +0000 (11:48 -0800)]
Merge tag 'usb-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small bugfixes and new device ids for USB and
Thunderbolt drivers for 6.7-rc7. Included in here are:
- new usb-serial device ids
- thunderbolt driver fixes
- typec driver fix
- usb-storage driver quirk added
- fotg210 driver fix
All of these have been in linux-next with no reported issues"
* tag 'usb-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add Quectel EG912Y module support
USB: serial: ftdi_sio: update Actisense PIDs constant names
usb: fotg210-hcd: delete an incorrect bounds test
usb-storage: Add quirk for incorrect WP on Kingston DT Ultimate 3.0 G3
usb: typec: ucsi: fix gpio-based orientation detection
net: usb: ax88179_178a: avoid failed operations when device is disconnected
USB: serial: option: add Quectel RM500Q R13 firmware support
USB: serial: option: add Foxconn T99W265 with new baseline
thunderbolt: Fix minimum allocated USB 3.x and PCIe bandwidth
thunderbolt: Fix memory leak in margining_port_remove()
Linus Torvalds [Sat, 23 Dec 2023 19:29:12 +0000 (11:29 -0800)]
Merge tag 'char-misc-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are a small number of various driver fixes for 6.7-rc7 that
normally come through the char-misc tree, and one debugfs fix as well.
Included in here are:
- iio and hid sensor driver fixes for a number of small things
- interconnect driver fixes
- brcm_nvmem driver fixes
- debugfs fix for previous fix
- guard() definition in device.h so that many subsystems can start
using it for 6.8-rc1 (requested by Dan Williams to make future
merges easier)
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
debugfs: initialize cancellations earlier
Revert "iio: hid-sensor-als: Add light color temperature support"
Revert "iio: hid-sensor-als: Add light chromaticity support"
nvmem: brcm_nvram: store a copy of NVRAM content
dt-bindings: nvmem: mxs-ocotp: Document fsl,ocotp
driver core: Add a guard() definition for the device_lock()
interconnect: qcom: icc-rpm: Fix peak rate calculation
iio: adc: MCP3564: fix hardware identification logic
iio: adc: MCP3564: fix calib_bias and calib_scale range checks
iio: adc: meson: add separate config for axg SoC family
iio: adc: imx93: add four channels for imx93 adc
iio: adc: ti_am335x_adc: Fix return value check of tiadc_request_dma()
interconnect: qcom: sm8250: Enable sync_state
iio: triggered-buffer: prevent possible freeing of wrong buffer
iio: imu: inv_mpu6050: fix an error code problem in inv_mpu6050_read_raw
iio: imu: adis16475: use bit numbers in assign_bit()
iio: imu: adis16475: add spi_device_id table
iio: tmag5273: fix temperature offset
interconnect: Treat xlate() returning NULL node as an error
iio: common: ms_sensors: ms_sensors_i2c: fix humidity conversion time table
...
Linus Torvalds [Sat, 23 Dec 2023 19:16:58 +0000 (11:16 -0800)]
Merge tag 'input-for-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a quirk to AT keyboard driver to skip issuing "GET ID" command when
8042 is in translated mode and the device is a laptop/portable,
because the "GET ID" command makes a bunch of recent laptops unhappy
- a quirk to i8042 to disable multiplexed mode on Acer P459-G2-M which
causes issues on resume
- psmouse will activate native RMI4 protocol support for touchpad on
ThinkPad L14 G1
- addition of Razer Wolverine V2 ID to xpad gamepad driver
- mapping for airplane mode button in soc_button_array driver for
TUXEDO laptops
- improved error handling in ipaq-micro-keys driver
- amimouse being prepared for platform remove callback returning void
* tag 'input-for-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: soc_button_array - add mapping for airplane mode button
Input: xpad - add Razer Wolverine V2 support
Input: ipaq-micro-keys - add error handling for devm_kmemdup
Input: amimouse - convert to platform remove callback returning void
Input: i8042 - add nomux quirk for Acer P459-G2-M
Input: atkbd - skip ATKBD_CMD_GETID in translated mode
Input: psmouse - enable Synaptics InterTouch for ThinkPad L14 G1
Nina Schoetterl-Glausch [Tue, 19 Dec 2023 14:08:52 +0000 (15:08 +0100)]
KVM: s390: cpu model: Use proper define for facility mask size
Use the previously unused S390_ARCH_FAC_MASK_SIZE_U64 instead of
S390_ARCH_FAC_LIST_SIZE_U64 for defining the fac_mask array.
Note that both values are the same, there is no functional change.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Link: https://lore.kernel.org/r/20231219140854.1042599-4-nsg@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20231219140854.1042599-4-nsg@linux.ibm.com>
Nina Schoetterl-Glausch [Tue, 19 Dec 2023 14:08:51 +0000 (15:08 +0100)]
KVM: s390: vsie: Fix length of facility list shadowed
The length of the facility list accessed when interpretively executing
STFLE is the same as the hosts facility list (in case of format-0)
The memory following the facility list doesn't need to be accessible.
The current VSIE implementation accesses a fixed length that exceeds the
guest/host facility list length and can therefore wrongly inject a
validity intercept.
Instead, find out the host facility list length by running STFLE and
copy only as much as necessary when shadowing.
Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20231219140854.1042599-3-nsg@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20231219140854.1042599-3-nsg@linux.ibm.com>
STFLE can be interpretively executed.
This occurs when the facility list designation is unequal to zero.
Perform the check before applying the address mask instead of after.
Fixes: 66b630d5b7f2 ("KVM: s390: vsie: support STFLE interpretation") Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20231219140854.1042599-2-nsg@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20231219140854.1042599-2-nsg@linux.ibm.com>
Christoffer Sandberg [Sat, 23 Dec 2023 07:25:38 +0000 (23:25 -0800)]
Input: soc_button_array - add mapping for airplane mode button
This add a mapping for the airplane mode button on the TUXEDO Pulse Gen3.
While it is physically a key it behaves more like a switch, sending a key
down on first press and a key up on 2nd press. Therefor the switch event
is used here. Besides this behaviour it uses the HID usage-id 0xc6
(Wireless Radio Button) and not 0xc8 (Wireless Radio Slider Switch), but
since neither 0xc6 nor 0xc8 are currently implemented at all in
soc_button_array this not to standard behaviour is not put behind a quirk
for the moment.
Linus Torvalds [Sat, 23 Dec 2023 03:36:48 +0000 (19:36 -0800)]
Merge tag 'block-6.7-2023-12-22' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just an NVMe pull request this time, with a fix for bad sleeping
context, and a revert of a patch that caused some trouble"
* tag 'block-6.7-2023-12-22' of git://git.kernel.dk/linux:
nvme-pci: fix sleeping function called from interrupt context
Revert "nvme-fc: fix race between error recovery and creating association"
Linus Torvalds [Sat, 23 Dec 2023 03:22:20 +0000 (19:22 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix a race condition in updating external interrupt for
trap-n-emulated IMSIC swfile
- Fix print_reg defaults in get-reg-list selftest
ARM:
- Ensure a vCPU's redistributor is unregistered from the MMIO bus if
vCPU creation fails
- Fix building KVM selftests for arm64 from the top-level Makefile
x86:
- Fix breakage for SEV-ES guests that use XSAVES
Selftests:
- Fix bad use of strcat(), by not using strcat() at all"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
KVM: selftests: Fix dynamic generation of configuration names
RISCV: KVM: update external interrupt atomically for IMSIC swfile
KVM: riscv: selftests: Fix get-reg-list print_reg defaults
KVM: selftests: Ensure sysreg-defs.h is generated at the expected path
KVM: Convert comment into an assertion in kvm_io_bus_register_dev()
KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs()
KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy
KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()
KVM: arm64: vgic: Simplify kvm_vgic_destroy()
Linus Torvalds [Fri, 22 Dec 2023 16:46:44 +0000 (08:46 -0800)]
Merge tag 'sound-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Apparently there were so many kids wishing bug fixes that made Santa
busy; here we have lots of fixes although it's a bit late. But all
changes are device-specific, hence it should be relatively safe to
apply.
Most of changes are for Cirrus codecs (for both ASoC and HD-audio),
while the remaining are fixes for TI codecs, HD-audio and USB-audio
quirks"
* tag 'sound-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda: cs35l41: Only add SPI CS GPIO if SPI is enabled in kernel
ALSA: hda: cs35l41: Do not allow uninitialised variables to be freed
ASoC: fsl_sai: Fix channel swap issue on i.MX8MP
ASoC: hdmi-codec: fix missing report for jack initial status
ALSA: hda/realtek: Add quirks for ASUS Zenbook 2023 Models
ALSA: hda: cs35l41: Support additional ASUS Zenbook 2023 Models
ALSA: hda/realtek: Add quirks for ASUS Zenbook 2022 Models
ALSA: hda: cs35l41: Support additional ASUS Zenbook 2022 Models
ALSA: hda/realtek: Add quirks for ASUS ROG 2023 models
ALSA: hda: cs35l41: Support additional ASUS ROG 2023 models
ALSA: hda: cs35l41: Add config table to support many laptops without _DSD
ASoC: Intel: bytcr_rt5640: Add new swapped-speakers quirk
ASoC: Intel: bytcr_rt5640: Add quirk for the Medion Lifetab S10346
kselftest: alsa: fixed a print formatting warning
ALSA: usb-audio: Increase delay in MOTU M quirk
ASoC: tas2781: check the validity of prm_no/cfg_no
ALSA: hda/tas2781: select program 0, conf 0 by default
ALSA: hda/realtek: Add quirk for ASUS ROG GV302XA
ASoC: cs42l43: Don't enable bias sense during type detect
ASoC: Intel: soc-acpi-intel-mtl-match: Change CS35L56 prefixes to AMPn
...
Linus Torvalds [Fri, 22 Dec 2023 16:42:55 +0000 (08:42 -0800)]
Merge tag 'i2c-for-6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- error path fixes (qcom-geni)
- polling mode fix (rk3x)
- target mode state machine fix (aspeed)
* tag 'i2c-for-6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Handle the coalesced stop conditions with the start conditions.
i2c: rk3x: fix potential spinlock recursion on poll
i2c: qcom-geni: fix missing clk_disable_unprepare() and geni_se_resources_off()
Linus Torvalds [Fri, 22 Dec 2023 16:41:04 +0000 (08:41 -0800)]
Merge tag 'gpio-fixes-for-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Here's another round of fixes from the GPIO subsystem for this release
cycle.
There's one commit adding synchronization to an ioctl() we overlooked
previously and another synchronization changeset for one of the
drivers:
- add protection against GPIO device removal to an overlooked ioctl()
- synchronize the interrupt mask register manually in gpio-dwapb"
* tag 'gpio-fixes-for-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: dwapb: mask/unmask IRQ when disable/enale it
gpiolib: cdev: add gpio_device locking wrapper around gpio_ioctl()
Linus Torvalds [Fri, 22 Dec 2023 16:37:48 +0000 (08:37 -0800)]
Merge tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A single patch fixing a build issue for x86 32-bit configurations with
CONFIG_XEN, which was introduced in the 6.7 development cycle"
* tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: add CPU dependencies for 32-bit build
Linus Torvalds [Fri, 22 Dec 2023 15:59:25 +0000 (07:59 -0800)]
Merge tag 'drm-fixes-2023-12-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Pretty quiet for this week, just i915 and amdgpu fixes,
I think the misc tree got lost this week, but didn't seem to have too
much in it, so it can wait. I've also got a bunch of nouveau GSP fixes
sailing around that'll probably land next time as well.
i915:
- Fix state readout and check for DSC and bigjoiner combo
- Fix a potential integer overflow
- Reject async flips with bigjoiner
- Fix MTL HDMI/DP PLL clock selection
- Fix various issues by disabling pipe DMC events"
* tag 'drm-fixes-2023-12-22' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: re-create idle bo's PTE during VM state machine reset
drm/amd/display: dereference variable before checking for zero
drm/amd/display: get dprefclk ss info from integration info table
drm/amd/display: Add case for dcn35 to support usb4 dmub hpd event
drm/amd/display: disable FPO and SubVP for older DMUB versions on DCN32x
drm/amdkfd: svm range always mapped flag not working on APU
drm/amd/display: Revert " drm/amd/display: Use channel_width = 2 for vram table 3.0"
drm/i915/dmc: Don't enable any pipe DMC events
drm/i915/mtl: Fix HDMI/DP PLL clock selection
drm/i915: Reject async flips with bigjoiner
drm/i915/hwmon: Fix static analysis tool reported issues
drm/i915/display: Get bigjoiner config before dsc config during readout
Linus Torvalds [Fri, 22 Dec 2023 15:50:34 +0000 (07:50 -0800)]
Merge tag '9p-for-6.7-rc7' of https://github.com/martinetd/linux
Pull 9p fixes from Dominique Martinet:
"Two small fixes scheduled for stable trees:
A tracepoint fix that's been reading past the end of messages forever,
but semi-recently also went over the end of the buffer. And a
potential incorrectly freeing garbage in pdu parsing error path"
* tag '9p-for-6.7-rc7' of https://github.com/martinetd/linux:
net: 9p: avoid freeing uninit memory in p9pdu_vreadf
9p: prevent read overrun in protocol dump tracepoint
Oliver Upton [Tue, 19 Dec 2023 06:58:55 +0000 (06:58 +0000)]
KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDR
User writes to ISPENDR for GICv3 are treated specially, as zeroes
actually clear the pending state for interrupts (unlike HW). Reimplement
it using the ISPENDR and ICPENDR user accessors.
Oliver Upton [Tue, 19 Dec 2023 06:58:53 +0000 (06:58 +0000)]
KVM: arm64: vgic: Use common accessor for writes to ISPENDR
Perhaps unsurprisingly, there is a considerable amount of duplicate
code between the MMIO and user accessors for ISPENDR. At the same
time there are some important differences between user and guest
MMIO, like how SGIs can only be made pending from userspace.
Fold user and MMIO accessors into a common helper, maintaining the
distinction between the two.
Marc Zyngier [Sun, 17 Dec 2023 11:15:09 +0000 (11:15 +0000)]
KVM: arm64: vgic-v4: Restore pending state on host userspace write
When the VMM writes to ISPENDR0 to set the state pending state of
an SGI, we fail to convey this to the HW if this SGI is already
backed by a GICv4.1 vSGI.
This is a bit of a corner case, as this would only occur if the
vgic state is changed on an already running VM, but this can
apparently happen across a guest reset driven by the VMM.
Fix this by always writing out the pending_latch value to the
HW, and reseting it to false.
Johannes Berg [Thu, 21 Dec 2023 14:04:45 +0000 (15:04 +0100)]
debugfs: initialize cancellations earlier
Tetsuo Handa pointed out that in the (now reverted)
lockdep commit I initialized the data too late. The
same is true for the cancellation data, it must be
initialized before the cmpxchg(), otherwise it may
be done twice and possibly even overwriting data in
there already when there's a race. Fix that, which
also requires destroying the mutex in case we lost
the race.
Dave Airlie [Fri, 22 Dec 2023 03:11:08 +0000 (13:11 +1000)]
Merge tag 'drm-intel-fixes-2023-12-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.7-rc7:
- Fix state readout and check for DSC and bigjoiner combo
- Fix a potential integer overflow
- Reject async flips with bigjoiner
- Fix MTL HDMI/DP PLL clock selection
- Fix various issues by disabling pipe DMC events
Linus Torvalds [Fri, 22 Dec 2023 00:19:27 +0000 (16:19 -0800)]
Merge tag 'pinctrl-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some driver fixes for v6.7, all are in drivers, the most interesting
one is probably the AMD laptop suspend bug which really needs fixing.
Freedestop org has the bug description:
- Ignore disabled device tree nodes in the Starfive 7100 and 7100
drivers.
- Mask non-wake source pins with interrupt enabled at suspend in the
AMD driver, this blocks unnecessary wakeups from misc interrupts.
This can be power consuming because in many cases the system
doesn't really suspend, it just wakes right back up.
- Fix a typo breaking compilation of the cy8c95x0 driver, and fix up
bugs in the get/set config callbacks.
- Use a dedicated lock class for the PIO4 drivers IRQ. This fixes a
crash on suspend"
* tag 'pinctrl-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: at91-pio4: use dedicated lock class for IRQ
pinctrl: cy8c95x0: Fix get_pincfg
pinctrl: cy8c95x0: Fix regression
pinctrl: cy8c95x0: Fix typo
pinctrl: amd: Mask non-wake source pins with interrupt enabled at suspend
pinctrl: starfive: jh7100: ignore disabled device tree nodes
pinctrl: starfive: jh7110: ignore disabled device tree nodes
Jens Axboe [Thu, 21 Dec 2023 21:32:35 +0000 (14:32 -0700)]
Merge tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme into block-6.7
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.7
- Revert a commit with improper sleep context (Keith)
- Fix async event handling sleep context (Maurizio)"
* tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme:
nvme-pci: fix sleeping function called from interrupt context
Revert "nvme-fc: fix race between error recovery and creating association"
David Howells [Thu, 21 Dec 2023 13:57:31 +0000 (13:57 +0000)]
afs: Fix use-after-free due to get/remove race in volume tree
When an afs_volume struct is put, its refcount is reduced to 0 before
the cell->volume_lock is taken and the volume removed from the
cell->volumes tree.
Unfortunately, this means that the lookup code can race and see a volume
with a zero ref in the tree, resulting in a use-after-free:
(1) When putting, use a flag to indicate if the volume has been removed
from the tree and skip the rb_erase if it has.
(2) When looking up, use a conditional ref increment and if it fails
because the refcount is 0, replace the node in the tree and set the
removal flag.
Fixes: 20325960f875 ("afs: Reorganise volume and server trees to be rooted on the cell") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>