David Woodhouse [Tue, 31 Jan 2023 15:00:54 +0000 (15:00 +0000)]
hw/xen: Implement core serialize/deserialize methods for xenstore_impl
In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? Will
defer to Paul for that.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Sun, 22 Jan 2023 22:59:49 +0000 (22:59 +0000)]
hw/xen: Watches on XenStore transactions
Firing watches on the nodes that still exist is relatively easy; just
walk the tree and look at the nodes with refcount of one.
Firing watches on *deleted* nodes is more fun. We add 'modified_in_tx'
and 'deleted_in_tx' flags to each node. Nodes with those flags cannot
be shared, as they will always be unique to the transaction in which
they were created.
When xs_node_walk would need to *create* a node as scaffolding and it
encounters a deleted_in_tx node, it can resurrect it simply by clearing
its deleted_in_tx flag. If that node originally had any *data*, they're
gone, and the modified_in_tx flag will have been set when it was first
deleted.
We then attempt to send appropriate watches when the transaction is
committed, properly delete the deleted_in_tx nodes, and remove the
modified_in_tx flag from the others.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Sun, 22 Jan 2023 22:05:37 +0000 (22:05 +0000)]
hw/xen: Implement XenStore transactions
Given that the whole thing supported copy on write from the beginning,
transactions end up being fairly simple. On starting a transaction, just
take a ref of the existing root; swap it back in on a successful commit.
The main tree has a transaction ID too, and we keep a record of the last
transaction ID given out. if the main tree is ever modified when it isn't
the latest, it gets a new transaction ID.
A commit can only succeed if the main tree hasn't moved on since it was
forked. Strictly speaking, the XenStore protocol allows a transaction to
succeed as long as nothing *it* read or wrote has changed in the interim,
but no implementations do that; *any* change is sufficient to abort a
transaction.
This does not yet fire watches on the changed nodes on a commit. That bit
is more fun and will come in a follow-on commit.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Sun, 22 Jan 2023 18:38:23 +0000 (18:38 +0000)]
hw/xen: Implement XenStore watches
Starts out fairly simple: a hash table of watches based on the path.
Except there can be multiple watches on the same path, so the watch ends
up being a simple linked list, and the head of that list is in the hash
table. Which makes removal a bit of a PITA but it's not so bad; we just
special-case "I had to remove the head of the list and now I have to
replace it in / remove it from the hash table". And if we don't remove
the head, it's a simple linked-list operation.
We do need to fire watches on *deleted* nodes, so instead of just a simple
xs_node_unref() on the topmost victim, we need to recurse down and fire
watches on them all.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Fri, 20 Jan 2023 01:36:38 +0000 (01:36 +0000)]
hw/xen: Add basic XenStore tree walk and write/read/directory support
This is a fairly simple implementation of a copy-on-write tree.
The node walk function starts off at the root, with 'inplace == true'.
If it ever encounters a node with a refcount greater than one (including
the root node), then that node is shared with other trees, and cannot
be modified in place, so the inplace flag is cleared and we copy on
write from there on down.
Xenstore write has 'mkdir -p' semantics and will create the intermediate
nodes if they don't already exist, so in that case we flip the inplace
flag back to true as as populated the newly-created nodes.
We put a copy of the absolute path into the buffer in the struct walk_op,
with *two* NUL terminators at the end. As xs_node_walk() goes down the
tree, it replaces the next '/' separator with a NUL so that it can use
the 'child name' in place. The next recursion down then puts the '/'
back and repeats the exercise for the next path element... if it doesn't
hit that *second* NUL termination which indicates the true end of the
path.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 18 Jan 2023 18:55:47 +0000 (18:55 +0000)]
hw/xen: Add xenstore wire implementation and implementation stubs
This implements the basic wire protocol for the XenStore commands, punting
all the actual implementation to xs_impl_* functions which all just return
errors for now.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Fri, 13 Jan 2023 23:35:46 +0000 (23:35 +0000)]
hw/xen: Support MSI mapping to PIRQ
The way that Xen handles MSI PIRQs is kind of awful.
There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.
Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.
So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.
That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When a MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.
When the guest makes a hypercall to bind that PIRQ# (snooped from a
marked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.
Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.
Tested with passthrough igb and emulated e1000e + AHCI.
David Woodhouse [Tue, 17 Jan 2023 13:51:22 +0000 (13:51 +0000)]
hw/xen: Automatically add xen-platform PCI device for emulated Xen guests
It isn't strictly mandatory but Linux guests at least will only map
their grant tables over the dummy BAR that it provides, and don't have
sufficient wit to map them in any other unused part of their guest
address space. So include it by default for minimal surprise factor.
As I come to document "how to run a Xen guest in QEMU", this means one
fewer thing to tell the user about, according to the mantra of "if it
needs documenting, fix it first, then document what remains".
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 28 Dec 2022 10:06:49 +0000 (10:06 +0000)]
hw/xen: Add basic ring handling to xenstore
Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.
In the first instance we're likely to wire this up over a UNIX socket to
an actual xenstored implementation, but in the fullness of time it would
be nice to have a fully single-tenant "virtual" xenstore within qemu.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Tue, 27 Dec 2022 21:20:07 +0000 (21:20 +0000)]
hw/xen: Add backend implementation of interdomain event channel support
The provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.
The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.
For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Which is used to fetch xenstore PFN and port to be used
by the guest. This is preallocated by the toolstack when
guest will just read those and use it straight away.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:
David Woodhouse [Fri, 16 Dec 2022 00:03:21 +0000 (00:03 +0000)]
hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback
The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.
In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.
We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.
But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 15 Dec 2022 20:35:24 +0000 (20:35 +0000)]
hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback
The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.
Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.
There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.
However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.
However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Joao Martins [Tue, 21 Aug 2018 16:16:19 +0000 (12:16 -0400)]
i386/xen: add monitor commands to test event injection
Specifically add listing, injection of event channels.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
David Woodhouse [Tue, 13 Dec 2022 22:40:56 +0000 (22:40 +0000)]
hw/xen: Implement EVTCHNOP_bind_virq
Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).
The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Tue, 13 Dec 2022 17:20:46 +0000 (17:20 +0000)]
hw/xen: Implement EVTCHNOP_unmask
This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Tue, 13 Dec 2022 13:57:44 +0000 (13:57 +0000)]
hw/xen: Implement EVTCHNOP_close
It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.
That in turn calls a free_port() internal function which will be in
error paths after allocation.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Fri, 16 Dec 2022 14:32:25 +0000 (14:32 +0000)]
i386/xen: Add support for Xen event channel delivery to vCPU
The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.
Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Fri, 16 Dec 2022 14:02:29 +0000 (14:02 +0000)]
hw/xen: Add xen_evtchn device for event channel emulation
Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Ankur Arora [Tue, 6 Dec 2022 11:14:07 +0000 (11:14 +0000)]
i386/xen: implement HVMOP_set_param
This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).
This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Thu, 28 Jun 2018 16:36:19 +0000 (12:36 -0400)]
i386/xen: implement HYPERVISOR_event_channel_op
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Mon, 18 Jun 2018 16:26:44 +0000 (12:26 -0400)]
i386/xen: implement HYPERVISOR_vcpu_op
This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Mon, 18 Jun 2018 16:17:42 +0000 (12:17 -0400)]
i386/xen: implement HYPERVISOR_memory_op
Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Mon, 12 Dec 2022 14:03:41 +0000 (14:03 +0000)]
i386/xen: manage and save/restore Xen guest long_mode setting
Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.
KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Mon, 12 Dec 2022 23:40:45 +0000 (23:40 +0000)]
i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode
The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is only way to support Xen emulation
for now.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Wed, 7 Dec 2022 09:19:31 +0000 (09:19 +0000)]
hw/xen: Add xen_overlay device for emulating shared xenheap pages
For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.
To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.
Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.
This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Wed, 14 Dec 2022 21:50:41 +0000 (21:50 +0000)]
i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield
They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
It allows to shutdown itself via hypercall with any of the 3 reasons:
1) self-reboot
2) shutdown
3) crash
Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.
In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
Add SCHEDOP_soft_reset] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Thu, 14 Jun 2018 12:29:45 +0000 (08:29 -0400)]
i386/xen: implement HYPERVISOR_xen_version
This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Wed, 13 Jun 2018 14:14:31 +0000 (10:14 -0400)]
i386/xen: handle guest hypercalls
This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Joao Martins [Tue, 19 Jun 2018 10:44:46 +0000 (06:44 -0400)]
xen-platform: allow its creation with XEN_EMULATE mode
The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W v.s. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.
Now it builds without actual Xen, move it to CONFIG_XEN_BUS to include it
in the KVM-only builds.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Fri, 16 Dec 2022 11:05:29 +0000 (11:05 +0000)]
i386/hvm: Set Xen vCPU ID in KVM
There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.
The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).
The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Tue, 6 Dec 2022 10:48:53 +0000 (10:48 +0000)]
i386/kvm: handle Xen HVM cpuid leaves
Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Sat, 3 Dec 2022 17:51:13 +0000 (09:51 -0800)]
i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support
This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.
Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Tue, 6 Dec 2022 09:03:48 +0000 (09:03 +0000)]
xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation
The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fui).
It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).
The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.
The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
Joao Martins [Wed, 13 Feb 2019 17:29:47 +0000 (12:29 -0500)]
include: import Xen public headers to hw/xen/interface
There's already a partial set here; update them and pull in a more
complete set.
To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h, move to hw/xen/interface/] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
block/blkio: Fix inclusion of required headers
virtio-blk: simplify virtio_blk_dma_restart_cb()
util/aio: Defer disabling poll mode as long as possible
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Krempa [Mon, 23 Jan 2023 12:39:27 +0000 (13:39 +0100)]
block/blkio: Fix inclusion of required headers
After recent header file inclusion rework the build fails when the blkio
module is enabled:
../block/blkio.c: In function ‘blkio_detach_aio_context’:
../block/blkio.c:321:24: error: implicit declaration of function ‘bdrv_get_aio_context’; did you mean ‘qemu_get_aio_context’? [-Werror=implicit-function-declaration]
321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
| ^~~~~~~~~~~~~~~~~~~~
| qemu_get_aio_context
../block/blkio.c:321:24: error: nested extern declaration of ‘bdrv_get_aio_context’ [-Werror=nested-externs]
../block/blkio.c:321:24: error: passing argument 1 of ‘aio_set_fd_handler’ makes pointer from integer without a cast [-Werror=int-conversion]
321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
| ^~~~~~~~~~~~~~~~~~~~~~~~
| |
| int
In file included from /home/pipo/git/qemu.git/include/qemu/job.h:33,
from /home/pipo/git/qemu.git/include/block/blockjob.h:30,
from /home/pipo/git/qemu.git/include/block/block_int-global-state.h:28,
from /home/pipo/git/qemu.git/include/block/block_int.h:27,
from ../block/blkio.c:13:
/home/pipo/git/qemu.git/include/block/aio.h:476:37: note: expected ‘AioContext *’ but argument is of type ‘int’
476 | void aio_set_fd_handler(AioContext *ctx,
| ~~~~~~~~~~~~^~~
../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:821:34: error: passing argument 2 of ‘blkio_attach_aio_context’ makes pointer from integer without a cast [-Werror=int-conversion]
821 | blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
| ^~~~~~~~~~~~~~~~~~~~~~~~
| |
| int
Fix it by including 'block/block-io.h' which contains the required
declarations.
Fixes: e2c1c34f139f49ef909bb4322607fb8b39002312 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 2bc956011404a1ab03342aefde0087b5b4762562.1674477350.git.pkrempa@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 2 Nov 2022 18:23:37 +0000 (14:23 -0400)]
virtio-blk: simplify virtio_blk_dma_restart_cb()
virtio_blk_dma_restart_cb() is tricky because the BH must deal with
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() being called.
There are two issues with the code:
1. virtio_blk_realize() should use qdev_add_vm_change_state_handler()
instead of qemu_add_vm_change_state_handler(). This ensures the
ordering with virtio_init()'s vm change state handler that calls
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() is
well-defined. Then blk's AioContext is guaranteed to be up-to-date in
virtio_blk_dma_restart_cb() and it's no longer necessary to have a
special case for virtio_blk_data_plane_start().
2. Only blk_drain() waits for virtio_blk_dma_restart_cb()'s
blk_inc_in_flight() to be decremented. The bdrv_drain() family of
functions do not wait for BlockBackend's in_flight counter to reach
zero. virtio_blk_data_plane_stop() relies on blk_set_aio_context()'s
implicit drain, but that's a bdrv_drain() and not a blk_drain().
Note that virtio_blk_reset() already correctly relies on blk_drain().
If virtio_blk_data_plane_stop() switches to blk_drain() then we can
properly wait for pending virtio_blk_dma_restart_bh() calls.
Once these issues are taken care of the code becomes simpler. This
change is in preparation for multiple IOThreads in virtio-blk where we
need to clean up the multi-threading behavior.
I ran the reproducer from commit 49b44549ace7 ("virtio-blk: On restart,
process queued requests in the proper context") to check that there is
no regression.
Cc: Sergio Lopez <slp@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-id: 20221102182337.252202-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
util/aio: Defer disabling poll mode as long as possible
When we measure FIO read performance (cache=writethrough, bs=4k,
iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
from guest to qemu.
It turns out those frequent notificatons are caused by interference from
worker threads. Worker threads queue bottom halves after completing IO
requests. Pending bottom halves may lead to either aio_compute_timeout()
zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns
no progress after noticing pending aio_notify() events. Both cause
run_poll_handlers() to call poll_set_started(false) to disable poll mode.
However, for both cases, as timeout is already zeroed, the event loop
(i.e., aio_poll()) just processes bottom halves and then starts the next
event loop iteration. So, disabling poll mode has no value but leads to
unnecessary notifications from guest.
To minimize unnecessary notifications from guest, defer disabling poll
mode to when the event loop is about to be blocked.
With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Chao Gao <chao.gao@intel.com>
Message-id: 20220710120849.63086-1-chao.gao@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Mon, 23 Jan 2023 13:40:28 +0000 (13:40 +0000)]
Merge tag 'pull-target-arm-20230123' of https://git.linaro.org/people/pmaydell/qemu-arm into staging
target-arm queue:
* Widen cnthctl_el2 to uint64_t
* Unify checking for M Main Extension in MRS/MSR
* bitbang_i2c, versatile_i2c: code cleanups
* SME: refactor SME SM/ZA handling
* Fix physical address resolution for MTE
* Fix in_debug path in S1_ptw_translate
* Don't set EXC_RETURN.ES if Security Extension not present
* Implement DBGCLAIM registers
* Provide stubs for more external debug registers
* Look up ARMCPRegInfo at runtime, not translate time
* tag 'pull-target-arm-20230123' of https://git.linaro.org/people/pmaydell/qemu-arm: (26 commits)
target/arm: Look up ARMCPRegInfo at runtime
target/arm: Reorg do_coproc_insn
target/arm: provide stubs for more external debug registers
target/arm: implement DBGCLAIM registers
target/arm: Don't set EXC_RETURN.ES if Security Extension not present
target/arm: Fix in_debug path in S1_ptw_translate
target/arm: Fix physical address resolution for MTE
target/arm/sme: Unify set_pstate() SM/ZA helpers as set_svcr()
target/arm/sme: Rebuild hflags in aarch64_set_svcr()
target/arm/sme: Reset ZA state in aarch64_set_svcr()
target/arm/sme: Reset SVE state in aarch64_set_svcr()
target/arm/sme: Introduce aarch64_set_svcr()
target/arm/sme: Rebuild hflags in set_pstate() helpers
target/arm/sme: Reorg SME access handling in handle_msr_i()
hw/i2c/versatile_i2c: Rename versatile_i2c -> arm_sbcon_i2c
hw/i2c/versatile_i2c: Use ARM_SBCON_I2C() macro
hw/i2c/versatile_i2c: Replace TYPE_VERSATILE_I2C -> TYPE_ARM_SBCON_I2C
hw/i2c/versatile_i2c: Replace VersatileI2CState -> ArmSbconI2CState
hw/i2c/versatile_i2c: Drop useless casts from void * to pointer
hw/i2c/bitbang_i2c: Convert DPRINTF() to trace events
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Fri, 6 Jan 2023 19:44:51 +0000 (11:44 -0800)]
target/arm: Look up ARMCPRegInfo at runtime
Do not encode the pointer as a constant in the opcode stream.
This pointer is specific to the cpu that first generated the
translation, which runs into problems with both hot-pluggable
cpus and user-only threads, as cpus are removed. It's also a
potential correctness issue in the theoretical case of a
slightly-heterogenous system, because if CPU 0 generates a
TB and then CPU 1 executes it, CPU 1 will end up using CPU 0's
hash table, which might have a wrong set of registers in it.
(All our current systems are either completely homogenous,
M-profile, or have CPUs sufficiently different that they
wouldn't be sharing TBs anyway because the differences would
show up in the TB flags, so the correctness issue is only
theoretical, not practical.)
Perform the lookup in either helper_access_check_cp_reg,
or a new helper_lookup_cp_reg.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230106194451.1213153-3-richard.henderson@linaro.org
[PMM: added note in commit message about correctness issue] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Fri, 6 Jan 2023 19:44:50 +0000 (11:44 -0800)]
target/arm: Reorg do_coproc_insn
Move the ri == NULL case to the top of the function and return.
This allows the else to be removed and the code unindented.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20230106194451.1213153-2-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Evgeny Iakovlev [Fri, 20 Jan 2023 15:59:29 +0000 (16:59 +0100)]
target/arm: provide stubs for more external debug registers
Qemu doesn't implement Debug Communication Channel, as well as the rest
of external debug interface. However, Microsoft Hyper-V in tries to
access some of those registers during an EL2 context switch.
Since there is no architectural way to not advertise support for external
debug, provide RAZ/WI stubs for OSDTRRX_EL1, OSDTRTX_EL1 and OSECCR_EL1
registers in the same way the rest of DCM is currently done. Do account
for access traps though with access_tda.
Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230120155929.32384-3-eiakovlev@linux.microsoft.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 16 Dec 2022 15:24:10 +0000 (15:24 +0000)]
target/arm: Don't set EXC_RETURN.ES if Security Extension not present
In v7m_exception_taken(), for v8M we set the EXC_RETURN.ES bit if
either the exception targets Secure or if the CPU doesn't implement
the Security Extension. This is incorrect: the v8M Arm ARM specifies
that the ES bit should be RES0 if the Security Extension is not
implemented, and the pseudocode agrees.
Remove the incorrect condition, so that we leave the ES bit 0
if the Security Extension isn't implemented.
This doesn't have any guest-visible effects for our current set of
emulated CPUs, because all our v8M CPUs implement the Security
Extension; but it's worth fixing in case we add a v8M CPU without
the extension in future.
Reported-by: Igor Kotrasinski <i.kotrasinsk@samsung.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Richard Henderson [Sat, 14 Jan 2023 03:12:13 +0000 (17:12 -1000)]
target/arm: Fix physical address resolution for MTE
Conversion to probe_access_full missed applying the page offset.
Fixes: b8967ddf ("target/arm: Use probe_access_full for MTE")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1416 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230114031213.2970349-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This device model started with the Versatile board, named
TYPE_VERSATILE_I2C, then ended up renamed TYPE_ARM_SBCON_I2C
as per the official "ARM SBCon two-wire serial bus interface"
description from:
https://developer.arm.com/documentation/dui0440/b/programmer-s-reference/two-wire-serial-bus-interface--sbcon
Use the latter name as a better description.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-6-philmd@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Philippe Mathieu-Daudé [Tue, 10 Jan 2023 08:25:07 +0000 (09:25 +0100)]
hw/i2c/versatile_i2c: Use ARM_SBCON_I2C() macro
ARM_SBCON_I2C() macro and ArmSbconI2CState typedef are
already declared via the QOM DECLARE_INSTANCE_CHECKER()
macro in "hw/i2c/arm_sbcon_i2c.h". Drop the VERSATILE_I2C
declarations from versatile_i2c.c.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-5-philmd@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-4-philmd@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In order to rename TYPE_VERSATILE_I2C as TYPE_ARM_SBCON_I2C
(the formal ARM naming), start renaming its state.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-3-philmd@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Philippe Mathieu-Daudé [Tue, 10 Jan 2023 08:25:04 +0000 (09:25 +0100)]
hw/i2c/versatile_i2c: Drop useless casts from void * to pointer
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-2-philmd@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
David Reiss [Mon, 9 Jan 2023 23:05:19 +0000 (15:05 -0800)]
target/arm: Unify checking for M Main Extension in MRS/MSR
BASEPRI, FAULTMASK, and their _NS equivalents only exist on devices with
the Main Extension. However, the MRS instruction did not check this,
and the MSR instruction handled it inconsistently (warning BASEPRI, but
silently ignoring writes to BASEPRI_NS). Unify this behavior and always
warn when reading or writing any of these registers if the extension is
not present.
Signed-off-by: David Reiss <dreiss@meta.com>
Message-id: 167330628518.10497.13100425787268927786-0@git.sr.ht Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Sun, 15 Jan 2023 17:16:33 +0000 (07:16 -1000)]
target/arm: Widen cnthctl_el2 to uint64_t
This is a 64-bit register on AArch64, even if the high 44 bits
are RES0. Because this is defined as ARM_CP_STATE_BOTH, we are
asserting that the cpreg field is 64-bits.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1400 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230115171633.3171890-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 20 Jan 2023 16:17:56 +0000 (16:17 +0000)]
Merge tag 'pull-riscv-to-apply-20230120' of https://github.com/alistair23/qemu into staging
Second RISC-V PR for QEMU 8.0
* riscv_htif: Support console output via proxy syscall
* Cleanup firmware and device tree loading
* Fix elen check when using vector extensions
* add RISC-V OpenSBI boot test
* Ensure we always follow MISA parsing
* Fix up masking of vsip/vsie accesses
* Trap on writes to stimecmp from VS when hvictl.VTI=1
* Introduce helper_set_rounding_mode_chkfrm
* tag 'pull-riscv-to-apply-20230120' of https://github.com/alistair23/qemu: (37 commits)
hw/riscv/virt.c: move create_fw_cfg() back to virt_machine_init()
target/riscv: Remove helper_set_rod_rounding_mode
target/riscv: Introduce helper_set_rounding_mode_chkfrm
tcg/riscv: Use tcg_pcrel_diff in tcg_out_ldst
target/riscv: Trap on writes to stimecmp from VS when hvictl.VTI=1
target/riscv: Fix up masking of vsip/vsie accesses
hw/riscv: use ms->fdt in riscv_socket_fdt_write_distance_matrix()
hw/riscv: use MachineState::fdt in riscv_socket_fdt_write_id()
hw/riscv/virt.c: remove 'is_32_bit' param from create_fdt_socket_cpus()
hw/riscv/sifive_u.c: simplify create_fdt()
hw/riscv/virt.c: simplify create_fdt()
hw/riscv/spike.c: simplify create_fdt()
target/riscv: Use TARGET_FMT_lx for env->mhartid
target/riscv/cpu.c: do not skip misa logic in riscv_cpu_realize()
target/riscv/cpu: set cpu->cfg in register_cpu_props()
hw/riscv/boot.c: use MachineState in riscv_load_kernel()
hw/riscv/boot.c: use MachineState in riscv_load_initrd()
hw/riscv: write bootargs 'chosen' FDT after riscv_load_kernel()
hw/riscv: write initrd 'chosen' FDT inside riscv_load_initrd()
hw/riscv/spike.c: load initrd right after riscv_load_kernel()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* tag 'pull-include-2023-01-20' of https://repo.or.cz/qemu/armbru:
include/hw/ppc include/hw/pci-host: Drop extra typedefs
include/hw/ppc: Don't include hw/pci-host/pnv_phb.h from pnv.h
include/hw/ppc: Supply a few missing includes
include/hw/ppc: Split pnv_chip.h off pnv.h
include/hw/block: Include hw/block/block.h where needed
hw/sparc64/niagara: Use blk_name() instead of open-coding it
include/block: Untangle inclusion loops
coroutine: Use Coroutine typedef name instead of structure tag
coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h
coroutine: Clean up superfluous inclusion of qemu/lockable.h
coroutine: Move coroutine_fn to qemu/osdep.h, trim includes
coroutine: Clean up superfluous inclusion of qemu/coroutine.h
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>