www.infradead.org Git - users/dwmw2/qemu.git/log

hw/xen: Subsume xen_be_register_common() into xen_be_init()

Every caller of xen_be_init() checks and exits on error, then calls
xen_be_register_common(). Just make xen_be_init() abort for itself and
return void, and register the common devices too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Document Xen HVM emulation

Signed-off-by: David Woodhouse <dwmw2@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

kvm/i386: Add xen-evtchn-max-pirq property

The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI
devices. Allow it to be increased if the user desires.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support MSI mapping to PIRQ

The way that Xen handles MSI PIRQs is kind of awful.

There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.

Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.

So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.
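
As a rough sketch of the encoding described above, the following helper decodes a PIRQ# from an MSI address/data pair. The function name is hypothetical, and it assumes the upper address word carries the PIRQ# bits from bit 8 upwards in place; it is not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether an MSI message targets a Xen PIRQ
 * and, if so, extract the PIRQ number.
 */
static bool msi_targets_pirq(uint64_t addr, uint32_t data, uint32_t *pirq)
{
    assert(pirq != NULL);

    /* The vector in the low 8 bits of data must be zero. */
    if (data & 0xff) {
        return false;
    }

    /* Low 8 bits of the PIRQ# live in the destination ID field (bits 12-19). */
    uint32_t low = (uint32_t)((addr >> 12) & 0xff);

    /* Assumption: the upper address word holds PIRQ# bits 8 and up, in place. */
    uint32_t high = (uint32_t)(addr >> 32) & 0xffffff00u;

    *pirq = high | low;
    return true;
}
```

Under that assumption, the 0x1000fee46000 address mentioned above decodes to PIRQ# 0x1046.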

That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When an MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.

When the guest makes a hypercall to bind that PIRQ# (snooped from a
masked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.
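
The snoop-then-unmask behaviour can be modelled in a few lines. This is only a toy model: the table, the field names and both functions are made up for illustration, and the real code has to deal with actual PCI devices and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_PIRQS 256 /* illustrative table size */

/* Hypothetical per-PIRQ record of what was snooped from a masked vector. */
struct pirq_msi_model {
    bool snooped;   /* a masked vector was seen targeting this PIRQ */
    int dev_id;     /* made-up device handle */
    int vector;     /* MSI/MSI-X vector index on that device */
    bool unmasked;  /* set once the PIRQ has been bound */
};

static struct pirq_msi_model pirq_table[MODEL_MAX_PIRQS];

/* Called when the guest writes a PIRQ-style message to a *masked* vector. */
static void snoop_masked_msi_write(uint32_t pirq, int dev_id, int vector)
{
    assert(pirq < MODEL_MAX_PIRQS);
    pirq_table[pirq].snooped = true;
    pirq_table[pirq].dev_id = dev_id;
    pirq_table[pirq].vector = vector;
    pirq_table[pirq].unmasked = false;
}

/*
 * Called when the guest binds the PIRQ to an event channel port.
 * Returns true if a remembered vector was unmasked on the guest's behalf.
 */
static bool bind_pirq_unmasks_msi(uint32_t pirq)
{
    assert(pirq < MODEL_MAX_PIRQS);
    if (!pirq_table[pirq].snooped) {
        return false;
    }
    pirq_table[pirq].unmasked = true;
    return true;
}
```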

Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.

Tested with passthrough igb and emulated e1000e + AHCI.

           CPU0       CPU1
  0:         65          0   IO-APIC   2-edge      timer
  1:          0         14  xen-pirq   1-ioapic-edge  i8042
  4:          0        846  xen-pirq   4-ioapic-edge  ttyS0
  8:          1          0  xen-pirq   8-ioapic-edge  rtc0
  9:          0          0  xen-pirq   9-ioapic-level  acpi
 12:        257          0  xen-pirq  12-ioapic-edge  i8042
 24:       9600          0  xen-percpu    -virq      timer0
 25:       2758          0  xen-percpu    -ipi       resched0
 26:          0          0  xen-percpu    -ipi       callfunc0
 27:          0          0  xen-percpu    -virq      debug0
 28:       1526          0  xen-percpu    -ipi       callfuncsingle0
 29:          0          0  xen-percpu    -ipi       spinlock0
 30:          0       8608  xen-percpu    -virq      timer1
 31:          0        874  xen-percpu    -ipi       resched1
 32:          0          0  xen-percpu    -ipi       callfunc1
 33:          0          0  xen-percpu    -virq      debug1
 34:          0       1617  xen-percpu    -ipi       callfuncsingle1
 35:          0          0  xen-percpu    -ipi       spinlock1
 36:          8          0   xen-dyn    -event     xenbus
 37:          0       6046  xen-pirq    -msi       ahci[0000:00:03.0]
 38:          1          0  xen-pirq    -msi-x     ens4
 39:          0         73  xen-pirq    -msi-x     ens4-rx-0
 40:         14          0  xen-pirq    -msi-x     ens4-rx-1
 41:          0         32  xen-pirq    -msi-x     ens4-tx-0
 42:         47          0  xen-pirq    -msi-x     ens4-tx-1

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support GSI mapping to PIRQ

If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as
long as I tell it 'pci=nomsi'.

[root@localhost ~]# cat /proc/interrupts
           CPU0
  0:         52   IO-APIC   2-edge      timer
  1:         16  xen-pirq   1-ioapic-edge  i8042
  4:       1534  xen-pirq   4-ioapic-edge  ttyS0
  8:          1  xen-pirq   8-ioapic-edge  rtc0
  9:          0  xen-pirq   9-ioapic-level  acpi
 11:       5648  xen-pirq  11-ioapic-level  ahci[0000:00:04.0]
 12:        257  xen-pirq  12-ioapic-edge  i8042
...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement emulated PIRQ hypercall support

This wires up the basic infrastructure but the actual interrupts aren't
there yet, so don't advertise it to the guest.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement HYPERVISOR_physdev_op

Just hook up the basic hypercalls to stubs in xen_evtchn.c for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Automatically add xen-platform PCI device for emulated Xen guests

It isn't strictly mandatory but Linux guests at least will only map
their grant tables over the dummy BAR that it provides, and don't have
sufficient wit to map them in any other unused part of their guest
address space. So include it by default for minimal surprise factor.

As I come to document "how to run a Xen guest in QEMU", this means one
fewer thing to tell the user about, according to the mantra of "if it
needs documenting, fix it first, then document what remains".

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add basic ring handling to xenstore

Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.

A full implementation of a single-tenant internal XenStore copy-on-write
tree with transactions and watches is waiting in the wings to be sent in
a subsequent round of patches along with hooking up the actual PV disk
back end in qemu, but this is enough to get guests booting for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_xenstore device for xenstore emulation

Just the basic shell, with the event channel hookup. It only dumps the
buffer for now; a real ring implementation will come in a subsequent patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add backend implementation of interdomain event channel support

This provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.

The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.

For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle HVMOP_get_param

This is used to fetch the xenstore PFN and port to be used by the
guest. Both are preallocated by the toolstack, so the guest will
just read them and use them straight away.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Reserve Xen special pages for console, xenstore rings

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.
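
The arithmetic behind "four" can be checked with a small sketch. The macro and function names are made up for illustration, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define XEN_SPECIAL_AREA_BASE 0xfeff8000u /* start of Xen's special frames */
#define KVM_IDENTITY_MAP_BASE 0xfeffc000u /* where KVM puts the identity map */
#define MODEL_PAGE_SIZE       0x1000u     /* assumption: 4 KiB pages */

/* Number of whole pages that fit below the KVM identity map. */
static unsigned int special_pages_available(void)
{
    return (KVM_IDENTITY_MAP_BASE - XEN_SPECIAL_AREA_BASE) / MODEL_PAGE_SIZE;
}

/* Guest physical address of the idx'th special page. */
static uint32_t special_page_addr(unsigned int idx)
{
    assert(idx < special_pages_available());
    return XEN_SPECIAL_AREA_BASE + idx * MODEL_PAGE_SIZE;
}
```

The gap between the two addresses is 0x4000 bytes, so exactly four pages fit before the identity map begins.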

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle PV timer hypercalls

Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (only oneshot)
2) vcpu_op hypercall for the {set,stop}_{singleshot,periodic}_timer
sub-operations

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement GNTTABOP_query_size

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_version

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support mapping grant frames

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_gnttab device for grant table emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

kvm/i386: Add xen-gnttab-max-frames property

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.

There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.
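
A toy model of the level-triggered behaviour described above, ignoring the BQL and BH deferral entirely. All names are hypothetical; the struct merely stands in for vCPU0's vcpu_info flag and the state of the GSI line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct upcall_gsi_model {
    bool evtchn_upcall_pending; /* stands in for vCPU0's vcpu_info field */
    bool gsi_asserted;          /* current level of the callback GSI */
};

/* We delivered an event ourselves: set the flag and raise the line. */
static void qemu_delivers_event(struct upcall_gsi_model *m)
{
    assert(m != NULL);
    m->evtchn_upcall_pending = true;
    m->gsi_asserted = true;
}

/* The guest setting its own flag does *not* assert the GSI. */
static void guest_sets_pending(struct upcall_gsi_model *m)
{
    assert(m != NULL);
    m->evtchn_upcall_pending = true;
}

static void guest_clears_pending(struct upcall_gsi_model *m)
{
    assert(m != NULL);
    m->evtchn_upcall_pending = false;
}

/* On every vmexit, drop the line once the guest has cleared the flag. */
static void check_on_vmexit(struct upcall_gsi_model *m)
{
    assert(m != NULL);
    if (m->gsi_asserted && !m->evtchn_upcall_pending) {
        m->gsi_asserted = false;
    }
}
```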

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add monitor commands to test event injection

Specifically, add listing and injection of event channels.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_reset

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_vcpu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_interdomain

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_alloc_unbound

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_send

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_ipi

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_virq

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_unmask

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.
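
The chain of atomic test-and-set steps can be sketched with C11 atomics. The structures below are simplified stand-ins rather than the real guest ABI layout, and the function name is made up; the return value says whether the caller would still need to inject the upcall.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for the event channel words in the shinfo page. */
struct shinfo_model {
    atomic_ulong evtchn_pending[BITS_PER_ULONG];
    atomic_ulong evtchn_mask[BITS_PER_ULONG];
};

/* Simplified stand-in for the per-vCPU vcpu_info. */
struct vcpu_info_model {
    atomic_uchar evtchn_upcall_pending;
    atomic_ulong evtchn_pending_sel;
};

/* Returns true only if this call newly raised evtchn_upcall_pending. */
static bool set_port_pending(struct shinfo_model *s,
                             struct vcpu_info_model *v, unsigned int port)
{
    assert(port < BITS_PER_ULONG * BITS_PER_ULONG);

    unsigned int idx = port / BITS_PER_ULONG;
    unsigned long bit = 1UL << (port % BITS_PER_ULONG);
    unsigned long sel = 1UL << idx;

    /* Step 1: set the port's bit in the shinfo; stop if already pending. */
    if (atomic_fetch_or(&s->evtchn_pending[idx], bit) & bit) {
        return false;
    }
    /* A masked port stays pending but raises nothing further. */
    if (atomic_load(&s->evtchn_mask[idx]) & bit) {
        return false;
    }
    /* Step 2: set the word index in the vcpu_info selector. */
    if (atomic_fetch_or(&v->evtchn_pending_sel, sel) & sel) {
        return false;
    }
    /* Step 3: flag the upcall; only the first setter needs to inject. */
    return atomic_exchange(&v->evtchn_upcall_pending, 1) == 0;
}
```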

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_close

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be used in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_status

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Add support for Xen event channel delivery to vCPU

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_evtchn device for event channel emulation

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_param

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_evtchn_upcall_vector

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_event_channel_op

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_runstate_memory_area

Allow the guest to set up the vCPU runstate area, which is used as
the steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_time_info

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_info

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.
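
For the default placement, the location of a vCPU's vcpu_info inside the shared_info page can be sketched as below. The helper name is hypothetical, and it assumes a 64-byte vcpu_info with the array sitting at the very start of shared_info, as on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LEGACY_MAX_VCPUS 32 /* "the first 32 vCPUs" */
#define MODEL_VCPU_INFO_SIZE   64 /* assumption: x86 sizeof(struct vcpu_info) */

/*
 * Compute the GPA of the default vcpu_info for vcpu_id, given the GPA at
 * which the guest mapped its shared_info page. Returns false for vCPUs
 * that have no default slot.
 */
static bool default_vcpu_info_gpa(uint64_t shinfo_gpa, unsigned int vcpu_id,
                                  uint64_t *gpa)
{
    assert(gpa != NULL);
    if (vcpu_id >= MODEL_LEGACY_MAX_VCPUS) {
        return false;
    }
    /* Assumption: the vcpu_info array starts at offset 0 of shared_info. */
    *gpa = shinfo_gpa + (uint64_t)vcpu_id * MODEL_VCPU_INFO_SIZE;
    return true;
}
```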

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_vcpu_op

This is simply for when the guest tries to register a vcpu_info.
Since vcpu_info placement is optional in the minimum ABI, we can
just fail with -ENOSYS.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_hvm_op

This is for when the guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement XENMEM_add_to_physmap_batch

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_memory_op

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: manage and save/restore Xen guest long_mode setting

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a hypercall.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

The xen_overlay device (and later similar devices for event channels and
grant tables) needs to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is the only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Permit --xen-domid argument when accel is KVM

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_overlay device for emulating shared xenheap pages

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.
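
The "only one mapping at a time" rule amounts to something like the following toy model. The names are hypothetical and no real memory regions are involved; it only tracks where the overlay currently lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GPA_UNMAPPED       UINT64_MAX
#define OVERLAY_PAGE_SIZE  0x1000u /* assumption: 4 KiB pages */

/* Tracks the single GPA at which an overlay page is currently mapped. */
struct overlay_model {
    uint64_t gpa;
};

/*
 * Map the overlay at a new GPA, implicitly dropping any previous mapping.
 * Returns the GPA that was unmapped, or GPA_UNMAPPED if there was none.
 */
static uint64_t overlay_map_page(struct overlay_model *o, uint64_t gpa)
{
    assert(o != NULL);
    assert(gpa == GPA_UNMAPPED || (gpa % OVERLAY_PAGE_SIZE) == 0);

    uint64_t previous = o->gpa;
    o->gpa = gpa;
    return previous;
}
```

Because only one GPA is ever recorded, migration only needs to save and restore that single value.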

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.
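
In outline, the handling is no more than the sketch below. The function name is hypothetical, and the command numbers are assumed to match Xen's public sched.h rather than copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Assumed to match the values in Xen's public sched.h. */
#define MODEL_SCHEDOP_yield 0
#define MODEL_SCHEDOP_poll  3

/* Returns 0 on success or a negative errno for unhandled commands. */
static int handle_sched_op(int cmd)
{
    switch (cmd) {
    case MODEL_SCHEDOP_yield:
    case MODEL_SCHEDOP_poll: {
        /* Both simply give up the host CPU for a moment. */
        int rc = sched_yield();
        assert(rc == 0);
        (void)rc;
        return 0;
    }
    default:
        return -ENOSYS;
    }
}
```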

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

It allows the guest to shut itself down via hypercall with any of three reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing the SCHEDOP_shutdown sub-op lets us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SHUTDOWN_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_xen_version

This is just meant to serve as an example of how we can implement
hypercalls. xen_version was chosen specifically because QEMU does all kinds
of feature control, so handling it here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle guest hypercalls

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: allow its creation with XEN_EMULATE mode

The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W vs. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.

Now that it builds without actual Xen, move it to CONFIG_XEN_BUS to include it
in the KVM-only builds.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: exclude vfio-pci from the PCI platform unplug

This allows PCI passthrough devices to work for Xen emulated guests.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Set Xen vCPU ID in KVM

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: handle Xen HVM cpuid leaves

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Add XEN_DISABLED mode and make it default

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fun).

It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).

The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

include: import Xen public headers to hw/xen/interface

There's already a partial set here; update them and pull in a more
complete set.

To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h, move to hw/xen/interface/]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Lock the graph, part 2 (BlockDriver callbacks)
- virtio-scsi: fix SCSIDevice hot unplug with IOThread
- rbd: Add support for layered encryption

# gpg: Signature made Thu 23 Feb 2023 18:49:41 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (29 commits)
  block/rbd: Add support for layered encryption
  block/rbd: Add luks-any encryption opening option
  block/rbd: Remove redundant stack variable passphrase_len
  virtio-scsi: reset SCSI devices from main loop thread
  dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race
  scsi: protect req->aiocb with AioContext lock
  block: Mark bdrv_co_refresh_total_sectors() and callers GRAPH_RDLOCK
  block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK
  block: Mark bdrv_co_delete_file() and callers GRAPH_RDLOCK
  block: Mark bdrv_(un)register_buf() GRAPH_RDLOCK
  block: Mark bdrv_co_eject/lock_medium() and callers GRAPH_RDLOCK
  block: Mark bdrv_co_is_inserted() and callers GRAPH_RDLOCK
  block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK
  block: Mark bdrv_co_create() and callers GRAPH_RDLOCK
  block: Mark preadv_snapshot/snapshot_block_status GRAPH_RDLOCK
  block: Mark bdrv_co_copy_range() GRAPH_RDLOCK
  block: Mark bdrv_co_do_pwrite_zeroes() GRAPH_RDLOCK
  block: Mark bdrv_co_pwrite_sync() and callers GRAPH_RDLOCK
  block: Mark public read/write functions GRAPH_RDLOCK
  block: Mark read/write in block/io.c GRAPH_RDLOCK
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-error-2023-02-23' of https://repo.or.cz/qemu/armbru into staging

Error reporting patches for 2023-02-23

# gpg: Signature made Thu 23 Feb 2023 13:13:44 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-error-2023-02-23' of https://repo.or.cz/qemu/armbru:
  rocker: Tweak stubbed out monitor commands' error messages
  migration/colo: Improve an x-colo-lost-heartbeat error message
  hw/core: Improve the query-hotpluggable-cpus error message
  replay: Simplify setting replay blockers
  qga: Drop dangling reference to QERR_QGA_LOGGING_DISABLED
  hw/acpi: Move QMP command to hw/core/
  hw/acpi: Dumb down acpi_table_add() stub
  hw/smbios: Dumb down smbios_entry_add() stub
  hw/core: Improve error message when machine doesn't provide NMIs
  dump: Assert cpu_get_note_size() can't fail
  dump: Improve error message when target doesn't support memory dump
  error: Drop superfluous #include "qapi/qmp/qerror.h"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-qapi-2023-02-23' of https://repo.or.cz/qemu/armbru into staging

QAPI patches for 2023-02-23

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmP3VikSHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTf98P/1tf2tPnWL0QpGXqGOq/iy2cFhcoco06
# 30t4JzTZGMZv8aUBRzlnhNgp+C7uMnXxuO7DeVN/K8VCvRfGXYz1HYFJ0NWhhMz6
# RULvVncJ7m9vFykmu3iibxvjyH6uj0R5xJ8ZNIrySLTCu+58voDF/IbZ0ep3v5nX
# 1AV1ljL9taxg2SrQ53Whbet9zfgXVFnV5wLKkLOqLGvviO2OBPG7rrtQaEX2jrsa
# SdTiOdBk1IMvG3FT6cVx3bM3kQd15UwfcJsdIYpB7QBZNoqgiyfMPsNr8HzpZlJn
# KOe3qWVFWHGMWY4MtQ1j9Ph44RPrJybvPQRMDNB3CiDYEtBWsth0fZxhw9T/tKca
# 5KgJaxecB3UsXFUBWhmvhkw+hwG+cDWHtiYZSb9AX4cqvPid1UdLnSQFWgHFGX+2
# ok0Q7gy9jYEpteVbIM8kQG0TF7xnZlv99uDK8b4MAH33roXwy70vffxpRGnngNyH
# IcLvzmDqRlrlzdvUi8Uro22VmUAUqSQKxKYt9yBJcEUV9NLi8E6g+Hcrvt7YNF9V
# jcVub4aIawEZCvnPCpOgzHD9p26ofwb2WQ245/5kzMUVi2pBYsHH6hJj7WdMPixS
# r24Ykgo4sxujW4pVy45lXzpA8uKWELCp9iKUOO6hvdoJEybVDMj9zcVn70cJgDrE
# RUle5av0n8XR
# =XV0D
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 23 Feb 2023 12:03:53 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-qapi-2023-02-23' of https://repo.or.cz/qemu/armbru:
  qapi: remove JSON value FIXME
  qapi: remove _JSONObject
  qapi/parser: add QAPIExpression type
  qapi: Add minor typing workaround for 3.6
  qapi: update pylint configuration
  qapi: Update flake8 config
  docs/devel/qapi-code-gen: Fix a missing 'may', clarify SchemaInfo
  docs/devel/qapi-code-gen: Belatedly update features documentation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging

Python

Only minor testing updates.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAmP27UcACgkQfe+BBqr8
# OQ5xSBAAlh+PSySlKnqxJm9dVi+K+8o6KOf9Oq0TAO+1lSEdAN0yMTW81TNAk50S
# HWTk2CclPVCrNxmvR1zr7ibp1BJvWLtfqcBSh+e3XQ1pPpciS2L+ny2OtuNYH5G5
# qSfLxoOXqV57gHdwkWmtC1b3AsnpcgdH84r4gUIaVPWx4fvm/JBVa0R40OjWaEZ6
# gTteIqoXN/tusBk6+ssELcNAA6jlHcVbhzS31Xi1/GDAWiu4wehqQ30zbFwvpyHn
# QN0NKeh1L6cGtjfN2PHv6tji5Z479yKYQU861BCn8SEJ052f4qLb/GBT01Fx3h+7
# 6bonnNXQrnyBNXWotYadTZMreUdDokuPF7FV4dNqd9E+552aF7WhodueO0lyyaTv
# bPHFavgyfNhfPblYqLpAWiPt+BlkZNazeWTAyRaQCqA1zHOr44K0ff1vVBGGvA2/
# xd0zGJ8xGiagz4ifIpyb3Fk9fampZkMAlJjKDfhhQzDdm/mrtdOt2uZBT2IhYX7z
# E+2+WfRE98kgAy17pzVB5GPRm+yFzWiu7H7zpGu4nQzswLWrKPrdwq8XYOZ16fL8
# NAKbn6h6CS0sOYiArr3tzQSnzBlaKCmOalsNjNCeFbuH4vTmKGamohpAW/OoBxhN
# 1X3aCdXqW0ewBrLWVHfluM0mhbq6i9ycYGi24pTikFPBqJCQP2o=
# =FJCZ
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 23 Feb 2023 04:36:23 GMT
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* tag 'python-pull-request' of https://gitlab.com/jsnow/qemu:
  python: drop pipenv
  python: support pylint 2.16

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* x86 bugfixes
* OpenBSD support for naming threads
* Refined Python support policy

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP0wtkUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNI6QgAjMvEV0N5FZYMpiuQdjebBeV+uHM6
# LThewCQa0cW5jb1X1NFBbOxYlNfE3WQeZqQF+BiVJr5wT2UsyNsPH7wTjsP387vV
# juoD7D/XZo8P4Qi+vJWo8XVBrzWEK8QS1P+NxWr+ZnsAhDx2+MR87fVmHtVBW1pI
# oDO0iyRrvVtaTAIVyNWSgZ59SLMmcH/6L4aYv5nrKYuAWx7fTneGGheKuqk55RsV
# sMv+fHolmmwKVm8tMFksw0atPwL7ZmSm1uObNHCQKdDNSoakC7YpaXa3y8LEzU7I
# B4h/PsmRpN33ggvsiuzFp9kfEHMy4QazfpoVFFqTLalhTr+XuiNTxj8xdA==
# =6eNN
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 21 Feb 2023 13:10:49 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  target/i386/gdbstub: Fix a bug about order of FPU stack in 'g' packets.
  docs: build-platforms: refine requirements on Python build dependencies
  thread-posix: add support for setting threads name on OpenBSD
  target/i386: Fix 32-bit AD[CO]X insns in 64-bit mode

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-tcg-20230221' of https://gitlab.com/rth7680/qemu into staging

tcg: Allow first half of insn in ram, and second half in mmio
linux-user/sparc: SIGILL for unknown trap vectors
linux-user/microblaze: SIGILL for privileged insns
linux-user: Fix deadlock while exiting due to signal
target/microblaze: Add gdbstub xml
util: Adjust cacheflush for windows-arm64
include/sysemu/os-win32: Adjust setjmp/longjmp for windows-arm64

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmP1dpkdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+70gf+OOM3KmsFpsJ4+68W
# v/ulVwye3RFQXv4KRtuRPeKCKMk7vXmBRj9gsyOpc23TaoYiMNbFbztpAkcc/Z/1
# +6H8QeZGLWDqiX6ashwGNm/2bqPbvY7znaCvNuLkNGCPBeJ12C19uN1BBiGdeqOe
# IXIIk1r0U6rfIDhP2PJALXOxgHd/8/onYbhU6kU5tQjM24pycW44UUGPSeV++I0e
# xWezAYOmZ4PK58bXHDPMZ0UkzuefaNmiLlfwj/4nlaWQetwQTy7BeEU6FpKolUN2
# wrvfCqth/c3SdUaZHu4DoX1yWt72L37SpO0ijvk8E+AqsvXTn9gFdWK2dsEiPEeS
# Z9abFw==
# =dxZo
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 Feb 2023 01:57:45 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-tcg-20230221' of https://gitlab.com/rth7680/qemu:
  sysemu/os-win32: fix setjmp/longjmp on windows-arm64
  util/cacheflush: fix cache on windows-arm64
  target/microblaze: Add gdbstub xml
  linux-user/microblaze: Handle privileged exception
  cpus: Make {start,end}_exclusive() recursive
  linux-user: Always exit from exclusive state in fork_end()
  linux-user/sparc: Raise SIGILL for all unhandled software traps
  accel/tcg: Allow the second page of an instruction to be MMIO

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

block/rbd: Add support for layered encryption

Starting from Ceph Reef, RBD has built-in support for layered encryption,
where each ancestor image (in a cloned image setting) can be encrypted
using its own passphrase.

A new function, rbd_encryption_load2, was added to the librbd API.
This new function supports an array of passphrases (via "spec" structs).

This commit extends the qemu rbd driver API to use this new librbd API,
in order to support this new layered encryption feature.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20230129113120.722708-4-oro@oro.sl.cloud9.ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/rbd: Add luks-any encryption opening option

Until now, the Ceph RBD encryption API required specifying the encryption
format when loading encryption. The supported formats were LUKS (v1) and LUKS2.

Starting from the Reef release, RBD also supports loading with the "luks-any"
format, which works for both versions of LUKS.

This commit extends the qemu rbd driver API to enable qemu users to use
this luks-any wildcard format.
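
As a rough illustration of why a single wildcard format can cover both
versions: LUKS1 and LUKS2 primary headers share the same six-byte magic,
followed by a big-endian 16-bit version field. The sketch below is not the
librbd or QEMU implementation; luks_detect_version() is a hypothetical helper
invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic shared by LUKS1 and LUKS2 primary headers: "LUKS" 0xBA 0xBE. */
static const uint8_t luks_magic[6] = { 'L', 'U', 'K', 'S', 0xBA, 0xBE };

/*
 * Hypothetical helper: return 1 or 2 for a LUKS1/LUKS2 header,
 * or 0 if the buffer does not look like a LUKS header at all.
 */
static int luks_detect_version(const uint8_t *hdr, size_t len)
{
    uint16_t version;

    if (len < 8 || memcmp(hdr, luks_magic, sizeof(luks_magic)) != 0) {
        return 0;
    }
    /* The version field is stored big-endian at offset 6. */
    version = (uint16_t)((hdr[6] << 8) | hdr[7]);
    return (version == 1 || version == 2) ? version : 0;
}
```

A caller that accepts "luks-any" could use such a check to decide which
concrete format handler to apply, instead of asking the user to name it.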

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20230129113120.722708-3-oro@oro.sl.cloud9.ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/rbd: Remove redundant stack variable passphrase_len

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20230129113120.722708-2-oro@oro.sl.cloud9.ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: reset SCSI devices from main loop thread

When an IOThread is configured, the ctrl virtqueue is processed in the
IOThread. TMFs that reset SCSI devices are currently called directly
from the IOThread and trigger an assertion failure in blk_drain() from
the following call stack:

virtio_scsi_handle_ctrl_req -> virtio_scsi_do_tmf -> device_cold_reset
-> scsi_disk_reset -> scsi_device_purge_requests -> blk_drain

../block/block-backend.c:1780: void blk_drain(BlockBackend *): Assertion `qemu_in_main_thread()' failed.

The blk_drain() function is not designed to be called from an IOThread
because it needs the Big QEMU Lock (BQL).

This patch defers TMFs that reset SCSI devices to a Bottom Half (BH)
that runs in the main loop thread under the BQL. This way it's safe to
call blk_drain() and the assertion failure is avoided.

Introduce s->tmf_bh_list for tracking TMF requests that have been
deferred to the BH. When the BH runs it will grab the entire list and
process all requests. Care must be taken to clear the list when the
virtio-scsi device is reset or unrealized. Otherwise deferred TMF
requests could execute later and lead to use-after-free or other
undefined behavior.
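
The "grab the entire list, then process it" pattern can be sketched in plain
C as follows. This is a simplified model, not the QEMU code: TMFReq, Dev,
tmf_defer(), tmf_bh() and tmf_cancel_all() are hypothetical names, and the
locking that protects the list in a real multi-threaded setting is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a deferred TMF request. */
typedef struct TMFReq {
    int id;
    struct TMFReq *next;
} TMFReq;

/* Hypothetical device state; tmf_bh_list mirrors the field named above. */
typedef struct {
    TMFReq *tmf_bh_list;
    int processed;
} Dev;

/* Queue a request for the bottom half (called from the I/O thread). */
static void tmf_defer(Dev *s, TMFReq *req)
{
    req->next = s->tmf_bh_list;
    s->tmf_bh_list = req;
}

/* Bottom half: detach the whole list first, then process each request. */
static void tmf_bh(Dev *s)
{
    TMFReq *list = s->tmf_bh_list;

    s->tmf_bh_list = NULL;
    while (list) {
        TMFReq *next = list->next;
        s->processed++;      /* the real code would reset a SCSI device */
        list = next;
    }
}

/* On device reset or unrealize: drop anything still pending. */
static void tmf_cancel_all(Dev *s)
{
    s->tmf_bh_list = NULL;
}
```

Clearing the list in tmf_cancel_all() is what keeps a later tmf_bh() run from
touching requests that belong to a device that no longer exists.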

The s->resetting counter that's used by TMFs that reset SCSI devices is
accessed from multiple threads. This patch makes that explicit by using
atomic accessor functions. With this patch applied the counter is only
modified by the main loop thread under the BQL but can be read by any
thread.
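
A minimal sketch of a "one writer, many readers" counter, using C11
<stdatomic.h> as a stand-in for QEMU's own atomic accessor helpers.
ResetState and the three functions are hypothetical names for illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the s->resetting counter. */
typedef struct {
    atomic_int resetting;
} ResetState;

/* Writers: only the main loop thread (under the BQL in the real code). */
static void reset_begin(ResetState *s)
{
    atomic_fetch_add(&s->resetting, 1);
}

static void reset_end(ResetState *s)
{
    atomic_fetch_sub(&s->resetting, 1);
}

/* Readers: any thread may check whether a reset is in progress. */
static int reset_in_progress(ResetState *s)
{
    return atomic_load(&s->resetting) > 0;
}
```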

Reported-by: Qing Wang <qinwang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race

dma_blk_cb() only takes the AioContext lock around ->io_func(). That
means the rest of dma_blk_cb() is not protected. In particular, the
DMAAIOCB field accesses happen outside the lock.

There is a race when the main loop thread holds the AioContext lock and
invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
field determines how cancellation proceeds. If dma_aio_cancel() sees
dbs->acb == NULL while dma_blk_cb() is still running, the request can be
completed twice (-ECANCELED and the actual return value).

The following assertion can occur with virtio-scsi when an IOThread is
used:

../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.

Fix the race by holding the AioContext lock across dma_blk_cb(). Now
dma_aio_cancel() under the AioContext lock will not see
inconsistent/intermediate states.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi: protect req->aiocb with AioContext lock

If requests are being processed in the IOThread when a SCSIDevice is
unplugged, scsi_device_purge_requests() -> scsi_req_cancel_async() races
with I/O completion callbacks. Both threads load and store req->aiocb.
This can lead to assert(r->req.aiocb == NULL) failures and undefined
behavior.

Protect r->req.aiocb with the AioContext lock to prevent the race.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_refresh_total_sectors() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_refresh_total_sectors() need to hold a reader lock for the
graph.
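
For readers unfamiliar with this kind of annotation, the sketch below shows
how a GRAPH_RDLOCK-style macro can be built on Clang's thread safety analysis
attributes. It is a simplified model under stated assumptions, not QEMU's
actual headers: GraphLock, graph_rdlock(), graph_rdunlock() and the two
example functions are hypothetical, and the attributes expand to nothing on
compilers other than Clang.

```c
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x) /* analysis is Clang-only; expand to nothing elsewhere */
#endif

/* Hypothetical stand-in for the block graph lock. */
typedef struct TSA(capability("mutex")) GraphLock {
    int readers;
} GraphLock;

static GraphLock graph_lock;

/* Callers of a GRAPH_RDLOCK function must hold graph_lock for reading. */
#define GRAPH_RDLOCK TSA(requires_shared_capability(graph_lock))

static void TSA(acquire_shared_capability(graph_lock))
            TSA(no_thread_safety_analysis)
graph_rdlock(void)
{
    graph_lock.readers++;
}

static void TSA(release_shared_capability(graph_lock))
            TSA(no_thread_safety_analysis)
graph_rdunlock(void)
{
    graph_lock.readers--;
}

/* Annotated function: Clang can flag calls made without the lock. */
static int GRAPH_RDLOCK refresh_total_sectors(int hint)
{
    return hint * 2;
}

/* An annotated caller satisfies the requirement of its callee. */
static int GRAPH_RDLOCK caller_with_annotation(void)
{
    return refresh_total_sectors(4);
}
```

When built with Clang and -Wthread-safety, a call to refresh_total_sectors()
from a function that neither holds the lock nor carries the annotation is
reported at compile time, which is why the annotation has to spread to callers.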

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-24-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_*_dirty_bitmap() need to hold a reader lock for the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-23-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_delete_file() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_delete_file() need to hold a reader lock for the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-22-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_(un)register_buf() GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_register_buf() and bdrv_unregister_buf() need to hold a reader lock
for the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-21-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_eject/lock_medium() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_eject() and bdrv_co_lock_medium() need to hold a reader lock for
the graph.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-20-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_is_inserted() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_is_inserted() need to hold a reader lock for the graph.

blk_is_inserted() is done as a co_wrapper_mixed_bdrv_rdlock (unlike most
other blk_* functions) because it is called a lot from other blk_co_*()
functions that already hold the lock. These calls go through
blk_is_available(), which becomes a co_wrapper_mixed_bdrv_rdlock, too,
for the same reason.

Functions that run in a coroutine and can call bdrv_co_is_available()
directly are changed to do so, which results in better TSA coverage.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-19-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_io_plug() and bdrv_co_io_unplug() need to hold a reader lock for
the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-18-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_create() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_create() need to hold a reader lock for the graph.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-17-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark preadv_snapshot/snapshot_block_status GRAPH_RDLOCK

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-16-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_copy_range() GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_copy_range() need to hold a reader lock for the graph.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-15-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_do_pwrite_zeroes() GRAPH_RDLOCK

All callers are already GRAPH_RDLOCK, so just add the annotation and
remove assume_graph_lock().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-14-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_pwrite_sync() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_pwrite_sync() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.
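
The role of assume_graph_lock() can be sketched as follows. This is an
illustrative model built on Clang's assert_shared_capability attribute, not a
copy of QEMU's headers; GraphLock, pwrite_sync() and legacy_write_path() are
hypothetical names, and the attributes expand to nothing outside Clang.

```c
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

/* Hypothetical stand-in for the block graph lock. */
typedef struct TSA(capability("mutex")) GraphLock {
    int readers;
} GraphLock;

GraphLock graph_lock;

#define GRAPH_RDLOCK TSA(requires_shared_capability(graph_lock))

/* Tells the analysis "the lock is held here" without acquiring it. */
static void TSA(assert_shared_capability(graph_lock)) assume_graph_lock(void)
{
}

/* Already annotated callee. */
static int GRAPH_RDLOCK pwrite_sync(int bytes)
{
    return bytes;
}

/* Not yet annotated: the caller chain is known to hold the lock. */
static int legacy_write_path(int bytes)
{
    /* FIXME: remove once all callers are GRAPH_RDLOCK */
    assume_graph_lock();
    return pwrite_sync(bytes);
}
```

The assumption is unchecked, so each such call is a temporary escape hatch
that should disappear as annotations reach the remaining callers.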

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-13-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark public read/write functions GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_pread*/pwrite*() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-12-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark read/write in block/io.c GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_driver_*() need to hold a reader lock for the graph. It doesn't add
the annotation to public functions yet.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-11-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_pwrite_zeroes() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_pwrite_zeroes() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-10-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_pdiscard() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_pdiscard() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-9-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_flush() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_flush() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-8-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/qed: add missing graph rdlock in qed_need_check_timer_entry

This function is called in two different places:
- the timer callback, which does not take the graph rdlock.
- bdrv_qed_drain_begin(), which is the .bdrv_drain_begin() callback,
  documented as a function that does not take the lock.

Since it calls recursive functions that traverse the
graph, we need to protect them with the graph rdlock.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-7-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_ioctl() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_ioctl() need to hold a reader lock for the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-6-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_block_status() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_block_status() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-5-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Mark bdrv_co_truncate() and callers GRAPH_RDLOCK

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_truncate() need to hold a reader lock for the graph.

For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These calls will be removed once everything is
properly annotated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-4-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

mirror: Fix access of uninitialised fields during start

bdrv_mirror_top_pwritev() accesses the job object when active mirroring
is enabled. It disables this code during early initialisation while
s->job isn't set yet.

However, s->job is still set way too early when the job object isn't
fully initialised. For example, &s->ops_in_flight isn't initialised yet
and the in_flight bitmap doesn't exist yet. This causes crashes when a
write request comes in too early.

Move the assignment of s->job to when the mirror job is actually fully
initialised to make sure that the mirror_top driver doesn't access it
too early.
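
The "publish the pointer last" idea can be sketched like this. The structures
are hypothetical, heavily simplified stand-ins for the mirror job and the
mirror_top driver state, and the function names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mirror job object. */
typedef struct MirrorJob {
    bool bitmap_ready;    /* set once the in-flight bitmap exists */
    int active_writes;
} MirrorJob;

/* Hypothetical stand-in for the filter driver state. */
typedef struct MirrorTopState {
    MirrorJob *job;       /* NULL until the job is fully initialised */
} MirrorTopState;

/* Write path of the filter: only touch the job once it is published. */
static bool mirror_top_write(MirrorTopState *s)
{
    if (!s->job) {
        return false;     /* early write: plain passthrough */
    }
    s->job->active_writes++;
    return true;
}

static void mirror_start(MirrorTopState *s, MirrorJob *job)
{
    job->active_writes = 0;
    job->bitmap_ready = true;   /* ... all other initialisation ... */
    s->job = job;               /* publish last */
}
```

Because the write path keys off s->job alone, assigning it only after every
other field is set up means an early write takes the passthrough branch
instead of seeing a half-built job.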

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-3-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Make bdrv_can_set_read_only() static

It is never called outside of block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-2-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

rocker: Tweak stubbed out monitor commands' error messages

The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.

The stubbed out Rocker monitor commands are the last remaining users
of QERR_FEATURE_DISABLED.  They fail like this:

    (qemu) info rocker mumble
    Error: The feature 'rocker' is not enabled

The real rocker commands fail like this when the named object doesn't
exist:

    Error: rocker mumble not found

If that's good enough when Rocker is enabled, then it's good enough
when it's disabled, so replace QERR_FEATURE_DISABLED with that, and
drop the macro.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-13-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>

migration/colo: Improve an x-colo-lost-heartbeat error message

The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.

Get rid of a use of QERR_FEATURE_DISABLED, and improve the somewhat
imprecise error message

    (qemu) x_colo_lost_heartbeat
    Error: The feature 'colo' is not enabled

to

    Error: VM is not in COLO mode

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-12-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>

hw/core: Improve the query-hotpluggable-cpus error message

The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.

Get rid of a use of QERR_FEATURE_DISABLED, and improve the slightly
awkward error message

    (qemu) info hotpluggable-cpus
    Error: The feature 'query-hotpluggable-cpus' is not enabled

to

    Error: machine does not support hot-plugging CPUs

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-11-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

replay: Simplify setting replay blockers

replay_add_blocker() takes an Error *. All callers pass one created
like this:

error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "some feature");

Folding this into replay_add_blocker() simplifies the callers, losing
a bit of generality we haven't needed in more than six years.

Since there are no other uses of macro QERR_REPLAY_NOT_SUPPORTED,
replace the remaining one by its expansion, and drop the macro.
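
The shape of this refactoring, sketched outside QEMU: instead of every caller
formatting the same message and passing it in, the callee takes only the
feature name and formats the message itself. The function names, the static
buffer and the exact message wording below are assumptions made for
illustration, not the QEMU API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical storage for the current blocker message. */
static char replay_blocker_msg[128];

/* Before: every caller formatted the message and passed it in. */
static void add_blocker_old(const char *preformatted)
{
    snprintf(replay_blocker_msg, sizeof(replay_blocker_msg),
             "%s", preformatted);
}

/* After: callers pass only the feature name; formatting happens here. */
static void add_blocker_new(const char *feature)
{
    snprintf(replay_blocker_msg, sizeof(replay_blocker_msg),
             "Record/replay feature is not supported for '%s'", feature);
}
```

With the second form, a call site shrinks to add_blocker_new("-smp") and the
message text lives in exactly one place.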

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

qga: Drop dangling reference to QERR_QGA_LOGGING_DISABLED

slog()'s function comment advises to use QERR_QGA_LOGGING_DISABLED.
This macro never existed. The reference got added in commit
e3d4d25206a "guest agent: add guest agent RPCs/commands" along with
QERR_QGA_LOGGING_FAILED, so maybe that one was meant. However,
QERR_QGA_LOGGING_FAILED was never actually used, and was removed in
commit d73f0beadb5 "qerror.h: Remove unused error classes".

Drop the dangling reference.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-9-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>