www.infradead.org Git - users/dwmw2/qemu.git/log

i386/xen: add monitor commands to test event injection

Specifically add listing, injection of event channels.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_reset

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_vcpu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_interdomain

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_alloc_unbound

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_send

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_ipi

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_virq

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_unmask

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_close

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_status

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Add support for Xen event channel delivery to vCPU

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_evtchn device for event channel emulation

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_param

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_evtchn_upcall_vector

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_event_channel_op

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_runstate_memory_area

Allow guest to setup the vcpu runstates which is used as
steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_time_info

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_info

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_vcpu_op

This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_hvm_op

This is when guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement XENMEM_add_to_physmap_batch

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_memory_op

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: manage and save/restore Xen guest long_mode setting

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Permit --xen-domid argument when accel is KVM

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Wooodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_overlay device for emulating shared xenheap pages

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

It allows to shutdown itself via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_xen_version

This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle guest hypercalls

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

xen-platform: allow its creation with XEN_EMULATE mode

The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W v.s. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.

Now it builds without actual Xen, move it to CONFIG_XEN_BUS to include it
in the KVM-only builds.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: exclude vfio-pci from the PCI platform unplug

Such that PCI passthrough devices work for Xen emulated guests.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/hvm: Set Xen vCPU ID in KVM

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: handle Xen HVM cpuid leaves

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Add XEN_DISABLED mode and make it default

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fui).

It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).

The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

include: import Xen public headers to hw/xen/interface

There's already a partial set here; update them and pull in a more
complete set.

To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h, move to hw/xen/interface/]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'xenpvh2' of https://gitlab.com/sstabellini/qemu into staging

tag xenpvh2

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE0E4zq6UfZ7oH0wrqiU+PSHDhrpAFAmPsG84ACgkQiU+PSHDh
# rpAN9g/8DzXnYz7dpoS7ooh8X4RW6LRCxD6I7sFwFyu6ViMBsWFEHe+/q46haZ+n
# H6OqwpvSZ2hKTc4FN3ZQ5klC9Ho3DU6PI01atIQM9FnMVCkn/+y6XFj+A6e/hlEG
# PEAB+TwZkXhMtKMBQsWBx+4Uy5akpG/yVCz88ekodKebP/0Mj0YNYRWizQaXVjKG
# lntP/+6fdQ+jxISFHJ7lMZs4WgjfQ+17t9dY66MTG1N1dFhw6Eo21CmRdpeDvH1h
# y1Hv6eFkMwQ/uCbdWLWuRsHM6QmLycB0uucqMGB5pVnoIzPV/3ZaW6Lfb4LG2aPj
# ef8HCgwpKSnsiuOFwDnPdrj/28yPY+ldEydbhn30Aub3qyQG1sYpWTnJ9DhERbeP
# zR9KOWG/1XWZalQIpYzYYburJOX1CfcpnBdhceeqDHrydSs89WXTlHv7hV8XSCyA
# ++BOi5T+kgYP66uTUR8iDgHsN+nLmbcY9FB41yMhu8gHU6YHINJ0a8gyNDzaxKJa
# 7ymyYlofK5O8kRC+0ww6mx7iY4yqjRBOzOTzTtkLk2+yU5DjIKp7W6750xcz40Hd
# FpgzDIfVOXYE4Zu9JL6fyCQvI8y3tZEn/7I1UoyJBAUcCEv8TdWwQe6DdYzKXJNQ
# klDj9pkH6xnbGI+uk3xPk3EgeDG/NbYepaWN5bLtRem6TNho8tw=
# =w7C0
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 14 Feb 2023 23:39:58 GMT
# gpg:                using RSA key D04E33ABA51F67BA07D30AEA894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <sstabellini@kernel.org>" [unknown]
# gpg:                 aka "Stefano Stabellini <stefano.stabellini@eu.citrix.com>" [full]
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* tag 'xenpvh2' of https://gitlab.com/sstabellini/qemu:
  meson.build: enable xenpv machine build for ARM
  hw/arm: introduce xenpvh machine
  meson.build: do not set have_xen_pci_passthrough for aarch64 targets
  hw/xen/xen-hvm-common: Use g_new and error_report
  hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure
  include/hw/xen/xen_common: return error from xen_create_ioreq_server
  xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common
  hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState
  hw/i386/xen: rearrange xen_hvm_init_pc
  hw/i386/xen/: move xen-mapcache.c to hw/xen/

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

meson.build: enable xenpv machine build for ARM

Add CONFIG_XEN for aarch64 device to support build for ARM targets.

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

hw/arm: introduce xenpvh machine

Add a new machine xenpvh which creates a IOREQ server to register/connect with
Xen Hypervisor.

Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on host machine via chardev socket
and support TPM functionalities for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
    -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
    -tpmdev emulator,id=tpm0,chardev=chrtpm \
    -machine tpm-base-addr=0x0c000000 \

swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interface.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
    mkdir /tmp/vtpm2
    swtpm socket --tpmstate dir=/tmp/vtpm2 \
    --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

meson.build: do not set have_xen_pci_passthrough for aarch64 targets

have_xen_pci_passthrough is only used for Xen x86 VMs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

hw/xen/xen-hvm-common: Use g_new and error_report

Replace g_malloc with g_new and perror with error_report.

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails continue
to the PV backends initialization.

Also, moved the IOREQ registration and mapping subroutine to new function
xen_do_ioreq_register().

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>

include/hw/xen/xen_common: return error from xen_create_ioreq_server

This is done to prepare for enabling xenpv support for ARM architecture.
On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails,
continue to the PV backends initialization.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

This patch does following:
1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
    preparation for moving most of xen-hvm code to an arch-neutral location,
    move the x86-specific portion of xen_set_memory to arch_xen_set_memory.
    Also, move handle_vmport_ioreq to arch_handle_ioreq.

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
    Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
    hw/xen/xen-hvm-common.c. These common functions are useful for creating
    an IOREQ server.

    xen_hvm_init_pc() contains the architecture independent code for creating
    and mapping a IOREQ server, connecting memory and IO listeners, initializing
    a xen bus and registering backends. Moved this common xen code to a new
    function xen_register_ioreq() which can be used by both x86 and ARM machines.

    Following functions are moved to hw/xen/xen-hvm-common.c:
        xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
        xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
        xen_device_realize(), xen_device_unrealize(),
        cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
        do_outp(), rw_phys_req_item(), read_phys_req_item(),
        write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
        cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
        handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
        xen_hvm_change_state_handler(), xen_exit_notifier(),
        xen_map_ioreq_server(), destroy_hvm_domain() and
        xen_shutdown_fatal_error()

3. Removed static type from below functions:
    1. xen_region_add()
    2. xen_region_del()
    3. xen_io_add()
    4. xen_io_del()
    5. xen_device_realize()
    6. xen_device_unrealize()
    7. xen_hvm_change_state_handler()
    8. cpu_ioreq_pio()
    9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page side with Xen.

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

In preparation to moving most of xen-hvm code to an arch-neutral location, move:
- shared_vmport_page
- log_for_dirtybit
- dirty_bitmap
- suspend
- wakeup

out of XenIOState struct as these are only used on x86, especially the ones
related to dirty logging.
Updated XenIOState can be used for both aarch64 and x86.

Also, remove free_phys_offset as it was unused.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

hw/i386/xen: rearrange xen_hvm_init_pc

In preparation to moving most of xen-hvm code to an arch-neutral location,
move non IOREQ references to:
- xen_get_vmport_regs_pfn
- xen_suspend_notifier
- xen_wakeup_notifier
- xen_ram_init

towards the end of the xen_hvm_init_pc() function.

This is done to keep the common ioreq functions in one place which will be
moved to new function in next patch in order to make it common to both x86 and
aarch64 machines.

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/i386/xen/: move xen-mapcache.c to hw/xen/

xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Moving it out from hw/i386/xen to hw/xen to make it
accessible for both aarch64 and x86.

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into staging

* Bump minimum Clang version to 10.0
* Improve the handling of the libdw library
* Deprecate --enable-gprof builds and remove them from CI
* Remove the deprecated "sga" device
* Some header #include clean-ups
* Make qtests more flexible with regards to missing devices
* Some small s390x-related fixes/improvements

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmPra8ARHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbW5zQ/+LRyPlCB9/Zz+40QUNIl9M9pcpjwCno9w
# pWIeOXMftBu8vCa/+o58Y22NzmWq7cFypfsIvjrlCqCIZQx1shez5+bvE/ZSyECd
# vy3I+ybKRSaXgE/ighAifwOR7eOhqLBHyXJkAN5grMx0kuu3oKqXBpyJNqTHjq6o
# VgKTu8+o7Ddb+CuzwGahKzyzTqD7gY+9+Y5Gnlq0DvbNEq9k6tcjQZ8P3MqOmjWJ
# WnBZH9Rrq9itnlcZXMl8xY6w572GLTq+aJbtc/iS3cvNV2WMwTFMj4lJZjuSkzX0
# af2Aw/4e23uKpjnD44cI17XkRSQbUcT756b3KtdHlKHRvKSL+zwjsraimEN6sleO
# Y5ibaFamqvoR7IuOqJ7/cbpBN+P46JU1ryQ17BvAN/HnNSEmkn6FLGtDhyvabIHq
# C69TBb1IX0GApo0txeg0d0nFrSvVACpX+/D1bRedmI5SD3EukTcBr6UELiBqXqLH
# O75tWQydMwxuP8ay7/zU4g4PApif/RBbR41pw+oVsQNIx1p5QilNy069m/V2nT3k
# gMciT2+U8bIu3GUULuW0zPuD/o88XJk+ocjFHIE8BbwCx41iL6jKhM+xHwUeIkv4
# dZT+8BlbBzNZlSAgpMu9wt0T2hyHrKYtmLxvrVi/qq6ZyO7ZUVBfOGfqAd3QtvQX
# hIERpFunzFY=
# =GNLD
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 14 Feb 2023 11:08:48 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu: (22 commits)
  hw/s390x/event-facility: Replace DO_UPCAST(SCLPEvent) by SCLP_EVENT()
  tests/tcg/s390x: Use -nostdlib for softmmu tests
  tests/qtest: Don't build virtio-serial-test.c if device not present
  tests/qtest: bios-tables-test: Skip if missing configs
  tests/qemu-iotests: Require virtio-scsi-pci
  tests/qtest: Do not include hexloader-test if loader device is not present
  tests/qtest: Check for devices in bios-tables-test
  tests/qtest: drive_del-test: Skip tests that require missing devices
  tests/qtest: Skip unplug tests that use missing devices
  test/qtest: Fix coding style in device-plug-test.c
  tests/qtest: hd-geo-test: Check for missing devices
  tests/qtest: Add dependence on PCIE_PORT for virtio-net-failover.c
  tests/qtest: Do not run lsi53c895a test if device is not present
  tests/qtest: Skip PXE tests for missing devices
  Do not include "qemu/error-report.h" in headers that do not need it
  include/hw: Do not include "hw/registerfields.h" in headers that don't need it
  hw/misc/sga: Remove the deprecated "sga" device
  tests/qtest/npcm7xx_pwm-test: Be less verbose unless V=2
  build: deprecate --enable-gprof builds and remove from CI
  meson: Disable libdw for static builds by default
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/s390x/event-facility: Replace DO_UPCAST(SCLPEvent) by SCLP_EVENT()

Use the SCLP_EVENT() QOM type-checking macro to avoid DO_UPCAST().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230212225144.58660-16-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/tcg/s390x: Use -nostdlib for softmmu tests

The code currently uses -nostartfiles, but this does not prevent
linking with libc. On Fedora there is no cross-libc, so the linking
step fails.

Fix by using the more comprehensive -nostdlib (that's also what
probe_target_compiler() checks for as well).

Fixes: 503e549e441e ("tests/tcg/s390x: Test unaligned accesses to lowcore")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230131182057.2261614-1-iii@linux.ibm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Don't build virtio-serial-test.c if device not present

The virtconsole device might not be present in the QEMU build that is
being tested.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230213210738.9719-5-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: bios-tables-test: Skip if missing configs

If we build with --without-default-devices, CONFIG_HPET and
CONFIG_PARALLEL are set to N, which makes the respective devices go
missing from acpi tables.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208194700.11035-13-farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qemu-iotests: Require virtio-scsi-pci

Check that virtio-scsi-pci is present in the QEMU build before running
the tests.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208194700.11035-12-farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Do not include hexloader-test if loader device is not present

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-11-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Check for devices in bios-tables-test

Do not include tests that require devices that are not available in
the QEMU build.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230208194700.11035-10-farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: drive_del-test: Skip tests that require missing devices

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-9-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Skip unplug tests that use missing devices

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-8-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

test/qtest: Fix coding style in device-plug-test.c

We should not mix declarations and statements in QEMU code.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-7-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: hd-geo-test: Check for missing devices

Don't include tests that require devices not available in the QEMU
binary.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208194700.11035-6-farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Add dependence on PCIE_PORT for virtio-net-failover.c

This test depends on the presence of the pcie-root-port device. Add a
build time dependency.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-4-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Do not run lsi53c895a test if device is not present

The tests are built once for all the targets, so as long as one QEMU
binary is built with CONFIG_LSI_SCSI_PCI=y, this test will
run. However some binaries might not include the device. So check this
again in runtime.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-3-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Skip PXE tests for missing devices

Check if the devices we're trying to add are present in the QEMU
binary. They could have been removed from the build via Kconfig or the
--without-default-devices option.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20230208194700.11035-2-farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Do not include "qemu/error-report.h" in headers that do not need it

Include it in the .c files instead that use the error reporting
functions.

Message-Id: <20230210111931.1115489-1-thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

include/hw: Do not include "hw/registerfields.h" in headers that don't need it

Include "hw/registerfields.h" in the .c files instead (if needed).

Message-Id: <20230210112315.1116966-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw/misc/sga: Remove the deprecated "sga" device

It's been deprecated since QEMU v6.2, so it should be OK to
finally remove this now.

Message-Id: <20230209161540.1054669-1-thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest/npcm7xx_pwm-test: Be less verbose unless V=2

The npcm7xx_pwm-test produces a lot of output at V=1, which
means that on our CI tests the log files exceed the gitlab
500KB limit. Suppress the messages about exactly what is
being tested unless at V=2 and above.

This follows the pattern we use with qom-test.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230209135047.1753081-1-peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

build: deprecate --enable-gprof builds and remove from CI

As gprof relies on instrumentation you rarely get useful data compared
to a real optimised build. Lets deprecate the build option and
simplify the CI configuration as a result.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1338
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230131094224.861621-1-alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

meson: Disable libdw for static builds by default

Static QEMU build fails on Debian Bullseye:

    /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libdw.a(debuginfod-client.o): in function `__libdwfl_debuginfod_init':
    (.text.startup+0x17): undefined reference to `dlopen'

The reason is that pkg-config does not suggest -ldl for libdw, and
adding --extra-ldflags="-ldl" resolves the issue. However, static
linking with libdw is an unclear topic:

* Linux perf does it.
* Debian's libdw-dev description says:

      Only link to the static version for special cases and when you
      don't need anything from the ebl backends.

* As the error message above indicates, -ldl is also needed for
  debuginfod support.

The functionality provided by libdw is needed for analyzing performance
of JITed code, which is mostly useful to developers and researchers.
Therefore, in order to avoid unpleasant surprises for people who don't
need this, simply disable libdw for static builds by default. It can
still be enabled explicitly if needed.

Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230210005208.438142-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

meson: Add missing libdw knobs

Add the missing meson infrastructure bits for the new libdw
dependency. Model them after the existing capstone knobs.

Fixes: 7c10cb38ccb8 ("accel/tcg: Add debuginfo support")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230210005208.438142-1-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

configure: Bump minimum Clang version to 10.0

Anthony Perard recently reported some problems with Clang v6.0 from
Ubuntu Bionic (with regards to the -Wmissing-braces configure test).
Since we're not officially supporting that version of Ubuntu anymore,
we should better bump our minimum version check in the configure script
instead of using our time to fix problems of unsupported compilers.
According to repology.org, our supported distros ship these versions
of Clang (looking at the highest version only):

              Fedora 36: 14.0.5
      CentOS 8 (RHEL-8): 12.0.1
              Debian 11: 13.0.1
     OpenSUSE Leap 15.4: 13.0.1
       Ubuntu LTS 20.04: 12.0.0
          FreeBSD Ports: 15.0.7
          NetBSD pkgsrc: 15.0.7
               Homebrew: 15.0.7
            MSYS2 mingw: 15.0.7
            Haiku ports: 12.0.1

While it seems like we could update to v12.0.0 from that point of view,
the default version on Ubuntu 20.04 is still v10.0, and we use that for
our CI tests based via the tests/docker/dockerfiles/ubuntu2004.docker
file.

Thus let's make v10.0 our minimum version now (which corresponds to
Apple Clang version v12.0). The -Wmissing-braces check can then be
removed, too, since both our minimum GCC and our minimum Clang version
now handle this correctly.

Message-Id: <20230131180239.1582302-1-thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Merge tag 'migration-20230213-pull-request' of https://gitlab.com/juan.quintela/qemu into staging

Migration Pull request (take3)

Hi

In this PULL request:
- Added to leonardo fixes:
Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Please apply.

[take 2]
- rebase to latest upstream
- fix compilation of linux-user (if have_system was missing) (me)
- cleanup multifd_load_cleanup(leonardo)
- Document RAM flags (me)

Please apply.

[take 1]
This are all the reviewed patches for migration:
- AVX512 support for xbzrle (Ling Xu)
- /dev/userfaultd support (Peter Xu)
- Improve ordering of channels (Peter Xu)
- multifd cleanups (Li Zhang)
- Remove spurious files from last merge (me)
  Rebase makes that to you
- Fix mixup between state_pending_{exact,estimate} (me)
- Cache RAM size during migration (me)
- cleanup several functions (me)

Please apply.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmPppZYACgkQ9IfvGFhy
# 1yPLvQ//f8D6txzFawaxrfzpSAHnq70Gx+B5GkIwgwB8nlPIC3QELEf5uooM/RGA
# nSaUctUNOJUWqVGK3vp3jDIep02DzdIUrlOfy96h+pnTMpyMWFC2BexDfveVMUId
# dw8WCWZkGCFDfIWuKF+GA8eTu6HM1ouzgCJrRmPtCqsikzAPkogPm60hQSTAQxm9
# Kzdp1SXV1HmyA440vy8rtYf71BKpvb9OJFmwgZ+ICy0rc1aUmgJbKxkwyOgiI2lq
# ONekpbOg7lzlFVAQu1QHTmYN13bsn4uzwUgdifn1PixFQyRE3AVs4jdTmqeLnoPe
# Ac6j8v3pDOw/Xf4kpRWUmhkqTMEJt8/lyneJzu1mQkw0wwiUtDvknFgPG8wJsa+J
# ZQr1cBXQj4IjtkN6+ixF7XYvx3T6pWz0L+/w2+TbFBdLWIrPgFH0yPYjhx7FdDid
# cjUHyS1a0w9ngnXOxRG8+UNHWCpPOUhXeeiyNioogYZNKu77PFxJVDMe3eB6dXAB
# pDfl4P129PloKAPafcz9E6Sxr+lIgrETZmsRJlRz4czg18TxlIukMlDtyrepNWti
# GtIf9xTpP3JKjpHnKbWLaxP5VeFC7kQd0qas4VxD+tDjbJdUZdZMfHcOSS0SMRGe
# q5LVEzMMIPCQJQIqiLEJ0HTUUOtB8i+bGoirNEbDqhLa/oZwPP8=
# =TDnO
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 13 Feb 2023 02:51:02 GMT
# gpg:                using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg:                 aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* tag 'migration-20230213-pull-request' of https://gitlab.com/juan.quintela/qemu: (22 commits)
  ram: Document migration ram flags
  migration/multifd: Move load_cleanup inside incoming_state_destroy
  migration/multifd: Join all multifd threads in order to avoid leaks
  migration/multifd: Remove unnecessary assignment on multifd_load_cleanup()
  migration/multifd: Change multifd_load_cleanup() signature and usage
  migration: Postpone postcopy preempt channel to be after main
  migration: Add a semaphore to count PONGs
  migration: Cleanup postcopy_preempt_setup()
  migration: Rework multi-channel checks on URI
  Update bench-code for addressing CI problem
  AVX512 support for xbzrle_encode_buffer
  migration: I messed state_pending_exact/estimate
  migration: Make ram_save_target_page() a pointer
  migration: Calculate ram size once
  migration: Split ram_bytes_total_common() in two functions
  migration: Make find_dirty_block() return a single parameter
  migration: Simplify ram_find_and_save_block()
  util/userfaultfd: Support /dev/userfaultfd
  linux-headers: Update to v6.1
  multifd: Remove some redundant code
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ram: Document migration ram flags

0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now.
Bigger usable flag is 0x200, noticing that.
We can reuse RAM_SAVe_FLAG_FULL.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration/multifd: Move load_cleanup inside incoming_state_destroy

Currently running migration_incoming_state_destroy() without first running
multifd_load_cleanup() will cause a yank error:

qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
(core dumped)

The above error happens in the target host, when multifd is being used
for precopy, and then postcopy is triggered and the migration finishes.
This will crash the VM in the target host.

To avoid that, move multifd_load_cleanup() inside
migration_incoming_state_destroy(), so that the load cleanup becomes part
of the incoming state destroying process.

Running multifd_load_cleanup() twice can become an issue, though, but the
only scenario it could be ran twice is on process_incoming_migration_bh().
So removing this extra call is necessary.

On the other hand, this multifd_load_cleanup() call happens way before the
migration_incoming_state_destroy() and having this happening before
dirty_bitmap_mig_before_vm_start() and vm_start() may be a need.

So introduce a new function multifd_load_shutdown() that will mainly stop
all multifd threads and close their QIOChannels. Then use this function
instead of multifd_load_cleanup() to make sure nothing else is received
before dirty_bitmap_mig_before_vm_start().

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration/multifd: Join all multifd threads in order to avoid leaks

Current approach will only join threads that are still running.

For the threads not joined, resources or private memory are always kept in
the process space and never reclaimed before process end, and this risks
serious memory leaks.

This should usually not represent a big problem, since multifd migration
is usually just ran at most a few times, and after it succeeds there is
not much to be done before exiting the process.

Yet still, it should not hurt performance to join all of them.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration/multifd: Remove unnecessary assignment on multifd_load_cleanup()

Before assigning "p->quit = true" for every multifd channel,
multifd_load_cleanup() will call multifd_recv_terminate_threads() which
already does the same assignment, while protected by a mutex.

So there is no point doing the same assignment again.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration/multifd: Change multifd_load_cleanup() signature and usage

Since it's introduction in commit f986c3d256 ("migration: Create multifd
migration threads"), multifd_load_cleanup() never returned any value
different than 0, neither set up any error on errp.

Even though, on process_incoming_migration_bh() an if clause uses it's
return value to decide on setting autostart = false, which will never
happen.

In order to simplify the codebase, change multifd_load_cleanup() signature
to 'void multifd_load_cleanup(void)', and for every usage remove error
handling or decision made based on return value != 0.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Postpone postcopy preempt channel to be after main

Postcopy with preempt-mode enabled needs two channels to communicate.  The
order of channel establishment is not guaranteed.  It can happen that the
dest QEMU got the preempt channel connection request before the main
channel is established, then the migration may make no progress even during
precopy due to the wrong order.

To fix it, create the preempt channel only if we know the main channel is
established.

For a general postcopy migration, we delay it until postcopy_start(),
that's where we already went through some part of precopy on the main
channel.  To make sure dest QEMU has already established the channel, we
wait until we got the first PONG received.  That's something we do at the
start of precopy when postcopy enabled so it's guaranteed to happen sooner
or later.

For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare()
where we'll have round trips of data on bitmap synchronizations, which
means the main channel must have been established.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Add a semaphore to count PONGs

This is mostly useless, but useful for us to know whether the main channel
is correctly established without changing the migration protocol.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Cleanup postcopy_preempt_setup()

Since we just dropped the only case where postcopy_preempt_setup() can
return an error, it doesn't need a retval anymore because it never fails.
Move the preempt check to the caller, preparing it to be used elsewhere to
do nothing but as simple as kicking the async connection.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Rework multi-channel checks on URI

The whole idea of multi-channel checks was not properly done, IMHO.

Currently we check multi-channel in a lot of places, but actually that's
not needed because we only need to check it right after we get the URI and
that should be it.

If the URI check succeeded, we should never need to check it again because
we must have it.  If it check fails, we should fail immediately on either
the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after
the connection established.

Neither should we fail any set capabiliities like what we used to do here:

5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)

Because logically the URI will only be set later after the capability is
set, so it doesn't make a lot of sense to check the URI type when setting
the capability, because we're checking the cap with an old URI passed in,
and that may not even be the URI we're going to use later.

This patch mostly reverted all such checks for before, dropping the
variable migrate_allow_multi_channels and helpers.  Instead, add a common
helper to check URI for multi-channels for either qmp_migrate and
qmp_migrate_incoming and that should do all the proper checks.  The failure
will only trigger with the "migrate" or "migrate_incoming" command, or when
user specified "-incoming xxx" where "xxx" is not "defer".

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Update bench-code for addressing CI problem

Unit test code is in test-xbzrle.c, and benchmark code is in xbzrle-bench.c
for performance benchmarking. we have modified xbzrle-bench.c to address
CI problem.

Signed-off-by: ling xu <ling1.xu@intel.com>
Co-authored-by: Zhou Zhao <zhou.zhao@intel.com>
Co-authored-by: Jun Jin <jun.i.jin@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

AVX512 support for xbzrle_encode_buffer

This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer
function to accelerate xbzrle encoding speed. Runtime check of avx512
support and benchmark for this feature are added. Compared with C
version of xbzrle_encode_buffer function, avx512 version can achieve
50%-70% performance improvement on benchmarking. In addition, if dirty
data is randomly located in 4K page, the avx512 version can achieve
almost 140% performance gain.

Signed-off-by: ling xu <ling1.xu@intel.com>
Co-authored-by: Zhou Zhao <zhou.zhao@intel.com>
Co-authored-by: Jun Jin <jun.i.jin@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: I messed state_pending_exact/estimate

I called the helper function from the wrong top level function.

This code was introduced in:

commit c8df4a7aeffcb46020f610526eea621fa5b0cd47
Author: Juan Quintela <quintela@redhat.com>
Date:   Mon Oct 3 02:00:03 2022 +0200

    migration: Split save_live_pending() into state_pending_*

    We split the function into to:

    - state_pending_estimate: We estimate the remaining state size without
      stopping the machine.

    - state pending_exact: We calculate the exact amount of remaining
      state.

Thanks to Avihai Horon <avihaih@nvidia.com> for finding it.

Fixes:c8df4a7aeffcb46020f610526eea621fa5b0cd47

When we introduced that patch, we enden calling

state_pending_estimate() helper from qemu_savevm_statepending_exact()
and
state_pending_exact() helper from qemu_savevm_statepending_estimate()

This patch fixes it.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Make ram_save_target_page() a pointer

We are going to create a new function for multifd latest in the series.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Calculate ram size once

We are recalculating ram size continously, when we know that it don't
change during migration. Create a field in RAMState to track it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

migration: Split ram_bytes_total_common() in two functions

It is just a big if in the middle of the function, and we need two
functions anways.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---

Reindent to make Phillipe happy (and CODING_STYLE)

migration: Make find_dirty_block() return a single parameter

We used to return two bools, just return a single int with the
following meaning:

old return / again / new return
false        false   PAGE_ALL_CLEAN
false        true    PAGE_TRY_AGAIN
true         true    PAGE_DIRTY_FOUND  /* We don't care about again at all */

Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Simplify ram_find_and_save_block()

We will need later that find_dirty_block() return errors, so
simplify the loop.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

util/userfaultfd: Support /dev/userfaultfd

Teach QEMU to use /dev/userfaultfd when it existed and fallback to the
system call if either it's not there or doesn't have enough permission.

Firstly, as long as the app has permission to access /dev/userfaultfd, it
always have the ability to trap kernel faults which QEMU mostly wants.
Meanwhile, in some context (e.g. containers) the userfaultfd syscall can be
forbidden, so it can be the major way to use postcopy in a restricted
environment with strict seccomp setup.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

linux-headers: Update to v6.1

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: Remove some redundant code

Clean up some unnecessary code

Signed-off-by: Li Zhang <lizhang@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: cleanup the function multifd_channel_connect

Cleanup multifd_channel_connect

Signed-off-by: Li Zhang <lizhang@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Remove spurious files

I introduced spurious files on my tree during a rebase:

commit ebfc57871506b3fe36cc41f69ee3ad31a34afd63
Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
Date:   Mon Oct 17 15:53:51 2022 +0800

    multifd: Fix flush of zero copy page send request

    Make IO channel flush call after the inflight request has been drained
    in multifd thread, or else we may missed to flush the inflight request.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To make things worse, it appears like Zhenzhong is the one to blame.

for(int i=0; i < 1000000; i++) {
printf("I will not do rebases when I am tired\n");
}

Sorry, Juan.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

A few fixes that I've picked up.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmPlEFsACgkQnKSrs4Gr
# c8h3OggAj5TeIB/R6F69vHIuoqogJELm5jVkPoXd1VK+nAayEXHFmEfUi72JIh62
# l8E4NOuUIxwSXdD2HLH/CpezBh5EVW/LJ9AfRUXCWV65KL92dkZoQyxonNQMKdQ3
# pxj7zwHrlsdORPfRSnFVaGksaIdePgj46CjSQh8IF8RMvYMVF9hG3ias7rT+EWi7
# SidPse4tik3WPxWteEXQd/8fdUehloPOB6Xm8pFilr0oR/TlRyMRWeaUs5+6WUIy
# y1+mqObsY22DvIDqsqTbZDULnHXAI5zxy9gwHi+DhRi3DbuAxdjH1Vclk0Y9wsGY
# QKhoFtGhfOd+94uSusp5UbG5iNvbuQ==
# =HZZr
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 09 Feb 2023 15:25:15 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  iotests/detect-zeroes-registered-buf: add new test
  qemu-io: add -r option to register I/O buffer
  qemu-io: use BdrvRequestFlags instead of int
  block: fix detect-zeroes= with BDRV_REQ_REGISTERED_BUF
  virtio-blk: add missing AioContext lock
  vhost-user-fs: Back up vqs before cleaning up vhost_dev

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

iotests/detect-zeroes-registered-buf: add new test

This regression test demonstrates that detect-zeroes works with
registered buffers. Bug details:
https://gitlab.com/qemu-project/qemu/-/issues/1404

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230207203719.242926-5-stefanha@redhat.com>

qemu-io: add -r option to register I/O buffer

The blk_register_buf() API is an optimization hint that allows some
block drivers to avoid I/O buffer housekeeping or bounce buffers.

Add an -r option to register the I/O buffer so that qemu-io can be used
to test the blk_register_buf() API. The next commit will add a test that
uses the new option.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230207203719.242926-4-stefanha@redhat.com>

qemu-io: use BdrvRequestFlags instead of int

The block layer APIs use BdrvRequestFlags while qemu-io code uses int.
Although the code compiles and runs fine, BdrvRequestFlags is clearer
because it differentiates between other types of flags like bdrv_open()
flags.

This is purely refactoring.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230207203719.242926-3-stefanha@redhat.com>