David Woodhouse [Thu, 19 Oct 2023 14:30:23 +0000 (15:30 +0100)]
docs: update Xen-on-KVM documentation
Add notes about console and network support, and how to launch PV guests.
Clean up the disk configuration examples now that that's simpler, and
remove the comment about IDE unplug on q35/AHCI now that it's fixed.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 19 Oct 2023 11:56:42 +0000 (12:56 +0100)]
xen-platform: unplug AHCI disks
To support Xen guests using the Q35 chipset, the unplug protocol needs
to also remove AHCI disks.
Make pci_xen_ide_unplug() more generic, iterating over the children
of the PCI device and destroying the "ide-hd" devices. That works the
same for both AHCI and IDE, as does the detection of the primary disk
as unit 0 on the bus named "ide.0".
Then pci_xen_ide_unplug() can be used for both AHCI and IDE devices.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Tue, 17 Oct 2023 20:59:03 +0000 (21:59 +0100)]
net: do not delete nics in net_cleanup()
In net_cleanup() we only need to delete the netdevs, as those may have
state which outlives Qemu when it exits, and thus may actually need to
be cleaned up on exit.
The nics, on the other hand, are owned by the device which created them.
Most devices don't bother to clean up on exit because they don't have
any state which will outlive Qemu... but XenBus devices do need to clean
up their nodes in XenStore, and do have an exit handler to delete them.
When the XenBus exit handler destroys the xen-net-device, it attempts
to delete its nic after net_cleanup() had already done so. And crashes.
Fix this by only deleting netdevs as we walk the list. As the comment
notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove
*multiple* entries, including the "safely" saved 'next' pointer. But
we can store the *previous* entry, since nics are safe.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 18 Oct 2023 23:27:08 +0000 (00:27 +0100)]
hw/xenpv: fix '-nic' support for xen-net-device
I can't see how this has ever worked. If I start with the simple attempt
"-nic user,model=xen", it creates a device with index -1 because it's
assuming that it'll be attached to a hubport. So it creates a frontend
at e.g. "/local/domain/84/device/vif/-1" and the guest fails to connect.
If I jump through hoops to give it a configuration that it might like:
-netdev user,id=usernic
-netdev hubport,hubid=0,id=hub0,netdev=usernic
-nic,hubport,hubid=0,model=xen
... it *still* doesn't work. Qemu does actually use a slightly more
sensible index in the XenStore frontend path now, and the guest does
manage to connect to it. But on the Qemu side, the NIC still isn't
actually *attached* to the netdev:
qemu-system-x86_64: warning: hub port #net036 has no peer
qemu-system-x86_64: warning: hub 0 with no nics
qemu-system-x86_64: warning: netdev #net036 has no peer
qemu-system-x86_64: warning: requested NIC (anonymous, model xen) was not created (not supported by this machine?)
I can't see any point in the git history where the xen-nic driver
would actually look at that "handle" property, find the right netdev,
and actually *attach* the emulated NIC to anything.
Just rip out the special XenStore magic and instantiate a xen-net-device
on the XenBus. It all works now. Accept "model=xen-net-device" because
that's the actual Qemu device name and that's what works on HVM & emu.
Also accept model==NULL because why in $DEITY's name was that excluded
before anyway? What else are we doing to do for *PV* guests?
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Tue, 17 Oct 2023 16:53:58 +0000 (17:53 +0100)]
hw/i386/pc: support '-nic' for xen-net-device
The default NIC creation seems a bit hackish to me. I don't understand
why each platform has to call pci_nic_init_nofail() from a point in the
code where it actually has a pointer to the PCI bus, and then we have
the special cases for things like ne2k_isa.
If qmp_device_add() can *find* the appropriate bus and instantiate
the device on it, why can't we just do that from generic code for
creating the default NICs too?
But that isn't a yak I want to shave today. Add a xenbus field to the
PCMachineState so that it can make its way from pc_basic_device_init()
to pc_nic_init() and be handled as a special case like ne2k_isa is.
Now we can launch emulated Xen guests with '-nic user'.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 18 Oct 2023 23:26:42 +0000 (00:26 +0100)]
hw/xen: handle soft reset for primary console
On soft reset, the prinary console event channel needs to be rebound to
the backend port (in the xen-console driver). We could put that into the
xen-console driver itself, but it's slightly less ugly to keep it within
the KVM/Xen code, by stashing the backend port# on event channel reset
and then rebinding in the primary console reset when it has to recreate
the guest port anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 16 Oct 2023 15:00:23 +0000 (16:00 +0100)]
hw/xen: add support for Xen primary console in emulated mode
The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.
Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.
Now at last we can boot the Xen PV shim and run PV kernels in QEMU.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 16 Oct 2023 09:28:17 +0000 (10:28 +0100)]
hw/xen: do not repeatedly try to create a failing backend device
If xen_backend_device_create() fails to instantiate a device, the XenBus
code will just keep trying over and over again each time the bus is
re-enumerated, as long as the backend appears online and in
XenbusStateInitialising.
The only thing which prevents the XenBus code from recreating duplicates
of devices which already exist, is the fact that xen_device_realize()
sets the backend state to XenbusStateInitWait. If the attempt to create
the device doesn't get *that* far, that's when it will keep getting
retried.
My first thought was to handle errors by setting the backend state to
XenbusStateClosed, but that doesn't work for XenConsole which wants to
*ignore* any device of type != "ioemu" completely.
So, make xen_backend_device_create() *keep* the XenBackendInstance for a
failed device, and provide a new xen_backend_exists() function to allow
xen_bus_type_enumerate() to check whether one already exists before
creating a new one.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Sat, 14 Oct 2023 15:53:23 +0000 (16:53 +0100)]
hw/xen: add get_frontend_path() method to XenDeviceClass
The primary Xen console is special. The guest's side is set up for it by
the toolstack automatically and not by the standard PV init sequence.
Accordingly, its *frontend* doesn't appear in …/device/console/0 either;
instead it appears under …/console in the guest's XenStore node.
To allow the Xen console driver to override the frontend path for the
primary console, add a method to the XenDeviceClass which can be used
instead of the standard xen_device_get_frontend_path()
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 16 Oct 2023 12:01:39 +0000 (13:01 +0100)]
hw/xen: automatically assign device index to block devices
There's no need to force the user to assign a vdev. We can automatically
assign one, starting at xvda and searching until we find the first disk
name that's unused.
This means we can now allow '-drive if=xen,file=xxx' to work without an
explicit separate -driver argument, just like if=virtio.
Rip out the legacy handling from the xenpv machine, which was scribbling
over any disks configured by the toolstack, and didn't work with anything
but raw images.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Kevin Wolf <kwolf@redhat.com>
David Woodhouse [Thu, 12 Oct 2023 09:59:45 +0000 (10:59 +0100)]
hw/xen: populate store frontend nodes with XenStore PFN/port
This is kind of redundant since without being able to get these through
some other method (HVMOP_get_param) the guest wouldn't be able to access
XenStore in order to find them. But Xen populates them, and it does
allow guests to *rebind* to the event channel port after a reset.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 2 Aug 2023 16:04:49 +0000 (17:04 +0100)]
hw/xen: Clean up event channel 'type_val' handling to use union
A previous implementation of this stuff used a 64-bit field for all of
the port information (vcpu/type/type_val) and did atomic exchanges on
them. When I implemented that in Qemu I regretted my life choices and
just kept it simple with locking instead.
So there's no need for the XenEvtchnPort to be so simplistic. We can
use a union for the pirq/virq/interdomain information, which lets us
keep a separate bit for the 'remote domain' in interdomain ports. A
single bit is enough since the only possible targets are loopback or
qemu itself.
So now we can ditch PORT_INFO_TYPEVAL_REMOTE_QEMU and the horrid
manual masking, although the in-memory representation is identical
so there's no change in the saved state ABI.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse [Tue, 17 Oct 2023 12:34:18 +0000 (13:34 +0100)]
hw/xen: fix XenStore watch delivery to guest
When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.
Add the missing call to put_rsp().
It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.
This fixes Xen device hot-unplug.
Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation stubs") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 18 Oct 2023 12:31:20 +0000 (13:31 +0100)]
hw/xen: don't clear map_track[] in xen_gnttab_reset()
The refcounts actually correspond to 'active_ref' structures stored in a
GHashTable per "user" on the backend side (mostly, per XenDevice).
If we zero map_track[] on reset, then when the backend drivers get torn
down and release their mapping we hit the assert(s->map_track[ref] != 0)
in gnt_unref().
So leave them in place. Each backend driver will disconnect and reconnect
as the guest comes back up again and reconnects, and it all works out OK
in the end as the old refs get dropped.
Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 11 Oct 2023 23:06:26 +0000 (00:06 +0100)]
hw/xen: select kernel mode for per-vCPU event channel upcall vector
A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.
For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:
/* Trick toolstack to think we are enlightened. */
if (!cpu)
rc = xen_set_callback_via(1);
That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.
Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.
Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)
Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 11 Oct 2023 22:30:08 +0000 (23:30 +0100)]
i386/xen: fix per-vCPU upcall vector for Xen emulation
The per-vCPU upcall vector support had two problems. Firstly it was
using the wrong hypercall argument and would always return -EFAULT.
And secondly it was using the wrong ioctl() to pass the vector to
the kernel and thus the *kernel* would always return -EINVAL.
Linux doesn't (yet) use this mode so it went without decent testing
for a while.
Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 30 Aug 2023 20:13:57 +0000 (21:13 +0100)]
hw/timer/hpet: fix IRQ routing in legacy support mode
The interrupt from timer 0 in legacy mode is supposed to go to IRQ 0 on
the i8259 and IRQ 2 on the I/O APIC. The generic x86 GSI handling can't
cope with IRQ numbers differing between the two chips (despite it also
being the case for PCI INTx routing), so add a special case for the HPET.
IRQ 2 isn't valid on the i8259; it's the cascade IRQ and would be
interpreted as spurious interrupt on the secondary PIC. So we can fix
up all attempts to deliver IRQ2, to actually deliver to IRQ0 on the PIC.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Fri, 10 Mar 2023 17:37:03 +0000 (17:37 +0000)]
intel-iommu: Report interrupt remapping faults, fix return value
A generic X86IOMMUClass->int_remap function should not return VT-d
specific values; fix it to return 0 if the interrupt was successfully
translated or -EINVAL if not.
The VTD_FR_IR_xxx values are supposed to be used to actually raise
faults through the fault reporting mechanism, so do that instead for
the case where the IRQ is actually being injected.
There is more work to be done here, as pretranslations for the KVM IRQ
routing table can't fault; an untranslatable IRQ should be handled in
userspace and the fault raised only when the IRQ actually happens (if
indeed the IRTE is still not valid at that time). But we can work on
that later; we can at least raise faults for the direct case.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Peter Xu <peterx@redhat.com>
Stefan Hajnoczi [Wed, 18 Oct 2023 10:21:15 +0000 (06:21 -0400)]
Merge tag 'pull-vfio-20231018' of https://github.com/legoater/qemu into staging
vfio queue:
* Support for VFIODisplay migration with ramfb
* Preliminary work for IOMMUFD support
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmUvlEYACgkQUaNDx8/7
# 7KFlaw//X2053de2eTdo38/UMSzi5ACWWn2j1iGQZf/3+J2LcdlixZarZr/2DN56
# 4axmwF6+GKozt5+EnvWtgodDn6U9iyMNaAB3CGBHFHsH8uqKeZd/Ii754q4Rcmy9
# ZufBOPWm9Ff7s2MMFiAZvso75jP2wuwVEe1YPRjeJnsNSNIJ6WZfemh3Sl96yRBb
# r38uqzqetKwl7HziMMWP3yb8v+dU8A9bqI1hf1FZGttfFz3XA+pmjXKA6XxdfiZF
# AAotu5x9w86a08sAlr/qVsZFLR37oQykkXM0D840DafJDyr5fbJiq8cwfOjMw9+D
# w6+udRm5KoBWPsvb/T3dR88GRMO22PChjH9Vjl51TstMNhdTxuKJTKhhSoUFZbXV
# 8CMjwfALk5ggIOyCk1LRd04ed+9qkqgcbw1Guy5pYnyPnY/X6XurxxaxS6Gemgtn
# UvgRYhSjio+LgHLO77IVkWJMooTEPzUTty2Zxa7ldbbE+utPUtsmac9+1m2pnpqk
# 5VQmB074QnsJuvf+7HPU6vYCzQWoXHsH1UY/A0fF7MPedNUAbVYzKrdGPyqEMqHy
# xbilAIaS3oO0pMT6kUpRv5c5vjbwkx94Nf/ii8fQVjWzPfCcaF3yEfaam62jMUku
# stySaRpavKIx2oYLlucBqeKaBGaUofk13gGTQlsFs8pKCOAV7r4=
# =s0fN
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 18 Oct 2023 04:16:06 EDT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20231018' of https://github.com/legoater/qemu: (22 commits)
hw/vfio: add ramfb migration support
ramfb-standalone: add migration support
ramfb: add migration support
vfio/pci: Remove vfio_detach_device from vfio_realize error path
vfio/ccw: Remove redundant definition of TYPE_VFIO_CCW
vfio/ap: Remove pointless apdev variable
vfio/pci: Fix a potential memory leak in vfio_listener_region_add
vfio/common: Move legacy VFIO backend code into separate container.c
vfio/common: Introduce a global VFIODevice list
vfio/common: Store the parent container in VFIODevice
vfio/common: Introduce a per container device list
vfio/common: Move VFIO reset handler registration to a group agnostic function
vfio/ccw: Use vfio_[attach/detach]_device
vfio/ap: Use vfio_[attach/detach]_device
vfio/platform: Use vfio_[attach/detach]_device
vfio/pci: Introduce vfio_[attach/detach]_device
vfio/common: Extract out vfio_kvm_device_[add/del]_fd
vfio/common: Introduce vfio_container_add|del_section_window()
vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any
vfio/common: Move IOMMU agnostic helpers to a separate file
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 18 Oct 2023 10:20:41 +0000 (06:20 -0400)]
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* build system and Python cleanups
* fix netbsd VM build
* allow non-relocatable installs
* allow using command line options to configure qemu-ga
* target/i386: check intercept for XSETBV
* target/i386: fix CPUID_HT exposure
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (32 commits)
configure: define "pkg-config" in addition to "pkgconfig"
meson: add a note on why we use config_host for program paths
meson-buildoptions: document the data at the top
configure, meson: use command line options to configure qemu-ga
configure: unify handling of several Debian cross containers
configure: move environment-specific defaults to config-meson.cross
configure: move target-specific defaults to an external machine file
configure: remove some dead cruft
configure: clean up PIE option handling
configure: clean up plugin option handling
configure, tests/tcg: simplify GDB conditionals
tests/tcg/arm: move non-SVE tests out of conditional
hw/remote: move stub vfu_object_set_bus_irq out of stubs/
hw/xen: cleanup sourcesets
configure: clean up handling of CFI option
meson, cutils: allow non-relocatable installs
meson: do not use set10
meson: do not build shaders by default
tracetool: avoid invalid escape in Python string
tests/vm: avoid invalid escape in Python string
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Marc-André Lureau [Mon, 9 Oct 2023 06:32:45 +0000 (10:32 +0400)]
ramfb: add migration support
Implementing RAMFB migration is quite straightforward. One caveat is to
treat the whole RAMFBCfg as a blob, since that's what is exposed to the
guest directly. This avoid having to fiddle with endianness issues if we
were to migrate fields individually as integers.
The devices using RAMFB will have to include ramfb_vmstate in their
migration description.
Eric Auger [Wed, 11 Oct 2023 20:09:34 +0000 (22:09 +0200)]
vfio/pci: Remove vfio_detach_device from vfio_realize error path
In vfio_realize, on the error path, we currently call
vfio_detach_device() after a successful vfio_attach_device.
While this looks natural, vfio_instance_finalize also induces
a vfio_detach_device(), and it seems to be the right place
instead as other resources are released there which happen
to be a prerequisite to a successful UNSET_CONTAINER.
So let's rely on the finalize vfio_detach_device call to free
all the relevant resources.
Zhenzhong Duan [Mon, 9 Oct 2023 02:20:46 +0000 (10:20 +0800)]
vfio/pci: Fix a potential memory leak in vfio_listener_region_add
When there is an failure in vfio_listener_region_add() and the section
belongs to a ram device, there is an inaccurate error report which should
never be related to vfio_dma_map failure. The memory holding err is also
incrementally leaked in each failure.
Fix it by reporting the real error and free it.
Fixes: 567b5b309ab ("vfio/pci: Relax DMA map errors for MMIO regions") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Yi Liu [Mon, 9 Oct 2023 09:09:17 +0000 (11:09 +0200)]
vfio/common: Move legacy VFIO backend code into separate container.c
Move all the code really dependent on the legacy VFIO container/group
into a separate file: container.c. What does remain in common.c is
the code related to VFIOAddressSpace, MemoryListeners, migration and
all other general operations.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Mon, 9 Oct 2023 09:09:16 +0000 (11:09 +0200)]
vfio/common: Introduce a global VFIODevice list
Some functions iterate over all the VFIODevices. This is currently
achieved by iterating over all groups/devices. Let's
introduce a global list of VFIODevices simplifying that scan.
This will also be useful while migrating to IOMMUFD by hiding the
group specificity.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Mon, 9 Oct 2023 09:09:15 +0000 (11:09 +0200)]
vfio/common: Store the parent container in VFIODevice
let's store the parent contaienr within the VFIODevice.
This simplifies the logic in vfio_viommu_preset() and
brings the benefice to hide the group specificity which
is useful for IOMMUFD migration.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Mon, 9 Oct 2023 09:09:14 +0000 (11:09 +0200)]
vfio/common: Introduce a per container device list
Several functions need to iterate over the VFIO devices attached to
a given container. This is currently achieved by iterating over the
groups attached to the container and then over the devices in the group.
Let's introduce a per container device list that simplifies this
search.
Per container list is used in below functions:
vfio_devices_all_dirty_tracking
vfio_devices_all_device_dirty_tracking
vfio_devices_all_running_and_mig_active
vfio_devices_dma_logging_stop
vfio_devices_dma_logging_start
vfio_devices_query_dirty_bitmap
This will also ease the migration of IOMMUFD by hiding the group
specificity.
Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Mon, 9 Oct 2023 09:09:13 +0000 (11:09 +0200)]
vfio/common: Move VFIO reset handler registration to a group agnostic function
Move the reset handler registration/unregistration to a place that is not
group specific. vfio_[get/put]_address_space are the best places for that
purpose.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:12 +0000 (11:09 +0200)]
vfio/ccw: Use vfio_[attach/detach]_device
Let the vfio-ccw device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.
Note that the migration reduces the following trace
"vfio: subchannel %s has already been attached" (featuring
cssid.ssid.devid) into "device is already attached"
Also now all the devices have been migrated to use the new
vfio_attach_device/vfio_detach_device API, let's turn the
legacy functions into static functions, local to container.c.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:11 +0000 (11:09 +0200)]
vfio/ap: Use vfio_[attach/detach]_device
Let the vfio-ap device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.
We take the opportunity to use g_path_get_basename() which
is prefered, as suggested by 3e015d815b ("use g_path_get_basename instead of basename")
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:10 +0000 (11:09 +0200)]
vfio/platform: Use vfio_[attach/detach]_device
Let the vfio-platform device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.
Drop the trace event for vfio-platform as we have similar
one in vfio_attach_device.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:09 +0000 (11:09 +0200)]
vfio/pci: Introduce vfio_[attach/detach]_device
We want the VFIO devices to be able to use two different
IOMMU backends, the legacy VFIO one and the new iommufd one.
Introduce vfio_[attach/detach]_device which aim at hiding the
underlying IOMMU backend (IOCTLs, datatypes, ...).
Once vfio_attach_device completes, the device is attached
to a security context and its fd can be used. Conversely
When vfio_detach_device completes, the device has been
detached from the security context.
At the moment only the implementation based on the legacy
container/group exists. Let's use it from the vfio-pci device.
Subsequent patches will handle other devices.
We also take benefit of this patch to properly free
vbasedev->name on failure.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Zhenzhong Duan [Mon, 9 Oct 2023 09:09:08 +0000 (11:09 +0200)]
vfio/common: Extract out vfio_kvm_device_[add/del]_fd
Introduce two new helpers, vfio_kvm_device_[add/del]_fd
which take as input a file descriptor which can be either a group fd or
a cdev fd. This uses the new KVM_DEV_VFIO_FILE VFIO KVM device group,
which aliases to the legacy KVM_DEV_VFIO_GROUP.
vfio_kvm_device_[add/del]_group then call those new helpers.
Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Introduce helper functions that isolate the code used for
VFIO_SPAPR_TCE_v2_IOMMU.
Those helpers hide implementation details beneath the container object
and make the vfio_listener_region_add/del() implementations more
readable. No code change intended.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:06 +0000 (11:09 +0200)]
vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any
In the VFIO_SPAPR_TCE_v2_IOMMU container case, when
KVM_SET_DEVICE_ATTR fails, we currently don't propagate the
error as we do on the vfio_spapr_create_window() failure
case. Let's align the code. Take the opportunity to
reword the error message and make it more explicit.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Yi Liu [Mon, 9 Oct 2023 09:09:05 +0000 (11:09 +0200)]
vfio/common: Move IOMMU agnostic helpers to a separate file
Move low-level iommu agnostic helpers to a separate helpers.c
file. They relate to regions, interrupts, device/region
capabilities and etc.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Eric Auger [Mon, 9 Oct 2023 09:09:03 +0000 (11:09 +0200)]
scripts/update-linux-headers: Add iommufd.h
Update the script to import iommufd.h
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Paolo Bonzini [Tue, 17 Oct 2023 15:32:50 +0000 (17:32 +0200)]
configure: define "pkg-config" in addition to "pkgconfig"
Meson used to allow both "pkgconfig" and "pkg-config" entries in machine
files; the former was used for dependency lookup and the latter
was used as return value for "find_program('pkg-config')", which is a less
common use-case and one that QEMU does not need.
This inconsistency is going to be fixed by Meson 1.3, which will deprecate
"pkgconfig" in favor of "pkg-config" (the less common one, but it makes
sense because it matches the name of the binary). For backward
compatibility it is still allowed to define both, so do that in the
configure-generated machine file.
Related: https://github.com/mesonbuild/meson/pull/12385 Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 9 Oct 2023 12:03:56 +0000 (14:03 +0200)]
configure: unify handling of several Debian cross containers
The Debian and GNU architecture names match very often, even though
there are common cases (32-bit Arm or 64-bit x86) where they do not
and other cases in which the GNU triplet is actually a quadruplet.
But it is still possible to group the common case into a single
case inside probe_target_compiler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 16 Oct 2023 06:20:13 +0000 (08:20 +0200)]
configure: move environment-specific defaults to config-meson.cross
Store the -Werror and SMBD defaults in the machine file, which still allows
them to be overridden on the command line and enables automatic parsing
of the related options.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 18 Sep 2023 09:06:48 +0000 (11:06 +0200)]
configure: clean up plugin option handling
Keep together all the conditions that lead to disabling plugins, and
remove now-dead code.
Since the option was not in SKIP_OPTIONS, it was present twice in
the help message, both from configure and from meson-buildoptions.sh.
Remove the duplication and take the occasion to document the option as
autodetected, which it is.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Aug 2023 09:14:18 +0000 (11:14 +0200)]
hw/xen: cleanup sourcesets
xen_ss is added unconditionally to arm_ss and i386_ss (the only
targets that can have CONFIG_XEN enabled) and its contents are gated by
CONFIG_XEN; xen_specific_ss has no condition for its constituent files
but is gated on CONFIG_XEN when its added to specific_ss.
So xen_ss is a duplicate of xen_specific_ss, though defined in a
different way. Merge the two by eliminating xen_ss.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 5 Oct 2023 12:19:34 +0000 (14:19 +0200)]
meson, cutils: allow non-relocatable installs
Say QEMU is configured with bindir = "/usr/bin" and a firmware path
that starts with "/usr/share/qemu". Ever since QEMU 5.2, QEMU's
install has been relocatable: if you move qemu-system-x86_64 from
/usr/bin to /home/username/bin, it will start looking for firmware in
/home/username/share/qemu. Previously, you would get a non-relocatable
install where the moved QEMU will keep looking for firmware in
/usr/share/qemu.
Windows almost always wants relocatable installs, and in fact that
is why QEMU 5.2 introduced relocatability in the first place.
However, newfangled distribution mechanisms such as AppImage
(https://docs.appimage.org/reference/best-practices.html), and
possibly NixOS, also dislike using at runtime the absolute paths
that were established at build time.
On POSIX systems you almost never care; if you do, your usecase
dictates which one is desirable, so there's no single answer.
Obviously relocatability works fine most of the time, because not many
people have complained about QEMU's switch to relocatable install,
and that's why until now there was no way to disable relocatability.
But a non-relocatable, non-modular binary can help if you want to do
experiments with old firmware and new QEMU or vice versa (because you
can just upgrade/downgrade the firmware package, and use rpm2cpio or
similar to extract the QEMU binaries outside /usr), so allow both.
This patch allows one to build a non-relocatable install using a new
option to configure. Why? Because it's not too hard, and because
it helps the user double check the relocatability of their install.
Note that the same code that handles relocation also lets you run QEMU
from the build tree and pick e.g. firmware files from the source tree
transparently. Therefore that part remains active with this patch,
even if you configure with --disable-relocatable.
Suggested-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* tag 'migration-20231017-pull-request' of https://gitlab.com/juan.quintela/qemu: (38 commits)
migration/multifd: Clarify Error usage in multifd_channel_connect
migration/multifd: Unify multifd_send_thread error paths
migration/multifd: Remove direct "socket" references
migration/ram: Merge save_zero_page functions
migration/ram: Move xbzrle zero page handling into save_zero_page
migration/ram: Stop passing QEMUFile around in save_zero_page
migration/ram: Remove RAMState from xbzrle_cache_zero_page
migration/ram: Refactor precopy ram loading code
multifd: reset next_packet_len after sending pages
multifd: fix counters in multifd_send_thread
migration: check for rate_limit_max for RATE_LIMIT_DISABLED
migration: Improve json and formatting
migration/rdma: Remove all "ret" variables that are used only once
migration/rdma: Declare for index variables local
migration/rdma: Use i as for index instead of idx
migration/rdma: Check sooner if we are in postcopy for save_page()
migration/rdma: Remove qemu_ prefix from exported functions
migration/rdma: Move rdma constants from qemu-file.h to rdma.h
qemu-file: Remove QEMUFileHooks
migration/rdma: Create rdma_control_save_page()
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Thu, 5 Oct 2023 12:31:27 +0000 (14:31 +0200)]
meson: do not use set10
Make all items of config-host.h consistent. To keep the --disable-coroutine-pool
code visible to the compiler, mutuate the IS_ENABLED() macro from Linux.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When called from git-send-email, some results contain unclosed
parentheses from the subsystem title, for example:
(cc-cmd) Adding cc: qemu-ppc@nongnu.org (open list:PowerNV (Non-Virt...) from: 'scripts/get_maintainer.pl --nogit-fallback'
(cc-cmd) Adding cc: qemu-devel@nongnu.org (open list:All patches CC here) from: 'scripts/get_maintainer.pl --nogit-fallback'
Unmatched () '(open list:PowerNV (Non-Virt...)' '' at /usr/lib/git-core/git-send-email line 642.
error: unable to extract a valid address from: qemu-ppc@nongnu.org (open list:PowerNV (Non-Virt...)
What to do with this address? ([q]uit|[d]rop|[e]dit): d
This commit removes all parentheses from results.
Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231013091628.669415-1-manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thomas Huth [Mon, 16 Oct 2023 09:49:17 +0000 (11:49 +0200)]
scripts: Mark feature_to_c.py as non-executable to fix a build issue
Meson tries to run scripts via the shebang line if they files are
marked as executable. If "python3" is not in the $PATH, or if it
is a version that is too old, then the script execution fails.
We should make sure to run scripts via the python3 interpreter
that is used for Meson itself. For this, the files need to be marked
as non-executable, then meson will use the python3 binary that has
been used to run itself.
Fixes: 956af7daad ("gdbstub: Introduce GDBFeature structure") Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20231016094917.19044-1-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Tue, 10 Oct 2023 06:05:39 +0000 (02:05 -0400)]
target/i386/cpu: Fix CPUID_HT exposure
When explicitly booting a multiple vcpus vm with "-cpu +ht", it gets
warning of
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
Make CPUID_HT as supported unconditionally can resolve the warning.
However it introduces another issue that it also expose CPUID_HT to
guest when "-cpu host/max" with only 1 vcpu. To fix this, need mark
CPUID_HT as the no_autoenable_flags.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20231010060539.210258-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to https://peter.eisentraut.org/blog/2014/12/01/ccache-and-clang-part-3
it's already fixed in new version of ccache
According to https://ccache.dev/manual/4.8.html#config_run_second_cpp
CCACHE_CPP2 are default to true for new version ccache
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Message-ID: <20231009165113.498-1-luoyonggang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fabiano Rosas [Thu, 12 Oct 2023 13:43:43 +0000 (10:43 -0300)]
migration/multifd: Clarify Error usage in multifd_channel_connect
The function is currently called from two sites, one always gives it a
NULL Error and the other always gives it a non-NULL Error.
In the non-NULL case, all it does it trace the error and return. One
of the callers already have tracing, add a tracepoint to the other and
stop passing the error into the function.
Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-4-farosas@suse.de>
The preferred usage of the Error type is to always set both the return
code and the error when a failure happens. As all code called from the
send thread follows this pattern, we'll always have the return code
and the error set at the same time.
Aside from the convention, in this piece of code this must be the
case, otherwise the if (ret != 0) would be exiting the thread without
calling multifd_send_terminate_threads() which is incorrect.
Unify both paths to make it clear that both are taken when there's an
error.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-3-farosas@suse.de>
Fabiano Rosas [Wed, 11 Oct 2023 18:46:03 +0000 (15:46 -0300)]
migration/ram: Merge save_zero_page functions
We don't need to do this in two pieces. One single function makes it
easier to grasp, specially since it removes the indirection on the
return value handling.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-7-farosas@suse.de>
Nikolay Borisov [Wed, 11 Oct 2023 18:45:59 +0000 (15:45 -0300)]
migration/ram: Refactor precopy ram loading code
Extract the ramblock parsing code into a routine that operates on the
sequence of headers from the stream and another the parses the
individual ramblock. This makes ram_load_precopy() easier to
comprehend.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-3-farosas@suse.de>
Elena Ufimtseva [Wed, 11 Oct 2023 18:43:58 +0000 (11:43 -0700)]
multifd: reset next_packet_len after sending pages
Sometimes multifd sends just sync packet with no pages
(normal_num is 0). In this case the old value is being
preserved and being accounted for while only packet_len
is being transferred.
Reset it to 0 after sending and accounting for.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-5-elena.ufimtseva@oracle.com>
Elena Ufimtseva [Wed, 11 Oct 2023 18:43:57 +0000 (11:43 -0700)]
multifd: fix counters in multifd_send_thread
Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
"migration/multifd: Compute transferred bytes correctly"
removed accounting for packet_len in non-rdma
case, but the next_packet_size only accounts for pages, not for
the header packet (normal_pages * PAGE_SIZE) that is being sent
as iov[0]. The packet_len part should be added to account for
the size of MultiFDPacket and the array of the offsets.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-4-elena.ufimtseva@oracle.com>
Elena Ufimtseva [Wed, 11 Oct 2023 18:43:55 +0000 (11:43 -0700)]
migration: check for rate_limit_max for RATE_LIMIT_DISABLED
In migration rate limiting atomic operations are used
to read the rate limit variables and transferred bytes and
they are expensive. Check first if rate_limit_max is equal
to RATE_LIMIT_DISABLED and return false immediately if so.
Note that with this patch we will also will stop flushing
by not calling qemu_fflush() from migration_transferred_bytes()
if the migration rate is not exceeded.
This should be fine since migration thread calls in the loop
migration_update_counters from migration_rate_limit() that
calls the migration_transferred_bytes() and flushes there.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-2-elena.ufimtseva@oracle.com>
Juan Quintela [Fri, 13 Oct 2023 10:47:27 +0000 (12:47 +0200)]
migration: Improve json and formatting
Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231013104736.31722-2-quintela@redhat.com>
Juan Quintela [Wed, 11 Oct 2023 20:35:24 +0000 (22:35 +0200)]
migration/rdma: Check sooner if we are in postcopy for save_page()
Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-11-quintela@redhat.com>