www.infradead.org Git - users/dwmw2/qemu.git/log

i386/xen: Reserve Xen special pages for console, xenstore rings

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle PV timer hypercalls

Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (only oneshot)
2) vcpu_op hypercall for {set,stop}_{singleshot,periodic}_timer
hypercalls

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement GNTTABOP_query_size

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_verson

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support mapping grant frames

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_gnttab device for grant table emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

kvm/i386: Add xen-gnttab-max-frames property

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.

There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add monitor commands to test event injection

Specifically add listing, injection of event channels.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_reset

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_vcpu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_interdomain

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_alloc_unbound

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_send

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_ipi

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_bind_virq

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_unmask

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_close

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Implement EVTCHNOP_status

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Add support for Xen event channel delivery to vCPU

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_evtchn device for event channel emulation

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_param

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_evtchn_upcall_vector

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_event_channel_op

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_runstate_memory_area

Allow guest to setup the vcpu runstates which is used as
steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_time_info

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_info

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_vcpu_op

This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_hvm_op

This is when guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement XENMEM_add_to_physmap_batch

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_memory_op

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: manage and save/restore Xen guest long_mode setting

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Permit --xen-domid argument when accel is KVM

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Wooodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_overlay device for emulating shared xenheap pages

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

It allows to shutdown itself via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_xen_version

This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle guest hypercalls

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: allow its creation with XEN_EMULATE mode

The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W v.s. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.

Now it builds without actual Xen, move it to CONFIG_XEN_BUS to include it
in the KVM-only builds.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: exclude vfio-pci from the PCI platform unplug

Such that PCI passthrough devices work for Xen emulated guests.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Set Xen vCPU ID in KVM

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: handle Xen HVM cpuid leaves

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Add XEN_DISABLED mode and make it default

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fui).

It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).

The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

include: import Xen public headers to hw/xen/interface

There's already a partial set here; update them and pull in a more
complete set.

To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h, move to hw/xen/interface/]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJj7xKYAAoJEO8Ells5jWIRDZQH/Rao24sq3j97qE5RzekvANzq
# GnHUyLnl3yeOSNumv2BJInZTvgUpYL2etGQr3DtGRwOrr7w1vKB3zhY3V3jQefkh
# f4rsEGkamL/qM2N2cGUIUSqevo7OGnP8aQojpEi4MWWZ30B3L6jqd4NqyA1gyndV
# 1eBkpR+BY2PjcLbgvFUZEXeAn/vapE5NKULXUGhg5mMvgwYH3CgZXpqqkxr876za
# S4rZMtReXKNeid14Z35SUjJdV2WKYmo/lN9+GQxF2YNLmDC3RtuFQVm038erSqvs
# uLVSg8tiIlCyOcSDpR/BARNrxVwzlJp5X6ocapHubS/i0Rp/Zo7ezSk/XWH1gfU=
# =UbzF
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Feb 2023 05:37:28 GMT
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu:
  vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check
  net: stream: add a new option to automatically reconnect
  vmnet: stop recieving events when VM is stopped
  net: Increase L2TPv3 buffer to fit jumboframes
  hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value
  hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort
  net: Replace "Supported NIC models" with "Available NIC models"
  net: Restore printing of the help text with "-nic help"
  net: Move the code to collect available NIC models to a separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pr-2023-02-16' of https://gitlab.com/a1xndr/qemu into staging

Replace fork-based fuzzing with reboots.
Now the fuzzers will reboot the guest between inputs.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE+tTiv4cTddY0BRfETmYd3lg6lk4FAmPu/LoACgkQTmYd3lg6
# lk6RHg/7BRGI5ZPXb1MmTNCC+SroQ6TT++lO4b0hbkN2HO6U+WVvfuA6+0wg+8qC
# 4bp+G1Tabpcq1MTYUuim6DBtWswgpqr0AbWNwn1eF7hya+3W9woH2POVYY2wwc7m
# S3EdwXCCKo9gGXlaNrotnbwIk+o8B4BzXOXLIlRtg26wGYhT5fkJA/BQcHKDXz37
# ctyWxlyjIM8pNCgfybMvjC7MYtp8DufPsv/rrKx9t0TM7f1jPVgXLek7t0+ZwjeY
# qz2Om2jiij1INgK9hTieWs4eHwpwre6vH2a+JKRkZ3sS7WYcj1auNKVJb3GvDqmc
# wy+Nz5Lz4+aPP19pkCYjfz5w3CqEEsSlSDn5UVRbfl2fbENSceoNwo9huMXsF1pB
# oO6NK2NxbOygmNpYxp+JEt45KFIXzUcIFQwbn8aCDODIl+0H2yu7/ll6XgELf1Pa
# P83THOaVxIxfcI9VOdt/FwDq1ZzmV5nk/BkIGJeIWNYMbU4Gze6YoaL3U8AHDxKH
# f6f3qDzcVJjqD0wKhvYcQ3kSPq+vHc/ioh6mYwos6VUEVYz/SLOY876MaSB/K4PE
# ofBV7y6HvJ6AMwg1TBg4YtOP08gWK+4sYH+I09oU40U3UcwEpkbkQTF72lPQHxFs
# 8UVRJrgWv/xzrwzXTX5ruQ633F8zuhqQTeERqksj1pPHJ3NdHps=
# =F6qI
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Feb 2023 04:04:10 GMT
# gpg:                using RSA key FAD4E2BF871375D6340517C44E661DDE583A964E
# gpg: Good signature from "Alexander Bulekov <alxndr@bu.edu>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FAD4 E2BF 8713 75D6 3405  17C4 4E66 1DDE 583A 964E

* tag 'pr-2023-02-16' of https://gitlab.com/a1xndr/qemu:
  docs/fuzz: remove mentions of fork-based fuzzing
  fuzz: remove fork-fuzzing scaffolding
  fuzz/i440fx: remove fork-based fuzzer
  fuzz/virtio-blk: remove fork-based fuzzer
  fuzz/virtio-net: remove fork-based fuzzer
  fuzz/virtio-scsi: remove fork-based fuzzer
  fuzz/generic-fuzz: add a limit on DMA bytes written
  fuzz/generic-fuzz: use reboots instead of forks to reset state
  fuzz: add fuzz_reset API
  hw/sparse-mem: clear memory on reset

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'vfio-updates-20230216.0' of https://gitlab.com/alex.williamson/qemu into staging

VFIO updates 2023-02-16

* Initial v2 migration support for vfio (Avihai Horon)

* Add Cédric as vfio reviewer (Cédric Le Goater)

# -----BEGIN PGP SIGNATURE-----
#
# iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmPumhUbHGFsZXgud2ls
# bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsijnMP/0Rz/lsGxym76mXtr5WY
# OR5SDFpifpaUVi+1xTugYFPnZvN+RdnlcLrcp1g8G+lmd4ANqwT0b9XTTTI8WTau
# DhSHW/05WgAOrf/jOSV29oNSf7jtGJZcDbAy8f5NXxwK/IRlJEDJfCaqxwYSyYf1
# nfC0ZwMTrBrA6pzF5OzIJSkhl/uPwlTsBxRnbN86Z22rE128ASjUtj1jir4rPLg0
# ClUn7Rrdk/Y6uXIB9c6TFC+wmG0QAVsklWIeNLUFWUak4H0gqp7AUmMlJV99i5Q7
# 3H4Zjspwn79llvGm4X1QpuLaop2QaIQaW4FTpzRSftelEosjIjkTCMrWTb4MKff1
# cgT0dmC1Hht+zQ0MPbmgeaiwPH/V7r+J9GffG6p2b4itdHmrKVsqKQMSQS/IJFBw
# eiO1rENRXNcTnC29jPUhe1IS1DEwCNkWm9NgJoC5WPJYQXsiEvo4YDH/30FnByXg
# KQdd5OxR7o6qJM5e4PUn4wd9sHsYU8IsIEJdKnynoS9qUdPqv0tJ+tLYWcBhQPJq
# M8R+mDwImMzw0bgurg4607VgL9HJEXna2rgdd5hcMq88M+M5OpmowXlk4TTY4Ha9
# lmWSndYJG6npNY4NXcxbe4x5H8ndvHcO+g3weynsxPFjnL959NzQyWNFXFDBqBg3
# fhNVqYTrMOcEN5uv18o+mnsG
# =oK7/
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 16 Feb 2023 21:03:17 GMT
# gpg:                using RSA key 42F6C04E540BD1A99E7B8A90239B9B6E3BB08B22
# gpg:                issuer "alex.williamson@redhat.com"
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex@shazbot.org>" [full]
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>" [full]
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* tag 'vfio-updates-20230216.0' of https://gitlab.com/alex.williamson/qemu:
  MAINTAINERS: Add myself as VFIO reviewer
  docs/devel: Align VFIO migration docs to v2 protocol
  vfio: Alphabetize migration section of VFIO trace-events file
  vfio/migration: Remove VFIO migration protocol v1
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Block multiple devices migration
  vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
  vfio/migration: Allow migration without VFIO IOMMU dirty tracking support
  vfio/migration: Fix NULL pointer dereference bug
  linux-headers: Update to v6.2-rc8

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-virtiofs-20230216b' of https://gitlab.com/dagrh/qemu into staging

Remove C virtiofsd

We deprecated the C virtiofsd in commit 34deee7b6a1418f3d62a
in v7.0 in favour of the Rust implementation at

  https://gitlab.com/virtio-fs/virtiofsd

since then, the Rust version has had more development and
has held up well.  It's time to say goodbye to the C version
that got us going.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmPudNkACgkQBRYzHrxb
# /ed2ZBAAlz+bjTIoWjJr5/5nSjydd5ucARYDX4n0PI2byVDHFVaUCTIi+rLrxCYs
# qnb7HVmQW4y6zy15sM1xsbqSyrVgqvDheAJPWVekkoIuVT1t6wVpAZ7sykwx8U1I
# ge2T6pXcc4xKptyGnMAB0v0T5r9hN2Peghg3/KBn6WSD1WM2rD6KmvU4untYOeST
# I/KeoEDc4WUHtaPoDIduQlEcxGKJft6ifS0ksL0Jlf06aDg9UXxcuovtN6GtmnD2
# oNPYR0qG6aN4FynTrcVBN38N3cEosdCtEK0kgvbxulnQ4Iwxyi9hBkvUJA3UmjQ/
# THkWa9Gl+bFTGfNFxUEBV+0bBI46MFn2zXmpatPeV6NvKhiaDi4DDUczueUH1+s+
# C5KWYN3LuDznmM2NQzFipG1NtP2tif6wM2dYTOHf62n4UZBSe0xSdM1OKwqKXQnN
# w5TPlZEvnaYY7vz2fjDlnLKAD9WxlxvMYjr/eJrrjDPSWGxAoe59q0nXBlzXi1Bl
# 6GcCqt/GQpLbY9X2l2pb1bvFOZcPtPZ6CiLBCslKZ5MxmiCvZWnJQ2ZHe9ccQeUX
# 22wWB5gkvWz/1bPddQR7JJ48HxBEPRd4aZ93A3jJfZqWCaTaHQ6bZboghVywMbXJ
# P0wkwaXsFshcyuZfus/dq61y+jsIVR3EyxxRMxd2rO6Mg6nvTcs=
# =0FYt
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 16 Feb 2023 18:24:25 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* tag 'pull-virtiofs-20230216b' of https://gitlab.com/dagrh/qemu:
  virtiofsd: Swing deprecated message to removed-features
  virtiofsd: Remove source
  virtiofsd: Remove build and docs glue
  virtiofsd: Remove test

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- configure: Enable -Wthread-safety if present
- no_co_wrapper to fix bdrv_open*() calls from coroutine context
- curl fixes, including enablement of newer libcurl versions
- MAINTAINERS: drop Vladimir from parallels block driver
- hbitmap: fix hbitmap_status() return value for first dirty bit case
- file-posix: Fix assertion failure in write_zeroes after moving
  bdrv_getlength() to co_wrapper

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmPvgm0RHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9ZxQg//ZWwwh/s/P1PnKAjInNZZNklAWKThNEbZ
# cF1S94w26IhEQqM0i6MflqcDsPU5t4xZtBUOizx++9M4G8amWnomJSdczUcKULla
# Az9yweFC1Gu6ENdw+ql5VOzCfpdH5Bn9Jkly5fxuI4vmnBz1PH1Dnd3P4wuLq2sL
# xna5dijVEhRc5mTKWjbp4nFfvQhucuEBPSNjgnZwEPbhciWxTMmB1GmyRvTxZy8v
# UY8PcoTlxdKeVQ6DTmkOirphpGj7HeNCEQnZppWs7vHys2oGi9kmR5qTKUNZxGrY
# 8yWiCiVDqbb50fhEC1srhph79bCij87QC1N33Bm+NuGjnjG4bKVx2B9DC8+6S/JS
# e3x6u+r0dd6/t0rjKnt1+inYqmM+i5lBJ7+R0yhWUQ+DYkvttNf5yiotD8qvccWJ
# Kcx14lfjPLK7siAMEY5K0bNMimhN4RR9oCLoPTOHei+vlxdfiMm2XPN61NNht5gD
# lYZ8JMBsEF/o2ebqTgsJrIHS+Q/8MqcwSunBc54fcXZoF+eiza3W2ArXLNfAEfGE
# U4JowNK2PrTIrpEjD+Vs0RsBBSmN5PcYIAz04ioODpDnYMq73/t3x9MKdVoxOT64
# AM7w58fSyWu8iwvkeA0d3XeVtSHFqZ49PqqIem4IegtnC/AXMUNrJ/VT99xHjeJY
# oLhOJz7LUg0=
# =FtaA
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Feb 2023 13:34:37 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (22 commits)
  hbitmap: fix hbitmap_status() return value for first dirty bit case
  block/file-posix: don't use functions calling AIO_WAIT_WHILE in worker threads
  MAINTAINERS: drop Vladimir from parallels block driver
  block: temporarily hold the new AioContext of bs_top in bdrv_append()
  block: Handle curl 7.55.0, 7.85.0 version changes
  block: Assert non-coroutine context for bdrv_open_inherit()
  block: Fix bdrv_co_create_opts_simple() to open images with no_co_wrapper
  vpc: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  vmdk: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  vhdx: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  vdi: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  qed: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  qcow2: Fix open/create to open images with no_co_wrapper
  qcow: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  parallels: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  luks: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper
  block: Create no_co_wrappers for open functions
  block-coroutine-wrapper: Introduce no_co_wrapper
  curl: Fix error path in curl_open()
  configure: Enable -Wthread-safety if present
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hbitmap: fix hbitmap_status() return value for first dirty bit case

The last return statement should return true, as we already evaluated that
start == next_dirty

Also, fix hbitmap_status() description in header

Cc: qemu-stable@nongnu.org
Fixes: a6426475a75 ("block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Message-Id: <20230202181523.423131-1-andrey.zhadchenko@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/file-posix: don't use functions calling AIO_WAIT_WHILE in worker threads

When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the
function creates a new coroutine and then waits that it finishes using
AIO_WAIT_WHILE.
The problem is that this function could also run in a worker thread,
that has a different AioContext from main loop and iothreads, therefore
in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false
and therefore
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
in the else branch will fail, crashing QEMU.

Aside from that, bdrv_getlength() is wrong also conceptually, because
it reads the BDS graph from another thread and is not protected by
any lock.

Replace it with raw_co_getlength, that doesn't create a coroutine and
doesn't read the BDS graph.

Reported-by: Ninad Palsule <ninad@linux.vnet.ibm.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230209154522.1164401-1-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

MAINTAINERS: drop Vladimir from parallels block driver

I have to admit this is out of my scope now. Still feel free to Cc me
directly if my help is needed :)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230214182848.1564714-1-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: temporarily hold the new AioContext of bs_top in bdrv_append()

bdrv_append() is called with bs_top AioContext held, but
bdrv_attach_child_noperm() could change the AioContext of bs_top.

bdrv_replace_node_noperm() calls bdrv_drained_begin() starting from
commit 2398747128 ("block: Don't poll in bdrv_replace_child_noperm()").
bdrv_drained_begin() can call BDRV_POLL_WHILE that assumes the new lock
is taken, so let's temporarily hold the new AioContext to prevent QEMU
from failing in BDRV_POLL_WHILE when it tries to release the wrong
AioContext.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2168209
Reported-by: Aihua Liang <aliang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230214171621.11574-1-sgarzare@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Handle curl 7.55.0, 7.85.0 version changes

* 7.55.0 deprecates CURLINFO_CONTENT_LENGTH_DOWNLOAD in favour of a *_T
  version, which returns curl_off_t instead of a double.
* 7.85.0 deprecates CURLOPT_PROTOCOLS and CURLOPT_REDIR_PROTOCOLS in
  favour of *_STR variants, specifying the desired protocols via a
  string.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1440
Signed-off-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230123201431.23118-1-anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Assert non-coroutine context for bdrv_open_inherit()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-14-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix bdrv_co_create_opts_simple() to open images with no_co_wrapper

bdrv_co_create_opts_simple() runs in a coroutine. Therefore it is not
allowed to open images directly. Fix the call to use the corresponding
no_co_wrapper instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-13-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vpc: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-12-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vmdk: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-11-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vhdx: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-10-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vdi: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-9-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qed: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-8-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Fix open/create to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine, as does
qcow2_do_open(). Therefore they are not allowed to open images directly.
Fix the calls to use the corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-7-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-6-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

parallels: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-5-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

luks: Fix .bdrv_co_create(_opts) to open images with no_co_wrapper

.bdrv_co_create implementations run in a coroutine. Therefore they are
not allowed to open images directly. Fix the calls to use the
corresponding no_co_wrappers instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-4-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Create no_co_wrappers for open functions

Images can't be opened in coroutine context because opening needs to
change the block graph. Add no_co_wrappers so that coroutines have a
simple way of opening images in a BH instead.

At the same time, mark the wrapped functions as no_coroutine_fn.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-3-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-coroutine-wrapper: Introduce no_co_wrapper

Some functions must not be called from coroutine context. The common
pattern to use them anyway from a coroutine is running them in a BH and
letting the calling coroutine yield to be woken up when the BH is
completed.

Instead of manually writing such wrappers, add support for generating
them to block-coroutine-wrapper.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230126172432.436111-2-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

curl: Fix error path in curl_open()

g_hash_table_destroy() and g_hash_table_foreach_remove() (called by
curl_drop_all_sockets()) both require the table to be non-NULL, or will
print assertion failures (just print, no abort).

There are several paths in curl_open() that can lead to the out_noclean
label without s->sockets being allocated, so clean it only if it has
been allocated.

Example reproducer:
$ qemu-img info -f http ''
qemu-img: GLib: g_hash_table_foreach_remove: assertion 'hash_table != NULL' failed
qemu-img: GLib: g_hash_table_destroy: assertion 'hash_table != NULL' failed
qemu-img: Could not open '': http curl driver cannot handle the URL '' (does not start with 'http://')

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1475
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230206132949.92917-1-hreitz@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

configure: Enable -Wthread-safety if present

This enables clang's thread safety analysis (TSA), which we'll use to
statically check the block graph locking.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-9-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230117135203.3049709-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

bsd-user/mmap: use TSA_NO_TSA to suppress clang TSA warnings in FreeBSD

FreeBSD implements pthread headers using TSA (thread safety analysis)
annotations, therefore when an application is compiled with
-Wthread-safety there are some locking/annotation requirements that the
user of the pthread API has to follow.

This will also be the case in QEMU, since bsd-user/mmap.c uses the
pthread API. Therefore when building it with -Wthread-safety the
compiler will throw warnings because the functions are not properly
annotated. We need TSA to be enabled because it ensures that the
critical sections of an annotated variable are properly locked.

In order to make the compiler happy and avoid adding all the necessary
macros to all callers (lock functions should use TSA_ACQUIRE, while
unlock TSA_RELEASE, and this applies to all users of pthread_mutex_lock
and pthread_mutex_unlock), simply use TSA_NO_TSA to supppress such
warnings.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230117135203.3049709-3-eesposit@redhat.com>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

util/qemu-thread-posix: use TSA_NO_TSA to suppress clang TSA warnings in FreeBSD

FreeBSD implements pthread headers using TSA (thread safety analysis)
annotations, therefore when an application is compiled with
-Wthread-safety there are some locking/annotation requirements that the
user of the pthread API has to follow.

This will also be the case in QEMU, since util/qemu-thread-posix.c uses
the pthread API. Therefore when building it with -Wthread-safety, the
compiler will throw warnings because the functions are not properly
annotated. We need TSA to be enabled because it ensures that the
critical sections of an annotated variable are properly locked.

In order to make the compiler happy and avoid adding all the necessary
macros to all callers (lock functions should use TSA_ACQUIRE, while
unlock TSA_RELEASE, and this applies to all users of pthread_mutex_lock
and pthread_mutex_unlock), simply use TSA_NO_TSA to supppress such
warnings.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230117135203.3049709-2-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check

VHOST_BACKEND_F_IOTLB_ASID is the feature bit, not the bitmask. Since
the device under test also provided VHOST_BACKEND_F_IOTLB_MSG_V2 and
VHOST_BACKEND_F_IOTLB_BATCH, this went unnoticed.

Fixes: c1a1008685 ("vdpa: always start CVQ in SVQ mode if possible")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net: stream: add a new option to automatically reconnect

In stream mode, if the server shuts down there is currently
no way to reconnect the client to a new server without removing
the NIC device and the netdev backend (or to reboot).

This patch introduces a reconnect option that specifies a delay
to try to reconnect with the same parameters.

Add a new test in qtest to test the reconnect option and the
connect/disconnect events.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vmnet: stop recieving events when VM is stopped

When the VM is stopped using the HMP command "stop", soon the handler will
stop reading from the vmnet interface. This causes a flood of
`VMNET_INTERFACE_PACKETS_AVAILABLE` events to arrive and puts the host CPU
at 100%. We fix this by removing the event handler from vmnet when the VM
is no longer in a running state and restore it when we return to a running
state.

Signed-off-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net: Increase L2TPv3 buffer to fit jumboframes

Increase the allocated buffer size to fit larger packets.
Given that jumboframes can commonly be up to 9000 bytes the closest suitable
value seems to be 16 KiB.

Tested by running qemu towards a Linux L2TPv3 endpoint and pushing
jumboframe traffic through the interfaces.

Signed-off-by: Christian Svensson <blue@cmd.nu>
Signed-off-by: Jason Wang <jasowang@redhat.com>

hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value

Currently, VMXNET3_MAX_MTU itself (being 9000) is not considered a
valid value for the MTU, but a guest running ESXi 7.0 might try to
set it and fail the assert [0].

In the Linux kernel, dev->max_mtu itself is a valid value for the MTU
and for the vmxnet3 driver it's 9000, so a guest running Linux will
also fail the assert when trying to set an MTU of 9000.

VMXNET3_MAX_MTU and s->mtu don't seem to be used in relation to buffer
allocations/accesses, so allowing the upper limit itself as a value
should be fine.

[0]: https://forum.proxmox.com/threads/114011/

Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during activate (CVE-2021-20203)")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort

This patch replaces hw_error to guest error log for [read|write]b
accesses when mode_16bit is enabled. This avoids aborting qemu.

Fixes: 1248f8d4cbc3 ("hw/lan9118: Add basic 16-bit mode support.")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1433
Reported-by: Qiang Liu <cyruscyliu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Qiang Liu <cyruscyliu@gmail.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net: Replace "Supported NIC models" with "Available NIC models"

Just because a NIC model is compiled into the QEMU binary does not
necessary mean that it can be used with each and every machine.
So let's rather talk about "available" models instead of "supported"
models, just to avoid confusion.

Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net: Restore printing of the help text with "-nic help"

Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions
(it showed the available netdev backends), but this feature got broken during
some refactoring in version 6.0. Let's restore the old behavior, and while
we're at it, let's also print the available NIC models here now since this
option can be used to configure both, netdev backend and model in one go.

Fixes: ad6f932fe8 ("net: do not exit on "netdev_add help" monitor command")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net: Move the code to collect available NIC models to a separate function

The code that collects the available NIC models is not really specific
to PCI anymore and will be required in the next patch, too, so let's
move this into a new separate function in net.c instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

docs/fuzz: remove mentions of fork-based fuzzing

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz: remove fork-fuzzing scaffolding

Fork-fuzzing provides a few pros, but our implementation prevents us
from using fuzzers other than libFuzzer, and may be causing issues such
as coverage-failure builds on OSS-Fuzz. It is not a great long-term
solution as it depends on internal implementation details of libFuzzer
(which is no longer in active development). Remove it in favor of other
methods of resetting state between inputs.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/i440fx: remove fork-based fuzzer

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/virtio-blk: remove fork-based fuzzer

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/virtio-net: remove fork-based fuzzer

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/virtio-scsi: remove fork-based fuzzer

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/generic-fuzz: add a limit on DMA bytes written

As we have repplaced fork-based fuzzing, with reboots - we can no longer
use a timeout+exit() to avoid slow inputs. Libfuzzer has its own timer
that it uses to catch slow inputs, however these timeouts are usually
seconds-minutes long: more than enough to bog-down the fuzzing process.
However, I found that slow inputs often attempt to fill overly large DMA
requests. Thus, we can mitigate most timeouts by setting a cap on the
total number of DMA bytes written by an input.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz/generic-fuzz: use reboots instead of forks to reset state

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

fuzz: add fuzz_reset API

As we are converting most fuzzers to rely on reboots to reset state,
introduce an API to make sure reboots are invoked in a consistent
manner.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>

hw/sparse-mem: clear memory on reset

We use sparse-mem for fuzzing. For long-running fuzzing processes, we
eventually end up with many allocated sparse-mem pages. To avoid this,
clear the allocated pages on system-reset.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

MAINTAINERS: Add myself as VFIO reviewer

To show my interest in the VFIO susbsystem, let's start reviewing code.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20230119185736.616664-1-clg@kaod.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

docs/devel: Align VFIO migration docs to v2 protocol

Now that VFIO migration protocol v2 has been implemented and v1 protocol
has been removed, update the documentation according to v2 protocol.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-12-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Alphabetize migration section of VFIO trace-events file

Sort the migration section of VFIO trace events file alphabetically
and move two misplaced traces to common.c section.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-11-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/migration: Remove VFIO migration protocol v1

Now that v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-10-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/migration: Implement VFIO migration protocol v2

Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
  ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
  added later on.

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>.
Link: https://lore.kernel.org/r/20230216143630.25610-9-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/migration: Rename functions/structs related to v1 protocol

To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-8-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>