www.infradead.org Git - users/dwmw2/qemu.git/log

printf debugging for intack

Notify IRQ sources of level interrupt ack/EOI
This allows drivers to register a callback on a qemu_irq, which is
invoked when a level-triggered IRQ is acked on the irqchip.

This allows us to simulate level-triggered interrupts more efficiently,
by resampling the state of the interrupt only when it actually matters.

This can be used in two ways.

The example in the patch below shows the event source literally being
resampled from the callback — in this case the line level is tied to
the Xen evtchn_upcall_pending flag in vCPU0's vcpu_info, and the
callback from the irqchip allows us to avoid having to constantly poll
for that being clearer).

As we hook it up to INTx interrupts on VFIO PCI devices, it would
unconditionally return 'true' to clear the level in the irqchip, and
also send an event on the 'resample' eventfd to the kernel, so that the
kernel will reraise the interrupt if it's still actually physically set
on the device.

There's theoretically a race condition there, if the kernel reraises
the interrupt before the callback even returns and the irqchip clears
its internal s->irr. But I think we get away with it by being single-
threaded for the I/O processing so we won't actually consume the event
until later?

It was the Xen part firt that offended me, having to poll on vmexit:
https://git.infradead.org/users/dwmw2/qemu.git/commitdiff/7bada5e4f#patch5

But then I looked at how VFIO handles this, and it offends me even
more; sending the resample eventfd down to the kernel on *ever* MMIO
read/write.... having unmapped the device's BARs from the guest in
order to *trap* those MMIO accesses... with a timer to map it back
again...

It'll take a little more work to hook up the reverse path for the ack
back through PCI INTx handling, a bit like I've had to do it with
gsi_ack_handler to convey the ack events back from the {i8259,ioapic}
qemu_irq to the GSI qemu_irqs. And I'll need to do it for more than
just i8259 and ioapic. But I suspect it'll be worth it.

Opinions?

Tested by booting a (KVM) Xen guest with xen_no_vector_callback on its
command line, and sometimes also 'apic=off'. In PIC mode we still seem
to get two interrupts per event, but I think that's actually genuine
because printfs in the evtchn code confirm that ->evtchn_upcall_pending
for vCPU0 really *is* still set the first time the interrupt is acked
in the i8259 and genuinely doesn't get cleared.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Initialize XenBus and legacy backends from pc_init1()

Now that we're close to being able to use the PV backends without actual
Xen, move the bus instantiation out from xen_hvm_init_pc() to pc_init1().

However, still only do it for (xen_mode == XEN_ATTACH) (i.e. when running
on true Xen) because we don't have XenStore ops for emulation yet, and
the XenBus instantiation failure is fatal. Once we have a functional
XenStore for emulated mode, this will become (xen_mode != XEN_DISABLED).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Remove old version of Xen headers

For Xen emulation support, a full set of Xen headers was imported to
include/standard-headers/xen. This renders the original partial set in
include/hw/xen/interface redundant.

Now that the header ordering and handling of __XEN_INTERFACE_VERSION__
vs. __XEN_TOOLS__ is all untangled, remove the old set of headers and
use only the new ones.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement soft reset for emulated gnttab

This is only part of it; we will also need to get the PV back end drivers
to tear down their own mappings (or do it for them, but they kind of need
to stop using the pointers too).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add backend implementation of grant table operations

This is limited to mapping a single grant at a time, because under Xen the
pages are mapped *contiguously* into qemu's address space, and that's very
hard to do when those pages actually come from anonymous mappings in qemu
in the first place.

Eventually perhaps we can look at using shared mappings of actual objects
for system RAM, and then we can make new mappings of the same backing
store (be it deleted files, shmem, whatever). But for now let's stuck to
a page at a time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Build PV backend drivers for XENFV_MACHINE

Now that we have the redirectable Xen backend operations we can build the
PV backends even without the Xen libraries.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Rename xen_common.h to xen_native.h

This header is now only for native Xen code, not PV backends that may be
used in Xen emulation. Since the toolstack libraries may depend on the
specific version of Xen headers that they pull in (and will set the
__XEN_TOOLS__ macro to enable internal definitions that they depend on),
the rule is that xen_native.h (and thus the toolstack library headers)
must be included *before* any of the headers in standard-headers/xen.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Use XEN_PAGE_SIZE in PV backend drivers

XC_PAGE_SIZE comes from the actual Xen libraries, while XEN_PAGE_SIZE is
provided by QEMU itself in xen_backend_ops.h. For backends which may be
built for emulation mode, use the latter.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Move xenstore_store_pv_console_info to xen_console.c

There's no need for this to be in the Xen accel code, and as we want to
use the Xen console support with KVM-emulated Xen we'll want to have a
platform-agnostic version of it. Make it use GString to build up the
path while we're at it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xenstore operations to allow redirection to internal emulation

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add foreignmem operations to allow redirection to internal emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>

hw/xen: Pass grant ref to gnttab unmap

Under real Xen we only need to munmap() but in the emulation code will want
to know the original ref to keep the the tracking straight.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add gnttab operations to allow redirection to internal emulation

In emulation, mapping more than one grant ref to be virtually contiguous
would be fairly difficult. The best way to do it might be to make the
ram_block mappings actually backed by a file (shmem or a deleted file,
perhaps) so that we can have multiple *shared* mappings of it. But that
would be fairly intrusive.

Making the backend drivers cope with page *lists* instead of expecting
the mapping to be contiguous is also non-trivial, since some structures
would actually *cross* page boundaries (e.g. the 32-bit blkif responses
which are 12 bytes).

So for now, we'll support only single-page mappings in emulation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>

hw/xen: Add emulated evtchn ops

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add evtchn operations to allow redirection to internal emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>

hw/xen: Add basic ring handling to xenstore

Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.

In the first instance we're likely to wire this up over a UNIX socket to
an actual xenstored implementation, but in the fullness of time it would
be nice to have a fully single-tenant "virtual" xenstore within qemu.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_xenstore device for xenstore emulation

Just the basic shell, with the event channel hookup. It only dumps the
buffer for now; a real ring implmentation will come in a subsequent patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add backend implementation of interdomain event channel support

The provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.

The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.

For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle HVMOP_get_param

Which is used to fetch xenstore PFN and port to be used
by the guest. This is preallocated by the toolstack when
guest will just read those and use it straight away.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Reserve Xen special pages for console, xenstore rings

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle PV timer hypercalls

Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (only oneshot)
2) vcpu_op hypercall for {set,stop}_{singleshot,periodic}_timer
hypercalls

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement GNTTABOP_query_size

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_verson

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support mapping grant frames

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_gnttab device for grant table emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

kvm/i386: Add xen-gnttab-max-frames property

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: add monitor commands to test event injection

Specifically add listing, injection of event channels.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_reset

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_vcpu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_interdomain

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_alloc_unbound

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_send

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_ipi

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_virq

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_unmask

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_close

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_status

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Add support for Xen event channel delivery to vCPU

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_evtchn device for event channel emulation

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HVMOP_set_param

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HVMOP_set_evtchn_upcall_vector

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_event_channel_op

Additionally set XEN_INTERFACE_VERSION to most recent in order to
exercise the "new" event_channel_op.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle VCPUOP_register_runstate_memory_area

Allow guest to setup the vcpu runstates which is used as
steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle VCPUOP_register_vcpu_time_info

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle VCPUOP_register_vcpu_info

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HYPERVISOR_vcpu_op

This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HYPERVISOR_hvm_op

This is when guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement XENMEM_add_to_physmap_batch

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HYPERVISOR_memory_op

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: manage and save/restore Xen guest long_mode setting

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_overlay device for emulating shared xenheap pages

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

It allows to shutdown itself via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: implement HYPERVISOR_xen_version

This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle guest hypercalls

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

xen-platform: allow its creation with XEN_EMULATE mode

The only thing we need to handle on KVM side is to change the
pfn from R/W to R/O.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

xen-platform: exclude vfio-pci from the PCI platform unplug

Such that PCI passthrough devices work for Xen emulated guests.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/hvm: Set Xen vCPU ID in KVM

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: handle Xen HVM cpuid leaves

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Add XEN_DISABLED mode and make it default

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: add CONFIG_XENFV_MACHINE and CONFIG_XEN_EMU options for Xen emulation

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators. It will also cover
the support for architecture-independent grant table and event channel
support which will be added in hw/i386/kvm/ (on the basis that the
non-KVM support is very theoretical and making it not use KVM directly
seems like gratuitous overengineering at this point).

The XENFV_MACHINE option is for the xenfv platform support, which will
now be used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XENFV_MACHINE over time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

include: import Xen public headers to include/standard-headers/

There are already some partial headers in include/hw/xen/interface/
which will be removed once we migrate users to the new location.

To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'pull-request-2023-01-09' of https://gitlab.com/thuth/qemu into staging

* s390x header clean-ups from Philippe
* Rework and improvements of the EINTR handling by Nikita
* Deprecate the -no-hpet command line option
* Disable the qtests in the 32-bit Windows CI job again
* Some other misc fixes here and there

# gpg: Signature made Mon 09 Jan 2023 14:21:19 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2023-01-09' of https://gitlab.com/thuth/qemu:
  .gitlab-ci.d/windows: Do not run the qtests in the msys2-32bit job
  error handling: Use RETRY_ON_EINTR() macro where applicable
  Refactoring: refactor TFR() macro to RETRY_ON_EINTR()
  docs/interop: Change the vnc-ledstate-Pseudo-encoding doc into .rst
  i386: Deprecate the -no-hpet QEMU command line option
  tests/qtest/bios-tables-test: Replace -no-hpet with hpet=off machine parameter
  tests/readconfig: spice doesn't support unix socket on windows yet
  target/s390x: Restrict sysemu/reset.h to system emulation
  target/s390x/tcg/excp_helper: Restrict system headers to sysemu
  target/s390x/tcg/misc_helper: Remove unused "memory.h" include
  hw/s390x/pv: Restrict Protected Virtualization to sysemu
  exec/memory: Expose memory_region_access_valid()
  MAINTAINERS: Add MIPS-related docs and configs to the MIPS architecture section
  tests/vm: Update get_default_jobs() to work on non-x86_64 non-KVM hosts
  qemu-iotests/stream-under-throttle: do not shutdown QEMU

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

.gitlab-ci.d/windows: Do not run the qtests in the msys2-32bit job

The qtests are not stable in the msys2-32bit job yet - especially
the test-hmp and the qom-test are failing randomly. Until this is
fixed, let's better disable the qtests here again to avoid failing
CI tests.

Message-Id: <20230105204819.26992-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

error handling: Use RETRY_ON_EINTR() macro where applicable

There is a defined RETRY_ON_EINTR() macro in qemu/osdep.h
which handles the same while loop.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415
Signed-off-by: Nikita Ivanov <nivanov@cloudlinux.com>
Message-Id: <20221023090422.242617-3-nivanov@cloudlinux.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[thuth: Dropped the hunk that changed socket_accept() in libqtest.c]
Signed-off-by: Thomas Huth <thuth@redhat.com>

Refactoring: refactor TFR() macro to RETRY_ON_EINTR()

Rename macro name to more transparent one and refactor
it to expression.

Signed-off-by: Nikita Ivanov <nivanov@cloudlinux.com>
Message-Id: <20221023090422.242617-2-nivanov@cloudlinux.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

docs/interop: Change the vnc-ledstate-Pseudo-encoding doc into .rst

The file seems to contain perfectly valid rst syntax already, so
rename it to .rst and wire it up in the index.

Message-Id: <20221213101806.46640-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

i386: Deprecate the -no-hpet QEMU command line option

The HPET setting has been turned into a machine property a while ago
already, so we should finally do the next step and deprecate the
legacy CLI option, too.

Message-Id: <20221229114913.260400-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest/bios-tables-test: Replace -no-hpet with hpet=off machine parameter

We are going to deprecate (and finally remove later) the -no-hpet command
line option. Prepare the bios-tables-test by using the replacement hpet=off
machine parameter instead.

Message-Id: <20230109081205.116369-1-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/readconfig: spice doesn't support unix socket on windows yet

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230103110814.3726795-6-marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x: Restrict sysemu/reset.h to system emulation

In user emulation, threads -- implemented as CPU -- are
created/destroyed, but never reset. There is no point in
allowing the user emulation access the sysemu/reset API.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221220145625.26392-5-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x/tcg/excp_helper: Restrict system headers to sysemu

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221217152454.96388-6-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x/tcg/misc_helper: Remove unused "memory.h" include

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221217152454.96388-5-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw/s390x/pv: Restrict Protected Virtualization to sysemu

Protected Virtualization is irrelevant in user emulation.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221217152454.96388-4-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

exec/memory: Expose memory_region_access_valid()

Instead of having hardware device poking into memory
internal API, expose memory_region_access_valid().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221217152454.96388-2-philmd@linaro.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

MAINTAINERS: Add MIPS-related docs and configs to the MIPS architecture section

docs/system/target-mips.rst and configs/targets/mips* are not covered
in our MAINTAINERS file yet, so let's add them now.

Message-Id: <20221212171252.194864-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/vm: Update get_default_jobs() to work on non-x86_64 non-KVM hosts

On non-x86_64 host, if KVM is not available we get:

  Traceback (most recent call last):
    File "tests/vm/basevm.py", line 634, in main
      vm = vmcls(args, config=config)
    File "tests/vm/basevm.py", line 104, in __init__
      mem = max(4, args.jobs)
  TypeError: '>' not supported between instances of 'NoneType' and 'int'

Fix by always returning a -- not ideal but safe -- '1' value.

Fixes: b09539444a ("tests/vm: allow us to take advantage of MTTCG")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221209164743.70836-1-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

qemu-iotests/stream-under-throttle: do not shutdown QEMU

Without a kernel or boot disk a QEMU on s390 will exit (usually with a
disabled wait state). This breaks the stream-under-throttle test case.
Do not exit qemu if on s390.

Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Message-Id: <20221207131452.8455-1-borntraeger@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, cleanups, fixes

mostly vhost-vdpa:
    guest announce feature emulation when using shadow virtqueue
    support for configure interrupt
    startup speed ups

an acpi change to only generate cluster node in PPTT when specified for arm

misc fixes, cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 08 Jan 2023 08:01:39 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (50 commits)
  vhost-scsi: fix memleak of vsc->inflight
  acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
  tests: acpi: aarch64: Add *.topology tables
  tests: acpi: aarch64: Add topology test for aarch64
  tests: acpi: Add and whitelist *.topology blobs
  tests: virt: Update expected ACPI tables for virt test
  hw/acpi/aml-build: Only generate cluster node in PPTT when specified
  tests: virt: Allow changes to PPTT test table
  virtio-pci: fix proxy->vector_irqfd leak in virtio_pci_set_guest_notifiers
  vdpa: commit all host notifier MRs in a single MR transaction
  vhost: configure all host notifiers in a single MR transaction
  vhost: simplify vhost_dev_enable_notifiers
  vdpa: harden the error path if get_iova_range failed
  vdpa-dev: get iova range explicitly
  docs/devel: Rules on #include in headers
  include: Include headers where needed
  include/hw/virtio: Break inclusion loop
  include/hw/cxl: Break inclusion loop cxl_pci.h and cxl_cdat_h
  include/hw/pci: Include hw/pci/pci.h where needed
  include/hw/pci: Split pci_device.h off pci.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* Atomic memslot updates for KVM (Emanuele, David)
* Always send errors to logfile when daemonized (Greg)
* Add support for IDE CompactFlash card (Lubomir)
* First round of build system cleanups (myself)
* First round of feature removals (myself)
* Reduce "qemu/accel.h" inclusion (Philippe)

# gpg: Signature made Thu 05 Jan 2023 23:51:09 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (24 commits)
  i386: SGX: remove deprecated member of SGXInfo
  target/i386: Add SGX aex-notify and EDECCSSA support
  util: remove support -chardev tty and -chardev parport
  util: remove support for hex numbers with a scaling suffix
  KVM: remove support for kernel-irqchip=off
  docs: do not talk about past removal as happening in the future
  meson: accept relative symlinks in "meson introspect --installed" data
  meson: cleanup compiler detection
  meson: support meson 0.64 -Doptimization=plain
  configure: test all warnings
  tests/qapi-schema: remove Meson workaround
  meson: cleanup dummy-cpus.c rules
  meson: tweak hardening options for Windows
  configure: remove backwards-compatibility and obsolete options
  configure: preserve qemu-ga variables
  configure: cleanup $cpu tests
  configure: remove dead function
  configure: remove useless write_c_skeleton
  ide: Add "ide-cf" driver, a CompactFlash card
  ide: Add 8-bit data mode
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-tcg-20230106' of https://gitlab.com/rth7680/qemu into staging

tcg/s390x improvements:
- drop support for pre-z196 cpus (eol before 2017)
- add support for misc-instruction-extensions-3
- misc cleanups

# gpg: Signature made Sat 07 Jan 2023 07:47:59 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-tcg-20230106' of https://gitlab.com/rth7680/qemu: (27 commits)
  tcg/s390x: Avoid the constant pool in tcg_out_movi
  tcg/s390x: Cleanup tcg_out_movi
  tcg/s390x: Tighten constraints for 64-bit compare
  tcg/s390x: Implement ctpop operation
  tcg/s390x: Use tgen_movcond_int in tgen_clz
  tcg/s390x: Support SELGR instruction in movcond
  tcg/s390x: Generalize movcond implementation
  tcg/s390x: Create tgen_cmp2 to simplify movcond
  tcg/s390x: Support MIE3 logical operations
  tcg/s390x: Tighten constraints for and_i64
  tcg/s390x: Tighten constraints for or_i64 and xor_i64
  tcg/s390x: Issue XILF directly for xor_i32
  tcg/s390x: Support MIE2 MGRK instruction
  tcg/s390x: Support MIE2 multiply single instructions
  tcg/s390x: Distinguish RIE formats
  tcg/s390x: Distinguish RRF-a and RRF-c formats
  tcg/s390x: Use LARL+AGHI for odd addresses
  tcg/s390x: Remove DISTINCT_OPERANDS facility check
  tcg/s390x: Remove FAST_BCR_SER facility check
  tcg/s390x: Check for load-on-condition facility at startup
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vhost-scsi: fix memleak of vsc->inflight

This is below memleak detected when to quit the qemu-system-x86_64 (with
vhost-scsi-pci).

(qemu) quit

=================================================================
==15568==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f00aec57917 in __interceptor_calloc (/lib64/libasan.so.6+0xb4917)
    #1 0x7f00ada0d7b5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x517b5)
    #2 0x5648ffd38bac in vhost_scsi_start ../hw/scsi/vhost-scsi.c:92
    #3 0x5648ffd38d52 in vhost_scsi_set_status ../hw/scsi/vhost-scsi.c:131
    #4 0x5648ffda340e in virtio_set_status ../hw/virtio/virtio.c:2036
    #5 0x5648ff8de281 in virtio_ioport_write ../hw/virtio/virtio-pci.c:431
    #6 0x5648ff8deb29 in virtio_pci_config_write ../hw/virtio/virtio-pci.c:576
    #7 0x5648ffe5c0c2 in memory_region_write_accessor ../softmmu/memory.c:493
    #8 0x5648ffe5c424 in access_with_adjusted_size ../softmmu/memory.c:555
    #9 0x5648ffe6428f in memory_region_dispatch_write ../softmmu/memory.c:1515
    #10 0x5648ffe8613d in flatview_write_continue ../softmmu/physmem.c:2825
    #11 0x5648ffe86490 in flatview_write ../softmmu/physmem.c:2867
    #12 0x5648ffe86d9f in address_space_write ../softmmu/physmem.c:2963
    #13 0x5648ffe86e57 in address_space_rw ../softmmu/physmem.c:2973
    #14 0x5648fffbfb3d in kvm_handle_io ../accel/kvm/kvm-all.c:2639
    #15 0x5648fffc0e0d in kvm_cpu_exec ../accel/kvm/kvm-all.c:2890
    #16 0x5648fffc90a7 in kvm_vcpu_thread_fn ../accel/kvm/kvm-accel-ops.c:51
    #17 0x56490042400a in qemu_thread_start ../util/qemu-thread-posix.c:505
    #18 0x7f00ac3b6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)

Free the vsc->inflight at the 'stop' path.

Fixes: b82526c7ee ("vhost-scsi: support inflight io track")
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Li Feng <fengli@smartx.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20230104160433.21353-1-dongli.zhang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block

The modern ACPI CPU hotplug interface was introduced in the following
series (aa1dd39ca307..679dd1a957df), released in v2.7.0:

  1  abd49bc2ed2f docs: update ACPI CPU hotplug spec with new protocol
  2  16bcab97eb9f pc: piix4/ich9: add 'cpu-hotplug-legacy' property
  3  5e1b5d93887b acpi: cpuhp: add CPU devices AML with _STA method
  4  ac35f13ba8f8 pc: acpi: introduce AcpiDeviceIfClass.madt_cpu hook
  5  d2238cb6781d acpi: cpuhp: implement hot-add parts of CPU hotplug
                  interface
  6  8872c25a26cc acpi: cpuhp: implement hot-remove parts of CPU hotplug
                  interface
  7  76623d00ae57 acpi: cpuhp: add cpu._OST handling
  8  679dd1a957df pc: use new CPU hotplug interface since 2.7 machine type

Before patch#1, "docs/specs/acpi_cpu_hotplug.txt" only specified 1-byte
accesses for the hotplug register block.  Patch#1 preserved the same
restriction for the legacy register block, but:

- it specified DWORD accesses for some of the modern registers,

- in particular, the switch from the legacy block to the modern block
  would require a DWORD write to the *legacy* block.

The latter functionality was then implemented in cpu_status_write()
[hw/acpi/cpu_hotplug.c], in patch#8.

Unfortunately, all DWORD accesses depended on a dormant bug: the one
introduced in earlier commit a014ed07bd5a ("memory: accept mismatching
sizes in memory_region_access_valid", 2013-05-29); first released in
v1.6.0.  Due to commit a014ed07bd5a, the DWORD accesses to the *legacy*
CPU hotplug register block would work in spite of the above series *not*
relaxing "valid.max_access_size = 1" in "hw/acpi/cpu_hotplug.c":

> static const MemoryRegionOps AcpiCpuHotplug_ops = {
>     .read = cpu_status_read,
>     .write = cpu_status_write,
>     .endianness = DEVICE_LITTLE_ENDIAN,
>     .valid = {
>         .min_access_size = 1,
>         .max_access_size = 1,
>     },
> };

Later, in commits e6d0c3ce6895 ("acpi: cpuhp: introduce 'Command data 2'
field", 2020-01-22) and ae340aa3d256 ("acpi: cpuhp: spec: add typical
usecases", 2020-01-22), first released in v5.0.0, the modern CPU hotplug
interface (including the documentation) was extended with another DWORD
*read* access, namely to the "Command data 2" register, which would be
important for the guest to confirm whether it managed to switch the
register block from legacy to modern.

This functionality too silently depended on the bug from commit
a014ed07bd5a.

In commit 5d971f9e6725 ('memory: Revert "memory: accept mismatching sizes
in memory_region_access_valid"', 2020-06-26), first released in v5.1.0,
the bug from commit a014ed07bd5a was fixed (the commit was reverted).
That swiftly exposed the bug in "AcpiCpuHotplug_ops", still present from
the v2.7.0 series quoted at the top -- namely the fact that
"valid.max_access_size = 1" didn't match what the guest was supposed to
do, according to the spec ("docs/specs/acpi_cpu_hotplug.txt").

The symptom is that the "modern interface negotiation protocol"
described in commit ae340aa3d256:

> +      Use following steps to detect and enable modern CPU hotplug interface:
> +        1. Store 0x0 to the 'CPU selector' register,
> +           attempting to switch to modern mode
> +        2. Store 0x0 to the 'CPU selector' register,
> +           to ensure valid selector value
> +        3. Store 0x0 to the 'Command field' register,
> +        4. Read the 'Command data 2' register.
> +           If read value is 0x0, the modern interface is enabled.
> +           Otherwise legacy or no CPU hotplug interface available

falls apart for the guest: steps 1 and 2 are lost, because they are DWORD
writes; so no switching happens.  Step 3 (a single-byte write) is not
lost, but it has no effect; see the condition in cpu_status_write() in
patch#8.  And step 4 *misleads* the guest into thinking that the switch
worked: the DWORD read is lost again -- it returns zero to the guest
without ever reaching the device model, so the guest never learns the
switch didn't work.

This means that guest behavior centered on the "Command data 2" register
worked *only* in the v5.0.0 release; it got effectively regressed in
v5.1.0.

To make things *even more* complicated, the breakage was (and remains, as
of today) visible with TCG acceleration only.  Commit 5d971f9e6725 makes
no difference with KVM acceleration -- the DWORD accesses still work,
despite "valid.max_access_size = 1".

As commit 5d971f9e6725 suggests, fix the problem by raising
"valid.max_access_size" to 4 -- the spec now clearly instructs the guest
to perform DWORD accesses to the legacy register block too, for enabling
(and verifying!) the modern block.  In order to keep compatibility for the
device model implementation though, set "impl.max_access_size = 1", so
that wide accesses be split before they reach the legacy read/write
handlers, like they always have been on KVM, and like they were on TCG
before 5d971f9e6725 (v5.1.0).

Tested with:

- OVMF IA32 + qemu-system-i386, CPU hotplug/hot-unplug with SMM,
  intermixed with ACPI S3 suspend/resume, using KVM accel
  (regression-test);

- OVMF IA32X64 + qemu-system-x86_64, CPU hotplug/hot-unplug with SMM,
  intermixed with ACPI S3 suspend/resume, using KVM accel
  (regression-test);

- OVMF IA32 + qemu-system-i386, SMM enabled, using TCG accel; verified the
  register block switch and the present/possible CPU counting through the
  modern hotplug interface, during OVMF boot (bugfix test);

- I do not have any testcase (guest payload) for regression-testing CPU
  hotplug through the *legacy* CPU hotplug register block.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Ani Sinha <ani@anisinha.ca>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: qemu-stable@nongnu.org
Ref: "IO port write width clamping differs between TCG and KVM"
Link: http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
Link: https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230105161804.82486-1-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: acpi: aarch64: Add *.topology tables

Add *.topology tables for the aarch64's topology test and empty
bios-tables-test-allowed-diff.h

The disassembled differences between actual and expected
PPTT (the table which we actually care about):

+/*
+ * Intel ACPI Component Architecture
+ * AML/ASL+ Disassembler version 20180105 (64-bit version)
+ * Copyright (c) 2000 - 2018 Intel Corporation
+ *
+ * Disassembly of /tmp/aml-WUN4U1, Tue Nov  1 09:51:52 2022
+ *
+ * ACPI Data Table [PPTT]
+ *
+ * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
+ */
+
+[000h 0000   4]                    Signature : "PPTT"    [Processor Properties Topology Table]
+[004h 0004   4]                 Table Length : 00000150
+[008h 0008   1]                     Revision : 02
+[009h 0009   1]                     Checksum : 7C
+[00Ah 0010   6]                       Oem ID : "BOCHS "
+[010h 0016   8]                 Oem Table ID : "BXPC    "
+[018h 0024   4]                 Oem Revision : 00000001
+[01Ch 0028   4]              Asl Compiler ID : "BXPC"
+[020h 0032   4]        Asl Compiler Revision : 00000001
+
+
+[024h 0036   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[025h 0037   1]                       Length : 14
+[026h 0038   2]                     Reserved : 0000
+[028h 0040   4]        Flags (decoded below) : 00000001
+                            Physical package : 1
+                     ACPI Processor ID valid : 0
+[02Ch 0044   4]                       Parent : 00000000
+[030h 0048   4]            ACPI Processor ID : 00000000
+[034h 0052   4]      Private Resource Number : 00000000
+
+[038h 0056   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[039h 0057   1]                       Length : 14
+[03Ah 0058   2]                     Reserved : 0000
+[03Ch 0060   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[040h 0064   4]                       Parent : 00000024
+[044h 0068   4]            ACPI Processor ID : 00000000
+[048h 0072   4]      Private Resource Number : 00000000
+
+[04Ch 0076   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[04Dh 0077   1]                       Length : 14
+[04Eh 0078   2]                     Reserved : 0000
+[050h 0080   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[054h 0084   4]                       Parent : 00000038
+[058h 0088   4]            ACPI Processor ID : 00000000
+[05Ch 0092   4]      Private Resource Number : 00000000
+
+[060h 0096   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[061h 0097   1]                       Length : 14
+[062h 0098   2]                     Reserved : 0000
+[064h 0100   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[068h 0104   4]                       Parent : 0000004C
+[06Ch 0108   4]            ACPI Processor ID : 00000000
+[070h 0112   4]      Private Resource Number : 00000000
+
+[074h 0116   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[075h 0117   1]                       Length : 14
+[076h 0118   2]                     Reserved : 0000
+[078h 0120   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[07Ch 0124   4]                       Parent : 0000004C
+[080h 0128   4]            ACPI Processor ID : 00000001
+[084h 0132   4]      Private Resource Number : 00000000
+
+[088h 0136   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[089h 0137   1]                       Length : 14
+[08Ah 0138   2]                     Reserved : 0000
+[08Ch 0140   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[090h 0144   4]                       Parent : 00000038
+[094h 0148   4]            ACPI Processor ID : 00000001
+[098h 0152   4]      Private Resource Number : 00000000
+
+[09Ch 0156   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[09Dh 0157   1]                       Length : 14
+[09Eh 0158   2]                     Reserved : 0000
+[0A0h 0160   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[0A4h 0164   4]                       Parent : 00000088
+[0A8h 0168   4]            ACPI Processor ID : 00000002
+[0ACh 0172   4]      Private Resource Number : 00000000
+
+[0B0h 0176   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[0B1h 0177   1]                       Length : 14
+[0B2h 0178   2]                     Reserved : 0000
+[0B4h 0180   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[0B8h 0184   4]                       Parent : 00000088
+[0BCh 0188   4]            ACPI Processor ID : 00000003
+[0C0h 0192   4]      Private Resource Number : 00000000
+
+[0C4h 0196   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[0C5h 0197   1]                       Length : 14
+[0C6h 0198   2]                     Reserved : 0000
+[0C8h 0200   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[0CCh 0204   4]                       Parent : 00000024
+[0D0h 0208   4]            ACPI Processor ID : 00000001
+[0D4h 0212   4]      Private Resource Number : 00000000
+
+[0D8h 0216   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[0D9h 0217   1]                       Length : 14
+[0DAh 0218   2]                     Reserved : 0000
+[0DCh 0220   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[0E0h 0224   4]                       Parent : 000000C4
+[0E4h 0228   4]            ACPI Processor ID : 00000000
+[0E8h 0232   4]      Private Resource Number : 00000000
+
+[0ECh 0236   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[0EDh 0237   1]                       Length : 14
+[0EEh 0238   2]                     Reserved : 0000
+[0F0h 0240   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[0F4h 0244   4]                       Parent : 000000D8
+[0F8h 0248   4]            ACPI Processor ID : 00000004
+[0FCh 0252   4]      Private Resource Number : 00000000
+
+[100h 0256   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[101h 0257   1]                       Length : 14
+[102h 0258   2]                     Reserved : 0000
+[104h 0260   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[108h 0264   4]                       Parent : 000000D8
+[10Ch 0268   4]            ACPI Processor ID : 00000005
+[110h 0272   4]      Private Resource Number : 00000000
+
+[114h 0276   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[115h 0277   1]                       Length : 14
+[116h 0278   2]                     Reserved : 0000
+[118h 0280   4]        Flags (decoded below) : 00000000
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+[11Ch 0284   4]                       Parent : 000000C4
+[120h 0288   4]            ACPI Processor ID : 00000001
+[124h 0292   4]      Private Resource Number : 00000000
+
+[128h 0296   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[129h 0297   1]                       Length : 14
+[12Ah 0298   2]                     Reserved : 0000
+[12Ch 0300   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[130h 0304   4]                       Parent : 00000114
+[134h 0308   4]            ACPI Processor ID : 00000006
+[138h 0312   4]      Private Resource Number : 00000000
+
+[13Ch 0316   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[13Dh 0317   1]                       Length : 14
+[13Eh 0318   2]                     Reserved : 0000
+[140h 0320   4]        Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[144h 0324   4]                       Parent : 00000114
+[148h 0328   4]            ACPI Processor ID : 00000007
+[14Ch 0332   4]      Private Resource Number : 00000000
+
+Raw Table Data: Length 336 (0x150)
+
+  0000: 50 50 54 54 50 01 00 00 02 7C 42 4F 43 48 53 20  // PPTTP....|BOCHS
+  0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
+  0020: 01 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // ................
+  0030: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
+  0040: 24 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // $...............
+  0050: 00 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00  // ....8...........
+  0060: 00 14 00 00 0E 00 00 00 4C 00 00 00 00 00 00 00  // ........L.......
+  0070: 00 00 00 00 00 14 00 00 0E 00 00 00 4C 00 00 00  // ............L...
+  0080: 01 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
+  0090: 38 00 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // 8...............
+  00A0: 0E 00 00 00 88 00 00 00 02 00 00 00 00 00 00 00  // ................
+  00B0: 00 14 00 00 0E 00 00 00 88 00 00 00 03 00 00 00  // ................
+  00C0: 00 00 00 00 00 14 00 00 00 00 00 00 24 00 00 00  // ............$...
+  00D0: 01 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
+  00E0: C4 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // ................
+  00F0: 0E 00 00 00 D8 00 00 00 04 00 00 00 00 00 00 00  // ................
+  0100: 00 14 00 00 0E 00 00 00 D8 00 00 00 05 00 00 00  // ................
+  0110: 00 00 00 00 00 14 00 00 00 00 00 00 C4 00 00 00  // ................
+  0120: 01 00 00 00 00 00 00 00 00 14 00 00 0E 00 00 00  // ................
+  0130: 14 01 00 00 06 00 00 00 00 00 00 00 00 14 00 00  // ................
+  0140: 0E 00 00 00 14 01 00 00 07 00 00 00 00 00 00 00  // ................

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-7-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: acpi: aarch64: Add topology test for aarch64

Add test for aarch64's ACPI topology building for all the supported
levels.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-6-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: acpi: Add and whitelist *.topology blobs

Add and whitelist *.topology blobs, prepares for the aarch64's ACPI
topology building test.

Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-5-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: virt: Update expected ACPI tables for virt test

Update the ACPI tables according to the acpi aml_build change, also
empty bios-tables-test-allowed-diff.h.

The disassembled differences between actual and expected PPTT:

  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20180105 (64-bit version)
   * Copyright (c) 2000 - 2018 Intel Corporation
   *
- * Disassembly of tests/data/acpi/virt/PPTT, Tue Nov  1 09:29:12 2022
+ * Disassembly of /tmp/aml-DIIGV1, Tue Nov  1 09:29:12 2022
   *
   * ACPI Data Table [PPTT]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */

  [000h 0000   4]                    Signature : "PPTT"    [Processor Properties Topology Table]
-[004h 0004   4]                 Table Length : 00000060
+[004h 0004   4]                 Table Length : 0000004C
  [008h 0008   1]                     Revision : 02
-[009h 0009   1]                     Checksum : 48
+[009h 0009   1]                     Checksum : A8
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001

  [024h 0036   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [025h 0037   1]                       Length : 14
  [026h 0038   2]                     Reserved : 0000
  [028h 0040   4]        Flags (decoded below) : 00000001
                              Physical package : 1
                       ACPI Processor ID valid : 0
  [02Ch 0044   4]                       Parent : 00000000
  [030h 0048   4]            ACPI Processor ID : 00000000
  [034h 0052   4]      Private Resource Number : 00000000

  [038h 0056   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [039h 0057   1]                       Length : 14
  [03Ah 0058   2]                     Reserved : 0000
-[03Ch 0060   4]        Flags (decoded below) : 00000000
+[03Ch 0060   4]        Flags (decoded below) : 0000000A
                              Physical package : 0
-                     ACPI Processor ID valid : 0
+                     ACPI Processor ID valid : 1
  [040h 0064   4]                       Parent : 00000024
  [044h 0068   4]            ACPI Processor ID : 00000000
  [048h 0072   4]      Private Resource Number : 00000000

-[04Ch 0076   1]                Subtable Type : 00 [Processor Hierarchy Node]
-[04Dh 0077   1]                       Length : 14
-[04Eh 0078   2]                     Reserved : 0000
-[050h 0080   4]        Flags (decoded below) : 0000000A
-                            Physical package : 0
-                     ACPI Processor ID valid : 1
-[054h 0084   4]                       Parent : 00000038
-[058h 0088   4]            ACPI Processor ID : 00000000
-[05Ch 0092   4]      Private Resource Number : 00000000
-
-Raw Table Data: Length 96 (0x60)
+Raw Table Data: Length 76 (0x4C)

-  0000: 50 50 54 54 60 00 00 00 02 48 42 4F 43 48 53 20  // PPTT`....HBOCHS
+  0000: 50 50 54 54 4C 00 00 00 02 A8 42 4F 43 48 53 20  // PPTTL.....BOCHS
    0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
    0020: 01 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // ................
-  0030: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
-  0040: 24 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // $...............
-  0050: 0A 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00  // ....8...........
+  0030: 00 00 00 00 00 00 00 00 00 14 00 00 0A 00 00 00  // ................
+  0040: 24 00 00 00 00 00 00 00 00 00 00 00              // $...........

PPTT.acpihmatvirt is also updated:
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20180105 (64-bit version)
   * Copyright (c) 2000 - 2018 Intel Corporation
   *
- * Disassembly of tests/data/acpi/virt/PPTT.acpihmatvirt, Wed Dec 28 15:36:06 2022
+ * Disassembly of /tmp/aml-IPKJX1, Wed Dec 28 15:36:06 2022
   *
   * ACPI Data Table [PPTT]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */

  [000h 0000   4]                    Signature : "PPTT"    [Processor Properties Topology Table]
-[004h 0004   4]                 Table Length : 000000C4
+[004h 0004   4]                 Table Length : 0000009C
  [008h 0008   1]                     Revision : 02
-[009h 0009   1]                     Checksum : 9E
+[009h 0009   1]                     Checksum : FE
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001

  [024h 0036   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [025h 0037   1]                       Length : 14
  [026h 0038   2]                     Reserved : 0000
  [028h 0040   4]        Flags (decoded below) : 00000001
                              Physical package : 1
                       ACPI Processor ID valid : 0
  [02Ch 0044   4]                       Parent : 00000000
  [030h 0048   4]            ACPI Processor ID : 00000000
  [034h 0052   4]      Private Resource Number : 00000000

  [038h 0056   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [039h 0057   1]                       Length : 14
  [03Ah 0058   2]                     Reserved : 0000
-[03Ch 0060   4]        Flags (decoded below) : 00000000
+[03Ch 0060   4]        Flags (decoded below) : 0000000A
                              Physical package : 0
-                     ACPI Processor ID valid : 0
+                     ACPI Processor ID valid : 1
  [040h 0064   4]                       Parent : 00000024
  [044h 0068   4]            ACPI Processor ID : 00000000
  [048h 0072   4]      Private Resource Number : 00000000

  [04Ch 0076   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [04Dh 0077   1]                       Length : 14
  [04Eh 0078   2]                     Reserved : 0000
  [050h 0080   4]        Flags (decoded below) : 0000000A
                              Physical package : 0
                       ACPI Processor ID valid : 1
-[054h 0084   4]                       Parent : 00000038
-[058h 0088   4]            ACPI Processor ID : 00000000
+[054h 0084   4]                       Parent : 00000024
+[058h 0088   4]            ACPI Processor ID : 00000001
  [05Ch 0092   4]      Private Resource Number : 00000000

  [060h 0096   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [061h 0097   1]                       Length : 14
  [062h 0098   2]                     Reserved : 0000
-[064h 0100   4]        Flags (decoded below) : 0000000A
-                            Physical package : 0
-                     ACPI Processor ID valid : 1
-[068h 0104   4]                       Parent : 00000038
+[064h 0100   4]        Flags (decoded below) : 00000001
+                            Physical package : 1
+                     ACPI Processor ID valid : 0
+[068h 0104   4]                       Parent : 00000000
  [06Ch 0108   4]            ACPI Processor ID : 00000001
  [070h 0112   4]      Private Resource Number : 00000000

  [074h 0116   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [075h 0117   1]                       Length : 14
  [076h 0118   2]                     Reserved : 0000
-[078h 0120   4]        Flags (decoded below) : 00000001
-                            Physical package : 1
-                     ACPI Processor ID valid : 0
-[07Ch 0124   4]                       Parent : 00000000
-[080h 0128   4]            ACPI Processor ID : 00000001
+[078h 0120   4]        Flags (decoded below) : 0000000A
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[07Ch 0124   4]                       Parent : 00000060
+[080h 0128   4]            ACPI Processor ID : 00000002
  [084h 0132   4]      Private Resource Number : 00000000

  [088h 0136   1]                Subtable Type : 00 [Processor Hierarchy Node]
  [089h 0137   1]                       Length : 14
  [08Ah 0138   2]                     Reserved : 0000
-[08Ch 0140   4]        Flags (decoded below) : 00000000
-                            Physical package : 0
-                     ACPI Processor ID valid : 0
-[090h 0144   4]                       Parent : 00000074
-[094h 0148   4]            ACPI Processor ID : 00000000
-[098h 0152   4]      Private Resource Number : 00000000
-
-[09Ch 0156   1]                Subtable Type : 00 [Processor Hierarchy Node]
-[09Dh 0157   1]                       Length : 14
-[09Eh 0158   2]                     Reserved : 0000
-[0A0h 0160   4]        Flags (decoded below) : 0000000A
-                            Physical package : 0
-                     ACPI Processor ID valid : 1
-[0A4h 0164   4]                       Parent : 00000088
-[0A8h 0168   4]            ACPI Processor ID : 00000002
-[0ACh 0172   4]      Private Resource Number : 00000000
-
-[0B0h 0176   1]                Subtable Type : 00 [Processor Hierarchy Node]
-[0B1h 0177   1]                       Length : 14
-[0B2h 0178   2]                     Reserved : 0000
-[0B4h 0180   4]        Flags (decoded below) : 0000000A
+[08Ch 0140   4]        Flags (decoded below) : 0000000A
                              Physical package : 0
                       ACPI Processor ID valid : 1
-[0B8h 0184   4]                       Parent : 00000088
-[0BCh 0188   4]            ACPI Processor ID : 00000003
-[0C0h 0192   4]      Private Resource Number : 00000000
+[090h 0144   4]                       Parent : 00000060
+[094h 0148   4]            ACPI Processor ID : 00000003
+[098h 0152   4]      Private Resource Number : 00000000

-Raw Table Data: Length 196 (0xC4)
+Raw Table Data: Length 156 (0x9C)

-  0000: 50 50 54 54 C4 00 00 00 02 9E 42 4F 43 48 53 20  // PPTT......BOCHS
+  0000: 50 50 54 54 9C 00 00 00 02 FE 42 4F 43 48 53 20  // PPTT......BOCHS
    0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
    0020: 01 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // ................
-  0030: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
+  0030: 00 00 00 00 00 00 00 00 00 14 00 00 0A 00 00 00  // ................
    0040: 24 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // $...............
-  0050: 0A 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00  // ....8...........
-  0060: 00 14 00 00 0A 00 00 00 38 00 00 00 01 00 00 00  // ........8.......
-  0070: 00 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // ................
-  0080: 01 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
-  0090: 74 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // t...............
-  00A0: 0A 00 00 00 88 00 00 00 02 00 00 00 00 00 00 00  // ................
-  00B0: 00 14 00 00 0A 00 00 00 88 00 00 00 03 00 00 00  // ................
-  00C0: 00 00 00 00                                      // ....
+  0050: 0A 00 00 00 24 00 00 00 01 00 00 00 00 00 00 00  // ....$...........
+  0060: 00 14 00 00 01 00 00 00 00 00 00 00 01 00 00 00  // ................
+  0070: 00 00 00 00 00 14 00 00 0A 00 00 00 60 00 00 00  // ............`...
+  0080: 02 00 00 00 00 00 00 00 00 14 00 00 0A 00 00 00  // ................
+  0090: 60 00 00 00 03 00 00 00 00 00 00 00              // `...........

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-4-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/acpi/aml-build: Only generate cluster node in PPTT when specified

Currently we'll always generate a cluster node no matter user has
specified '-smp clusters=X' or not. Cluster is an optional level
and will participant the building of Linux scheduling domains and
only appears on a few platforms. It's unncessary to always build
it when it cannot reflect the real topology on platforms having no
cluster implementation and to avoid affecting the linux scheduling
domains in the VM. So only generate the cluster topology in ACPI
PPTT when the user has specified it explicitly in -smp.

Tested qemu-system-aarch64 with `-smp 8` and linux 6.1-rc1, without
this patch:
estuary:/sys/devices/system/cpu/cpu0/topology$ cat cluster_*
ff # cluster_cpus
0-7 # cluster_cpus_list
56 # cluster_id

with this patch:
estuary:/sys/devices/system/cpu/cpu0/topology$ cat cluster_*
ff # cluster_cpus
0-7 # cluster_cpus_list
36 # cluster_id, with no cluster node kernel will make it to
physical package id

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-3-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: virt: Allow changes to PPTT test table

Allow changes to test/data/acpi/virt/PPTT*, prepare to change the
building policy of the cluster topology.

Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Message-Id: <20221229065513.55652-2-yangyicong@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-pci: fix proxy->vector_irqfd leak in virtio_pci_set_guest_notifiers

proxy->vector_irqfd did not free when kvm_virtio_pci_vector_use or
msix_set_vector_notifiers failed in virtio_pci_set_guest_notifiers.

Fixes: 7d37d351
Signed-off-by: Lei Xiang <leixiang@kylinos.cn>
Tested-by: Zeng Chi <zengchi@kylinos.cn>
Suggested-by: Xie Ming <xieming@kylinos.cn>
Message-Id: <20221227081604.806415-1-leixiang@kylinos.cn>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vdpa: commit all host notifier MRs in a single MR transaction

This allows the vhost-vdpa device to batch the setup of all its MRs of
host notifiers.

This significantly reduces the device starting time, e.g. the time spend
on setup the host notifier MRs reduce from 423ms to 32ms for a VM with
64 vCPUs and 3 vhost-vDPA generic devices (vdpa_sim_blk, 64vq per device).

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-4-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

vhost: configure all host notifiers in a single MR transaction

This allows the vhost device to batch the setup of all its host notifiers.
This significantly reduces the device starting time, e.g. the time spend
on enabling notifiers reduce from 376ms to 9.1ms for a VM with 64 vCPUs
and 3 vhost-vDPA generic devices (vdpa_sim_blk, 64vq per device)

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-3-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

vhost: simplify vhost_dev_enable_notifiers

Simplify the error path in vhost_dev_enable_notifiers by using
vhost_dev_disable_notifiers directly.

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-2-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vdpa: harden the error path if get_iova_range failed

We should stop if the GET_IOVA_RANGE ioctl failed.

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221224114848.3062-3-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>