www.infradead.org Git - users/dwmw2/qemu.git/log

hw/xen: Create initial XenStore nodes

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement core serialize/deserialize methods for xenstore_impl

In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? Will
defer to Paul for that.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

xenstore perms WIP

Store perms as a GList of strings, check permissions. No unit tests yet.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Woodhoues <dwmw@amazon.co.uk>

hw/xen: Watches on XenStore transactions

Firing watches on the nodes that still exist is relatively easy; just
walk the tree and look at the nodes with refcount of one.

Firing watches on *deleted* nodes is more fun. We add 'modified_in_tx'
and 'deleted_in_tx' flags to each node. Nodes with those flags cannot
be shared, as they will always be unique to the transaction in which
they were created.

When xs_node_walk would need to *create* a node as scaffolding and it
encounters a deleted_in_tx node, it can resurrect it simply by clearing
its deleted_in_tx flag. If that node originally had any *data*, they're
gone, and the modified_in_tx flag will have been set when it was first
deleted.

We then attempt to send appropriate watches when the transaction is
committed, properly delete the deleted_in_tx nodes, and remove the
modified_in_tx flag from the others.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement XenStore transactions

Given that the whole thing supported copy on write from the beginning,
transactions end up being fairly simple. On starting a transaction, just
take a ref of the existing root; swap it back in on a successful commit.

The main tree has a transaction ID too, and we keep a record of the last
transaction ID given out. if the main tree is ever modified when it isn't
the latest, it gets a new transaction ID.

A commit can only succeed if the main tree hasn't moved on since it was
forked. Strictly speaking, the XenStore protocol allows a transaction to
succeed as long as nothing *it* read or wrote has changed in the interim,
but no implementations do that; *any* change is sufficient to abort a
transaction.

This does not yet fire watches on the changed nodes on a commit. That bit
is more fun and will come in a follow-on commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement XenStore watches

Starts out fairly simple: a hash table of watches based on the path.

Except there can be multiple watches on the same path, so the watch ends
up being a simple linked list, and the head of that list is in the hash
table. Which makes removal a bit of a PITA but it's not so bad; we just
special-case "I had to remove the head of the list and now I have to
replace it in / remove it from the hash table". And if we don't remove
the head, it's a simple linked-list operation.

We do need to fire watches on *deleted* nodes, so instead of just a simple
xs_node_unref() on the topmost victim, we need to recurse down and fire
watches on them all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add basic XenStore tree walk and write/read/directory support

This is a fairly simple implementation of a copy-on-write tree.

The node walk function starts off at the root, with 'inplace == true'.
If it ever encounters a node with a refcount greater than one (including
the root node), then that node is shared with other trees, and cannot
be modified in place, so the inplace flag is cleared and we copy on
write from there on down.

Xenstore write has 'mkdir -p' semantics and will create the intermediate
nodes if they don't already exist, so in that case we flip the inplace
flag back to true as as populated the newly-created nodes.

We put a copy of the absolute path into the buffer in the struct walk_op,
with *two* NUL terminators at the end. As xs_node_walk() goes down the
tree, it replaces the next '/' separator with a NUL so that it can use
the 'child name' in place. The next recursion down then puts the '/'
back and repeats the exercise for the next path element... if it doesn't
hit that *second* NUL termination which indicates the true end of the
path.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xenstore wire implementation and implementation stubs

This implements the basic wire protocol for the XenStore commands, punting
all the actual implementation to xs_impl_* functions which all just return
errors for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Document Xen HVM emulation

Signed-off-by: David Woodhouse <dwmw2@amazon.co.uk>

kvm/i386: Add xen-evtchn-max-pirq property

The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI
devices. Allow it to be increased if the user desires.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support MSI mapping to PIRQ

The way that Xen handles MSI PIRQs is kind of awful.

There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.

Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.

So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.

That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When a MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.

When the guest makes a hypercall to bind that PIRQ# (snooped from a
marked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.

Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.

Tested with passthrough igb and emulated e1000e + AHCI.

           CPU0       CPU1
  0:         65          0   IO-APIC   2-edge      timer
  1:          0         14  xen-pirq   1-ioapic-edge  i8042
  4:          0        846  xen-pirq   4-ioapic-edge  ttyS0
  8:          1          0  xen-pirq   8-ioapic-edge  rtc0
  9:          0          0  xen-pirq   9-ioapic-level  acpi
12:        257          0  xen-pirq  12-ioapic-edge  i8042
24:       9600          0  xen-percpu    -virq      timer0
25:       2758          0  xen-percpu    -ipi       resched0
26:          0          0  xen-percpu    -ipi       callfunc0
27:          0          0  xen-percpu    -virq      debug0
28:       1526          0  xen-percpu    -ipi       callfuncsingle0
29:          0          0  xen-percpu    -ipi       spinlock0
30:          0       8608  xen-percpu    -virq      timer1
31:          0        874  xen-percpu    -ipi       resched1
32:          0          0  xen-percpu    -ipi       callfunc1
33:          0          0  xen-percpu    -virq      debug1
34:          0       1617  xen-percpu    -ipi       callfuncsingle1
35:          0          0  xen-percpu    -ipi       spinlock1
36:          8          0   xen-dyn    -event     xenbus
37:          0       6046  xen-pirq    -msi       ahci[0000:00:03.0]
38:          1          0  xen-pirq    -msi-x     ens4
39:          0         73  xen-pirq    -msi-x     ens4-rx-0
40:         14          0  xen-pirq    -msi-x     ens4-rx-1
41:          0         32  xen-pirq    -msi-x     ens4-tx-0
42:         47          0  xen-pirq    -msi-x     ens4-tx-1

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support GSI mapping to PIRQ

If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as
long as I tell it 'pci=nomsi'.

[root@localhost ~]# cat /proc/interrupts
           CPU0
  0:         52   IO-APIC   2-edge      timer
  1:         16  xen-pirq   1-ioapic-edge  i8042
  4:       1534  xen-pirq   4-ioapic-edge  ttyS0
  8:          1  xen-pirq   8-ioapic-edge  rtc0
  9:          0  xen-pirq   9-ioapic-level  acpi
11:       5648  xen-pirq  11-ioapic-level  ahci[0000:00:04.0]
12:        257  xen-pirq  12-ioapic-edge  i8042
...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement emulated PIRQ hypercall support

This wires up the basic infrastructure but the actual interrupts aren't
there yet, so don't advertise it to the guest.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Implement HYPERVISOR_physdev_op

Just hook up the basic hypercalls to stubs in xen_evtchn.c for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Automatically add xen-platform PCI device for emulated Xen guests

It isn't strictly mandatory but Linux guests at least will only map
their grant tables over the dummy BAR that it provides, and don't have
sufficient wit to map them in any other unused part of their guest
address space. So include it by default for minimal surprise factor.

As I come to document "how to run a Xen guest in QEMU", this means one
fewer thing to tell the user about, according to the mantra of "if it
needs documenting, fix it first, then document what remains".

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add basic ring handling to xenstore

Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.

In the first instance we're likely to wire this up over a UNIX socket to
an actual xenstored implementation, but in the fullness of time it would
be nice to have a fully single-tenant "virtual" xenstore within qemu.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_xenstore device for xenstore emulation

Just the basic shell, with the event channel hookup. It only dumps the
buffer for now; a real ring implmentation will come in a subsequent patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add backend implementation of interdomain event channel support

The provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.

The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.

For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle HVMOP_get_param

Which is used to fetch xenstore PFN and port to be used
by the guest. This is preallocated by the toolstack when
guest will just read those and use it straight away.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Reserve Xen special pages for console, xenstore rings

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: handle PV timer hypercalls

Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (only oneshot)
2) vcpu_op hypercall for {set,stop}_{singleshot,periodic}_timer
hypercalls

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement GNTTABOP_query_size

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_verson

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support mapping grant frames

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_gnttab device for grant table emulation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

kvm/i386: Add xen-gnttab-max-frames property

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.

There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

i386/xen: add monitor commands to test event injection

Specifically add listing, injection of event channels.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

hw/xen: Implement EVTCHNOP_reset

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_vcpu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_interdomain

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_alloc_unbound

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_send

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_ipi

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_bind_virq

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_unmask

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_close

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

hw/xen: Implement EVTCHNOP_status

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Add support for Xen event channel delivery to vCPU

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

hw/xen: Add xen_evtchn device for event channel emulation

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_param

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HVMOP_set_evtchn_upcall_vector

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_event_channel_op

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_runstate_memory_area

Allow guest to setup the vcpu runstates which is used as
steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_time_info

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle VCPUOP_register_vcpu_info

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_vcpu_op

This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_hvm_op

This is when guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement XENMEM_add_to_physmap_batch

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_memory_op

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: manage and save/restore Xen guest long_mode setting

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Permit --xen-domid argument when accel is KVM

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David Wooodhouse <dwmw@amazon.co.uk>

hw/xen: Add xen_overlay device for emulating shared xenheap pages

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with.... a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

It allows to shutdown itself via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: implement HYPERVISOR_xen_version

This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/xen: handle guest hypercalls

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

xen-platform: allow its creation with XEN_EMULATE mode

The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W v.s. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.

Now it builds without actual Xen, move it to CONFIG_XEN_BUS to include it
in the KVM-only builds.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen-platform: exclude vfio-pci from the PCI platform unplug

Such that PCI passthrough devices work for Xen emulated guests.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/hvm: Set Xen vCPU ID in KVM

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: handle Xen HVM cpuid leaves

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: Add XEN_DISABLED mode and make it default

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fui).

It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).

The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

include: import Xen public headers to hw/xen/interface

There's already a partial set here; update them and pull in a more
complete set.

To start with, define __XEN_TOOLS__ in hw/xen/xen.h to ensure that any
internal definitions needed by Xen toolstack libraries are present
regardless of the order in which the headers are included. A reckoning
will come later, once we make the PV backends work in emulation and
untangle the headers for Xen-native vs. generic parts.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Update to Xen public headers from 4.16.2 release, add some in io/,
define __XEN_TOOLS__ in hw/xen/xen.h, move to hw/xen/interface/]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmPO6D0ACgkQnKSrs4Gr
# c8jU2wf+O+0JmsRUuCYera0eXA8YfZyFxa7+A5fy6izyNugJMmHx+Nse9IsvLqGo
# pLTMnc0HH7lLG8ofX9M93M1BOT2a3f//CrZQimfWuPAlKWUkpuOGOepEwbBxt247
# DQAvxESjclZ9anVeSuKBmpz8u7S4H9AYuLupFh9bXZW0C+wgmbZp7Ak7+LNqcbaC
# TwasPgbHVji6j9IuKo1yJfr2f2csjb2zpock1m5E/BRCQxomKdtdFGs4LcHdWqNR
# NVBFc89SNDJknaihkgjxxXvDFjtb96DOQaI7UuFxhCfTae+gJMDIdoUoJoSpQh1j
# dMQ8pKRR0zN7ndZg0ozxT7qxJPp6LA==
# =Xju6
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 23 Jan 2023 20:04:13 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  block/blkio: Fix inclusion of required headers
  virtio-blk: simplify virtio_blk_dma_restart_cb()
  util/aio: Defer disabling poll mode as long as possible

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

block/blkio: Fix inclusion of required headers

After recent header file inclusion rework the build fails when the blkio
module is enabled:

../block/blkio.c: In function ‘blkio_detach_aio_context’:
../block/blkio.c:321:24: error: implicit declaration of function ‘bdrv_get_aio_context’; did you mean ‘qemu_get_aio_context’? [-Werror=implicit-function-declaration]
  321 |     aio_set_fd_handler(bdrv_get_aio_context(bs),
      |                        ^~~~~~~~~~~~~~~~~~~~
      |                        qemu_get_aio_context
../block/blkio.c:321:24: error: nested extern declaration of ‘bdrv_get_aio_context’ [-Werror=nested-externs]
../block/blkio.c:321:24: error: passing argument 1 of ‘aio_set_fd_handler’ makes pointer from integer without a cast [-Werror=int-conversion]
  321 |     aio_set_fd_handler(bdrv_get_aio_context(bs),
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
      |                        |
      |                        int
In file included from /home/pipo/git/qemu.git/include/qemu/job.h:33,
                 from /home/pipo/git/qemu.git/include/block/blockjob.h:30,
                 from /home/pipo/git/qemu.git/include/block/block_int-global-state.h:28,
                 from /home/pipo/git/qemu.git/include/block/block_int.h:27,
                 from ../block/blkio.c:13:
/home/pipo/git/qemu.git/include/block/aio.h:476:37: note: expected ‘AioContext *’ but argument is of type ‘int’
  476 | void aio_set_fd_handler(AioContext *ctx,
      |                         ~~~~~~~~~~~~^~~
../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:821:34: error: passing argument 2 of ‘blkio_attach_aio_context’ makes pointer from integer without a cast [-Werror=int-conversion]
  821 |     blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~
      |                                  |
      |                                  int

Fix it by including 'block/block-io.h' which contains the required
declarations.

Fixes: e2c1c34f139f49ef909bb4322607fb8b39002312
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 2bc956011404a1ab03342aefde0087b5b4762562.1674477350.git.pkrempa@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk: simplify virtio_blk_dma_restart_cb()

virtio_blk_dma_restart_cb() is tricky because the BH must deal with
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() being called.

There are two issues with the code:

1. virtio_blk_realize() should use qdev_add_vm_change_state_handler()
   instead of qemu_add_vm_change_state_handler(). This ensures the
   ordering with virtio_init()'s vm change state handler that calls
   virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() is
   well-defined. Then blk's AioContext is guaranteed to be up-to-date in
   virtio_blk_dma_restart_cb() and it's no longer necessary to have a
   special case for virtio_blk_data_plane_start().

2. Only blk_drain() waits for virtio_blk_dma_restart_cb()'s
   blk_inc_in_flight() to be decremented. The bdrv_drain() family of
   functions do not wait for BlockBackend's in_flight counter to reach
   zero. virtio_blk_data_plane_stop() relies on blk_set_aio_context()'s
   implicit drain, but that's a bdrv_drain() and not a blk_drain().
   Note that virtio_blk_reset() already correctly relies on blk_drain().
   If virtio_blk_data_plane_stop() switches to blk_drain() then we can
   properly wait for pending virtio_blk_dma_restart_bh() calls.

Once these issues are taken care of the code becomes simpler. This
change is in preparation for multiple IOThreads in virtio-blk where we
need to clean up the multi-threading behavior.

I ran the reproducer from commit 49b44549ace7 ("virtio-blk: On restart,
process queued requests in the proper context") to check that there is
no regression.

Cc: Sergio Lopez <slp@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-id: 20221102182337.252202-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

util/aio: Defer disabling poll mode as long as possible

When we measure FIO read performance (cache=writethrough, bs=4k,
iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
from guest to qemu.

It turns out those frequent notificatons are caused by interference from
worker threads. Worker threads queue bottom halves after completing IO
requests. Pending bottom halves may lead to either aio_compute_timeout()
zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns
no progress after noticing pending aio_notify() events. Both cause
run_poll_handlers() to call poll_set_started(false) to disable poll mode.
However, for both cases, as timeout is already zeroed, the event loop
(i.e., aio_poll()) just processes bottom halves and then starts the next
event loop iteration. So, disabling poll mode has no value but leads to
unnecessary notifications from guest.

To minimize unnecessary notifications from guest, defer disabling poll
mode to when the event loop is about to be blocked.

With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Message-id: 20220710120849.63086-1-chao.gao@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-target-arm-20230123' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Widen cnthctl_el2 to uint64_t
* Unify checking for M Main Extension in MRS/MSR
* bitbang_i2c, versatile_i2c: code cleanups
* SME: refactor SME SM/ZA handling
* Fix physical address resolution for MTE
* Fix in_debug path in S1_ptw_translate
* Don't set EXC_RETURN.ES if Security Extension not present
* Implement DBGCLAIM registers
* Provide stubs for more external debug registers
* Look up ARMCPRegInfo at runtime, not translate time

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmPOjQQZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vreD/sGr7outToY4FSZ4GGpC1L6
# ZwF6kjmwED/8EVaGZxWOaL2/oNoEav2YSpzUbqCa79jUx5zFBE145zYknL/bZyjS
# VLX9G2vFFCtwFQ9rc2wV/3JmTmMmSCnHqOZPMSVy5vrQKH6d41WFYZEvGpJmCgh6
# YWK4gnMqkuIHmSvxw+S6q9p/3jzPk7c3vy8eRcxp+AMnfSBkYu0kFXmr7yOwscRS
# adT8GFrkj0our/HtYqvzclVzrxcCVF1pWrtrHK7ZSddmElIcztel+1/yQH3T6onj
# aOyRj1WC3+0t9uKwUNTFSHkRUqMqr6XYvRF+cvpe5N7lbfVn57u2TwmPgUwYbZcg
# 8Mbz+LRYENzTYZa59ACxJXXcG0BivXiTwyrFR8Ck0vakcWFAjDzxHOw9CgHkDwPs
# Dd93b04esehIN7MY8/5CSkbx+8ey+YK+o7sofiDCMKcYwooM1Y+Ls21ZcjA5GH+n
# SsXp93SgagndCydD0ftRUlDTtGL7dhzaGpRmYArjeWzOKBbAmv/WfQeH47p3bpaP
# CB2RUjHzYobMGLO0yp9droOaVKqKKLtc7wGzxgJGx6j5FrN0lnCEMRrKrZJ57Q/q
# z4VoRoo0I6Q994/mVanGqXx8cSucyl0Z3HbC633WvrnZXzoM7+7HlQLhpF+yd9+s
# 4lHiw0rPgqXtwEfeMaESSQ==
# =ubIU
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 23 Jan 2023 13:35:00 GMT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20230123' of https://git.linaro.org/people/pmaydell/qemu-arm: (26 commits)
  target/arm: Look up ARMCPRegInfo at runtime
  target/arm: Reorg do_coproc_insn
  target/arm: provide stubs for more external debug registers
  target/arm: implement DBGCLAIM registers
  target/arm: Don't set EXC_RETURN.ES if Security Extension not present
  target/arm: Fix in_debug path in S1_ptw_translate
  target/arm: Fix physical address resolution for MTE
  target/arm/sme: Unify set_pstate() SM/ZA helpers as set_svcr()
  target/arm/sme: Rebuild hflags in aarch64_set_svcr()
  target/arm/sme: Reset ZA state in aarch64_set_svcr()
  target/arm/sme: Reset SVE state in aarch64_set_svcr()
  target/arm/sme: Introduce aarch64_set_svcr()
  target/arm/sme: Rebuild hflags in set_pstate() helpers
  target/arm/sme: Reorg SME access handling in handle_msr_i()
  hw/i2c/versatile_i2c: Rename versatile_i2c -> arm_sbcon_i2c
  hw/i2c/versatile_i2c: Use ARM_SBCON_I2C() macro
  hw/i2c/versatile_i2c: Replace TYPE_VERSATILE_I2C -> TYPE_ARM_SBCON_I2C
  hw/i2c/versatile_i2c: Replace VersatileI2CState -> ArmSbconI2CState
  hw/i2c/versatile_i2c: Drop useless casts from void * to pointer
  hw/i2c/bitbang_i2c: Convert DPRINTF() to trace events
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Look up ARMCPRegInfo at runtime

Do not encode the pointer as a constant in the opcode stream.
This pointer is specific to the cpu that first generated the
translation, which runs into problems with both hot-pluggable
cpus and user-only threads, as cpus are removed. It's also a
potential correctness issue in the theoretical case of a
slightly-heterogenous system, because if CPU 0 generates a
TB and then CPU 1 executes it, CPU 1 will end up using CPU 0's
hash table, which might have a wrong set of registers in it.
(All our current systems are either completely homogenous,
M-profile, or have CPUs sufficiently different that they
wouldn't be sharing TBs anyway because the differences would
show up in the TB flags, so the correctness issue is only
theoretical, not practical.)

Perform the lookup in either helper_access_check_cp_reg,
or a new helper_lookup_cp_reg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230106194451.1213153-3-richard.henderson@linaro.org
[PMM: added note in commit message about correctness issue]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Reorg do_coproc_insn

Move the ri == NULL case to the top of the function and return.
This allows the else to be removed and the code unindented.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20230106194451.1213153-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: provide stubs for more external debug registers

Qemu doesn't implement Debug Communication Channel, as well as the rest
of external debug interface. However, Microsoft Hyper-V in tries to
access some of those registers during an EL2 context switch.

Since there is no architectural way to not advertise support for external
debug, provide RAZ/WI stubs for OSDTRRX_EL1, OSDTRTX_EL1 and OSECCR_EL1
registers in the same way the rest of DCM is currently done. Do account
for access traps though with access_tda.

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230120155929.32384-3-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: implement DBGCLAIM registers

The architecture does not define any functionality for the CLAIM tag bits.
So we will just keep the raw bits, as per spec.

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230120155929.32384-2-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Don't set EXC_RETURN.ES if Security Extension not present

In v7m_exception_taken(), for v8M we set the EXC_RETURN.ES bit if
either the exception targets Secure or if the CPU doesn't implement
the Security Extension. This is incorrect: the v8M Arm ARM specifies
that the ES bit should be RES0 if the Security Extension is not
implemented, and the pseudocode agrees.

Remove the incorrect condition, so that we leave the ES bit 0
if the Security Extension isn't implemented.

This doesn't have any guest-visible effects for our current set of
emulated CPUs, because all our v8M CPUs implement the Security
Extension; but it's worth fixing in case we add a v8M CPU without
the extension in future.

Reported-by: Igor Kotrasinski <i.kotrasinsk@samsung.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Fix in_debug path in S1_ptw_translate

During the conversion, the test against get_phys_addr_lpae got inverted,
meaning that successful translations went to the 'failed' label.

Cc: qemu-stable@nongnu.org
Fixes: f3639a64f60 ("target/arm: Use softmmu tlbs for page table walking")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1417
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230114054605.2977022-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Fix physical address resolution for MTE

Conversion to probe_access_full missed applying the page offset.

Fixes: b8967ddf ("target/arm: Use probe_access_full for MTE")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1416
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230114031213.2970349-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Unify set_pstate() SM/ZA helpers as set_svcr()

Unify the two helper_set_pstate_{sm,za} in this function.
Do not call helper_* functions from svcr_write.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-8-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Rebuild hflags in aarch64_set_svcr()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-7-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Reset ZA state in aarch64_set_svcr()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-6-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Reset SVE state in aarch64_set_svcr()

Move arm_reset_sve_state() calls to aarch64_set_svcr().

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-5-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Introduce aarch64_set_svcr()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-4-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Rebuild hflags in set_pstate() helpers

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-3-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm/sme: Reorg SME access handling in handle_msr_i()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230112102436.1913-2-philmd@linaro.org
Message-Id: <20230112004322.161330-1-richard.henderson@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/versatile_i2c: Rename versatile_i2c -> arm_sbcon_i2c

This device model started with the Versatile board, named
TYPE_VERSATILE_I2C, then ended up renamed TYPE_ARM_SBCON_I2C
as per the official "ARM SBCon two-wire serial bus interface"
description from:
https://developer.arm.com/documentation/dui0440/b/programmer-s-reference/two-wire-serial-bus-interface--sbcon

Use the latter name as a better description.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-6-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/versatile_i2c: Use ARM_SBCON_I2C() macro

ARM_SBCON_I2C() macro and ArmSbconI2CState typedef are
already declared via the QOM DECLARE_INSTANCE_CHECKER()
macro in "hw/i2c/arm_sbcon_i2c.h". Drop the VERSATILE_I2C
declarations from versatile_i2c.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-5-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/versatile_i2c: Replace TYPE_VERSATILE_I2C -> TYPE_ARM_SBCON_I2C

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-4-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/versatile_i2c: Replace VersatileI2CState -> ArmSbconI2CState

In order to rename TYPE_VERSATILE_I2C as TYPE_ARM_SBCON_I2C
(the formal ARM naming), start renaming its state.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-3-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/versatile_i2c: Drop useless casts from void * to pointer

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230110082508.24038-2-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/bitbang_i2c: Convert DPRINTF() to trace events

Convert the remaining DPRINTF debug macro uses to tracepoints.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20230111085016.44551-6-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/bitbang_i2c: Trace state changes

Trace bitbang state machine changes with trace events.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20230111085016.44551-5-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/bitbang_i2c: Change state calling bitbang_i2c_set_state() helper

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20230111085016.44551-4-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/bitbang_i2c: Remove unused dummy MemoryRegion

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20230111085016.44551-3-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/i2c/bitbang_i2c: Define TYPE_GPIO_I2C in public header

Define TYPE_GPIO_I2C in the public "hw/i2c/bitbang_i2c.h"
header and use it in hw/arm/musicpal.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20230111085016.44551-2-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Unify checking for M Main Extension in MRS/MSR

BASEPRI, FAULTMASK, and their _NS equivalents only exist on devices with
the Main Extension. However, the MRS instruction did not check this,
and the MSR instruction handled it inconsistently (warning BASEPRI, but
silently ignoring writes to BASEPRI_NS). Unify this behavior and always
warn when reading or writing any of these registers if the extension is
not present.

Signed-off-by: David Reiss <dreiss@meta.com>
Message-id: 167330628518.10497.13100425787268927786-0@git.sr.ht
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Widen cnthctl_el2 to uint64_t

This is a 64-bit register on AArch64, even if the high 44 bits
are RES0. Because this is defined as ARM_CP_STATE_BOTH, we are
asserting that the cpreg field is 64-bits.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1400
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230115171633.3171890-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-riscv-to-apply-20230120' of https://github.com/alistair23/qemu into staging

Second RISC-V PR for QEMU 8.0

* riscv_htif: Support console output via proxy syscall
* Cleanup firmware and device tree loading
* Fix elen check when using vector extensions
* add RISC-V OpenSBI boot test
* Ensure we always follow MISA parsing
* Fix up masking of vsip/vsie accesses
* Trap on writes to stimecmp from VS when hvictl.VTI=1
* Introduce helper_set_rounding_mode_chkfrm

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEE9sSsRtSTSGjTuM6PIeENKd+XcFQFAmPKRP0ACgkQIeENKd+X
# cFTHTwgAkyRDxrLepvI0KNaT0+cUBh+3QFlJ5JRtVnDW+5R+3aGT72PTS7Migqoh
# H3IFCB2mcSdQvyjj2jDFlrFd0oVIaqE0+bnhouS/4nHB5S/vmapHi4Mc74Vv1CMB
# rgXScL+C5gDOH1I7XjqOb1FY5Vxqyhi3IzdIoj+0ysUrGmUkqx+ij/cfQL7jkH9Q
# slNAkorgwgrTgMgkJ5RKd4cjyv35O4XKLAsgixVTfJ+WcxKmc/zaJOkNM/UDnmxK
# k2+2P8bshZWtWscXbm3oMC5+2ow1QtFedEkhHqb4adkQIyolKL7P1TfMlCgMSvES
# BKl0DUhqQ+7F77tik3GPy9spQ6LpTQ==
# =ifFF
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 20 Jan 2023 07:38:37 GMT
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full]
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8  CE8F 21E1 0D29 DF97 7054

* tag 'pull-riscv-to-apply-20230120' of https://github.com/alistair23/qemu: (37 commits)
  hw/riscv/virt.c: move create_fw_cfg() back to virt_machine_init()
  target/riscv: Remove helper_set_rod_rounding_mode
  target/riscv: Introduce helper_set_rounding_mode_chkfrm
  tcg/riscv: Use tcg_pcrel_diff in tcg_out_ldst
  target/riscv: Trap on writes to stimecmp from VS when hvictl.VTI=1
  target/riscv: Fix up masking of vsip/vsie accesses
  hw/riscv: use ms->fdt in riscv_socket_fdt_write_distance_matrix()
  hw/riscv: use MachineState::fdt in riscv_socket_fdt_write_id()
  hw/riscv/virt.c: remove 'is_32_bit' param from create_fdt_socket_cpus()
  hw/riscv/sifive_u.c: simplify create_fdt()
  hw/riscv/virt.c: simplify create_fdt()
  hw/riscv/spike.c: simplify create_fdt()
  target/riscv: Use TARGET_FMT_lx for env->mhartid
  target/riscv/cpu.c: do not skip misa logic in riscv_cpu_realize()
  target/riscv/cpu: set cpu->cfg in register_cpu_props()
  hw/riscv/boot.c: use MachineState in riscv_load_kernel()
  hw/riscv/boot.c: use MachineState in riscv_load_initrd()
  hw/riscv: write bootargs 'chosen' FDT after riscv_load_kernel()
  hw/riscv: write initrd 'chosen' FDT inside riscv_load_initrd()
  hw/riscv/spike.c: load initrd right after riscv_load_kernel()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-include-2023-01-20' of https://repo.or.cz/qemu/armbru into staging

Header cleanup patches for 2023-01-20

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmPKN6YSHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTPeoQAIKl/BF6PFRNq0/k3vPqMe6nltjgkpa/
# p7E5qRlo31RCeUB+f0iW26mySnNTgYkE28yy57HxUML/9Lp1bbxyDgRNiJ406a4L
# kFVF04kOIFez1+mfvWN92DZqcl/EAAqNL6XqSFyO38kYwcsFsi+BZ7DLZbL9Ea8v
# wVywB96mN6KyrLWCJ2D0OqIVuPHSHol+5zt9e6+ShBgN0FfElLbv0F4KH3VJ1olA
# psKl6w6V9+c2zV1kT/H+S763m6mQdwtVo/UuOJoElI+Qib/UBxDOrhdYf4Zg7hKf
# ByUuhJUASm8y9yD/42mFs90B6eUNzLSBC8v1PgRqSqDHtllveP4RysklBlyIMlOs
# DKtqEuRuIJ/qDXliIFHY6tBnUkeITSd7BCxkQYfaGyaSOcviDSlE3AyaaBC0sY4F
# P/lTTiRg5ksvhDYtJnW3mSfmT2PY7aBtyE3D1Z84v9hek6D0reMQTE97yL/j4m7P
# wJP8aM3Z8GILCVxFIh02wmqWZhZUCGsIDS/vxVm+u060n66qtDIQFBoazsFJrCME
# eWI+qDNDr6xhLegeYajGDM9pdpQc3x0siiuHso4wMSI9NZxwP+tkCVhTpqmrRcs4
# GSH/4IlUXqEZdUQDL38DfA22C1TV8BzyMhGLTUERWWYki1sr99yv0pdFyk5r3nLB
# SURwr58rB2zo
# =dOfq
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 20 Jan 2023 06:41:42 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-include-2023-01-20' of https://repo.or.cz/qemu/armbru:
  include/hw/ppc include/hw/pci-host: Drop extra typedefs
  include/hw/ppc: Don't include hw/pci-host/pnv_phb.h from pnv.h
  include/hw/ppc: Supply a few missing includes
  include/hw/ppc: Split pnv_chip.h off pnv.h
  include/hw/block: Include hw/block/block.h where needed
  hw/sparc64/niagara: Use blk_name() instead of open-coding it
  include/block: Untangle inclusion loops
  coroutine: Use Coroutine typedef name instead of structure tag
  coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h
  coroutine: Clean up superfluous inclusion of qemu/lockable.h
  coroutine: Move coroutine_fn to qemu/osdep.h, trim includes
  coroutine: Clean up superfluous inclusion of qemu/coroutine.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>