www.infradead.org Git - users/dwmw2/qemu.git/log

migration: refactor ram_save_target_page functions

Refactor ram_save_target_page legacy and multifd
functions into one. Other than simplifying it,
it frees 'migration_ops' object from usage, so it
is expunged.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250127120823.144949-3-ppandit@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Trivial cleanup on JSON writer of vmstate_save()

Two small cleanups in the same section of vmstate_save():

  - Check vmdesc before the "mixed null/non-null data in array" logic, to
  be crystal clear that it's only about the JSON writer, not the vmstate on
  its own in the migration stream.

  - Since we have is_null variable now, use that to replace a check.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-17-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Merge precopy/postcopy on switchover start

Now after all the cleanups, finally we can merge the switchover startup
phase into one single function for precopy/postcopy.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-16-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Always set DEVICE state

DEVICE state was introduced back in 2017:

https://lore.kernel.org/qemu-devel/20171020090556.18631-1-dgilbert@redhat.com/

Quote from Dave's cover letter, when the pre-switchover phase was enabled,
the state transition looks like this:

  The precopy flow is:
  active->pre-switchover->device->completed

  The postcopy flow is:
  active->pre-switchover->postcopy-active->completed

To supplement above, when the cap is not enabled:

  The precopy flow is:
  active->completed

  The postcopy flow is:
  active->postcopy-active->completed

It works for us, though we have some code just to special case these state
transitions, so the DEVICE state currently is special only to precopy, and
only conditionally.

I had a quick discussion with Libvirt developers, it turns out that this
may not be necessary. IOW, it seems okay we can have DEVICE state to be
generic, so that we don't have over-complicated state machines.  It not
only helps align all the migration state machine, help cleanup the code
path especially on pre-switchover handling (see the patch itself), another
side benefit is we can unconditionally have a specific state to mark the
switchover phase, which might be helpful for debugging too.

This patch makes the DEVICE state to be present always, marking that source
QEMU is switching over.  Then the state machine will be always as simple
as:

  active-> [pre-switchover->] -> device -> [postcopy-active->] -> complete

After the change, no matter whether pre-switchover or postcopy is enabled
or not, we always have DEVICE state showing the switchover phase.  When
pre-switchover enabled, we'll have an extra stage before that.  When
postcopy is enabled, we'll have an extra stage after that.

A few qtests need touch up in QEMU tree for this change:

  - A few iotest outputs (194, 203, 234, 262, 280)
  - Teach libqos's migrate() on "device" state

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-15-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Cleanup qemu_savevm_state_complete_precopy()

Now qemu_savevm_state_complete_precopy() is never used in postcopy, clean
it up as in_postcopy==false now unconditionally.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-14-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy

Postcopy invokes qemu_savevm_state_complete_precopy() twice for a long
time, and that caused way too much confusions. Let's clean this up and
make postcopy easier to read.

It's actually fairly straightforward: postcopy starts with saving
non-postcopiable iterables, then later it saves again with non-iterable
only. Move these two calls out makes everything much easier to follow.
Otherwise it's very unclear what qemu_savevm_state_complete_precopy() did
in either of the calls.

No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Notify COMPLETE once for postcopy

Postcopy invokes qemu_savevm_state_complete_precopy() twice, that means
it'll invoke COMPLETE notify twice.. also twice the tracepoints that
marking precopy complete.

Move that notification (along with the tracepoint) out to the caller, so
that postcopy will only notify once right at the start of switchover phase
from precopy. When at it, rename it to suite the file now it locates.

For precopy, there should have no functional change except the tracepoint
has a name change.

For the other two users of qemu_savevm_state_complete_precopy(), namely:
qemu_savevm_state() and qemu_savevm_live_state(): the notifier shouldn't
matter because they're not precopy at all. Now in these two contexts (aka,
"savevm", and "colo") sometimes the precopy notifiers will still be
invoked, but that's outside the scope of this patch.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Take BQL slightly longer in postcopy_start()

This paves way for some follow up patch to modify migration states at the
end of postcopy_start(), which should better be with the BQL so that
there's no way of concurrent cancellation.

So we'll do something slightly more with BQL but they're really trivial,
hopefully nothing will really chance with this.

A side benefit is we can drop another explicit lock() in failure path.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-11-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Drop cached migration state in migration_maybe_pause()

I can't see why we must cache the state now after we avoided possible
CANCEL race: that's the only thing I can think of that can modify the
migration state concurrently with the migration thread itself. Make all
the state updates to happen always, then we don't need to cache the state
anymore.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-10-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Adjust locking in migration_maybe_pause()

In migration_maybe_pause() QEMU may yield BQL before waiting for a
semaphore. However it yields the BQL too early, which logically gives it
chance for the main thread to quickly take the BQL and modify the state to
CANCELLING.

To avoid such race condition from happening at all, always update the
migration states within the BQL. It'll make sure no concurrent
cancellation can ever happen.

With that, IIUC there's chance we can remove the extra parameter in
migration_maybe_pause() to update active state, but that'll be done
separately later.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-9-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Adjust postcopy bandwidth during switchover

Precopy uses unlimited bandwidth always during switchover, it makes sense
because this is so critical and no one would like to throttle bandwidth
during the VM blackout.

OTOH, postcopy surprisingly didn't do that. There's one line that in the
middle of the postcopy switchover it tries to switch to postcopy's
specified max-postcopy-bandwidth, but even so it's somewhere in the middle
which is strange.

This patch brings the two modes to always use unlimited bandwidth for
switchover, meanwhile only apply the postcopy max bandwidth after the
switchover is completed.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-8-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Synchronize all CPU states only for non-iterable dump

Do one shot cpu sync at qemu_savevm_state_complete_precopy_non_iterable(),
instead of coding it separately in two places.

Note that in the context of qemu_savevm_state_complete_precopy(), this
patch is also an optimization for postcopy path, in that we can avoid sync
cpu twice during switchover: before this patch, postcopy_start() invokes
twice on qemu_savevm_state_complete_precopy(), each of them will try to
sync CPU info. In reality, only one of them would be enough.

For background snapshot, there's no intended functional change.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Drop inactivate_disk param in qemu_savevm_state_complete*

This parameter is only used by one caller, which is the genuine precopy
complete path (migration_completion_precopy).

The parameter was introduced in a1fbe750fd ("migration: Fix race of image
locking between src and dst") to make sure the inactivate will happen
before EOF to make sure dest will always be able to activate the disk
properly. However there's no limitation on how early we inactivate the
disk. For precopy completion path, we can always do that as long as VM is
stopped.

Move the disk inactivate there, then we can remove this inactivate_disk
parameter in the whole call stack, because all the rest users pass in false
always.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Avoid two src-downtime-end tracepoints for postcopy

Postcopy can trigger this tracepoint twice, while only the 1st one is
valid. Avoid triggering the 2nd tracepoint just like what we do with
recording the total downtime.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-5-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Optimize postcopy on downtime by avoiding JSON writer

postcopy_start() is the entry function that postcopy is destined to start.
It also means QEMU source will not dump VM description, aka, the JSON
writer is garbage now.

We can leave that to be cleaned up when migration completes, however when
with the JSON writer object being present, vmstate_save() will still try to
construct the JSON objects for the VM descriptions, even though it'll never
be used later if it's postcopy.

To save those cycles, release the JSON writer earlier for postcopy. Then
vmstate_save() later will be smart enough to skip the JSON object
constructions completely. It can logically reduce downtime because all
such JSON constructions happen during postcopy blackout.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-4-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Do not construct JSON description if suppressed

QEMU machine has a property "suppress-vmdesc". When it is enabled, QEMU
will stop attaching JSON VM description at the end of the precopy migration
stream (postcopy is never affected because postcopy never attach that).

However even if it's suppressed by the user, the source QEMU will still
construct the JSON descriptions, which is a complete waste of CPU and
memory resources.

To avoid it, only create the JSON writer object if suppress-vmdesc is not
specified.

Luckily, vmstate_save() already supports vmdesc==NULL, so only a few spots
that are left to be prepared that vmdesc can be NULL now.

When at it, move the init / destroy of the JSON writer object to start /
end of the migration - the JSON writer object is a sub-struct of migration
state, and that looks like the only object that was dynamically allocated /
destroyed within migration process. Make it the same as the rest objects
that migration uses.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-3-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Remove postcopy implications in should_send_vmdesc()

should_send_vmdesc() has a hack inside (which was not reflected in the
function name) in that it tries to detect global postcopy state and that
will affect the value to be returned.

It's easier to keep the helper simple by only check the suppress-vmdesc
property.  Then:

  - On the sender side of its usage, there's already in_postcopy variable
    that we can use: postcopy doesn't send vmdesc at all, so directly skip
    everything for postcopy.

  - On the recv side, when reaching vmdesc processing it must be precopy
    code already, hence that hack check never used to work anyway.

No functional change intended, except a trivial side effect that QEMU
source will start to avoid running some JSON helper in postcopy path, but
that would only reduce the postcopy blackout window a bit, rather than any
other bad side effect.

Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-2-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: cpr-transfer documentation

Add documentation for the cpr-transfer migration mode.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-25-git-send-email-steven.sistare@oracle.com
[add -machine memory-backend=ram0]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration-test: cpr-transfer

Add a migration test for cpr-transfer mode. Defer the connection to the
target monitor, else the test hangs because in cpr-transfer mode QEMU does
not listen for monitor connections until we send the migrate command to
source QEMU.

To test -incoming defer, send a migrate incoming command to the target,
after sending the migrate command to the source, as required by
cpr-transfer mode.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-24-git-send-email-steven.sistare@oracle.com
[only allocate in_channels when needed]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

tests/qtest: assert qmp connected

Assert that qmp_fd is valid when we communicate with the monitor.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1736967650-129648-23-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

tests/qtest: enhance migration channels

Change the migrate_qmp and migrate_qmp_fail channels argument to a QObject
type so the caller can manipulate the object before passing it to the
helper. Define migrate_str_to_channel to aid such manipulation.
Add a channels argument to migrate_incoming_qmp.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-22-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration-test: defer connection

Add an option to defer connection to the target monitor, needed by the
cpr-transfer test.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1736967650-129648-21-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

tests/qtest: defer connection

Add an option to defer making the connecting to the monitor and qtest
sockets when calling qtest_init_with_env. The client makes the connection
later by calling qtest_connect and qtest_qmp_handshake.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-20-git-send-email-steven.sistare@oracle.com
[plumb capabilities list into qtest_qmp_handshake]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

tests/qtest: optimize migrate_set_ports

Do not query connection parameters if all port numbers are known. This is
more efficient, and also solves a problem for the cpr-transfer test.
At the point where cpr-transfer calls migrate_qmp and migrate_set_ports,
the monitor is not connected and queries are not allowed. Port=0 is
never used for cpr-transfer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-19-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration-test: memory_backend

Allow each migration test to define its own memory backend, replacing
the standard "-m <size>" specification.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1736967650-129648-18-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: cpr-transfer mode

Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors.  Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.

cpr-transfer preserves memory and devices descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel.  New QEMU reads the cpr channel
prior to creating devices or backends.  The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.

The user must start old QEMU with the the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block.  Memory-backend
objects must have the share=on attribute, but memory-backend-epc is not
supported.

The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration.  In addition, the user
adds a second -incoming option with channel type "cpr".  This CPR channel
must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
UNIX domain socket.

To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state.  New QEMU mmap's memory descriptors, and execution
resumes.

The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel.  Old QEMU detects the HUP then calls finish, which connects the
main migration channel.

In summary, the usage is:

  qemu-system-$arch -machine aux-ram-share=on ...

  start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"

  Issue commands to old QEMU:
    migrate_set_parameter mode cpr-transfer

    {"execute": "migrate", ...
        {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: cpr-transfer save and load

Add functions to create a QEMUFile based on a unix URI, for saving or
loading, for use by cpr-transfer mode to preserve CPR state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-16-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: VMSTATE_FD

Define VMSTATE_FD for declaring a file descriptor field in a
VMStateDescription.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: SCM_RIGHTS for QEMUFile

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
  put(A), put(fd), put(B)
followed by
  get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: incoming channel

Extend the -incoming option to allow an @MigrationChannel to be specified.
This allows channels other than 'main' to be described on the command
line, which will be needed for CPR.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: enhance migrate_uri_parse

Export migrate_uri_parse for use outside migration internals, and define
a method migrate_is_uri that indicates when migrate_uri_parse should
be used.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

hostmem-shm: preserve for cpr

Preserve memory-backend-shm memory objects during cpr-transfer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

hostmem-memfd: preserve for cpr

Preserve memory-backend-memfd memory objects during cpr-transfer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

physmem: preserve ram blocks for cpr

Save the memfd for ramblocks in CPR state, along with a name that
uniquely identifies it.  The block's idstr is not yet set, so it
cannot be used for this purpose.  Find the saved memfd in new QEMU when
creating a block.  If size of a resizable block is larger in new QEMU,
extend it via the file_ram_alloc truncate parameter, and the extra space
will be usable after a guest reset.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: cpr-state

CPR must save state that is needed after QEMU is restarted, when devices
are realized. Thus the extra state cannot be saved in the migration
channel, as objects must already exist before that channel can be loaded.
Instead, define auxilliary state structures and vmstate descriptions, not
associated with any registered object, and serialize the aux state to a
cpr-specific channel in cpr_state_save. Deserialize in cpr_state_load
after QEMU restarts, before devices are realized.

Provide accessors for clients to register file descriptors for saving.
The mechanism for passing the fd's to the new process will be specific
to each migration mode, and added in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

machine: aux-ram-share option

Allocate auxilliary guest RAM as an anonymous file that is shareable
with an external process.  This option applies to memory allocated as
a side effect of creating various devices. It does not apply to
memory-backend-objects, whether explicitly specified on the command
line, or implicitly created by the -m command line option.

This option is intended to support new migration modes, in which the
memory region can be transferred in place to a new QEMU process, by sending
the memfd file descriptor to the process.  Memory contents are preserved,
and if the mode also transfers device descriptors, then pages that are
locked in memory for DMA remain locked.  This behavior is a pre-requisite
for supporting vfio, vdpa, and iommufd devices with the new modes.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

memory: add RAM_PRIVATE

Define the RAM_PRIVATE flag.

In RAMBlock creation functions, if MAP_SHARED is 0 in the flags parameter,
in a subsequent patch the implementation may still create a shared mapping
if other conditions require it. Callers who specifically want a private
mapping, eg for objects specified by the user, must pass RAM_PRIVATE.

After RAMBlock creation, MAP_SHARED in the block's flags indicates whether
the block is shared or private, and MAP_PRIVATE is omitted.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

physmem: fd-based shared memory

Create MAP_SHARED RAMBlocks by mmap'ing a file descriptor rather than using
MAP_ANON, so the memory can be accessed in another process by passing and
mmap'ing the fd.  This will allow CPR to support memory-backend-ram and
memory-backend-shm objects, provided the user creates them with share=on.

Use memfd_create if available because it has no constraints.  If not, use
POSIX shm_open.  However, allocation on the opened fd may fail if the shm
mount size is too small, even if the system has free memory, so for backwards
compatibility fall back to qemu_anon_ram_alloc/MAP_ANON on failure.

For backwards compatibility on Windows, always use MAP_ANON.  share=on has
no purpose there, but the syntax is accepted, and must continue to work.

Lastly, quietly fall back to MAP_ANON if the system does not support
qemu_ram_alloc_from_fd.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

physmem: qemu_ram_alloc_from_fd extensions

Extend qemu_ram_alloc_from_fd to support resizable ram, and define
qemu_ram_resize_cb to clean up the API.

Add a grow parameter to extend the file if necessary. However, if
grow is false, a zero-sized file is always extended.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1736967650-129648-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

physmem: fix qemu_ram_alloc_from_fd size calculation

qemu_ram_alloc_from_fd allocates space if file_size == 0.  If non-zero,
it uses the existing space and verifies it is large enough, but the
verification was broken when the offset parameter was introduced.  As
a result, a file smaller than offset passes the verification and causes
errors later.  Fix that, and update the error message to include offset.

Peter provides this concise reproducer:

  $ touch ramfile
  $ truncate -s 64M ramfile
  $ ./qemu-system-x86_64 -object memory-backend-file,mem-path=./ramfile,offset=128M,size=128M,id=mem1,prealloc=on
  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address

With the fix, the error message is:
  qemu-system-x86_64: mem1 backing store size 0x4000000 is too small for 'size' option 0x8000000 plus 'offset' option 0x8000000

Cc: qemu-stable@nongnu.org
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd"

Let's factor it out so we can reuse it.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: fix -Werror=maybe-uninitialized

../migration/savevm.c: In function ‘qemu_savevm_state_complete_precopy_non_iterable’:
../migration/savevm.c:1560:20: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
1560 | return ret;
| ^~~

Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250114104811.2612846-1-marcandre.lureau@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

Merge tag 'pull-aspeed-20250127' of https://github.com/legoater/qemu into staging

aspeed queue:

* Fixed serial definitions on the command line
* Fixed sdhci write protected pin on AST2600 EVB machine
* Added timer support on AST2700 SoC
* Updated buildroot and SDK images of functional tests
* Removed sd devices creation when -nodefaults is used
* Added software reset mode support on AST2600 SoC

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmeXSIwACgkQUaNDx8/7
# 7KH5Ew/+Ne9Z0lksOEUw5BJ6Qm3U2oLS90hcjo3MBHpmMHX0MXY2qYOKV2aS7spO
# kvWpTUiPaT682X4IrBuxdCdi2F80dhJSmky81vMn7a3+DZgSsUoPEgw2Ophm5Q37
# 788qVEKk55F8m4r4ZCpAd3+Mc+3rVw6YQW/Rvu2+fVbfaLu6dE4fnQdXmDYc2EzF
# pCYAcYlRp19dP0YnBJnv4/JK6Eybced1VG1cKGNy8VSyMY3vWM7ZOdP4Ybz+d88R
# 0DNEIGRQJQZZFNxvkEJX/tPsK+m2M9G/t5YOuJP22EoF3L8v+rnt7yg+NWE4pbtI
# dqzg8ikICidcP6NMYjTe6C2m9PBcKBhbPumRZOW1lWRoZOShy6cHO7KajJZ3oj8K
# GUOEEh7i5tKbPGdg46ifc0waGMKh97S3dy/8V/N2XqPfL99TXfRAyiq0sG0mS1je
# xGV9vN7LPJ9OYMri6U5SLewrWO93q7Vv4SBv7iDVupZ8Ww6wcJaCWgvUWjxbK7SH
# qE003RvQYmK6gkCH4cYnI2LZBlJyp7wKdO7nG4K2vI+05GVpALTkZPcCQ84WhF5L
# 8wO5wrQPalQrOwkvankqgEJOifWmBAi3Gs/3y/tRg+u4VHoPKcaXLujBqq8pZl6F
# meYAzqqksFj8PJwiCVJVNcHpqvhmyBzvvPAf6NEgbRsDyUiFZAo=
# =gOq1
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 27 Jan 2025 03:49:16 EST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-aspeed-20250127' of https://github.com/legoater/qemu:
  docs/system/arm/aspeed: Remove tacoma-bmc from the documentation
  aspeed/wdt: Support software reset mode for AST2600
  aspeed/wdt: Fix coding style
  aspeed: Create sd devices only when defaults are enabled
  test/functional: Update buildroot images to 2024.11
  test/functional: Update the Aspeed aarch64 test
  aspeed/soc: Support Timer for AST2700
  hw/timer/aspeed: Add AST2700 Support
  hw/timer/aspeed: Refactor Timer Callbacks for SoC-Specific Implementations
  hw/arm/aspeed: Invert sdhci write protected pin for AST2600 EVB
  hw/sd/sdhci: Introduce a new Write Protected pin inverted property
  hw/arm/aspeed: fix connect_serial_hds_to_uarts

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'hppa-system-for-v10-pull-request' of https://github.com/hdeller/qemu-hppa into staging

hppa updates

* Fixes booting a Linux kernel which is provided on the command line.
* Allow more than 4GB RAM on 64-bit boxes

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZ5PvvgAKCRD3ErUQojoP
# X7JQAQCn2MR4k4lfClDZHNmAFUNw51j56SB5HC/FCUKfOx4dCQD/Tf2OV/gstMOz
# nfpvIH6ouXZ2/p5npzTyOt+A8fwUpw0=
# =qrs7
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 24 Jan 2025 14:53:34 EST
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'hppa-system-for-v10-pull-request' of https://github.com/hdeller/qemu-hppa:
  hw/hppa: Fix booting Linux kernel with initrd
  hw/hppa: Support up to 256 GiB RAM on 64-bit machines

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

docs/system/arm/aspeed: Remove tacoma-bmc from the documentation

The tacoma-bmc machine has recently been removed, so let's remove
it from the documentation now, too.

Fixes: 2b1b66e01f ("arm: Remove tacoma-bmc machine")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250124174507.27348-1-thuth@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

aspeed/wdt: Support software reset mode for AST2600

On the AST2400 and AST2500 platforms, the system can only be reset by enabling
the WDT (Watchdog Timer) and waiting for the WDT timeout. However, starting
from the AST2600 platform, the reset event can be triggered directly and
intentionally by software, without relying on the WDT timeout.

This mechanism, referred to as "software restart", is implemented in hardware.
When using the software restart mechanism, the WDT counter is not enabled.

To trigger a reset generation in software mode, write 0xAEEDF123 to register
0x24 and software mode reset only support SOC reset mode.

A new function, "aspeed_wdt_is_soc_reset_mode", is introduced to determine
whether the SoC reset mode is active.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250124030249.1706996-3-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

aspeed/wdt: Fix coding style

Fix coding style issues from checkpatch.pl.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250124030249.1706996-2-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

aspeed: Create sd devices only when defaults are enabled

When the -nodefaults option is set, sd devices should not be
automatically created by the machine. Instead they should be defined
on the command line.

Note that it is not currently possible to define which bus an
"sd-card" device is attached to:

-blockdev node-name=drive0,driver=file,filename=/path/to/file.img \
-device sd-card,drive=drive0,id=sd0

and the first bus named "sd-bus" will be used.

Reviewed-by: Jamin Lin <jamin_lin@aspeedtech.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250122070909.1138598-10-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

test/functional: Update buildroot images to 2024.11

The main changes compared to upstream 2024.11 buildroot are

- bumped Linux to version 6.11.11 with a custom config
- changed U-Boot to OpenBMC branch for more support
- included extra target packages

See branch [1] for more details.

There is a slight output change when powering off the machine,
the console now contains :

reboot: Power off not available: System halted

Adjust accordingly the expect string in
do_test_arm_aspeed_buildroot_poweroff().

[1] https://github.com/legoater/buildroot/commits/aspeed-2024.11

Reviewed-by: Jamin Lin <jamin_lin@aspeedtech.com>
Link: https://lore.kernel.org/qemu-devel/20250122070909.1138598-9-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

test/functional: Update the Aspeed aarch64 test

Bumped SDK version to v09.03. v09.04 is available but not yet
supported in QEMU.

Reviewed-by: Jamin Lin <jamin_lin@aspeedtech.com>
Link: https://lore.kernel.org/qemu-devel/20250122070909.1138598-8-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

aspeed/soc: Support Timer for AST2700

Add Timer model for AST2700 Timer support. The Timer controller include 8 sets
of 32-bit decrement counters.

The base address of TIMER0 to TIMER7 as following.
Base Address of Timer 0 = 0x12C1_0000
Base Address of Timer 1 = 0x12C1_0040
Base Address of Timer 2 = 0x12C1_0080
Base Address of Timer 3 = 0x12C1_00C0
Base Address of Timer 4 = 0x12C1_0100
Base Address of Timer 5 = 0x12C1_0140
Base Address of Timer 6 = 0x12C1_0180
Base Address of Timer 7 = 0x12C1_01C0

The interrupt of TIMER0 to TIMER7 as following.
GICINT16 = TIMER 0 interrupt
GICINT17 = TIMER 1 interrupt
GICINT18 = TIMER 2 interrupt
GICINT19 = TIMER 3 interrupt
GICINT20 = TIMER 4 interrupt
GICINT21 = TIMER 5 interrupt
GICINT22 = TIMER 6 interrupt
GICINT23 = TIMER 7 interrupt

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20250113064455.1660564-4-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/timer/aspeed: Add AST2700 Support

The timer controller include 8 sets of 32-bit decrement counters, based on
either PCLK or 1MHZ clock and the design of timer controller between AST2600
and AST2700 are almost the same.

TIMER0 – TIMER7 has their own individual control and interrupt status register.
In other words, users are able to set timer control in register TMC10 with
different TIMER base address and clear timer control and interrupt status in
register TMC14 with different TIMER base address.

Introduce new "aspeed_2700_timer_read" and "aspeed_2700_timer_write" callback
functions and a new ast2700 class to support AST2700.

The base address of TIMER0 to TIMER7 as following.
Base Address of Timer 0 = 0x12C1_0000
Base Address of Timer 1 = 0x12C1_0040
Base Address of Timer 2 = 0x12C1_0080
Base Address of Timer 3 = 0x12C1_00C0
Base Address of Timer 4 = 0x12C1_0100
Base Address of Timer 5 = 0x12C1_0140
Base Address of Timer 6 = 0x12C1_0180
Base Address of Timer 7 = 0x12C1_01C0

The register address space of each TIMER is "0x40" , and uses the following
formula to get the index and register of each TIMER.

timer_index = offset >> 6;
timer_offset = offset & 0x3f;

The TMC010 is a counter control set and interrupt status register. Write "1" to
TMC10[3:0] will set the specific bits to "1". Introduce a new
"aspeed_2700_timer_set_ctrl" function to handle this register behavior.

The TMC014 is a counter control clear and interrupt status register, to clear
the specific bits to "0", it should write "1" to TMC14[3:0] on the same bit
position. Introduce a new "aspeed_2700_timer_clear_ctrl" function to handle
this register behavior. TMC014 does not support read operation.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Acked-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20250113064455.1660564-3-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/timer/aspeed: Refactor Timer Callbacks for SoC-Specific Implementations

The register set have a significant change in AST2700. The TMC00-TMC3C
are used for TIMER0 and TMC40-TMC7C are used for TIMER1. In additional,
TMC20-TMC3C and TMC60-TMC7C are reserved registers for TIMER0 and TIMER1,
respectively.

Besides, each TIMER has their own control and interrupt status register.
In other words, users are able to set control and interrupt status for TIMER0
in one register. Both aspeed_timer_read and aspeed_timer_write callback
functions are not compatible AST2700.

Introduce common read and write functions for ASPEED timers.
Modify the aspeed_timer_read and aspeed_timer_write functions to delegate to
SoC-specific callbacks first.
Update the AST2400, AST2500, AST2600 and AST1030 specific read and write
functions to call the common implementations for common register accesses.

This refactoring improves the organization of call delegation and prepares the
codebase for future SoC-specific specializations, such as the AST2700.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20250113064455.1660564-2-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/arm/aspeed: Invert sdhci write protected pin for AST2600 EVB

The Write Protect pin of SDHCI model is default active low to match the SDHCI
spec. So, write enable the bit 19 should be 1 and write protected the bit 19
should be 0 at the Present State Register (0x24).

According to the design of AST2600 EVB, the Write Protected pin is active
high by default. To support it, introduces a new "sdhci_wp_inverted"
property in ASPEED MACHINE State and set it true for AST2600 EVB
and set "wp_inverted" property true of sdhci-generic model.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20241114094839.4128404-4-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/sd/sdhci: Introduce a new Write Protected pin inverted property

The Write Protect pin of SDHCI model is default active low to match the SDHCI
spec. So, write enable the bit 19 should be 1 and write protected the bit 19
should be 0 at the Present State Register (0x24). However, some boards are
design Write Protected pin active high. In other words, write enable the bit 19
should be 0 and write protected the bit 19 should be 1 at the
Present State Register (0x24). To support it, introduces a new "wp-inverted"
property and set it false by default.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Acked-by: Cédric Le Goater <clg@redhat.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20241114094839.4128404-3-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/arm/aspeed: fix connect_serial_hds_to_uarts

In the loop, we need ignore the index increase when uart == uart_chosen
We should increase the index only after we allocate a serial.

Signed-off-by: Kenneth Jia <kenneth_jia@asus.com>
Fixes: d2b3eaefb4d7 ("aspeed: Refactor UART init for multi-SoC machines")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/5f9b0c53f1644922ba85522046e92f4c@asus.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/hppa: Fix booting Linux kernel with initrd

Commit 20f7b890173b ("hw/hppa: Reset vCPUs calling resettable_reset()")
broke booting the Linux kernel with initrd which may have been provided
on the command line. The problem is, that the mentioned commit zeroes
out initial registers which were preset with addresses for the Linux
kernel and initrd.

Fix it by adding proper variables which are set shortly before starting
the firmware.

Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 20f7b890173b ("hw/hppa: Reset vCPUs calling resettable_reset()")
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

hw/hppa: Support up to 256 GiB RAM on 64-bit machines

Allow up to 256 GB RAM, which is the maximum a rp8440 machine (the very
last 64-bit PA-RISC machine) physically supports.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'linux-user-fix-gupnp-pull-request' of https://github.com/hdeller/qemu-hppa into staging

linux-user: Add support for various missing netlink sockopt entries

Add missing sockopt calls and thus fix building the debian gupnp package in a chroot.

This fixes debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1044651

Signed-off-by: Helge Deller <deller@gmx.de>
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZ5OPdwAKCRD3ErUQojoP
# X9EWAP0ZvoDehmNzgWMlUpWT+d4O06kMsrDsi+tRddUUSJgp4wEAuuycr4go4b9b
# 6xLDLr81C7MFEGsztGcRVhPwVdDJxAU=
# =Lw8U
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 24 Jan 2025 08:02:47 EST
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'linux-user-fix-gupnp-pull-request' of https://github.com/hdeller/qemu-hppa:
  linux-user: netlink: Add missing QEMU_IFLA entries
  linux-user: netlink: add netlink neighbour emulation
  linux-user: netlink: Add emulation of IP_MULTICAST_IF
  linux-user: netlink: Add IP_PKTINFO cmsg parsing
  linux-user: Use unique error messages for cmsg parsing
  linux-user: netlink: Add missing IFA_PROTO to host_to_target_data_addr_rtattr()

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-loongarch-20250124' of https://gitlab.com/bibo-mao/qemu into staging

loongarch queue

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQQNhkKjomWfgLCz0aQfewwSUazn0QUCZ5M4AwAKCRAfewwSUazn
# 0aJAAP45/9qfbGSYiMCrBXpRFlyvtRN+GEXHEsERfk9Q1V+tQgEA/mMiUEcyc/xc
# Z1Z27cDoqUFRhPmxbd6/KyTGHzo2+As=
# =Zanw
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 24 Jan 2025 01:49:39 EST
# gpg:                using EDDSA key 0D8642A3A2659F80B0B3D1A41F7B0C1251ACE7D1
# gpg: Good signature from "bibo mao <maobibo@loongson.cn>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 7044 3A00 19C0 E97A 31C7  13C4 8E86 8FB7 A176 9D4C
#      Subkey fingerprint: 0D86 42A3 A265 9F80 B0B3  D1A4 1F7B 0C12 51AC E7D1

* tag 'pull-loongarch-20250124' of https://gitlab.com/bibo-mao/qemu:
  target/loongarch: Dump all generic CSR registers
  target/loongarch: Set unused flag with CSR registers
  target/loongarch: Add common source file for CSR register
  target/loongarch: Add common header file for CSR registers
  target/loongarch: Add generic csr function type
  target/loongarch: Remove static CSR function setting
  target/loongarch: Add dynamic function access with CSR register

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

linux-user: netlink: Add missing QEMU_IFLA entries

This fixes the following qemu warnings when building debian gupnp package:
Unknown host QEMU_IFLA type: 61
Unknown host QEMU_IFLA type: 58
Unknown host QEMU_IFLA type: 59
Unknown host QEMU_IFLA type: 60
Unknown host QEMU_IFLA type: 32820

QEMU_IFLA type 32820 is actually NLA_NESTED | QEMU_IFLA_PROP_LIST (a nested
entry), which is why rta_type needs to be masked with NLA_TYPE_MASK.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

linux-user: netlink: add netlink neighbour emulation

Fixes various warnings in the testsuite while building gupnp:
gssdp-net-DEBUG: Failed to send netlink message: Operation not supported
gupnp-context-DEBUG: Mismatch between host header and host IP (example.com, expected: 127.0.0.1)
gupnp-context-DEBUG: Mismatch between host header and host port (80, expected 4711)
gupnp-context-DEBUG: Mismatch between host header and host IP (192.168.1.2, expected: 127.0.0.1)
gupnp-context-DEBUG: Mismatch between host header and host IP (fe80::01, expected: 127.0.0.1)
gupnp-context-DEBUG: Mismatch between host header and host port (80, expected 4711)
gupnp-context-DEBUG: Failed to parse HOST header from request: Invalid IPv6 address ?[fe80::01%1]? in URI
gupnp-context-DEBUG: Failed to parse HOST header from request: Invalid IPv6 address ?[fe80::01%eth0]? in URI
gupnp-context-DEBUG: Failed to parse HOST header from request: Could not parse port ?:1? in URI
gupnp-context-DEBUG: Mismatch between host header and host IP (example.com, expected: ::1)
gupnp-context-DEBUG: Mismatch between host header and host port (80, expected 4711)
gupnp-context-DEBUG: Mismatch between host header and host IP (example.com, expected: ::1)
gupnp-context-DEBUG: Mismatch between host header and host port (80, expected 4711)
gupnp-context-DEBUG: Mismatch between host header and host IP (example.com, expected: ::1)

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

linux-user: netlink: Add emulation of IP_MULTICAST_IF

Add IP_MULTICAST_IF and share the code with IP_ADD_MEMBERSHIP / IP_DROP_MEMBERSHIP.
Sharing the code makes sense, because the manpage of ip(7) says:

IP_MULTICAST_IF (since Linux 1.2)
      Set the local device for a multicast socket.  The argument
      for setsockopt(2) is an ip_mreqn or (since Linux 3.5)
      ip_mreq structure similar to IP_ADD_MEMBERSHIP, or an
      in_addr structure.  (The kernel determines which structure
      is being passed based on the size passed in optlen.)  For
      getsockopt(2), the argument is an in_addr structure.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

linux-user: netlink: Add IP_PKTINFO cmsg parsing

Fixes those warnings:
Unsupported host ancillary data: 0/8

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

linux-user: Use unique error messages for cmsg parsing

Avoid using the same error message for two different code paths
as it complicates determining the one which actually triggered.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

linux-user: netlink: Add missing IFA_PROTO to host_to_target_data_addr_rtattr()

Fix this warning:
Unknown host IFA type: 11

While adding IFA_PROTO, convert all IFA_XXX values over to QEMU_IFA_XXX values
to avoid a build failure on Ubuntu 22.04 (kernel v5.18 which does not know
IFA_PROTO yet).

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

target/loongarch: Dump all generic CSR registers

CSR registers is import system control registers, it had better
dump all CSR registers when VM is running in system mode.

Here is dump output example of CSR registers:
CSR000: CRMD   b4               PRMD   4                EUEN   0                MISC   0
CSR004: ECFG   71c1c            ESTAT  0                ERA    9000000002c31300 BADV   12022c0e0
CSR008: BADI   2b0000
CSR012: EENTRY 90000000046b0000
CSR016: TLBIDX ffffffff8e000228 TLBEHI 120228000        TLBELO0 400000016f19001f TLBELO1 400000016f1a401f
CSR024: ASID   a0004            PGDL   90000001016f0000 PGDH   9000000004680000 PGD    0
CSR028: PWCL   5e56e            PWCH   2e4              STLBPS e                RVACFG 0
CSR032: CPUID  0                PRCFG1 72f8             PRCFG2 3ffff000         PRCFG3 8073f2
CSR048: SAVE0  0                SAVE1  af9c             SAVE2  12010d6a8        SAVE3  8300000
CSR052: SAVE4  0                SAVE5  0                SAVE6  0                SAVE7  0
CSR064: TID    0                TCFG   8f0ca15          TVAL   4cefd8b          CNTC   fffffffffe688aaa
CSR068: TICLR  0
CSR096: LLBCTL 1
CSR136: TLBRENTRY 46ba000       TLBRBADV ffff8000130d81e2 TLBRERA 9000000003585cb8 TLBRSAVE ffff8000130d81e0
CSR140: TLBRELO0 1fe00043       TLBRELO1 40             TLBREHI ffff8000130d800e TLBRPRMD 0
CSR384: DMW0   8000000000000001 DMW1   9000000000000011 DMW2   0                DMW3   0

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Set unused flag with CSR registers

On LA464, some CSR registers are not used such as CSR_SAVE8 -
CSR_SAVE15, also CSR registers relative with MCE is not used now.

Flag CSRFL_UNUSED is added for these registers, so that it will
not dumped. In order to keep compatiblity, these CSR registers are
not removed since it is used in vmstate already.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Add common source file for CSR register

Common source file csr.c is added here, it can be used by both
TCG mode and kvm mode. The common code is removed from file
tcg/insn_trans/trans_privileged.c.inc to csrc.c

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Add common header file for CSR registers

Common header file csr.h is added here, it can be used by both
TCG mode and kvm mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Add generic csr function type

Parameter type TCGv and TCGv_ptr for function GenCSRRead and GenCSRWrite
is not used in non-TCG mode. Generic csr function type is added here
with parameter void type, so that it passes to compile with non-TCG mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Remove static CSR function setting

Since CSR function setting is done dynamically in TCG mode, remove
static CSR function setting here.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

target/loongarch: Add dynamic function access with CSR register

With CSR register, dynamic function access is used for CSR register
access in TCG mode, so that csr info can be used by other modules.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>

Merge tag 'pull-request-2025-01-21v2' of https://gitlab.com/thuth/qemu into staging

* Fix bugs related to the new "boot order" feature in the s390-ccw bios
* Fix crash that occurs when introspecting older s390-virtio-ccw machines
* Fix error in pbkdf code on fast machines (e.g. s390x with crypto adapter)
* Convert kvm_xen_guest avocado test to the functional framework

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmeQpIYRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbXYKA/9HddJS8Ljxwwme2XL1uSXreTGGKjE4QO1
# NKaEyJFfu5KAXCgufr/L4mLLxc8Bdf+qEux1v9u49OadMlYf/WzG5BYW42bLBrnK
# zhZZGnuLZHU6kzhK3OMQ0kJLYVGneKU8WahHiPaOfIjuEr+6SoMfb5N8ttSOG7ry
# Np3HvA5K5m4pOL0kSMJiiCqKSzRPbzWaxxwwB5j+iD4NB5NfLo8kEH1iXqRqkEBQ
# zkM0ab0pYYYZil6DqpNQ84QbWY0qJfhj+1GhsVugTE46ePdr7t7v3K1TFq27cGPw
# seJiUAdQwjUfblmlyjcuZfXr1p2sNAY2xocg/6dyIqroOVU9SxVwqrZAOvXd9t2r
# 7UEoT0EfEkDaEaL3T2me6AEtxpkXwEw/usVHv/79vdAVX4VxHUQz3YxUnG4kByXJ
# AEwUzq9Pm7mIV6I3zZ1AZHmBxENshhL0pBGdsL9F/Wv1tkPEf1WnDJ+1d2v2Hpag
# Pr5i6RikG0x8LoT1+G2Swr43fhOLGybqIiy7T4d4WiCuR3szfj1FCeJoMTEK6jHg
# 29Fps7ypQhfkSCcMCvk8VwImb+lc5bQPrV1PKcpEnLZbf3jU6myO/Ac3j2cnfYd6
# 3HidYK3GTpL7hMegyYh/nmFNp/edsgcky7SnDvcxsedVbwLxX112DaVed1ngPXmu
# 6ZLrIhNk7BU=
# =4IXO
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 Jan 2025 02:55:50 EST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2025-01-21v2' of https://gitlab.com/thuth/qemu:
  pc-bios: Update the s390 bios images with the recent changes
  pc-bios/s390-ccw: Abort IPL on invalid loadparm
  pc-bios/s390-ccw/netmain: Fix error messages with regards to the TFTP server
  pc-bios/s390-ccw: Fix boot problem with virtio-net devices
  pc-bios/s390-ccw/virtio: Add a function to reset a virtio device
  hw/s390x: Fix crash that occurs when inspecting older versioned machines types
  crypto: fix bogus error benchmarking pbkdf on fast machines
  MAINTAINERS: Remove myself as Avocado Framework reviewer
  tests/functional: Convert the kvm_xen_guest avocado test

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

pc-bios: Update the s390 bios images with the recent changes

Fix the problem with the non-quiesced virtio-net device and
make sure to abort the boot process if the user specified a wrong
loadparm parameter.

Signed-off-by: Thomas Huth <thuth@redhat.com>

pc-bios/s390-ccw: Abort IPL on invalid loadparm

Because the loadparm specifies an exact kernel the user wants to boot, if the
loadparm is invalid it must represent a misconfiguration of the guest. Thus we
should abort the IPL immediately, without attempting to use other devices, to
avoid booting into an unintended guest image.

Signed-off-by: Jared Rossi <jrossi@linux.ibm.com>
Message-ID: <20250117212235.1324063-2-jrossi@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

pc-bios/s390-ccw/netmain: Fix error messages with regards to the TFTP server

The code in net_init_ip() currently bails out early if "rc" is less
than 0, so the if-statements that check for negative "rc" codes to
print out some specific error messages with regards to the TFTP server
are never reached. Move them earlier to bring that dead code back to
life.

Reviewed-by: Jared Rossi <jrossi@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Jared Rossi <jrossi@linux.ibm.com>
Message-ID: <20250116115826.192047-4-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

pc-bios/s390-ccw: Fix boot problem with virtio-net devices

When we are trying to boot from virtio-net devices, the
s390-ccw bios currently leaves the virtio-net device enabled
after using it. That means that the receiving virt queues will
continue to happily write incoming network packets into memory.
This can corrupt data of the following boot process. For example,
if you set up a second guest on a virtual network and create a
lot of broadcast traffic there, e.g. with:

ping -i 0.02 -s 1400  -b 192.168.1.255

and then you try to boot a guest with two boot devices, a network
device first (which should not be bootable) and e.g. a bootable SCSI
CD second, then this guest will fail to load the kernel from the CD
image:

$ qemu-system-s390x -m 2G -nographic -device virtio-scsi-ccw \
    -netdev tap,id=net0 -device virtio-net-ccw,netdev=net0,bootindex=1 \
    -drive if=none,file=test.iso,format=raw,id=cd1 \
    -device scsi-cd,drive=cd1,bootindex=2
LOADPARM=[        ]

Network boot device detected
Network boot starting...
   Using MAC address: 52:54:00:12:34:56
   Requesting information via DHCP: done
   Using IPv4 address: 192.168.1.76
   Using TFTP server: 192.168.1.1
Trying pxelinux.cfg files...
   TFTP error: ICMP ERROR "port unreachable"
   Receiving data:  0 KBytes
Repeating TFTP read request...
   TFTP error: ICMP ERROR "port unreachable"
Failed to load OS from network.
Failed to IPL from this network!
LOADPARM=[        ]

Using virtio-scsi.

! virtio-scsi:setup:inquiry: response VS RESP=ff !
ERROR: No suitable device for IPL. Halting...

We really have to shut up the virtio-net devices after we're not
using it anymore. The easiest way to do this is to simply reset
the device, so let's do that now.

Reviewed-by: Jared Rossi <jrossi@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Jared Rossi <jrossi@linux.ibm.com>
Message-ID: <20250116115826.192047-3-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

pc-bios/s390-ccw/virtio: Add a function to reset a virtio device

To be able to properly silence a virtio device after using it,
we need a global function to reset the device.

Reviewed-by: Jared Rossi <jrossi@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Jared Rossi <jrossi@linux.ibm.com>
Message-ID: <20250116115826.192047-2-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw/s390x: Fix crash that occurs when inspecting older versioned machines types

qemu-system-s390x currently crashes when trying to inspect older
machines types, for example:

$ echo '{ "execute": "qmp_capabilities" }
         { "execute": "qom-list-properties","arguments":
           { "typename": "s390-ccw-virtio-3.0-machine"}}' \
   | ./qemu-system-s390x -qmp stdio -no-shutdown
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9},
  "package": "v9.2.0-1071-g81e97df3e7"}, "capabilities": ["oob"]}}
{"return": {}}
**
Bail out! ERROR:../target/s390x/cpu_models.c:832:s390_set_qemu_cpu_model:
  assertion failed: (QTAILQ_EMPTY_RCU(&cpus_queue))
Aborted (core dumped)

The problem is that the versioned s390-ccw-virtio machine types
use instance_init() to set global state that should be initialized
before the CPUs get instantiated. But instance_init() is not called
only for the machine that is finally used, it is also called for
temporary instances of objects that are e.g. just created for
introspection. That means that those instance_init() functions can
also be called while a machine (and its CPUs) is already created,
which triggers the assertion in cpu_models.c.

So we must not use instance_init() for setting global state, but
use the machine->init() function instead, which is really only called
once when the machine comes to life.

Fixes: 3b00f702c2 ("s390x/cpumodel: add zpci, aen and ais facilities")
Message-ID: <20250120085059.239345-1-thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

crypto: fix bogus error benchmarking pbkdf on fast machines

We're seeing periodic reports of errors like:

$ qemu-img create -f luks --object secret,data=123456,id=sec0 \
                  -o key-secret=sec0 luks-info.img 1M
  Formatting 'luks-info.img', fmt=luks size=1048576 key-secret=sec0
  qemu-img: luks-info.img: Unable to get accurate CPU usage

This error message comes from a recent attempt to workaround a
kernel bug with measuring rusage in long running processes:

  commit c72cab5ad9f849bbcfcf4be7952b8b8946cc626e
  Author: Tiago Pasqualini <tiago.pasqualini@canonical.com>
  Date:   Wed Sep 4 20:52:30 2024 -0300

    crypto: run qcrypto_pbkdf2_count_iters in a new thread

Unfortunately this has a subtle bug on machines which are very fast.

On the first time around the loop, the 'iterations' value is quite
small (1 << 15), and so will run quite fast. Testing has shown that
some machines can complete this benchmarking task in as little as
7 milliseconds.

Unfortunately the 'getrusage' data is not updated at the time of
the 'getrusage' call, it is done asynchronously by the scheduler.
The 7 millisecond completion time for the benchmark is short
enough that 'getrusage' sometimes reports 0 accumulated execution
time.

As a result the 'delay_ms == 0' sanity check in the above commit
is triggering non-deterministically on such machines.

The benchmarking loop intended to run multiple times, increasing
the 'iterations' value until the benchmark ran for > 500 ms, but
the sanity check doesn't allow this to happen.

To fix it, we keep a loop counter and only run the sanity check
after we've been around the loop more than 5 times. At that point
the 'iterations' value is high enough that even with infrequent
updates of 'getrusage' accounting data on fast machines, we should
see a non-zero value.

Fixes: https://lore.kernel.org/qemu-devel/ffe542bb-310c-4616-b0ca-13182f849fd1@redhat.com/
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2336437
Reported-by: Thomas Huth <thuth@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250109093746.1216300-1-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

MAINTAINERS: Remove myself as Avocado Framework reviewer

While I was very enthusiastic when Avocado was presented to
the QEMU community and pushed forward to have it integrated,
time passed and I lost interest. Be honest, remove my R: tag
to not give fake expectation I'd review patches related to
Avocado anymore.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250106055024.70139-1-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/functional: Convert the kvm_xen_guest avocado test

Use the serial console to execute the commands in the guest instead
of using ssh since we don't have ssh support in the functional
framework yet.

Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Message-ID: <20250113082516.57894-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Merge tag 'pull-tcg-20250117' of https://gitlab.com/rth7680/qemu into staging

tcg:
  - Add TCGOP_TYPE, TCGOP_FLAGS.
  - Pass type and flags to tcg_op_supported, tcg_target_op_def.
  - Split out tcg-target-has.h and unexport from tcg.h.
  - Reorg constraint processing; constify TCGOpDef.
  - Make extract, sextract, deposit opcodes mandatory.
  - Merge ext{8,16,32}{s,u} opcodes into {s}extract.
tcg/mips: Expand bswap unconditionally
tcg/riscv: Use SRAIW, SRLIW for {s}extract_i64
tcg/riscv: Use BEXTI for single-bit extractions
tcg/sparc64: Use SRA, SRL for {s}extract_i64

disas/riscv: Guard dec->cfg dereference for host disassemble
util/cpuinfo-riscv: Detect Zbs
accel/tcg: Call tcg_tb_insert() for one-insn TBs
linux-user: Add missing /proc/cpuinfo fields for sparc

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmeKnzUdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+Kvgf+LG9UjXlWF9GK923E
# TllBL2rLf1OOdtTXWO15VcvGMoWDwB3tVBdhihdvXmnWju+WbfMk6mct5NhzsKn9
# LmuugMIZs+hMROj+bgMK8x47jRIh5N2rDYxcEgmyfIpYb2o9qvyqKecGVRlSJTCE
# bmt5UFbvPThBb8upoMfq3F6evuMx0szBP7wrOwSR/VGpmzIr20UTEWo6I1ALp4uj
# paFaysYol4em3dIhkiuV9cL7E0EIObaNa7l9RUci/BmTq+JaVxUnW1Y2i0PEwKwG
# FJSfYTJk3wBgAVxC2zC2g3ZM7uKuecSXMpiFopTiuyQLp7Q61i9kCNvEq0qY5tdb
# DaqR/g==
# =cv4O
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Jan 2025 13:19:33 EST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-tcg-20250117' of https://gitlab.com/rth7680/qemu: (68 commits)
  softfloat: Constify helpers returning float_status field
  accel/tcg: Call tcg_tb_insert() for one-insn TBs
  tcg: Document tb_lookup() and tcg_tb_lookup()
  linux-user: Add missing /proc/cpuinfo fields for sparc
  tcg/riscv: Use BEXTI for single-bit extractions
  util/cpuinfo-riscv: Detect Zbs
  tcg: Remove TCG_TARGET_HAS_deposit_{i32,i64}
  tcg: Remove TCG_TARGET_HAS_{s}extract_{i32,i64}
  tcg/tci: Remove assertions for deposit and extract
  tcg/tci: Provide TCG_TARGET_{s}extract_valid
  tcg/sparc64: Use SRA, SRL for {s}extract_i64
  tcg/s390x: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/riscv: Use SRAIW, SRLIW for {s}extract_i64
  tcg/riscv64: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/ppc: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/mips: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/loongarch64: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/arm: Add full [US]XT[BH] into {s}extract
  tcg/aarch64: Expand extract with offset 0 with andi
  tcg/aarch64: Provide TCG_TARGET_{s}extract_valid
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-riscv-to-apply-20250119-1' of https://github.com/alistair23/qemu into staging

Second RISC-V PR for 10.0

* Reduce the overhead for simple RISC-V vector unit-stride loads and stores
* Add V bit to GDB priv reg
* Add 'sha' support
* Add traces for exceptions in user mode
* Update Pointer Masking to Zjpm v1.0
* Add Smrnmi support
* Fix timebase-frequency when using KVM acceleration
* Add RISC-V Counter delegation ISA extension support
* Add support for Smdbltrp and Ssdbltrp extensions
* Introduce a translation tag for the IOMMU page table cache
* Support Supm and Sspm as part of Zjpm v1.0
* Convert htif debug prints to trace event

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmeMUUwACgkQr3yVEwxT
# gBNgDQ/+JeqcsbJRX+PZQJEV06tDIJpk+mfaBHUYSGdNkjI9fzowNaxFIEB2vaLt
# 4+xAGMnJ4vMcjJyBcPOn1FKAlowM7MsUNITOF9Rstnyriqnj2UsUZ9YBtkuG6gWH
# ZHoYEKu7mAZoZw5RRx4TatHDXw7TYfUsrDPrn+x6yeCZTq9ruRTlHkzp2LC725Vq
# KTnbWAP7WlqiJaSxB5eIFYT5tYP1Blp0yD358B037C57EU9j5zm2FQdFmVK1+xRF
# dFg/urBIzfAjjkCS/t9DmH+S6NgMEut6udUhllk/KUJAzWvsggc4wZZlWjFOJFJY
# fIxx3alhY3pcm1PYjFpf15Poz6Pqva/KGjwgZafirKQtPbRSzfRkUwcHOYRTQT9j
# abeiB44XPaeIl8Jvw7GLxcWtlJ5NmBrZho+2Z9mIhB/Ix5H3PDgs18Oc/s73P2qQ
# JFLRb7cpYy1HbRc0ugvwAmOTY1t6HX8HAtT+3rNhiXpXnj4RW2C/WU1cEqrg8QkM
# cTPiy2zHoBhAWt9aDK1Kvbhb1vur3JaF7rk9jeKlriFr87Ly+yPU+8mnEDw40NMR
# Tc9nivqmOqqXS5AM9O/W1uzTWzpxIUy7XBy3cuSk0uZCoge4IE2Or7P2Rb2uyaNZ
# RkAo/PL2N1cMjP7gB3kLRtYY7FA+nal66KhfbHPRHqj+ZwUAxzs=
# =F3IG
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 18 Jan 2025 20:11:40 EST
# gpg:                using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65  9296 AF7C 9513 0C53 8013

* tag 'pull-riscv-to-apply-20250119-1' of https://github.com/alistair23/qemu: (50 commits)
  hw/char/riscv_htif: Convert HTIF_DEBUG() to trace events
  target/riscv: Support Supm and Sspm as part of Zjpm v1.0
  hw/riscv/riscv-iommu.c: Introduce a translation tag for the page table cache
  target/riscv: Add Smdbltrp ISA extension enable switch
  target/riscv: Implement Smdbltrp behavior
  target/riscv: Implement Smdbltrp sret, mret and mnret behavior
  target/riscv: Add Smdbltrp CSRs handling
  target/riscv: Add Ssdbltrp ISA extension enable switch
  target/riscv: Implement Ssdbltrp exception handling
  target/riscv: Implement Ssdbltrp sret, mret and mnret behavior
  target/riscv: Add Ssdbltrp CSRs handling
  target/riscv: Fix henvcfg potentially containing stale bits
  target/riscv: Add configuration for S[m|s]csrind, Smcdeleg/Ssccfg
  target/riscv: Add implied rule for counter delegation extensions
  target/riscv: Invoke pmu init after feature enable
  target/riscv: Add counter delegation/configuration support
  target/riscv: Add select value range check for counter delegation
  target/riscv: Add counter delegation definitions
  target/riscv: Add properties for counter delegation ISA extensions
  target/riscv: Support generic CSR indirect access
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

hw/char/riscv_htif: Convert HTIF_DEBUG() to trace events

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250116223609.81594-1-philmd@linaro.org>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Support Supm and Sspm as part of Zjpm v1.0

The Zjpm v1.0 spec states there should be Supm and Sspm extensions that
are used in profile specification. Enabling Supm extension enables both
Ssnpm and Smnpm, while Sspm enables only Smnpm.

Signed-off-by: Alexey Baturo <baturo.alexey@gmail.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250113194410.1307494-1-baturo.alexey@gmail.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

hw/riscv/riscv-iommu.c: Introduce a translation tag for the page table cache

This commit introduces a translation tag to avoid invalidating an entry
that should not be invalidated when IOMMU executes invalidation commands.
E.g. IOTINVAL.VMA with GV=0, AV=0, PSCV=1 invalidates both a mapping
of single stage translation and a mapping of nested translation with
the same PSCID, but only the former one should be invalidated.

Signed-off-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241108110147.11178-1-jason.chien@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add Smdbltrp ISA extension enable switch

Add the switch to enable the Smdbltrp ISA extension and disable it for
the max cpu. Indeed, OpenSBI when Smdbltrp is present, M-mode double
trap is enabled by default and MSTATUS.MDT needs to be cleared to avoid
taking a double trap. OpenSBI does not currently support it so disable
it for the max cpu to avoid breaking regression tests.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250116131539.2475785-1-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Implement Smdbltrp behavior

When the Smsdbltrp ISA extension is enabled, if a trap happens while
MSTATUS.MDT is already set, it will trigger an abort or an NMI is the
Smrnmi extension is available.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-9-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Implement Smdbltrp sret, mret and mnret behavior

When the Ssdbltrp extension is enabled, SSTATUS.MDT field is cleared
when executing sret if executed in M-mode. When executing mret/mnret,
SSTATUS.MDT is cleared.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-8-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add Smdbltrp CSRs handling

Add `ext_smdbltrp`in RISCVCPUConfig and implement MSTATUS.MDT behavior.
Also set MDT to 1 at reset according to the specification.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-7-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add Ssdbltrp ISA extension enable switch

Add the switch to enable the Ssdbltrp ISA extension.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-6-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Implement Ssdbltrp exception handling

When the Ssdbltrp ISA extension is enabled, if a trap happens in S-mode
while SSTATUS.SDT isn't cleared, generate a double trap exception to
M-mode.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-5-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Implement Ssdbltrp sret, mret and mnret behavior

When the Ssdbltrp extension is enabled, SSTATUS.SDT field is cleared
when executing sret. When executing mret/mnret, SSTATUS.SDT is cleared
when returning to U, VS or VU and VSSTATUS.SDT is cleared when returning
to VU from HS.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-4-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add Ssdbltrp CSRs handling

Add ext_ssdbltrp in RISCVCPUConfig and implement MSTATUS.SDT,
{H|M}ENVCFG.DTE and modify the availability of MTVAL2 based on the
presence of the Ssdbltrp ISA extension.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-3-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Fix henvcfg potentially containing stale bits

With the current implementation, if we had the following scenario:
- Set bit x in menvcfg
- Set bit x in henvcfg
- Clear bit x in menvcfg
then, the internal variable env->henvcfg would still contain bit x due
to both a wrong menvcfg mask used in write_henvcfg() as well as a
missing update of henvcfg upon menvcfg update.
This can lead to some wrong interpretation of the context. In order to
update henvcfg upon menvcfg writing, call write_henvcfg() after writing
menvcfg. Clearing henvcfg upon writing the new value is also needed in
write_henvcfg() as well as clearing henvcfg upper part when writing it
with write_henvcfgh().

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-2-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add configuration for S[m|s]csrind, Smcdeleg/Ssccfg

Add configuration options so that they can be enabled/disabld from
qemu commandline.

Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Message-ID: <20250110-counter_delegation-v5-11-e83d797ae294@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Add implied rule for counter delegation extensions

The counter delegation/configuration extensions depend on the following
extensions.

1. Smcdeleg - To enable counter delegation from M to S
2. S[m|s]csrind - To enable indirect access CSRs

Add an implied rule so that these extensions are enabled by default
if the sscfg extension is enabled.

Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Message-ID: <20250110-counter_delegation-v5-10-e83d797ae294@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

target/riscv: Invoke pmu init after feature enable

The dependant ISA features are enabled at the end of cpu_realize
in finalize_features. Thus, PMU init should be invoked after that
only. Move the init invocation to riscv_tcg_cpu_finalize_features.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Message-ID: <20250110-counter_delegation-v5-9-e83d797ae294@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>