www.infradead.org Git - users/dwmw2/qemu.git/log

scsi: move unmap error checking to the complete callback

This will help to account the operation in the following commit.

The difference is that we don't call scsi_disk_req_check_error() before
the 1st discard iteration anymore. That function also checks if
the request is cancelled, however it shouldn't get canceled until it
yields in blk_aio() functions anyway.
Same approach is already used for emulate_write_same.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190923121737.83281-7-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

scsi: store unmap offset and nb_sectors in request struct

it allows to report it in the error handler

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Message-id: 20190923121737.83281-6-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

ide: account UNMAP (TRIM) operations

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-5-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: add empty account cookie type

Each block_acct_done/failed call is designed to correspond to a
previous block_acct_start call, which initializes the stats cookie.
However sometimes it is not the case, e.g. some error paths might
report the same cookie twice because it is hard to accurately track if
the cookie was reported yet or not.

This patch cleans the cookie after report.
(Note: block_acct_failed/done without a previous block_acct_start at
all should be avoided. Uninitialized cookie might hold a garbage value
and there is still "< BLOCK_MAX_IOTYPE" assertion for that)

It will be particularly useful in ide code where it's hard to
keep track whether the request done its accounting or not: in the
following patch of the series, trim requests will do the accounting
separately.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

qapi: add unmap to BlockDeviceStats

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

qapi: group BlockDeviceStats fields

Make the stat fields definition slightly more readable.
Also reorder total_time_ns stats read-write-flush as done elsewhere.
Cosmetic change only.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-2-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

iotests: 257: drop device_add

SCSI devices are unused in test, drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-12-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

iotests: 257: drop unused Drive.device field

After previous commit Drive.device is actually unused. Drop it together
with .name property. While being here reuse .node in qmp commands
instead of writing 'drive0' twice.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-11-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

iotests: prepare 124 and 257 bitmap querying for backup-top filter

After backup-top filter appearing it's not possible to see dirty
bitmaps in top node, so use node-name instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-10-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: teach bdrv_debug_breakpoint skip filters with backing

Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip
filters with backing. This is needed to implement and use in backup job
it's own backup_top filter driver (like mirror already has one), and
without this improvement, breakpoint removal will fail at least in 55
iotest.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-9-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: move block_copy from block/backup.c to separate file

Split block_copy to separate file, to be cleanly shared with backup-top
filter driver in further commits.

It's a clean movement, the only change is drop "static" from interface
functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: fix block-comment style

We need to fix comment style around block-copy functions before further
moving them to separate file to satisfy checkpatch. But do more: fix
all comments style. Also, seems like doubled first asterisk is not
forbidden, but drop it too for consistency.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: introduce BlockCopyState

Split copying code part from backup to "block-copy", including separate
state structure and function renaming. This is needed to share it with
backup-top filter driver in further commits.

Notes:

1. As BlockCopyState keeps own BlockBackend objects, remaining
job->common.blk users only use it to get bs by blk_bs() call, so clear
job->commen.blk permissions set in block_job_create and add
job->source_bs to be used instead of blk_bs(job->common.blk), to keep
it more clear which bs we use when introduce backup-top filter in
further commit.

2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better
as interface to BlockCopyState

3. Split is not very clean: there left some duplicated fields, backup
code uses some BlockCopyState fields directly, let's postpone it for
further improvements and keep this comment simpler for review.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: improve comment about image fleecing

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: split shareable copying part from backup_do_cow

Split copying logic which will be shared with backup-top filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: fix backup_cow_with_offload for last cluster

We shouldn't try to copy bytes beyond EOF. Fix it.

Fixes: 9ded4a0114968e
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/backup: fix max_transfer handling for copy_range

Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
are trying to find aligned size which satisfy both source and target.
Also, don't ignore too small max_transfer. In this case seems safer to
disable copy_range.

Fixes: 9ded4a0114968e
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/qcow2: introduce parallel subrequest handling in read and write

It improves performance for fragmented qcow2 images.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/qcow2: refactor qcow2_co_pwritev_part

Similarly to previous commit, prepare for parallelizing write-loop
iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block/qcow2: refactor qcow2_co_preadv_part

Further patch will run partial requests of iterations of
qcow2_co_preadv in parallel for performance reasons. To prepare for
this, separate part which may be parallelized into separate function
(qcow2_co_preadv_task).

While being here, also separate encrypted clusters reading to own
function, like it is done for compressed reading.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: introduce aio task pool

Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and may be other places.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

qemu-iotests: ignore leaks on failure paths in 026

Upcoming asynchronous handling of sub-parts of qcow2 requests will
change number of leaked clusters and even make it racy. As a
preparation, ignore leaks on failure parts in 026.

It's not trivial to just grep or substitute qemu-img output for such
thing. Instead do better: 3 is a error code of qemu-img check, if only
leaks are found. Catch this case and print success output.

Suggested-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

This pull request also contains the two commits from the previous pull request
that was dropped due to a mingw compilation error.  The compilation should now
be fixed.

# gpg: Signature made Tue 08 Oct 2019 15:54:26 BST
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  iotests/262: Switch source/dest VM launch order
  block: Skip COR for inactive nodes
  virtio-blk: schedule virtio_notify_config to run on main context
  util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

iotests/262: Switch source/dest VM launch order

Launching the destination VM before the source VM gives us a regression
test for HEAD^:

The guest device causes a read from the disk image through
guess_disk_lchs(). This will not work if the first sector (containing
the partition table) is yet unallocated, we use COR, and the node is
inactive.

By launching the source VM before the destination, however, the COR
filter on the source will allocate that area in the image shared between
both VMs, thus the problem will not become apparent.

Switching the launch order causes the sector to still be unallocated
when guess_disk_lchs() runs on the inactive node in the destination VM,
and thus we get our test case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191001174827.11081-3-mreitz@redhat.com
Message-Id: <20191001174827.11081-3-mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Skip COR for inactive nodes

We must not write data to inactive nodes, and a COR is certainly
something we can simply not do without upsetting anyone. So skip COR
operations on inactive nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191001174827.11081-2-mreitz@redhat.com
Message-Id: <20191001174827.11081-2-mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk: schedule virtio_notify_config to run on main context

virtio_notify_config() needs to acquire the global mutex, which isn't
allowed from an iothread, and may lead to a deadlock like this:

- main thead
  * Has acquired: qemu_global_mutex.
  * Is trying the acquire: iothread AioContext lock via
    AIO_WAIT_WHILE (after aio_poll).

- iothread
  * Has acquired: AioContext lock.
  * Is trying to acquire: qemu_global_mutex (via
    virtio_notify_config->prepare_mmio_access).

If virtio_blk_resize() is called from an iothread, schedule
virtio_notify_config() to be run in the main context BH.

[Removed unnecessary newline as suggested by Kevin Wolf
<kwolf@redhat.com>.
--Stefan]

Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20190916112411.21636-1-slp@redhat.com
Message-Id: <20190916112411.21636-1-slp@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended

Make it more obvious, that filling qiov corresponds to qiov allocation,
which in turn corresponds to total_niov calculation, based on mid_niov
(not mid_len). Still add an assertion to show that there should be no
difference.

[Added mingw "error: 'mid_iov' may be used uninitialized in this
function" compiler error fix suggested by Vladimir.
--Stefan]

Reported-by: Coverity (CID 1405302)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190910090310.14032-1-vsementsov@virtuozzo.com
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190910090310.14032-1-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
fixup! util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended

Merge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20191007' into staging

Improve scripts relying on the EDK2 submodule,
drop Python2 dependency in EDK2 build scripts.

# gpg: Signature made Mon 07 Oct 2019 14:31:38 BST
# gpg:                using RSA key 89C1E78F601EE86C867495CBA2A3FD6EDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (Phil) <philmd@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 89C1 E78F 601E E86C 8674  95CB A2A3 FD6E DEAD C0DE

* remotes/philmd-gitlab/tags/edk2-next-20191007:
  edk2 build scripts: work around TianoCore#1607 without forcing Python 2
  edk2 build scripts: honor external BaseTools flags with uefi-test-tools
  roms: Add a 'make help' target alias
  roms/Makefile.edk2: don't pull in submodules when building from tarball
  make-release: pull in edk2 submodules so we can build it from tarballs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging

slirp: Allow non-local DNS address when restrict is off

# gpg: Signature made Mon 07 Oct 2019 00:54:44 BST
# gpg:                using RSA key 5ED9E856F7D6C6EAF51167A18D35C355720BBAFD
# gpg: Good signature from "Samuel Thibault <samuel.thibault@aquilenet.fr>" [unknown]
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>" [marginal]
# gpg:                 aka "Samuel Thibault <samuel.thibault@gnu.org>" [unknown]
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>" [marginal]
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>" [marginal]
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>" [marginal]
# gpg:                 aka "Samuel Thibault <samuel.thibault@u-bordeaux.fr>" [unknown]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: 5ED9 E856 F7D6 C6EA F511  67A1 8D35 C355 720B BAFD

* remotes/thibault/tags/samuel-thibault:
  slirp: Allow non-local DNS address when restrict is off

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Fix internal snapshots with typical -blockdev setups
- iotests: Require Python 3.6 or later

# gpg: Signature made Fri 04 Oct 2019 10:59:21 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests: Remove Python 2 compatibility code
  iotests: Require Python 3.6 or later
  iotests: Test internal snapshots with -blockdev
  block/snapshot: Restrict set of snapshot nodes

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

edk2 build scripts: work around TianoCore#1607 without forcing Python 2

It turns out that forcing python2 for running the edk2 "build" utility is
neither necessary nor sufficient.

Forcing python2 is not sufficient for two reasons:

- QEMU is moving away from python2, with python2 nearing EOL,

- according to my most recent testing, the lacking dependency information
in the makefiles that are generated by edk2's "build" utility can cause
parallel build failures even when "build" is executed by python2.

And forcing python2 is not necessary because we can still return to the
original idea of filtering out jobserver-related options from MAKEFLAGS.
So do that.

While at it, cut short edk2's auto-detection of the python3.* minor
version, by setting PYTHON_COMMAND to "python3" (which we expect to be
available wherever we intend to build edk2).

With this patch, the guest UEFI binaries that are used as part of the BIOS
tables test, and the OVMF and ArmVirtQemu platform firmwares, will be
built strictly in a single job, regardless of an outermost "-jN" make
option. Alas, there appears to be no reliable way to build edk2 in an
(outer make, inner make) environment, with a jobserver enabled.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Reported-by: John Snow <jsnow@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190920083808.21399-3-lersek@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

edk2 build scripts: honor external BaseTools flags with uefi-test-tools

Unify the recipe for "build-edk2-tools" in
"tests/uefi-test-tools/Makefile" with the recipe for "edk2-basetools" in
"roms/Makefile".

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190920083808.21399-2-lersek@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

roms: Add a 'make help' target alias

Various C projects provide a 'make help' target. Our root directory
does so. The roms/ directory lacks a such rule, but already displays
a help output when the default target is called.
Add a 'help' target aliased to the default one, to avoid:

$ make -C roms help
make: *** No rule to make target 'help'. Stop.

Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190920171159.18633-1-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

roms/Makefile.edk2: don't pull in submodules when building from tarball

Currently the `make efi` target pulls submodules nested under the
roms/edk2 submodule as dependencies. However, when we attempt to build
from a tarball this fails since we are no longer in a git tree.

A preceding patch will pre-populate these submodules in the tarball,
so assume this build dependency is only needed when building from a
git tree.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Bruce Rogers <brogers@suse.com>
Cc: qemu-stable@nongnu.org # v4.1.0
Reported-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-3-mdroth@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

make-release: pull in edk2 submodules so we can build it from tarballs

The `make efi` target added by 536d2173 is built from the roms/edk2
submodule, which in turn relies on additional submodules nested under
roms/edk2.

The make-release script currently only pulls in top-level submodules,
so these nested submodules are missing in the resulting tarball.

We could try to address this situation more generally by recursively
pulling in all submodules, but this doesn't necessarily ensure the
end-result will build properly (this case also required other changes).

Additionally, due to the nature of submodules, we may not always have
control over how these sorts of things are dealt with, so for now we
continue to handle it on a case-by-case in the make-release script.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Bruce Rogers <brogers@suse.com>
Cc: qemu-stable@nongnu.org # v4.1.0
Reported-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-2-mdroth@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191004' into staging

ppc patch queue 2019-10-04

Here's the next batch of ppc and spapr patches.  Includes:
  * Fist part of a large cleanup to irq infrastructure
  * Recreate the full FDT at CAS time, instead of making a difficult
    to follow set of updates.  This will help us move towards
    eliminating CAS reboots altogether
  * No longer provide RTAS blob to SLOF - SLOF can include it just as
    well itself, since guests will generally need to relocate it with
    a call to instantiate-rtas
  * A number of DFP fixes and cleanups from Mark Cave-Ayland
  * Assorted bugfixes
  * Several new small devices for powernv

# gpg: Signature made Fri 04 Oct 2019 10:35:57 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.2-20191004: (53 commits)
  ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
  spapr: Eliminate SpaprIrq::init hook
  spapr: Add return value to spapr_irq_check()
  spapr: Use less cryptic representation of which irq backends are supported
  xive: Improve irq claim/free path
  spapr, xics, xive: Better use of assert()s on irq claim/free paths
  spapr: Handle freeing of multiple irqs in frontend only
  spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
  spapr: Eliminate SpaprIrq:get_nodename method
  spapr: Simplify spapr_qirq() handling
  spapr: Fix indexing of XICS irqs
  spapr: Eliminate nr_irqs parameter to SpaprIrq::init
  spapr: Clarify and fix handling of nr_irqs
  spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
  spapr: Fold spapr_phb_lsi_qirq() into its single caller
  xics: Create sPAPR specific ICS subtype
  xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
  xics: Eliminate reset hook
  xics: Rename misleading ics_simple_*() functions
  xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Compilation fix for KVM (Alex)
* SMM fix (Dmitry)
* VFIO error reporting (Eric)
* win32 fixes and workarounds (Marc-André)
* qemu-pr-helper crash bugfix (Maxim)
* Memory leak fixes (myself)
* VMX features (myself)
* Record-replay deadlock (Pavel)
* i386 CPUID bits (Sebastian)
* kconfig tweak (Thomas)
* Valgrind fix (Thomas)
* Autoconverge test (Yury)

# gpg: Signature made Fri 04 Oct 2019 17:57:48 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (29 commits)
  target/i386/kvm: Silence warning from Valgrind about uninitialized bytes
  target/i386: work around KVM_GET_MSRS bug for secondary execution controls
  target/i386: add VMX features
  vmxcap: correct the name of the variables
  target/i386: add VMX definitions
  target/i386: expand feature words to 64 bits
  target/i386: introduce generic feature dependency mechanism
  target/i386: handle filtered_features in a new function mark_unavailable_features
  tests/docker: only enable ubsan for test-clang
  win32: work around main-loop busy loop on socket/fd event
  tests: skip serial test on windows
  util: WSAEWOULDBLOCK on connect should map to EINPROGRESS
  Fix wrong behavior of cpu_memory_rw_debug() function in SMM
  memory: allow memory_region_register_iommu_notifier() to fail
  vfio: Turn the container error into an Error handle
  i386: Add CPUID bit for CLZERO and XSAVEERPTR
  docker: test-debug: disable LeakSanitizer
  lm32: do not leak memory on object_new/object_unref
  cris: do not leak struct cris_disasm_data
  mips: fix memory leaks in board initialization
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/i386/kvm: Silence warning from Valgrind about uninitialized bytes

When I run QEMU with KVM under Valgrind, I currently get this warning:

Syscall param ioctl(generic) points to uninitialised byte(s)
    at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so)
    by 0x429DC3: kvm_ioctl (kvm-all.c:2365)
    by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469)
    by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765)
    by 0x4C4116: x86_cpu_expand_features (cpu.c:5065)
    by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242)
    by 0x5961F3: device_set_realized (qdev.c:835)
    by 0x7038F6: property_set_bool (object.c:2080)
    by 0x707EFE: object_property_set_qobject (qom-qobject.c:26)
    by 0x705814: object_property_set_bool (object.c:1338)
    by 0x498435: pc_new_cpu (pc.c:1549)
    by 0x49C67D: pc_cpus_init (pc.c:1681)
  Address 0x1ffeffee74 is on thread 1's stack
  in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445)

It's harmless, but a little bit annoying, so silence it by properly
initializing the whole structure with zeroes.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: work around KVM_GET_MSRS bug for secondary execution controls

Some secondary controls are automatically enabled/disabled based on the CPUID
values that are set for the guest. However, they are still available at a
global level and therefore should be present when KVM_GET_MSRS is sent to
/dev/kvm.

Unfortunately KVM forgot to include those, so fix that.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: add VMX features

Add code to convert the VMX feature words back into MSR values,
allowing the user to enable/disable VMX features as they wish. The same
infrastructure enables support for limiting VMX features in named
CPU models.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vmxcap: correct the name of the variables

The low bits are 1 if the control must be one, the high bits
are 1 if the control can be one. Correct the variable names
as they are very confusing.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: add VMX definitions

These will be used to compile the list of VMX features for named
CPU models, and/or by the code that sets up the VMX MSRs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: expand feature words to 64 bits

VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP
and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but
actually have only 32 bits of information).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: introduce generic feature dependency mechanism

Sometimes a CPU feature does not make sense unless another is
present.  In the case of VMX features, KVM does not even allow
setting the VMX controls to some invalid combinations.

Therefore, this patch adds a generic mechanism that looks for bits
that the user explicitly cleared, and uses them to remove other bits
from the expanded CPU definition.  If these dependent bits were also
explicitly *set* by the user, this will be a warning for "-cpu check"
and an error for "-cpu enforce".  If not, then the dependent bits are
cleared silently, for convenience.

With VMX features, this will be used so that for example
"-cpu host,-rdrand" will also hide support for RDRAND exiting.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: handle filtered_features in a new function mark_unavailable_features

The next patch will add a different reason for filtering features, unrelated
to host feature support. Extract a new function that takes care of disabling
the features and optionally reporting them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests/docker: only enable ubsan for test-clang

-fsanitize=undefined is not the same thing as --enable-sanitizers. After
commit 47c823e ("tests/docker: add sanitizers back to clang build", 2019-09-11)
test-clang is almost duplicating the asan (test-debug) test, so
partly revert commit 47c823e5b while leaving ubsan enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

win32: work around main-loop busy loop on socket/fd event

Commit 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3 introduced an AIO
context optimization to avoid calling event_notifier_test_and_clear() on
ctx->notifier. On Windows, the same notifier is being used to wakeup the
wait on socket events (see commit
d3385eb448e38f828c78f8f68ec5d79c66a58b5d).

The ctx->notifier event is added to the gpoll sources in
aio_set_event_notifier(), aio_ctx_check() should clear the event
regardless of ctx->notified, since Windows sets the event by itself,
bypassing the aio->notified. This fixes qemu not clearing the event
resulting in a busy loop.

Paolo suggested to me on irc to call event_notifier_test_and_clear()
after select() >0 from aio-win32.c's aio_prepare. Unfortunately, not all
fds associated with ctx->notifiers are in AIO fd handlers set.
(qemu_set_nonblock() in util/oslib-win32.c calls qemu_fd_register()).

This is essentially a v2 of a patch that was sent earlier:
https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00420.html

that resurfaced when James investigated Spice performance issues on Windows:
https://gitlab.freedesktop.org/spice/spice/issues/36

In order to test that patch, I simply tried running test-char on
win32, and it hangs. Applying that patch solves it. QIO idle sources
are not dispatched. I haven't investigated much further, I suspect
source priorities and busy looping still come into play.

This version keeps the "notified" field, so event_notifier_poll()
should still work as expected.

Cc: James Le Cuirot <chewi@gentoo.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests: skip serial test on windows

Serial test is currently hard-coded to /dev/null.

On Windows, serial chardev expect a COM: device, which may not be
availble.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: WSAEWOULDBLOCK on connect should map to EINPROGRESS

In general, WSAEWOULDBLOCK can be mapped to EAGAIN as done by
socket_error() (or EWOULDBLOCK). But for connect() with non-blocking
sockets, it actually means the operation is in progress:

https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-connect
"The socket is marked as nonblocking and the connection cannot be completed immediately."

(this is also the behaviour implemented by GLib GSocket)

This fixes socket_can_bind_connect() test on win32.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Fix wrong behavior of cpu_memory_rw_debug() function in SMM

There is a problem, that you don't have access to the data using cpu_memory_rw_debug() function when in SMM. You can't remotely debug SMM mode program because of that for example.
Likely attrs version of get_phys_page_debug should be used to get correct asidx at the end to handle access properly.
Here the patch to fix it.

Signed-off-by: Dmitry Poletaev <poletaev@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

memory: allow memory_region_register_iommu_notifier() to fail

Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.

So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.

All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.

in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vfio: Turn the container error into an Error handle

The container error integer field is currently used to store
the first error potentially encountered during any
vfio_listener_region_add() call. However this fails to propagate
detailed error messages up to the vfio_connect_container caller.
Instead of using an integer, let's use an Error handle.

Messages are slightly reworded to accomodate the propagation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Add CPUID bit for CLZERO and XSAVEERPTR

The CPUID bits CLZERO and XSAVEERPTR are availble on AMD's ZEN platform
and could be passed to the guest.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

docker: test-debug: disable LeakSanitizer

There are just too many leaks in device-introspect-test (especially for
the plethora of arm and aarch64 boards) to make LeakSanitizer useful;
disable it for now.

Whoever is interested in debugging leaks can also use valgrind like this:

   QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 \
   QTEST_QEMU_IMG=qemu-img \
   valgrind --trace-children=yes --leak-check=full \
   tests/device-introspect-test -p /aarch64/device/introspect/concrete/defaults/none

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

lm32: do not leak memory on object_new/object_unref

Bottom halves and ptimers are malloced, but nothing in these
files is freeing memory allocated by instance_init. Since
these are sysctl devices that are never unrealized, just moving
the allocations to realize is enough to avoid the leak in
practice (and also to avoid upsetting asan when running
device-introspect-test).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

cris: do not leak struct cris_disasm_data

Use a stack-allocated struct to avoid a memory leak.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

mips: fix memory leaks in board initialization

They are not a big deal, but they upset asan.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>

hppa: fix leak from g_strdup_printf

memory_region_init_* takes care of copying the name into memory it owns.
Free it in the caller.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

mcf5208: fix leak from qemu_allocate_irqs

The array returned by qemu_allocate_irqs is malloced, free it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>

microblaze: fix leak of fdevice tree blob

The device tree blob returned by load_device_tree is malloced.
Free it before returning.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

ide: fix leak from qemu_allocate_irqs

The array returned by qemu_allocate_irqs is malloced, free it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>

hw/isa: Introduce a CONFIG_ISA_SUPERIO switch for isa-superio.c

Currently, isa-superio.c is always compiled as soon as CONFIG_ISA_BUS
is enabled. But there are also machines that have an ISA BUS without
any of the superio chips attached to it, so we should not compile
isa-superio.c in case we only compile a QEMU for such a machine.
Thus add a proper CONFIG_ISA_SUPERIO switch so that this file only gets
compiled when we really, really need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

iotests: Remove Python 2 compatibility code

Some scripts check the Python version number and have two code paths to
accomodate both Python 2 and 3. Remove the code specific to Python 2 and
assert the minimum version of 3.6 instead (check skips Python tests in
this case, so the assertion would only ever trigger if a Python script
is executed manually).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

iotests: Require Python 3.6 or later

Running iotests is not required to build QEMU, so we can have stricter
version requirements for Python here and can make use of new features
and drop compatibility code earlier.

This makes qemu-iotests skip all Python tests if a Python version before
3.6 is used for the build.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

iotests: Test internal snapshots with -blockdev

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>

block/snapshot: Restrict set of snapshot nodes

Nodes involved in internal snapshots were those that were returned by
bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all
nodes that are either the root node of a BlockBackend or monitor-owned
nodes.

With the typical -drive use, this worked well enough. However, in the
typical -blockdev case, the user defines one node per option, making all
nodes monitor-owned nodes. This includes protocol nodes etc. which often
are not snapshottable, so "savevm" only returns an error.

Change the conditions so that internal snapshot still include all nodes
that have a BlockBackend attached (we definitely want to snapshot
anything attached to a guest device and probably also the built-in NBD
server; snapshotting block job BlockBackends is more of an accident, but
a preexisting one), but other monitor-owned nodes are only included if
they have no parents.

This makes internal snapshots usable again with typical -blockdev
configurations.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>

ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine

The POWER8 PowerNV machine needs to implement a XICSFabric interface
as this is the POWER8 interrupt controller model. But the POWER9
machine uselessly inherits of XICSFabric from the common PowerNV
machine definition.

Open code machine definitions to have a better control on the
different interfaces each machine should define.

Fixes: f30c843ced50 ("ppc/pnv: Introduce PowerNV machines with fixed CPU models")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191003143617.21682-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Eliminate SpaprIrq::init hook

This method is used to set up the interrupt backends for the current
configuration. However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.

Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init(). This will also make some future cleanups simpler.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Add return value to spapr_irq_check()

Explicitly return success or failure, rather than just relying on the
Error ** parameter. This makes handling it less verbose in the caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Use less cryptic representation of which irq backends are supported

SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available. As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).

But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xive: Improve irq claim/free path

spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead. In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).

So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.

It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.

spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr, xics, xive: Better use of assert()s on irq claim/free paths

The irq claim and free paths for both XICS and XIVE check for some
validity conditions. Some of these represent genuine runtime failures,
however others - particularly checking that the basic irq number is in a
sane range - could only fail in the case of bugs in the callin code.
Therefore use assert()s instead of runtime failures for those.

In addition the non backend-specific part of the claim/free paths should
only be used for PAPR external irqs, that is in the range SPAPR_XIRQ_BASE
to the maximum irq number. Put assert()s for that into the top level
dispatchers as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Handle freeing of multiple irqs in frontend only

spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this. We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>

spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()

These traces contain some useless information (the always-0 source#) and
have no equivalents for XIVE mode. For now just remove them, and we can
put back something more sensible if and when we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

spapr: Eliminate SpaprIrq:get_nodename method

This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).

But, in the two cases the only difference between the node name is the
presence of a unit address. Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.

While we're there, remove the XICS_NODENAME define. The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Simplify spapr_qirq() handling

Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method. But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.

So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place. We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

spapr: Fix indexing of XICS irqs

spapr global irq numbers are different from the source numbers on the ICS
when using XICS - they're offset by XICS_IRQ_BASE (0x1000). But
spapr_irq_set_irq_xics() was passing through the global irq number to
the ICS code unmodified.

We only got away with this because of a counteracting bug - we were
incorrectly adjusting the qemu_irq we returned for a requested global irq
number.

That approach mostly worked but is very confusing, incorrectly relies on
the way the qemu_irq array is allocated, and undermines the intention of
having the global array of qemu_irqs for spapr have a consistent meaning
regardless of irq backend.

So, fix both set_irq and qemu_irq indexing. We rename some parameters at
the same time to make it clear that they are referring to spapr global
irq numbers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Eliminate nr_irqs parameter to SpaprIrq::init

The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive. Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Clarify and fix handling of nr_irqs

Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things.  For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs.  Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector.  For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.

The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend.  We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly.  We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>

spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper

Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

spapr: Fold spapr_phb_lsi_qirq() into its single caller

No point having a two-line helper that's used exactly once, and not likely
to be used anywhere else in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

xics: Create sPAPR specific ICS subtype

We create a subtype of TYPE_ICS specifically for sPAPR. For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init(). In future it will have some more
function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes

TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated.  The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.

There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this).  The current split isn't really in the right place for
that though.  If we need this in future, we can re-implement it more
in line with what we actually need.

So, collapse the two classes together into just TYPE_ICS.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xics: Eliminate reset hook

Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.

But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here. Merge them
together into just an ics_reset() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xics: Rename misleading ics_simple_*() functions

There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE. Rename them to ics_*() accordingly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xics: Eliminate 'reject', 'resend' and 'eoi' class hooks

Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods. But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE. TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.

So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>

xics: Minor fixes for XICSFabric interface

Interface instances should never be directly dereferenced. So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.

We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>

spapr/xive: skip partially initialized vCPUs in presenter

When vCPUs are hotplugged, they are added to the QEMU CPU list before
being fully realized. This can crash the XIVE presenter because the
'tctx' pointer is not necessarily initialized when looking for a
matching target.

These vCPUs are not valid targets for the presenter. Skip them.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191001085722.32755-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

target/ppc: use Vsr macros in BCD helpers

This allows us to remove more endian-specific defines from int_helper.c.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190926204453.31837-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

spapr: Render full FDT on ibm,client-architecture-support

The ibm,client-architecture-support call is a way for the guest to
negotiate capabilities with a hypervisor. It is implemented as:
- the guest calls SLOF via client interface;
- SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest;
- QEMU returns a device tree diff (which uses FDT format with
an additional header before it);
- SLOF walks through the partial diff tree and updates its internal tree
with the values from the diff.

This changes QEMU to simply re-render the entire tree and send it as
an update. SLOF can handle this already mostly, [1] is needed before this
can be applied. This stores the resulting tree in the spapr machine to have
the latest valid FDT copy possible (this should not matter much as
H_UPDATE_DT happens right after that but nevertheless).

The benefit is reduced code size as there is no need for another set of
DT rendering helpers such as spapr_fixup_cpu_dt().

The downside is that the updates are bigger now (as they include all
nodes and properties) but the difference on a '-smp 256,threads=1' system
before/after is 2.35s vs. 2.5s.

[1] https://patchwork.ozlabs.org/patch/1152915/

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr-pci: Stop providing assigned-addresses

QEMU does not allocate PCI resources (BARs) in any case - coldplug devices
are configured by the firmware and hotplug devices rely on the guest
system to do the assignment via the PCI rescan mechanism. Also in order
to create non empty "assigned-addresses", the device has to be enabled
(i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise
io_regions[i].addr are -1, and devices are not enabled at this point.

This removes "assigned-addresses" and leaves it to those who actually
do resource allocation.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190927022651.71642-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: remove unnecessary if() around calls to set_dfp{64,128}() in DFP macros

Now that the parameters to both set_dfp64() and set_dfp128() are exactly the
same, there is no need for an explicit if() statement to determine which
function should be called based upon size. Instead we can simply use the
preprocessor to generate the call to set_dfp##size() directly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: use existing VsrD() macro to eliminate HI_IDX and LO_IDX from dfp_helper.c

Switch over all accesses to the decimal numbers held in struct PPC_DFP from
using HI_IDX and LO_IDX to using the VsrD() macro instead. Not only does this
allow the compiler to ensure that the various dfp_* functions are being passed
a ppc_vsr_t rather than an arbitrary uint64_t pointer, but also allows the
host endian-specific HI_IDX and LO_IDX to be completely removed from
dfp_helper.c.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: change struct PPC_DFP decimal storage from uint64[2] to ppc_vsr_t

There are several places in dfp_helper.c that access the decimal number
representations in struct PPC_DFP via HI_IDX and LO_IDX defines which are set
at the top of dfp_helper.c according to the host endian.

However we can instead switch to using ppc_vsr_t for decimal numbers and then
make subsequent use of the existing VsrD() macros to access the correct
element regardless of host endian. Note that 64-bit decimals are stored in the
LSB of ppc_vsr_t (equivalent to VsrD(1)).

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: introduce dfp_finalize_decimal{64,128}() helper functions

Most of the DFP helper functions call decimal{64,128}FromNumber() just before
returning in order to convert the decNumber stored in dfp.t64 back to a
Decimal{64,128} to write back to the FP registers.

Introduce new dfp_finalize_decimal{64,128}() helper functions which both enable
the parameter list to be reduced considerably, and also help minimise the
changes required in the next patch.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: update {get,set}_dfp{64,128}() helper functions to read/write DFP numbers correctly

Since commit ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr
register array" FP registers are no longer stored consecutively in memory and so
the current method of combining FP register pairs into DFP numbers is incorrect.

Firstly update the definition of the dh_*_fprp defines in helper.h to reflect
that FP registers are now stored as part of an array of ppc_vsr_t elements
rather than plain uint64_t elements, and then introduce a new ppc_fprp_t type
which conceptually represents a DFP even-odd register pair to be consumed by the
DFP helper functions.

Finally update the new DFP {get,set}_dfp{64,128}() helper functions to convert
between DFP numbers and DFP even-odd register pairs correctly, making use of the
existing VsrD() macro to access the correct elements regardless of host endian.

Fixes: ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr register array"
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: introduce set_dfp{64,128}() helper functions

The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce set_dfp{64,128}() helper functions to ease
the switch to the correct representation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: introduce get_dfp{64,128}() helper functions

The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce get_dfp{64,128}() helper functions to ease
the switch to the correct representation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

pseries: Update SLOF firmware image

This fixes USB host bus adapter name in the device tree to match QEMU's
one.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Stop providing RTAS blob

SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>