Anton Nefedov [Mon, 23 Sep 2019 12:17:34 +0000 (15:17 +0300)]
scsi: move unmap error checking to the complete callback
This will help to account the operation in the following commit.
The difference is that we don't call scsi_disk_req_check_error() before
the 1st discard iteration anymore. That function also checks if
the request is cancelled, however it shouldn't get canceled until it
yields in blk_aio() functions anyway.
Same approach is already used for emulate_write_same.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190923121737.83281-7-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Anton Nefedov [Mon, 23 Sep 2019 12:17:33 +0000 (15:17 +0300)]
scsi: store unmap offset and nb_sectors in request struct
it allows to report it in the error handler
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Message-id: 20190923121737.83281-6-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Anton Nefedov [Mon, 23 Sep 2019 12:17:32 +0000 (15:17 +0300)]
ide: account UNMAP (TRIM) operations
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-5-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Anton Nefedov [Mon, 23 Sep 2019 12:17:31 +0000 (15:17 +0300)]
block: add empty account cookie type
Each block_acct_done/failed call is designed to correspond to a
previous block_acct_start call, which initializes the stats cookie.
However sometimes it is not the case, e.g. some error paths might
report the same cookie twice because it is hard to accurately track if
the cookie was reported yet or not.
This patch cleans the cookie after report.
(Note: block_acct_failed/done without a previous block_acct_start at
all should be avoided. Uninitialized cookie might hold a garbage value
and there is still "< BLOCK_MAX_IOTYPE" assertion for that)
It will be particularly useful in ide code where it's hard to
keep track whether the request done its accounting or not: in the
following patch of the series, trim requests will do the accounting
separately.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Anton Nefedov [Mon, 23 Sep 2019 12:17:30 +0000 (15:17 +0300)]
qapi: add unmap to BlockDeviceStats
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Anton Nefedov [Mon, 23 Sep 2019 12:17:29 +0000 (15:17 +0300)]
qapi: group BlockDeviceStats fields
Make the stat fields definition slightly more readable.
Also reorder total_time_ns stats read-write-flush as done elsewhere.
Cosmetic change only.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-2-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:52 +0000 (17:20 +0300)]
iotests: 257: drop device_add
SCSI devices are unused in test, drop them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-12-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:51 +0000 (17:20 +0300)]
iotests: 257: drop unused Drive.device field
After previous commit Drive.device is actually unused. Drop it together
with .name property. While being here reuse .node in qmp commands
instead of writing 'drive0' twice.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-11-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:50 +0000 (17:20 +0300)]
iotests: prepare 124 and 257 bitmap querying for backup-top filter
After backup-top filter appearing it's not possible to see dirty
bitmaps in top node, so use node-name instead.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-10-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:49 +0000 (17:20 +0300)]
block: teach bdrv_debug_breakpoint skip filters with backing
Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip
filters with backing. This is needed to implement and use in backup job
it's own backup_top filter driver (like mirror already has one), and
without this improvement, breakpoint removal will fail at least in 55
iotest.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-9-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:48 +0000 (17:20 +0300)]
block: move block_copy from block/backup.c to separate file
Split block_copy to separate file, to be cleanly shared with backup-top
filter driver in further commits.
It's a clean movement, the only change is drop "static" from interface
functions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:47 +0000 (17:20 +0300)]
block/backup: fix block-comment style
We need to fix comment style around block-copy functions before further
moving them to separate file to satisfy checkpatch. But do more: fix
all comments style. Also, seems like doubled first asterisk is not
forbidden, but drop it too for consistency.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:46 +0000 (17:20 +0300)]
block/backup: introduce BlockCopyState
Split copying code part from backup to "block-copy", including separate
state structure and function renaming. This is needed to share it with
backup-top filter driver in further commits.
Notes:
1. As BlockCopyState keeps own BlockBackend objects, remaining
job->common.blk users only use it to get bs by blk_bs() call, so clear
job->commen.blk permissions set in block_job_create and add
job->source_bs to be used instead of blk_bs(job->common.blk), to keep
it more clear which bs we use when introduce backup-top filter in
further commit.
2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better
as interface to BlockCopyState
3. Split is not very clean: there left some duplicated fields, backup
code uses some BlockCopyState fields directly, let's postpone it for
further improvements and keep this comment simpler for review.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:45 +0000 (17:20 +0300)]
block/backup: improve comment about image fleecing
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:44 +0000 (17:20 +0300)]
block/backup: split shareable copying part from backup_do_cow
Split copying logic which will be shared with backup-top filter.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:43 +0000 (17:20 +0300)]
block/backup: fix backup_cow_with_offload for last cluster
We shouldn't try to copy bytes beyond EOF. Fix it.
Fixes: 9ded4a0114968e Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:42 +0000 (17:20 +0300)]
block/backup: fix max_transfer handling for copy_range
Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
are trying to find aligned size which satisfy both source and target.
Also, don't ignore too small max_transfer. In this case seems safer to
disable copy_range.
Fixes: 9ded4a0114968e Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 16 Sep 2019 17:53:24 +0000 (20:53 +0300)]
block/qcow2: introduce parallel subrequest handling in read and write
It improves performance for fragmented qcow2 images.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 16 Sep 2019 17:53:23 +0000 (20:53 +0300)]
block/qcow2: refactor qcow2_co_pwritev_part
Similarly to previous commit, prepare for parallelizing write-loop
iterations.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 16 Sep 2019 17:53:22 +0000 (20:53 +0300)]
block/qcow2: refactor qcow2_co_preadv_part
Further patch will run partial requests of iterations of
qcow2_co_preadv in parallel for performance reasons. To prepare for
this, separate part which may be parallelized into separate function
(qcow2_co_preadv_task).
While being here, also separate encrypted clusters reading to own
function, like it is done for compressed reading.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 16 Sep 2019 17:53:21 +0000 (20:53 +0300)]
block: introduce aio task pool
Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and may be other places.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy [Mon, 16 Sep 2019 17:53:20 +0000 (20:53 +0300)]
qemu-iotests: ignore leaks on failure paths in 026
Upcoming asynchronous handling of sub-parts of qcow2 requests will
change number of leaked clusters and even make it racy. As a
preparation, ignore leaks on failure parts in 026.
It's not trivial to just grep or substitute qemu-img output for such
thing. Instead do better: 3 is a error code of qemu-img check, if only
leaks are found. Catch this case and print success output.
Suggested-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
Peter Maydell [Tue, 8 Oct 2019 15:08:35 +0000 (16:08 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
This pull request also contains the two commits from the previous pull request
that was dropped due to a mingw compilation error. The compilation should now
be fixed.
# gpg: Signature made Tue 08 Oct 2019 15:54:26 BST
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
iotests/262: Switch source/dest VM launch order
block: Skip COR for inactive nodes
virtio-blk: schedule virtio_notify_config to run on main context
util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Max Reitz [Tue, 1 Oct 2019 17:48:27 +0000 (19:48 +0200)]
iotests/262: Switch source/dest VM launch order
Launching the destination VM before the source VM gives us a regression
test for HEAD^:
The guest device causes a read from the disk image through
guess_disk_lchs(). This will not work if the first sector (containing
the partition table) is yet unallocated, we use COR, and the node is
inactive.
By launching the source VM before the destination, however, the COR
filter on the source will allocate that area in the image shared between
both VMs, thus the problem will not become apparent.
Switching the launch order causes the sector to still be unallocated
when guess_disk_lchs() runs on the inactive node in the destination VM,
and thus we get our test case.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191001174827.11081-3-mreitz@redhat.com
Message-Id: <20191001174827.11081-3-mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Max Reitz [Tue, 1 Oct 2019 17:48:26 +0000 (19:48 +0200)]
block: Skip COR for inactive nodes
We must not write data to inactive nodes, and a COR is certainly
something we can simply not do without upsetting anyone. So skip COR
operations on inactive nodes.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191001174827.11081-2-mreitz@redhat.com
Message-Id: <20191001174827.11081-2-mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Vladimir Sementsov-Ogievskiy [Tue, 10 Sep 2019 09:03:10 +0000 (12:03 +0300)]
util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended
Make it more obvious, that filling qiov corresponds to qiov allocation,
which in turn corresponds to total_niov calculation, based on mid_niov
(not mid_len). Still add an assertion to show that there should be no
difference.
[Added mingw "error: 'mid_iov' may be used uninitialized in this
function" compiler error fix suggested by Vladimir.
--Stefan]
Reported-by: Coverity (CID 1405302) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190910090310.14032-1-vsementsov@virtuozzo.com Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190910090310.14032-1-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
fixup! util/ioc.c: try to reassure Coverity about qemu_iovec_init_extended
Peter Maydell [Tue, 8 Oct 2019 11:30:13 +0000 (12:30 +0100)]
Merge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20191007' into staging
Improve scripts relying on the EDK2 submodule,
drop Python2 dependency in EDK2 build scripts.
# gpg: Signature made Mon 07 Oct 2019 14:31:38 BST
# gpg: using RSA key 89C1E78F601EE86C867495CBA2A3FD6EDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (Phil) <philmd@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 89C1 E78F 601E E86C 8674 95CB A2A3 FD6E DEAD C0DE
* remotes/philmd-gitlab/tags/edk2-next-20191007:
edk2 build scripts: work around TianoCore#1607 without forcing Python 2
edk2 build scripts: honor external BaseTools flags with uefi-test-tools
roms: Add a 'make help' target alias
roms/Makefile.edk2: don't pull in submodules when building from tarball
make-release: pull in edk2 submodules so we can build it from tarballs
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 7 Oct 2019 14:40:53 +0000 (15:40 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- Fix internal snapshots with typical -blockdev setups
- iotests: Require Python 3.6 or later
# gpg: Signature made Fri 04 Oct 2019 10:59:21 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
iotests: Remove Python 2 compatibility code
iotests: Require Python 3.6 or later
iotests: Test internal snapshots with -blockdev
block/snapshot: Restrict set of snapshot nodes
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
edk2 build scripts: work around TianoCore#1607 without forcing Python 2
It turns out that forcing python2 for running the edk2 "build" utility is
neither necessary nor sufficient.
Forcing python2 is not sufficient for two reasons:
- QEMU is moving away from python2, with python2 nearing EOL,
- according to my most recent testing, the lacking dependency information
in the makefiles that are generated by edk2's "build" utility can cause
parallel build failures even when "build" is executed by python2.
And forcing python2 is not necessary because we can still return to the
original idea of filtering out jobserver-related options from MAKEFLAGS.
So do that.
While at it, cut short edk2's auto-detection of the python3.* minor
version, by setting PYTHON_COMMAND to "python3" (which we expect to be
available wherever we intend to build edk2).
With this patch, the guest UEFI binaries that are used as part of the BIOS
tables test, and the OVMF and ArmVirtQemu platform firmwares, will be
built strictly in a single job, regardless of an outermost "-jN" make
option. Alas, there appears to be no reliable way to build edk2 in an
(outer make, inner make) environment, with a jobserver enabled.
Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Reported-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190920083808.21399-3-lersek@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Philippe Mathieu-Daudé [Fri, 20 Sep 2019 17:11:59 +0000 (19:11 +0200)]
roms: Add a 'make help' target alias
Various C projects provide a 'make help' target. Our root directory
does so. The roms/ directory lacks a such rule, but already displays
a help output when the default target is called.
Add a 'help' target aliased to the default one, to avoid:
$ make -C roms help
make: *** No rule to make target 'help'. Stop.
Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190920171159.18633-1-philmd@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Michael Roth [Thu, 12 Sep 2019 23:12:02 +0000 (18:12 -0500)]
roms/Makefile.edk2: don't pull in submodules when building from tarball
Currently the `make efi` target pulls submodules nested under the
roms/edk2 submodule as dependencies. However, when we attempt to build
from a tarball this fails since we are no longer in a git tree.
A preceding patch will pre-populate these submodules in the tarball,
so assume this build dependency is only needed when building from a
git tree.
Cc: Laszlo Ersek <lersek@redhat.com> Cc: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org # v4.1.0 Reported-by: Bruce Rogers <brogers@suse.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-3-mdroth@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Michael Roth [Thu, 12 Sep 2019 23:12:01 +0000 (18:12 -0500)]
make-release: pull in edk2 submodules so we can build it from tarballs
The `make efi` target added by 536d2173 is built from the roms/edk2
submodule, which in turn relies on additional submodules nested under
roms/edk2.
The make-release script currently only pulls in top-level submodules,
so these nested submodules are missing in the resulting tarball.
We could try to address this situation more generally by recursively
pulling in all submodules, but this doesn't necessarily ensure the
end-result will build properly (this case also required other changes).
Additionally, due to the nature of submodules, we may not always have
control over how these sorts of things are dealt with, so for now we
continue to handle it on a case-by-case in the make-release script.
Cc: Laszlo Ersek <lersek@redhat.com> Cc: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org # v4.1.0 Reported-by: Bruce Rogers <brogers@suse.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-2-mdroth@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Peter Maydell [Mon, 7 Oct 2019 12:49:02 +0000 (13:49 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191004' into staging
ppc patch queue 2019-10-04
Here's the next batch of ppc and spapr patches. Includes:
* Fist part of a large cleanup to irq infrastructure
* Recreate the full FDT at CAS time, instead of making a difficult
to follow set of updates. This will help us move towards
eliminating CAS reboots altogether
* No longer provide RTAS blob to SLOF - SLOF can include it just as
well itself, since guests will generally need to relocate it with
a call to instantiate-rtas
* A number of DFP fixes and cleanups from Mark Cave-Ayland
* Assorted bugfixes
* Several new small devices for powernv
* remotes/dgibson/tags/ppc-for-4.2-20191004: (53 commits)
ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
spapr: Eliminate SpaprIrq::init hook
spapr: Add return value to spapr_irq_check()
spapr: Use less cryptic representation of which irq backends are supported
xive: Improve irq claim/free path
spapr, xics, xive: Better use of assert()s on irq claim/free paths
spapr: Handle freeing of multiple irqs in frontend only
spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
spapr: Eliminate SpaprIrq:get_nodename method
spapr: Simplify spapr_qirq() handling
spapr: Fix indexing of XICS irqs
spapr: Eliminate nr_irqs parameter to SpaprIrq::init
spapr: Clarify and fix handling of nr_irqs
spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
spapr: Fold spapr_phb_lsi_qirq() into its single caller
xics: Create sPAPR specific ICS subtype
xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
xics: Eliminate reset hook
xics: Rename misleading ics_simple_*() functions
xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/bonzini/tags/for-upstream: (29 commits)
target/i386/kvm: Silence warning from Valgrind about uninitialized bytes
target/i386: work around KVM_GET_MSRS bug for secondary execution controls
target/i386: add VMX features
vmxcap: correct the name of the variables
target/i386: add VMX definitions
target/i386: expand feature words to 64 bits
target/i386: introduce generic feature dependency mechanism
target/i386: handle filtered_features in a new function mark_unavailable_features
tests/docker: only enable ubsan for test-clang
win32: work around main-loop busy loop on socket/fd event
tests: skip serial test on windows
util: WSAEWOULDBLOCK on connect should map to EINPROGRESS
Fix wrong behavior of cpu_memory_rw_debug() function in SMM
memory: allow memory_region_register_iommu_notifier() to fail
vfio: Turn the container error into an Error handle
i386: Add CPUID bit for CLZERO and XSAVEERPTR
docker: test-debug: disable LeakSanitizer
lm32: do not leak memory on object_new/object_unref
cris: do not leak struct cris_disasm_data
mips: fix memory leaks in board initialization
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Thomas Huth [Tue, 24 Sep 2019 07:47:38 +0000 (09:47 +0200)]
target/i386/kvm: Silence warning from Valgrind about uninitialized bytes
When I run QEMU with KVM under Valgrind, I currently get this warning:
Syscall param ioctl(generic) points to uninitialised byte(s)
at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so)
by 0x429DC3: kvm_ioctl (kvm-all.c:2365)
by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469)
by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765)
by 0x4C4116: x86_cpu_expand_features (cpu.c:5065)
by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242)
by 0x5961F3: device_set_realized (qdev.c:835)
by 0x7038F6: property_set_bool (object.c:2080)
by 0x707EFE: object_property_set_qobject (qom-qobject.c:26)
by 0x705814: object_property_set_bool (object.c:1338)
by 0x498435: pc_new_cpu (pc.c:1549)
by 0x49C67D: pc_cpus_init (pc.c:1681)
Address 0x1ffeffee74 is on thread 1's stack
in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445)
It's harmless, but a little bit annoying, so silence it by properly
initializing the whole structure with zeroes.
Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 2 Jul 2019 12:58:48 +0000 (14:58 +0200)]
target/i386: work around KVM_GET_MSRS bug for secondary execution controls
Some secondary controls are automatically enabled/disabled based on the CPUID
values that are set for the guest. However, they are still available at a
global level and therefore should be present when KVM_GET_MSRS is sent to
/dev/kvm.
Unfortunately KVM forgot to include those, so fix that.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 1 Jul 2019 16:32:17 +0000 (18:32 +0200)]
target/i386: add VMX features
Add code to convert the VMX feature words back into MSR values,
allowing the user to enable/disable VMX features as they wish. The same
infrastructure enables support for limiting VMX features in named
CPU models.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 1 Jul 2019 15:38:54 +0000 (17:38 +0200)]
target/i386: expand feature words to 64 bits
VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP
and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but
actually have only 32 bits of information).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sometimes a CPU feature does not make sense unless another is
present. In the case of VMX features, KVM does not even allow
setting the VMX controls to some invalid combinations.
Therefore, this patch adds a generic mechanism that looks for bits
that the user explicitly cleared, and uses them to remove other bits
from the expanded CPU definition. If these dependent bits were also
explicitly *set* by the user, this will be a warning for "-cpu check"
and an error for "-cpu enforce". If not, then the dependent bits are
cleared silently, for convenience.
With VMX features, this will be used so that for example
"-cpu host,-rdrand" will also hide support for RDRAND exiting.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 2 Jul 2019 13:32:41 +0000 (15:32 +0200)]
target/i386: handle filtered_features in a new function mark_unavailable_features
The next patch will add a different reason for filtering features, unrelated
to host feature support. Extract a new function that takes care of disabling
the features and optionally reporting them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 1 Oct 2019 13:48:55 +0000 (15:48 +0200)]
tests/docker: only enable ubsan for test-clang
-fsanitize=undefined is not the same thing as --enable-sanitizers. After
commit 47c823e ("tests/docker: add sanitizers back to clang build", 2019-09-11)
test-clang is almost duplicating the asan (test-debug) test, so
partly revert commit 47c823e5b while leaving ubsan enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ctx->notifier event is added to the gpoll sources in
aio_set_event_notifier(), aio_ctx_check() should clear the event
regardless of ctx->notified, since Windows sets the event by itself,
bypassing the aio->notified. This fixes qemu not clearing the event
resulting in a busy loop.
Paolo suggested to me on irc to call event_notifier_test_and_clear()
after select() >0 from aio-win32.c's aio_prepare. Unfortunately, not all
fds associated with ctx->notifiers are in AIO fd handlers set.
(qemu_set_nonblock() in util/oslib-win32.c calls qemu_fd_register()).
This is essentially a v2 of a patch that was sent earlier:
https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00420.html
that resurfaced when James investigated Spice performance issues on Windows:
https://gitlab.freedesktop.org/spice/spice/issues/36
In order to test that patch, I simply tried running test-char on
win32, and it hangs. Applying that patch solves it. QIO idle sources
are not dispatched. I haven't investigated much further, I suspect
source priorities and busy looping still come into play.
This version keeps the "notified" field, so event_notifier_poll()
should still work as expected.
Cc: James Le Cuirot <chewi@gentoo.org> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Marc-André Lureau [Tue, 1 Oct 2019 13:26:07 +0000 (17:26 +0400)]
util: WSAEWOULDBLOCK on connect should map to EINPROGRESS
In general, WSAEWOULDBLOCK can be mapped to EAGAIN as done by
socket_error() (or EWOULDBLOCK). But for connect() with non-blocking
sockets, it actually means the operation is in progress:
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-connect
"The socket is marked as nonblocking and the connection cannot be completed immediately."
(this is also the behaviour implemented by GLib GSocket)
This fixes socket_can_bind_connect() test on win32.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix wrong behavior of cpu_memory_rw_debug() function in SMM
There is a problem, that you don't have access to the data using cpu_memory_rw_debug() function when in SMM. You can't remotely debug SMM mode program because of that for example.
Likely attrs version of get_phys_page_debug should be used to get correct asidx at the end to handle access properly.
Here the patch to fix it.
Signed-off-by: Dmitry Poletaev <poletaev@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Eric Auger [Tue, 24 Sep 2019 08:25:17 +0000 (10:25 +0200)]
memory: allow memory_region_register_iommu_notifier() to fail
Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.
So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.
All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.
in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Eric Auger [Tue, 24 Sep 2019 08:25:16 +0000 (10:25 +0200)]
vfio: Turn the container error into an Error handle
The container error integer field is currently used to store
the first error potentially encountered during any
vfio_listener_region_add() call. However this fails to propagate
detailed error messages up to the vfio_connect_container caller.
Instead of using an integer, let's use an Error handle.
Messages are slightly reworded to accomodate the propagation.
Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 1 Oct 2019 13:36:28 +0000 (15:36 +0200)]
docker: test-debug: disable LeakSanitizer
There are just too many leaks in device-introspect-test (especially for
the plethora of arm and aarch64 boards) to make LeakSanitizer useful;
disable it for now.
Whoever is interested in debugging leaks can also use valgrind like this:
Paolo Bonzini [Tue, 1 Oct 2019 13:36:27 +0000 (15:36 +0200)]
lm32: do not leak memory on object_new/object_unref
Bottom halves and ptimers are malloced, but nothing in these
files is freeing memory allocated by instance_init. Since
these are sysctl devices that are never unrealized, just moving
the allocations to realize is enough to avoid the leak in
practice (and also to avoid upsetting asan when running
device-introspect-test).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Thomas Huth [Mon, 30 Sep 2019 15:04:36 +0000 (17:04 +0200)]
hw/isa: Introduce a CONFIG_ISA_SUPERIO switch for isa-superio.c
Currently, isa-superio.c is always compiled as soon as CONFIG_ISA_BUS
is enabled. But there are also machines that have an ISA BUS without
any of the superio chips attached to it, so we should not compile
isa-superio.c in case we only compile a QEMU for such a machine.
Thus add a proper CONFIG_ISA_SUPERIO switch so that this file only gets
compiled when we really, really need it.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kevin Wolf [Thu, 19 Sep 2019 16:18:31 +0000 (18:18 +0200)]
iotests: Remove Python 2 compatibility code
Some scripts check the Python version number and have two code paths to
accomodate both Python 2 and 3. Remove the code specific to Python 2 and
assert the minimum version of 3.6 instead (check skips Python tests in
this case, so the assertion would only ever trigger if a Python script
is executed manually).
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Kevin Wolf [Wed, 18 Sep 2019 08:50:05 +0000 (10:50 +0200)]
iotests: Require Python 3.6 or later
Running iotests is not required to build QEMU, so we can have stricter
version requirements for Python here and can make use of new features
and drop compatibility code earlier.
This makes qemu-iotests skip all Python tests if a Python version before
3.6 is used for the build.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Kevin Wolf [Tue, 17 Sep 2019 10:26:23 +0000 (12:26 +0200)]
block/snapshot: Restrict set of snapshot nodes
Nodes involved in internal snapshots were those that were returned by
bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all
nodes that are either the root node of a BlockBackend or monitor-owned
nodes.
With the typical -drive use, this worked well enough. However, in the
typical -blockdev case, the user defines one node per option, making all
nodes monitor-owned nodes. This includes protocol nodes etc. which often
are not snapshottable, so "savevm" only returns an error.
Change the conditions so that internal snapshot still include all nodes
that have a BlockBackend attached (we definitely want to snapshot
anything attached to a guest device and probably also the built-in NBD
server; snapshotting block job BlockBackends is more of an accident, but
a preexisting one), but other monitor-owned nodes are only included if
they have no parents.
This makes internal snapshots usable again with typical -blockdev
configurations.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com>
Cédric Le Goater [Thu, 3 Oct 2019 14:36:17 +0000 (16:36 +0200)]
ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
The POWER8 PowerNV machine needs to implement a XICSFabric interface
as this is the POWER8 interrupt controller model. But the POWER9
machine uselessly inherits of XICSFabric from the common PowerNV
machine definition.
Open code machine definitions to have a better control on the
different interfaces each machine should define.
Fixes: f30c843ced50 ("ppc/pnv: Introduce PowerNV machines with fixed CPU models") Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191003143617.21682-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
David Gibson [Mon, 30 Sep 2019 02:24:43 +0000 (12:24 +1000)]
spapr: Eliminate SpaprIrq::init hook
This method is used to set up the interrupt backends for the current
configuration. However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.
Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init(). This will also make some future cleanups simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Wed, 25 Sep 2019 05:12:07 +0000 (15:12 +1000)]
spapr: Use less cryptic representation of which irq backends are supported
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available. As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).
But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Wed, 25 Sep 2019 03:24:14 +0000 (13:24 +1000)]
xive: Improve irq claim/free path
spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead. In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).
So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.
It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.
spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Wed, 25 Sep 2019 03:49:59 +0000 (13:49 +1000)]
spapr, xics, xive: Better use of assert()s on irq claim/free paths
The irq claim and free paths for both XICS and XIVE check for some
validity conditions. Some of these represent genuine runtime failures,
however others - particularly checking that the basic irq number is in a
sane range - could only fail in the case of bugs in the callin code.
Therefore use assert()s instead of runtime failures for those.
In addition the non backend-specific part of the claim/free paths should
only be used for PAPR external irqs, that is in the range SPAPR_XIRQ_BASE
to the maximum irq number. Put assert()s for that into the top level
dispatchers as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 14:12:21 +0000 (00:12 +1000)]
spapr: Handle freeing of multiple irqs in frontend only
spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this. We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
David Gibson [Tue, 24 Sep 2019 14:05:03 +0000 (00:05 +1000)]
spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
These traces contain some useless information (the always-0 source#) and
have no equivalents for XIVE mode. For now just remove them, and we can
put back something more sensible if and when we need it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
David Gibson [Mon, 23 Sep 2019 05:19:28 +0000 (15:19 +1000)]
spapr: Eliminate SpaprIrq:get_nodename method
This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).
But, in the two cases the only difference between the node name is the
presence of a unit address. Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.
While we're there, remove the XICS_NODENAME define. The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Mon, 23 Sep 2019 06:18:28 +0000 (16:18 +1000)]
spapr: Simplify spapr_qirq() handling
Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method. But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.
So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place. We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
David Gibson [Tue, 24 Sep 2019 01:12:19 +0000 (11:12 +1000)]
spapr: Fix indexing of XICS irqs
spapr global irq numbers are different from the source numbers on the ICS
when using XICS - they're offset by XICS_IRQ_BASE (0x1000). But
spapr_irq_set_irq_xics() was passing through the global irq number to
the ICS code unmodified.
We only got away with this because of a counteracting bug - we were
incorrectly adjusting the qemu_irq we returned for a requested global irq
number.
That approach mostly worked but is very confusing, incorrectly relies on
the way the qemu_irq array is allocated, and undermines the intention of
having the global array of qemu_irqs for spapr have a consistent meaning
regardless of irq backend.
So, fix both set_irq and qemu_irq indexing. We rename some parameters at
the same time to make it clear that they are referring to spapr global
irq numbers.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 01:34:12 +0000 (11:34 +1000)]
spapr: Eliminate nr_irqs parameter to SpaprIrq::init
The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive. Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 00:53:50 +0000 (10:53 +1000)]
spapr: Clarify and fix handling of nr_irqs
Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things. For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs. Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector. For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.
The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend. We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly. We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
David Gibson [Mon, 23 Sep 2019 05:50:09 +0000 (15:50 +1000)]
spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
David Gibson [Mon, 23 Sep 2019 05:43:58 +0000 (15:43 +1000)]
spapr: Fold spapr_phb_lsi_qirq() into its single caller
No point having a two-line helper that's used exactly once, and not likely
to be used anywhere else in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
David Gibson [Tue, 24 Sep 2019 05:51:55 +0000 (15:51 +1000)]
xics: Create sPAPR specific ICS subtype
We create a subtype of TYPE_ICS specifically for sPAPR. For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init(). In future it will have some more
function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 05:29:25 +0000 (15:29 +1000)]
xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated. The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.
There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this). The current split isn't really in the right place for
that though. If we need this in future, we can re-implement it more
in line with what we actually need.
So, collapse the two classes together into just TYPE_ICS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 04:19:22 +0000 (14:19 +1000)]
xics: Eliminate reset hook
Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.
But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here. Merge them
together into just an ics_reset() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 04:13:39 +0000 (14:13 +1000)]
xics: Rename misleading ics_simple_*() functions
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE. Rename them to ics_*() accordingly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 03:56:47 +0000 (13:56 +1000)]
xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods. But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE. TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.
So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 24 Sep 2019 06:00:33 +0000 (16:00 +1000)]
xics: Minor fixes for XICSFabric interface
Interface instances should never be directly dereferenced. So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.
We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
Cédric Le Goater [Tue, 1 Oct 2019 08:57:22 +0000 (10:57 +0200)]
spapr/xive: skip partially initialized vCPUs in presenter
When vCPUs are hotplugged, they are added to the QEMU CPU list before
being fully realized. This can crash the XIVE presenter because the
'tctx' pointer is not necessarily initialized when looking for a
matching target.
These vCPUs are not valid targets for the presenter. Skip them.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191001085722.32755-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
Mark Cave-Ayland [Thu, 26 Sep 2019 20:44:53 +0000 (21:44 +0100)]
target/ppc: use Vsr macros in BCD helpers
This allows us to remove more endian-specific defines from int_helper.c.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190926204453.31837-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
spapr: Render full FDT on ibm,client-architecture-support
The ibm,client-architecture-support call is a way for the guest to
negotiate capabilities with a hypervisor. It is implemented as:
- the guest calls SLOF via client interface;
- SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest;
- QEMU returns a device tree diff (which uses FDT format with
an additional header before it);
- SLOF walks through the partial diff tree and updates its internal tree
with the values from the diff.
This changes QEMU to simply re-render the entire tree and send it as
an update. SLOF can handle this already mostly, [1] is needed before this
can be applied. This stores the resulting tree in the spapr machine to have
the latest valid FDT copy possible (this should not matter much as
H_UPDATE_DT happens right after that but nevertheless).
The benefit is reduced code size as there is no need for another set of
DT rendering helpers such as spapr_fixup_cpu_dt().
The downside is that the updates are bigger now (as they include all
nodes and properties) but the difference on a '-smp 256,threads=1' system
before/after is 2.35s vs. 2.5s.
QEMU does not allocate PCI resources (BARs) in any case - coldplug devices
are configured by the firmware and hotplug devices rely on the guest
system to do the assignment via the PCI rescan mechanism. Also in order
to create non empty "assigned-addresses", the device has to be enabled
(i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise
io_regions[i].addr are -1, and devices are not enabled at this point.
This removes "assigned-addresses" and leaves it to those who actually
do resource allocation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190927022651.71642-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Thu, 26 Sep 2019 18:58:01 +0000 (19:58 +0100)]
target/ppc: remove unnecessary if() around calls to set_dfp{64,128}() in DFP macros
Now that the parameters to both set_dfp64() and set_dfp128() are exactly the
same, there is no need for an explicit if() statement to determine which
function should be called based upon size. Instead we can simply use the
preprocessor to generate the call to set_dfp##size() directly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-8-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Thu, 26 Sep 2019 18:58:00 +0000 (19:58 +0100)]
target/ppc: use existing VsrD() macro to eliminate HI_IDX and LO_IDX from dfp_helper.c
Switch over all accesses to the decimal numbers held in struct PPC_DFP from
using HI_IDX and LO_IDX to using the VsrD() macro instead. Not only does this
allow the compiler to ensure that the various dfp_* functions are being passed
a ppc_vsr_t rather than an arbitrary uint64_t pointer, but also allows the
host endian-specific HI_IDX and LO_IDX to be completely removed from
dfp_helper.c.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-7-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Thu, 26 Sep 2019 18:57:59 +0000 (19:57 +0100)]
target/ppc: change struct PPC_DFP decimal storage from uint64[2] to ppc_vsr_t
There are several places in dfp_helper.c that access the decimal number
representations in struct PPC_DFP via HI_IDX and LO_IDX defines which are set
at the top of dfp_helper.c according to the host endian.
However we can instead switch to using ppc_vsr_t for decimal numbers and then
make subsequent use of the existing VsrD() macros to access the correct
element regardless of host endian. Note that 64-bit decimals are stored in the
LSB of ppc_vsr_t (equivalent to VsrD(1)).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-6-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Most of the DFP helper functions call decimal{64,128}FromNumber() just before
returning in order to convert the decNumber stored in dfp.t64 back to a
Decimal{64,128} to write back to the FP registers.
Introduce new dfp_finalize_decimal{64,128}() helper functions which both enable
the parameter list to be reduced considerably, and also help minimise the
changes required in the next patch.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-5-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Thu, 26 Sep 2019 18:57:57 +0000 (19:57 +0100)]
target/ppc: update {get,set}_dfp{64,128}() helper functions to read/write DFP numbers correctly
Since commit ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr
register array" FP registers are no longer stored consecutively in memory and so
the current method of combining FP register pairs into DFP numbers is incorrect.
Firstly update the definition of the dh_*_fprp defines in helper.h to reflect
that FP registers are now stored as part of an array of ppc_vsr_t elements
rather than plain uint64_t elements, and then introduce a new ppc_fprp_t type
which conceptually represents a DFP even-odd register pair to be consumed by the
DFP helper functions.
Finally update the new DFP {get,set}_dfp{64,128}() helper functions to convert
between DFP numbers and DFP even-odd register pairs correctly, making use of the
existing VsrD() macro to access the correct elements regardless of host endian.
Fixes: ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr register array" Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-4-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce set_dfp{64,128}() helper functions to ease
the switch to the correct representation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-3-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce get_dfp{64,128}() helper functions to ease
the switch to the correct representation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-2-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>